Coverage for src / basanos / math / optimizer.py: 100%

122 statements  

« prev     ^ index     » next       coverage.py v7.13.5, created at 2026-04-25 04:28 +0000

1"""Correlation-aware risk position optimizer (Basanos). 

2 

3This module provides utilities to compute correlation-adjusted risk positions 

4from price data and expected-return signals. It relies on volatility-adjusted 

5returns to estimate a dynamic correlation matrix (via EWM), applies shrinkage 

6towards identity, and solves a normalized linear system per timestamp to 

7obtain stable positions. 

8 

9Performance characteristics 

10--------------------------- 

11Let *N* be the number of assets and *T* the number of timestamps. 

12 

13**Computational complexity** 

14 

15+----------------------------------+------------------+--------------------------------------+ 

16| Operation | Complexity | Bottleneck | 

17+==================================+==================+======================================+ 

18| EWM volatility (``ret_adj``, | O(T·N) | Linear in both T and N; negligible | 

19| ``vola``) | | | 

20+----------------------------------+------------------+--------------------------------------+ 

21| EWM correlation (``cor``) | O(T·N²) | ``lfilter`` over all N² asset pairs | 

22| | | simultaneously | 

23+----------------------------------+------------------+--------------------------------------+ 

24| Linear solve per timestamp | O(N³) | Cholesky / LU per row in | 

25| (``cash_position``) | * T solves | ``cash_position`` | 

26+----------------------------------+------------------+--------------------------------------+ 

27 

28**Memory usage** (peak, approximate) 

29 

30``ewm_corr`` allocates roughly **14 float64 arrays** of shape 

31``(T, N, N)`` at peak (input sequences, IIR filter outputs, EWM components, 

32and the result tensor). Peak RAM ≈ **112 * T * N²** bytes. Typical 

33working sizes on a 16 GB machine: 

34 

35+--------+--------------------------+------------------------------------+ 

36| N | T (daily rows) | Peak memory (approx.) | 

37+========+==========================+====================================+ 

38| 50 | 252 (~1 yr) | ~70 MB | 

39+--------+--------------------------+------------------------------------+ 

40| 100 | 252 (~1 yr) | ~280 MB | 

41+--------+--------------------------+------------------------------------+ 

42| 100 | 2 520 (~10 yr) | ~2.8 GB | 

43+--------+--------------------------+------------------------------------+ 

44| 200 | 2 520 (~10 yr) | ~11 GB | 

45+--------+--------------------------+------------------------------------+ 

46| 500 | 2 520 (~10 yr) | ~70 GB ⚠ exceeds typical RAM | 

47+--------+--------------------------+------------------------------------+ 

48 

49**Practical limits (daily data)** 

50 

51* **≤ 150 assets, ≤ 5 years** — well within reach on an 8 GB laptop. 

52* **≤ 250 assets, ≤ 10 years** — requires ~11-12 GB; feasible on a 16 GB 

53 workstation. 

54* **> 500 assets with multi-year history** — peak memory exceeds 16 GB; 

55 reduce the time range or switch to a chunked / streaming approach. 

56* **> 1 000 assets** — the O(N³) per-solve cost alone makes real-time 

57 optimization impractical even with adequate RAM. 

58 

59See ``BENCHMARKS.md`` for measured wall-clock timings across representative 

60dataset sizes. 

61 

62Internal structure 

63------------------ 

64The implementation is split across focused private modules to keep each file 

65readable and independently testable: 

66 

67* `_config` — `BasanosConfig` and all 

68 covariance-mode configuration classes. 

69* `_ewm_corr` — `ewm_corr`, the vectorised 

70 IIR-filter implementation of per-row EWM correlation matrices. 

71* `_engine_solve` — private helpers providing the 

72 ``_iter_matrices`` and ``_iter_solve`` generators (per-timestamp solve 

73 logic). 

74* `_engine_diagnostics` — private helpers providing 

75 matrix-quality diagnostics (condition number, effective rank, solver 

76 residual, signal utilisation). 

77* `_engine_ic` — private helpers providing signal 

78 evaluation metrics (IC, Rank IC, ICIR, and summary statistics). 

79* This module — `BasanosEngine`, a single flat class that wires 

80 every method together in clearly delimited sections. 

81""" 

82 

83import dataclasses 

84import datetime 

85import logging 

86from typing import TYPE_CHECKING 

87 

88import numpy as np 

89import polars as pl 

90from jquantstats import Portfolio 

91 

92from ..exceptions import ( 

93 ColumnMismatchError, 

94 ExcessiveNullsError, 

95 MissingDateColumnError, 

96 MonotonicPricesError, 

97 NonPositivePricesError, 

98 ShapeMismatchError, 

99) 

100from ._config import ( 

101 BasanosConfig, 

102 CovarianceConfig, 

103 CovarianceMode, 

104 EwmaShrinkConfig, 

105 SlidingWindowConfig, 

106) 

107from ._engine_diagnostics import _DiagnosticsMixin as _DiagnosticsMixin 

108from ._engine_ic import _SignalEvaluatorMixin as _SignalEvaluatorMixin 

109from ._engine_solve import _SolveMixin as _SolveMixin 

110from ._ewm_corr import ewm_corr as _ewm_corr_numpy 

111from ._signal import vol_adj 

112 

113if TYPE_CHECKING: 

114 from ._config_report import ConfigReport 

115 

116_logger = logging.getLogger(__name__) 

117 

118 

119def _validate_inputs(prices: pl.DataFrame, mu: pl.DataFrame, cfg: "BasanosConfig") -> None: 

120 """Validate ``prices``, ``mu``, and ``cfg`` for use with `BasanosEngine`. 

121 

122 Checks that both DataFrames contain a ``'date'`` column, share identical 

123 shapes and column sets, contain no non-positive prices, no excessive NaN 

124 fractions, and no monotonically non-varying price series. Also emits a 

125 warning when the dataset is too short relative to a configured 

126 sliding-window size. 

127 

128 Args: 

129 prices: DataFrame of price levels per asset over time. 

130 mu: DataFrame of expected-return signals aligned with ``prices``. 

131 cfg: Engine configuration instance. 

132 

133 Raises: 

134 MissingDateColumnError: If ``'date'`` is absent from either frame. 

135 ShapeMismatchError: If ``prices`` and ``mu`` have different shapes. 

136 ColumnMismatchError: If the column sets of the two frames differ. 

137 NonPositivePricesError: If any asset contains a non-positive price. 

138 ExcessiveNullsError: If any asset column exceeds ``cfg.max_nan_fraction``. 

139 MonotonicPricesError: If any asset price series is monotonically 

140 non-decreasing or non-increasing. 

141 

142 Warns: 

143 UserWarning (via logging): If ``cfg.covariance`` is a 

144 `SlidingWindowConfig` and 

145 ``len(prices) < 2 * cfg.covariance.window``, a warning is emitted 

146 via the module logger rather than an exception. This is a 

147 deliberate soft boundary — callers may intentionally supply data 

148 shorter than the full warm-up period. During warm-up the first 

149 ``window - 1`` timestamps will yield zero positions. 

150 """ 

151 # ensure 'date' column exists in prices before any other validation 

152 if "date" not in prices.columns: 

153 raise MissingDateColumnError("prices") 

154 

155 # ensure 'date' column exists in mu as well (kept for symmetry and downstream assumptions) 

156 if "date" not in mu.columns: 

157 raise MissingDateColumnError("mu") 

158 

159 # check that prices and mu have the same shape 

160 if prices.shape != mu.shape: 

161 raise ShapeMismatchError(prices.shape, mu.shape) 

162 

163 # check that the columns of prices and mu are identical 

164 if not set(prices.columns) == set(mu.columns): 

165 raise ColumnMismatchError(prices.columns, mu.columns) 

166 

167 assets = [c for c in prices.columns if c != "date" and prices[c].dtype.is_numeric()] 

168 

169 # check for non-positive prices: log returns require strictly positive prices 

170 for asset in assets: 

171 col = prices[asset].drop_nulls() 

172 if col.len() > 0 and (col <= 0).any(): 

173 raise NonPositivePricesError(asset) 

174 

175 # check for excessive NaN values: more than cfg.max_nan_fraction null in any asset column 

176 n_rows = prices.height 

177 if n_rows > 0: 

178 for asset in assets: 

179 nan_frac = prices[asset].null_count() / n_rows 

180 if nan_frac > cfg.max_nan_fraction: 

181 raise ExcessiveNullsError(asset, nan_frac, cfg.max_nan_fraction) 

182 

183 # check for monotonic price series: a strictly non-decreasing or non-increasing 

184 # series has no variance in its return sign, indicating malformed or synthetic data 

185 for asset in assets: 

186 col = prices[asset].drop_nulls() 

187 if col.len() > 2: 

188 diffs = col.diff().drop_nulls() 

189 if (diffs >= 0).all() or (diffs <= 0).all(): 

190 raise MonotonicPricesError(asset) 

191 

192 # warn when the dataset is too short to benefit from the sliding window 

193 if cfg.covariance_mode == CovarianceMode.sliding_window and cfg.window is not None: 

194 w: int = cfg.window 

195 if n_rows < 2 * w: 

196 _logger.warning( 

197 "Dataset length (%d rows) is less than 2 * window (%d). " 

198 "The first %d timestamps will yield zero positions during warm-up; " 

199 "consider using a longer history or reducing 'window'.", 

200 n_rows, 

201 2 * w, 

202 w - 1, 

203 ) 

204 

205 

206# --------------------------------------------------------------------------- 

207# Re-export config symbols so ``from basanos.math.optimizer import …`` keeps 

208# working for existing callers. 

209# --------------------------------------------------------------------------- 

210__all__ = [ 

211 "BasanosConfig", 

212 "BasanosEngine", 

213 "CovarianceConfig", 

214 "CovarianceMode", 

215 "EwmaShrinkConfig", 

216 "SlidingWindowConfig", 

217] 

218 

219 

220@dataclasses.dataclass(frozen=True) 

221class BasanosEngine(_DiagnosticsMixin, _SignalEvaluatorMixin, _SolveMixin): 

222 """Engine to compute correlation matrices and optimize risk positions. 

223 

224 Encapsulates price data and configuration to build EWM-based 

225 correlations, apply shrinkage, and solve for normalized positions. 

226 

227 Public methods are organised into clearly delimited sections (some 

228 inherited from the private mixin classes): 

229 

230 * **Core data access** — `assets`, `ret_adj`, `vola`, `cor`, `cor_tensor` 

231 * **Solve / position logic** — `cash_position`, `position_status`, 

232 `risk_position`, `position_leverage`, `warmup_state` 

233 * **Portfolio and performance** — `portfolio`, `naive_sharpe`, 

234 `sharpe_at_shrink`, `sharpe_at_window_factors` 

235 * **Matrix diagnostics** — `condition_number`, `effective_rank`, 

236 `solver_residual`, `signal_utilisation` 

237 * **Signal evaluation** — `ic`, `rank_ic`, `ic_mean`, `ic_std`, `icir`, 

238 `rank_ic_mean`, `rank_ic_std` 

239 * **Reporting** — `config_report` 

240 

241 Data-flow diagram 

242 ----------------- 

243 

244 .. code-block:: text 

245 

246 prices (pl.DataFrame) 

247 

248 ├─ vol_adj ──► ret_adj (volatility-adjusted log returns) 

249 │ │ 

250 │ ├─ ewm_corr ──► cor / cor_tensor 

251 │ │ │ 

252 │ │ └─ shrink2id / FactorModel 

253 │ │ │ 

254 │ vola covariance matrix 

255 │ │ │ 

256 └── mu ──────────┴── _iter_solve ──────────┘ 

257 

258 cash_position 

259 

260 ┌────────┴────────┐ 

261 portfolio diagnostics 

262 (Portfolio) (condition_number, 

263 effective_rank, 

264 solver_residual, 

265 signal_utilisation, 

266 ic, rank_ic, …) 

267 

268 Attributes: 

269 prices: Polars DataFrame of price levels per asset over time. Must 

270 contain a ``'date'`` column and at least one numeric asset column 

271 with strictly positive values that are not monotonically 

272 non-decreasing or non-increasing (i.e. they must vary in sign). 

273 mu: Polars DataFrame of expected-return signals aligned with *prices*. 

274 Must share the same shape and column names as *prices*. 

275 cfg: Immutable `BasanosConfig` controlling EWMA half-lives, 

276 clipping, shrinkage intensity, and AUM. 

277 

278 Examples: 

279 Build an engine with two synthetic assets over 30 days and inspect the 

280 optimized positions and diagnostic properties. 

281 

282 >>> import numpy as np 

283 >>> import polars as pl 

284 >>> from basanos.math import BasanosConfig, BasanosEngine 

285 >>> dates = list(range(30)) 

286 >>> rng = np.random.default_rng(42) 

287 >>> prices = pl.DataFrame({ 

288 ... "date": dates, 

289 ... "A": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 100.0, 

290 ... "B": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 150.0, 

291 ... }) 

292 >>> mu = pl.DataFrame({ 

293 ... "date": dates, 

294 ... "A": rng.normal(0.0, 0.5, 30), 

295 ... "B": rng.normal(0.0, 0.5, 30), 

296 ... }) 

297 >>> cfg = BasanosConfig(vola=5, corr=10, clip=2.0, shrink=0.5, aum=1_000_000) 

298 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg) 

299 >>> engine.assets 

300 ['A', 'B'] 

301 >>> engine.cash_position.shape 

302 (30, 3) 

303 >>> engine.position_leverage.columns 

304 ['date', 'leverage'] 

305 """ 

306 

307 prices: pl.DataFrame 

308 mu: pl.DataFrame 

309 cfg: BasanosConfig 

310 

311 def __post_init__(self) -> None: 

312 """Validate inputs by delegating to `_validate_inputs`.""" 

313 _validate_inputs(self.prices, self.mu, self.cfg) 

314 

315 # ------------------------------------------------------------------ 

316 # Core data-access properties 

317 # ------------------------------------------------------------------ 

318 

319 @property 

320 def assets(self) -> list[str]: 

321 """List asset column names (numeric columns excluding 'date').""" 

322 return [c for c in self.prices.columns if c != "date" and self.prices[c].dtype.is_numeric()] 

323 

324 @property 

325 def ret_adj(self) -> pl.DataFrame: 

326 """Return per-asset volatility-adjusted log returns clipped by cfg.clip. 

327 

328 Uses an EWMA volatility estimate with lookback ``cfg.vola`` to 

329 standardize log returns for each numeric asset column. 

330 """ 

331 return self.prices.with_columns( 

332 [vol_adj(pl.col(asset), vola=self.cfg.vola, clip=self.cfg.clip) for asset in self.assets] 

333 ) 

334 

335 @property 

336 def vola(self) -> pl.DataFrame: 

337 """Per-asset EWMA volatility of percentage returns. 

338 

339 Computes percent changes for each numeric asset column and applies an 

340 exponentially weighted standard deviation using the lookback specified 

341 by ``cfg.vola``. The result is a DataFrame aligned with ``self.prices`` 

342 whose numeric columns hold per-asset volatility estimates. 

343 """ 

344 return self.prices.with_columns( 

345 pl.col(asset) 

346 .pct_change() 

347 .ewm_std(com=self.cfg.vola - 1, adjust=True, min_samples=self.cfg.vola) 

348 .alias(asset) 

349 for asset in self.assets 

350 ) 

351 

352 @property 

353 def cor(self) -> dict[datetime.date, np.ndarray]: 

354 """Compute per-timestamp EWM correlation matrices. 

355 

356 Builds volatility-adjusted returns for all assets, computes an 

357 exponentially weighted correlation using a pure NumPy implementation 

358 (with window ``cfg.corr``), and returns a mapping from each timestamp 

359 to the corresponding correlation matrix as a NumPy array. 

360 

361 Returns: 

362 dict: Mapping ``date -> np.ndarray`` of shape (n_assets, n_assets). 

363 

364 Performance: 

365 Delegates to `ewm_corr`, which is O(T·N²) in both 

366 time and memory. The returned dict holds *T* references into the 

367 result tensor (one N*N view per date); no extra copies are made. 

368 For large *N* or *T*, prefer ``cor_tensor`` to keep a single 

369 contiguous array rather than building a Python dict. 

370 """ 

371 index = self.prices["date"] 

372 ret_adj_np = self.ret_adj.select(self.assets).to_numpy() 

373 tensor = _ewm_corr_numpy( 

374 ret_adj_np, 

375 com=self.cfg.corr, 

376 min_periods=self.cfg.corr, 

377 min_corr_denom=self.cfg.min_corr_denom, 

378 ) 

379 return {index[t]: tensor[t] for t in range(len(index))} 

380 

381 @property 

382 def cor_tensor(self) -> np.ndarray: 

383 """Return all correlation matrices stacked as a 3-D tensor. 

384 

385 Converts the per-timestamp correlation dict (see `cor`) into a 

386 single contiguous NumPy array so that the full history can be saved to 

387 a flat ``.npy`` file with `save` and reloaded with 

388 `load`. 

389 

390 Returns: 

391 np.ndarray: Array of shape ``(T, N, N)`` where *T* is the number of 

392 timestamps and *N* the number of assets. ``tensor[t]`` is the 

393 correlation matrix for the *t*-th date (same ordering as 

394 ``self.prices["date"]``). 

395 

396 Examples: 

397 >>> import tempfile, pathlib 

398 >>> import numpy as np 

399 >>> import polars as pl 

400 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine 

401 >>> dates = pl.Series("date", list(range(100))) 

402 >>> rng0 = np.random.default_rng(0).lognormal(size=100) 

403 >>> rng1 = np.random.default_rng(1).lognormal(size=100) 

404 >>> prices = pl.DataFrame({"date": dates, "A": rng0, "B": rng1}) 

405 >>> rng2 = np.random.default_rng(2).normal(size=100) 

406 >>> rng3 = np.random.default_rng(3).normal(size=100) 

407 >>> mu = pl.DataFrame({"date": dates, "A": rng2, "B": rng3}) 

408 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6) 

409 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg) 

410 >>> tensor = engine.cor_tensor 

411 >>> with tempfile.TemporaryDirectory() as td: 

412 ... path = pathlib.Path(td) / "cor.npy" 

413 ... np.save(path, tensor) 

414 ... loaded = np.load(path) 

415 >>> np.testing.assert_array_equal(tensor, loaded) 

416 """ 

417 return np.stack(list(self.cor.values()), axis=0) 

418 

419 # ------------------------------------------------------------------ 

420 # Internal solve helpers — inherited from _SolveMixin 

421 # ------------------------------------------------------------------ 

422 # (_compute_mask, _check_signal, _scale_to_cash, _row_early_check, 

423 # _denom_guard_yield, _compute_position, _replay_positions, 

424 # _iter_matrices, _iter_solve, warmup_state) 

425 # Implementations live in _engine_solve.py; patch targets remain in that 

426 # module's namespace, e.g. ``patch("basanos.math._engine_solve.solve")``. 

427 

428 # ------------------------------------------------------------------ 

429 # Position properties 

430 # ------------------------------------------------------------------ 

431 

432 @property 

433 def cash_position(self) -> pl.DataFrame: 

434 r"""Optimize correlation-aware risk positions for each timestamp. 

435 

436 Supports two covariance modes controlled by ``cfg.covariance_config``: 

437 

438 * `EwmaShrinkConfig` (default): Computes EWMA correlations, applies 

439 linear shrinkage toward the identity, and solves a normalised linear 

440 system $C\,x = \mu$ per timestamp via Cholesky / LU. 

441 

442 * `SlidingWindowConfig`: At each timestamp uses the 

443 ``cfg.covariance_config.window`` most recent vol-adjusted returns to fit a 

444 rank-``cfg.covariance_config.n_factors`` factor model via truncated SVD and 

445 solves the system via the Woodbury identity at $O(k^3 + kn)$ rather 

446 than $O(n^3)$ per step. 

447 

448 Non-finite or ill-posed cases yield zero positions for safety. 

449 

450 Returns: 

451 pl.DataFrame: DataFrame with columns ['date'] + asset names containing 

452 the per-timestamp cash positions (risk divided by EWMA volatility). 

453 

454 Performance: 

455 For ``ewma_shrink``: dominant cost is ``self.cor`` (O(T·N²) time, 

456 O(T·N²) memory — see `ewm_corr`). The per-timestamp 

457 linear solve adds O(N³) per row. 

458 

459 For ``sliding_window``: O(T·W·N·k) for sliding SVDs plus 

460 O(T·(k³ + kN)) for Woodbury solves. Memory is O(W·N) per step, 

461 independent of T. 

462 """ 

463 assets = self.assets 

464 

465 # Compute risk positions row-by-row using _replay_positions. 

466 prices_num = self.prices.select(assets).to_numpy() 

467 

468 risk_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float) 

469 cash_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float) 

470 vola_np = self.vola.select(assets).to_numpy() 

471 

472 self._replay_positions(risk_pos_np, cash_pos_np, vola_np) 

473 

474 # Build Polars DataFrame for cash positions (numeric columns only) 

475 cash_position = self.prices.with_columns( 

476 [(pl.lit(cash_pos_np[:, i]).alias(asset)) for i, asset in enumerate(assets)] 

477 ) 

478 

479 return cash_position 

480 

481 @property 

482 def position_status(self) -> pl.DataFrame: 

483 """Per-timestamp reason code explaining each `cash_position` row. 

484 

485 Labels every row with exactly one of four `SolveStatus` 

486 codes (which compare equal to their string equivalents): 

487 

488 * ``'warmup'``: Insufficient history for the sliding-window 

489 covariance mode (``i + 1 < cfg.covariance_config.window``). 

490 Positions are ``NaN`` for all assets at this timestamp. 

491 * ``'zero_signal'``: The expected-return vector ``mu`` was 

492 all-zeros (or all-NaN) at this timestamp; the optimizer 

493 short-circuited and returned zero positions without solving. 

494 * ``'degenerate'``: The normalisation denominator was non-finite 

495 or below ``cfg.denom_tol``, the Cholesky / Woodbury solve 

496 failed, or no asset had a finite price; positions were zeroed 

497 for safety. 

498 * ``'valid'``: The linear system was solved successfully and 

499 positions are non-trivially non-zero. 

500 

501 The codes map one-to-one onto the three NaN / zero cases 

502 described in the issue and allow downstream consumers (backtests, 

503 risk monitors) to distinguish data gaps from signal silence from 

504 numerical ill-conditioning without re-inspecting ``mu`` or the 

505 engine configuration. 

506 

507 Returns: 

508 pl.DataFrame: Two-column DataFrame ``{'date': ..., 'status': ...}`` 

509 with one row per timestamp. The ``status`` column has 

510 ``Polars`` dtype ``String``. 

511 """ 

512 statuses = [status for _i, _t, _mask, _pos, status in self._iter_solve()] 

513 return pl.DataFrame({"date": self.prices["date"], "status": pl.Series(statuses, dtype=pl.String)}) 

514 

515 @property 

516 def risk_position(self) -> pl.DataFrame: 

517 """Risk positions (before EWMA-volatility scaling) at each timestamp. 

518 

519 Derives the un-volatility-scaled position by multiplying the cash 

520 position by the per-asset EWMA volatility. Equivalently, this is 

521 the quantity solved by the correlation-adjusted linear system before 

522 dividing by ``vola``. 

523 

524 Relationship to other properties:: 

525 

526 cash_position = risk_position / vola 

527 risk_position = cash_position * vola 

528 

529 Returns: 

530 pl.DataFrame: DataFrame with columns ``['date'] + assets`` where 

531 each value is ``cash_position_i * vola_i`` at the given timestamp. 

532 """ 

533 assets = self.assets 

534 cp_np = self.cash_position.select(assets).to_numpy() 

535 vola_np = self.vola.select(assets).to_numpy() 

536 with np.errstate(invalid="ignore"): 

537 risk_pos = cp_np * vola_np 

538 return self.prices.with_columns([pl.lit(risk_pos[:, i]).alias(asset) for i, asset in enumerate(assets)]) 

539 

540 @property 

541 def position_leverage(self) -> pl.DataFrame: 

542 """L1 norm of cash positions (gross leverage) at each timestamp. 

543 

544 Sums the absolute values of all asset cash positions at each row. 

545 NaN positions are treated as zero (they contribute nothing to gross 

546 leverage). 

547 

548 Returns: 

549 pl.DataFrame: Two-column DataFrame ``{'date': ..., 'leverage': ...}`` 

550 where ``leverage`` is the L1 norm of the cash-position vector. 

551 """ 

552 assets = self.assets 

553 cp_np = self.cash_position.select(assets).to_numpy() 

554 leverage = np.nansum(np.abs(cp_np), axis=1) 

555 return pl.DataFrame({"date": self.prices["date"], "leverage": pl.Series(leverage, dtype=pl.Float64)}) 

556 

557 # ------------------------------------------------------------------ 

558 # Portfolio and performance 

559 # ------------------------------------------------------------------ 

560 

561 @property 

562 def portfolio(self) -> Portfolio: 

563 """Construct a Portfolio from the optimized cash positions. 

564 

565 Converts the computed cash positions into a Portfolio using the 

566 configured AUM. The ``cost_per_unit`` from `cfg` is forwarded 

567 so that `net_cost_nav` and 

568 `position_delta_costs` work out 

569 of the box without any further configuration. 

570 

571 Returns: 

572 Portfolio: Instance built from cash positions with AUM scaling. 

573 """ 

574 cp = self.cash_position 

575 assets = [c for c in cp.columns if c != "date" and cp[c].dtype.is_numeric()] 

576 scaled = cp.with_columns(pl.col(a) * self.cfg.position_scale for a in assets) 

577 return Portfolio.from_cash_position(self.prices, scaled, aum=self.cfg.aum, cost_per_unit=self.cfg.cost_per_unit) 

578 

579 def sharpe_at_shrink(self, shrink: float) -> float: 

580 r"""Return the annualised portfolio Sharpe ratio for the given shrinkage weight. 

581 

582 Constructs a new `BasanosEngine` with all parameters identical to 

583 ``self`` except that ``cfg.shrink`` is replaced by ``shrink``, then 

584 returns the annualised Sharpe ratio of the resulting portfolio. 

585 

586 This is the canonical single-argument callable required by the benchmarks 

587 specification: ``f(λ) → Sharpe``. Use it to sweep λ across ``[0, 1]`` 

588 and measure whether correlation adjustment adds value over the 

589 signal-proportional baseline (λ = 0) or the unregularised limit (λ = 1). 

590 

591 Corner cases: 

592 * **λ = 0** — the shrunk matrix equals the identity, so the 

593 optimiser treats all assets as uncorrelated and positions are 

594 purely signal-proportional (no correlation adjustment). 

595 * **λ = 1** — the raw EWMA correlation matrix is used without 

596 shrinkage. 

597 

598 Args: 

599 shrink: Retention weight λ ∈ [0, 1]. See 

600 `shrink` for full documentation. 

601 

602 Returns: 

603 Annualised Sharpe ratio of the portfolio returns as a ``float``. 

604 Returns ``float("nan")`` when the Sharpe ratio cannot be computed 

605 (e.g. zero-variance returns). 

606 

607 Raises: 

608 ValidationError: When ``shrink`` is outside [0, 1] (delegated to 

609 `BasanosConfig` field validation). 

610 

611 Examples: 

612 >>> import numpy as np 

613 >>> import polars as pl 

614 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine 

615 >>> dates = pl.Series("date", list(range(200))) 

616 >>> rng = np.random.default_rng(0) 

617 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)}) 

618 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)}) 

619 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6) 

620 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg) 

621 >>> s = engine.sharpe_at_shrink(0.5) 

622 >>> isinstance(s, float) 

623 True 

624 """ 

625 new_cfg = self.cfg.replace(shrink=shrink) 

626 engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg) 

627 return float(engine.portfolio.stats.sharpe().get("returns") or float("nan")) 

628 

629 def sharpe_at_window_factors(self, window: int, n_factors: int) -> float: 

630 r"""Return the annualised portfolio Sharpe ratio for the given sliding-window parameters. 

631 

632 Constructs a new `BasanosEngine` with ``covariance_mode`` set to 

633 ``"sliding_window"`` and the supplied ``window`` / ``n_factors``, keeping 

634 all other configuration identical to ``self``. 

635 

636 Use this method to sweep ``(W, k)`` and compare the sliding-window 

637 estimator against the EWMA baseline (via `sharpe_at_shrink`). 

638 

639 Args: 

640 window: Rolling window length $W \geq 1$. 

641 Rule of thumb: $W \geq 2 \cdot n_{\text{assets}}$. 

642 n_factors: Number of latent factors $k \geq 1$. 

643 

644 Returns: 

645 Annualised Sharpe ratio of the portfolio returns as a ``float``. 

646 Returns ``float("nan")`` when the Sharpe ratio cannot be computed 

647 (e.g. not enough history to fill the first window). 

648 

649 Raises: 

650 ValidationError: When ``window`` or ``n_factors`` fail field 

651 constraints (delegated to `BasanosConfig`). 

652 

653 Examples: 

654 >>> import numpy as np 

655 >>> import polars as pl 

656 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine 

657 >>> dates = pl.Series("date", list(range(200))) 

658 >>> rng = np.random.default_rng(0) 

659 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)}) 

660 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)}) 

661 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6) 

662 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg) 

663 >>> s = engine.sharpe_at_window_factors(window=40, n_factors=2) 

664 >>> isinstance(s, float) 

665 True 

666 """ 

667 new_cfg = self.cfg.replace( 

668 covariance_config=SlidingWindowConfig(window=window, n_factors=n_factors), 

669 ) 

670 engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg) 

671 return float(engine.portfolio.stats.sharpe().get("returns") or float("nan")) 

672 

673 @property 

674 def naive_sharpe(self) -> float: 

675 r"""Sharpe ratio of the naïve equal-weight signal (μ = 1 for every asset/timestamp). 

676 

677 Replaces the expected-return signal ``mu`` with a constant matrix of 

678 ones, then runs the optimiser with the current configuration and returns 

679 the annualised Sharpe ratio of the resulting portfolio. 

680 

681 This provides the baseline answer to *"does the signal add value?"*: 

682 a real signal should produce a higher Sharpe than the naïve benchmark. 

683 Combined with `sharpe_at_shrink`, this yields a three-way 

684 comparison: 

685 

686 +--------------------+----------------------------------------------+ 

687 | Benchmark | What it measures | 

688 +====================+==============================================+ 

689 | ``naive_sharpe`` | No signal skill; pure correlation routing | 

690 +--------------------+----------------------------------------------+ 

691 | ``sharpe_at_shrink(0.0)`` | Signal skill, no correlation adj. | 

692 +--------------------+----------------------------------------------+ 

693 | ``sharpe_at_shrink(cfg.shrink)`` | Signal + correlation adj. | 

694 +--------------------+----------------------------------------------+ 

695 

696 Returns: 

697 Annualised Sharpe ratio of the equal-weight portfolio as a ``float``. 

698 Returns ``float("nan")`` when the Sharpe ratio cannot be computed. 

699 

700 Examples: 

701 >>> import numpy as np 

702 >>> import polars as pl 

703 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine 

704 >>> dates = pl.Series("date", list(range(200))) 

705 >>> rng = np.random.default_rng(0) 

706 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)}) 

707 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)}) 

708 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6) 

709 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg) 

710 >>> s = engine.naive_sharpe 

711 >>> isinstance(s, float) 

712 True 

713 """ 

714 naive_mu = self.mu.with_columns(pl.lit(1.0).alias(asset) for asset in self.assets) 

715 engine = BasanosEngine(prices=self.prices, mu=naive_mu, cfg=self.cfg) 

716 return float(engine.portfolio.stats.sharpe().get("returns") or float("nan")) 

717 

718 # ------------------------------------------------------------------ 

719 # Reporting 

720 # ------------------------------------------------------------------ 

721 

722 @property 

723 def config_report(self) -> "ConfigReport": 

724 """Return a `ConfigReport` facade for this engine. 

725 

726 Returns a `ConfigReport` that 

727 includes the full **lambda-sweep chart** — an interactive plot of the 

728 annualised Sharpe ratio as `shrink` (λ) is swept 

729 across [0, 1] — in addition to the parameter table, shrinkage-guidance 

730 table, and theory section available from 

731 `report`. 

732 

733 Returns: 

734 basanos.math._config_report.ConfigReport: Report facade with 

735 ``to_html()`` and ``save()`` methods. 

736 

737 Examples: 

738 >>> import numpy as np 

739 >>> import polars as pl 

740 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine 

741 >>> dates = pl.Series("date", list(range(200))) 

742 >>> rng = np.random.default_rng(0) 

743 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)}) 

744 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)}) 

745 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6) 

746 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg) 

747 >>> report = engine.config_report 

748 >>> html = report.to_html() 

749 >>> "Lambda" in html 

750 True 

751 """ 

752 from ._config_report import ConfigReport 

753 

754 return ConfigReport(config=self.cfg, engine=self) 

755 

756 # ------------------------------------------------------------------ 

757 # Matrix diagnostics — inherited from _DiagnosticsMixin 

758 # ------------------------------------------------------------------ 

759 # (condition_number, effective_rank, solver_residual, signal_utilisation) 

760 # Implementations live in _engine_diagnostics.py; patch targets remain in 

761 # that module's namespace, e.g. 

762 # ``patch("basanos.math._engine_diagnostics.solve")``. 

763 

764 # ------------------------------------------------------------------ 

765 # Signal evaluation — inherited from _SignalEvaluatorMixin 

766 # ------------------------------------------------------------------ 

767 # (_ic_series, ic, rank_ic, ic_mean, ic_std, icir, 

768 # rank_ic_mean, rank_ic_std) 

769 # Implementations live in _engine_ic.py.