Coverage for src/basanos/math/optimizer.py: 100%

"""Correlation-aware risk position optimizer (Basanos).

This module provides utilities to compute correlation-adjusted risk positions
from price data and expected-return signals. It relies on volatility-adjusted
returns to estimate a dynamic correlation matrix (via EWM), applies shrinkage
towards identity, and solves a normalized linear system per timestamp to
obtain stable positions.

Performance characteristics
---------------------------
Let *N* be the number of assets and *T* the number of timestamps.

**Computational complexity**

+----------------------------------+------------------+--------------------------------------+
| Operation                        | Complexity       | Bottleneck                           |
+==================================+==================+======================================+
| EWM volatility (``ret_adj``,     | O(T·N)           | Linear in both T and N; negligible   |
| ``vola``)                        |                  |                                      |
+----------------------------------+------------------+--------------------------------------+
| EWM correlation (``cor``)        | O(T·N²)          | ``lfilter`` over all N² asset pairs  |
|                                  |                  | simultaneously                       |
+----------------------------------+------------------+--------------------------------------+
| Linear solve per timestamp       | O(N³)            | Cholesky / LU per row in             |
| (``cash_position``)              | * T solves       | ``cash_position``                    |
+----------------------------------+------------------+--------------------------------------+

**Memory usage** (peak, approximate)

``ewm_corr`` allocates roughly **14 float64 arrays** of shape
``(T, N, N)`` at peak (input sequences, IIR filter outputs, EWM components,
and the result tensor). Peak RAM ≈ **112 * T * N²** bytes. Typical
working sizes on a 16 GB machine:

+--------+--------------------------+------------------------------------+
| N      | T (daily rows)           | Peak memory (approx.)              |
+========+==========================+====================================+
| 50     | 252 (~1 yr)              | ~70 MB                             |
+--------+--------------------------+------------------------------------+
| 100    | 252 (~1 yr)              | ~280 MB                            |
+--------+--------------------------+------------------------------------+
| 100    | 2 520 (~10 yr)           | ~2.8 GB                            |
+--------+--------------------------+------------------------------------+
| 200    | 2 520 (~10 yr)           | ~11 GB                             |
+--------+--------------------------+------------------------------------+
| 500    | 2 520 (~10 yr)           | ~70 GB ⚠ exceeds typical RAM       |
+--------+--------------------------+------------------------------------+

**Practical limits (daily data)**

* **≤ 150 assets, ≤ 5 years** — well within reach on an 8 GB laptop.
* **≤ 200 assets, ≤ 10 years** — requires ~11-12 GB; feasible on a 16 GB
  workstation.
* **> 500 assets with multi-year history** — peak memory exceeds 16 GB;
  reduce the time range or switch to a chunked / streaming approach.
* **> 1 000 assets** — the O(N³) per-solve cost alone makes real-time
  optimization impractical even with adequate RAM.

See ``BENCHMARKS.md`` for measured wall-clock timings across representative
dataset sizes.

Internal structure
------------------
The implementation is split across focused private modules to keep each file
readable and independently testable:

* `_config` — `BasanosConfig` and all covariance-mode configuration classes.
* `_ewm_corr` — `ewm_corr`, the vectorised IIR-filter implementation of
  per-row EWM correlation matrices.
* `_engine_solve` — private helpers providing the ``_iter_matrices`` and
  ``_iter_solve`` generators (per-timestamp solve logic).
* `_engine_diagnostics` — private helpers providing matrix-quality
  diagnostics (condition number, effective rank, solver residual, signal
  utilisation).
* `_engine_ic` — private helpers providing signal evaluation metrics (IC,
  Rank IC, ICIR, and summary statistics).
* This module — `BasanosEngine`, a single flat class that wires every method
  together in clearly delimited sections.
"""
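# Illustrative sketch (not part of basanos): the peak-memory rule of thumb
# quoted in the module docstring, i.e. roughly 14 float64 arrays of shape
# (T, N, N), or about 112 * T * N**2 bytes. The helper names
# ``estimate_peak_bytes`` and ``fits_in_ram`` are hypothetical and exist only
# for this example; the result is an order-of-magnitude estimate, not a
# measurement.

```python
BYTES_PER_FLOAT64 = 8
PEAK_ARRAYS = 14  # figure quoted in the module docstring


def estimate_peak_bytes(n_assets: int, n_rows: int) -> int:
    """Approximate peak RAM in bytes: 14 * 8 * T * N**2 = 112 * T * N**2."""
    return PEAK_ARRAYS * BYTES_PER_FLOAT64 * n_rows * n_assets**2


def fits_in_ram(n_assets: int, n_rows: int, ram_gb: float = 16.0) -> bool:
    """Compare the estimate with a RAM budget given in decimal gigabytes."""
    return estimate_peak_bytes(n_assets, n_rows) <= ram_gb * 1e9
```

# A call such as ``fits_in_ram(200, 2520)`` could serve as a quick pre-flight
# check before constructing an engine on a large universe.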

import dataclasses
import datetime
import logging
from typing import TYPE_CHECKING

import numpy as np
import polars as pl
from jquantstats import Portfolio

from ..exceptions import (
    ColumnMismatchError,
    ExcessiveNullsError,
    MissingDateColumnError,
    MonotonicPricesError,
    NonPositivePricesError,
    ShapeMismatchError,
)
from ._config import (
    BasanosConfig,
    CovarianceConfig,
    CovarianceMode,
    EwmaShrinkConfig,
    SlidingWindowConfig,
)
from ._engine_diagnostics import _DiagnosticsMixin as _DiagnosticsMixin
from ._engine_ic import _SignalEvaluatorMixin as _SignalEvaluatorMixin
from ._engine_solve import _SolveMixin as _SolveMixin
from ._ewm_corr import ewm_corr as _ewm_corr_numpy
from ._signal import vol_adj

if TYPE_CHECKING:
    from ._config_report import ConfigReport

_logger = logging.getLogger(__name__)


def _validate_inputs(prices: pl.DataFrame, mu: pl.DataFrame, cfg: "BasanosConfig") -> None:
    """Validate ``prices``, ``mu``, and ``cfg`` for use with `BasanosEngine`.

    Checks that both DataFrames contain a ``'date'`` column, share identical
    shapes and column sets, contain no non-positive prices, no excessive NaN
    fractions, and no monotonic price series. Also emits a warning when the
    dataset is too short relative to a configured sliding-window size.

    Args:
        prices: DataFrame of price levels per asset over time.
        mu: DataFrame of expected-return signals aligned with ``prices``.
        cfg: Engine configuration instance.

    Raises:
        MissingDateColumnError: If ``'date'`` is absent from either frame.
        ShapeMismatchError: If ``prices`` and ``mu`` have different shapes.
        ColumnMismatchError: If the column sets of the two frames differ.
        NonPositivePricesError: If any asset contains a non-positive price.
        ExcessiveNullsError: If any asset column exceeds ``cfg.max_nan_fraction``.
        MonotonicPricesError: If any asset price series is monotonically
            non-decreasing or non-increasing.

    Warns:
        UserWarning (via logging): If ``cfg.covariance_config`` is a
            `SlidingWindowConfig` and
            ``len(prices) < 2 * cfg.covariance_config.window``, a warning is
            emitted via the module logger rather than an exception. This is a
            deliberate soft boundary — callers may intentionally supply data
            shorter than the full warm-up period. During warm-up the first
            ``window - 1`` timestamps will yield zero positions.
    """
    # ensure 'date' column exists in prices before any other validation
    if "date" not in prices.columns:
        raise MissingDateColumnError("prices")

    # ensure 'date' column exists in mu as well (kept for symmetry and downstream assumptions)
    if "date" not in mu.columns:
        raise MissingDateColumnError("mu")

    # check that prices and mu have the same shape
    if prices.shape != mu.shape:
        raise ShapeMismatchError(prices.shape, mu.shape)

    # check that the columns of prices and mu are identical
    if set(prices.columns) != set(mu.columns):
        raise ColumnMismatchError(prices.columns, mu.columns)

    assets = [c for c in prices.columns if c != "date" and prices[c].dtype.is_numeric()]

    # check for non-positive prices: log returns require strictly positive prices
    for asset in assets:
        col = prices[asset].drop_nulls()
        if col.len() > 0 and (col <= 0).any():
            raise NonPositivePricesError(asset)

    # check for excessive NaN values: more than cfg.max_nan_fraction null in any asset column
    n_rows = prices.height
    if n_rows > 0:
        for asset in assets:
            nan_frac = prices[asset].null_count() / n_rows
            if nan_frac > cfg.max_nan_fraction:
                raise ExcessiveNullsError(asset, nan_frac, cfg.max_nan_fraction)

    # check for monotonic price series: a non-decreasing or non-increasing
    # series has no variance in its return sign, indicating malformed or synthetic data
    for asset in assets:
        col = prices[asset].drop_nulls()
        if col.len() > 2:
            diffs = col.diff().drop_nulls()
            if (diffs >= 0).all() or (diffs <= 0).all():
                raise MonotonicPricesError(asset)

    # warn when the dataset is too short to benefit from the sliding window
    if cfg.covariance_mode == CovarianceMode.sliding_window and cfg.window is not None:
        w: int = cfg.window
        if n_rows < 2 * w:
            _logger.warning(
                "Dataset length (%d rows) is less than 2 * window (%d). "
                "The first %d timestamps will yield zero positions during warm-up; "
                "consider using a longer history or reducing 'window'.",
                n_rows,
                2 * w,
                w - 1,
            )
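# Illustrative sketch (not part of basanos): the monotonic-price rule applied
# in ``_validate_inputs``, restated in plain Python so it can be tried without
# polars. Assumptions: missing values are represented as ``None`` and the
# function name ``is_monotonic`` is hypothetical. Note that a constant series
# counts as monotonic under this rule, because all of its differences satisfy
# both ``>= 0`` and ``<= 0``.

```python
from typing import Optional, Sequence


def is_monotonic(prices: Sequence[Optional[float]]) -> bool:
    """Return True when the non-null prices never change direction."""
    values = [p for p in prices if p is not None]  # plays the role of drop_nulls()
    if len(values) <= 2:
        # Too few observations to judge; the validator skips such columns too.
        return False
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)
```

# For example, ``is_monotonic([1.0, 2.0, 1.5, 3.0])`` is False, so a series
# like that would pass this particular check.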


# ---------------------------------------------------------------------------
# Re-export config symbols so ``from basanos.math.optimizer import …`` keeps
# working for existing callers.
# ---------------------------------------------------------------------------
__all__ = [
    "BasanosConfig",
    "BasanosEngine",
    "CovarianceConfig",
    "CovarianceMode",
    "EwmaShrinkConfig",
    "SlidingWindowConfig",
]


@dataclasses.dataclass(frozen=True)
class BasanosEngine(_DiagnosticsMixin, _SignalEvaluatorMixin, _SolveMixin):
    """Engine to compute correlation matrices and optimize risk positions.

    Encapsulates price data and configuration to build EWM-based
    correlations, apply shrinkage, and solve for normalized positions.

    Public methods are organised into clearly delimited sections (some
    inherited from the private mixin classes):

    * **Core data access** — `assets`, `ret_adj`, `vola`, `cor`, `cor_tensor`
    * **Solve / position logic** — `cash_position`, `position_status`,
      `risk_position`, `position_leverage`, `warmup_state`
    * **Portfolio and performance** — `portfolio`, `naive_sharpe`,
      `sharpe_at_shrink`, `sharpe_at_window_factors`
    * **Matrix diagnostics** — `condition_number`, `effective_rank`,
      `solver_residual`, `signal_utilisation`
    * **Signal evaluation** — `ic`, `rank_ic`, `ic_mean`, `ic_std`, `icir`,
      `rank_ic_mean`, `rank_ic_std`
    * **Reporting** — `config_report`

    Data-flow diagram
    -----------------

    .. code-block:: text

        prices (pl.DataFrame)
          │
          ├─ vol_adj ──► ret_adj  (volatility-adjusted log returns)
          │                │
          │                ├─ ewm_corr ──► cor / cor_tensor
          │                │                    │
          │                │                    └─ shrink2id / FactorModel
          │                │                         │
          │               vola               covariance matrix
          │                │                         │
          └── mu ──────────┴── _iter_solve ──────────┘
                                    │
                              cash_position
                                    │
                           ┌────────┴────────┐
                       portfolio        diagnostics
                      (Portfolio)    (condition_number,
                                      effective_rank,
                                      solver_residual,
                                      signal_utilisation,
                                      ic, rank_ic, …)

    Attributes:
        prices: Polars DataFrame of price levels per asset over time. Must
            contain a ``'date'`` column and at least one numeric asset column
            with strictly positive values that are not monotonically
            non-decreasing or non-increasing (i.e. successive price changes
            must vary in sign).
        mu: Polars DataFrame of expected-return signals aligned with *prices*.
            Must share the same shape and column names as *prices*.
        cfg: Immutable `BasanosConfig` controlling EWMA half-lives,
            clipping, shrinkage intensity, and AUM.

    Examples:
        Build an engine with two synthetic assets over 30 days and inspect the
        optimized positions and diagnostic properties.

        >>> import numpy as np
        >>> import polars as pl
        >>> from basanos.math import BasanosConfig, BasanosEngine
        >>> dates = list(range(30))
        >>> rng = np.random.default_rng(42)
        >>> prices = pl.DataFrame({
        ...     "date": dates,
        ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 100.0,
        ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 150.0,
        ... })
        >>> mu = pl.DataFrame({
        ...     "date": dates,
        ...     "A": rng.normal(0.0, 0.5, 30),
        ...     "B": rng.normal(0.0, 0.5, 30),
        ... })
        >>> cfg = BasanosConfig(vola=5, corr=10, clip=2.0, shrink=0.5, aum=1_000_000)
        >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
        >>> engine.assets
        ['A', 'B']
        >>> engine.cash_position.shape
        (30, 3)
        >>> engine.position_leverage.columns
        ['date', 'leverage']
    """

    prices: pl.DataFrame
    mu: pl.DataFrame
    cfg: BasanosConfig

    def __post_init__(self) -> None:
        """Validate inputs by delegating to `_validate_inputs`."""
        _validate_inputs(self.prices, self.mu, self.cfg)

    # ------------------------------------------------------------------
    # Core data-access properties
    # ------------------------------------------------------------------

    @property
    def assets(self) -> list[str]:
        """List asset column names (numeric columns excluding 'date')."""
        return [c for c in self.prices.columns if c != "date" and self.prices[c].dtype.is_numeric()]

    @property
    def ret_adj(self) -> pl.DataFrame:
        """Return per-asset volatility-adjusted log returns clipped by cfg.clip.

        Uses an EWMA volatility estimate with lookback ``cfg.vola`` to
        standardize log returns for each numeric asset column.
        """
        return self.prices.with_columns(
            [vol_adj(pl.col(asset), vola=self.cfg.vola, clip=self.cfg.clip) for asset in self.assets]
        )

    @property
    def vola(self) -> pl.DataFrame:
        """Per-asset EWMA volatility of percentage returns.

        Computes percent changes for each numeric asset column and applies an
        exponentially weighted standard deviation using the lookback specified
        by ``cfg.vola``. The result is a DataFrame aligned with ``self.prices``
        whose numeric columns hold per-asset volatility estimates.
        """
        return self.prices.with_columns(
            pl.col(asset)
            .pct_change()
            .ewm_std(com=self.cfg.vola - 1, adjust=True, min_samples=self.cfg.vola)
            .alias(asset)
            for asset in self.assets
        )

    @property
    def cor(self) -> dict[datetime.date, np.ndarray]:
        """Compute per-timestamp EWM correlation matrices.

        Builds volatility-adjusted returns for all assets, computes an
        exponentially weighted correlation using a pure NumPy implementation
        (with window ``cfg.corr``), and returns a mapping from each timestamp
        to the corresponding correlation matrix as a NumPy array.

        Returns:
            dict: Mapping ``date -> np.ndarray`` of shape (n_assets, n_assets).

        Performance:
            Delegates to `ewm_corr`, which is O(T·N²) in both
            time and memory. The returned dict holds *T* references into the
            result tensor (one N*N view per date); no extra copies are made.
            For large *N* or *T*, prefer ``cor_tensor`` to keep a single
            contiguous array rather than building a Python dict.
        """
        index = self.prices["date"]
        ret_adj_np = self.ret_adj.select(self.assets).to_numpy()
        tensor = _ewm_corr_numpy(
            ret_adj_np,
            com=self.cfg.corr,
            min_periods=self.cfg.corr,
            min_corr_denom=self.cfg.min_corr_denom,
        )
        return {index[t]: tensor[t] for t in range(len(index))}

    @property
    def cor_tensor(self) -> np.ndarray:
        """Return all correlation matrices stacked as a 3-D tensor.

        Converts the per-timestamp correlation dict (see `cor`) into a
        single contiguous NumPy array so that the full history can be saved to
        a flat ``.npy`` file with ``numpy.save`` and reloaded with
        ``numpy.load``.

        Returns:
            np.ndarray: Array of shape ``(T, N, N)`` where *T* is the number of
                timestamps and *N* the number of assets. ``tensor[t]`` is the
                correlation matrix for the *t*-th date (same ordering as
                ``self.prices["date"]``).

        Examples:
            >>> import tempfile, pathlib
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(100)))
            >>> rng0 = np.random.default_rng(0).lognormal(size=100)
            >>> rng1 = np.random.default_rng(1).lognormal(size=100)
            >>> prices = pl.DataFrame({"date": dates, "A": rng0, "B": rng1})
            >>> rng2 = np.random.default_rng(2).normal(size=100)
            >>> rng3 = np.random.default_rng(3).normal(size=100)
            >>> mu = pl.DataFrame({"date": dates, "A": rng2, "B": rng3})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> tensor = engine.cor_tensor
            >>> with tempfile.TemporaryDirectory() as td:
            ...     path = pathlib.Path(td) / "cor.npy"
            ...     np.save(path, tensor)
            ...     loaded = np.load(path)
            >>> np.testing.assert_array_equal(tensor, loaded)
        """
        return np.stack(list(self.cor.values()), axis=0)

    # ------------------------------------------------------------------
    # Internal solve helpers — inherited from _SolveMixin
    # ------------------------------------------------------------------
    # (_compute_mask, _check_signal, _scale_to_cash, _row_early_check,
    #  _denom_guard_yield, _compute_position, _replay_positions,
    #  _iter_matrices, _iter_solve, warmup_state)
    # Implementations live in _engine_solve.py; patch targets remain in that
    # module's namespace, e.g. ``patch("basanos.math._engine_solve.solve")``.

    # ------------------------------------------------------------------
    # Position properties
    # ------------------------------------------------------------------

    @property
    def cash_position(self) -> pl.DataFrame:
        r"""Optimize correlation-aware risk positions for each timestamp.

        Supports two covariance modes controlled by ``cfg.covariance_config``:

        * `EwmaShrinkConfig` (default): Computes EWMA correlations, applies
          linear shrinkage toward the identity, and solves a normalised linear
          system $C\,x = \mu$ per timestamp via Cholesky / LU.

        * `SlidingWindowConfig`: At each timestamp uses the
          ``cfg.covariance_config.window`` most recent vol-adjusted returns to fit a
          rank-``cfg.covariance_config.n_factors`` factor model via truncated SVD and
          solves the system via the Woodbury identity at $O(k^3 + kn)$ rather
          than $O(n^3)$ per step.

        Non-finite or ill-posed cases yield zero positions for safety.

        Returns:
            pl.DataFrame: DataFrame with columns ['date'] + asset names containing
            the per-timestamp cash positions (risk divided by EWMA volatility).

        Performance:
            For ``ewma_shrink``: dominant cost is ``self.cor`` (O(T·N²) time,
            O(T·N²) memory — see `ewm_corr`). The per-timestamp
            linear solve adds O(N³) per row.

            For ``sliding_window``: O(T·W·N·k) for sliding SVDs plus
            O(T·(k³ + kN)) for Woodbury solves. Memory is O(W·N) per step,
            independent of T.
        """
        assets = self.assets

        # Compute risk positions row-by-row using _replay_positions.
        prices_num = self.prices.select(assets).to_numpy()

        risk_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float)
        cash_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float)
        vola_np = self.vola.select(assets).to_numpy()

        self._replay_positions(risk_pos_np, cash_pos_np, vola_np)

        # Build Polars DataFrame for cash positions (numeric columns only)
        cash_position = self.prices.with_columns(
            [(pl.lit(cash_pos_np[:, i]).alias(asset)) for i, asset in enumerate(assets)]
        )

        return cash_position

    @property
    def position_status(self) -> pl.DataFrame:
        """Per-timestamp reason code explaining each `cash_position` row.

        Labels every row with exactly one of four `SolveStatus`
        codes (which compare equal to their string equivalents):

        * ``'warmup'``: Insufficient history for the sliding-window
          covariance mode (``i + 1 < cfg.covariance_config.window``).
          Positions are ``NaN`` for all assets at this timestamp.
        * ``'zero_signal'``: The expected-return vector ``mu`` was
          all-zeros (or all-NaN) at this timestamp; the optimizer
          short-circuited and returned zero positions without solving.
        * ``'degenerate'``: The normalisation denominator was non-finite
          or below ``cfg.denom_tol``, the Cholesky / Woodbury solve
          failed, or no asset had a finite price; positions were zeroed
          for safety.
        * ``'valid'``: The linear system was solved successfully and
          positions are non-trivially non-zero.

        The three non-``'valid'`` codes correspond one-to-one to the three
        NaN / zero cases (warm-up, zero signal, degenerate solve) and allow
        downstream consumers (backtests, risk monitors) to distinguish data
        gaps from signal silence from numerical ill-conditioning without
        re-inspecting ``mu`` or the engine configuration.

        Returns:
            pl.DataFrame: Two-column DataFrame ``{'date': ..., 'status': ...}``
            with one row per timestamp. The ``status`` column has
            ``Polars`` dtype ``String``.
        """
        statuses = [status for _i, _t, _mask, _pos, status in self._iter_solve()]
        return pl.DataFrame({"date": self.prices["date"], "status": pl.Series(statuses, dtype=pl.String)})

    @property
    def risk_position(self) -> pl.DataFrame:
        """Risk positions (before EWMA-volatility scaling) at each timestamp.

        Derives the un-volatility-scaled position by multiplying the cash
        position by the per-asset EWMA volatility. Equivalently, this is
        the quantity solved by the correlation-adjusted linear system before
        dividing by ``vola``.

        Relationship to other properties::

            cash_position = risk_position / vola
            risk_position = cash_position * vola

        Returns:
            pl.DataFrame: DataFrame with columns ``['date'] + assets`` where
            each value is ``cash_position_i * vola_i`` at the given timestamp.
        """
        assets = self.assets
        cp_np = self.cash_position.select(assets).to_numpy()
        vola_np = self.vola.select(assets).to_numpy()
        with np.errstate(invalid="ignore"):
            risk_pos = cp_np * vola_np
        return self.prices.with_columns([pl.lit(risk_pos[:, i]).alias(asset) for i, asset in enumerate(assets)])

    @property
    def position_leverage(self) -> pl.DataFrame:
        """L1 norm of cash positions (gross leverage) at each timestamp.

        Sums the absolute values of all asset cash positions at each row.
        NaN positions are treated as zero (they contribute nothing to gross
        leverage).

        Returns:
            pl.DataFrame: Two-column DataFrame ``{'date': ..., 'leverage': ...}``
            where ``leverage`` is the L1 norm of the cash-position vector.
        """
        assets = self.assets
        cp_np = self.cash_position.select(assets).to_numpy()
        leverage = np.nansum(np.abs(cp_np), axis=1)
        return pl.DataFrame({"date": self.prices["date"], "leverage": pl.Series(leverage, dtype=pl.Float64)})

    # ------------------------------------------------------------------
    # Portfolio and performance
    # ------------------------------------------------------------------

    @property
    def portfolio(self) -> Portfolio:
        """Construct a Portfolio from the optimized cash positions.

        Converts the computed cash positions into a Portfolio using the
        configured AUM. The ``cost_per_unit`` from `cfg` is forwarded
        so that `net_cost_nav` and `position_delta_costs` work out
        of the box without any further configuration.

        Returns:
            Portfolio: Instance built from cash positions with AUM scaling.
        """
        cp = self.cash_position
        assets = [c for c in cp.columns if c != "date" and cp[c].dtype.is_numeric()]
        scaled = cp.with_columns(pl.col(a) * self.cfg.position_scale for a in assets)
        return Portfolio.from_cash_position(self.prices, scaled, aum=self.cfg.aum, cost_per_unit=self.cfg.cost_per_unit)

    def sharpe_at_shrink(self, shrink: float) -> float:
        r"""Return the annualised portfolio Sharpe ratio for the given shrinkage weight.

        Constructs a new `BasanosEngine` with all parameters identical to
        ``self`` except that ``cfg.shrink`` is replaced by ``shrink``, then
        returns the annualised Sharpe ratio of the resulting portfolio.

        This is the canonical single-argument callable required by the benchmarks
        specification: ``f(λ) → Sharpe``. Use it to sweep λ across ``[0, 1]``
        and measure whether correlation adjustment adds value over the
        signal-proportional baseline (λ = 0) or the unregularised limit (λ = 1).

        Corner cases:
            * **λ = 0** — the shrunk matrix equals the identity, so the
              optimiser treats all assets as uncorrelated and positions are
              purely signal-proportional (no correlation adjustment).
            * **λ = 1** — the raw EWMA correlation matrix is used without
              shrinkage.

        Args:
            shrink: Retention weight λ ∈ [0, 1]. See
                ``BasanosConfig.shrink`` for full documentation.

        Returns:
            Annualised Sharpe ratio of the portfolio returns as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed
            (e.g. zero-variance returns).

        Raises:
            ValidationError: When ``shrink`` is outside [0, 1] (delegated to
                `BasanosConfig` field validation).

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.sharpe_at_shrink(0.5)
            >>> isinstance(s, float)
            True
        """
        new_cfg = self.cfg.replace(shrink=shrink)
        engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
        return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))

    def sharpe_at_window_factors(self, window: int, n_factors: int) -> float:
        r"""Return the annualised portfolio Sharpe ratio for the given sliding-window parameters.

        Constructs a new `BasanosEngine` with ``covariance_mode`` set to
        ``"sliding_window"`` and the supplied ``window`` / ``n_factors``, keeping
        all other configuration identical to ``self``.

        Use this method to sweep ``(W, k)`` and compare the sliding-window
        estimator against the EWMA baseline (via `sharpe_at_shrink`).

        Args:
            window: Rolling window length $W \geq 1$.
                Rule of thumb: $W \geq 2 \cdot n_{\text{assets}}$.
            n_factors: Number of latent factors $k \geq 1$.

        Returns:
            Annualised Sharpe ratio of the portfolio returns as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed
            (e.g. not enough history to fill the first window).

        Raises:
            ValidationError: When ``window`` or ``n_factors`` fail field
                constraints (delegated to `BasanosConfig`).

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.sharpe_at_window_factors(window=40, n_factors=2)
            >>> isinstance(s, float)
            True
        """
        new_cfg = self.cfg.replace(
            covariance_config=SlidingWindowConfig(window=window, n_factors=n_factors),
        )
        engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
        return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))

    @property
    def naive_sharpe(self) -> float:
        r"""Sharpe ratio of the naïve equal-weight signal (μ = 1 for every asset/timestamp).

        Replaces the expected-return signal ``mu`` with a constant matrix of
        ones, then runs the optimiser with the current configuration and returns
        the annualised Sharpe ratio of the resulting portfolio.

        This provides the baseline answer to *"does the signal add value?"*:
        a real signal should produce a higher Sharpe than the naïve benchmark.
        Combined with `sharpe_at_shrink`, this yields a three-way
        comparison:

        +----------------------------------+-------------------------------------------+
        | Benchmark                        | What it measures                          |
        +==================================+===========================================+
        | ``naive_sharpe``                 | No signal skill; pure correlation routing |
        +----------------------------------+-------------------------------------------+
        | ``sharpe_at_shrink(0.0)``        | Signal skill, no correlation adj.         |
        +----------------------------------+-------------------------------------------+
        | ``sharpe_at_shrink(cfg.shrink)`` | Signal + correlation adj.                 |
        +----------------------------------+-------------------------------------------+

        Returns:
            Annualised Sharpe ratio of the equal-weight portfolio as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed.

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.naive_sharpe
            >>> isinstance(s, float)
            True
        """
        naive_mu = self.mu.with_columns(pl.lit(1.0).alias(asset) for asset in self.assets)
        engine = BasanosEngine(prices=self.prices, mu=naive_mu, cfg=self.cfg)
        return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))

    # ------------------------------------------------------------------
    # Reporting
    # ------------------------------------------------------------------

    @property
    def config_report(self) -> "ConfigReport":
        """Return a `ConfigReport` facade for this engine.

        Returns a `ConfigReport` that includes the full **lambda-sweep chart**
        — an interactive plot of the annualised Sharpe ratio as `shrink` (λ)
        is swept across [0, 1] — in addition to the parameter table,
        shrinkage-guidance table, and theory section available from `report`.

        Returns:
            basanos.math._config_report.ConfigReport: Report facade with
            ``to_html()`` and ``save()`` methods.

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> report = engine.config_report
            >>> html = report.to_html()
            >>> "Lambda" in html
            True
        """
        from ._config_report import ConfigReport

        return ConfigReport(config=self.cfg, engine=self)

    # ------------------------------------------------------------------
    # Matrix diagnostics — inherited from _DiagnosticsMixin
    # ------------------------------------------------------------------
    # (condition_number, effective_rank, solver_residual, signal_utilisation)
    # Implementations live in _engine_diagnostics.py; patch targets remain in
    # that module's namespace, e.g.
    # ``patch("basanos.math._engine_diagnostics.solve")``.

    # ------------------------------------------------------------------
    # Signal evaluation — inherited from _SignalEvaluatorMixin
    # ------------------------------------------------------------------
    # (_ic_series, ic, rank_ic, ic_mean, ic_std, icir,
    #  rank_ic_mean, rank_ic_std)
    # Implementations live in _engine_ic.py.
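
# Illustrative sketch (not part of basanos): the two solve strategies named in
# the ``cash_position`` docstring, written with plain NumPy. Assumptions: the
# shrunk matrix is taken as ``lam * C + (1 - lam) * I`` (consistent with the
# corner cases documented for ``sharpe_at_shrink``), and the factor model is
# assumed to have the form ``D + B B^T`` with a positive diagonal ``D``. The
# function names are hypothetical, and the normalisation step and the actual
# code in ``_engine_solve`` are not shown in this file, so they are omitted.

```python
import numpy as np


def shrink_to_identity(corr: np.ndarray, lam: float) -> np.ndarray:
    """Blend a correlation matrix with the identity: lam * C + (1 - lam) * I."""
    n = corr.shape[0]
    return lam * corr + (1.0 - lam) * np.eye(n)


def solve_dense(corr: np.ndarray, mu: np.ndarray, lam: float) -> np.ndarray:
    """Solve (lam * C + (1 - lam) * I) x = mu with a dense N x N solver."""
    return np.linalg.solve(shrink_to_identity(corr, lam), mu)


def solve_woodbury(loadings: np.ndarray, diag: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Solve (D + B B^T) x = mu, with D = diag(diag) and B of shape (N, k).

    Woodbury identity:
        (D + B B^T)^-1 = D^-1 - D^-1 B (I_k + B^T D^-1 B)^-1 B^T D^-1
    Only a k x k system is solved, never an N x N one.
    """
    k = loadings.shape[1]
    d_inv_mu = mu / diag                      # D^-1 mu
    d_inv_b = loadings / diag[:, None]        # D^-1 B
    small = np.eye(k) + loadings.T @ d_inv_b  # I_k + B^T D^-1 B, shape (k, k)
    correction = d_inv_b @ np.linalg.solve(small, loadings.T @ d_inv_mu)
    return d_inv_mu - correction
```

# With ``lam = 0`` the dense solve returns ``mu`` unchanged, which matches the
# signal-proportional corner case; comparing ``solve_woodbury`` against
# ``np.linalg.solve(np.diag(diag) + B @ B.T, mu)`` on random inputs is a simple
# way to check the identity numerically.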


122 statements