Coverage for src/basanos/math/optimizer.py: 100%

"""Correlation-aware risk position optimizer (Basanos).

This module provides utilities to compute correlation-adjusted risk positions
from price data and expected-return signals. It relies on volatility-adjusted
returns to estimate a dynamic correlation matrix (via EWM), applies shrinkage
towards the identity, and solves a normalized linear system per timestamp to
obtain stable positions.

Performance characteristics
---------------------------
Let *N* be the number of assets and *T* the number of timestamps.

**Computational complexity**

+----------------------------------+------------------+--------------------------------------+
| Operation                        | Complexity       | Bottleneck                           |
+==================================+==================+======================================+
| EWM volatility (``ret_adj``,     | O(T·N)           | Linear in both T and N; negligible   |
| ``vola``)                        |                  |                                      |
+----------------------------------+------------------+--------------------------------------+
| EWM correlation (``cor``)        | O(T·N²)          | ``lfilter`` over all N² asset pairs  |
|                                  |                  | simultaneously                       |
+----------------------------------+------------------+--------------------------------------+
| Linear solve per timestamp       | O(N³) per row,   | Cholesky / LU per row in             |
| (``cash_position``)              | O(T·N³) total    | ``cash_position``                    |
+----------------------------------+------------------+--------------------------------------+

**Memory usage** (peak, approximate)

``ewm_corr`` allocates roughly **14 float64 arrays** of shape
``(T, N, N)`` at peak (input sequences, IIR filter outputs, EWM components,
and the result tensor). At 8 bytes per element, peak RAM ≈ **112 * T * N²**
bytes. Approximate peaks for typical working sizes, relative to a 16 GB
machine:

+--------+--------------------------+------------------------------------+
| N      | T (daily rows)           | Peak memory (approx.)              |
+========+==========================+====================================+
| 50     | 252 (~1 yr)              | ~70 MB                             |
+--------+--------------------------+------------------------------------+
| 100    | 252 (~1 yr)              | ~280 MB                            |
+--------+--------------------------+------------------------------------+
| 100    | 2 520 (~10 yr)           | ~2.8 GB                            |
+--------+--------------------------+------------------------------------+
| 200    | 2 520 (~10 yr)           | ~11 GB                             |
+--------+--------------------------+------------------------------------+
| 500    | 2 520 (~10 yr)           | ~70 GB ⚠ exceeds typical RAM       |
+--------+--------------------------+------------------------------------+

**Practical limits (daily data)**

* **≤ 150 assets, ≤ 5 years** — well within reach on an 8 GB laptop.
* **≤ 200 assets, ≤ 10 years** — requires ~11-12 GB; feasible on a 16 GB
  workstation.
* **> 500 assets with multi-year history** — peak memory exceeds 16 GB;
  reduce the time range or switch to a chunked / streaming approach.
* **> 1 000 assets** — the O(N³) per-solve cost alone makes real-time
  optimization impractical even with adequate RAM.

See ``BENCHMARKS.md`` for measured wall-clock timings across representative
dataset sizes.
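
The peak-memory rows above can be reproduced from the 14-array assumption
alone. The following is a back-of-the-envelope sketch, not a measurement.
``approx_peak_bytes`` is a hypothetical helper and is not part of Basanos:

.. code-block:: python

    BYTES_PER_FLOAT64 = 8
    N_TENSORS_AT_PEAK = 14  # approximate count quoted above (an assumption)


    def approx_peak_bytes(n_timestamps: int, n_assets: int) -> int:
        # 14 arrays * 8 bytes = 112 bytes per (t, i, j) cell of a (T, N, N) tensor.
        return N_TENSORS_AT_PEAK * BYTES_PER_FLOAT64 * n_timestamps * n_assets**2


    for n_assets, n_rows in [(50, 252), (100, 2520), (200, 2520)]:
        gb = approx_peak_bytes(n_rows, n_assets) / 1e9
        print(f'N={n_assets}, T={n_rows}: ~{gb:.2f} GB')  # ~0.07, ~2.82, ~11.29 GB

Real peaks depend on ``ewm_corr`` internals and can differ from this estimate.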

Internal structure
------------------
The implementation is split across focused private modules to keep each file
readable and independently testable:

* :mod:`basanos.math._config` — :class:`BasanosConfig` and all
  covariance-mode configuration classes.
* :mod:`basanos.math._ewm_corr` — :func:`ewm_corr`, the vectorised
  IIR-filter implementation of per-row EWM correlation matrices.
* :mod:`basanos.math._engine_solve` — the private ``_SolveMixin``, which
  provides the ``_iter_matrices`` and ``_iter_solve`` generators
  (per-timestamp solve logic).
* :mod:`basanos.math._engine_diagnostics` — the private
  ``_DiagnosticsMixin``, which provides matrix-quality diagnostics
  (condition number, effective rank, solver residual, signal utilisation).
* :mod:`basanos.math._engine_ic` — the private ``_SignalEvaluatorMixin``,
  which provides signal evaluation metrics (IC, Rank IC, ICIR, and summary
  statistics).
* This module — :class:`BasanosEngine`, which combines the three mixins and
  wires every method together in clearly delimited sections.
"""

import dataclasses
import datetime
import logging
from typing import TYPE_CHECKING

import numpy as np
import polars as pl
from jquantstats import Portfolio

from ..exceptions import (
    ColumnMismatchError,
    ExcessiveNullsError,
    MissingDateColumnError,
    MonotonicPricesError,
    NonPositivePricesError,
    ShapeMismatchError,
)
from ._config import (
    BasanosConfig,
    CovarianceConfig,
    CovarianceMode,
    EwmaShrinkConfig,
    SlidingWindowConfig,
)
from ._engine_diagnostics import _DiagnosticsMixin as _DiagnosticsMixin
from ._engine_ic import _SignalEvaluatorMixin as _SignalEvaluatorMixin
from ._engine_solve import _SolveMixin as _SolveMixin
from ._ewm_corr import ewm_corr as _ewm_corr_numpy
from ._signal import vol_adj

if TYPE_CHECKING:
    from ._config_report import ConfigReport

_logger = logging.getLogger(__name__)


def _validate_inputs(prices: pl.DataFrame, mu: pl.DataFrame, cfg: "BasanosConfig") -> None:
    """Validate ``prices``, ``mu``, and ``cfg`` for use with :class:`BasanosEngine`.

    Checks that both DataFrames contain a ``'date'`` column, share identical
    shapes and column sets, and contain no non-positive prices, no excessive
    fraction of nulls, and no monotonic price series. Also logs a warning
    when the dataset is too short relative to a configured sliding-window
    size.

    Args:
        prices: DataFrame of price levels per asset over time.
        mu: DataFrame of expected-return signals aligned with ``prices``.
        cfg: Engine configuration instance.

    Raises:
        MissingDateColumnError: If ``'date'`` is absent from either frame.
        ShapeMismatchError: If ``prices`` and ``mu`` have different shapes.
        ColumnMismatchError: If the column sets of the two frames differ.
        NonPositivePricesError: If any asset contains a non-positive price.
        ExcessiveNullsError: If the null fraction of any asset column exceeds
            ``cfg.max_nan_fraction``.
        MonotonicPricesError: If any asset price series is monotonically
            non-decreasing or non-increasing.

    Warns:
        Log message (WARNING level): If ``cfg.covariance_mode`` is
            ``CovarianceMode.sliding_window`` and
            ``len(prices) < 2 * cfg.window``, a warning is emitted via the
            module logger; no exception or :class:`UserWarning` is raised.
            This is a deliberate soft boundary: callers may intentionally
            supply data shorter than the full warm-up period. During warm-up
            the first ``window - 1`` timestamps will yield zero positions.
    """
    # ensure 'date' column exists in prices before any other validation
    if "date" not in prices.columns:
        raise MissingDateColumnError("prices")

    # ensure 'date' column exists in mu as well (kept for symmetry and downstream assumptions)
    if "date" not in mu.columns:
        raise MissingDateColumnError("mu")

    # check that prices and mu have the same shape
    if prices.shape != mu.shape:
        raise ShapeMismatchError(prices.shape, mu.shape)

    # check that the columns of prices and mu are identical
    if set(prices.columns) != set(mu.columns):
        raise ColumnMismatchError(prices.columns, mu.columns)

    assets = [c for c in prices.columns if c != "date" and prices[c].dtype.is_numeric()]

    # check for non-positive prices: log returns require strictly positive prices
    for asset in assets:
        col = prices[asset].drop_nulls()
        if col.len() > 0 and (col <= 0).any():
            raise NonPositivePricesError(asset)

    # check for excessive nulls: reject any asset column whose null fraction exceeds cfg.max_nan_fraction
    n_rows = prices.height
    if n_rows > 0:
        for asset in assets:
            nan_frac = prices[asset].null_count() / n_rows
            if nan_frac > cfg.max_nan_fraction:
                raise ExcessiveNullsError(asset, nan_frac, cfg.max_nan_fraction)

    # check for monotonic price series: in a monotonically non-decreasing or non-increasing
    # series the price changes never switch sign, indicating malformed or synthetic data
    for asset in assets:
        col = prices[asset].drop_nulls()
        if col.len() > 2:
            diffs = col.diff().drop_nulls()
            if (diffs >= 0).all() or (diffs <= 0).all():
                raise MonotonicPricesError(asset)

    # warn when the dataset is too short to benefit from the sliding window
    if cfg.covariance_mode == CovarianceMode.sliding_window and cfg.window is not None:
        w: int = cfg.window
        if n_rows < 2 * w:
            _logger.warning(
                "Dataset length (%d rows) is less than 2 * window (%d). "
                "The first %d timestamps will yield zero positions during warm-up; "
                "consider using a longer history or reducing 'window'.",
                n_rows,
                2 * w,
                w - 1,
            )


# ---------------------------------------------------------------------------
# Re-export config symbols so ``from basanos.math.optimizer import …`` keeps
# working for existing callers.
# ---------------------------------------------------------------------------
__all__ = [
    "BasanosConfig",
    "BasanosEngine",
    "CovarianceConfig",
    "CovarianceMode",
    "EwmaShrinkConfig",
    "SlidingWindowConfig",
]


@dataclasses.dataclass(frozen=True)
class BasanosEngine(_DiagnosticsMixin, _SignalEvaluatorMixin, _SolveMixin):
    """Engine to compute correlation matrices and optimize risk positions.

    Encapsulates price data and configuration to build EWM-based
    correlations, apply shrinkage, and solve for normalized positions.

    Public methods are organised into clearly delimited sections (some
    inherited from the private mixin classes):

    * **Core data access** — :attr:`assets`, :attr:`ret_adj`, :attr:`vola`,
      :attr:`cor`, :attr:`cor_tensor`
    * **Solve / position logic** — :attr:`cash_position`,
      :attr:`position_status`, :attr:`risk_position`,
      :attr:`position_leverage`, :meth:`warmup_state`
      (solve helpers inherited from :class:`~._engine_solve._SolveMixin`)
    * **Portfolio and performance** — :attr:`portfolio`,
      :attr:`naive_sharpe`, :meth:`sharpe_at_shrink`,
      :meth:`sharpe_at_window_factors`
    * **Matrix diagnostics** — :attr:`condition_number`,
      :attr:`effective_rank`, :attr:`solver_residual`,
      :attr:`signal_utilisation`
      (inherited from :class:`~._engine_diagnostics._DiagnosticsMixin`)
    * **Signal evaluation** — :attr:`ic`, :attr:`rank_ic`, :attr:`ic_mean`,
      :attr:`ic_std`, :attr:`icir`, :attr:`rank_ic_mean`,
      :attr:`rank_ic_std`
      (inherited from :class:`~._engine_ic._SignalEvaluatorMixin`)
    * **Reporting** — :attr:`config_report`

    Data-flow diagram
    -----------------

    .. code-block:: text

        prices (pl.DataFrame)
        │
        ├─ vol_adj ──► ret_adj (volatility-adjusted log returns)
        │              │
        │              ├─ ewm_corr ──► cor / cor_tensor
        │              │               │
        │              │               └─ shrink2id / FactorModel
        │              │                  │
        │            vola         covariance matrix
        │              │                  │
        └── mu ────────┴── _iter_solve ───┘
                                │
                          cash_position
                                │
                       ┌────────┴────────┐
                   portfolio        diagnostics
                  (Portfolio)     (condition_number,
                                   effective_rank,
                                   solver_residual,
                                   signal_utilisation,
                                   ic, rank_ic, …)

    Attributes:
        prices: Polars DataFrame of price levels per asset over time. Must
            contain a ``'date'`` column and at least one numeric asset column
            with strictly positive values that are not monotonically
            non-decreasing or non-increasing (i.e. their period-to-period
            changes must take both signs).
        mu: Polars DataFrame of expected-return signals aligned with *prices*.
            Must share the same shape and column names as *prices*.
        cfg: Immutable :class:`BasanosConfig` controlling EWMA lookbacks,
            clipping, shrinkage intensity, and AUM.

    Examples:
        Build an engine with two synthetic assets over 30 days and inspect the
        optimized positions and the gross-leverage series.

        >>> import numpy as np
        >>> import polars as pl
        >>> from basanos.math import BasanosConfig, BasanosEngine
        >>> dates = list(range(30))
        >>> rng = np.random.default_rng(42)
        >>> prices = pl.DataFrame({
        ...     "date": dates,
        ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 100.0,
        ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 150.0,
        ... })
        >>> mu = pl.DataFrame({
        ...     "date": dates,
        ...     "A": rng.normal(0.0, 0.5, 30),
        ...     "B": rng.normal(0.0, 0.5, 30),
        ... })
        >>> cfg = BasanosConfig(vola=5, corr=10, clip=2.0, shrink=0.5, aum=1_000_000)
        >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
        >>> engine.assets
        ['A', 'B']
        >>> engine.cash_position.shape
        (30, 3)
        >>> engine.position_leverage.columns
        ['date', 'leverage']
    """

    prices: pl.DataFrame
    mu: pl.DataFrame
    cfg: BasanosConfig

    def __post_init__(self) -> None:
        """Validate inputs by delegating to :func:`_validate_inputs`."""
        _validate_inputs(self.prices, self.mu, self.cfg)

    # ------------------------------------------------------------------
    # Core data-access properties
    # ------------------------------------------------------------------

    @property
    def assets(self) -> list[str]:
        """List asset column names (numeric columns excluding 'date')."""
        return [c for c in self.prices.columns if c != "date" and self.prices[c].dtype.is_numeric()]

    @property
    def ret_adj(self) -> pl.DataFrame:
        """Return per-asset volatility-adjusted log returns clipped by cfg.clip.

        Uses an EWMA volatility estimate with lookback ``cfg.vola`` to
        standardize log returns for each numeric asset column.
        """
        return self.prices.with_columns(
            [vol_adj(pl.col(asset), vola=self.cfg.vola, clip=self.cfg.clip) for asset in self.assets]
        )

    @property
    def vola(self) -> pl.DataFrame:
        """Per-asset EWMA volatility of percentage returns.

        Computes percent changes for each numeric asset column and applies an
        exponentially weighted standard deviation with centre of mass
        ``cfg.vola - 1`` (smoothing factor ``1 / cfg.vola``), emitting values
        only once ``cfg.vola`` observations are available. The result is a
        DataFrame aligned with ``self.prices`` whose numeric columns hold
        per-asset volatility estimates.
        """
        return self.prices.with_columns(
            pl.col(asset)
            .pct_change()
            .ewm_std(com=self.cfg.vola - 1, adjust=True, min_samples=self.cfg.vola)
            .alias(asset)
            for asset in self.assets
        )

    @property
    def cor(self) -> dict[datetime.date, np.ndarray]:
        """Compute per-timestamp EWM correlation matrices.

        Builds volatility-adjusted returns for all assets, computes an
        exponentially weighted correlation using a pure NumPy implementation
        (centre of mass ``cfg.corr``), and returns a mapping from each
        timestamp to the corresponding correlation matrix as a NumPy array.

        Returns:
            dict: Mapping ``date -> np.ndarray`` of shape (n_assets, n_assets).

        Performance:
            Delegates to :func:`ewm_corr`, which is O(T·N²) in both
            time and memory. The returned dict holds *T* references into the
            result tensor (one N×N view per date); no extra copies are made.
            :attr:`cor_tensor`, by contrast, stacks these views with
            :func:`numpy.stack`, which allocates a second ``(T, N, N)``
            array, so budget roughly twice the tensor's memory when using it.
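
            The view semantics above can be checked with plain NumPy. The
            following minimal sketch uses a zero-filled stand-in for the
            ``ewm_corr`` output. It also shows that :func:`numpy.stack`,
            which :attr:`cor_tensor` uses, always copies:

            .. code-block:: python

                import numpy as np

                # Stand-in for the (T, N, N) tensor returned by ewm_corr.
                tensor = np.zeros((4, 3, 3))
                per_date = {t: tensor[t] for t in range(tensor.shape[0])}

                # Basic indexing returns views that share the tensor's memory.
                assert np.shares_memory(per_date[0], tensor)

                # np.stack allocates a fresh array, i.e. a full second copy.
                stacked = np.stack(list(per_date.values()), axis=0)
                assert not np.shares_memory(stacked, tensor)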

        """
        index = self.prices["date"]
        ret_adj_np = self.ret_adj.select(self.assets).to_numpy()
        tensor = _ewm_corr_numpy(
            ret_adj_np,
            com=self.cfg.corr,
            min_periods=self.cfg.corr,
            min_corr_denom=self.cfg.min_corr_denom,
        )
        return {index[t]: tensor[t] for t in range(len(index))}

    @property
    def cor_tensor(self) -> np.ndarray:
        """Return all correlation matrices stacked as a 3-D tensor.

        Converts the per-timestamp correlation dict (see :py:attr:`cor`) into a
        single contiguous NumPy array so that the full history can be saved to
        a flat ``.npy`` file with :func:`numpy.save` and reloaded with
        :func:`numpy.load`.

        Returns:
            np.ndarray: Array of shape ``(T, N, N)`` where *T* is the number of
            timestamps and *N* the number of assets. ``tensor[t]`` is the
            correlation matrix for the *t*-th date (same ordering as
            ``self.prices["date"]``).

        Examples:
            >>> import tempfile, pathlib
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(100)))
            >>> rng0 = np.random.default_rng(0).lognormal(size=100)
            >>> rng1 = np.random.default_rng(1).lognormal(size=100)
            >>> prices = pl.DataFrame({"date": dates, "A": rng0, "B": rng1})
            >>> rng2 = np.random.default_rng(2).normal(size=100)
            >>> rng3 = np.random.default_rng(3).normal(size=100)
            >>> mu = pl.DataFrame({"date": dates, "A": rng2, "B": rng3})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> tensor = engine.cor_tensor
            >>> with tempfile.TemporaryDirectory() as td:
            ...     path = pathlib.Path(td) / "cor.npy"
            ...     np.save(path, tensor)
            ...     loaded = np.load(path)
            >>> np.testing.assert_array_equal(tensor, loaded)
        """
        return np.stack(list(self.cor.values()), axis=0)

    # ------------------------------------------------------------------
    # Internal solve helpers — inherited from _SolveMixin
    # ------------------------------------------------------------------
    # (_compute_mask, _check_signal, _scale_to_cash, _row_early_check,
    #  _denom_guard_yield, _compute_position, _replay_positions,
    #  _iter_matrices, _iter_solve, warmup_state)
    # Implementations live in _engine_solve.py; patch targets remain in that
    # module's namespace, e.g. ``patch("basanos.math._engine_solve.solve")``.

    # ------------------------------------------------------------------
    # Position properties
    # ------------------------------------------------------------------

    @property
    def cash_position(self) -> pl.DataFrame:
        r"""Optimize correlation-aware risk positions for each timestamp.

        Supports two covariance modes controlled by ``cfg.covariance_config``:

        * :class:`EwmaShrinkConfig` (default): Computes EWMA correlations, applies
          linear shrinkage toward the identity, and solves a normalised linear
          system :math:`C\,x = \mu` per timestamp via Cholesky / LU.

        * :class:`SlidingWindowConfig`: At each timestamp uses the
          ``cfg.covariance_config.window`` most recent vol-adjusted returns to fit a
          rank-``cfg.covariance_config.n_factors`` factor model via truncated SVD and
          solves the system via the Woodbury identity at :math:`O(k^3 + kN)` rather
          than :math:`O(N^3)` per step.
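
        The Woodbury step can be illustrated on a generic diagonal-plus-low-rank
        matrix. The following NumPy sketch assumes the structure
        ``C = D + F F^T``, with a positive diagonal ``D`` and ``N x k``
        loadings ``F``. It is not the engine's own factor-model code, and all
        names are illustrative:

        .. code-block:: python

            import numpy as np

            rng = np.random.default_rng(0)
            n_assets, n_factors = 6, 2

            # Assumed factor structure: C = diag(d) + F @ F.T with d > 0.
            F = rng.normal(size=(n_assets, n_factors))
            d = rng.uniform(0.5, 1.5, size=n_assets)
            mu = rng.normal(size=n_assets)

            # Woodbury: C^-1 mu = D^-1 mu - D^-1 F (I_k + F^T D^-1 F)^-1 F^T D^-1 mu
            d_inv_mu = mu / d
            d_inv_F = F / d[:, None]
            capacitance = np.eye(n_factors) + F.T @ d_inv_F  # only a k x k system
            x_woodbury = d_inv_mu - d_inv_F @ np.linalg.solve(capacitance, F.T @ d_inv_mu)

            # Reference: dense solve of the full N x N system.
            x_dense = np.linalg.solve(np.diag(d) + F @ F.T, mu)
            assert np.allclose(x_woodbury, x_dense)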

        Non-finite or ill-posed cases yield zero positions for safety.

        Returns:
            pl.DataFrame: DataFrame with columns ['date'] + asset names containing
            the per-timestamp cash positions (risk divided by EWMA volatility).

        Performance:
            For ``ewma_shrink``: dominant cost is ``self.cor`` (O(T·N²) time,
            O(T·N²) memory — see :func:`ewm_corr`). The per-timestamp
            linear solve adds O(N³) per row.

            For ``sliding_window``: O(T·W·N·k) for sliding SVDs plus
            O(T·(k³ + kN)) for Woodbury solves. Memory is O(W·N) per step,
            independent of T.
        """
        assets = self.assets

        # Compute risk positions row-by-row using _replay_positions.
        prices_num = self.prices.select(assets).to_numpy()

        risk_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float)
        cash_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float)
        vola_np = self.vola.select(assets).to_numpy()

        self._replay_positions(risk_pos_np, cash_pos_np, vola_np)

        # Build Polars DataFrame for cash positions (numeric columns only)
        cash_position = self.prices.with_columns(
            [(pl.lit(cash_pos_np[:, i]).alias(asset)) for i, asset in enumerate(assets)]
        )

        return cash_position

    @property
    def position_status(self) -> pl.DataFrame:
        """Per-timestamp reason code explaining each :attr:`cash_position` row.

        Labels every row with exactly one of four :class:`~basanos.math.SolveStatus`
        codes (which compare equal to their string equivalents):

        * ``'warmup'``: Insufficient history for the sliding-window
          covariance mode (``i + 1 < cfg.covariance_config.window``, where
          ``i`` is the zero-based row index). Positions are ``NaN`` for all
          assets at this timestamp.
        * ``'zero_signal'``: The expected-return vector ``mu`` was
          all-zeros (or all-NaN) at this timestamp; the optimizer
          short-circuited and returned zero positions without solving.
        * ``'degenerate'``: The normalisation denominator was non-finite
          or below ``cfg.denom_tol``, the Cholesky / Woodbury solve
          failed, or no asset had a finite price; positions were zeroed
          for safety.
        * ``'valid'``: The linear system was solved successfully and the
          resulting positions are non-zero.

        The three non-``'valid'`` codes map one-to-one onto the three ways a
        row can end up NaN or zero. Downstream consumers (backtests, risk
        monitors) can therefore distinguish insufficient history from signal
        silence from numerical ill-conditioning without re-inspecting ``mu``
        or the engine configuration.

        Returns:
            pl.DataFrame: Two-column DataFrame ``{'date': ..., 'status': ...}``
            with one row per timestamp. The ``status`` column has Polars
            dtype ``pl.String``.
        """
        statuses = [status for _i, _t, _mask, _pos, status in self._iter_solve()]
        return pl.DataFrame({"date": self.prices["date"], "status": pl.Series(statuses, dtype=pl.String)})

    @property
    def risk_position(self) -> pl.DataFrame:
        """Risk positions (before EWMA-volatility scaling) at each timestamp.

        Recovers the risk position by multiplying the cash position by the
        per-asset EWMA volatility. Equivalently, this is the quantity solved
        by the correlation-adjusted linear system before dividing by ``vola``.

        Relationship to other properties::

            cash_position = risk_position / vola
            risk_position = cash_position * vola

        Returns:
            pl.DataFrame: DataFrame with columns ``['date'] + assets`` where
            each value is ``cash_position_i * vola_i`` at the given timestamp.
        """
        assets = self.assets
        cp_np = self.cash_position.select(assets).to_numpy()
        vola_np = self.vola.select(assets).to_numpy()
        with np.errstate(invalid="ignore"):
            risk_pos = cp_np * vola_np
        return self.prices.with_columns([pl.lit(risk_pos[:, i]).alias(asset) for i, asset in enumerate(assets)])

    @property
    def position_leverage(self) -> pl.DataFrame:
        """L1 norm of cash positions (gross leverage) at each timestamp.

        Sums the absolute values of all asset cash positions at each row.
        NaN positions are treated as zero (they contribute nothing to gross
        leverage).
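
        Because the sum uses :func:`numpy.nansum`, a row whose positions are all
        NaN (for example a warm-up row) reports a leverage of ``0.0`` rather
        than NaN. Here is a toy illustration with made-up positions:

        .. code-block:: python

            import numpy as np

            # Two timestamps, three assets; NaN marks a missing position.
            cash = np.array([
                [100.0, -50.0, np.nan],
                [np.nan, np.nan, np.nan],
            ])

            # Gross leverage per row: sum of absolute positions, NaN treated as 0.
            leverage = np.nansum(np.abs(cash), axis=1)  # -> [150.0, 0.0]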

        Returns:
            pl.DataFrame: Two-column DataFrame ``{'date': ..., 'leverage': ...}``
            where ``leverage`` is the L1 norm of the cash-position vector.
        """
        assets = self.assets
        cp_np = self.cash_position.select(assets).to_numpy()
        leverage = np.nansum(np.abs(cp_np), axis=1)
        return pl.DataFrame({"date": self.prices["date"], "leverage": pl.Series(leverage, dtype=pl.Float64)})

    # ------------------------------------------------------------------
    # Portfolio and performance
    # ------------------------------------------------------------------

    @property
    def portfolio(self) -> Portfolio:
        """Construct a Portfolio from the optimized cash positions.

        Scales the computed cash positions by ``cfg.position_scale`` and
        converts them into a Portfolio using the configured AUM. The
        ``cost_per_unit`` from :attr:`cfg` is forwarded so that
        :attr:`~jquantstats.Portfolio.net_cost_nav` and
        :attr:`~jquantstats.Portfolio.position_delta_costs` work out
        of the box without any further configuration.

        Returns:
            Portfolio: Instance built from cash positions with AUM scaling.
        """
        cp = self.cash_position
        assets = [c for c in cp.columns if c != "date" and cp[c].dtype.is_numeric()]
        scaled = cp.with_columns(pl.col(a) * self.cfg.position_scale for a in assets)
        return Portfolio.from_cash_position(self.prices, scaled, aum=self.cfg.aum, cost_per_unit=self.cfg.cost_per_unit)

    def sharpe_at_shrink(self, shrink: float) -> float:
        r"""Return the annualised portfolio Sharpe ratio for the given shrinkage weight.

        Constructs a new :class:`BasanosEngine` with all parameters identical to
        ``self`` except that ``cfg.shrink`` is replaced by ``shrink``, then
        returns the annualised Sharpe ratio of the resulting portfolio.

        This is the canonical single-argument callable required by the benchmarks
        specification: ``f(λ) → Sharpe``. Use it to sweep λ across ``[0, 1]``
        and measure whether correlation adjustment adds value over the
        signal-proportional baseline (λ = 0) or the unregularised limit (λ = 1).

        Corner cases:
            * **λ = 0** — the shrunk matrix equals the identity, so the
              optimiser treats all assets as uncorrelated and positions are
              purely signal-proportional (no correlation adjustment).
            * **λ = 1** — the raw EWMA correlation matrix is used without
              shrinkage.
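
        Both corner cases follow from treating λ as a retention weight in a
        linear blend toward the identity. The following NumPy sketch assumes
        the shrunk matrix is ``λ·C + (1 - λ)·I`` and ignores the engine's
        normalisation. ``shrink_to_identity`` is a hypothetical stand-in, not
        the library's own helper:

        .. code-block:: python

            import numpy as np

            def shrink_to_identity(corr: np.ndarray, lam: float) -> np.ndarray:
                # Keep a fraction lam of the correlation matrix and put the
                # remaining (1 - lam) weight on the identity.
                return lam * corr + (1.0 - lam) * np.eye(corr.shape[0])

            corr = np.array([[1.0, 0.8], [0.8, 1.0]])
            mu = np.array([1.0, 0.5])

            # lam = 0: identity system, so the solution is mu itself.
            x0 = np.linalg.solve(shrink_to_identity(corr, 0.0), mu)  # -> [1.0, 0.5]
            # lam = 1: raw correlations; the second asset's position turns
            # negative because it is highly correlated with the first.
            x1 = np.linalg.solve(shrink_to_identity(corr, 1.0), mu)  # -> about [1.667, -0.833]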

        Args:
            shrink: Retention weight λ ∈ [0, 1]. See
                :attr:`BasanosConfig.shrink` for full documentation.

        Returns:
            Annualised Sharpe ratio of the portfolio returns as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed
            (e.g. zero-variance returns).

        Raises:
            ValidationError: When ``shrink`` is outside [0, 1] (delegated to
                :class:`BasanosConfig` field validation).

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.sharpe_at_shrink(0.5)
            >>> isinstance(s, float)
            True
        """
        new_cfg = self.cfg.replace(shrink=shrink)
        engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
        # Explicit None check so that a genuine Sharpe of 0.0 is not turned into NaN.
        return float("nan") if (sharpe := engine.portfolio.stats.sharpe().get("returns")) is None else float(sharpe)

    def sharpe_at_window_factors(self, window: int, n_factors: int) -> float:
        r"""Return the annualised portfolio Sharpe ratio for the given sliding-window parameters.

        Constructs a new :class:`BasanosEngine` whose ``covariance_config`` is a
        :class:`SlidingWindowConfig` with the supplied ``window`` / ``n_factors``,
        keeping all other configuration identical to ``self``.

        Use this method to sweep ``(W, k)`` and compare the sliding-window
        estimator against the EWMA baseline (via :meth:`sharpe_at_shrink`).

        Args:
            window: Rolling window length :math:`W \geq 1`.
                Rule of thumb: :math:`W \geq 2 \cdot n_{\text{assets}}`.
            n_factors: Number of latent factors :math:`k \geq 1`.

        Returns:
            Annualised Sharpe ratio of the portfolio returns as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed
            (e.g. not enough history to fill the first window).

        Raises:
            ValidationError: When ``window`` or ``n_factors`` fail field
                constraints (delegated to :class:`BasanosConfig`).

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.sharpe_at_window_factors(window=40, n_factors=2)
            >>> isinstance(s, float)
            True
        """
        new_cfg = self.cfg.replace(
            covariance_config=SlidingWindowConfig(window=window, n_factors=n_factors),
        )
        engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
        # Explicit None check so that a genuine Sharpe of 0.0 is not turned into NaN.
        return float("nan") if (sharpe := engine.portfolio.stats.sharpe().get("returns")) is None else float(sharpe)

    @property
    def naive_sharpe(self) -> float:
        r"""Sharpe ratio of the naïve equal-weight signal (μ = 1 for every asset/timestamp).

        Replaces the expected-return signal ``mu`` with a constant matrix of
        ones, then runs the optimiser with the current configuration and returns
        the annualised Sharpe ratio of the resulting portfolio.

        This provides the baseline answer to *"does the signal add value?"*:
        a real signal should produce a higher Sharpe than the naïve benchmark.
        Combined with :meth:`sharpe_at_shrink`, this yields a three-way
        comparison:

        +----------------------------------+----------------------------------------------+
        | Benchmark                        | What it measures                             |
        +==================================+==============================================+
        | ``naive_sharpe``                 | No signal skill; pure correlation routing    |
        +----------------------------------+----------------------------------------------+
        | ``sharpe_at_shrink(0.0)``        | Signal skill, no correlation adj.            |
        +----------------------------------+----------------------------------------------+
        | ``sharpe_at_shrink(cfg.shrink)`` | Signal + correlation adj.                    |
        +----------------------------------+----------------------------------------------+

        Returns:
            Annualised Sharpe ratio of the naïve-signal portfolio as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed.

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.naive_sharpe
            >>> isinstance(s, float)
            True
        """
        naive_mu = self.mu.with_columns(pl.lit(1.0).alias(asset) for asset in self.assets)
        engine = BasanosEngine(prices=self.prices, mu=naive_mu, cfg=self.cfg)
        # Explicit None check so that a genuine Sharpe of 0.0 is not turned into NaN.
        return float("nan") if (sharpe := engine.portfolio.stats.sharpe().get("returns")) is None else float(sharpe)

    # ------------------------------------------------------------------
    # Reporting
    # ------------------------------------------------------------------

    @property
    def config_report(self) -> "ConfigReport":
        """Return a :class:`~basanos.math._config_report.ConfigReport` facade for this engine.

        Returns a :class:`~basanos.math._config_report.ConfigReport` that
        includes the full **lambda-sweep chart** — an interactive plot of the
        annualised Sharpe ratio as :attr:`~BasanosConfig.shrink` (λ) is swept
        across [0, 1] — in addition to the parameter table, shrinkage-guidance
        table, and theory section available from
        :attr:`BasanosConfig.report`.

        Returns:
            basanos.math._config_report.ConfigReport: Report facade with
            ``to_html()`` and ``save()`` methods.

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> report = engine.config_report
            >>> html = report.to_html()
            >>> "Lambda" in html
            True
        """
        from ._config_report import ConfigReport

        return ConfigReport(config=self.cfg, engine=self)

    # ------------------------------------------------------------------
    # Matrix diagnostics — inherited from _DiagnosticsMixin
    # ------------------------------------------------------------------
    # (condition_number, effective_rank, solver_residual, signal_utilisation)
    # Implementations live in _engine_diagnostics.py; patch targets remain in
    # that module's namespace, e.g.
    # ``patch("basanos.math._engine_diagnostics.solve")``.

    # ------------------------------------------------------------------
    # Signal evaluation — inherited from _SignalEvaluatorMixin
    # ------------------------------------------------------------------
    # (_ic_series, ic, rank_ic, ic_mean, ic_std, icir,
    #  rank_ic_mean, rank_ic_std)
    # Implementations live in _engine_ic.py.


122 statements