Coverage for src / basanos / math / optimizer.py: 100%
122 statements
« prev ^ index » next coverage.py v7.13.5, created at 2026-04-25 04:28 +0000
« prev ^ index » next coverage.py v7.13.5, created at 2026-04-25 04:28 +0000
1"""Correlation-aware risk position optimizer (Basanos).
3This module provides utilities to compute correlation-adjusted risk positions
4from price data and expected-return signals. It relies on volatility-adjusted
5returns to estimate a dynamic correlation matrix (via EWM), applies shrinkage
6towards identity, and solves a normalized linear system per timestamp to
7obtain stable positions.
9Performance characteristics
10---------------------------
11Let *N* be the number of assets and *T* the number of timestamps.
13**Computational complexity**
15+----------------------------------+------------------+--------------------------------------+
16| Operation | Complexity | Bottleneck |
17+==================================+==================+======================================+
18| EWM volatility (``ret_adj``, | O(T·N) | Linear in both T and N; negligible |
19| ``vola``) | | |
20+----------------------------------+------------------+--------------------------------------+
21| EWM correlation (``cor``) | O(T·N²) | ``lfilter`` over all N² asset pairs |
22| | | simultaneously |
23+----------------------------------+------------------+--------------------------------------+
24| Linear solve per timestamp | O(N³) | Cholesky / LU per row in |
25| (``cash_position``) | * T solves | ``cash_position`` |
26+----------------------------------+------------------+--------------------------------------+
28**Memory usage** (peak, approximate)
30``ewm_corr`` allocates roughly **14 float64 arrays** of shape
31``(T, N, N)`` at peak (input sequences, IIR filter outputs, EWM components,
32and the result tensor). Peak RAM ≈ **112 * T * N²** bytes. Typical
33working sizes on a 16 GB machine:
35+--------+--------------------------+------------------------------------+
36| N | T (daily rows) | Peak memory (approx.) |
37+========+==========================+====================================+
38| 50 | 252 (~1 yr) | ~70 MB |
39+--------+--------------------------+------------------------------------+
40| 100 | 252 (~1 yr) | ~280 MB |
41+--------+--------------------------+------------------------------------+
42| 100 | 2 520 (~10 yr) | ~2.8 GB |
43+--------+--------------------------+------------------------------------+
44| 200 | 2 520 (~10 yr) | ~11 GB |
45+--------+--------------------------+------------------------------------+
46| 500 | 2 520 (~10 yr) | ~70 GB ⚠ exceeds typical RAM |
47+--------+--------------------------+------------------------------------+
49**Practical limits (daily data)**
51* **≤ 150 assets, ≤ 5 years** — well within reach on an 8 GB laptop.
52* **≤ 250 assets, ≤ 10 years** — requires ~11-12 GB; feasible on a 16 GB
53 workstation.
54* **> 500 assets with multi-year history** — peak memory exceeds 16 GB;
55 reduce the time range or switch to a chunked / streaming approach.
56* **> 1 000 assets** — the O(N³) per-solve cost alone makes real-time
57 optimization impractical even with adequate RAM.
59See ``BENCHMARKS.md`` for measured wall-clock timings across representative
60dataset sizes.
62Internal structure
63------------------
64The implementation is split across focused private modules to keep each file
65readable and independently testable:
67* `_config` — `BasanosConfig` and all
68 covariance-mode configuration classes.
69* `_ewm_corr` — `ewm_corr`, the vectorised
70 IIR-filter implementation of per-row EWM correlation matrices.
71* `_engine_solve` — private helpers providing the
72 ``_iter_matrices`` and ``_iter_solve`` generators (per-timestamp solve
73 logic).
74* `_engine_diagnostics` — private helpers providing
75 matrix-quality diagnostics (condition number, effective rank, solver
76 residual, signal utilisation).
77* `_engine_ic` — private helpers providing signal
78 evaluation metrics (IC, Rank IC, ICIR, and summary statistics).
79* This module — `BasanosEngine`, a single flat class that wires
80 every method together in clearly delimited sections.
81"""
83import dataclasses
84import datetime
85import logging
86from typing import TYPE_CHECKING
88import numpy as np
89import polars as pl
90from jquantstats import Portfolio
92from ..exceptions import (
93 ColumnMismatchError,
94 ExcessiveNullsError,
95 MissingDateColumnError,
96 MonotonicPricesError,
97 NonPositivePricesError,
98 ShapeMismatchError,
99)
100from ._config import (
101 BasanosConfig,
102 CovarianceConfig,
103 CovarianceMode,
104 EwmaShrinkConfig,
105 SlidingWindowConfig,
106)
107from ._engine_diagnostics import _DiagnosticsMixin as _DiagnosticsMixin
108from ._engine_ic import _SignalEvaluatorMixin as _SignalEvaluatorMixin
109from ._engine_solve import _SolveMixin as _SolveMixin
110from ._ewm_corr import ewm_corr as _ewm_corr_numpy
111from ._signal import vol_adj
113if TYPE_CHECKING:
114 from ._config_report import ConfigReport
116_logger = logging.getLogger(__name__)
119def _validate_inputs(prices: pl.DataFrame, mu: pl.DataFrame, cfg: "BasanosConfig") -> None:
120 """Validate ``prices``, ``mu``, and ``cfg`` for use with `BasanosEngine`.
122 Checks that both DataFrames contain a ``'date'`` column, share identical
123 shapes and column sets, contain no non-positive prices, no excessive NaN
124 fractions, and no monotonically non-varying price series. Also emits a
125 warning when the dataset is too short relative to a configured
126 sliding-window size.
128 Args:
129 prices: DataFrame of price levels per asset over time.
130 mu: DataFrame of expected-return signals aligned with ``prices``.
131 cfg: Engine configuration instance.
133 Raises:
134 MissingDateColumnError: If ``'date'`` is absent from either frame.
135 ShapeMismatchError: If ``prices`` and ``mu`` have different shapes.
136 ColumnMismatchError: If the column sets of the two frames differ.
137 NonPositivePricesError: If any asset contains a non-positive price.
138 ExcessiveNullsError: If any asset column exceeds ``cfg.max_nan_fraction``.
139 MonotonicPricesError: If any asset price series is monotonically
140 non-decreasing or non-increasing.
142 Warns:
143 UserWarning (via logging): If ``cfg.covariance`` is a
144 `SlidingWindowConfig` and
145 ``len(prices) < 2 * cfg.covariance.window``, a warning is emitted
146 via the module logger rather than an exception. This is a
147 deliberate soft boundary — callers may intentionally supply data
148 shorter than the full warm-up period. During warm-up the first
149 ``window - 1`` timestamps will yield zero positions.
150 """
151 # ensure 'date' column exists in prices before any other validation
152 if "date" not in prices.columns:
153 raise MissingDateColumnError("prices")
155 # ensure 'date' column exists in mu as well (kept for symmetry and downstream assumptions)
156 if "date" not in mu.columns:
157 raise MissingDateColumnError("mu")
159 # check that prices and mu have the same shape
160 if prices.shape != mu.shape:
161 raise ShapeMismatchError(prices.shape, mu.shape)
163 # check that the columns of prices and mu are identical
164 if not set(prices.columns) == set(mu.columns):
165 raise ColumnMismatchError(prices.columns, mu.columns)
167 assets = [c for c in prices.columns if c != "date" and prices[c].dtype.is_numeric()]
169 # check for non-positive prices: log returns require strictly positive prices
170 for asset in assets:
171 col = prices[asset].drop_nulls()
172 if col.len() > 0 and (col <= 0).any():
173 raise NonPositivePricesError(asset)
175 # check for excessive NaN values: more than cfg.max_nan_fraction null in any asset column
176 n_rows = prices.height
177 if n_rows > 0:
178 for asset in assets:
179 nan_frac = prices[asset].null_count() / n_rows
180 if nan_frac > cfg.max_nan_fraction:
181 raise ExcessiveNullsError(asset, nan_frac, cfg.max_nan_fraction)
183 # check for monotonic price series: a strictly non-decreasing or non-increasing
184 # series has no variance in its return sign, indicating malformed or synthetic data
185 for asset in assets:
186 col = prices[asset].drop_nulls()
187 if col.len() > 2:
188 diffs = col.diff().drop_nulls()
189 if (diffs >= 0).all() or (diffs <= 0).all():
190 raise MonotonicPricesError(asset)
192 # warn when the dataset is too short to benefit from the sliding window
193 if cfg.covariance_mode == CovarianceMode.sliding_window and cfg.window is not None:
194 w: int = cfg.window
195 if n_rows < 2 * w:
196 _logger.warning(
197 "Dataset length (%d rows) is less than 2 * window (%d). "
198 "The first %d timestamps will yield zero positions during warm-up; "
199 "consider using a longer history or reducing 'window'.",
200 n_rows,
201 2 * w,
202 w - 1,
203 )
206# ---------------------------------------------------------------------------
207# Re-export config symbols so ``from basanos.math.optimizer import …`` keeps
208# working for existing callers.
209# ---------------------------------------------------------------------------
210__all__ = [
211 "BasanosConfig",
212 "BasanosEngine",
213 "CovarianceConfig",
214 "CovarianceMode",
215 "EwmaShrinkConfig",
216 "SlidingWindowConfig",
217]
220@dataclasses.dataclass(frozen=True)
221class BasanosEngine(_DiagnosticsMixin, _SignalEvaluatorMixin, _SolveMixin):
222 """Engine to compute correlation matrices and optimize risk positions.
224 Encapsulates price data and configuration to build EWM-based
225 correlations, apply shrinkage, and solve for normalized positions.
227 Public methods are organised into clearly delimited sections (some
228 inherited from the private mixin classes):
230 * **Core data access** — `assets`, `ret_adj`, `vola`, `cor`, `cor_tensor`
231 * **Solve / position logic** — `cash_position`, `position_status`,
232 `risk_position`, `position_leverage`, `warmup_state`
233 * **Portfolio and performance** — `portfolio`, `naive_sharpe`,
234 `sharpe_at_shrink`, `sharpe_at_window_factors`
235 * **Matrix diagnostics** — `condition_number`, `effective_rank`,
236 `solver_residual`, `signal_utilisation`
237 * **Signal evaluation** — `ic`, `rank_ic`, `ic_mean`, `ic_std`, `icir`,
238 `rank_ic_mean`, `rank_ic_std`
239 * **Reporting** — `config_report`
241 Data-flow diagram
242 -----------------
244 .. code-block:: text
246 prices (pl.DataFrame)
247 │
248 ├─ vol_adj ──► ret_adj (volatility-adjusted log returns)
249 │ │
250 │ ├─ ewm_corr ──► cor / cor_tensor
251 │ │ │
252 │ │ └─ shrink2id / FactorModel
253 │ │ │
254 │ vola covariance matrix
255 │ │ │
256 └── mu ──────────┴── _iter_solve ──────────┘
257 │
258 cash_position
259 │
260 ┌────────┴────────┐
261 portfolio diagnostics
262 (Portfolio) (condition_number,
263 effective_rank,
264 solver_residual,
265 signal_utilisation,
266 ic, rank_ic, …)
268 Attributes:
269 prices: Polars DataFrame of price levels per asset over time. Must
270 contain a ``'date'`` column and at least one numeric asset column
271 with strictly positive values that are not monotonically
272 non-decreasing or non-increasing (i.e. they must vary in sign).
273 mu: Polars DataFrame of expected-return signals aligned with *prices*.
274 Must share the same shape and column names as *prices*.
275 cfg: Immutable `BasanosConfig` controlling EWMA half-lives,
276 clipping, shrinkage intensity, and AUM.
278 Examples:
279 Build an engine with two synthetic assets over 30 days and inspect the
280 optimized positions and diagnostic properties.
282 >>> import numpy as np
283 >>> import polars as pl
284 >>> from basanos.math import BasanosConfig, BasanosEngine
285 >>> dates = list(range(30))
286 >>> rng = np.random.default_rng(42)
287 >>> prices = pl.DataFrame({
288 ... "date": dates,
289 ... "A": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 100.0,
290 ... "B": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 150.0,
291 ... })
292 >>> mu = pl.DataFrame({
293 ... "date": dates,
294 ... "A": rng.normal(0.0, 0.5, 30),
295 ... "B": rng.normal(0.0, 0.5, 30),
296 ... })
297 >>> cfg = BasanosConfig(vola=5, corr=10, clip=2.0, shrink=0.5, aum=1_000_000)
298 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
299 >>> engine.assets
300 ['A', 'B']
301 >>> engine.cash_position.shape
302 (30, 3)
303 >>> engine.position_leverage.columns
304 ['date', 'leverage']
305 """
307 prices: pl.DataFrame
308 mu: pl.DataFrame
309 cfg: BasanosConfig
311 def __post_init__(self) -> None:
312 """Validate inputs by delegating to `_validate_inputs`."""
313 _validate_inputs(self.prices, self.mu, self.cfg)
315 # ------------------------------------------------------------------
316 # Core data-access properties
317 # ------------------------------------------------------------------
319 @property
320 def assets(self) -> list[str]:
321 """List asset column names (numeric columns excluding 'date')."""
322 return [c for c in self.prices.columns if c != "date" and self.prices[c].dtype.is_numeric()]
324 @property
325 def ret_adj(self) -> pl.DataFrame:
326 """Return per-asset volatility-adjusted log returns clipped by cfg.clip.
328 Uses an EWMA volatility estimate with lookback ``cfg.vola`` to
329 standardize log returns for each numeric asset column.
330 """
331 return self.prices.with_columns(
332 [vol_adj(pl.col(asset), vola=self.cfg.vola, clip=self.cfg.clip) for asset in self.assets]
333 )
335 @property
336 def vola(self) -> pl.DataFrame:
337 """Per-asset EWMA volatility of percentage returns.
339 Computes percent changes for each numeric asset column and applies an
340 exponentially weighted standard deviation using the lookback specified
341 by ``cfg.vola``. The result is a DataFrame aligned with ``self.prices``
342 whose numeric columns hold per-asset volatility estimates.
343 """
344 return self.prices.with_columns(
345 pl.col(asset)
346 .pct_change()
347 .ewm_std(com=self.cfg.vola - 1, adjust=True, min_samples=self.cfg.vola)
348 .alias(asset)
349 for asset in self.assets
350 )
352 @property
353 def cor(self) -> dict[datetime.date, np.ndarray]:
354 """Compute per-timestamp EWM correlation matrices.
356 Builds volatility-adjusted returns for all assets, computes an
357 exponentially weighted correlation using a pure NumPy implementation
358 (with window ``cfg.corr``), and returns a mapping from each timestamp
359 to the corresponding correlation matrix as a NumPy array.
361 Returns:
362 dict: Mapping ``date -> np.ndarray`` of shape (n_assets, n_assets).
364 Performance:
365 Delegates to `ewm_corr`, which is O(T·N²) in both
366 time and memory. The returned dict holds *T* references into the
367 result tensor (one N*N view per date); no extra copies are made.
368 For large *N* or *T*, prefer ``cor_tensor`` to keep a single
369 contiguous array rather than building a Python dict.
370 """
371 index = self.prices["date"]
372 ret_adj_np = self.ret_adj.select(self.assets).to_numpy()
373 tensor = _ewm_corr_numpy(
374 ret_adj_np,
375 com=self.cfg.corr,
376 min_periods=self.cfg.corr,
377 min_corr_denom=self.cfg.min_corr_denom,
378 )
379 return {index[t]: tensor[t] for t in range(len(index))}
381 @property
382 def cor_tensor(self) -> np.ndarray:
383 """Return all correlation matrices stacked as a 3-D tensor.
385 Converts the per-timestamp correlation dict (see `cor`) into a
386 single contiguous NumPy array so that the full history can be saved to
387 a flat ``.npy`` file with `save` and reloaded with
388 `load`.
390 Returns:
391 np.ndarray: Array of shape ``(T, N, N)`` where *T* is the number of
392 timestamps and *N* the number of assets. ``tensor[t]`` is the
393 correlation matrix for the *t*-th date (same ordering as
394 ``self.prices["date"]``).
396 Examples:
397 >>> import tempfile, pathlib
398 >>> import numpy as np
399 >>> import polars as pl
400 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
401 >>> dates = pl.Series("date", list(range(100)))
402 >>> rng0 = np.random.default_rng(0).lognormal(size=100)
403 >>> rng1 = np.random.default_rng(1).lognormal(size=100)
404 >>> prices = pl.DataFrame({"date": dates, "A": rng0, "B": rng1})
405 >>> rng2 = np.random.default_rng(2).normal(size=100)
406 >>> rng3 = np.random.default_rng(3).normal(size=100)
407 >>> mu = pl.DataFrame({"date": dates, "A": rng2, "B": rng3})
408 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
409 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
410 >>> tensor = engine.cor_tensor
411 >>> with tempfile.TemporaryDirectory() as td:
412 ... path = pathlib.Path(td) / "cor.npy"
413 ... np.save(path, tensor)
414 ... loaded = np.load(path)
415 >>> np.testing.assert_array_equal(tensor, loaded)
416 """
417 return np.stack(list(self.cor.values()), axis=0)
419 # ------------------------------------------------------------------
420 # Internal solve helpers — inherited from _SolveMixin
421 # ------------------------------------------------------------------
422 # (_compute_mask, _check_signal, _scale_to_cash, _row_early_check,
423 # _denom_guard_yield, _compute_position, _replay_positions,
424 # _iter_matrices, _iter_solve, warmup_state)
425 # Implementations live in _engine_solve.py; patch targets remain in that
426 # module's namespace, e.g. ``patch("basanos.math._engine_solve.solve")``.
428 # ------------------------------------------------------------------
429 # Position properties
430 # ------------------------------------------------------------------
432 @property
433 def cash_position(self) -> pl.DataFrame:
434 r"""Optimize correlation-aware risk positions for each timestamp.
436 Supports two covariance modes controlled by ``cfg.covariance_config``:
438 * `EwmaShrinkConfig` (default): Computes EWMA correlations, applies
439 linear shrinkage toward the identity, and solves a normalised linear
440 system $C\,x = \mu$ per timestamp via Cholesky / LU.
442 * `SlidingWindowConfig`: At each timestamp uses the
443 ``cfg.covariance_config.window`` most recent vol-adjusted returns to fit a
444 rank-``cfg.covariance_config.n_factors`` factor model via truncated SVD and
445 solves the system via the Woodbury identity at $O(k^3 + kn)$ rather
446 than $O(n^3)$ per step.
448 Non-finite or ill-posed cases yield zero positions for safety.
450 Returns:
451 pl.DataFrame: DataFrame with columns ['date'] + asset names containing
452 the per-timestamp cash positions (risk divided by EWMA volatility).
454 Performance:
455 For ``ewma_shrink``: dominant cost is ``self.cor`` (O(T·N²) time,
456 O(T·N²) memory — see `ewm_corr`). The per-timestamp
457 linear solve adds O(N³) per row.
459 For ``sliding_window``: O(T·W·N·k) for sliding SVDs plus
460 O(T·(k³ + kN)) for Woodbury solves. Memory is O(W·N) per step,
461 independent of T.
462 """
463 assets = self.assets
465 # Compute risk positions row-by-row using _replay_positions.
466 prices_num = self.prices.select(assets).to_numpy()
468 risk_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float)
469 cash_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float)
470 vola_np = self.vola.select(assets).to_numpy()
472 self._replay_positions(risk_pos_np, cash_pos_np, vola_np)
474 # Build Polars DataFrame for cash positions (numeric columns only)
475 cash_position = self.prices.with_columns(
476 [(pl.lit(cash_pos_np[:, i]).alias(asset)) for i, asset in enumerate(assets)]
477 )
479 return cash_position
481 @property
482 def position_status(self) -> pl.DataFrame:
483 """Per-timestamp reason code explaining each `cash_position` row.
485 Labels every row with exactly one of four `SolveStatus`
486 codes (which compare equal to their string equivalents):
488 * ``'warmup'``: Insufficient history for the sliding-window
489 covariance mode (``i + 1 < cfg.covariance_config.window``).
490 Positions are ``NaN`` for all assets at this timestamp.
491 * ``'zero_signal'``: The expected-return vector ``mu`` was
492 all-zeros (or all-NaN) at this timestamp; the optimizer
493 short-circuited and returned zero positions without solving.
494 * ``'degenerate'``: The normalisation denominator was non-finite
495 or below ``cfg.denom_tol``, the Cholesky / Woodbury solve
496 failed, or no asset had a finite price; positions were zeroed
497 for safety.
498 * ``'valid'``: The linear system was solved successfully and
499 positions are non-trivially non-zero.
501 The codes map one-to-one onto the three NaN / zero cases
502 described in the issue and allow downstream consumers (backtests,
503 risk monitors) to distinguish data gaps from signal silence from
504 numerical ill-conditioning without re-inspecting ``mu`` or the
505 engine configuration.
507 Returns:
508 pl.DataFrame: Two-column DataFrame ``{'date': ..., 'status': ...}``
509 with one row per timestamp. The ``status`` column has
510 ``Polars`` dtype ``String``.
511 """
512 statuses = [status for _i, _t, _mask, _pos, status in self._iter_solve()]
513 return pl.DataFrame({"date": self.prices["date"], "status": pl.Series(statuses, dtype=pl.String)})
515 @property
516 def risk_position(self) -> pl.DataFrame:
517 """Risk positions (before EWMA-volatility scaling) at each timestamp.
519 Derives the un-volatility-scaled position by multiplying the cash
520 position by the per-asset EWMA volatility. Equivalently, this is
521 the quantity solved by the correlation-adjusted linear system before
522 dividing by ``vola``.
524 Relationship to other properties::
526 cash_position = risk_position / vola
527 risk_position = cash_position * vola
529 Returns:
530 pl.DataFrame: DataFrame with columns ``['date'] + assets`` where
531 each value is ``cash_position_i * vola_i`` at the given timestamp.
532 """
533 assets = self.assets
534 cp_np = self.cash_position.select(assets).to_numpy()
535 vola_np = self.vola.select(assets).to_numpy()
536 with np.errstate(invalid="ignore"):
537 risk_pos = cp_np * vola_np
538 return self.prices.with_columns([pl.lit(risk_pos[:, i]).alias(asset) for i, asset in enumerate(assets)])
540 @property
541 def position_leverage(self) -> pl.DataFrame:
542 """L1 norm of cash positions (gross leverage) at each timestamp.
544 Sums the absolute values of all asset cash positions at each row.
545 NaN positions are treated as zero (they contribute nothing to gross
546 leverage).
548 Returns:
549 pl.DataFrame: Two-column DataFrame ``{'date': ..., 'leverage': ...}``
550 where ``leverage`` is the L1 norm of the cash-position vector.
551 """
552 assets = self.assets
553 cp_np = self.cash_position.select(assets).to_numpy()
554 leverage = np.nansum(np.abs(cp_np), axis=1)
555 return pl.DataFrame({"date": self.prices["date"], "leverage": pl.Series(leverage, dtype=pl.Float64)})
557 # ------------------------------------------------------------------
558 # Portfolio and performance
559 # ------------------------------------------------------------------
561 @property
562 def portfolio(self) -> Portfolio:
563 """Construct a Portfolio from the optimized cash positions.
565 Converts the computed cash positions into a Portfolio using the
566 configured AUM. The ``cost_per_unit`` from `cfg` is forwarded
567 so that `net_cost_nav` and
568 `position_delta_costs` work out
569 of the box without any further configuration.
571 Returns:
572 Portfolio: Instance built from cash positions with AUM scaling.
573 """
574 cp = self.cash_position
575 assets = [c for c in cp.columns if c != "date" and cp[c].dtype.is_numeric()]
576 scaled = cp.with_columns(pl.col(a) * self.cfg.position_scale for a in assets)
577 return Portfolio.from_cash_position(self.prices, scaled, aum=self.cfg.aum, cost_per_unit=self.cfg.cost_per_unit)
579 def sharpe_at_shrink(self, shrink: float) -> float:
580 r"""Return the annualised portfolio Sharpe ratio for the given shrinkage weight.
582 Constructs a new `BasanosEngine` with all parameters identical to
583 ``self`` except that ``cfg.shrink`` is replaced by ``shrink``, then
584 returns the annualised Sharpe ratio of the resulting portfolio.
586 This is the canonical single-argument callable required by the benchmarks
587 specification: ``f(λ) → Sharpe``. Use it to sweep λ across ``[0, 1]``
588 and measure whether correlation adjustment adds value over the
589 signal-proportional baseline (λ = 0) or the unregularised limit (λ = 1).
591 Corner cases:
592 * **λ = 0** — the shrunk matrix equals the identity, so the
593 optimiser treats all assets as uncorrelated and positions are
594 purely signal-proportional (no correlation adjustment).
595 * **λ = 1** — the raw EWMA correlation matrix is used without
596 shrinkage.
598 Args:
599 shrink: Retention weight λ ∈ [0, 1]. See
600 `shrink` for full documentation.
602 Returns:
603 Annualised Sharpe ratio of the portfolio returns as a ``float``.
604 Returns ``float("nan")`` when the Sharpe ratio cannot be computed
605 (e.g. zero-variance returns).
607 Raises:
608 ValidationError: When ``shrink`` is outside [0, 1] (delegated to
609 `BasanosConfig` field validation).
611 Examples:
612 >>> import numpy as np
613 >>> import polars as pl
614 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
615 >>> dates = pl.Series("date", list(range(200)))
616 >>> rng = np.random.default_rng(0)
617 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
618 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
619 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
620 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
621 >>> s = engine.sharpe_at_shrink(0.5)
622 >>> isinstance(s, float)
623 True
624 """
625 new_cfg = self.cfg.replace(shrink=shrink)
626 engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
627 return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))
629 def sharpe_at_window_factors(self, window: int, n_factors: int) -> float:
630 r"""Return the annualised portfolio Sharpe ratio for the given sliding-window parameters.
632 Constructs a new `BasanosEngine` with ``covariance_mode`` set to
633 ``"sliding_window"`` and the supplied ``window`` / ``n_factors``, keeping
634 all other configuration identical to ``self``.
636 Use this method to sweep ``(W, k)`` and compare the sliding-window
637 estimator against the EWMA baseline (via `sharpe_at_shrink`).
639 Args:
640 window: Rolling window length $W \geq 1$.
641 Rule of thumb: $W \geq 2 \cdot n_{\text{assets}}$.
642 n_factors: Number of latent factors $k \geq 1$.
644 Returns:
645 Annualised Sharpe ratio of the portfolio returns as a ``float``.
646 Returns ``float("nan")`` when the Sharpe ratio cannot be computed
647 (e.g. not enough history to fill the first window).
649 Raises:
650 ValidationError: When ``window`` or ``n_factors`` fail field
651 constraints (delegated to `BasanosConfig`).
653 Examples:
654 >>> import numpy as np
655 >>> import polars as pl
656 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
657 >>> dates = pl.Series("date", list(range(200)))
658 >>> rng = np.random.default_rng(0)
659 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
660 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
661 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
662 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
663 >>> s = engine.sharpe_at_window_factors(window=40, n_factors=2)
664 >>> isinstance(s, float)
665 True
666 """
667 new_cfg = self.cfg.replace(
668 covariance_config=SlidingWindowConfig(window=window, n_factors=n_factors),
669 )
670 engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
671 return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))
673 @property
674 def naive_sharpe(self) -> float:
675 r"""Sharpe ratio of the naïve equal-weight signal (μ = 1 for every asset/timestamp).
677 Replaces the expected-return signal ``mu`` with a constant matrix of
678 ones, then runs the optimiser with the current configuration and returns
679 the annualised Sharpe ratio of the resulting portfolio.
681 This provides the baseline answer to *"does the signal add value?"*:
682 a real signal should produce a higher Sharpe than the naïve benchmark.
683 Combined with `sharpe_at_shrink`, this yields a three-way
684 comparison:
686 +--------------------+----------------------------------------------+
687 | Benchmark | What it measures |
688 +====================+==============================================+
689 | ``naive_sharpe`` | No signal skill; pure correlation routing |
690 +--------------------+----------------------------------------------+
691 | ``sharpe_at_shrink(0.0)`` | Signal skill, no correlation adj. |
692 +--------------------+----------------------------------------------+
693 | ``sharpe_at_shrink(cfg.shrink)`` | Signal + correlation adj. |
694 +--------------------+----------------------------------------------+
696 Returns:
697 Annualised Sharpe ratio of the equal-weight portfolio as a ``float``.
698 Returns ``float("nan")`` when the Sharpe ratio cannot be computed.
700 Examples:
701 >>> import numpy as np
702 >>> import polars as pl
703 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
704 >>> dates = pl.Series("date", list(range(200)))
705 >>> rng = np.random.default_rng(0)
706 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
707 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
708 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
709 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
710 >>> s = engine.naive_sharpe
711 >>> isinstance(s, float)
712 True
713 """
714 naive_mu = self.mu.with_columns(pl.lit(1.0).alias(asset) for asset in self.assets)
715 engine = BasanosEngine(prices=self.prices, mu=naive_mu, cfg=self.cfg)
716 return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))
718 # ------------------------------------------------------------------
719 # Reporting
720 # ------------------------------------------------------------------
722 @property
723 def config_report(self) -> "ConfigReport":
724 """Return a `ConfigReport` facade for this engine.
726 Returns a `ConfigReport` that
727 includes the full **lambda-sweep chart** — an interactive plot of the
728 annualised Sharpe ratio as `shrink` (λ) is swept
729 across [0, 1] — in addition to the parameter table, shrinkage-guidance
730 table, and theory section available from
731 `report`.
733 Returns:
734 basanos.math._config_report.ConfigReport: Report facade with
735 ``to_html()`` and ``save()`` methods.
737 Examples:
738 >>> import numpy as np
739 >>> import polars as pl
740 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
741 >>> dates = pl.Series("date", list(range(200)))
742 >>> rng = np.random.default_rng(0)
743 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
744 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
745 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
746 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
747 >>> report = engine.config_report
748 >>> html = report.to_html()
749 >>> "Lambda" in html
750 True
751 """
752 from ._config_report import ConfigReport
754 return ConfigReport(config=self.cfg, engine=self)
756 # ------------------------------------------------------------------
757 # Matrix diagnostics — inherited from _DiagnosticsMixin
758 # ------------------------------------------------------------------
759 # (condition_number, effective_rank, solver_residual, signal_utilisation)
760 # Implementations live in _engine_diagnostics.py; patch targets remain in
761 # that module's namespace, e.g.
762 # ``patch("basanos.math._engine_diagnostics.solve")``.
764 # ------------------------------------------------------------------
765 # Signal evaluation — inherited from _SignalEvaluatorMixin
766 # ------------------------------------------------------------------
767 # (_ic_series, ic, rank_ic, ic_mean, ic_std, icir,
768 # rank_ic_mean, rank_ic_std)
769 # Implementations live in _engine_ic.py.