Coverage for src / basanos / math / optimizer.py: 100%
122 statements
coverage.py v7.13.5, created at 2026-04-02 17:47 +0000
"""Correlation-aware risk position optimizer (Basanos).

This module provides utilities to compute correlation-adjusted risk positions
from price data and expected-return signals. It relies on volatility-adjusted
returns to estimate a dynamic correlation matrix (via EWM), applies shrinkage
towards identity, and solves a normalized linear system per timestamp to
obtain stable positions.

Performance characteristics
---------------------------
Let *N* be the number of assets and *T* the number of timestamps.

**Computational complexity**

+----------------------------------+------------------+--------------------------------------+
| Operation                        | Complexity       | Bottleneck                           |
+==================================+==================+======================================+
| EWM volatility (``ret_adj``,     | O(T·N)           | Linear in both T and N; negligible   |
| ``vola``)                        |                  |                                      |
+----------------------------------+------------------+--------------------------------------+
| EWM correlation (``cor``)        | O(T·N²)          | ``lfilter`` over all N² asset pairs  |
|                                  |                  | simultaneously                       |
+----------------------------------+------------------+--------------------------------------+
| Linear solve per timestamp       | O(N³)            | Cholesky / LU per row in             |
| (``cash_position``)              | * T solves       | ``cash_position``                    |
+----------------------------------+------------------+--------------------------------------+

**Memory usage** (peak, approximate)

``ewm_corr`` allocates roughly **14 float64 arrays** of shape
``(T, N, N)`` at peak (input sequences, IIR filter outputs, EWM components,
and the result tensor). Peak RAM ≈ **112 * T * N²** bytes. Typical
working sizes on a 16 GB machine:

+--------+--------------------------+------------------------------------+
| N      | T (daily rows)           | Peak memory (approx.)              |
+========+==========================+====================================+
| 50     | 252 (~1 yr)              | ~70 MB                             |
+--------+--------------------------+------------------------------------+
| 100    | 252 (~1 yr)              | ~280 MB                            |
+--------+--------------------------+------------------------------------+
| 100    | 2 520 (~10 yr)           | ~2.8 GB                            |
+--------+--------------------------+------------------------------------+
| 200    | 2 520 (~10 yr)           | ~11 GB                             |
+--------+--------------------------+------------------------------------+
| 500    | 2 520 (~10 yr)           | ~70 GB ⚠ exceeds typical RAM       |
+--------+--------------------------+------------------------------------+

**Practical limits (daily data)**

* **≤ 150 assets, ≤ 5 years** — well within reach on an 8 GB laptop.
* **≤ 200 assets, ≤ 10 years** — requires ~11-12 GB; feasible on a 16 GB
  workstation.
* **> 500 assets with multi-year history** — peak memory exceeds 16 GB;
  reduce the time range or switch to a chunked / streaming approach.
* **> 1 000 assets** — the O(N³) per-solve cost alone makes real-time
  optimization impractical even with adequate RAM.

See ``BENCHMARKS.md`` for measured wall-clock timings across representative
dataset sizes.

Internal structure
------------------
The implementation is split across focused private modules to keep each file
readable and independently testable:

* :mod:`basanos.math._config` — :class:`BasanosConfig` and all
  covariance-mode configuration classes.
* :mod:`basanos.math._ewm_corr` — :func:`ewm_corr`, the vectorised
  IIR-filter implementation of per-row EWM correlation matrices.
* :mod:`basanos.math._engine_solve` — private helpers providing the
  ``_iter_matrices`` and ``_iter_solve`` generators (per-timestamp solve
  logic).
* :mod:`basanos.math._engine_diagnostics` — private helpers providing
  matrix-quality diagnostics (condition number, effective rank, solver
  residual, signal utilisation).
* :mod:`basanos.math._engine_ic` — private helpers providing signal
  evaluation metrics (IC, Rank IC, ICIR, and summary statistics).
* This module — :class:`BasanosEngine`, a single flat class that wires
  every method together in clearly delimited sections.
"""
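The peak-memory rule of thumb quoted in the module docstring (roughly 14 float64 arrays of shape ``(T, N, N)``, i.e. about 112 * T * N² bytes) can be checked with a few lines of arithmetic. The helper name ``estimate_ewm_corr_peak_bytes`` below is hypothetical and not part of basanos; this is only a sketch that reproduces the rows of the memory table.

```python
FLOAT64_BYTES = 8
PEAK_ARRAYS = 14  # approximate array count quoted in the module docstring


def estimate_ewm_corr_peak_bytes(n_assets: int, n_rows: int) -> int:
    """Approximate peak bytes for 14 float64 arrays of shape (T, N, N)."""
    return PEAK_ARRAYS * FLOAT64_BYTES * n_rows * n_assets**2


# Reproduce the (N, T) combinations listed in the memory table.
for n_assets, n_rows in [(50, 252), (100, 252), (100, 2520), (200, 2520), (500, 2520)]:
    gigabytes = estimate_ewm_corr_peak_bytes(n_assets, n_rows) / 1e9
    print(f"N={n_assets:>3}  T={n_rows:>4}  ~{gigabytes:.2f} GB")
```

Because the estimate is quadratic in N and only linear in T, doubling the asset count costs four times the memory, whereas doubling the history only doubles it.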

import dataclasses
import datetime
import logging
from typing import TYPE_CHECKING

import numpy as np
import polars as pl
from jquantstats import Portfolio

from ..exceptions import (
    ColumnMismatchError,
    ExcessiveNullsError,
    MissingDateColumnError,
    MonotonicPricesError,
    NonPositivePricesError,
    ShapeMismatchError,
)
from ._config import (
    BasanosConfig,
    CovarianceConfig,
    CovarianceMode,
    EwmaShrinkConfig,
    SlidingWindowConfig,
)
from ._engine_diagnostics import _DiagnosticsMixin as _DiagnosticsMixin
from ._engine_ic import _SignalEvaluatorMixin as _SignalEvaluatorMixin
from ._engine_solve import _SolveMixin as _SolveMixin
from ._ewm_corr import ewm_corr as _ewm_corr_numpy
from ._signal import vol_adj

if TYPE_CHECKING:
    from ._config_report import ConfigReport

_logger = logging.getLogger(__name__)


def _validate_inputs(prices: pl.DataFrame, mu: pl.DataFrame, cfg: "BasanosConfig") -> None:
    """Validate ``prices``, ``mu``, and ``cfg`` for use with :class:`BasanosEngine`.

    Checks that both DataFrames contain a ``'date'`` column and share identical
    shapes and column sets, and that no asset column contains non-positive
    prices, an excessive fraction of nulls, or a monotonic price series.
    Also logs a warning when the dataset is short relative to a configured
    sliding-window size.

    Args:
        prices: DataFrame of price levels per asset over time.
        mu: DataFrame of expected-return signals aligned with ``prices``.
        cfg: Engine configuration instance.

    Raises:
        MissingDateColumnError: If ``'date'`` is absent from either frame.
        ShapeMismatchError: If ``prices`` and ``mu`` have different shapes.
        ColumnMismatchError: If the column sets of the two frames differ.
        NonPositivePricesError: If any asset contains a non-positive price.
        ExcessiveNullsError: If the null fraction of any asset column
            exceeds ``cfg.max_nan_fraction``.
        MonotonicPricesError: If any asset price series is monotonically
            non-decreasing or non-increasing.

    Warns:
        Logged warning (not an exception): If the sliding-window covariance
            mode is active (``cfg.covariance_mode`` is
            ``CovarianceMode.sliding_window``) and
            ``len(prices) < 2 * cfg.window``, a warning is emitted via the
            module logger. This is a deliberate soft boundary: callers may
            intentionally supply data shorter than the full warm-up period.
            During warm-up the first ``window - 1`` timestamps will yield
            zero positions.
    """
    # ensure 'date' column exists in prices before any other validation
    if "date" not in prices.columns:
        raise MissingDateColumnError("prices")

    # ensure 'date' column exists in mu as well (kept for symmetry and downstream assumptions)
    if "date" not in mu.columns:
        raise MissingDateColumnError("mu")

    # check that prices and mu have the same shape
    if prices.shape != mu.shape:
        raise ShapeMismatchError(prices.shape, mu.shape)

    # check that the columns of prices and mu are identical
    if set(prices.columns) != set(mu.columns):
        raise ColumnMismatchError(prices.columns, mu.columns)

    assets = [c for c in prices.columns if c != "date" and prices[c].dtype.is_numeric()]

    # check for non-positive prices: log returns require strictly positive prices
    for asset in assets:
        col = prices[asset].drop_nulls()
        if col.len() > 0 and (col <= 0).any():
            raise NonPositivePricesError(asset)

    # check for excessive null values: more than cfg.max_nan_fraction null in any asset column
    n_rows = prices.height
    if n_rows > 0:
        for asset in assets:
            nan_frac = prices[asset].null_count() / n_rows
            if nan_frac > cfg.max_nan_fraction:
                raise ExcessiveNullsError(asset, nan_frac, cfg.max_nan_fraction)

    # check for monotonic price series: a series whose differences are all >= 0
    # or all <= 0 has no variation in its return sign, indicating malformed or
    # synthetic data
    for asset in assets:
        col = prices[asset].drop_nulls()
        if col.len() > 2:
            diffs = col.diff().drop_nulls()
            if (diffs >= 0).all() or (diffs <= 0).all():
                raise MonotonicPricesError(asset)

    # warn when the dataset is too short to benefit from the sliding window
    if cfg.covariance_mode == CovarianceMode.sliding_window and cfg.window is not None:
        w: int = cfg.window
        if n_rows < 2 * w:
            _logger.warning(
                "Dataset length (%d rows) is less than 2 * window (%d). "
                "The first %d timestamps will yield zero positions during warm-up; "
                "consider using a longer history or reducing 'window'.",
                n_rows,
                2 * w,
                w - 1,
            )
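The monotonicity rule is the least obvious of the checks in ``_validate_inputs``, so here is a minimal sketch of the same logic in plain Python, without Polars. The function name ``is_monotonic`` is hypothetical and exists only for illustration; the real check operates on Polars series and raises ``MonotonicPricesError``.

```python
def is_monotonic(values: list[float]) -> bool:
    """Return True when all first differences are >= 0, or all are <= 0.

    Series with fewer than three points are never flagged, mirroring the
    ``col.len() > 2`` guard in ``_validate_inputs``.
    """
    if len(values) <= 2:
        return False
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


print(is_monotonic([100.0, 101.0, 101.0, 103.0]))  # never falls
print(is_monotonic([100.0, 103.0, 101.0]))  # rises, then falls
```

Note that a constant series satisfies both conditions at once, so it is flagged as monotonic too.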


# ---------------------------------------------------------------------------
# Re-export config symbols so ``from basanos.math.optimizer import …`` keeps
# working for existing callers.
# ---------------------------------------------------------------------------
__all__ = [
    "BasanosConfig",
    "BasanosEngine",
    "CovarianceConfig",
    "CovarianceMode",
    "EwmaShrinkConfig",
    "SlidingWindowConfig",
]


@dataclasses.dataclass(frozen=True)
class BasanosEngine(_DiagnosticsMixin, _SignalEvaluatorMixin, _SolveMixin):
    """Engine to compute correlation matrices and optimize risk positions.

    Encapsulates price data and configuration to build EWM-based
    correlations, apply shrinkage, and solve for normalized positions.

    Public methods are organised into clearly delimited sections (some
    inherited from the private mixin classes):

    * **Core data access** — :attr:`assets`, :attr:`ret_adj`, :attr:`vola`,
      :attr:`cor`, :attr:`cor_tensor`
    * **Solve / position logic** — :attr:`cash_position`,
      :attr:`position_status`, :attr:`risk_position`,
      :attr:`position_leverage`, :meth:`warmup_state`
      (solve helpers inherited from :class:`~._engine_solve._SolveMixin`)
    * **Portfolio and performance** — :attr:`portfolio`,
      :attr:`naive_sharpe`, :meth:`sharpe_at_shrink`,
      :meth:`sharpe_at_window_factors`
    * **Matrix diagnostics** — :attr:`condition_number`,
      :attr:`effective_rank`, :attr:`solver_residual`,
      :attr:`signal_utilisation`
      (inherited from :class:`~._engine_diagnostics._DiagnosticsMixin`)
    * **Signal evaluation** — :attr:`ic`, :attr:`rank_ic`, :attr:`ic_mean`,
      :attr:`ic_std`, :attr:`icir`, :attr:`rank_ic_mean`,
      :attr:`rank_ic_std`
      (inherited from :class:`~._engine_ic._SignalEvaluatorMixin`)
    * **Reporting** — :attr:`config_report`

    Data-flow diagram
    -----------------

    .. code-block:: text

        prices (pl.DataFrame)
          │
          ├─ vol_adj ──► ret_adj  (volatility-adjusted log returns)
          │                │
          │                ├─ ewm_corr ──► cor / cor_tensor
          │                │                 │
          │                │                 └─ shrink2id / FactorModel
          │                │                         │
          │               vola               covariance matrix
          │                │                         │
          └── mu ──────────┴── _iter_solve ──────────┘
                                    │
                              cash_position
                                    │
                           ┌────────┴────────┐
                       portfolio        diagnostics
                      (Portfolio)    (condition_number,
                                      effective_rank,
                                      solver_residual,
                                      signal_utilisation,
                                      ic, rank_ic, …)

    Attributes:
        prices: Polars DataFrame of price levels per asset over time. Must
            contain a ``'date'`` column and at least one numeric asset column
            with strictly positive values that are not monotonically
            non-decreasing or non-increasing (i.e. the price changes must
            vary in sign).
        mu: Polars DataFrame of expected-return signals aligned with *prices*.
            Must share the same shape and column names as *prices*.
        cfg: Immutable :class:`BasanosConfig` controlling EWMA half-lives,
            clipping, shrinkage intensity, and AUM.

    Examples:
        Build an engine with two synthetic assets over 30 days and inspect the
        optimized positions and diagnostic properties.

        >>> import numpy as np
        >>> import polars as pl
        >>> from basanos.math import BasanosConfig, BasanosEngine
        >>> dates = list(range(30))
        >>> rng = np.random.default_rng(42)
        >>> prices = pl.DataFrame({
        ...     "date": dates,
        ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 100.0,
        ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 150.0,
        ... })
        >>> mu = pl.DataFrame({
        ...     "date": dates,
        ...     "A": rng.normal(0.0, 0.5, 30),
        ...     "B": rng.normal(0.0, 0.5, 30),
        ... })
        >>> cfg = BasanosConfig(vola=5, corr=10, clip=2.0, shrink=0.5, aum=1_000_000)
        >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
        >>> engine.assets
        ['A', 'B']
        >>> engine.cash_position.shape
        (30, 3)
        >>> engine.position_leverage.columns
        ['date', 'leverage']
    """

    prices: pl.DataFrame
    mu: pl.DataFrame
    cfg: BasanosConfig

    def __post_init__(self) -> None:
        """Validate inputs by delegating to :func:`_validate_inputs`."""
        _validate_inputs(self.prices, self.mu, self.cfg)

    # ------------------------------------------------------------------
    # Core data-access properties
    # ------------------------------------------------------------------

    @property
    def assets(self) -> list[str]:
        """List asset column names (numeric columns excluding 'date')."""
        return [c for c in self.prices.columns if c != "date" and self.prices[c].dtype.is_numeric()]

    @property
    def ret_adj(self) -> pl.DataFrame:
        """Return per-asset volatility-adjusted log returns clipped by cfg.clip.

        Uses an EWMA volatility estimate with lookback ``cfg.vola`` to
        standardize log returns for each numeric asset column.
        """
        return self.prices.with_columns(
            [vol_adj(pl.col(asset), vola=self.cfg.vola, clip=self.cfg.clip) for asset in self.assets]
        )

    @property
    def vola(self) -> pl.DataFrame:
        """Per-asset EWMA volatility of percentage returns.

        Computes percent changes for each numeric asset column and applies an
        exponentially weighted standard deviation using the lookback specified
        by ``cfg.vola``. The result is a DataFrame aligned with ``self.prices``
        whose numeric columns hold per-asset volatility estimates.
        """
        return self.prices.with_columns(
            pl.col(asset)
            .pct_change()
            .ewm_std(com=self.cfg.vola - 1, adjust=True, min_samples=self.cfg.vola)
            .alias(asset)
            for asset in self.assets
        )
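The ``vola`` property passes ``com=cfg.vola - 1`` to ``ewm_std``. Under the usual centre-of-mass convention for exponentially weighted statistics, the smoothing factor is ``alpha = 1 / (1 + com)``, so a lookback of ``cfg.vola`` corresponds to ``alpha = 1 / cfg.vola``. The sketch below only illustrates that relationship and the normalised weights used when ``adjust=True``; the helper names are hypothetical and not part of basanos or Polars.

```python
def alpha_from_com(com: float) -> float:
    """Smoothing factor for a given centre of mass: alpha = 1 / (1 + com)."""
    return 1.0 / (1.0 + com)


def adjusted_ewm_weights(alpha: float, n_obs: int) -> list[float]:
    """Normalised weights (most recent first) for an adjusted EWM over n_obs points."""
    raw = [(1.0 - alpha) ** k for k in range(n_obs)]
    total = sum(raw)
    return [w / total for w in raw]


vola_lookback = 10  # plays the role of cfg.vola in this illustration
alpha = alpha_from_com(vola_lookback - 1)
weights = adjusted_ewm_weights(alpha, n_obs=5)
print(alpha, weights)
```

The weights decay geometrically, so the most recent observation always carries the largest share.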

    @property
    def cor(self) -> dict[datetime.date, np.ndarray]:
        """Compute per-timestamp EWM correlation matrices.

        Builds volatility-adjusted returns for all assets, computes an
        exponentially weighted correlation using a pure NumPy implementation
        (with centre of mass ``cfg.corr``), and returns a mapping from each
        timestamp to the corresponding correlation matrix as a NumPy array.

        Returns:
            dict: Mapping ``date -> np.ndarray`` of shape (n_assets, n_assets).

        Performance:
            Delegates to :func:`ewm_corr`, which is O(T·N²) in both
            time and memory. The returned dict holds *T* references into the
            result tensor (one N*N view per date); no extra copies are made.
            For large *N* or *T*, prefer ``cor_tensor`` to keep a single
            contiguous array rather than building a Python dict.
        """
        index = self.prices["date"]
        ret_adj_np = self.ret_adj.select(self.assets).to_numpy()
        tensor = _ewm_corr_numpy(
            ret_adj_np,
            com=self.cfg.corr,
            min_periods=self.cfg.corr,
            min_corr_denom=self.cfg.min_corr_denom,
        )
        return {index[t]: tensor[t] for t in range(len(index))}

    @property
    def cor_tensor(self) -> np.ndarray:
        """Return all correlation matrices stacked as a 3-D tensor.

        Converts the per-timestamp correlation dict (see :py:attr:`cor`) into a
        single contiguous NumPy array so that the full history can be saved to
        a flat ``.npy`` file with :func:`numpy.save` and reloaded with
        :func:`numpy.load`.

        Returns:
            np.ndarray: Array of shape ``(T, N, N)`` where *T* is the number of
            timestamps and *N* the number of assets. ``tensor[t]`` is the
            correlation matrix for the *t*-th date (same ordering as
            ``self.prices["date"]``).

        Examples:
            >>> import tempfile, pathlib
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(100)))
            >>> rng0 = np.random.default_rng(0).lognormal(size=100)
            >>> rng1 = np.random.default_rng(1).lognormal(size=100)
            >>> prices = pl.DataFrame({"date": dates, "A": rng0, "B": rng1})
            >>> rng2 = np.random.default_rng(2).normal(size=100)
            >>> rng3 = np.random.default_rng(3).normal(size=100)
            >>> mu = pl.DataFrame({"date": dates, "A": rng2, "B": rng3})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> tensor = engine.cor_tensor
            >>> with tempfile.TemporaryDirectory() as td:
            ...     path = pathlib.Path(td) / "cor.npy"
            ...     np.save(path, tensor)
            ...     loaded = np.load(path)
            >>> np.testing.assert_array_equal(tensor, loaded)
        """
        return np.stack(list(self.cor.values()), axis=0)

    # ------------------------------------------------------------------
    # Internal solve helpers — inherited from _SolveMixin
    # ------------------------------------------------------------------
    # (_compute_mask, _check_signal, _scale_to_cash, _row_early_check,
    #  _denom_guard_yield, _compute_position, _replay_positions,
    #  _iter_matrices, _iter_solve, warmup_state)
    # Implementations live in _engine_solve.py; patch targets remain in that
    # module's namespace, e.g. ``patch("basanos.math._engine_solve.solve")``.

    # ------------------------------------------------------------------
    # Position properties
    # ------------------------------------------------------------------

    @property
    def cash_position(self) -> pl.DataFrame:
        r"""Optimize correlation-aware risk positions for each timestamp.

        Supports two covariance modes controlled by ``cfg.covariance_config``:

        * :class:`EwmaShrinkConfig` (default): Computes EWMA correlations, applies
          linear shrinkage toward the identity, and solves a normalised linear
          system :math:`C\,x = \mu` per timestamp via Cholesky / LU.

        * :class:`SlidingWindowConfig`: At each timestamp uses the
          ``cfg.covariance_config.window`` most recent vol-adjusted returns to fit a
          rank-``cfg.covariance_config.n_factors`` factor model via truncated SVD and
          solves the system via the Woodbury identity at :math:`O(k^3 + kn)` rather
          than :math:`O(n^3)` per step.

        Non-finite or ill-posed cases yield zero positions for safety.

        Returns:
            pl.DataFrame: DataFrame with columns ['date'] + asset names containing
            the per-timestamp cash positions (risk divided by EWMA volatility).

        Performance:
            For ``ewma_shrink``: dominant cost is ``self.cor`` (O(T·N²) time,
            O(T·N²) memory — see :func:`ewm_corr`). The per-timestamp
            linear solve adds O(N³) per row.

            For ``sliding_window``: O(T·W·N·k) for sliding SVDs plus
            O(T·(k³ + kN)) for Woodbury solves. Memory is O(W·N) per step,
            independent of T.
        """
        assets = self.assets

        # Compute risk positions row-by-row using _replay_positions.
        prices_num = self.prices.select(assets).to_numpy()

        risk_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float)
        cash_pos_np = np.full_like(prices_num, fill_value=np.nan, dtype=float)
        vola_np = self.vola.select(assets).to_numpy()

        self._replay_positions(risk_pos_np, cash_pos_np, vola_np)

        # Build Polars DataFrame for cash positions (numeric columns only)
        cash_position = self.prices.with_columns(
            [(pl.lit(cash_pos_np[:, i]).alias(asset)) for i, asset in enumerate(assets)]
        )

        return cash_position
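To make the ``ewma_shrink`` mode described in ``cash_position`` more concrete, the following is a minimal sketch of linear shrinkage toward the identity followed by a linear solve. It assumes the blend ``lam * C + (1 - lam) * I``, which matches the corner cases documented for ``sharpe_at_shrink`` (λ = 0 gives the identity, λ = 1 keeps the raw correlation matrix). The function name ``shrink_to_identity`` is hypothetical, and the normalisation and NaN masking performed in ``_engine_solve`` are deliberately left out because they are not shown in this module.

```python
import numpy as np


def shrink_to_identity(corr: np.ndarray, lam: float) -> np.ndarray:
    """Blend a correlation matrix with the identity (lam=1 keeps corr, lam=0 gives I)."""
    n_assets = corr.shape[0]
    return lam * corr + (1.0 - lam) * np.eye(n_assets)


# Toy two-asset example: strongly correlated assets with different signals.
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
mu = np.array([1.0, 0.5])

# lam = 0: the system matrix is the identity, so the solution equals mu.
x_identity = np.linalg.solve(shrink_to_identity(corr, 0.0), mu)

# lam = 0.5: correlations are halved before solving C x = mu.
x_half = np.linalg.solve(shrink_to_identity(corr, 0.5), mu)

print(x_identity, x_half)
```

Shrinking toward the identity keeps the diagonal at one while damping the off-diagonal entries, which improves the conditioning of the system being solved.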

    @property
    def position_status(self) -> pl.DataFrame:
        """Per-timestamp reason code explaining each :attr:`cash_position` row.

        Labels every row with exactly one of four :class:`~basanos.math.SolveStatus`
        codes (which compare equal to their string equivalents):

        * ``'warmup'``: Insufficient history for the sliding-window
          covariance mode (``i + 1 < cfg.covariance_config.window``).
          Positions are ``NaN`` for all assets at this timestamp.
        * ``'zero_signal'``: The expected-return vector ``mu`` was
          all-zeros (or all-NaN) at this timestamp; the optimizer
          short-circuited and returned zero positions without solving.
        * ``'degenerate'``: The normalisation denominator was non-finite
          or below ``cfg.denom_tol``, the Cholesky / Woodbury solve
          failed, or no asset had a finite price; positions were zeroed
          for safety.
        * ``'valid'``: The linear system was solved successfully and
          the resulting positions are non-zero.

        The three codes other than ``'valid'`` correspond one-to-one to the
        cases in which positions are NaN or zero. They allow downstream
        consumers (backtests, risk monitors) to distinguish data gaps from
        signal silence from numerical ill-conditioning without re-inspecting
        ``mu`` or the engine configuration.

        Returns:
            pl.DataFrame: Two-column DataFrame ``{'date': ..., 'status': ...}``
            with one row per timestamp. The ``status`` column has
            Polars dtype ``String``.
        """
        statuses = [status for _i, _t, _mask, _pos, status in self._iter_solve()]
        return pl.DataFrame({"date": self.prices["date"], "status": pl.Series(statuses, dtype=pl.String)})

    @property
    def risk_position(self) -> pl.DataFrame:
        """Risk positions (before EWMA-volatility scaling) at each timestamp.

        Derives the un-volatility-scaled position by multiplying the cash
        position by the per-asset EWMA volatility. Equivalently, this is
        the quantity solved by the correlation-adjusted linear system before
        dividing by ``vola``.

        Relationship to other properties::

            cash_position = risk_position / vola
            risk_position = cash_position * vola

        Returns:
            pl.DataFrame: DataFrame with columns ``['date'] + assets`` where
            each value is ``cash_position_i * vola_i`` at the given timestamp.
        """
        assets = self.assets
        cp_np = self.cash_position.select(assets).to_numpy()
        vola_np = self.vola.select(assets).to_numpy()
        with np.errstate(invalid="ignore"):
            risk_pos = cp_np * vola_np
        return self.prices.with_columns([pl.lit(risk_pos[:, i]).alias(asset) for i, asset in enumerate(assets)])

    @property
    def position_leverage(self) -> pl.DataFrame:
        """L1 norm of cash positions (gross leverage) at each timestamp.

        Sums the absolute values of all asset cash positions at each row.
        NaN positions are treated as zero (they contribute nothing to gross
        leverage).

        Returns:
            pl.DataFrame: Two-column DataFrame ``{'date': ..., 'leverage': ...}``
            where ``leverage`` is the L1 norm of the cash-position vector.
        """
        assets = self.assets
        cp_np = self.cash_position.select(assets).to_numpy()
        leverage = np.nansum(np.abs(cp_np), axis=1)
        return pl.DataFrame({"date": self.prices["date"], "leverage": pl.Series(leverage, dtype=pl.Float64)})
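The NaN handling in ``risk_position`` and ``position_leverage`` is easy to miss, so here is a small NumPy-only illustration using made-up numbers (the arrays below are not produced by basanos). It shows that NaN cash positions stay NaN after multiplying by volatility, while ``np.nansum`` treats them as zero when computing gross leverage.

```python
import numpy as np

# Toy cash positions for two assets over three timestamps; NaN marks missing positions.
cash = np.array(
    [
        [100.0, -50.0],
        [np.nan, 25.0],
        [np.nan, np.nan],
    ]
)
vola = np.full_like(cash, 0.02)

# risk_position = cash_position * vola; NaN inputs propagate to NaN outputs.
with np.errstate(invalid="ignore"):
    risk = cash * vola

# Gross leverage is the row-wise L1 norm, with NaN entries contributing nothing.
leverage = np.nansum(np.abs(cash), axis=1)

print(risk)
print(leverage)
```

A row consisting entirely of NaN positions therefore reports a leverage of zero rather than NaN.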

    # ------------------------------------------------------------------
    # Portfolio and performance
    # ------------------------------------------------------------------

    @property
    def portfolio(self) -> Portfolio:
        """Construct a Portfolio from the optimized cash positions.

        Converts the computed cash positions into a Portfolio using the
        configured AUM. The ``cost_per_unit`` from :attr:`cfg` is forwarded
        so that :attr:`~jquantstats.Portfolio.net_cost_nav` and
        :attr:`~jquantstats.Portfolio.position_delta_costs` work out
        of the box without any further configuration.

        Returns:
            Portfolio: Instance built from cash positions with AUM scaling.
        """
        cp = self.cash_position
        assets = [c for c in cp.columns if c != "date" and cp[c].dtype.is_numeric()]
        scaled = cp.with_columns(pl.col(a) * self.cfg.position_scale for a in assets)
        return Portfolio.from_cash_position(self.prices, scaled, aum=self.cfg.aum, cost_per_unit=self.cfg.cost_per_unit)

    def sharpe_at_shrink(self, shrink: float) -> float:
        r"""Return the annualised portfolio Sharpe ratio for the given shrinkage weight.

        Constructs a new :class:`BasanosEngine` with all parameters identical to
        ``self`` except that ``cfg.shrink`` is replaced by ``shrink``, then
        returns the annualised Sharpe ratio of the resulting portfolio.

        This is the canonical single-argument callable required by the benchmarks
        specification: ``f(λ) → Sharpe``. Use it to sweep λ across ``[0, 1]``
        and measure whether correlation adjustment adds value over the
        signal-proportional baseline (λ = 0) or the unregularised limit (λ = 1).

        Corner cases:
            * **λ = 0** — the shrunk matrix equals the identity, so the
              optimiser treats all assets as uncorrelated and positions are
              purely signal-proportional (no correlation adjustment).
            * **λ = 1** — the raw EWMA correlation matrix is used without
              shrinkage.

        Args:
            shrink: Retention weight λ ∈ [0, 1]. See
                :attr:`BasanosConfig.shrink` for full documentation.

        Returns:
            Annualised Sharpe ratio of the portfolio returns as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed
            (e.g. zero-variance returns).

        Raises:
            ValidationError: When ``shrink`` is outside [0, 1] (delegated to
                :class:`BasanosConfig` field validation).

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.sharpe_at_shrink(0.5)
            >>> isinstance(s, float)
            True
        """
        new_cfg = self.cfg.replace(shrink=shrink)
        engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
        return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))

    def sharpe_at_window_factors(self, window: int, n_factors: int) -> float:
        r"""Return the annualised portfolio Sharpe ratio for the given sliding-window parameters.

        Constructs a new :class:`BasanosEngine` with ``covariance_mode`` set to
        ``"sliding_window"`` and the supplied ``window`` / ``n_factors``, keeping
        all other configuration identical to ``self``.

        Use this method to sweep ``(W, k)`` and compare the sliding-window
        estimator against the EWMA baseline (via :meth:`sharpe_at_shrink`).

        Args:
            window: Rolling window length :math:`W \geq 1`.
                Rule of thumb: :math:`W \geq 2 \cdot n_{\text{assets}}`.
            n_factors: Number of latent factors :math:`k \geq 1`.

        Returns:
            Annualised Sharpe ratio of the portfolio returns as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed
            (e.g. not enough history to fill the first window).

        Raises:
            ValidationError: When ``window`` or ``n_factors`` fail field
                constraints (delegated to :class:`BasanosConfig`).

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.sharpe_at_window_factors(window=40, n_factors=2)
            >>> isinstance(s, float)
            True
        """
        new_cfg = self.cfg.replace(
            covariance_config=SlidingWindowConfig(window=window, n_factors=n_factors),
        )
        engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
        return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))
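The sliding-window mode is described as solving the system via the Woodbury identity instead of a dense factorisation. As a rough illustration of why that helps, the sketch below assumes a generic factor-model covariance of the form ``diag(d) + B @ B.T`` with ``k`` factors; this structure, the function name ``woodbury_solve``, and the random test data are all assumptions for illustration, and the actual ``FactorModel`` used by basanos may be parameterised differently.

```python
import numpy as np


def woodbury_solve(d: np.ndarray, loadings: np.ndarray, rhs: np.ndarray) -> np.ndarray:
    """Solve (diag(d) + B @ B.T) x = rhs using only a k x k linear solve.

    Uses the Woodbury identity:
    (D + B B^T)^-1 = D^-1 - D^-1 B (I + B^T D^-1 B)^-1 B^T D^-1
    """
    d_inv_rhs = rhs / d  # D^-1 rhs
    d_inv_b = loadings / d[:, None]  # D^-1 B
    n_factors = loadings.shape[1]
    small = np.eye(n_factors) + loadings.T @ d_inv_b  # k x k system matrix
    correction = d_inv_b @ np.linalg.solve(small, loadings.T @ d_inv_rhs)
    return d_inv_rhs - correction


rng = np.random.default_rng(0)
n_assets, n_factors = 6, 2
d = rng.uniform(0.5, 1.5, size=n_assets)  # positive idiosyncratic variances
loadings = rng.normal(size=(n_assets, n_factors))
rhs = rng.normal(size=n_assets)

x_fast = woodbury_solve(d, loadings, rhs)
x_dense = np.linalg.solve(np.diag(d) + loadings @ loadings.T, rhs)
print(np.max(np.abs(x_fast - x_dense)))
```

The only matrix that has to be factorised is ``k x k``, which is what makes the approach attractive when the number of factors is much smaller than the number of assets.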

    @property
    def naive_sharpe(self) -> float:
        r"""Sharpe ratio of the naïve equal-weight signal (μ = 1 for every asset/timestamp).

        Replaces the expected-return signal ``mu`` with a constant matrix of
        ones, then runs the optimiser with the current configuration and returns
        the annualised Sharpe ratio of the resulting portfolio.

        This provides the baseline answer to *"does the signal add value?"*:
        a real signal should produce a higher Sharpe than the naïve benchmark.
        Combined with :meth:`sharpe_at_shrink`, this yields a three-way
        comparison:

        +----------------------------------+----------------------------------------------+
        | Benchmark                        | What it measures                             |
        +==================================+==============================================+
        | ``naive_sharpe``                 | No signal skill; pure correlation routing    |
        +----------------------------------+----------------------------------------------+
        | ``sharpe_at_shrink(0.0)``        | Signal skill, no correlation adj.            |
        +----------------------------------+----------------------------------------------+
        | ``sharpe_at_shrink(cfg.shrink)`` | Signal + correlation adj.                    |
        +----------------------------------+----------------------------------------------+

        Returns:
            Annualised Sharpe ratio of the equal-weight portfolio as a ``float``.
            Returns ``float("nan")`` when the Sharpe ratio cannot be computed.

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> s = engine.naive_sharpe
            >>> isinstance(s, float)
            True
        """
        naive_mu = self.mu.with_columns(pl.lit(1.0).alias(asset) for asset in self.assets)
        engine = BasanosEngine(prices=self.prices, mu=naive_mu, cfg=self.cfg)
        return float(engine.portfolio.stats.sharpe().get("returns") or float("nan"))

    # ------------------------------------------------------------------
    # Reporting
    # ------------------------------------------------------------------

    @property
    def config_report(self) -> "ConfigReport":
        """Return a :class:`~basanos.math._config_report.ConfigReport` facade for this engine.

        Returns a :class:`~basanos.math._config_report.ConfigReport` that
        includes the full **lambda-sweep chart** — an interactive plot of the
        annualised Sharpe ratio as :attr:`~BasanosConfig.shrink` (λ) is swept
        across [0, 1] — in addition to the parameter table, shrinkage-guidance
        table, and theory section available from
        :attr:`BasanosConfig.report`.

        Returns:
            basanos.math._config_report.ConfigReport: Report facade with
            ``to_html()`` and ``save()`` methods.

        Examples:
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(200)))
            >>> rng = np.random.default_rng(0)
            >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
            >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> report = engine.config_report
            >>> html = report.to_html()
            >>> "Lambda" in html
            True
        """
        from ._config_report import ConfigReport

        return ConfigReport(config=self.cfg, engine=self)

    # ------------------------------------------------------------------
    # Matrix diagnostics — inherited from _DiagnosticsMixin
    # ------------------------------------------------------------------
    # (condition_number, effective_rank, solver_residual, signal_utilisation)
    # Implementations live in _engine_diagnostics.py; patch targets remain in
    # that module's namespace, e.g.
    # ``patch("basanos.math._engine_diagnostics.solve")``.

    # ------------------------------------------------------------------
    # Signal evaluation — inherited from _SignalEvaluatorMixin
    # ------------------------------------------------------------------
    # (_ic_series, ic, rank_ic, ic_mean, ic_std, icir,
    #  rank_ic_mean, rank_ic_std)
    # Implementations live in _engine_ic.py.