Coverage for src / basanos / math / optimizer.py: 100%
349 statements
coverage.py v7.13.5, created at 2026-03-19 05:23 +0000
"""Correlation-aware risk position optimizer (Basanos).

This module provides utilities to compute correlation-adjusted risk positions
from price data and expected-return signals. It relies on volatility-adjusted
returns to estimate a dynamic correlation matrix (via EWM), applies shrinkage
towards identity, and solves a normalized linear system per timestamp to
obtain stable positions.

Performance characteristics
---------------------------
Let *N* be the number of assets and *T* the number of timestamps.

**Computational complexity**

+----------------------------------+------------------+--------------------------------------+
| Operation                        | Complexity       | Bottleneck                           |
+==================================+==================+======================================+
| EWM volatility (``ret_adj``,     | O(T·N)           | Linear in both T and N; negligible   |
| ``vola``)                        |                  |                                      |
+----------------------------------+------------------+--------------------------------------+
| EWM correlation (``cor``)        | O(T·N²)          | ``lfilter`` over all N² asset pairs  |
|                                  |                  | simultaneously                       |
+----------------------------------+------------------+--------------------------------------+
| Linear solve per timestamp       | O(N³)            | Cholesky / LU per row in             |
| (``cash_position``)              | * T solves       | ``cash_position``                    |
+----------------------------------+------------------+--------------------------------------+

**Memory usage** (peak, approximate)

``_ewm_corr_numpy`` allocates roughly **14 float64 arrays** of shape
``(T, N, N)`` at peak (input sequences, IIR filter outputs, EWM components,
and the result tensor). Peak RAM ≈ **112 * T * N²** bytes. Typical
working sizes on a 16 GB machine:

+--------+--------------------------+------------------------------------+
| N      | T (daily rows)           | Peak memory (approx.)              |
+========+==========================+====================================+
| 50     | 252 (~1 yr)              | ~70 MB                             |
+--------+--------------------------+------------------------------------+
| 100    | 252 (~1 yr)              | ~280 MB                            |
+--------+--------------------------+------------------------------------+
| 100    | 2 520 (~10 yr)           | ~2.8 GB                            |
+--------+--------------------------+------------------------------------+
| 200    | 2 520 (~10 yr)           | ~11 GB                             |
+--------+--------------------------+------------------------------------+
| 500    | 2 520 (~10 yr)           | ~70 GB ⚠ exceeds typical RAM      |
+--------+--------------------------+------------------------------------+

**Practical limits (daily data)**

* **≤ 150 assets, ≤ 5 years** — well within reach on an 8 GB laptop.
* **≤ 200 assets, ≤ 10 years** — requires ~11-12 GB; feasible on a 16 GB
  workstation.
* **> 500 assets with multi-year history** — peak memory (~70 GB over 10
  years) far exceeds typical workstation RAM; reduce the time range or
  switch to a chunked / streaming approach.
* **> 1 000 assets** — the O(N³) per-solve cost alone makes real-time
  optimization impractical even with adequate RAM.

See ``BENCHMARKS.md`` for measured wall-clock timings across representative
dataset sizes.
"""
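The 112 · T · N² rule of thumb above can be expressed as a small helper. This is an illustrative sketch; ``peak_corr_bytes`` is a hypothetical name and not part of this module:

```python
def peak_corr_bytes(n_assets: int, n_rows: int, n_buffers: int = 14) -> int:
    """Approximate peak RAM (bytes) for the EWM correlation step.

    Assumes ``n_buffers`` float64 arrays of shape (T, N, N) are alive at
    peak, i.e. 14 * 8 * T * N**2 = 112 * T * N**2 bytes by default.
    """
    return n_buffers * 8 * n_rows * n_assets**2


# 100 assets over ~10 years of daily rows: roughly 2.8 GB, matching the table
gb = peak_corr_bytes(100, 2520) / 1e9
print(f"{gb:.2f} GB")  # 2.82 GB
```

Any row of the table above can be reproduced the same way, e.g. ``peak_corr_bytes(500, 2520)`` gives ≈ 70 GB.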

import dataclasses
import logging
from typing import TYPE_CHECKING

import numpy as np
import polars as pl
from pydantic import BaseModel, Field, ValidationInfo, field_validator
from scipy.signal import lfilter
from scipy.stats import spearmanr

from ..analytics import Portfolio
from ..exceptions import (
    ColumnMismatchError,
    ExcessiveNullsError,
    MissingDateColumnError,
    MonotonicPricesError,
    NonPositivePricesError,
    ShapeMismatchError,
    SingularMatrixError,
)
from ._linalg import inv_a_norm, solve, valid
from ._signal import shrink2id, vol_adj

if TYPE_CHECKING:
    from ._config_report import ConfigReport

_logger = logging.getLogger(__name__)


def _ewm_corr_numpy(
    data: np.ndarray,
    com: int,
    min_periods: int,
    min_corr_denom: float = 1e-14,
) -> np.ndarray:
    """Compute per-row EWM correlation matrices without pandas.

    Matches ``pandas.DataFrame.ewm(com=com, min_periods=min_periods).corr()``
    with the default ``adjust=True, ignore_na=False`` settings to within
    floating-point rounding error.

    All five EWM components used to compute ``corr(i, j)`` — namely
    ``ewm(x_i)``, ``ewm(x_j)``, ``ewm(x_i²)``, ``ewm(x_j²)``, and
    ``ewm(x_i·x_j)`` — share the **same joint weight structure**: weights
    decay at every timestep (``ignore_na=False``) but a new observation is
    only added at timesteps where *both* ``x_i`` and ``x_j`` are finite. As
    a result the correlation for a pair is frozen once either asset goes
    missing, exactly mirroring pandas behaviour.

    The EWM recurrence ``s[t] = β·s[t-1] + v[t]`` is an IIR filter and is
    solved for **all N² pairs simultaneously** via ``scipy.signal.lfilter``
    — no Python loop over the T timesteps.

    Args:
        data: Float array of shape ``(T, N)`` - typically volatility-adjusted
            log returns.
        com: EWM centre-of-mass (``alpha = 1 / (1 + com)``).
        min_periods: Minimum number of joint finite observations required
            before a correlation value is reported; earlier rows are NaN.
        min_corr_denom: Guard threshold below which the correlation denominator
            is treated as zero and the result is set to NaN. Defaults to
            ``1e-14``.

    Returns:
        np.ndarray of shape ``(T, N, N)`` containing the per-row correlation
        matrices. Each matrix is symmetric with diagonal 1.0 (or NaN during
        warm-up).

    Performance:
        **Time** — O(T·N²): ``lfilter`` processes all N² pairs simultaneously,
        so wall-clock time scales linearly with both T and N².

        **Memory** — approximately 14 float64 arrays of shape ``(T, N, N)``
        exist at peak, giving roughly ``112 * T * N²`` bytes. For 100 assets
        over 2 520 trading days (~10 years) that is ≈ 2.8 GB; for 500 assets
        the same period requires ≈ 70 GB, which exceeds typical workstation
        RAM. Reduce T or N before calling this function when working with
        large universes.
    """
    _t_len, n_assets = data.shape
    beta = com / (1.0 + com)

    fin = np.isfinite(data)  # (T, N) bool
    xt_f = np.where(fin, data, 0.0)  # (T, N) float - zeroed where not finite

    # joint_fin[t, i, j] = True iff assets i and j are both finite at t
    joint_fin = fin[:, :, np.newaxis] & fin[:, np.newaxis, :]  # (T, N, N)

    # Build per-pair input sequences for the recurrence s[t] = beta*s[t-1] + v[t].
    #
    # v_x[t, i, j]  = x_i[t] where pair (i,j) jointly finite, else 0
    # v_x2[t, i, j] = x_i[t]^2 where jointly finite, else 0
    # v_xy[t, i, j] = x_i[t]*x_j[t] (xt_f is 0 for non-finite, so implicit mask)
    # v_w[t, i, j]  = 1 where jointly finite, else 0 (weight indicator)
    #
    # By symmetry v_x[t,j,i] carries x_j[t] for pair (i,j), so s_x.swapaxes(1,2)
    # gives the EWM numerator of x_j without a separate v_y array.
    v_x = xt_f[:, :, np.newaxis] * joint_fin  # (T, N, N)
    v_x2 = (xt_f * xt_f)[:, :, np.newaxis] * joint_fin  # (T, N, N)
    v_xy = xt_f[:, :, np.newaxis] * xt_f[:, np.newaxis, :]  # (T, N, N)
    v_w = joint_fin.astype(np.float64)  # (T, N, N)

    # Solve the IIR recurrence for every (i, j) pair in parallel.
    # lfilter([1], [1, -beta], v, axis=0) computes s[t] = beta*s[t-1] + v[t].
    filt_a = np.array([1.0, -beta])
    s_x = lfilter([1.0], filt_a, v_x, axis=0)  # (T, N, N)
    s_x2 = lfilter([1.0], filt_a, v_x2, axis=0)  # (T, N, N)
    s_xy = lfilter([1.0], filt_a, v_xy, axis=0)  # (T, N, N)
    s_w = lfilter([1.0], filt_a, v_w, axis=0)  # (T, N, N)

    # Joint finite observation count per pair at each timestep (for min_periods)
    count = np.cumsum(joint_fin, axis=0)  # (T, N, N) int64

    # EWM means: running numerator / running weight denominator.
    # s_x.swapaxes(1,2)[t,i,j] = s_x[t,j,i] = EWM numerator of x_j for pair (i,j).
    with np.errstate(divide="ignore", invalid="ignore"):
        pos_w = s_w > 0
        ewm_x = np.where(pos_w, s_x / s_w, np.nan)  # EWM(x_i)
        ewm_y = np.where(pos_w, s_x.swapaxes(1, 2) / s_w, np.nan)  # EWM(x_j)
        ewm_x2 = np.where(pos_w, s_x2 / s_w, np.nan)  # EWM(x_i^2)
        ewm_y2 = np.where(pos_w, s_x2.swapaxes(1, 2) / s_w, np.nan)  # EWM(x_j^2)
        ewm_xy = np.where(pos_w, s_xy / s_w, np.nan)  # EWM(x_i*x_j)

    var_x = np.maximum(ewm_x2 - ewm_x * ewm_x, 0.0)
    var_y = np.maximum(ewm_y2 - ewm_y * ewm_y, 0.0)
    denom = np.sqrt(var_x * var_y)
    cov = ewm_xy - ewm_x * ewm_y

    with np.errstate(divide="ignore", invalid="ignore"):
        result = np.where(denom > min_corr_denom, cov / denom, np.nan)

    result = np.clip(result, -1.0, 1.0)

    # Apply min_periods mask for all pairs
    result[count < min_periods] = np.nan

    # Diagonal is exactly 1.0 where the asset has sufficient observations
    diag_idx = np.arange(n_assets)
    diag_count = count[:, diag_idx, diag_idx]  # (T, N)
    result[:, diag_idx, diag_idx] = np.where(diag_count >= min_periods, 1.0, np.nan)

    return result
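As a standalone sanity check (independent of this module), ``lfilter`` with coefficients ``[1]`` and ``[1, -beta]`` does reproduce the explicit EWM recurrence ``s[t] = beta * s[t-1] + v[t]``:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
v = rng.normal(size=(50, 3))  # (T, N) input sequences
beta = 0.9

# Vectorized IIR solve along the time axis (zero initial state)
s_filt = lfilter([1.0], [1.0, -beta], v, axis=0)

# Explicit recurrence for reference: s[0] = v[0], s[t] = beta*s[t-1] + v[t]
s_loop = np.zeros_like(v)
s_loop[0] = v[0]
for t in range(1, len(v)):
    s_loop[t] = beta * s_loop[t - 1] + v[t]

print(np.allclose(s_filt, s_loop))  # True
```

The same call applied to a ``(T, N, N)`` stack of pair sequences is what lets the function above avoid any Python loop over T.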


class BasanosConfig(BaseModel):
    r"""Configuration for correlation-aware position optimization.

    The required parameters (``vola``, ``corr``, ``clip``, ``shrink``, ``aum``)
    must be supplied by the caller. The optional parameters carry
    carefully chosen defaults whose rationale is described below.

    Shrinkage methodology
    ---------------------
    ``shrink`` controls linear shrinkage of the EWMA correlation matrix toward
    the identity:

    .. math::

        C_{\text{shrunk}} = \lambda \cdot C_{\text{EWMA}} + (1 - \lambda) \cdot I_n

    where :math:`\lambda` = ``shrink`` and :math:`I_n` is the identity.
    Shrinkage regularises the matrix when assets are few relative to the
    lookback (high concentration ratio :math:`n / T`), reducing the impact of
    extreme sample eigenvalues and improving the condition number of the matrix
    passed to the linear solver.

    **When to prefer strong shrinkage (low** ``shrink`` **/ high** ``1-shrink``\ **):**

    * Fewer than ~30 assets with a ``corr`` lookback shorter than 100 days.
    * High-volatility or crisis regimes where correlations spike and the sample
      matrix is less representative of the true structure.
    * Portfolios where estimation noise is more costly than correlation bias
      (e.g., when the signal-to-noise ratio of ``mu`` is low).

    **When to prefer light shrinkage (high** ``shrink``\ **):**

    * Many assets with a long lookback (low concentration ratio).
    * The EWMA correlation structure carries genuine diversification information
      that you want the solver to exploit.
    * Out-of-sample testing shows that position stability is not a concern.

    **Practical starting points (daily return data):**

    Here *n* = number of assets and *T* = ``cfg.corr`` (EWMA lookback).

    +-----------------------+-------------------+--------------------------------+
    | n (assets) / T (corr) | Suggested shrink  | Notes                          |
    +=======================+===================+================================+
    | n > 20, T < 40        | 0.3 - 0.5         | Near-singular matrix likely;   |
    |                       |                   | strong regularisation needed.  |
    +-----------------------+-------------------+--------------------------------+
    | n ~ 10, T ~ 60        | 0.5 - 0.7         | Balanced regime.               |
    +-----------------------+-------------------+--------------------------------+
    | n < 10, T > 100       | 0.7 - 0.9         | Well-conditioned sample;       |
    |                       |                   | light shrinkage for stability. |
    +-----------------------+-------------------+--------------------------------+

    See :func:`~basanos.math._signal.shrink2id` for the full theoretical
    background and academic references (Ledoit & Wolf, 2004; Chen et al., 2010).

    Default rationale
    -----------------
    ``profit_variance_init = 1.0``
        Unit variance is a neutral, uninformative starting point for the
        exponential moving average of realized P&L variance. Because the
        normalised risk positions are divided by this quantity,
        initialising at 1.0 leaves the first few positions unaffected
        until the EMA accumulates real data. Larger values would shrink
        initial positions; smaller values would inflate them.

    ``profit_variance_decay = 0.99``
        An EMA decay factor of λ = 0.99 corresponds to a half-life of
        ``log(0.5) / log(0.99) ≈ 69`` periods and an effective window
        length of ``1 / (1 - 0.99) = 100`` periods. For daily data this
        represents approximately 100 trading days (~5 months), a
        commonly used horizon for medium-frequency regime adaptation in
        systematic strategies.

    ``denom_tol = 1e-12``
        Positions are zeroed when the normalisation denominator
        ``inv_a_norm(μ, Σ)`` falls at or below this threshold. The
        value 1e-12 provides ample headroom above float64 machine
        epsilon (~2.2e-16) while remaining negligible relative to any
        economically meaningful signal magnitude.

    ``position_scale = 1e6``
        The dimensionless risk position is multiplied by this factor
        before being passed to :class:`~basanos.analytics.Portfolio`.
        A value of 1e6 means positions are expressed in units of one
        million of the base currency, a conventional denomination for
        institutional-scale portfolios where AUM is measured in hundreds
        of millions.

    ``min_corr_denom = 1e-14``
        The EWMA correlation denominator ``sqrt(var_x * var_y)`` is
        compared against this threshold; when at or below it the
        correlation is set to NaN rather than dividing by a near-zero
        value. The default 1e-14 is safely above float64 underflow
        while remaining negligible for any realistic return series.
        Advanced users may tighten this guard (larger value) when
        working with very-low-variance synthetic data.

    ``max_nan_fraction = 0.9``
        :class:`~basanos.exceptions.ExcessiveNullsError` is raised
        during construction when the null fraction in any asset price
        column **strictly exceeds** this threshold. The default 0.9
        permits up to 90 % missing prices (e.g., illiquid or recently
        listed assets in a long history) while rejecting columns that
        are almost entirely null and would contribute no useful
        information. Callers who want a stricter gate can lower this
        value; callers running on sparse data can raise it toward 1.0.

    Examples:
        >>> cfg = BasanosConfig(vola=32, corr=64, clip=3.0, shrink=0.5, aum=1e8)
        >>> cfg.vola
        32
        >>> cfg.corr
        64
    """
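The decay arithmetic quoted above can be verified with the standard library alone:

```python
import math

lam = 0.99  # the profit_variance_decay default

# Half-life: number of periods for a weight to decay to 0.5
half_life = math.log(0.5) / math.log(lam)

# Effective window length: total weight mass relative to the newest weight
effective_window = 1.0 / (1.0 - lam)

print(f"{half_life:.1f} {effective_window:.0f}")  # 69.0 100
```

Substituting another candidate decay (e.g. ``lam = 0.97``) gives the corresponding horizon directly.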

    vola: int = Field(..., gt=0, description="EWMA lookback for volatility normalization.")
    corr: int = Field(..., gt=0, description="EWMA lookback for correlation estimation.")
    clip: float = Field(..., gt=0.0, description="Clipping threshold for volatility adjustment.")
    shrink: float = Field(
        ...,
        ge=0.0,
        le=1.0,
        description=(
            "Retention weight λ for linear shrinkage of the EWMA correlation matrix toward "
            "the identity: C_shrunk = λ·C_ewma + (1-λ)·I. "
            "λ=1.0 uses the raw EWMA matrix (no shrinkage); λ=0.0 replaces it entirely "
            "with the identity (maximum shrinkage, positions are treated as uncorrelated). "
            "Values in [0.3, 0.8] are typical for daily financial return data. "
            "Lower values improve numerical stability when assets are many relative to the "
            "lookback (high concentration ratio n/T). See shrink2id() for full guidance."
        ),
    )
    aum: float = Field(..., gt=0.0, description="Assets under management for portfolio scaling.")
    profit_variance_init: float = Field(
        default=1.0,
        gt=0.0,
        description=(
            "Initial value for the profit variance EMA used in position sizing. "
            "Defaults to 1.0 (unit variance) so that the first positions are unscaled "
            "until real P&L data accumulates."
        ),
    )
    profit_variance_decay: float = Field(
        default=0.99,
        gt=0.0,
        lt=1.0,
        description=(
            "EMA decay factor λ for the realized P&L variance (higher = slower adaptation). "
            "The default 0.99 gives a half-life of ~69 periods and an effective window of "
            "100 periods, suitable for daily data."
        ),
    )
    denom_tol: float = Field(
        default=1e-12,
        gt=0.0,
        description=(
            "Minimum normalisation denominator; positions are zeroed at or below this value. "
            "The default 1e-12 is well above float64 machine epsilon (~2.2e-16) while "
            "remaining negligible for any economically meaningful signal."
        ),
    )
    position_scale: float = Field(
        default=1e6,
        gt=0.0,
        description=(
            "Multiplicative scaling factor applied to dimensionless risk positions to obtain "
            "cash positions in base-currency units. Defaults to 1e6 (one million), a "
            "conventional denomination for institutional portfolios."
        ),
    )
    min_corr_denom: float = Field(
        default=1e-14,
        gt=0.0,
        description=(
            "Guard threshold for the EWMA correlation denominator sqrt(var_x * var_y). "
            "When the denominator is at or below this value the correlation is set to NaN "
            "instead of dividing by a near-zero number. "
            "The default 1e-14 is safely above float64 underflow while being negligible for "
            "any realistic return variance."
        ),
    )
    max_nan_fraction: float = Field(
        default=0.9,
        gt=0.0,
        lt=1.0,
        description=(
            "Maximum tolerated fraction of null values in any asset price column. "
            "ExcessiveNullsError is raised during construction when the null fraction "
            "strictly exceeds this threshold. "
            "The default 0.9 allows up to 90 % missing prices while rejecting columns "
            "that are almost entirely null."
        ),
    )

    model_config = {"frozen": True, "extra": "forbid"}

    @property
    def report(self) -> "ConfigReport":
        """Return a :class:`~basanos.math._config_report.ConfigReport` facade for this config.

        Generates a self-contained HTML report summarising all configuration
        parameters, a shrinkage-guidance table, and a theory section on
        Ledoit-Wolf shrinkage.

        To also include a lambda-sweep chart (Sharpe vs λ), use
        :attr:`BasanosEngine.config_report` instead, which requires price and
        signal data.

        Returns:
            basanos.math._config_report.ConfigReport: Report facade with
            ``to_html()`` and ``save()`` methods.

        Examples:
            >>> from basanos.math.optimizer import BasanosConfig
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> report = cfg.report
            >>> html = report.to_html()
            >>> "Parameters" in html
            True
        """
        from ._config_report import ConfigReport

        return ConfigReport(config=self)

    @field_validator("corr")
    @classmethod
    def corr_greater_than_vola(cls, v: int, info: ValidationInfo) -> int:
        """Enforce ``corr >= vola`` for stability.

        Pydantic v2 passes ValidationInfo; use info.data to access other fields.
        """
        vola = info.data.get("vola") if hasattr(info, "data") else None
        if vola is not None and v < vola:
            raise ValueError(f"corr ({v}) must be >= vola ({vola}) for a stable correlation estimate")
        return v
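The conditioning benefit described in the class docstring can be reproduced standalone. The helper below is a sketch of the ``C_shrunk = λ·C + (1-λ)·I`` formula, not the library's ``shrink2id``:

```python
import numpy as np


def shrink_to_identity(corr: np.ndarray, lamb: float) -> np.ndarray:
    """Linear shrinkage toward the identity: lamb*C + (1-lamb)*I."""
    return lamb * corr + (1.0 - lamb) * np.eye(corr.shape[0])


# A nearly singular 3x3 correlation matrix (all pairwise correlations 0.99):
# eigenvalues are 2.98 and 0.01 (twice), so the condition number is 298.
corr = np.full((3, 3), 0.99)
np.fill_diagonal(corr, 1.0)

raw_cond = np.linalg.cond(corr)
shrunk_cond = np.linalg.cond(shrink_to_identity(corr, lamb=0.5))
print(raw_cond > 100 > shrunk_cond)  # True
```

With ``lamb=0.5`` the eigenvalues move to 1.99 and 0.505, so the condition number drops from 298 to about 3.9.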


@dataclasses.dataclass(frozen=True)
class BasanosEngine:
    """Engine to compute correlation matrices and optimize risk positions.

    Encapsulates price data and configuration to build EWM-based
    correlations, apply shrinkage, and solve for normalized positions.

    Attributes:
        prices: Polars DataFrame of price levels per asset over time. Must
            contain a ``'date'`` column and at least one numeric asset column
            with strictly positive values that are not monotonically
            non-decreasing or non-increasing (i.e. the first differences
            must change sign at least once).
        mu: Polars DataFrame of expected-return signals aligned with *prices*.
            Must share the same shape and column names as *prices*.
        cfg: Immutable :class:`BasanosConfig` controlling EWMA half-lives,
            clipping, shrinkage intensity, and AUM.

    Examples:
        Build an engine with two synthetic assets over 30 days and inspect the
        optimized positions and diagnostic properties.

        >>> import numpy as np
        >>> import polars as pl
        >>> from basanos.math import BasanosConfig, BasanosEngine
        >>> dates = list(range(30))
        >>> rng = np.random.default_rng(42)
        >>> prices = pl.DataFrame({
        ...     "date": dates,
        ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 100.0,
        ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, 30)) * 150.0,
        ... })
        >>> mu = pl.DataFrame({
        ...     "date": dates,
        ...     "A": rng.normal(0.0, 0.5, 30),
        ...     "B": rng.normal(0.0, 0.5, 30),
        ... })
        >>> cfg = BasanosConfig(vola=5, corr=10, clip=2.0, shrink=0.5, aum=1_000_000)
        >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
        >>> engine.assets
        ['A', 'B']
        >>> engine.cash_position.shape
        (30, 3)
        >>> engine.position_leverage.columns
        ['date', 'leverage']
    """

    prices: pl.DataFrame
    mu: pl.DataFrame
    cfg: BasanosConfig

    def __post_init__(self) -> None:
        """Validate basic invariants right after initialization.

        Ensures both ``prices`` and ``mu`` contain a ``'date'`` column and
        share identical shapes/columns, which downstream computations rely on.
        Also checks for data quality issues that would cause silent failures
        downstream: non-positive prices, excessive NaN values, and monotonic
        (non-varying) price series.
        """
        # ensure 'date' column exists in prices before any other validation
        if "date" not in self.prices.columns:
            raise MissingDateColumnError("prices")

        # ensure 'date' column exists in mu as well (kept for symmetry and downstream assumptions)
        if "date" not in self.mu.columns:
            raise MissingDateColumnError("mu")

        # check that prices and mu have the same shape
        if self.prices.shape != self.mu.shape:
            raise ShapeMismatchError(self.prices.shape, self.mu.shape)

        # check that the columns of prices and mu are identical
        if set(self.prices.columns) != set(self.mu.columns):
            raise ColumnMismatchError(self.prices.columns, self.mu.columns)

        # check for non-positive prices: log returns require strictly positive prices
        for asset in self.assets:
            col = self.prices[asset].drop_nulls()
            if col.len() > 0 and (col <= 0).any():
                raise NonPositivePricesError(asset)

        # check for excessive NaN values: more than cfg.max_nan_fraction null in any asset column
        n_rows = self.prices.height
        if n_rows > 0:
            for asset in self.assets:
                nan_frac = self.prices[asset].null_count() / n_rows
                if nan_frac > self.cfg.max_nan_fraction:
                    raise ExcessiveNullsError(asset, nan_frac, self.cfg.max_nan_fraction)

        # check for monotonic price series: a strictly non-decreasing or non-increasing
        # series has no variance in its return sign, indicating malformed or synthetic data
        for asset in self.assets:
            col = self.prices[asset].drop_nulls()
            if col.len() > 2:
                diffs = col.diff().drop_nulls()
                if (diffs >= 0).all() or (diffs <= 0).all():
                    raise MonotonicPricesError(asset)
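The monotonicity gate can be sketched standalone with NumPy. ``is_monotonic`` is an illustrative helper mirroring the polars-based check, not part of this module:

```python
import numpy as np


def is_monotonic(prices: np.ndarray) -> bool:
    """True when the series never changes direction (all diffs >= 0 or all <= 0)."""
    diffs = np.diff(prices[np.isfinite(prices)])
    return bool(np.all(diffs >= 0) or np.all(diffs <= 0))


print(is_monotonic(np.array([100.0, 101.0, 103.0, 107.0])))  # True - rejected
print(is_monotonic(np.array([100.0, 98.0, 103.0, 101.0])))  # False - accepted
```

A strictly increasing series would pass every other gate (positive, non-null) yet carries no usable sign variation, which is why it is rejected explicitly.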

    @property
    def assets(self) -> list[str]:
        """List asset column names (numeric columns excluding 'date')."""
        return [c for c in self.prices.columns if c != "date" and self.prices[c].dtype.is_numeric()]

    @property
    def ret_adj(self) -> pl.DataFrame:
        """Return per-asset volatility-adjusted log returns clipped by cfg.clip.

        Uses an EWMA volatility estimate with lookback ``cfg.vola`` to
        standardize log returns for each numeric asset column.
        """
        return self.prices.with_columns(
            [vol_adj(pl.col(asset), vola=self.cfg.vola, clip=self.cfg.clip) for asset in self.assets]
        )

    @property
    def vola(self) -> pl.DataFrame:
        """Per-asset EWMA volatility of percentage returns.

        Computes percent changes for each numeric asset column and applies an
        exponentially weighted standard deviation using the lookback specified
        by ``cfg.vola``. The result is a DataFrame aligned with ``self.prices``
        whose numeric columns hold per-asset volatility estimates.
        """
        return self.prices.with_columns(
            pl.col(asset)
            .pct_change()
            .ewm_std(com=self.cfg.vola - 1, adjust=True, min_samples=self.cfg.vola)
            .alias(asset)
            for asset in self.assets
        )

    @property
    def cor(self) -> dict[object, np.ndarray]:
        """Compute per-timestamp EWM correlation matrices.

        Builds volatility-adjusted returns for all assets, computes an
        exponentially weighted correlation using a pure NumPy implementation
        (with window ``cfg.corr``), and returns a mapping from each timestamp
        to the corresponding correlation matrix as a NumPy array.

        Returns:
            dict: Mapping ``date -> np.ndarray`` of shape (n_assets, n_assets).

        Performance:
            Delegates to :func:`_ewm_corr_numpy`, which is O(T·N²) in both
            time and memory. The returned dict holds *T* references into the
            result tensor (one N*N view per date); no extra copies are made.
            For large *N* or *T*, prefer ``cor_tensor`` to keep a single
            contiguous array rather than building a Python dict.
        """
        index = self.prices["date"]
        ret_adj_np = self.ret_adj.select(self.assets).to_numpy()
        tensor = _ewm_corr_numpy(
            ret_adj_np,
            com=self.cfg.corr,
            min_periods=self.cfg.corr,
            min_corr_denom=self.cfg.min_corr_denom,
        )
        return {index[t]: tensor[t] for t in range(len(index))}
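The no-copy claim in the docstring can be checked with plain NumPy, since basic integer indexing on the first axis returns views into the parent array:

```python
import numpy as np

tensor = np.zeros((4, 2, 2))  # stand-in for the (T, N, N) correlation tensor
per_date = {t: tensor[t] for t in range(len(tensor))}

# Each dict value is a view whose base is the original tensor - no copy made
print(all(m.base is tensor for m in per_date.values()))  # True

# Mutating the tensor is therefore visible through the dict values
tensor[0, 0, 0] = 1.0
print(per_date[0][0, 0])  # 1.0
```

The dict itself still costs one Python object per date, which is the overhead the docstring suggests avoiding via ``cor_tensor`` for large histories.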

    @property
    def cor_tensor(self) -> np.ndarray:
        """Return all correlation matrices stacked as a 3-D tensor.

        Converts the per-timestamp correlation dict (see :py:attr:`cor`) into a
        single contiguous NumPy array so that the full history can be saved to
        a flat ``.npy`` file with :func:`numpy.save` and reloaded with
        :func:`numpy.load`.

        Returns:
            np.ndarray: Array of shape ``(T, N, N)`` where *T* is the number of
            timestamps and *N* the number of assets. ``tensor[t]`` is the
            correlation matrix for the *t*-th date (same ordering as
            ``self.prices["date"]``).

        Examples:
            >>> import tempfile, pathlib
            >>> import numpy as np
            >>> import polars as pl
            >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
            >>> dates = pl.Series("date", list(range(100)))
            >>> rng0 = np.random.default_rng(0).lognormal(size=100)
            >>> rng1 = np.random.default_rng(1).lognormal(size=100)
            >>> prices = pl.DataFrame({"date": dates, "A": rng0, "B": rng1})
            >>> rng2 = np.random.default_rng(2).normal(size=100)
            >>> rng3 = np.random.default_rng(3).normal(size=100)
            >>> mu = pl.DataFrame({"date": dates, "A": rng2, "B": rng3})
            >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
            >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
            >>> tensor = engine.cor_tensor
            >>> with tempfile.TemporaryDirectory() as td:
            ...     path = pathlib.Path(td) / "cor.npy"
            ...     np.save(path, tensor)
            ...     loaded = np.load(path)
            >>> np.testing.assert_array_equal(tensor, loaded)
        """
        return np.stack(list(self.cor.values()), axis=0)

    @property
    def cash_position(self) -> pl.DataFrame:
        """Optimize correlation-aware risk positions for each timestamp.

        Computes EWMA correlations (via ``self.cor``), applies shrinkage toward
        the identity matrix with intensity ``cfg.shrink``, and solves a
        normalized linear system A x = mu per timestamp to obtain stable,
        scale-invariant positions. Non-finite or ill-posed cases yield zero
        positions for safety.

        Returns:
            pl.DataFrame: DataFrame with columns ['date'] + asset names containing
            the per-timestamp cash positions (risk divided by EWMA volatility).

        Performance:
            Dominant cost is ``self.cor`` (O(T·N²) time, O(T·N²) memory — see
            :func:`_ewm_corr_numpy`). The per-timestamp linear solve via
            Cholesky / LU decomposition adds O(N³) per row for a total solve
            cost of O(T·N³). For *N* = 100 and *T* = 2 520 (~10 years daily)
            the solve contributes approximately 2.52 * 10^9 floating-point
            operations (upper bound; Cholesky is N³/3 + O(N²)); at *N* = 1 000
            this rises to roughly 2.52 * 10^12, making the per-row solve the
            dominant compute bottleneck for large universes.
        """
        # compute the correlation matrices
        cor = self.cor
        assets = self.assets

        # Compute risk positions row-by-row using correlation shrinkage (NumPy)
        prices_num = self.prices.select(assets).to_numpy()
        returns_num = np.zeros_like(prices_num, dtype=float)
        returns_num[1:] = prices_num[1:] / prices_num[:-1] - 1.0

        mu = self.mu.select(assets).to_numpy()
        risk_pos_np = np.full_like(mu, fill_value=np.nan, dtype=float)
        cash_pos_np = np.full_like(mu, fill_value=np.nan, dtype=float)
        vola_np = self.vola.select(assets).to_numpy()

        profit_variance = self.cfg.profit_variance_init
        lamb = self.cfg.profit_variance_decay

        for i, t in enumerate(cor.keys()):
            # get the mask of finite prices for this timestamp
            mask = np.isfinite(prices_num[i])

            # Compute profit contribution using only finite returns and available positions
            if i > 0:
                ret_mask = np.isfinite(returns_num[i]) & mask
                # Profit at time i comes from yesterday's cash position applied to today's returns
                if ret_mask.any():
                    with np.errstate(invalid="ignore"):
                        cash_pos_np[i - 1] = risk_pos_np[i - 1] / vola_np[i - 1]
                    lhs = np.nan_to_num(cash_pos_np[i - 1, ret_mask], nan=0.0)
                    rhs = np.nan_to_num(returns_num[i, ret_mask], nan=0.0)
                    profit = lhs @ rhs
                    profit_variance = lamb * profit_variance + (1 - lamb) * profit**2
            # we have no price data at all for this timestamp
            if not mask.any():
                continue

            # get the correlation matrix for this timestamp
            corr_n = cor[t]

            # shrink the correlation matrix towards identity
            matrix = shrink2id(corr_n, lamb=self.cfg.shrink)[np.ix_(mask, mask)]

            # get the expected-return vector for this timestamp
            expected_mu = np.nan_to_num(mu[i][mask])

            # Short-circuit when signal is zero - no position needed, skip norm computation
            if np.allclose(expected_mu, 0.0):
                pos = np.zeros_like(expected_mu)
            else:
                # Normalize solution; guard against zero/near-zero norm to avoid NaNs.
                # inv_a_norm returns float(np.nan) when no valid entries exist (never None).
                denom = inv_a_norm(expected_mu, matrix)

                if not np.isfinite(denom) or denom <= self.cfg.denom_tol:
                    _logger.warning(
                        "Positions zeroed at t=%s: normalisation denominator is degenerate "
                        "(denom=%s, denom_tol=%s). Check signal magnitude and covariance matrix.",
                        t,
                        denom,
                        self.cfg.denom_tol,
                        extra={
                            "context": {
                                "t": str(t),
                                "denom": denom,
                                "denom_tol": self.cfg.denom_tol,
                            }
                        },
                    )
                    pos = np.zeros_like(expected_mu)
                else:
                    pos = solve(matrix, expected_mu) / denom

            risk_pos_np[i, mask] = pos / profit_variance
            with np.errstate(invalid="ignore"):
                cash_pos_np[i, mask] = risk_pos_np[i, mask] / vola_np[i, mask]

        # Build Polars DataFrame for risk positions (numeric columns only)
        cash_position = self.prices.with_columns(
            [(pl.lit(cash_pos_np[:, i]).alias(asset)) for i, asset in enumerate(assets)]
        )

        return cash_position
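The scale invariance mentioned in the docstring can be demonstrated with plain NumPy, assuming ``inv_a_norm(mu, A)`` computes the A-inverse-weighted norm ``sqrt(mu' A^-1 mu)`` (an assumption about ``basanos.math._linalg``, which is not shown here):

```python
import numpy as np


def normalized_position(matrix: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Solve A x = mu, then normalize by sqrt(mu' A^-1 mu) (assumed inv_a_norm)."""
    x = np.linalg.solve(matrix, mu)
    denom = np.sqrt(mu @ x)  # mu' A^-1 mu equals mu' x, positive for A positive definite
    return x / denom


a = np.array([[1.0, 0.3], [0.3, 1.0]])  # a shrunk 2x2 correlation matrix
mu = np.array([0.5, -0.2])

pos = normalized_position(a, mu)
# Rescaling the signal leaves the normalized position unchanged
print(np.allclose(pos, normalized_position(a, 2.0 * mu)))  # True
```

Doubling ``mu`` doubles the raw solution but also doubles the denominator, so only the direction and correlation structure of the signal, not its magnitude, determine the position.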

    @property
    def risk_position(self) -> pl.DataFrame:
        """Risk positions (before EWMA-volatility scaling) at each timestamp.

        Derives the un-volatility-scaled position by multiplying the cash
        position by the per-asset EWMA volatility. Equivalently, this is
        the quantity solved by the correlation-adjusted linear system before
        dividing by ``vola``.

        Relationship to other properties::

            cash_position = risk_position / vola
            risk_position = cash_position * vola

        Returns:
            pl.DataFrame: DataFrame with columns ``['date'] + assets`` where
            each value is ``cash_position_i * vola_i`` at the given timestamp.
        """
        assets = self.assets
        cp_np = self.cash_position.select(assets).to_numpy()
        vola_np = self.vola.select(assets).to_numpy()
        with np.errstate(invalid="ignore"):
            risk_pos = cp_np * vola_np
        return self.prices.with_columns([pl.lit(risk_pos[:, i]).alias(asset) for i, asset in enumerate(assets)])
775 @property
776 def position_leverage(self) -> pl.DataFrame:
777 """L1 norm of cash positions (gross leverage) at each timestamp.
779 Sums the absolute values of all asset cash positions at each row.
780 NaN positions are treated as zero (they contribute nothing to gross
781 leverage).
783 Returns:
784 pl.DataFrame: Two-column DataFrame ``{'date': ..., 'leverage': ...}``
785 where ``leverage`` is the L1 norm of the cash-position vector.
786 """
787 assets = self.assets
788 cp_np = self.cash_position.select(assets).to_numpy()
789 leverage = np.nansum(np.abs(cp_np), axis=1)
790 return pl.DataFrame({"date": self.prices["date"], "leverage": pl.Series(leverage, dtype=pl.Float64)})
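The gross-leverage computation above reduces to one `nansum` per row; note that `np.nansum` over an all-NaN row yields `0.0`, which is why fully missing rows contribute zero leverage:

```python
import numpy as np

# L1 norm of cash positions per row; NaNs contribute nothing.
cash_pos = np.array([
    [0.5, -0.3, np.nan],
    [np.nan, np.nan, np.nan],
])
leverage = np.nansum(np.abs(cash_pos), axis=1)  # → [0.8, 0.0]
```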
792 @property
793 def condition_number(self) -> pl.DataFrame:
794 """Condition number κ of the shrunk correlation matrix C_shrunk at each timestamp.
796 Applies the same shrinkage as ``cash_position`` (``cfg.shrink``) to the
797 per-timestamp EWM correlation matrix and computes ``np.linalg.cond``.
798 Only the sub-matrix corresponding to assets with finite prices at that
799 timestamp is used; rows with no finite prices yield ``NaN``.
801 Returns:
802 pl.DataFrame: Two-column DataFrame ``{'date': ..., 'condition_number': ...}``.
803 """
804 cor = self.cor
805 assets = self.assets
806 prices_num = self.prices.select(assets).to_numpy()
808 kappas: list[float] = []
809 for i, (_t, corr_n) in enumerate(cor.items()):
810 mask = np.isfinite(prices_num[i])
811 if not mask.any():
812 kappas.append(float(np.nan))
813 continue
814 matrix = shrink2id(corr_n, lamb=self.cfg.shrink)[np.ix_(mask, mask)]
815 _v, mat = valid(matrix)
816 if not _v.any():
817 kappas.append(float(np.nan))
818 continue
819 kappas.append(float(np.linalg.cond(mat)))
821 return pl.DataFrame({"date": self.prices["date"], "condition_number": pl.Series(kappas, dtype=pl.Float64)})
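Shrinkage is what keeps κ bounded for near-collinear assets. A numpy-only sketch (the `shrink_sketch` helper is an assumed stand-in for `shrink2id`, using the convention that λ retains the correlation matrix):

```python
import numpy as np

def shrink_sketch(corr, lamb):
    # Assumed convention: lamb retains corr, (1 - lamb) pulls towards identity.
    return lamb * corr + (1.0 - lamb) * np.eye(corr.shape[0])

corr = np.array([[1.0, 0.99], [0.99, 1.0]])  # nearly collinear pair
kappa_raw = np.linalg.cond(corr)             # large: eigenvalues 1.99 and 0.01
kappa_shrunk = np.linalg.cond(shrink_sketch(corr, lamb=0.5))
```

Halving λ here pulls the small eigenvalue away from zero, so the shrunk κ is orders of magnitude smaller than the raw one.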
823 @property
824 def effective_rank(self) -> pl.DataFrame:
825 r"""Effective rank of the shrunk correlation matrix C_shrunk at each timestamp.
827 Measures the effective dimensionality of the portfolio by computing the
828 entropy-based effective rank:
830 .. math::
832 \text{eff\_rank} = \exp\!\left(-\sum_i p_i \ln p_i\right),
833 \quad p_i = \frac{\lambda_i}{\sum_j \lambda_j}
835 where :math:`\lambda_i` are the eigenvalues of ``C_shrunk`` (restricted
836 to assets with finite prices at that timestamp). A value equal to the
837 number of assets indicates a perfectly uniform spectrum (identity-like
838 matrix); a value of 1 indicates a rank-1 matrix dominated by one
839 direction.
841 Returns:
842 pl.DataFrame: Two-column DataFrame ``{'date': ..., 'effective_rank': ...}``.
843 """
844 cor = self.cor
845 assets = self.assets
846 prices_num = self.prices.select(assets).to_numpy()
848 ranks: list[float] = []
849 for i, (_t, corr_n) in enumerate(cor.items()):
850 mask = np.isfinite(prices_num[i])
851 if not mask.any():
852 ranks.append(float(np.nan))
853 continue
854 matrix = shrink2id(corr_n, lamb=self.cfg.shrink)[np.ix_(mask, mask)]
855 _v, mat = valid(matrix)
856 if not _v.any():
857 ranks.append(float(np.nan))
858 continue
859 eigvals = np.linalg.eigvalsh(mat)
860 eigvals = np.clip(eigvals, 0.0, None)
861 total = eigvals.sum()
862 if total <= 0.0:
863 ranks.append(float(np.nan))
864 continue
865 p = eigvals / total
866 p_pos = p[p > 0.0]
867 entropy = float(-np.sum(p_pos * np.log(p_pos)))
868 ranks.append(float(np.exp(entropy)))
870 return pl.DataFrame({"date": self.prices["date"], "effective_rank": pl.Series(ranks, dtype=pl.Float64)})
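The two extremes described in the docstring (uniform spectrum vs. one dominant direction) can be verified with a minimal numpy sketch of the entropy formula:

```python
import numpy as np

def effective_rank(mat):
    """Entropy-based effective rank: exp(-sum p_i ln p_i) over eigenvalue weights."""
    eig = np.clip(np.linalg.eigvalsh(mat), 0.0, None)
    p = eig / eig.sum()
    p = p[p > 0.0]
    return float(np.exp(-np.sum(p * np.log(p))))

r_id = effective_rank(np.eye(3))           # uniform spectrum → 3.0
r_rank1 = effective_rank(np.ones((3, 3)))  # single dominant direction → 1.0
```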
872 @property
873 def solver_residual(self) -> pl.DataFrame:
874 r"""Per-timestamp solver residual ``‖C_shrunk·x - μ‖₂``.
876 After solving the normalised linear system ``C_shrunk · x = μ`` at
877 each timestamp, this property reports the Euclidean residual norm.
878 For a well-posed, well-conditioned system the residual is near machine
879 epsilon; large values flag numerical difficulties (near-singular
880 matrices, extreme condition numbers, or solver fall-back to LU).
882 Returns:
883 pl.DataFrame: Two-column DataFrame ``{'date': ..., 'residual': ...}``.
884 Zero is returned when ``μ`` is the zero vector (no solve is
885 performed). ``NaN`` is returned when no asset has finite prices.
886 """
887 cor = self.cor
888 assets = self.assets
889 mu_np = self.mu.select(assets).to_numpy()
890 prices_num = self.prices.select(assets).to_numpy()
892 residuals: list[float] = []
893 for i, (t, corr_n) in enumerate(cor.items()):
894 mask = np.isfinite(prices_num[i])
895 if not mask.any():
896 residuals.append(float(np.nan))
897 continue
898 matrix = shrink2id(corr_n, lamb=self.cfg.shrink)[np.ix_(mask, mask)]
899 expected_mu = np.nan_to_num(mu_np[i][mask])
900 if np.allclose(expected_mu, 0.0):
901 residuals.append(0.0)
902 continue
903 try:
904 x = solve(matrix, expected_mu)
905 except SingularMatrixError:
906 # The (shrunk) covariance matrix is degenerate — e.g. the EWM
907 # window has not yet accumulated enough data, or the asset
908 # returns are collinear. The linear system has no unique
909 # solution, so the residual norm is undefined and NaN is the
910 # correct sentinel.
911 _logger.warning(
912 "solver_residual: SingularMatrixError at t=%s - covariance matrix is "
913 "degenerate; residual set to NaN.",
914 t,
915 )
916 residuals.append(float(np.nan))
917 continue
918 finite_x = np.isfinite(x)
919 if not finite_x.any():
920 residuals.append(float(np.nan))
921 continue
922 residuals.append(
923 float(np.linalg.norm(matrix[np.ix_(finite_x, finite_x)] @ x[finite_x] - expected_mu[finite_x]))
924 )
926 return pl.DataFrame({"date": self.prices["date"], "residual": pl.Series(residuals, dtype=pl.Float64)})
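For a small, well-conditioned system the residual norm is indeed tiny (on the order of machine epsilon), as the docstring claims; a quick numpy check:

```python
import numpy as np

# Residual ‖C·x − μ‖₂ after a direct solve of a well-conditioned 2×2 system.
c = np.array([[1.0, 0.3], [0.3, 1.0]])
mu = np.array([0.5, -0.2])
x = np.linalg.solve(c, mu)
residual = float(np.linalg.norm(c @ x - mu))  # near machine epsilon
```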
928 @property
929 def signal_utilisation(self) -> pl.DataFrame:
930 r"""Per-asset signal utilisation: fraction of μ_i surviving the correlation filter.
932 For each asset *i* and timestamp *t*, computes
934 .. math::
936 u_i = \frac{(C_{\text{shrunk}}^{-1}\,\mu)_i}{\mu_i}
938 where :math:`C_{\text{shrunk}}^{-1}\,\mu` is the unnormalised solve
939 result. When :math:`C = I` (identity) all assets have utilisation 1.
940 Off-diagonal correlations attenuate some assets (:math:`u_i < 1`) and
941 may amplify negatively correlated ones (:math:`u_i > 1`).
943 A value of ``0.0`` is returned when the entire signal vector
944 :math:`\mu` is near zero at that timestamp (no solve is performed).
945 ``NaN`` is returned for individual assets where :math:`|\mu_i|` falls
946 below the ``1e-14`` tolerance or where prices are unavailable.
948 Returns:
949 pl.DataFrame: DataFrame with columns ``['date'] + assets``.
950 """
951 cor = self.cor
952 assets = self.assets
953 mu_np = self.mu.select(assets).to_numpy()
954 prices_num = self.prices.select(assets).to_numpy()
956 _mu_tol = 1e-14 # treat |μ_i| below this as zero to avoid spurious large ratios
957 n_assets = len(assets)
958 util_np = np.full((self.prices.height, n_assets), np.nan)
960 for i, (t, corr_n) in enumerate(cor.items()):
961 mask = np.isfinite(prices_num[i])
962 if not mask.any():
963 continue
964 matrix = shrink2id(corr_n, lamb=self.cfg.shrink)[np.ix_(mask, mask)]
965 expected_mu = np.nan_to_num(mu_np[i][mask])
966 if np.allclose(expected_mu, 0.0):
967 util_np[i, mask] = 0.0
968 continue
969 try:
970 x = solve(matrix, expected_mu)
971 except SingularMatrixError:
972 # The (shrunk) covariance matrix is degenerate — the portfolio
973 # allocation is undefined at this timestamp. All asset
974 # utilisations remain NaN (the array was pre-filled with NaN),
975 # which correctly signals that no valid ratio could be computed.
976 _logger.warning(
977 "signal_utilisation: SingularMatrixError at t=%s - covariance matrix is "
978 "degenerate; utilisation set to NaN.",
979 t,
980 )
981 continue
982 with np.errstate(divide="ignore", invalid="ignore"):
983 ratio = np.where(np.abs(expected_mu) > _mu_tol, x / expected_mu, np.nan)
984 util_np[i, mask] = ratio
986 return self.prices.with_columns([pl.lit(util_np[:, j]).alias(asset) for j, asset in enumerate(assets)])
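The identity and positive-correlation cases from the docstring are easy to reproduce directly with numpy (no shrinkage applied in this toy sketch):

```python
import numpy as np

# u_i = (C⁻¹ μ)_i / μ_i. Identity correlation leaves the signal untouched;
# a shared signal on positively correlated assets is attenuated (u_i < 1).
mu = np.array([1.0, 1.0])
u_identity = np.linalg.solve(np.eye(2), mu) / mu   # all ones
c = np.array([[1.0, 0.5], [0.5, 1.0]])
u_correlated = np.linalg.solve(c, mu) / mu         # both 2/3
```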
988 @property
989 def portfolio(self) -> Portfolio:
990 """Construct a Portfolio from the optimized cash positions.
992 Converts the computed cash positions into a Portfolio using the
993 configured AUM.
995 Returns:
996 Portfolio: Instance built from cash positions with AUM scaling.
997 """
998 cp = self.cash_position
999 assets = [c for c in cp.columns if c != "date" and cp[c].dtype.is_numeric()]
1000 scaled = cp.with_columns(pl.col(a) * self.cfg.position_scale for a in assets)
1001 return Portfolio.from_cash_position(self.prices, scaled, aum=self.cfg.aum)
1003 def sharpe_at_shrink(self, shrink: float) -> float:
1004 r"""Return the annualised portfolio Sharpe ratio for the given shrinkage weight.
1006 Constructs a new :class:`BasanosEngine` with all parameters identical to
1007 ``self`` except that ``cfg.shrink`` is replaced by ``shrink``, then
1008 returns the annualised Sharpe ratio of the resulting portfolio.
1010 This is the canonical single-argument callable required by the benchmarks
1011 specification: ``f(λ) → Sharpe``. Use it to sweep λ across ``[0, 1]``
1012 and measure whether correlation adjustment adds value over the
1013 signal-proportional baseline (λ = 0) or the unregularised limit (λ = 1).
1015 Corner cases:
1016 * **λ = 0** — the shrunk matrix equals the identity, so the
1017 optimiser treats all assets as uncorrelated and positions are
1018 purely signal-proportional (no correlation adjustment).
1019 * **λ = 1** — the raw EWMA correlation matrix is used without
1020 shrinkage.
1022 Args:
1023 shrink: Retention weight λ ∈ [0, 1]. See
1024 :attr:`BasanosConfig.shrink` for full documentation.
1026 Returns:
1027 Annualised Sharpe ratio of the portfolio returns as a ``float``.
1028 Returns ``float("nan")`` when the Sharpe ratio cannot be computed
1029 (e.g. zero-variance returns).
1031 Raises:
1032 ValidationError: When ``shrink`` is outside [0, 1] (delegated to
1033 :class:`BasanosConfig` field validation).
1035 Examples:
1036 >>> import numpy as np
1037 >>> import polars as pl
1038 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
1039 >>> dates = pl.Series("date", list(range(200)))
1040 >>> rng = np.random.default_rng(0)
1041 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
1042 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
1043 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
1044 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
1045 >>> s = engine.sharpe_at_shrink(0.5)
1046 >>> isinstance(s, float)
1047 True
1048 """
1049 new_cfg = self.cfg.model_copy(update={"shrink": shrink})
1050 engine = BasanosEngine(prices=self.prices, mu=self.mu, cfg=new_cfg)
1051 sharpe_val = engine.portfolio.stats.sharpe().get("returns")
1052 return float(sharpe_val) if sharpe_val is not None else float("nan")  # `or` would map a valid 0.0 Sharpe to NaN
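The two λ corner cases listed in the docstring follow directly from the assumed retention convention ``C_shrunk = λ·C + (1 − λ)·I``; a toy numpy check:

```python
import numpy as np

# Endpoints of the retention weight λ (assumed shrinkage convention).
corr = np.array([[1.0, 0.7], [0.7, 1.0]])
lam0 = 0.0 * corr + (1.0 - 0.0) * np.eye(2)  # λ = 0 → identity, no adjustment
lam1 = 1.0 * corr + (1.0 - 1.0) * np.eye(2)  # λ = 1 → raw EWMA correlation
```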
1053 @property
1054 def naive_sharpe(self) -> float:
1055 r"""Sharpe ratio of the naïve equal-weight signal (μ = 1 for every asset/timestamp).
1057 Replaces the expected-return signal ``mu`` with a constant matrix of
1058 ones, then runs the optimiser with the current configuration and returns
1059 the annualised Sharpe ratio of the resulting portfolio.
1061 This provides the baseline answer to *"does the signal add value?"*:
1062 a real signal should produce a higher Sharpe than the naïve benchmark.
1063 Combined with :meth:`sharpe_at_shrink`, this yields a three-way
1064 comparison:
1066 +----------------------------------+--------------------------------------------+
1067 | Benchmark                        | What it measures                           |
1068 +==================================+============================================+
1069 | ``naive_sharpe``                 | No signal skill; pure correlation routing  |
1070 +----------------------------------+--------------------------------------------+
1071 | ``sharpe_at_shrink(0.0)``        | Signal skill, no correlation adjustment    |
1072 +----------------------------------+--------------------------------------------+
1073 | ``sharpe_at_shrink(cfg.shrink)`` | Signal + correlation adjustment            |
1074 +----------------------------------+--------------------------------------------+
1076 Returns:
1077 Annualised Sharpe ratio of the equal-weight portfolio as a ``float``.
1078 Returns ``float("nan")`` when the Sharpe ratio cannot be computed.
1080 Examples:
1081 >>> import numpy as np
1082 >>> import polars as pl
1083 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
1084 >>> dates = pl.Series("date", list(range(200)))
1085 >>> rng = np.random.default_rng(0)
1086 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
1087 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
1088 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
1089 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
1090 >>> s = engine.naive_sharpe
1091 >>> isinstance(s, float)
1092 True
1093 """
1094 naive_mu = self.mu.with_columns(pl.lit(1.0).alias(asset) for asset in self.assets)
1095 engine = BasanosEngine(prices=self.prices, mu=naive_mu, cfg=self.cfg)
1096 sharpe_val = engine.portfolio.stats.sharpe().get("returns")
1097 return float(sharpe_val) if sharpe_val is not None else float("nan")  # `or` would map a valid 0.0 Sharpe to NaN
1098 def _ic_series(self, use_rank: bool) -> pl.DataFrame:
1099 """Compute the cross-sectional IC time series.
1101 For each timestamp *t* (from 0 to T-2), correlates the signal vector
1102 ``mu[t, :]`` with the one-period forward return vector
1103 ``prices[t+1, :] / prices[t, :] - 1`` across all assets where both
1104 quantities are finite. When fewer than two valid asset pairs are
1105 available, the IC value is set to ``NaN``.
1107 Args:
1108 use_rank: When ``True`` the Spearman rank correlation is used
1109 (Rank IC); when ``False`` the Pearson correlation is used (IC).
1111 Returns:
1112 pl.DataFrame: Two-column frame with ``date`` (signal date) and
1113 either ``ic`` or ``rank_ic``.
1114 """
1115 assets = self.assets
1116 prices_np = self.prices.select(assets).to_numpy().astype(float)
1117 mu_np = self.mu.select(assets).to_numpy().astype(float)
1118 dates = self.prices["date"].to_list()
1120 col_name = "rank_ic" if use_rank else "ic"
1121 ic_values: list[float] = []
1122 ic_dates: list = []
1124 for t in range(len(dates) - 1):
1125 fwd_ret = prices_np[t + 1] / prices_np[t] - 1.0
1126 signal = mu_np[t]
1128 # Both signal and forward return must be finite
1129 mask = np.isfinite(signal) & np.isfinite(fwd_ret)
1130 n_valid = int(mask.sum())
1132 if n_valid < 2:
1133 ic_values.append(float("nan"))
1134 elif use_rank:
1135 corr, _ = spearmanr(signal[mask], fwd_ret[mask])
1136 ic_values.append(float(corr))
1137 else:
1138 ic_values.append(float(np.corrcoef(signal[mask], fwd_ret[mask])[0, 1]))
1140 ic_dates.append(dates[t])
1142 return pl.DataFrame({"date": ic_dates, col_name: pl.Series(ic_values, dtype=pl.Float64)})
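The per-timestamp correlation step can be sketched standalone; `pearson_ic` mirrors the masking-and-`corrcoef` logic above, while `rank_ic` approximates the Spearman variant as Pearson on ranks (a no-NaN, no-ties sketch, unlike the `scipy.stats.spearmanr` call used in the method):

```python
import numpy as np

def pearson_ic(signal, fwd_ret):
    """Cross-sectional Pearson IC for one timestamp (finite pairs only)."""
    mask = np.isfinite(signal) & np.isfinite(fwd_ret)
    if int(mask.sum()) < 2:
        return float("nan")  # fewer than two valid pairs → undefined
    return float(np.corrcoef(signal[mask], fwd_ret[mask])[0, 1])

def rank_ic(signal, fwd_ret):
    """Spearman Rank IC as Pearson on ranks (assumes finite, tie-free inputs)."""
    def ranks(a):
        return np.argsort(np.argsort(a)).astype(float)
    return pearson_ic(ranks(signal), ranks(fwd_ret))

signal = np.array([0.2, -0.1, 0.4])
fwd_ret = np.array([0.01, -0.02, 0.03])  # same ordering as the signal
```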
1144 @property
1145 def ic(self) -> pl.DataFrame:
1146 """Cross-sectional Pearson Information Coefficient (IC) time series.
1148 For each timestamp *t* (excluding the last), computes the Pearson
1149 correlation between the signal ``mu[t, :]`` and the one-period forward
1150 return ``prices[t+1, :] / prices[t, :] - 1`` across all assets where
1151 both quantities are finite.
1153 An IC value close to +1 means the signal was strongly positively
1154 correlated with forward returns across assets; close to -1 means the
1155 opposite; near 0 means no linear predictive relationship.
1157 Returns:
1158 pl.DataFrame: Frame with columns ``['date', 'ic']``. ``date`` is
1159 the timestamp at which the signal was observed. ``ic`` is a
1160 ``Float64`` series (``NaN`` when fewer than 2 valid asset pairs
1161 are available for a given timestamp).
1163 See Also:
1164 :py:attr:`rank_ic` — Spearman variant, more robust to outliers.
1165 :py:attr:`ic_mean`, :py:attr:`ic_std`, :py:attr:`icir` — summary
1166 statistics.
1167 """
1168 return self._ic_series(use_rank=False)
1170 @property
1171 def rank_ic(self) -> pl.DataFrame:
1172 """Cross-sectional Spearman Rank Information Coefficient time series.
1174 Identical to :py:attr:`ic` but uses the Spearman rank correlation
1175 instead of the Pearson correlation, making it more robust to fat-tailed
1176 return distributions and outliers.
1178 Returns:
1179 pl.DataFrame: Frame with columns ``['date', 'rank_ic']``.
1180 ``rank_ic`` is a ``Float64`` series.
1182 See Also:
1183 :py:attr:`ic` — Pearson variant.
1184 :py:attr:`rank_ic_mean`, :py:attr:`rank_ic_std` — summary
1185 statistics.
1186 """
1187 return self._ic_series(use_rank=True)
1189 @property
1190 def ic_mean(self) -> float:
1191 """Mean of the IC time series, ignoring NaN values.
1193 Returns:
1194 float: Arithmetic mean of all finite IC values, or ``NaN`` if
1195 no finite values exist.
1196 """
1197 arr = self.ic["ic"].drop_nulls().to_numpy()
1198 finite = arr[np.isfinite(arr)]
1199 return float(np.mean(finite)) if len(finite) > 0 else float("nan")
1201 @property
1202 def ic_std(self) -> float:
1203 """Standard deviation of the IC time series, ignoring NaN values.
1205 Uses ``ddof=1`` (sample standard deviation).
1207 Returns:
1208 float: Sample standard deviation of all finite IC values, or
1209 ``NaN`` if fewer than 2 finite values exist.
1210 """
1211 arr = self.ic["ic"].drop_nulls().to_numpy()
1212 finite = arr[np.isfinite(arr)]
1213 return float(np.std(finite, ddof=1)) if len(finite) > 1 else float("nan")
1215 @property
1216 def icir(self) -> float:
1217 """Information Coefficient Information Ratio (ICIR).
1219 Defined as ``IC mean / IC std``. A higher absolute ICIR indicates a
1220 more consistent signal: the mean IC is large relative to its
1221 variability.
1223 Returns:
1224 float: ``ic_mean / ic_std``, or ``NaN`` when ``ic_std`` is zero
1225 or non-finite.
1226 """
1227 mean = self.ic_mean
1228 std = self.ic_std
1229 if not np.isfinite(std) or std == 0.0:
1230 return float("nan")
1231 return float(mean / std)
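A small numpy sketch of the ICIR definition on a toy IC series (made-up values), using the same NaN filtering and ``ddof=1`` as the properties above:

```python
import numpy as np

# ICIR = mean(IC) / std(IC): consistency of the signal's predictive power.
ic_series = np.array([0.05, 0.02, np.nan, 0.04, 0.01])
finite = ic_series[np.isfinite(ic_series)]
icir = float(np.mean(finite) / np.std(finite, ddof=1))  # mean 0.03, std ≈ 0.018
```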
1233 @property
1234 def rank_ic_mean(self) -> float:
1235 """Mean of the Rank IC time series, ignoring NaN values.
1237 Returns:
1238 float: Arithmetic mean of all finite Rank IC values, or ``NaN``
1239 if no finite values exist.
1240 """
1241 arr = self.rank_ic["rank_ic"].drop_nulls().to_numpy()
1242 finite = arr[np.isfinite(arr)]
1243 return float(np.mean(finite)) if len(finite) > 0 else float("nan")
1245 @property
1246 def rank_ic_std(self) -> float:
1247 """Standard deviation of the Rank IC time series, ignoring NaN values.
1249 Uses ``ddof=1`` (sample standard deviation).
1251 Returns:
1252 float: Sample standard deviation of all finite Rank IC values, or
1253 ``NaN`` if fewer than 2 finite values exist.
1254 """
1255 arr = self.rank_ic["rank_ic"].drop_nulls().to_numpy()
1256 finite = arr[np.isfinite(arr)]
1257 return float(np.std(finite, ddof=1)) if len(finite) > 1 else float("nan")
1259 @property
1260 def config_report(self) -> "ConfigReport":
1261 """Return a :class:`~basanos.math._config_report.ConfigReport` facade for this engine.
1263 Returns a :class:`~basanos.math._config_report.ConfigReport` that
1264 includes the full **lambda-sweep chart** — an interactive plot of the
1265 annualised Sharpe ratio as :attr:`~BasanosConfig.shrink` (λ) is swept
1266 across [0, 1] — in addition to the parameter table, shrinkage-guidance
1267 table, and theory section available from
1268 :attr:`BasanosConfig.report`.
1270 Returns:
1271 basanos.math._config_report.ConfigReport: Report facade with
1272 ``to_html()`` and ``save()`` methods.
1274 Examples:
1275 >>> import numpy as np
1276 >>> import polars as pl
1277 >>> from basanos.math.optimizer import BasanosConfig, BasanosEngine
1278 >>> dates = pl.Series("date", list(range(200)))
1279 >>> rng = np.random.default_rng(0)
1280 >>> prices = pl.DataFrame({"date": dates, "A": rng.lognormal(size=200), "B": rng.lognormal(size=200)})
1281 >>> mu = pl.DataFrame({"date": dates, "A": rng.normal(size=200), "B": rng.normal(size=200)})
1282 >>> cfg = BasanosConfig(vola=10, corr=20, clip=3.0, shrink=0.5, aum=1e6)
1283 >>> engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
1284 >>> report = engine.config_report
1285 >>> html = report.to_html()
1286 >>> "Lambda" in html
1287 True
1288 """
1289 from ._config_report import ConfigReport
1291 return ConfigReport(config=self.cfg, engine=self)