Coverage for src/basanos/math/_signal.py: 100% (10 statements). coverage.py v7.13.5, created at 2026-03-19 05:23 +0000

1"""Internal signal utilities (private to basanos.math). 

2 

3This module contains low-level helpers for building signals and 

4transformations. It is considered an internal implementation detail of 

5``basanos.math``. Do not import this module directly from outside the 

6package; instead import the public symbols from ``basanos.math``. 

7""" 

8 

9from __future__ import annotations 

10 

11import numpy as np 

12import polars as pl 

13 

14 

def shrink2id(matrix: np.ndarray, lamb: float = 1.0) -> np.ndarray:
    r"""Shrink a square matrix linearly towards the identity matrix.

    This implements the **convex linear shrinkage** estimator

    .. math::

        \hat{\Sigma}(\lambda) = \lambda \cdot M + (1 - \lambda) \cdot I_n

    where :math:`M` is the sample matrix, :math:`I_n` is the :math:`n \times n`
    identity matrix, and :math:`\lambda \in [0, 1]` is the *retention weight*
    (equivalently, ``1 - lamb`` is the *shrinkage intensity*).


    **Why shrink toward the identity?**

    Sample covariance/correlation matrices estimated from a finite number of
    observations :math:`T` are poorly conditioned when the number of assets
    :math:`n` is large relative to :math:`T`. This is the classical
    *curse of dimensionality*: extreme eigenvalues of the sample matrix are
    biased away from their population counterparts (the Marchenko-Pastur law
    describes the bias as a function of the concentration ratio :math:`n / T`).
    Shrinkage pulls eigenvalues toward a common target, here the identity's
    unit eigenvalue, reducing estimation error at the cost of a small bias [1]_.


    **Relationship to Ledoit-Wolf shrinkage**

    Ledoit and Wolf (2004) [2]_ derive the *optimal* scalar shrinkage
    intensity :math:`\alpha^*` by minimizing the expected Frobenius loss
    :math:`\mathbb{E}[\|\hat{\Sigma}(\alpha) - \Sigma\|_F^2]` under a
    general factor model. Their closed-form estimator is a special case of
    this function with ``lamb = 1 - alpha*``. The Oracle Approximating
    Shrinkage (OAS) estimator [3]_ improves on Ledoit-Wolf by accounting for
    the bias in the analytic formula, often yielding better finite-sample
    performance.


    **Basanos usage**

    In Basanos the target matrix is always the *correlation* identity (diagonal
    ones, off-diagonal zeros), and ``lamb`` is supplied via
    :attr:`~basanos.math.BasanosConfig.shrink` as a user-controlled
    hyperparameter rather than an analytically chosen optimal value. This is
    appropriate when *regularising a solver* (the system
    :math:`C x = \mu` must be well-posed at every timestamp) rather than
    *estimating a covariance matrix*; here practical stability often matters
    more than minimum Frobenius loss.


    **Empirical guidance for choosing** ``lamb`` **(= cfg.shrink)**

    The table below offers practical starting points for daily financial return
    data. All recommendations should be validated on out-of-sample data.

    +--------------------------+---------------------------+------------------+
    | Regime                   | Suggested ``lamb``        | Rationale        |
    +==========================+===========================+==================+
    | Many assets, short       | 0.3 - 0.6                 | High             |
    | lookback (n/T > 0.5)     |                           | concentration    |
    |                          |                           | ratio; sample    |
    |                          |                           | matrix is noisy. |
    +--------------------------+---------------------------+------------------+
    | Moderate assets,         | 0.5 - 0.8                 | Balanced         |
    | moderate lookback        |                           | regularisation.  |
    | (n/T ~ 0.1 - 0.5)        |                           |                  |
    +--------------------------+---------------------------+------------------+
    | Few assets, long         | 0.7 - 1.0                 | Sample matrix    |
    | lookback (n/T < 0.1)     |                           | is reliable;     |
    |                          |                           | light shrinkage  |
    |                          |                           | for robustness.  |
    +--------------------------+---------------------------+------------------+

    A simple heuristic: start with ``lamb = 1 - n / (2 * T)``, where
    ``n`` is the number of assets and ``T`` is the EWMA correlation lookback
    (``cfg.corr``), a rough approximation of the Ledoit-Wolf formula, then
    tune on held-out data.


    **Sensitivity note**

    Shrinkage is most sensitive in the range :math:`\lambda \in [0.3, 0.8]`.
    Above ~0.8 the matrix can become nearly singular for larger portfolios
    with short lookbacks (``n > 10`` with ``corr < 50``); below ~0.3 the
    off-diagonal correlations are so heavily damped that the optimizer behaves
    almost as if assets were uncorrelated.


    Args:
        matrix: Square matrix to shrink (typically a correlation matrix).
        lamb: Retention weight :math:`\lambda \in [0, 1]`. ``1.0`` returns
            the original matrix unchanged; ``0.0`` returns the identity.

    Returns:
        The shrunk matrix with the same shape as ``matrix``.

    References:
        .. [1] Stein, C. (1956). *Inadmissibility of the usual estimator for
           the mean of a multivariate normal distribution.* Proceedings
           of the Third Berkeley Symposium, 1, 197-206.
        .. [2] Ledoit, O., & Wolf, M. (2004). *A well-conditioned estimator for
           large-dimensional covariance matrices.* Journal of Multivariate
           Analysis, 88(2), 365-411.
           https://doi.org/10.1016/S0047-259X(03)00096-4
        .. [3] Chen, Y., Wiesel, A., Eldar, Y. C., & Hero, A. O. (2010).
           *Shrinkage algorithms for MMSE covariance estimation.* IEEE
           Transactions on Signal Processing, 58(10), 5016-5029.
           https://doi.org/10.1109/TSP.2010.2053029

    Examples:
        >>> import numpy as np
        >>> # Full retention: original matrix unchanged
        >>> shrink2id(np.array([[2.0, 1.0], [1.0, 3.0]]), lamb=1.0).tolist()
        [[2.0, 1.0], [1.0, 3.0]]
        >>> # Full shrinkage: identity matrix
        >>> shrink2id(np.array([[2.0, 0.0], [0.0, 2.0]]), lamb=0.0).tolist()
        [[1.0, 0.0], [0.0, 1.0]]
        >>> # Half-way: average of matrix and identity
        >>> m = np.array([[2.0, 1.0], [1.0, 3.0]])
        >>> shrink2id(m, lamb=0.5).tolist()
        [[1.5, 0.5], [0.5, 2.0]]
    """

    return matrix * lamb + (1 - lamb) * np.eye(N=matrix.shape[0])

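A standalone numeric check of the conditioning argument above (a sketch, not part of the module; ``shrink2id`` is restated inline so the snippet runs on its own, and the ``n``, ``T`` values are illustrative):

```python
import numpy as np


def shrink2id(matrix: np.ndarray, lamb: float = 1.0) -> np.ndarray:
    # Convex combination of the sample matrix and the identity.
    return matrix * lamb + (1 - lamb) * np.eye(matrix.shape[0])


rng = np.random.default_rng(0)
n, T = 20, 30  # n / T > 0.5: high concentration ratio, noisy sample matrix
X = rng.standard_normal((T, n))
corr = np.corrcoef(X, rowvar=False)  # sample correlation, badly conditioned

# Heuristic starting point from the docstring: lamb = 1 - n / (2 * T)
lamb = 1 - n / (2 * T)

cond_before = np.linalg.cond(corr)
cond_after = np.linalg.cond(shrink2id(corr, lamb=lamb))
assert cond_after < cond_before  # shrinkage improves conditioning

# Eigenvalues are interpolated toward 1 (the identity's eigenvalue):
# eig(shrunk) = lamb * eig(corr) + (1 - lamb), in the shared eigenbasis.
evals = np.linalg.eigvalsh(corr)
evals_shrunk = np.linalg.eigvalsh(shrink2id(corr, lamb=lamb))
assert np.allclose(evals_shrunk, lamb * evals + (1 - lamb))
```

Because the identity commutes with everything, shrinkage acts purely on the eigenvalues and leaves the eigenvectors of the sample matrix untouched, which is why the conditioning improvement is guaranteed whenever the smallest eigenvalue is below 1.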

def vol_adj(x: pl.Expr, vola: int, clip: float, min_samples: int = 1) -> pl.Expr:
    """Compute clipped, volatility-adjusted log returns per column.

    - ``vola`` controls the EWM std smoothing: it is converted internally to
      ``com = vola - 1``, i.e. ``alpha = 1 / vola``.
    - ``clip`` applies symmetric clipping to the standardized returns.

    Args:
        x: Polars expression (price series) to transform.
        vola: EWMA lookback for the std estimate (centre-of-mass based;
            ``com = vola - 1``).
        clip: Symmetric clipping threshold applied after standardization.
        min_samples: Minimum samples required by EWM to yield non-null values.

    Returns:
        A Polars expression with standardized and clipped log returns.

    Examples:
        >>> import polars as pl
        >>> df = pl.DataFrame({"p": [1.0, 1.1, 1.05, 1.15, 1.2]})
        >>> result = df.select(vol_adj(pl.col("p"), vola=2, clip=3.0))
        >>> result.shape
        (5, 1)
    """

    # compute the log returns
    log_returns = x.log().diff()

    # compute the EWM volatility of the log returns
    vol = log_returns.ewm_std(com=vola - 1, adjust=True, min_samples=min_samples)

    # standardize by the volatility and clip symmetrically
    vol_adj_returns = (log_returns / vol).clip(-clip, clip)

    return vol_adj_returns
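For readers without Polars at hand, the same pipeline can be sketched in plain NumPy. This is a simplified rendition, not the module's implementation: it uses the pandas/polars convention ``alpha = 1 / (1 + com)`` (so ``com = vola - 1`` gives ``alpha = 1 / vola``) and computes the *biased* EWM variance for brevity, whereas ``ewm_std``'s default ``bias=False`` debiases.

```python
import numpy as np


def vol_adj_np(prices: np.ndarray, vola: int, clip: float) -> np.ndarray:
    """NumPy sketch of vol_adj: log-diff -> EWM std -> standardize -> clip."""
    r = np.diff(np.log(prices))  # log returns (length len(prices) - 1)
    alpha = 1.0 / vola           # com = vola - 1  =>  alpha = 1 / (1 + com)
    sd = np.empty_like(r)
    for t in range(len(r)):
        # "adjust=True" weighting: explicit decaying weights, O(T^2) but clear
        w = (1.0 - alpha) ** np.arange(t, -1, -1)
        mu = np.sum(w * r[: t + 1]) / w.sum()
        sd[t] = np.sqrt(np.sum(w * (r[: t + 1] - mu) ** 2) / w.sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        z = r / sd               # standardized returns
    return np.clip(z, -clip, clip)  # symmetric clipping


z = vol_adj_np(np.array([1.0, 1.1, 1.05, 1.15, 1.2]), vola=2, clip=3.0)
assert z.shape == (4,)  # one fewer than prices (diff drops one element)
assert np.all(np.abs(z[np.isfinite(z)]) <= 3.0)
```

The sketch is for intuition only; the Polars expression above is the production path (note it keeps a leading null from ``diff``, so its output has the same length as the input, unlike ``np.diff``).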