Streaming API

BasanosStream provides an incremental, one-row-at-a-time interface for computing optimised positions in real time without re-running the full batch engine.
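Each row's position, as the sliding-window branch of the source listing below computes it, comes from solving C x = mu, normalising x by sqrt(mu · x), and dividing by the per-asset volatility. EWMA mode applies `shrink2id` to the correlation matrix before solving. The following is a minimal dense NumPy sketch of that arithmetic. `cash_position` is a hypothetical helper, `np.linalg.solve` stands in for the library's factor-model and Cholesky solvers, and the source's separate zero-signal and degenerate-denominator branches are collapsed into a single all-zero return.

```python
import numpy as np


def cash_position(
    corr: np.ndarray, mu: np.ndarray, vola: np.ndarray, denom_tol: float = 1e-12
) -> np.ndarray:
    """Hypothetical one-row solve: normalised C^{-1} mu, divided by volatility."""
    x = np.linalg.solve(corr, mu)  # solve C x = mu
    denom = float(np.sqrt(max(0.0, float(mu @ x))))  # sqrt(mu' C^{-1} mu)
    if not np.isfinite(denom) or denom <= denom_tol:
        return np.zeros_like(mu, dtype=float)  # degenerate signal: flat positions
    risk_pos = x / denom  # unit-risk position: risk_pos' C risk_pos == 1
    return risk_pos / vola  # convert risk units to cash units


corr = np.array([[1.0, 0.3], [0.3, 1.0]])
mu = np.array([0.5, -0.2])
vola = np.array([0.02, 0.03])
cash = cash_position(corr, mu, vola)
```

The normalisation makes the risk-space position have unit variance under C, so the scale of `mu` does not change the size of the position, only its direction.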

BasanosStream

`basanos.math.BasanosStream`

Incremental (streaming) optimiser backed by a single `_StreamState`.

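After warming up on a historical batch via `from_warmup`, each call to `step` advances the internal state by exactly one row in O(N^2) time, without revisiting the warmup history. The position solve on top of that update is separate: an O(N^3) solve on the shrunk correlation matrix in EWMA mode, or a factor-model fit and Woodbury solve in sliding-window mode.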
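The per-row state cost stays flat because `step` carries exponentially weighted sums forward instead of recomputing them from the warmup history. Below is a minimal NumPy sketch of that recurrence, assuming only the update visible in `step` (`s_x = beta * s_x + x` and `s_w = beta * s_w + 1` for finite observations, with `beta = (vola - 1) / vola`). `ewm_step` and `ewm_batch` are hypothetical helpers, not basanos functions. The sketch checks that the incremental state equals a from-scratch recomputation.

```python
import numpy as np


def ewm_step(s_x: np.ndarray, s_w: np.ndarray, x: np.ndarray, beta: float):
    """One constant-cost update; NaN observations add nothing to either sum."""
    finite = np.isfinite(x)
    s_x = beta * s_x + np.where(finite, x, 0.0)
    s_w = beta * s_w + finite.astype(float)
    return s_x, s_w


def ewm_batch(xs: np.ndarray, beta: float):
    """Recompute the same sums over the full history (cost grows with rows)."""
    n = xs.shape[0]
    weights = beta ** np.arange(n - 1, -1, -1)  # newest row gets weight 1
    finite = np.isfinite(xs)
    s_x = (weights[:, None] * np.where(finite, xs, 0.0)).sum(axis=0)
    s_w = (weights[:, None] * finite).sum(axis=0)
    return s_x, s_w


rng = np.random.default_rng(0)
xs = rng.normal(size=(50, 3))
xs[0] = np.nan  # leading NaN, like the first return after a diff
beta = (5 - 1) / 5  # vola = 5

s_x, s_w = np.zeros(3), np.zeros(3)
for row in xs:
    s_x, s_w = ewm_step(s_x, s_w, row, beta)
b_x, b_w = ewm_batch(xs, beta)
ewm_mean = s_x / s_w
```

Dividing `s_x` by `s_w` gives the exponentially weighted mean; the stream keeps analogous sums of squares and of squared weights so that a standard deviation can be recovered the same way.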

Attributes:

| Name | Type | Description |
|---|---|---|
| `assets` | `list[str]` | Ordered list of asset column names (read-only). |

Examples:

>>> import numpy as np
>>> import polars as pl
>>> from datetime import date, timedelta
>>> from basanos.math import BasanosConfig, BasanosStream
>>> from basanos.math._stream import StepResult
>>> rng = np.random.default_rng(0)
>>> warmup_len = 60
>>> dates = pl.date_range(
...     start=date(2024, 1, 1),
...     end=date(2024, 1, 1) + timedelta(days=warmup_len),
...     interval="1d",
...     eager=True,
... )
>>> prices = pl.DataFrame({
...     "date": dates,
...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, warmup_len + 1)) * 100.0,
...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, warmup_len + 1)) * 150.0,
... })
>>> mu = pl.DataFrame({
...     "date": dates,
...     "A": rng.normal(0, 0.5, warmup_len + 1),
...     "B": rng.normal(0, 0.5, warmup_len + 1),
... })
>>> cfg = BasanosConfig(vola=5, corr=10, clip=3.0, shrink=0.5, aum=1e6)
>>> stream = BasanosStream.from_warmup(prices.head(warmup_len), mu.head(warmup_len), cfg)
>>> result = stream.step(
...     prices.select(["A", "B"]).to_numpy()[warmup_len],
...     mu.select(["A", "B"]).to_numpy()[warmup_len],
...     prices["date"][warmup_len],
... )
>>> isinstance(result, StepResult)
True
>>> result.cash_position.shape
(2,)
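Whether a `step` call returns a warmup result depends only on `step_count`, which `from_warmup` initialises to the number of warmup rows, and on the mode's threshold (`corr` for EWMA, the window length for sliding-window mode). In the example above `step_count` starts at 60, so if that configuration uses EWMA shrinkage (an assumption; the default covariance mode is not shown here) the threshold `corr=10` is already exceeded. The sketch below uses hypothetical helpers (`warmup_calls`, `seed_window`, `push_row`) to count the remaining warmup calls and to replay how a short history is NaN-padded into a sliding window and flushed out one row per call.

```python
import numpy as np


def warmup_calls(step_count: int, threshold: int) -> int:
    """Hypothetical helper: how many upcoming step() calls still report warmup."""
    return max(0, threshold - step_count)


def seed_window(history: np.ndarray, window: int) -> np.ndarray:
    """Keep the last `window` rows; NaN-pad the front when history is shorter."""
    n_rows, n_assets = history.shape
    if n_rows >= window:
        return history[-window:].copy()
    buf = np.full((window, n_assets), np.nan)
    buf[window - n_rows:] = history  # real rows fill the tail of the buffer
    return buf


def push_row(buf: np.ndarray, row: np.ndarray) -> None:
    """Shift the window up one row in place and append `row` at the bottom."""
    buf[:-1] = buf[1:]
    buf[-1] = row


# A 3-row history with a 5-row window: two warmup calls flush the padding.
window = 5
buf = seed_window(np.ones((3, 2)), window)
pending = warmup_calls(step_count=3, threshold=window)
for _ in range(pending):
    push_row(buf, np.full(2, 2.0))
```

The padding disappears on exactly the call after which `step_count` reaches the threshold, which is why the first non-warmup result sees only real rows.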

Source code in src/basanos/math/_stream.py

class BasanosStream:
    """Incremental (streaming) optimiser backed by a single `_StreamState`.

    After warming up on a historical batch via `from_warmup`, each call
    to `step` advances the internal state by exactly one row in
    O(N^2) time — without revisiting the full warmup history.

    Attributes:
        assets: Ordered list of asset column names (read-only).

    Examples:
        >>> import numpy as np
        >>> import polars as pl
        >>> from datetime import date, timedelta
        >>> from basanos.math import BasanosConfig, BasanosStream
        >>> rng = np.random.default_rng(0)
        >>> warmup_len = 60
        >>> dates = pl.date_range(
        ...     start=date(2024, 1, 1),
        ...     end=date(2024, 1, 1) + timedelta(days=warmup_len),
        ...     interval="1d",
        ...     eager=True,
        ... )
        >>> prices = pl.DataFrame({
        ...     "date": dates,
        ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, warmup_len + 1)) * 100.0,
        ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, warmup_len + 1)) * 150.0,
        ... })
        >>> mu = pl.DataFrame({
        ...     "date": dates,
        ...     "A": rng.normal(0, 0.5, warmup_len + 1),
        ...     "B": rng.normal(0, 0.5, warmup_len + 1),
        ... })
        >>> cfg = BasanosConfig(vola=5, corr=10, clip=3.0, shrink=0.5, aum=1e6)
        >>> stream = BasanosStream.from_warmup(prices.head(warmup_len), mu.head(warmup_len), cfg)
        >>> result = stream.step(
        ...     prices.select(["A", "B"]).to_numpy()[warmup_len],
        ...     mu.select(["A", "B"]).to_numpy()[warmup_len],
        ...     prices["date"][warmup_len],
        ... )
        >>> isinstance(result, StepResult)
        True
        >>> result.cash_position.shape
        (2,)
    """

    _cfg: BasanosConfig
    _assets: list[str]
    _state: _StreamState

    def __init__(self, cfg: BasanosConfig, assets: list[str], state: _StreamState) -> None:
        """Initialise from an explicit config, asset list, and state container."""
        object.__setattr__(self, "_cfg", cfg)
        object.__setattr__(self, "_assets", assets)
        object.__setattr__(self, "_state", state)

    def __setattr__(self, name: str, value: object) -> None:
        """Prevent accidental attribute mutation — BasanosStream is immutable."""
        raise dataclasses.FrozenInstanceError(f"{type(self).__name__}.{name}")

    @property
    def assets(self) -> list[str]:
        """Ordered list of asset column names."""
        return self._assets

    # ------------------------------------------------------------------
    # from_warmup
    # ------------------------------------------------------------------

    @classmethod
    def from_warmup(
        cls,
        prices: pl.DataFrame,
        mu: pl.DataFrame,
        cfg: BasanosConfig,
    ) -> BasanosStream:
        """Build a `BasanosStream` from a historical warmup batch.

        Runs `BasanosEngine` on the full warmup batch
        exactly once and extracts the minimal IIR-filter state required for
        subsequent `step` calls.  After this call, each `step`
        advances the optimiser in O(N^2) time without touching the warmup
        data again.

        Parameters
        ----------
        prices:
            Historical price DataFrame.  Must contain a ``'date'`` column and
            at least one numeric asset column with strictly positive,
            non-monotonic values.
        mu:
            Expected-return signal DataFrame aligned row-by-row with
            ``prices``.
        cfg:
            Engine configuration.  Both `EwmaShrinkConfig`
            and `SlidingWindowConfig` are supported.

        Returns
        -------
        BasanosStream
            A stream instance whose `step` method is ready to accept the
            row immediately following the last warmup row.

        Notes
        -----
        **Short-warmup behaviour with** ``SlidingWindowConfig``: when
        ``len(prices) < cfg.covariance_config.window``, the internal rolling
        buffer (``sw_ret_buf``) is NaN-padded for the missing prefix rows.
        `step` returns ``StepResult(status="warmup")`` for each of the
        first ``window - len(prices)`` calls, exactly matching the EWM warmup
        semantics.  By the time `step` returns the first non-warmup
        result the buffer contains only real data — no NaN-padded rows remain.

        Raises
        ------
        MissingDateColumnError
            If ``'date'`` is absent from ``prices``.
        """
        # 1. Validate -------------------------------------------------------
        if "date" not in prices.columns:
            raise MissingDateColumnError("prices")

        # 2. Build the engine on the full warmup batch ----------------------
        # Import here to avoid a circular dependency at module level.
        from .optimizer import BasanosEngine

        engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
        assets = engine.assets
        n_assets = len(assets)
        n_rows = prices.height
        prices_np = prices.select(assets).to_numpy()  # (n_rows, n_assets)

        # 3. Extract mode-specific state from WarmupState --------------------
        ws = engine.warmup_state()
        if isinstance(cfg.covariance_config, EwmaShrinkConfig):
            # EWM: seed the per-step lfilter from the IIR state captured
            # during the single batch pass in warmup_state().
            iir = cast("_EwmCorrState", ws.corr_iir_state)
            corr_zi_x = iir.corr_zi_x
            corr_zi_x2 = iir.corr_zi_x2
            corr_zi_xy = iir.corr_zi_xy
            corr_zi_w = iir.corr_zi_w
            corr_count: np.ndarray = iir.count
            sw_ret_buf: np.ndarray | None = None
        else:
            # SW: carry the last W vol-adjusted returns as a rolling buffer.
            # The IIR fields are initialised to zeros and left unused.
            sw_config = cast(SlidingWindowConfig, cfg.covariance_config)
            win_w = sw_config.window
            ret_adj_np = engine.ret_adj.select(assets).to_numpy()  # (n_rows, N)
            if n_rows >= win_w:
                sw_ret_buf = ret_adj_np[-win_w:].copy()
            else:
                sw_ret_buf = np.full((win_w, n_assets), np.nan)
                sw_ret_buf[-n_rows:] = ret_adj_np
            corr_zi_x = np.zeros((1, n_assets, n_assets))
            corr_zi_x2 = np.zeros((1, n_assets, n_assets))
            corr_zi_xy = np.zeros((1, n_assets, n_assets))
            corr_zi_w = np.zeros((1, n_assets, n_assets))
            corr_count = np.zeros((n_assets, n_assets), dtype=np.int64)

        # 4. Derive EWMA volatility accumulators (vectorised) ---------------
        # Both log-return (for vol_adj) and pct-return (for vola) use the
        # same beta = (vola-1)/vola.  NaN observations (leading NaN at row 0
        # from diff/pct_change) are skipped — the filter input is 0 for NaN
        # rows and the weight accumulator (s_w) only increments for finite
        # observations, matching Polars' effective behaviour for a
        # leading-NaN series.
        #
        # Delegate to the shared helper _ewm_vol_accumulators_from_batch so
        # that the batch and incremental recurrences share a single definition.
        beta_vola: float = (cfg.vola - 1) / cfg.vola
        beta_vola_sq: float = beta_vola**2

        log_ret = np.full((n_rows, n_assets), np.nan, dtype=float)
        pct_ret = np.full((n_rows, n_assets), np.nan, dtype=float)
        if n_rows > 1:
            with np.errstate(divide="ignore", invalid="ignore"):
                log_ret[1:] = np.log(prices_np[1:] / prices_np[:-1])
                pct_ret[1:] = prices_np[1:] / prices_np[:-1] - 1.0

        vola_s_x, vola_s_x2, vola_s_w, vola_s_w2, vola_count = _ewm_vol_accumulators_from_batch(
            log_ret, beta_vola, beta_vola_sq
        )
        pct_s_x, pct_s_x2, pct_s_w, pct_s_w2, pct_count = _ewm_vol_accumulators_from_batch(
            pct_ret, beta_vola, beta_vola_sq
        )

        # 5. Extract prev_cash_pos from WarmupState --------------------------
        prev_cash_pos: np.ndarray = ws.prev_cash_pos
        prev_price: np.ndarray = prices_np[-1].copy()

        # 6. Construct _StreamState and return ------------------------------
        state = _StreamState(
            corr_zi_x=corr_zi_x,
            corr_zi_x2=corr_zi_x2,
            corr_zi_xy=corr_zi_xy,
            corr_zi_w=corr_zi_w,
            corr_count=corr_count,
            vola_s_x=vola_s_x,
            vola_s_x2=vola_s_x2,
            vola_s_w=vola_s_w,
            vola_s_w2=vola_s_w2,
            vola_count=vola_count,
            pct_s_x=pct_s_x,
            pct_s_x2=pct_s_x2,
            pct_s_w=pct_s_w,
            pct_s_w2=pct_s_w2,
            pct_count=pct_count,
            prev_price=prev_price,
            prev_cash_pos=prev_cash_pos,
            step_count=n_rows,
            sw_ret_buf=sw_ret_buf,
        )
        return cls(cfg=cfg, assets=assets, state=state)

    # ------------------------------------------------------------------
    # step
    # ------------------------------------------------------------------

    @staticmethod
    def _warmup_threshold(cfg: BasanosConfig) -> int:
        """Return the step count at which warmup ends for the configured mode."""
        if isinstance(cfg.covariance_config, SlidingWindowConfig):
            return cfg.covariance_config.window
        return cfg.corr

    @staticmethod
    def _persist_state(
        state: _StreamState,
        *,
        corr_zi_x: np.ndarray,
        corr_zi_x2: np.ndarray,
        corr_zi_xy: np.ndarray,
        corr_zi_w: np.ndarray,
        corr_count: np.ndarray,
        vola_s_x: np.ndarray,
        vola_s_x2: np.ndarray,
        vola_s_w: np.ndarray,
        vola_s_w2: np.ndarray,
        vola_count: np.ndarray,
        pct_s_x: np.ndarray,
        pct_s_x2: np.ndarray,
        pct_s_w: np.ndarray,
        pct_s_w2: np.ndarray,
        pct_count: np.ndarray,
        new_price: np.ndarray,
        new_cash_pos: np.ndarray | None = None,
    ) -> None:
        """Persist accumulators, last-seen vectors, and increment step count."""
        state.corr_zi_x = corr_zi_x
        state.corr_zi_x2 = corr_zi_x2
        state.corr_zi_xy = corr_zi_xy
        state.corr_zi_w = corr_zi_w
        state.corr_count = corr_count
        state.vola_s_x = vola_s_x
        state.vola_s_x2 = vola_s_x2
        state.vola_s_w = vola_s_w
        state.vola_s_w2 = vola_s_w2
        state.vola_count = vola_count
        state.pct_s_x = pct_s_x
        state.pct_s_x2 = pct_s_x2
        state.pct_s_w = pct_s_w
        state.pct_s_w2 = pct_s_w2
        state.pct_count = pct_count
        state.prev_price = new_price.copy()
        if new_cash_pos is not None:
            state.prev_cash_pos = new_cash_pos.copy()
        state.step_count += 1

    @staticmethod
    def _warmup_result(n_assets: int, date: Any) -> StepResult:
        """Build a standard warmup ``StepResult`` payload."""
        return StepResult(
            date=date,
            cash_position=np.full(n_assets, np.nan),
            status=SolveStatus.WARMUP,
            vola=np.full(n_assets, np.nan),
        )

    def _solve_sliding_window_position(
        self,
        *,
        cfg: BasanosConfig,
        state: _StreamState,
        mask: np.ndarray,
        new_m: np.ndarray,
        vola_vec: np.ndarray,
        n_assets: int,
        date: Any,
    ) -> tuple[np.ndarray, SolveStatus]:
        """Solve one step in SlidingWindow mode and return cash position + status."""
        from ..exceptions import SingularMatrixError

        new_cash_pos = np.full(n_assets, np.nan, dtype=float)
        status = SolveStatus.DEGENERATE
        sw_config = cast(SlidingWindowConfig, cfg.covariance_config)
        if not mask.any():
            return new_cash_pos, status

        win_w = sw_config.window
        win_k = sw_config.n_factors
        window_ret = np.where(
            np.isfinite(state.sw_ret_buf[:, mask]),  # type: ignore[index]
            state.sw_ret_buf[:, mask],  # type: ignore[index]
            0.0,
        )
        n_sub = int(mask.sum())
        k_eff = min(win_k, win_w, n_sub)
        if sw_config.max_components is not None:
            k_eff = min(k_eff, sw_config.max_components)
        try:
            fm = FactorModel.from_returns(window_ret, k=k_eff)
        except (np.linalg.LinAlgError, ValueError) as exc:
            _logger.debug("Sliding window SVD failed at date=%s: %s", date, exc)
            new_cash_pos[mask] = 0.0
            return new_cash_pos, status

        expected_mu = np.nan_to_num(new_m[mask])
        if np.allclose(expected_mu, 0.0):
            new_cash_pos[mask] = 0.0
            return new_cash_pos, SolveStatus.ZERO_SIGNAL

        try:
            x = fm.solve(expected_mu)
            denom_val = float(np.sqrt(max(0.0, float(np.dot(expected_mu, x)))))
        except (SingularMatrixError, np.linalg.LinAlgError) as exc:
            _logger.warning("Woodbury solve failed at date=%s: %s", date, exc)
            new_cash_pos[mask] = 0.0
            return new_cash_pos, status

        if not np.isfinite(denom_val) or denom_val <= cfg.denom_tol:
            _logger.warning(
                "Positions zeroed at date=%s (sliding_window): normalisation "
                "denominator degenerate (denom=%s, denom_tol=%s).",
                date,
                denom_val,
                cfg.denom_tol,
            )
            new_cash_pos[mask] = 0.0
            return new_cash_pos, status

        risk_pos = x / denom_val
        vola_sub = vola_vec[mask]
        with np.errstate(invalid="ignore"):
            new_cash_pos[mask] = risk_pos / vola_sub
        return new_cash_pos, SolveStatus.VALID

    @staticmethod
    def _solve_ewma_position(
        *,
        cfg: BasanosConfig,
        state: _StreamState,
        mask: np.ndarray,
        new_m: np.ndarray,
        vola_vec: np.ndarray,
        y_x: np.ndarray,
        y_x2: np.ndarray,
        y_xy: np.ndarray,
        y_w: np.ndarray,
        corr_count: np.ndarray,
        n_assets: int,
        date: Any,
    ) -> tuple[np.ndarray, SolveStatus]:
        """Solve one step in EWMA mode and return cash position + status."""
        new_cash_pos = np.full(n_assets, np.nan, dtype=float)
        corr = _corr_from_ewm_accumulators(
            y_x[0],
            y_x2[0],
            y_xy[0],
            y_w[0],
            corr_count,
            min_periods=cfg.corr,
            min_corr_denom=cfg.min_corr_denom,
        )
        matrix = shrink2id(corr, lamb=cfg.shrink)
        expected_mu, early = _SolveMixin._row_early_check(state.step_count, date, mask, new_m)
        if early is not None:
            _, _, _, pos, status = early
            new_cash_pos[mask] = pos
            return new_cash_pos, status

        corr_sub = matrix[np.ix_(mask, mask)]
        _, _, _, pos, status = _SolveMixin._compute_position(
            state.step_count, date, mask, expected_mu, MatrixBundle(matrix=corr_sub), cfg.denom_tol
        )
        if status == SolveStatus.VALID:
            new_cash_pos[mask] = _SolveMixin._scale_to_cash(cast(np.ndarray, pos), vola_vec[mask])
        else:
            new_cash_pos[mask] = pos
        return new_cash_pos, status

    def step(
        self,
        new_prices: np.ndarray | dict[str, float],
        new_mu: np.ndarray | dict[str, float],
        date: Any = None,
    ) -> StepResult:
        """Advance the stream by one row and return the new optimised position.

        Parameters
        ----------
        new_prices:
            Per-asset prices for the new timestep.  Either a numpy array of
            shape ``(N,)`` (assets ordered as in `assets`) or a dict
            mapping asset names to price values.
        new_mu:
            Per-asset expected-return signals, same format as ``new_prices``.
        date:
            Timestamp for this step, stored verbatim in the returned
            ``StepResult``; it does not affect the computed positions.

        Returns
        -------
        StepResult
            Frozen dataclass with ``cash_position``, ``vola``, ``status``, and
            ``date`` for this timestep.
        """
        cfg = self._cfg
        assets = self._assets
        state = self._state
        n_assets = len(assets)

        # ── Check if still in the warmup period ──────────────────────────────
        # step_count is initialised to n_rows in from_warmup.
        #
        # EwmaShrinkConfig: in_warmup is True for the first (cfg.corr - n_rows)
        # calls when the warmup batch was shorter than cfg.corr (not enough rows
        # to populate the EWM correlation matrix).
        #
        # SlidingWindowConfig: in_warmup is True for the first (window - n_rows)
        # calls when the warmup batch was shorter than the window.  During this
        # period sw_ret_buf still contains NaN-padded prefix rows; each step
        # shifts one NaN out and appends a real row, so the buffer is fully
        # populated with real data exactly when in_warmup becomes False.
        #
        # In both modes all accumulators are still updated during warmup so that
        # the state is ready the moment the warmup period ends.
        _warmup_thresh = self._warmup_threshold(cfg)
        in_warmup: bool = state.step_count < _warmup_thresh

        # ── Resolve inputs to (N,) float64 arrays ──────────────────────────
        new_p = _resolve_step_vector(new_prices, assets, n_assets, "new_prices")
        new_m = _resolve_step_vector(new_mu, assets, n_assets, "new_mu")

        prev_p = state.prev_price
        beta_vola: float = (cfg.vola - 1) / cfg.vola
        beta_vola_sq: float = beta_vola**2
        beta_corr: float = cfg.corr / (1.0 + cfg.corr)

        # ── Compute new log-returns and pct-returns ─────────────────────────
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(
                np.isfinite(new_p) & np.isfinite(prev_p) & (prev_p > 0),
                new_p / prev_p,
                np.nan,
            )
            log_ret = np.log(ratio)
            pct_ret = ratio - 1.0

        # ── Update log-return EWMA accumulators ────────────────────────────
        fin_log = np.isfinite(log_ret)
        vola_s_x = beta_vola * state.vola_s_x + np.where(fin_log, log_ret, 0.0)
        vola_s_x2 = beta_vola * state.vola_s_x2 + np.where(fin_log, log_ret**2, 0.0)
        vola_s_w = beta_vola * state.vola_s_w + fin_log.astype(float)
        vola_s_w2 = beta_vola_sq * state.vola_s_w2 + fin_log.astype(float)
        vola_count = state.vola_count + fin_log.astype(int)

        # ── Update pct-return EWMA accumulators ────────────────────────────
        fin_pct = np.isfinite(pct_ret)
        pct_s_x = beta_vola * state.pct_s_x + np.where(fin_pct, pct_ret, 0.0)
        pct_s_x2 = beta_vola * state.pct_s_x2 + np.where(fin_pct, pct_ret**2, 0.0)
        pct_s_w = beta_vola * state.pct_s_w + fin_pct.astype(float)
        pct_s_w2 = beta_vola_sq * state.pct_s_w2 + fin_pct.astype(float)
        pct_count = state.pct_count + fin_pct.astype(int)

        # ── Compute vol-adjusted return (for the correlation IIR input) ─────
        log_vol = _ewm_std_from_state(vola_s_x, vola_s_x2, vola_s_w, vola_s_w2, vola_count, min_samples=1)
        # Divide; std == 0 yields ±inf → clipped to ±cfg.clip (matches Polars)
        with np.errstate(divide="ignore", invalid="ignore"):
            vol_adj_val = np.where(
                fin_log,
                np.clip(log_ret / log_vol, -cfg.clip, cfg.clip),
                np.nan,
            )

        # ── Mode-specific correlation state update ───────────────────────────
        if isinstance(cfg.covariance_config, SlidingWindowConfig):
            # SW: shift the rolling window buffer in-place and append this row.
            # The corr_zi_* fields are unused; alias them to their old values so
            # the early-return and persist blocks below can reference them safely.
            buf = state.sw_ret_buf  # (W, N), already owned by state
            buf[:-1] = buf[1:]  # type: ignore[index]
            buf[-1] = vol_adj_val  # type: ignore[index]
            corr_zi_x = state.corr_zi_x
            corr_zi_x2 = state.corr_zi_x2
            corr_zi_xy = state.corr_zi_xy
            corr_zi_w = state.corr_zi_w
            corr_count = state.corr_count
        else:
            # EWM: Update IIR filter state for EWM correlation
            fin_va = np.isfinite(vol_adj_val)
            va_f = np.where(fin_va, vol_adj_val, 0.0)
            joint_fin = fin_va[:, np.newaxis] & fin_va[np.newaxis, :]  # (N, N)

            new_v_x = (va_f[:, np.newaxis] * joint_fin)[np.newaxis]  # (1, N, N)
            new_v_x2 = ((va_f**2)[:, np.newaxis] * joint_fin)[np.newaxis]  # (1, N, N)
            new_v_xy = (va_f[:, np.newaxis] * va_f[np.newaxis, :])[np.newaxis]  # (1, N, N)
            new_v_w = joint_fin.astype(np.float64)[np.newaxis]  # (1, N, N)

            filt_a_corr = np.array([1.0, -beta_corr])
            # y_x[0] is the current-step EWM state (filter output); corr_zi_x is
            # the new filter memory (zf = beta * y[0]) passed as zi next step.
            y_x, corr_zi_x = lfilter([1.0], filt_a_corr, new_v_x, axis=0, zi=state.corr_zi_x)
            y_x2, corr_zi_x2 = lfilter([1.0], filt_a_corr, new_v_x2, axis=0, zi=state.corr_zi_x2)
            y_xy, corr_zi_xy = lfilter([1.0], filt_a_corr, new_v_xy, axis=0, zi=state.corr_zi_xy)
            y_w, corr_zi_w = lfilter([1.0], filt_a_corr, new_v_w, axis=0, zi=state.corr_zi_w)
            corr_count = state.corr_count + joint_fin.astype(np.int64)

        # ── Early return during the warmup period (both modes) ─────────────
        # All accumulators are already updated above; skip the O(N²) matrix
        # reconstruction and the position solve (O(N³) Cholesky in EWMA mode,
        # SVD factor fit in sliding-window mode), which would be wasted during
        # warmup because the computed positions are discarded anyway.
        if in_warmup:
            self._persist_state(
                state,
                corr_zi_x=corr_zi_x,
                corr_zi_x2=corr_zi_x2,
                corr_zi_xy=corr_zi_xy,
                corr_zi_w=corr_zi_w,
                corr_count=corr_count,
                vola_s_x=vola_s_x,
                vola_s_x2=vola_s_x2,
                vola_s_w=vola_s_w,
                vola_s_w2=vola_s_w2,
                vola_count=vola_count,
                pct_s_x=pct_s_x,
                pct_s_x2=pct_s_x2,
                pct_s_w=pct_s_w,
                pct_s_w2=pct_s_w2,
                pct_count=pct_count,
                new_price=new_p,
            )
            return self._warmup_result(n_assets, date)

        # ── Compute EWMA volatility (pct-return std) — shared ───────────────
        vola_vec = _ewm_std_from_state(pct_s_x, pct_s_x2, pct_s_w, pct_s_w2, pct_count, min_samples=cfg.vola)

        # ── Solve for position ───────────────────────────────────────────────
        mask = np.isfinite(new_p)
        if isinstance(cfg.covariance_config, SlidingWindowConfig):
            new_cash_pos, status = self._solve_sliding_window_position(
                cfg=cfg,
                state=state,
                mask=mask,
                new_m=new_m,
                vola_vec=vola_vec,
                n_assets=n_assets,
                date=date,
            )
        else:
            new_cash_pos, status = self._solve_ewma_position(
                cfg=cfg,
                state=state,
                mask=mask,
                new_m=new_m,
                vola_vec=vola_vec,
                y_x=y_x,
                y_x2=y_x2,
                y_xy=y_xy,
                y_w=y_w,
                corr_count=corr_count,
                n_assets=n_assets,
                date=date,
            )

        # ── Apply turnover constraint ─────────────────────────────────────────
        if cfg.max_turnover is not None and status == SolveStatus.VALID:
            new_cash_pos[mask] = _SolveMixin._apply_turnover_constraint(
                new_cash_pos[mask],
                state.prev_cash_pos[mask],
                cfg.max_turnover,
            )

        # ── Persist updated state ───────────────────────────────────────────
        self._persist_state(
            state,
            corr_zi_x=corr_zi_x,
            corr_zi_x2=corr_zi_x2,
            corr_zi_xy=corr_zi_xy,
            corr_zi_w=corr_zi_w,
            corr_count=corr_count,
            vola_s_x=vola_s_x,
            vola_s_x2=vola_s_x2,
            vola_s_w=vola_s_w,
            vola_s_w2=vola_s_w2,
            vola_count=vola_count,
            pct_s_x=pct_s_x,
            pct_s_x2=pct_s_x2,
            pct_s_w=pct_s_w,
            pct_s_w2=pct_s_w2,
            pct_count=pct_count,
            new_price=new_p,
            new_cash_pos=new_cash_pos,
        )

        return StepResult(
            date=date,
            cash_position=new_cash_pos,
            status=status,
            vola=vola_vec,
        )

    def save(self, path: str | os.PathLike[str]) -> None:
        """Serialise the stream to a ``.npz`` archive at *path*.

        All `_StreamState` arrays, the configuration, and the asset
        list are written in a single `savez` call.  A stream
        restored via `load` produces bit-for-bit identical
        `step` output.

        Args:
            path: Destination file path.  `savez` appends
                ``.npz`` automatically when the suffix is absent.

        Examples:
            >>> import tempfile, pathlib, numpy as np
            >>> import polars as pl
            >>> from datetime import date, timedelta
            >>> from basanos.math import BasanosConfig, BasanosStream
            >>> rng = np.random.default_rng(0)
            >>> n = 60
            >>> end = date(2024, 1, 1) + timedelta(days=n - 1)
            >>> dates = pl.date_range(
            ...     date(2024, 1, 1), end, interval="1d", eager=True
            ... )
            >>> prices = pl.DataFrame({
            ...     "date": dates,
            ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 100.0,
            ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 150.0,
            ... })
            >>> mu = pl.DataFrame({
            ...     "date": dates,
            ...     "A": rng.normal(0, 0.5, n),
            ...     "B": rng.normal(0, 0.5, n),
            ... })
            >>> cfg = BasanosConfig(vola=5, corr=10, clip=3.0, shrink=0.5, aum=1e6)
            >>> stream = BasanosStream.from_warmup(prices, mu, cfg)
            >>> with tempfile.TemporaryDirectory() as tmp:
            ...     p = pathlib.Path(tmp) / "stream.npz"
            ...     stream.save(p)
            ...     restored = BasanosStream.load(p)
            ...     restored.assets == stream.assets
            True
        """
        state = self._state
        # Build the per-field dict automatically from _StreamState so that any
        # new field added to the dataclass is included without manual updates.
        state_arrays: dict[str, Any] = {}
        for field in dataclasses.fields(_StreamState):
            value = getattr(state, field.name)
            if field.name == "sw_ret_buf":
                # Sentinel: use an empty (0, 0) array to represent None so the
                # key is always present in the archive and load() can detect it.
                state_arrays[field.name] = value if value is not None else np.empty((0, 0), dtype=float)
            elif field.name == "step_count":
                state_arrays[field.name] = np.array(value)
            else:
                state_arrays[field.name] = value
        np.savez(
            path,
            format_version=np.array(_SAVE_FORMAT_VERSION),
            cfg_json=np.array(self._cfg.model_dump_json()),
            assets=np.array(self._assets),
            **state_arrays,
        )

    @classmethod
    def load(cls, path: str | os.PathLike[str]) -> BasanosStream:
        """Restore a stream previously saved with `save`.

        Args:
            path: Path to a ``.npz`` archive written by `save`.

        Returns:
            A `BasanosStream` whose `step` output is
            bit-for-bit identical to the original stream at the time
            `save` was called.

        Examples:
            >>> import tempfile, pathlib, numpy as np
            >>> import polars as pl
            >>> from datetime import date, timedelta
            >>> from basanos.math import BasanosConfig, BasanosStream
            >>> rng = np.random.default_rng(1)
            >>> n = 60
            >>> end = date(2024, 1, 1) + timedelta(days=n - 1)
            >>> dates = pl.date_range(
            ...     date(2024, 1, 1), end, interval="1d", eager=True
            ... )
            >>> prices = pl.DataFrame({
            ...     "date": dates,
            ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 100.0,
            ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 150.0,
            ... })
            >>> mu = pl.DataFrame({
            ...     "date": dates,
            ...     "A": rng.normal(0, 0.5, n),
            ...     "B": rng.normal(0, 0.5, n),
            ... })
            >>> cfg = BasanosConfig(vola=5, corr=10, clip=3.0, shrink=0.5, aum=1e6)
            >>> stream = BasanosStream.from_warmup(prices, mu, cfg)
            >>> with tempfile.TemporaryDirectory() as tmp:
            ...     p = pathlib.Path(tmp) / "stream.npz"
            ...     stream.save(p)
            ...     restored = BasanosStream.load(p)
            ...     restored.assets == stream.assets
            True
        """
        with np.load(path, allow_pickle=False) as data:
            if "format_version" not in data:
                raise ValueError(  # noqa: TRY003
                    "Stream file is missing a format version tag. "
                    "It was written with an incompatible version of BasanosStream. "
                    "Re-generate it via BasanosStream.from_warmup()."
                )
            found = int(data["format_version"])
            if found != _SAVE_FORMAT_VERSION:
                raise ValueError(  # noqa: TRY003
                    f"Stream file was written with format version {found}, "
                    f"but the current version is {_SAVE_FORMAT_VERSION}. "
                    "Re-generate it via BasanosStream.from_warmup()."
                )
            # Validate that every required key is present.  This catches archives
            # that were produced by an older codebase missing a newly added field,
            # or archives that have been manually edited, with a descriptive error
            # instead of a bare KeyError.
            archive_keys = frozenset(data.files)
            missing = _REQUIRED_KEYS - archive_keys
            if missing:
                raise StreamStateCorruptError(missing)
            cfg = BasanosConfig.model_validate_json(data["cfg_json"].item())
            assets: list[str] = list(data["assets"])
            state_kwargs: dict[str, Any] = {}
            for field in dataclasses.fields(_StreamState):
                raw = data[field.name]
                if field.name == "sw_ret_buf":
                    state_kwargs[field.name] = raw if raw.size > 0 else None
                elif field.name == "step_count":
                    state_kwargs[field.name] = int(raw)
                else:
                    state_kwargs[field.name] = raw
        state = _StreamState(**state_kwargs)
        return cls(cfg=cfg, assets=assets, state=state)

`assets` `property` ¶

Ordered list of asset column names.

`__init__(cfg, assets, state)` ¶

Initialise from an explicit config, asset list, and state container.

Source code in src/basanos/math/_stream.py

def __init__(self, cfg: BasanosConfig, assets: list[str], state: _StreamState) -> None:
    """Initialise from an explicit config, asset list, and state container."""
    object.__setattr__(self, "_cfg", cfg)
    object.__setattr__(self, "_assets", assets)
    object.__setattr__(self, "_state", state)

`__setattr__(name, value)` ¶

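Prevent accidental attribute mutation. BasanosStream attributes cannot be rebound after construction; the internal `_StreamState` is still advanced by each `step` call.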

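The freezing technique used by `__init__` and `__setattr__` (write attributes through `object.__setattr__` during construction, then raise `dataclasses.FrozenInstanceError` on any later assignment) can be reproduced on a toy class. The class below is hypothetical and only illustrates the pattern. Freezing blocks attribute rebinding; it does not stop a mutable object held in an attribute from changing in place.

```python
import dataclasses


class FrozenBox:
    """Hypothetical toy class using the same freezing technique as BasanosStream."""

    def __init__(self, items: list[int]) -> None:
        # A plain assignment would hit the overridden __setattr__, so bypass it.
        object.__setattr__(self, "_items", items)

    def __setattr__(self, name: str, value: object) -> None:
        raise dataclasses.FrozenInstanceError(f"{type(self).__name__}.{name}")

    @property
    def items(self) -> list[int]:
        return self._items


box = FrozenBox([1, 2])
try:
    box.extra = 3
except dataclasses.FrozenInstanceError as exc:
    print("blocked:", exc)  # blocked: FrozenBox.extra
box.items.append(3)  # in-place mutation of the held list is not blocked
print(box.items)  # [1, 2, 3]
```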
Source code in src/basanos/math/_stream.py

def __setattr__(self, name: str, value: object) -> None:
    """Prevent accidental attribute mutation — BasanosStream is immutable."""
    raise dataclasses.FrozenInstanceError(f"{type(self).__name__}.{name}")

`from_warmup(prices, mu, cfg)` `classmethod` ¶

Build a BasanosStream from a historical warmup batch.

Runs BasanosEngine on the full warmup batch exactly once and extracts the minimal IIR-filter state required for subsequent step calls. After this call, each step advances the optimiser in O(N^2) time without touching the warmup data again.
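The "minimal IIR-filter state" idea can be shown in isolation. A first-order recursive filter y[t] = x[t] + β·y[t−1], run over a batch with `scipy.signal.lfilter`, returns its final filter memory `zf`; feeding that back as `zi` for one new row reproduces what re-filtering the whole extended series would give. This is a sketch that assumes only NumPy and SciPy. The decay β = corr/(1 + corr) mirrors `beta_corr` in `step`, and the random data are hypothetical.

```python
import numpy as np
from scipy.signal import lfilter

corr = 10
beta = corr / (1.0 + corr)  # same decay as beta_corr in step()
b, a = [1.0], [1.0, -beta]  # y[t] = x[t] + beta * y[t-1]

rng = np.random.default_rng(0)
history = rng.normal(size=(60, 3))  # hypothetical warmup rows for 3 series
new_row = rng.normal(size=(1, 3))  # the next incoming row

# Batch pass over the warmup history, starting from zero filter memory.
zi0 = np.zeros((1, 3))
y_hist, zf = lfilter(b, a, history, axis=0, zi=zi0)

# Streaming step: only the final memory zf is carried forward.
y_step, zf_next = lfilter(b, a, new_row, axis=0, zi=zf)

# Reference: filter the full extended series in one pass.
y_full, _ = lfilter(b, a, np.vstack([history, new_row]), axis=0, zi=zi0)
print(np.allclose(y_step[0], y_full[-1]))  # True
```

The carried memory is just β times the last filter output (`zf == beta * y_hist[-1]`), which is why each step costs a fixed amount of work regardless of how long the warmup history was.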

Parameters:

Name	Type	Description	Default
`prices`	`DataFrame`	Historical price DataFrame. Must contain a `'date'` column and at least one numeric asset column with strictly positive, non-monotonic values.	required
`mu`	`DataFrame`	Expected-return signal DataFrame aligned row-by-row with `prices`.	required
`cfg`	`BasanosConfig`	Engine configuration. Both `EwmaShrinkConfig` and `SlidingWindowConfig` are supported.	required

Returns:

Type	Description
`BasanosStream`	A stream instance whose `step` method is ready to accept the row immediately following the last warmup row.

Notes:

Short-warmup behaviour with `SlidingWindowConfig`: when `len(prices) < cfg.covariance_config.window`, the internal rolling buffer (`sw_ret_buf`) is NaN-padded for the missing prefix rows. `step` returns `StepResult(status="warmup")` for each of the first `window - len(prices)` calls, matching the EWM warmup semantics. By the time `step` returns its first non-warmup result, the buffer contains only real data; no NaN-padded rows remain.

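The short-warmup behaviour of the sliding-window buffer can be sketched with NumPy alone. The window W = 5 and the 3 warmup rows are hypothetical values; the shift-and-append update is the same `buf[:-1] = buf[1:]` / `buf[-1] = row` pattern used in `step`. After W − len(prices) = 2 steps, no padded rows remain. (Real vol-adjusted returns can still contain NaN entries of their own, for example the leading return of the warmup batch; the sketch uses finite data for clarity.)

```python
import numpy as np

window, n_assets = 5, 2
warmup_rows = np.arange(6, dtype=float).reshape(3, n_assets)  # 3 real rows

# NaN-pad the missing prefix rows, as from_warmup does when len(prices) < window.
buf = np.full((window, n_assets), np.nan)
buf[-len(warmup_rows):] = warmup_rows

n_warmup_steps = window - len(warmup_rows)  # steps reported as "warmup"
for k in range(n_warmup_steps):
    new_row = np.full(n_assets, 100.0 + k)  # hypothetical incoming row
    buf[:-1] = buf[1:]  # shift the window up by one row
    buf[-1] = new_row  # append the newest row

print(n_warmup_steps, np.isnan(buf).any())  # 2 False
```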
Raises:

Type	Description
`MissingDateColumnError`	If `'date'` is absent from `prices`.

Source code in src/basanos/math/_stream.py

@classmethod
def from_warmup(
    cls,
    prices: pl.DataFrame,
    mu: pl.DataFrame,
    cfg: BasanosConfig,
) -> BasanosStream:
    """Build a `BasanosStream` from a historical warmup batch.

    Runs `BasanosEngine` on the full warmup batch
    exactly once and extracts the minimal IIR-filter state required for
    subsequent `step` calls.  After this call, each `step`
    advances the optimiser in O(N^2) time without touching the warmup
    data again.

    Parameters
    ----------
    prices:
        Historical price DataFrame.  Must contain a ``'date'`` column and
        at least one numeric asset column with strictly positive,
        non-monotonic values.
    mu:
        Expected-return signal DataFrame aligned row-by-row with
        ``prices``.
    cfg:
        Engine configuration.  Both `EwmaShrinkConfig`
        and `SlidingWindowConfig` are supported.

    Returns:
    -------
    BasanosStream
        A stream instance whose `step` method is ready to accept the
        row immediately following the last warmup row.

    Notes:
    ------
    **Short-warmup behaviour with** ``SlidingWindowConfig``: when
    ``len(prices) < cfg.covariance_config.window``, the internal rolling
    buffer (``sw_ret_buf``) is NaN-padded for the missing prefix rows.
    `step` returns ``StepResult(status="warmup")`` for each of the
    first ``window - len(prices)`` calls, exactly matching the EWM warmup
    semantics.  By the time `step` returns the first non-warmup
    result the buffer contains only real data — no NaN-padded rows remain.

    Raises:
    ------
    MissingDateColumnError
        If ``'date'`` is absent from ``prices``.
    """
    # 1. Validate -------------------------------------------------------
    if "date" not in prices.columns:
        raise MissingDateColumnError("prices")

    # 2. Build the engine on the full warmup batch ----------------------
    # Import here to avoid a circular dependency at module level.
    from .optimizer import BasanosEngine

    engine = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
    assets = engine.assets
    n_assets = len(assets)
    n_rows = prices.height
    prices_np = prices.select(assets).to_numpy()  # (n_rows, n_assets)

    # 3. Extract mode-specific state from WarmupState --------------------
    ws = engine.warmup_state()
    if isinstance(cfg.covariance_config, EwmaShrinkConfig):
        # EWM: seed the per-step lfilter from the IIR state captured
        # during the single batch pass in warmup_state().
        iir = cast("_EwmCorrState", ws.corr_iir_state)
        corr_zi_x = iir.corr_zi_x
        corr_zi_x2 = iir.corr_zi_x2
        corr_zi_xy = iir.corr_zi_xy
        corr_zi_w = iir.corr_zi_w
        corr_count: np.ndarray = iir.count
        sw_ret_buf: np.ndarray | None = None
    else:
        # SW: carry the last W vol-adjusted returns as a rolling buffer.
        # The IIR fields are initialised to zeros and left unused.
        sw_config = cast(SlidingWindowConfig, cfg.covariance_config)
        win_w = sw_config.window
        ret_adj_np = engine.ret_adj.select(assets).to_numpy()  # (n_rows, N)
        if n_rows >= win_w:
            sw_ret_buf = ret_adj_np[-win_w:].copy()
        else:
            sw_ret_buf = np.full((win_w, n_assets), np.nan)
            sw_ret_buf[-n_rows:] = ret_adj_np
        corr_zi_x = np.zeros((1, n_assets, n_assets))
        corr_zi_x2 = np.zeros((1, n_assets, n_assets))
        corr_zi_xy = np.zeros((1, n_assets, n_assets))
        corr_zi_w = np.zeros((1, n_assets, n_assets))
        corr_count = np.zeros((n_assets, n_assets), dtype=np.int64)

    # 4. Derive EWMA volatility accumulators (vectorised) ---------------
    # Both log-return (for vol_adj) and pct-return (for vola) use the
    # same beta = (vola-1)/vola.  NaN observations (leading NaN at row 0
    # from diff/pct_change) are skipped — the filter input is 0 for NaN
    # rows and the weight accumulator (s_w) only increments for finite
    # observations, matching Polars' effective behaviour for a
    # leading-NaN series.
    #
    # Delegate to the shared helper _ewm_vol_accumulators_from_batch so
    # that the batch and incremental recurrences share a single definition.
    beta_vola: float = (cfg.vola - 1) / cfg.vola
    beta_vola_sq: float = beta_vola**2

    log_ret = np.full((n_rows, n_assets), np.nan, dtype=float)
    pct_ret = np.full((n_rows, n_assets), np.nan, dtype=float)
    if n_rows > 1:
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ret[1:] = np.log(prices_np[1:] / prices_np[:-1])
            pct_ret[1:] = prices_np[1:] / prices_np[:-1] - 1.0

    vola_s_x, vola_s_x2, vola_s_w, vola_s_w2, vola_count = _ewm_vol_accumulators_from_batch(
        log_ret, beta_vola, beta_vola_sq
    )
    pct_s_x, pct_s_x2, pct_s_w, pct_s_w2, pct_count = _ewm_vol_accumulators_from_batch(
        pct_ret, beta_vola, beta_vola_sq
    )

    # 5. Extract prev_cash_pos from WarmupState --------------------------
    prev_cash_pos: np.ndarray = ws.prev_cash_pos
    prev_price: np.ndarray = prices_np[-1].copy()

    # 6. Construct _StreamState and return ------------------------------
    state = _StreamState(
        corr_zi_x=corr_zi_x,
        corr_zi_x2=corr_zi_x2,
        corr_zi_xy=corr_zi_xy,
        corr_zi_w=corr_zi_w,
        corr_count=corr_count,
        vola_s_x=vola_s_x,
        vola_s_x2=vola_s_x2,
        vola_s_w=vola_s_w,
        vola_s_w2=vola_s_w2,
        vola_count=vola_count,
        pct_s_x=pct_s_x,
        pct_s_x2=pct_s_x2,
        pct_s_w=pct_s_w,
        pct_s_w2=pct_s_w2,
        pct_count=pct_count,
        prev_price=prev_price,
        prev_cash_pos=prev_cash_pos,
        step_count=n_rows,
        sw_ret_buf=sw_ret_buf,
    )
    return cls(cfg=cfg, assets=assets, state=state)

`load(path)` `classmethod` ¶

Restore a stream previously saved with save.

Parameters:

Name	Type	Description	Default
`path`	`str \| PathLike[str]`	Path to a `.npz` archive written by `save`.	required

Returns:

Type	Description
`BasanosStream`	A `BasanosStream` whose `step` output is bit-for-bit identical to the original stream at the time `save` was called.

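The checks `load` runs before rebuilding the state (version tag present, version matches, no required key missing) can be sketched with plain `np.savez` / `np.load`. The constant, key set and exception types below are hypothetical stand-ins for `_SAVE_FORMAT_VERSION`, `_REQUIRED_KEYS` and `StreamStateCorruptError`; only the order of the checks follows the source.

```python
import io

import numpy as np

SAVE_FORMAT_VERSION = 1  # hypothetical stand-in for _SAVE_FORMAT_VERSION
REQUIRED_KEYS = frozenset({"format_version", "step_count", "prev_price"})  # hypothetical


def check_archive(data) -> None:
    """Raise if the archive's version tag or required keys are wrong."""
    if "format_version" not in data:
        raise ValueError("missing format version tag")
    found = int(data["format_version"])
    if found != SAVE_FORMAT_VERSION:
        raise ValueError(f"format version {found} != {SAVE_FORMAT_VERSION}")
    missing = REQUIRED_KEYS - frozenset(data.files)
    if missing:
        # The library raises StreamStateCorruptError here; KeyError is a stand-in.
        raise KeyError(sorted(missing))


def roundtrip(**arrays) -> int:
    buf = io.BytesIO()
    np.savez(buf, **arrays)
    buf.seek(0)
    with np.load(buf, allow_pickle=False) as data:
        check_archive(data)
        return int(data["step_count"])


print(roundtrip(format_version=np.array(1), step_count=np.array(60), prev_price=np.ones(2)))  # 60
```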
Examples:

>>> import tempfile, pathlib, numpy as np
>>> import polars as pl
>>> from datetime import date, timedelta
>>> from basanos.math import BasanosConfig, BasanosStream
>>> rng = np.random.default_rng(1)
>>> n = 60
>>> end = date(2024, 1, 1) + timedelta(days=n - 1)
>>> dates = pl.date_range(
...     date(2024, 1, 1), end, interval="1d", eager=True
... )
>>> prices = pl.DataFrame({
...     "date": dates,
...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 100.0,
...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 150.0,
... })
>>> mu = pl.DataFrame({
...     "date": dates,
...     "A": rng.normal(0, 0.5, n),
...     "B": rng.normal(0, 0.5, n),
... })
>>> cfg = BasanosConfig(vola=5, corr=10, clip=3.0, shrink=0.5, aum=1e6)
>>> stream = BasanosStream.from_warmup(prices, mu, cfg)
>>> with tempfile.TemporaryDirectory() as tmp:
...     p = pathlib.Path(tmp) / "stream.npz"
...     stream.save(p)
...     restored = BasanosStream.load(p)
...     restored.assets == stream.assets
True

Source code in src/basanos/math/_stream.py

@classmethod
def load(cls, path: str | os.PathLike[str]) -> BasanosStream:
    """Restore a stream previously saved with `save`.

    Args:
        path: Path to a ``.npz`` archive written by `save`.

    Returns:
        A `BasanosStream` whose `step` output is
        bit-for-bit identical to the original stream at the time
        `save` was called.

    Examples:
        >>> import tempfile, pathlib, numpy as np
        >>> import polars as pl
        >>> from datetime import date, timedelta
        >>> from basanos.math import BasanosConfig, BasanosStream
        >>> rng = np.random.default_rng(1)
        >>> n = 60
        >>> end = date(2024, 1, 1) + timedelta(days=n - 1)
        >>> dates = pl.date_range(
        ...     date(2024, 1, 1), end, interval="1d", eager=True
        ... )
        >>> prices = pl.DataFrame({
        ...     "date": dates,
        ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 100.0,
        ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 150.0,
        ... })
        >>> mu = pl.DataFrame({
        ...     "date": dates,
        ...     "A": rng.normal(0, 0.5, n),
        ...     "B": rng.normal(0, 0.5, n),
        ... })
        >>> cfg = BasanosConfig(vola=5, corr=10, clip=3.0, shrink=0.5, aum=1e6)
        >>> stream = BasanosStream.from_warmup(prices, mu, cfg)
        >>> with tempfile.TemporaryDirectory() as tmp:
        ...     p = pathlib.Path(tmp) / "stream.npz"
        ...     stream.save(p)
        ...     restored = BasanosStream.load(p)
        ...     restored.assets == stream.assets
        True
    """
    with np.load(path, allow_pickle=False) as data:
        if "format_version" not in data:
            raise ValueError(  # noqa: TRY003
                "Stream file is missing a format version tag. "
                "It was written with an incompatible version of BasanosStream. "
                "Re-generate it via BasanosStream.from_warmup()."
            )
        found = int(data["format_version"])
        if found != _SAVE_FORMAT_VERSION:
            raise ValueError(  # noqa: TRY003
                f"Stream file was written with format version {found}, "
                f"but the current version is {_SAVE_FORMAT_VERSION}. "
                "Re-generate it via BasanosStream.from_warmup()."
            )
        # Validate that every required key is present.  This catches archives
        # that were produced by an older codebase missing a newly added field,
        # or archives that have been manually edited, with a descriptive error
        # instead of a bare KeyError.
        archive_keys = frozenset(data.files)
        missing = _REQUIRED_KEYS - archive_keys
        if missing:
            raise StreamStateCorruptError(missing)
        cfg = BasanosConfig.model_validate_json(data["cfg_json"].item())
        assets: list[str] = list(data["assets"])
        state_kwargs: dict[str, Any] = {}
        for field in dataclasses.fields(_StreamState):
            raw = data[field.name]
            if field.name == "sw_ret_buf":
                state_kwargs[field.name] = raw if raw.size > 0 else None
            elif field.name == "step_count":
                state_kwargs[field.name] = int(raw)
            else:
                state_kwargs[field.name] = raw
    state = _StreamState(**state_kwargs)
    return cls(cfg=cfg, assets=assets, state=state)

`save(path)` ¶

Serialise the stream to a .npz archive at path.

All _StreamState arrays, the configuration, and the asset list are written in a single savez call. A stream restored via load produces bit-for-bit identical step output.

Parameters:

Name	Type	Description	Default
`path`	`str \| PathLike[str]`	Destination file path. `savez` appends `.npz` automatically when the suffix is absent.	required
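`save` writes every `_StreamState` field, including `sw_ret_buf`, which is `None` in EWM mode. Because an `.npz` archive can only hold arrays when pickling is disabled, the source stores an empty `(0, 0)` array as a sentinel, and `load` maps `size == 0` back to `None`. A minimal round-trip of that convention, assuming nothing beyond NumPy (the helper names are hypothetical):

```python
import io

import numpy as np


def encode_optional(arr):
    # None cannot be stored without pickling; use an empty array instead.
    return arr if arr is not None else np.empty((0, 0), dtype=float)


def decode_optional(raw):
    return raw if raw.size > 0 else None


buf = io.BytesIO()
np.savez(buf, ewm_buf=encode_optional(None), sw_buf=encode_optional(np.ones((3, 2))))
buf.seek(0)
with np.load(buf, allow_pickle=False) as data:
    ewm = decode_optional(data["ewm_buf"])
    sw = decode_optional(data["sw_buf"])

print(ewm is None, sw.shape)  # True (3, 2)
```

Keeping the key present in both modes means `load` can treat a missing key as corruption rather than as "no buffer".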

Examples:

>>> import tempfile, pathlib, numpy as np
>>> import polars as pl
>>> from datetime import date, timedelta
>>> from basanos.math import BasanosConfig, BasanosStream
>>> rng = np.random.default_rng(0)
>>> n = 60
>>> end = date(2024, 1, 1) + timedelta(days=n - 1)
>>> dates = pl.date_range(
...     date(2024, 1, 1), end, interval="1d", eager=True
... )
>>> prices = pl.DataFrame({
...     "date": dates,
...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 100.0,
...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 150.0,
... })
>>> mu = pl.DataFrame({
...     "date": dates,
...     "A": rng.normal(0, 0.5, n),
...     "B": rng.normal(0, 0.5, n),
... })
>>> cfg = BasanosConfig(vola=5, corr=10, clip=3.0, shrink=0.5, aum=1e6)
>>> stream = BasanosStream.from_warmup(prices, mu, cfg)
>>> with tempfile.TemporaryDirectory() as tmp:
...     p = pathlib.Path(tmp) / "stream.npz"
...     stream.save(p)
...     restored = BasanosStream.load(p)
...     restored.assets == stream.assets
True

Source code in src/basanos/math/_stream.py

def save(self, path: str | os.PathLike[str]) -> None:
    """Serialise the stream to a ``.npz`` archive at *path*.

    All `_StreamState` arrays, the configuration, and the asset
    list are written in a single `savez` call.  A stream
    restored via `load` produces bit-for-bit identical
    `step` output.

    Args:
        path: Destination file path.  `savez` appends
            ``.npz`` automatically when the suffix is absent.

    Examples:
        >>> import tempfile, pathlib, numpy as np
        >>> import polars as pl
        >>> from datetime import date, timedelta
        >>> from basanos.math import BasanosConfig, BasanosStream
        >>> rng = np.random.default_rng(0)
        >>> n = 60
        >>> end = date(2024, 1, 1) + timedelta(days=n - 1)
        >>> dates = pl.date_range(
        ...     date(2024, 1, 1), end, interval="1d", eager=True
        ... )
        >>> prices = pl.DataFrame({
        ...     "date": dates,
        ...     "A": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 100.0,
        ...     "B": np.cumprod(1 + rng.normal(0.001, 0.02, n)) * 150.0,
        ... })
        >>> mu = pl.DataFrame({
        ...     "date": dates,
        ...     "A": rng.normal(0, 0.5, n),
        ...     "B": rng.normal(0, 0.5, n),
        ... })
        >>> cfg = BasanosConfig(vola=5, corr=10, clip=3.0, shrink=0.5, aum=1e6)
        >>> stream = BasanosStream.from_warmup(prices, mu, cfg)
        >>> with tempfile.TemporaryDirectory() as tmp:
        ...     p = pathlib.Path(tmp) / "stream.npz"
        ...     stream.save(p)
        ...     restored = BasanosStream.load(p)
        ...     restored.assets == stream.assets
        True
    """
    state = self._state
    # Build the per-field dict automatically from _StreamState so that any
    # new field added to the dataclass is included without manual updates.
    state_arrays: dict[str, Any] = {}
    for field in dataclasses.fields(_StreamState):
        value = getattr(state, field.name)
        if field.name == "sw_ret_buf":
            # Sentinel: use an empty (0, 0) array to represent None so the
            # key is always present in the archive and load() can detect it.
            state_arrays[field.name] = value if value is not None else np.empty((0, 0), dtype=float)
        elif field.name == "step_count":
            state_arrays[field.name] = np.array(value)
        else:
            state_arrays[field.name] = value
    np.savez(
        path,
        format_version=np.array(_SAVE_FORMAT_VERSION),
        cfg_json=np.array(self._cfg.model_dump_json()),
        assets=np.array(self._assets),
        **state_arrays,
    )

`step(new_prices, new_mu, date=None)` ¶

Advance the stream by one row and return the new optimised position.

Parameters:

Name	Type	Description	Default
`new_prices`	`ndarray \| dict[str, float]`	Per-asset prices for the new timestep: either a NumPy array of shape `(N,)` with assets ordered as in `assets`, or a dict mapping asset names to prices.	required
`new_mu`	`ndarray \| dict[str, float]`	Per-asset expected-return signals, in the same format as `new_prices`.	required
`date`	`Any`	Timestamp for this step. Stored verbatim in `StepResult.date`; not used in any computation.	`None`

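`step` accepts either an `(N,)` array in `assets` order or a dict keyed by asset name. The conversion is done by the private helper `_resolve_step_vector`, whose body is not shown on this page. The function below is a hypothetical sketch of one reasonable behaviour (reorder a dict by `assets`, reject missing names or a wrong shape), not the library's implementation.

```python
import numpy as np


def resolve_step_vector(values, assets: list[str], name: str) -> np.ndarray:
    """Hypothetical: coerce an array or {asset: value} dict to an (N,) float64 array."""
    if isinstance(values, dict):
        missing = [a for a in assets if a not in values]
        if missing:
            raise ValueError(f"{name} is missing assets: {missing}")
        # Reorder the dict values to follow the stream's asset order.
        return np.array([values[a] for a in assets], dtype=np.float64)
    arr = np.asarray(values, dtype=np.float64)
    if arr.shape != (len(assets),):
        raise ValueError(f"{name} must have shape ({len(assets)},), got {arr.shape}")
    return arr


assets = ["A", "B"]
print(resolve_step_vector({"B": 150.0, "A": 100.0}, assets, "new_prices"))  # [100. 150.]
```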
Returns:

Type	Description
`StepResult`	Frozen dataclass with `cash_position`, `vola`, `status`, and `date` for this timestep.

Source code in src/basanos/math/_stream.py

def step(
    self,
    new_prices: np.ndarray | dict[str, float],
    new_mu: np.ndarray | dict[str, float],
    date: Any = None,
) -> StepResult:
    """Advance the stream by one row and return the new optimised position.

    Parameters
    ----------
    new_prices:
        Per-asset prices for the new timestep.  Either a numpy array of
        shape ``(N,)`` (assets ordered as in `assets`) or a dict
        mapping asset names to price values.
    new_mu:
        Per-asset expected-return signals, same format as ``new_prices``.
    date:
        Timestamp for this step (stored in `date`
        verbatim; not used in any computation).

    Returns:
    -------
    StepResult
        Frozen dataclass with ``cash_position``, ``vola``, ``status``, and
        ``date`` for this timestep.
    """
    cfg = self._cfg
    assets = self._assets
    state = self._state
    n_assets = len(assets)

    # ── Check if still in the warmup period ──────────────────────────────
    # step_count is initialised to n_rows in from_warmup.
    #
    # EwmaShrinkConfig: in_warmup is True for the first (cfg.corr - n_rows)
    # calls when the warmup batch was shorter than cfg.corr (not enough rows
    # to populate the EWM correlation matrix).
    #
    # SlidingWindowConfig: in_warmup is True for the first (window - n_rows)
    # calls when the warmup batch was shorter than the window.  During this
    # period sw_ret_buf still contains NaN-padded prefix rows; each step
    # shifts one NaN out and appends a real row, so the buffer is fully
    # populated with real data exactly when in_warmup becomes False.
    #
    # In both modes all accumulators are still updated during warmup so that
    # the state is ready the moment the warmup period ends.
    _warmup_thresh = self._warmup_threshold(cfg)
    in_warmup: bool = state.step_count < _warmup_thresh

    # ── Resolve inputs to (N,) float64 arrays ──────────────────────────
    new_p = _resolve_step_vector(new_prices, assets, n_assets, "new_prices")
    new_m = _resolve_step_vector(new_mu, assets, n_assets, "new_mu")

    prev_p = state.prev_price
    beta_vola: float = (cfg.vola - 1) / cfg.vola
    beta_vola_sq: float = beta_vola**2
    beta_corr: float = cfg.corr / (1.0 + cfg.corr)

    # ── Compute new log-returns and pct-returns ─────────────────────────
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(
            np.isfinite(new_p) & np.isfinite(prev_p) & (prev_p > 0),
            new_p / prev_p,
            np.nan,
        )
        log_ret = np.log(ratio)
        pct_ret = ratio - 1.0

    # ── Update log-return EWMA accumulators ────────────────────────────
    fin_log = np.isfinite(log_ret)
    vola_s_x = beta_vola * state.vola_s_x + np.where(fin_log, log_ret, 0.0)
    vola_s_x2 = beta_vola * state.vola_s_x2 + np.where(fin_log, log_ret**2, 0.0)
    vola_s_w = beta_vola * state.vola_s_w + fin_log.astype(float)
    vola_s_w2 = beta_vola_sq * state.vola_s_w2 + fin_log.astype(float)
    vola_count = state.vola_count + fin_log.astype(int)

    # ── Update pct-return EWMA accumulators ────────────────────────────
    fin_pct = np.isfinite(pct_ret)
    pct_s_x = beta_vola * state.pct_s_x + np.where(fin_pct, pct_ret, 0.0)
    pct_s_x2 = beta_vola * state.pct_s_x2 + np.where(fin_pct, pct_ret**2, 0.0)
    pct_s_w = beta_vola * state.pct_s_w + fin_pct.astype(float)
    pct_s_w2 = beta_vola_sq * state.pct_s_w2 + fin_pct.astype(float)
    pct_count = state.pct_count + fin_pct.astype(int)

    # ── Compute vol-adjusted return (for the correlation IIR input) ─────
    log_vol = _ewm_std_from_state(vola_s_x, vola_s_x2, vola_s_w, vola_s_w2, vola_count, min_samples=1)
    # Divide; std == 0 yields ±inf → clipped to ±cfg.clip (matches Polars)
    with np.errstate(divide="ignore", invalid="ignore"):
        vol_adj_val = np.where(
            fin_log,
            np.clip(log_ret / log_vol, -cfg.clip, cfg.clip),
            np.nan,
        )

    # ── Mode-specific correlation state update ───────────────────────────
    if isinstance(cfg.covariance_config, SlidingWindowConfig):
        # SW: shift the rolling window buffer in-place and append this row.
        # The corr_zi_* fields are unused; alias them to their old values so
        # the early-return and persist blocks below can reference them safely.
        buf = state.sw_ret_buf  # (W, N), already owned by state
        buf[:-1] = buf[1:]  # type: ignore[index]
        buf[-1] = vol_adj_val  # type: ignore[index]
        corr_zi_x = state.corr_zi_x
        corr_zi_x2 = state.corr_zi_x2
        corr_zi_xy = state.corr_zi_xy
        corr_zi_w = state.corr_zi_w
        corr_count = state.corr_count
    else:
        # EWM: Update IIR filter state for EWM correlation
        fin_va = np.isfinite(vol_adj_val)
        va_f = np.where(fin_va, vol_adj_val, 0.0)
        joint_fin = fin_va[:, np.newaxis] & fin_va[np.newaxis, :]  # (N, N)

        new_v_x = (va_f[:, np.newaxis] * joint_fin)[np.newaxis]  # (1, N, N)
        new_v_x2 = ((va_f**2)[:, np.newaxis] * joint_fin)[np.newaxis]  # (1, N, N)
        new_v_xy = (va_f[:, np.newaxis] * va_f[np.newaxis, :])[np.newaxis]  # (1, N, N)
        new_v_w = joint_fin.astype(np.float64)[np.newaxis]  # (1, N, N)

        filt_a_corr = np.array([1.0, -beta_corr])
        # y_x[0] is the current-step EWM state (filter output); corr_zi_x is
        # the new filter memory (zf = beta * y[0]) passed as zi next step.
        y_x, corr_zi_x = lfilter([1.0], filt_a_corr, new_v_x, axis=0, zi=state.corr_zi_x)
        y_x2, corr_zi_x2 = lfilter([1.0], filt_a_corr, new_v_x2, axis=0, zi=state.corr_zi_x2)
        y_xy, corr_zi_xy = lfilter([1.0], filt_a_corr, new_v_xy, axis=0, zi=state.corr_zi_xy)
        y_w, corr_zi_w = lfilter([1.0], filt_a_corr, new_v_w, axis=0, zi=state.corr_zi_w)
        corr_count = state.corr_count + joint_fin.astype(np.int64)

    # ── Early return during the warmup period (EWM and SW) ──────────────
    # All accumulators are already updated above; skip the O(N²) matrix
    # reconstruction and O(N³) Cholesky solve which are wasteful during
    # warmup — the computed positions would be discarded anyway.
    if in_warmup:
        self._persist_state(
            state,
            corr_zi_x=corr_zi_x,
            corr_zi_x2=corr_zi_x2,
            corr_zi_xy=corr_zi_xy,
            corr_zi_w=corr_zi_w,
            corr_count=corr_count,
            vola_s_x=vola_s_x,
            vola_s_x2=vola_s_x2,
            vola_s_w=vola_s_w,
            vola_s_w2=vola_s_w2,
            vola_count=vola_count,
            pct_s_x=pct_s_x,
            pct_s_x2=pct_s_x2,
            pct_s_w=pct_s_w,
            pct_s_w2=pct_s_w2,
            pct_count=pct_count,
            new_price=new_p,
        )
        return self._warmup_result(n_assets, date)

    # ── Compute EWMA volatility (pct-return std) — shared ───────────────
    vola_vec = _ewm_std_from_state(pct_s_x, pct_s_x2, pct_s_w, pct_s_w2, pct_count, min_samples=cfg.vola)

    # ── Solve for position ───────────────────────────────────────────────
    mask = np.isfinite(new_p)
    if isinstance(cfg.covariance_config, SlidingWindowConfig):
        new_cash_pos, status = self._solve_sliding_window_position(
            cfg=cfg,
            state=state,
            mask=mask,
            new_m=new_m,
            vola_vec=vola_vec,
            n_assets=n_assets,
            date=date,
        )
    else:
        new_cash_pos, status = self._solve_ewma_position(
            cfg=cfg,
            state=state,
            mask=mask,
            new_m=new_m,
            vola_vec=vola_vec,
            y_x=y_x,
            y_x2=y_x2,
            y_xy=y_xy,
            y_w=y_w,
            corr_count=corr_count,
            n_assets=n_assets,
            date=date,
        )

    # ── Apply turnover constraint ─────────────────────────────────────────
    if cfg.max_turnover is not None and status == SolveStatus.VALID:
        new_cash_pos[mask] = _SolveMixin._apply_turnover_constraint(
            new_cash_pos[mask],
            state.prev_cash_pos[mask],
            cfg.max_turnover,
        )

    # ── Persist updated state ───────────────────────────────────────────
    self._persist_state(
        state,
        corr_zi_x=corr_zi_x,
        corr_zi_x2=corr_zi_x2,
        corr_zi_xy=corr_zi_xy,
        corr_zi_w=corr_zi_w,
        corr_count=corr_count,
        vola_s_x=vola_s_x,
        vola_s_x2=vola_s_x2,
        vola_s_w=vola_s_w,
        vola_s_w2=vola_s_w2,
        vola_count=vola_count,
        pct_s_x=pct_s_x,
        pct_s_x2=pct_s_x2,
        pct_s_w=pct_s_w,
        pct_s_w2=pct_s_w2,
        pct_count=pct_count,
        new_price=new_p,
        new_cash_pos=new_cash_pos,
    )

    return StepResult(
        date=date,
        cash_position=new_cash_pos,
        status=status,
        vola=vola_vec,
    )

StepResult¶

`basanos.math.StepResult` `dataclass` ¶

Frozen dataclass representing the output of a single BasanosStream step.

Each call to BasanosStream.step() returns one StepResult capturing the optimised cash positions, the per-asset volatility estimate, the step date, and a status label that describes the solver outcome for that timestep.

Attributes:

Name	Type	Description
`date`	`object`	The timestamp or date label for this step. The type mirrors whatever is stored in the `'date'` column of the input prices DataFrame (typically a Python `date`, `datetime`, or a Polars temporal scalar).
`cash_position`	`ndarray`	Optimised cash-position vector, shape `(N,)`. Entries are `NaN` for assets that are still in the EWMA warmup period or that are otherwise inactive at this step.
`status`	`SolveStatus`	Solver outcome label for this timestep. Since `SolveStatus` is a `StrEnum`, values compare equal to their string equivalents (e.g. `result.status == "valid"` is `True`). Possible values: `'warmup'`: fewer rows have been seen than the warmup period requires; all positions are `NaN`. `'zero_signal'`: the expected-return signal vector `mu` is identically zero; positions are set to zero rather than solved. `'degenerate'`: the covariance matrix is ill-conditioned or numerically singular; positions cannot be computed reliably and are returned as `NaN`. `'valid'`: normal operation; `cash_position` holds the optimised allocations.
`vola`	`ndarray`	Per-asset EWMA percentage-return volatility, shape `(N,)`. Values are `NaN` during the warmup period before the EWMA has accumulated sufficient history.

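`vola` is derived from running accumulators rather than from stored history: `step` decays `s_x`, `s_x2` and `s_w` by β = (vola − 1)/vola and `s_w2` by β², then passes them to `_ewm_std_from_state`. That helper's body is not shown here. The sketch below assumes it computes a bias-corrected weighted standard deviation (the correction is what makes `s_w2` necessary) and returns NaN until `min_samples` observations have been seen. It checks the one-step recursion against an explicit weighted sum on hypothetical random returns.

```python
import numpy as np


def ewm_std_from_state(s_x, s_x2, s_w, s_w2, count, min_samples):
    """Assumed formula: bias-corrected EWM std from decayed running sums."""
    mean = s_x / s_w
    biased_var = s_x2 / s_w - mean**2
    var = biased_var * s_w**2 / (s_w**2 - s_w2)  # reliability-weights correction
    std = np.sqrt(np.maximum(var, 0.0))
    return np.where(count >= min_samples, std, np.nan)


vola = 5
beta = (vola - 1) / vola
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, size=30)

# Incremental update, one observation at a time.
s_x = s_x2 = s_w = s_w2 = 0.0
count = 0
for r in returns:
    s_x = beta * s_x + r
    s_x2 = beta * s_x2 + r**2
    s_w = beta * s_w + 1.0
    s_w2 = beta**2 * s_w2 + 1.0
    count += 1
incremental = ewm_std_from_state(s_x, s_x2, s_w, s_w2, count, min_samples=vola)

# Direct computation: weight beta**k on the k-th most recent return.
w = beta ** np.arange(len(returns))[::-1]
m = np.sum(w * returns) / w.sum()
direct = np.sqrt(np.sum(w * (returns - m) ** 2) / (w.sum() - np.sum(w**2) / w.sum()))
print(np.isclose(incremental, direct))  # True
```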
Examples:

>>> import numpy as np
>>> result = StepResult(
...     date="2024-01-02",
...     cash_position=np.array([1000.0, -500.0]),
...     status="valid",
...     vola=np.array([0.012, 0.018]),
... )
>>> result.status
'valid'
>>> result.cash_position.shape
(2,)

Source code in src/basanos/math/_stream.py

@dataclasses.dataclass(frozen=True)
class StepResult:
    """Frozen dataclass representing the output of a single ``BasanosStream`` step.

    Each call to ``BasanosStream.step()`` returns one ``StepResult`` capturing
    the optimised cash positions, the per-asset volatility estimate, the step
    date, and a status label that describes the solver outcome for that
    timestep.

    Attributes:
        date: The timestamp or date label for this step.  The type mirrors
            whatever is stored in the ``'date'`` column of the input prices
            DataFrame (typically a Python `date`,
            `datetime`, or a Polars temporal scalar).
        cash_position: Optimised cash-position vector, shape ``(N,)``.
            Entries are ``NaN`` for assets that are still in the EWMA warmup
            period or that are otherwise inactive at this step.
        status: Solver outcome label for this timestep
            (`SolveStatus`).  Since `SolveStatus`
            is a ``StrEnum``, values compare equal to their string equivalents
            (e.g. ``result.status == "valid"`` is ``True``):

            * ``'warmup'`` — fewer rows have been seen than the EWMA warmup
              requires; all positions are ``NaN``.
            * ``'zero_signal'`` — the expected-return signal vector ``mu`` is
              identically zero; positions are set to zero rather than solved.
            * ``'degenerate'`` — the covariance matrix is ill-conditioned or
              numerically singular; positions cannot be computed reliably and
              are returned as ``NaN``.
            * ``'valid'`` — normal operation; ``cash_position`` holds the
              optimised allocations.
        vola: Per-asset EWMA percentage-return volatility, shape ``(N,)``.
            Values are ``NaN`` during the warmup period before the EWMA has
            accumulated sufficient history.

    Examples:
        >>> import numpy as np
        >>> result = StepResult(
        ...     date="2024-01-02",
        ...     cash_position=np.array([1000.0, -500.0]),
        ...     status="valid",
        ...     vola=np.array([0.012, 0.018]),
        ... )
        >>> result.status
        'valid'
        >>> result.cash_position.shape
        (2,)
    """

    date: object
    cash_position: np.ndarray
    status: SolveStatus
    vola: np.ndarray
