fast-minimum-variance: Solving Minimum Variance Portfolios Fast¶
Overview¶
fast-minimum-variance solves the long-only minimum variance portfolio without ever forming the sample covariance matrix. The key observation is that the KKT stationarity condition $2\Sigma w = \lambda\mathbf{1}$ immediately gives $w \propto \Sigma^{-1}\mathbf{1}$: over any fixed active set, the entire problem reduces to one symmetric positive definite linear system $\Sigma v = \mathbf{1}$, solved matrix-free by conjugate gradients. The budget constraint is recovered by a single rescaling $w = v / (\mathbf{1}^\top v)$.
Working directly with the returns matrix $X \in \mathbb{R}^{T \times N}$ — rather than the assembled covariance $X^\top X$ — has two consequences. First, each conjugate gradient iteration costs $O(TN)$ rather than $O(N^2)$, and $X^\top X$ is never stored. Second, Ledoit-Wolf shrinkage enters as a simple row-augmentation of $X$: stacking $[\sqrt{1-\alpha}\,X;\,\sqrt{\gamma}\,I]$ yields a matrix whose Gram matrix equals $\Sigma_{\text{LW}}$. The same CG code handles both the plain and shrunk problem without modification.
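The row-augmentation identity can be checked directly. This is a minimal NumPy sketch, using the shrinkage target $\gamma = \alpha\|X\|_F^2/N$ stated in the solver documentation below; it is an illustration, not library code:

```python
import numpy as np

# Sketch: Ledoit-Wolf shrinkage as row augmentation of the returns matrix.
rng = np.random.default_rng(0)
T, N = 200, 8
X = rng.standard_normal((T, N))
alpha = N / (N + T)
gamma = alpha * np.linalg.norm(X, "fro") ** 2 / N

# Stack [sqrt(1-alpha) X; sqrt(gamma) I]: its Gram matrix is the shrunk covariance.
X_aug = np.vstack([np.sqrt(1 - alpha) * X, np.sqrt(gamma) * np.eye(N)])
Sigma_lw = (1 - alpha) * X.T @ X + gamma * np.eye(N)
assert np.allclose(X_aug.T @ X_aug, Sigma_lw)
```

Because the shrunk covariance is again a Gram matrix, the same CG mat-vec code applies unchanged to the augmented returns.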
Quick Start¶
```python
import numpy as np
from fast_minimum_variance import Problem

# 500 daily returns, 20 assets
X = np.random.default_rng(42).standard_normal((500, 20))

w, iters = Problem(X).solve_cg()   # matrix-free CG — recommended
w, iters = Problem(X).solve_kkt()  # direct dense solve — exact baseline

assert abs(w.sum() - 1.0) < 1e-8
assert (w >= 0).all()
```
Ledoit-Wolf Shrinkage¶
Ledoit-Wolf shrinkage plays a dual role: statistically it reduces estimation error; numerically it compresses the eigenvalue spectrum and directly cuts CG iteration counts. Use `alpha = N / (N + T)` as a simple analytical estimate of the optimal shrinkage intensity. On S&P 500 equity data (495 assets, 1192 days), shrinkage cuts CG iterations from 685 to 205 and makes the matrix-free solver the fastest option by a wide margin.
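As a sanity check, the analytical estimate reproduces the shrinkage intensities quoted in the benchmarks below. A small illustrative snippet; `analytic_alpha` is a hypothetical helper name, not part of the library API:

```python
# alpha = N / (N + T): simple analytical shrinkage intensity (hypothetical helper)
def analytic_alpha(N: int, T: int) -> float:
    return N / (N + T)

print(round(analytic_alpha(1000, 2000), 3))  # 0.333 (synthetic benchmark)
print(round(analytic_alpha(495, 1192), 3))   # 0.293 (S&P 500 benchmark)
```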
Solvers¶
All solvers are methods on Problem and return (w, iters) where
$w \in \mathbb{R}^N$, $\sum_i w_i = 1$, $w_i \geq 0$.
| Method | Approach | When to use |
|---|---|---|
| `solve_cg()` | Matrix-free conjugate gradients on the SPD reduced system | Default — fastest for large $N$, especially with shrinkage |
| `solve_kkt()` | Direct dense factorisation via `numpy.linalg.solve` | Small problems or when an exact solve is needed |
| `solve_nnls()` | Non-negative least squares via Lawson-Hanson | Single-shot; useful when no outer loop is desired |
| `solve_clarabel()` | Clarabel interior-point solver (direct API) | Comparison baseline without CVXPY overhead |
| `solve_osqp()` | OSQP operator-splitting QP solver (direct API) | Alternative QP baseline; faster than Clarabel on medium problems |
| `solve_cvxpy()` | CVXPY + Clarabel | Ground-truth reference |
solve_cg — matrix-free conjugate gradients¶
The inner step builds a LinearOperator that applies
$$v \;\mapsto\; (1-\alpha)\,X_a^\top(X_a v) + \gamma v, \qquad \gamma = \frac{\alpha\|X\|_F^2}{N}$$
to a vector using two matrix-vector products with the active-asset submatrix $X_a$, without ever forming $\Sigma_a = X_a^\top X_a$. Standard CG then solves $\Sigma_a v = \mathbf{1}$. Ledoit-Wolf shrinkage ($\alpha > 0$) compresses the eigenvalue spectrum and reduces iteration counts dramatically — from nearly 2000 iterations at $\alpha \approx 0$ to single digits at $\alpha \approx 1$ in rank-deficient settings.
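The inner step can be sketched with SciPy's `LinearOperator` and `cg`. This is an illustrative reimplementation under the definitions above, not the library's internal code:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Matrix-free inner solve: apply the shrunk covariance via two mat-vecs with X,
# never forming X.T @ X.
rng = np.random.default_rng(1)
T, N = 300, 12
X = rng.standard_normal((T, N))
alpha = N / (N + T)
gamma = alpha * np.linalg.norm(X, "fro") ** 2 / N

matvec = lambda v: (1 - alpha) * (X.T @ (X @ v)) + gamma * v
A = LinearOperator((N, N), matvec=matvec, dtype=float)

v, info = cg(A, np.ones(N))   # solve Sigma v = 1; info == 0 on convergence
w = v / v.sum()               # rescale to satisfy the budget constraint
assert info == 0
assert abs(w.sum() - 1.0) < 1e-12
```

Each `matvec` call costs $O(TN)$, so the total CG cost is (iterations) × $O(TN)$, with no $O(N^2)$ storage.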
solve_kkt — direct dense solve¶
Assembles $\Sigma_a = (1-\alpha)X_a^\top X_a + \gamma I$ explicitly and calls
numpy.linalg.solve. Exact to machine precision. Scales as $O(N^3)$ in the active
portfolio size, so it becomes expensive for $N \gtrsim 500$ without shrinkage (which
reduces the number of active assets). With shrinkage, the active-set outer loop converges
in 2–4 steps and the inner systems are small, making the direct solve competitive.
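The direct path can be sketched in a few lines of NumPy; again an illustration under the stated definitions, not the library code:

```python
import numpy as np

# Direct dense path: assemble the shrunk covariance and factorise it.
rng = np.random.default_rng(2)
T, N = 300, 12
X = rng.standard_normal((T, N))
alpha = N / (N + T)
gamma = alpha * np.linalg.norm(X, "fro") ** 2 / N

Sigma = (1 - alpha) * X.T @ X + gamma * np.eye(N)   # O(T N^2) to assemble
v = np.linalg.solve(Sigma, np.ones(N))              # O(N^3) factorisation
w = v / v.sum()
assert abs(w.sum() - 1.0) < 1e-12
```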
solve_nnls — non-negative least squares¶
Reformulates the problem as a non-negative least squares problem on an augmented matrix:
$$\min_{w \geq 0}\;\left\|\begin{pmatrix}\sqrt{1-\alpha}\,X \\ \sqrt{\gamma}\,I \\ M\mathbf{1}^\top\end{pmatrix}w - \begin{pmatrix}\mathbf{0} \\ \mathbf{0} \\ M\end{pmatrix}\right\|^2$$
where $M = \|X\|_F \cdot T$ enforces the budget constraint as a large penalty. The
Lawson-Hanson algorithm handles $w \geq 0$ natively, so no outer primal-dual loop is
needed. Single-shot but does not benefit from the matrix-free structure: Lawson-Hanson
implicitly forms normal equations of the augmented matrix. With shrinkage the augmented
matrix grows from $T \times N$ to $(T+N) \times N$, making solve_nnls slower with
shrinkage than without.
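The augmented system maps directly onto `scipy.optimize.nnls`. A sketch of the formulation above, not the library code:

```python
import numpy as np
from scipy.optimize import nnls

# Augmented NNLS formulation with a penalty row for the budget constraint.
rng = np.random.default_rng(3)
T, N = 300, 12
X = rng.standard_normal((T, N))
alpha = N / (N + T)
gamma = alpha * np.linalg.norm(X, "fro") ** 2 / N
M = np.linalg.norm(X, "fro") * T          # large penalty weight

A = np.vstack([np.sqrt(1 - alpha) * X,
               np.sqrt(gamma) * np.eye(N),
               M * np.ones((1, N))])       # (T + N + 1, N) augmented matrix
b = np.concatenate([np.zeros(T + N), [M]])

w, _ = nnls(A, b)                          # Lawson-Hanson: w >= 0 built in
assert (w >= 0).all()
assert abs(w.sum() - 1.0) < 1e-6           # budget enforced by the penalty row
```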
solve_clarabel — Clarabel direct API¶
Calls the Clarabel interior-point solver directly, bypassing CVXPY's problem-construction
overhead. Assembles $P = 2\Sigma_{\text{LW}}$ as a sparse CSC matrix and solves the
standard QP. Useful for benchmarking: on a 1000-asset synthetic problem, Clarabel direct
takes 0.28 s while the CVXPY wrapper takes 8.2 s — about 96% of solve_cvxpy's time is Python interface overhead, not solving. CG is still 15× faster than Clarabel direct.
solve_osqp — OSQP operator-splitting solver¶
Calls the OSQP operator-splitting QP solver directly, bypassing CVXPY overhead. Assembles
$P = 2\Sigma_{\text{LW}}$ as a sparse upper-triangular CSC matrix and applies ADMM
iterations on the primal-dual update. Consistently about 2× faster than Clarabel direct:
on S&P 500 data OSQP takes 0.036 s versus Clarabel's 0.067 s; on a 1000-asset synthetic
problem, 0.12 s versus 0.28 s. The iters return value is the ADMM iteration count.
CG is still 4–6× faster than OSQP for large $N$, but OSQP is the fastest drop-in QP
solver for problems where the matrix-free structure cannot be exploited.
The Primal-Dual Active-Set Loop¶
Long-only weights are enforced by an outer loop that wraps any inner solver:
- Primal step. Solve the budget-only equality system over the current active asset set. Drop any asset with weight below $-\varepsilon$ (multiple assets at once if violations are large).
- Dual step. Once all active weights are non-negative, compute the gradient $\nabla_i f(w) = 2[(1-\alpha)(X^\top X w)_i + \gamma w_i] - \rho\mu_i$ for every excluded asset. If any excluded asset has $\nabla_i f(w) < \lambda$ (the budget multiplier), it would decrease variance if added — re-insert the most-violated asset and repeat.
- Termination. The loop exits when primal and dual feasibility hold simultaneously. Combined with stationarity from the inner solve, this is sufficient for global optimality.
With Ledoit-Wolf shrinkage at the analytically optimal $\alpha$, the loop typically converges in 2–4 outer iterations on real equity data.
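The steps above can be sketched for the plain case ($\alpha = 0$, $\rho = 0$). `min_var_active_set` is a hypothetical name, not the library API; the drop and re-insert rules follow the primal and dual steps described above:

```python
import numpy as np

def min_var_active_set(X, tol=1e-10, max_outer=100):
    """Sketch of the primal-dual active-set loop (plain case: alpha=0, rho=0)."""
    T, N = X.shape
    active = np.ones(N, dtype=bool)
    for _ in range(max_outer):
        # Primal step: budget-only solve restricted to the active set.
        Xa = X[:, active]
        v = np.linalg.solve(Xa.T @ Xa, np.ones(active.sum()))
        wa = v / v.sum()
        if (wa < -tol).any():
            idx = np.flatnonzero(active)
            active[idx[wa < -tol]] = False   # drop all violating assets at once
            continue
        # Dual step: compare excluded-asset gradients with the budget multiplier.
        w = np.zeros(N)
        w[active] = wa
        lam = 2.0 / v.sum()                  # from 2 Sigma_a w_a = lam * 1
        grad = 2.0 * (X.T @ (X @ w))         # gradient of w' Sigma w
        viol = lam - grad[~active]
        if viol.size == 0 or viol.max() <= tol:
            return w                         # primal and dual feasible: done
        idx = np.flatnonzero(~active)
        active[idx[np.argmax(viol)]] = True  # re-insert the most violated asset
    raise RuntimeError("active-set loop did not converge")

w = min_var_active_set(np.random.default_rng(7).standard_normal((300, 10)))
assert abs(w.sum() - 1.0) < 1e-8 and (w >= -1e-9).all()
```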
Problem Variants¶
The same solver handles a range of portfolio construction problems by choosing $\alpha$, $\rho$, $\mu$:
| Problem | `alpha` | `rho` | `mu` |
|---|---|---|---|
| Minimum variance | $0$ | $0$ | — |
| Mean-variance (Markowitz) | any | $> 0$ | expected returns |
| Minimum tracking error to benchmark $b$ | any | $2$ | `X.T @ (X @ b)` |
| LW-regularised minimum variance | $N/(N+T)$ | $0$ | — |
```python
# Reusing X from the Quick Start
N = X.shape[1]

# Mean-variance
mu = np.random.default_rng(0).standard_normal(N)  # expected returns, shape (N,)
w, _ = Problem(X, rho=1.0, mu=mu).solve_cg()

# Minimum tracking error to benchmark b
b = np.ones(N) / N                                # equal-weight benchmark
mu_te = X.T @ (X @ b)
w, _ = Problem(X, rho=2.0, mu=mu_te).solve_cg()
```
When rho != 0, two SPD solves are performed per outer step: $\Sigma_a v_1 = \mathbf{1}$
and $\Sigma_a v_2 = \mu_a$. The budget multiplier $\lambda$ is recovered analytically
from the budget constraint, avoiding the full saddle-point system.
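The analytic recovery of $\lambda$ can be verified on the equality-constrained step (ignoring the long-only constraint, which the outer loop handles). A sketch, not library code:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 400, 15
X = rng.standard_normal((T, N))
mu = rng.standard_normal(N)   # expected returns
rho = 1.0

Sigma = X.T @ X
v1 = np.linalg.solve(Sigma, np.ones(N))  # Sigma v1 = 1
v2 = np.linalg.solve(Sigma, mu)          # Sigma v2 = mu

# Stationarity 2 Sigma w = lam * 1 + rho * mu gives w = (lam*v1 + rho*v2) / 2;
# the budget 1'w = 1 then fixes lam without a full saddle-point solve.
lam = (2.0 - rho * v2.sum()) / v1.sum()
w = 0.5 * (lam * v1 + rho * v2)
assert abs(w.sum() - 1.0) < 1e-10
```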
Custom Constraints¶
For problems beyond budget + long-only (sector limits, turnover bounds, factor-exposure constraints), pass explicit constraint matrices:
```python
# Reusing X from the Quick Start
N = X.shape[1]

A = np.ones((N, 1))   # budget: 1'w = 1
b = np.ones(1)
C = -np.eye(N)        # long-only: w >= 0
d = np.zeros(N)
w, _ = Problem(X, A=A, b=b, C=C, d=d).solve_kkt()
```
This routes to a general active-set solver that handles arbitrary linear equality and
inequality constraints. Use this path sparingly — the default path (no A, b, C, d)
is significantly faster for the standard long-only problem.
Benchmarks¶
All timings on Apple M4 Pro, Python 3.12, NumPy 2.4, SciPy 1.17.
Synthetic: $N=1000$, $T=2000$, i.i.d. Gaussian returns¶
| Method | Time (s) | Speedup vs CVXPY |
|---|---|---|
| `solve_cvxpy` | 8.16 | 1× |
| `solve_clarabel` | 0.28 | 29× |
| `solve_osqp` | 0.12 | 68× |
| `solve_kkt` | 0.063 | 129× |
| `solve_cg` | 0.019 | 430× |
| `solve_nnls` | 1.69 | 5× |
With Ledoit-Wolf shrinkage ($\alpha = 0.333$), 56 CG iterations.
S&P 500: $N=495$, $T=1192$ (Jul 2021–Apr 2026)¶
| Method | Time (s) | Speedup vs CVXPY |
|---|---|---|
| `solve_cvxpy` | 1.48 | 1× |
| `solve_clarabel` | 0.067 | 22× |
| `solve_osqp` | 0.036 | 41× |
| `solve_kkt` | 0.018 | 84× |
| `solve_cg` | 0.0091 | 162× |
| `solve_nnls` | 0.088 | 17× |
With Ledoit-Wolf shrinkage ($\alpha = 0.293$), 205 CG iterations.
Installation¶
For development:
```bash
git clone https://github.com/Jebel-Quant/fast_minimum_variance
cd fast_minimum_variance
make install
```
Requirements¶
- Python 3.11+
- numpy
- scipy
- cvxpy
- clarabel
- osqp
Citing¶
If you use this library in academic work or research, please cite:
```bibtex
@software{fast_minimum_variance,
  author  = {Schmelzer, Thomas},
  title   = {fast-minimum-variance: Solving Minimum Variance Portfolios Fast},
  url     = {https://github.com/Jebel-Quant/fast_minimum_variance},
  year    = {2026},
  license = {MIT}
}
```
License¶
MIT License — see LICENSE for details.