Backtesting Your First Quant Strategy with Python: A Walkthrough
A step-by-step guide from data to ranked results — the survivorship-bias trap, the look-ahead bug, transaction costs that destroy paper returns, and the smallest viable backtest harness.
What this guide assumes
You can write Python. You understand the basics of equity markets — what a price series is, what total return means, what rebalancing means. You haven’t done a backtest before, or you’ve done one and weren’t sure if the result was real.
The goal: walk through building a minimal but correct backtest of a simple strategy (12-month momentum on the S&P 500 constituents), so you can spot the three biases that wreck most amateur backtests and understand why your “Sharpe ratio of 4.2” almost certainly isn’t real.
The strategy
We’ll test a textbook momentum strategy:
- Each month, rank the S&P 500 constituents by their trailing 12-month total return (excluding the most recent month).
- Hold the top 10% (~50 stocks), equal-weighted.
- Rebalance monthly.
- Compare to buying and holding the S&P 500.
This is a real strategy that has been published in academic literature (Jegadeesh & Titman 1993). The “exclude the most recent month” detail is to avoid short-term reversal effects.
We expect (based on prior research) a Sharpe ratio improvement over buy-and-hold of ~0.2-0.4. If your backtest shows a strategy Sharpe ratio of 1.5+, something is wrong.
The setup
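```python
import pandas as pd
import numpy as np
import yfinance as yf

# Get data
tickers = pd.read_csv('sp500_historical_constituents.csv')  # see note below
prices = yf.download(tickers['ticker'].unique().tolist(), start='2010-01-01', end='2025-12-31', auto_adjust=True)['Close']
prices = prices.dropna(axis=1, how='all')
# yfinance returns daily bars; the strategy is monthly, so keep month-end closes
prices = prices.resample('ME').last()  # use 'M' on pandas older than 2.2
```

The CSV referenced is critical. We need historical S&P 500 constituents (the list of companies that were in the index at each historical date), not today’s S&P 500 list applied retroactively. The latter is survivorship bias, and it’s the #1 reason amateur backtests look unrealistically good. Note that dropna(axis=1, how='all') drops every ticker Yahoo returns no data for, which often includes delisted companies, so check how many columns it removes.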
If you backtest against today’s S&P 500 list using historical prices, you’re only testing companies that survived until today. The companies that went bankrupt or got acquired (Lehman, WaMu, Sun Microsystems) silently drop out of your universe. The strategy “looks better” than it would have in real life because you’ve eliminated the losers a priori.
Sources for historical constituents: WRDS (academic), CRSP (paid), some Kaggle datasets (sketchy but free). If you can’t get historical constituents, your backtest is fundamentally compromised — call it an “illustrative” backtest at best.
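The download step above pulls every ticker that was ever in the index, but nothing yet restricts each month to the companies that were members in that month. Here is a minimal sketch of a point-in-time membership mask. The column names ticker, start_date and end_date are assumptions for illustration (this guide doesn’t specify the layout of sp500_historical_constituents.csv), and membership_mask is a hypothetical helper, not a library function.

```python
import pandas as pd

def membership_mask(constituents: pd.DataFrame, dates: pd.DatetimeIndex) -> pd.DataFrame:
    """Boolean frame (dates x tickers): True where the ticker was a member on that date.

    Assumed (hypothetical) columns: ticker, start_date, end_date.
    A missing end_date (NaT) means the ticker is still a member.
    """
    tickers = sorted(constituents["ticker"].unique())
    mask = pd.DataFrame(False, index=dates, columns=tickers)
    for row in constituents.itertuples(index=False):
        end = row.end_date if pd.notna(row.end_date) else dates.max()
        in_index = (dates >= row.start_date) & (dates <= end)
        # A ticker can appear in several rows (left and rejoined); each stint is OR-ed in
        mask.loc[in_index, row.ticker] = True
    return mask

# Toy example: AAA leaves after February, CCC joins in March
constituents = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "start_date": pd.to_datetime(["2010-01-01", "2010-01-01", "2010-03-01"]),
    "end_date": pd.to_datetime(["2010-02-28", None, None]),
})
dates = pd.to_datetime(["2010-01-31", "2010-02-28", "2010-03-31"])
mask = membership_mask(constituents, dates)
print(mask)
```

One way to use it is to blank out non-members before ranking, for example signal.where(mask.reindex(index=signal.index, columns=signal.columns, fill_value=False)), so a stock can only be selected in months when it was actually in the index.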
The momentum signal
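```python
# Compute trailing 12-month total return, lagged by 1 month
# Assumes prices holds one row per month (month-end closes)
returns = prices.pct_change(fill_method=None)  # don't forward-fill gaps left by delisted tickers

# 12-month return ending 1 month ago
trailing_12m = (1 + returns).rolling(12).apply(np.prod, raw=True) - 1
signal = trailing_12m.shift(1)  # exclude most recent month
```

That .shift(1) is the second critical detail. It implements the skip-month: the signal in row t covers the 12 months ending at t-1. The lag on its own does not make the backtest free of look-ahead. That depends on pairing the signal in row t only with returns earned after t, which the portfolio loop in the next section does by taking the following row’s return. Pair a signal with a return from the same period it was computed on and you’re using information from inside the holding period to pick the holdings. That’s look-ahead bias: using future data to make past decisions. It will inflate your Sharpe.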
The general rule: any decision at time T can only use information available at T (or before). When in doubt, shift your signals by 1 period and check whether your backtest still works. If it doesn’t, you had look-ahead bias and your previous result was illusory.
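One way to audit the rule above is a truncation test: compute the signal on the full history, recompute it on history cut off at a chosen date, and check that the value at that date is unchanged. If removing later data changes it, the signal was reading the future. The sketch below uses synthetic prices; uses_future_data and leaky_signal are hypothetical names made up for the illustration, and momentum_signal repeats the recipe from the previous section.

```python
import numpy as np
import pandas as pd

def momentum_signal(monthly_prices: pd.DataFrame) -> pd.DataFrame:
    """Trailing 12-month return, lagged one month (same recipe as in the text)."""
    returns = monthly_prices.pct_change(fill_method=None)
    trailing_12m = (1 + returns).rolling(12).apply(np.prod, raw=True) - 1
    return trailing_12m.shift(1)

def leaky_signal(monthly_prices: pd.DataFrame) -> pd.DataFrame:
    """Deliberately broken: shift(-1) pulls next month's return into this month's row."""
    return monthly_prices.pct_change(fill_method=None).shift(-1)

def uses_future_data(signal_fn, prices: pd.DataFrame, check_date) -> bool:
    """True if the signal at check_date changes once later prices are removed."""
    full = signal_fn(prices).loc[check_date]
    truncated = signal_fn(prices.loc[:check_date]).loc[check_date]
    return not np.allclose(full, truncated, equal_nan=True)

# Synthetic monthly prices for three made-up tickers
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=36, freq="MS")
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.05, size=(36, 3)), axis=0)),
    index=idx, columns=["AAA", "BBB", "CCC"],
)

check_date = idx[20]
clean_result = uses_future_data(momentum_signal, prices, check_date)
leaky_result = uses_future_data(leaky_signal, prices, check_date)
print("momentum signal leaks:", clean_result)
print("broken signal leaks:", leaky_result)
```

The test only catches leaks in the signal itself. It won’t catch a loop that pairs a clean signal with the wrong period’s return, so that pairing still needs a manual spot-check.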
The portfolio formation
```python
def get_top_decile(date, signal_df, return_df):
    universe = signal_df.loc[date].dropna()
    threshold = universe.quantile(0.9)
    selected = universe[universe >= threshold].index
    return selected

# Build portfolio returns
portfolio_returns = []
dates = signal.index[12:]  # need 12 months of history

for date in dates:
    if date not in signal.index:
        continue
    selected = get_top_decile(date, signal, returns)
    if len(selected) == 0:
        portfolio_returns.append(0)
        continue
    # Equal-weight return next period
    next_date_idx = signal.index.get_loc(date) + 1
    if next_date_idx >= len(signal.index):
        break
    next_date = signal.index[next_date_idx]
    next_returns = returns.loc[next_date, selected].dropna()
    portfolio_returns.append(next_returns.mean())

# Indexed by decision date; each value is the return earned over the following month
portfolio_series = pd.Series(portfolio_returns, index=dates[:len(portfolio_returns)])
```

Note: this code is naive on purpose. Real implementations would use a backtesting library (vectorbt, bt, zipline). For a first backtest, the explicit loop is easier to reason about, and bugs have fewer places to hide.
The transaction cost layer
This is the third bias: ignoring transaction costs. A monthly-rebalanced momentum strategy turns over a lot of its portfolio every month — typically 30-60% of holdings change each rebalance. At those turnover rates, transaction costs matter.
For a realistic backtest:
```python
TRANSACTION_COST_BPS = 10  # 10 basis points per trade, each way

def compute_turnover(prev_holdings, current_holdings):
    if prev_holdings is None:
        return 1.0  # 100% turnover on initial buy
    prev_set = set(prev_holdings)
    curr_set = set(current_holdings)
    if not curr_set:
        return 0.0  # nothing held this period; treated as no trading in this rough model
    # rough: fraction that changed
    return len(prev_set.symmetric_difference(curr_set)) / (2 * len(curr_set))

# Holdings chosen on each decision date (portfolio_series is indexed by decision date)
holdings_history = [get_top_decile(d, signal, returns) for d in portfolio_series.index]

# Apply costs
prev_holdings = None
adjusted_returns = []
for raw_return, selected in zip(portfolio_series, holdings_history):
    turnover = compute_turnover(prev_holdings, selected)
    cost = turnover * (TRANSACTION_COST_BPS / 10000) * 2  # buy + sell
    adjusted_returns.append(raw_return - cost)
    prev_holdings = selected

adjusted_series = pd.Series(adjusted_returns, index=portfolio_series.index)
```

10 bps per trade is optimistic for retail. Actual retail costs (spread + commission + slippage) are 15-50 bps depending on the stocks and your broker. Bid-ask spreads on small caps are wider than you think.
Apply realistic costs and watch your Sharpe ratio drop by 0.3-0.5. If your strategy survives that with positive alpha, you might have something. If it doesn’t, you had a “before costs” strategy, which is a strategy that doesn’t exist.
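To get a feel for why costs matter at these turnover rates, here is a rough worked calculation. It is a sketch, not part of the harness: annual_cost_drag is a hypothetical helper and the 50% monthly turnover input is an illustrative assumption taken from the middle of the range quoted above. It uses the same convention as the cost code: a per-side cost, with two sides (one sell, one buy) per replaced position.

```python
def annual_cost_drag(monthly_turnover: float, cost_bps_per_side: float) -> float:
    """Approximate yearly return lost to trading, as a decimal fraction.

    monthly_turnover: fraction of the portfolio replaced each month (0.5 = half the names).
    cost_bps_per_side: cost of one buy or one sell, in basis points.
    """
    monthly_cost = monthly_turnover * 2 * cost_bps_per_side / 10_000
    return 12 * monthly_cost  # simple sum over 12 months; ignores compounding

for bps in (10, 25, 50):
    drag = annual_cost_drag(0.5, bps)
    print(f"{bps} bps per side, 50% monthly turnover: {drag:.2%} per year")
```

At 10 bps per side this comes to about 1.2% a year, and at 50 bps about 6%. Against an expected edge of a few percent a year, the cost assumption can decide on its own whether the strategy survives.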
What “good” looks like
After applying all the corrections (survivorship-free universe, properly lagged signals, realistic transaction costs), a textbook 12-month momentum strategy on US large-caps from 2010-2025 should produce roughly:
- Annualized excess return over S&P 500: 1-3% (highly variable by sub-period)
- Sharpe ratio: ~0.6-0.8 (vs. S&P 500’s ~0.5-0.7)
- Maximum drawdown: comparable to or worse than the index
- Years where it underperforms: roughly 40% of the time
If your backtest produces Sharpe 1.5+ or 10%+ excess returns annually, assume you have a bug and go find it. The most likely candidate: you accidentally re-introduced survivorship bias or look-ahead bias somewhere.
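To compare your own numbers against the ranges above you need the metrics computed the same way each time. The following is a minimal sketch for monthly return series. The function names are illustrative, the risk-free rate defaults to zero as a simplifying assumption, and annualizing by the square root of 12 assumes monthly returns are roughly independent.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(monthly_returns: pd.Series, monthly_rf: float = 0.0) -> float:
    """Mean excess return over its standard deviation, scaled from monthly to annual."""
    excess = monthly_returns - monthly_rf
    return float(excess.mean() / excess.std(ddof=1) * np.sqrt(12))

def annualized_return(monthly_returns: pd.Series) -> float:
    """Compound annual growth rate implied by the monthly returns."""
    growth = (1 + monthly_returns).prod()
    return float(growth ** (12 / len(monthly_returns)) - 1)

def max_drawdown(monthly_returns: pd.Series) -> float:
    """Worst peak-to-trough loss of the equity curve, as a negative fraction."""
    equity = (1 + monthly_returns).cumprod()
    # clip(lower=1.0) counts the starting capital of 1.0 as the first peak
    running_peak = equity.cummax().clip(lower=1.0)
    return float((equity / running_peak - 1).min())

# Toy series: up 10%, down 50%, then a partial recovery
toy = pd.Series([0.10, -0.50, 0.20, 0.05])
print("max drawdown:", max_drawdown(toy))
print("annualized return:", annualized_return(toy))
print("annualized Sharpe:", annualized_sharpe(toy))
```

Run the same functions on the strategy and on the benchmark over identical dates. A Sharpe ratio quoted without its benchmark counterpart says little.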
The minimal viable backtest harness
For a first backtest, what you actually need:
- Survivorship-bias-free historical universe: this is the hardest data to source for free. Without it, treat all results as illustrative.
- Vectorized return computation: pandas + numpy is fine, don’t optimize prematurely.
- Proper signal lagging: explicit .shift(1) everywhere, audit by spot-checking that decisions don’t use same-period data.
- Transaction cost layer: configurable bps per trade, applied to turnover.
- Baseline comparison: every result is meaningful only relative to a benchmark. Always show your strategy’s return alongside buy-and-hold of the same universe.
That’s it. Don’t add walk-forward optimization, Bayesian parameter selection, or regime detection to your first backtest. Those features hide bugs by giving you more degrees of freedom to fit historical noise.
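The baseline-comparison item in the list above can be sketched as an equal-weighted portfolio of the whole universe, set side by side with the strategy. This is one possible baseline, not the only one: equal_weight_benchmark and compare are hypothetical helper names, and the toy numbers are made up to show the shape of the output.

```python
import numpy as np
import pandas as pd

def equal_weight_benchmark(monthly_returns: pd.DataFrame) -> pd.Series:
    """Average return across all tickers with data in each month (NaN skipped)."""
    return monthly_returns.mean(axis=1, skipna=True)

def compare(strategy: pd.Series, benchmark: pd.Series) -> pd.DataFrame:
    """Total growth and annualized return for both series over their common months."""
    aligned = pd.concat({"strategy": strategy, "benchmark": benchmark}, axis=1).dropna()
    growth = (1 + aligned).prod()
    years = len(aligned) / 12
    cagr = growth ** (1 / years) - 1
    return pd.DataFrame({"total_growth": growth, "cagr": cagr})

# Toy data: one year, three tickers, CCC has no data
idx = pd.date_range("2020-01-01", periods=12, freq="MS")
universe_returns = pd.DataFrame(
    {"AAA": 0.01, "BBB": 0.03, "CCC": np.nan}, index=idx
)
strategy_returns = pd.Series(0.03, index=idx)

table = compare(strategy_returns, equal_weight_benchmark(universe_returns))
print(table)
```

Both series need to refer to the same holding month before they are compared. If the strategy series is indexed by decision date, as in the portfolio loop earlier, shift one of the two by a month first.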
What to use after this
Once your minimal backtest works:
- vectorbt (free, open-source) for faster iteration on parameter sweeps
- backtrader (free) for event-driven backtests with realistic order modeling
- QuantConnect (free for community, paid for production) for cloud backtests with paid data feeds
- Zipline-Reloaded (free) if you want institutional-grade infrastructure
For data: free options (Yahoo, Alpha Vantage) have known quality issues that will bite you. Paid options (Polygon.io, EOD Historical Data) are worth it once you’re past the learning stage. IEX Cloud, formerly a common pick, was retired in 2024.
Verdict
A correct first backtest is harder than the code suggests. The bugs aren’t in the syntax — they’re in the unstated assumptions (survivorship bias, look-ahead, transaction costs) that destroy Sharpe ratios. Build the minimal harness, validate against published research expectations, and assume any spectacular result is wrong until proven otherwise.
The skill that takes most of the time to develop isn’t writing the backtest. It’s developing the paranoid intuition for “this looks too good — what bug is inflating it?” That intuition saves you from publishing or trading on illusory edges.