pickuma.
Finance

Building a Monte Carlo Retirement Simulator in Python

How to model thousands of possible market paths instead of trusting one average return — a working Python approach to retirement projections and the assumptions that make or break it.

O
Owen
Engineer · Investor
Verify profile ↗
7 min read

A retirement projection built on a single number — “assume 7% per year” — answers the wrong question. It tells you what happens on one perfectly smooth path that the market has never actually delivered. Real returns arrive jagged: a strong decade, a flat stretch, a crash three years before you plan to stop working. A Monte Carlo simulation replaces that single path with thousands of them, and the output stops being a number and becomes a distribution. This walks through building one in Python and, more importantly, understanding what it can and cannot tell you.

Why a Single Average Return Lies to You

The standard projection uses the deterministic compound-growth formula: start with a balance, multiply by (1 + r) every year, subtract withdrawals. It is easy to put in a spreadsheet and it produces a clean curve. The problem is that the clean curve corresponds to a sequence of identical annual returns, and no such sequence has ever occurred.

The full set of returns over 30 years does determine your ending balance if you never withdraw — multiplication is commutative. But the moment you take money out, the order of returns matters. A bad year early in retirement forces you to sell more shares to fund the same withdrawal, and those shares are gone before any recovery can compound them. Two retirees with the identical 30-year average can end up with wildly different outcomes depending on when the bad years landed. A single-average projection cannot see this. It computes one path and hides the distribution of paths around it.

What a Monte Carlo Simulation Actually Does

The technique is mechanically simple. You assume a distribution for annual returns — for example, a mean and standard deviation estimated from long-run historical data. Then you:

  1. Draw a random return for each year of the projection horizon.
  2. Run your withdrawal logic year by year on that random sequence.
  3. Record whether the portfolio survived and what it ended at.
  4. Repeat thousands of times.

The result is not “your portfolio will be worth X.” It is “across 10,000 simulated futures, the portfolio survived in 87% of them, and here is the spread of ending balances.” That success rate, and the shape of the distribution, is the actual deliverable.

A Minimal Simulator in Python

The core is about 20 lines of NumPy. Everything else is interpretation.

import numpy as np
def simulate(start, withdrawal, years, mean, std, trials=10_000):
# Draw a (trials x years) matrix of random annual returns
returns = np.random.normal(mean, std, size=(trials, years))
balances = np.full(trials, float(start))
survived = np.ones(trials, dtype=bool)
for y in range(years):
balances = balances * (1 + returns[:, y]) - withdrawal
survived &= balances > 0
balances = np.clip(balances, 0, None)
return {
"success_rate": survived.mean(),
"p10": np.percentile(balances, 10),
"p50": np.percentile(balances, 50),
"p90": np.percentile(balances, 90),
}
result = simulate(
start=1_000_000, withdrawal=45_000, years=30,
mean=0.05, std=0.12, # illustrative real-return assumptions
)
print(result)

The numbers passed in here are illustrative, not a recommendation. A real run should use return and volatility assumptions you can defend, expressed in real (inflation-adjusted) terms so the withdrawal stays in constant dollars.

Reading the Output: Percentiles, Not Averages

The instinct is to look at the median ending balance and call it the answer. Resist it. The two outputs that matter are the success rate and the lower percentiles.

The success rate is the fraction of simulated paths where the portfolio never hit zero. A 95% success rate means 1 in 20 simulated futures ran out of money. Whether that is acceptable is a personal judgment, not a technical one — but you cannot make the judgment without the number.

The lower percentiles describe the bad-but-not-impossible cases. The P10 ending balance is the outcome that 90% of paths beat. If the P10 is a comfortable cushion, the plan is robust. If the P10 is zero, the plan depends on the market cooperating, and the median balance is cold comfort. This is the same reason you look at P99 latency instead of average latency when you operate a service: the average hides the tail, and the tail is what hurts.

Where the Model Is Wrong

A simulator is only as honest as its assumptions, and the minimal version above makes several that are convenient rather than accurate.

Normal distributions understate tail risk. Real market returns have fatter tails than a normal distribution — extreme years happen more often than the bell curve predicts. Drawing from np.random.normal produces crashes that are too mild and too rare. Using a Student’s t-distribution, or resampling from actual historical years (bootstrap), gives a more realistic tail.

Returns are not independent. The model treats each year as an independent draw. Real markets show some mean reversion over long horizons and volatility clustering over short ones. Independent draws miss both.

Fixed withdrawals are unrealistic. Almost nobody spends an identical inflation-adjusted amount through a 25% drawdown. Modeling a flexible rule — spending less in bad years — changes the success rate substantially and reflects how people actually behave.

Taxes and fees are missing. The minimal model ignores both. A 0.5% annual fee and the tax drag on a taxable account are real subtractions from the return every single year.

The point of building your own simulator is not that it beats established tools like cFIREsim or the research behind the 4% rule. It is that writing the loop forces you to make every assumption explicit. You cannot hand-wave the return distribution when you have to type it into np.random. That explicitness — seeing exactly which inputs the answer depends on — is worth more than the answer itself.

FAQ

How many trials should I run? +
Ten thousand is plenty for a stable success rate; the figure stops moving meaningfully past a few thousand. Running a million trials does not make the result more accurate — it just reduces sampling noise around an answer that is already dominated by the uncertainty in your return assumptions. Spend your effort on the assumptions, not the trial count.
Should I use historical returns or a fitted distribution? +
Both approaches are defensible and it is informative to run both. Bootstrapping from historical years preserves the real fat tails and is a good default. Fitting a distribution lets you stress-test deliberately — for example, lowering the mean to model a pessimistic future. If the two methods disagree sharply, that disagreement is itself useful information about how fragile the plan is.
Is a homemade simulator good enough, or should I use a financial advisor's tools? +
A simple simulator is excellent for building intuition and comparing scenarios, and the act of building it teaches you which inputs matter. It is not a substitute for personalized advice that accounts for your full tax situation, other income sources, healthcare costs, and goals. Use it to understand the mechanics and ask better questions, not to make irreversible decisions in isolation.

Related reading

See all finance articles →

Get the best tools, weekly

One email every Friday. No spam, unsubscribe anytime.

O
Owen
Engineer · Investor
Verify profile ↗