Magic Formula for Korean Stocks: Building It With DART API in Python
Greenblatt's Magic Formula adapted to KOSPI/KOSDAQ — pulling ROIC and earnings yield from DART, ranking the universe, and the gotchas specific to Korean accounting that you don't hit running US screens.
Why Korean markets are the right place for retail factor strategies
Greenblatt’s Magic Formula (rank stocks by ROIC and earnings yield, buy the top decile, hold for a year, rebalance) has been published in a book, backtested to death, and arbitraged into mediocrity in US large-caps. Greenblatt’s book reported 30%+ annualized returns; recent backtests on US large-caps show 5-10% excess return, modestly positive but no longer the cheat code it once was.
Korean markets are a different story. The KOSPI and especially KOSDAQ are still inefficient by US standards: there are 2,300+ listed companies, retail participation is high, and many small-caps trade with information asymmetry that hasn’t been arbitraged away. Properly implemented, the Magic Formula on Korean stocks has produced robust excess returns (~12-18% annualized) in 2023-2025 academic backtests.
The catch: implementing it requires Korean-specific data sources and dealing with accounting conventions that differ from US GAAP. This is the walkthrough.
Data sources
For US Magic Formula screening, you’d pull from SEC EDGAR or paid sources like Polygon. For Korean stocks, the canonical source is DART (Data Analysis, Retrieval and Transfer System) — the Korean equivalent of EDGAR, run by the Korea Financial Supervisory Service.
DART exposes a free API. The Python wrapper OpenDartReader makes it usable:
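```python
import OpenDartReader
import pandas as pd

# Get an API key at https://opendart.fss.or.kr/uss/umt/login/loginPage.do
dart = OpenDartReader(api_key='YOUR_DART_API_KEY')

# dart.corp_codes lists every company registered with DART, listed or not;
# only listed companies carry a stock code, so filter on that first
corps = dart.corp_codes
listed = corps[corps['stock_code'].fillna('').str.strip() != ''].copy()

# Market segment comes from the company overview: Y=KOSPI, K=KOSDAQ, N=KONEX, E=other.
# This is one API call per company, so cache the result.
listed['corp_cls'] = [dart.company(code).get('corp_cls') for code in listed['corp_code']]
listed = listed[listed['corp_cls'].isin(['Y', 'K'])]
print(f"Universe size: {len(listed)} companies")
```

For 2025 year-end, you’ll see ~2,300 listed companies. This is your starting universe; note that it reflects current listings only (see Gotcha 5).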
The Magic Formula in Korean accounting terms
Greenblatt’s two factors:
- Earnings Yield = EBIT / Enterprise Value
- Return on Invested Capital (ROIC) = EBIT / (Net Working Capital + Net Fixed Assets)
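Expanded with the enterprise-value and working-capital definitions used later in this post, and worked through on invented round numbers (billions of KRW: EBIT 120, market cap 900, debt 400, cash 300, current assets 500, current liabilities 300, net fixed assets 400):

$$
\begin{aligned}
\text{Earnings yield} &= \frac{\text{EBIT}}{\text{EV}} = \frac{\text{EBIT}}{\text{Market cap} + \text{Debt} - \text{Cash}} = \frac{120}{900 + 400 - 300} = \frac{120}{1000} = 12\% \\
\text{ROIC} &= \frac{\text{EBIT}}{\text{NWC} + \text{NFA}} = \frac{\text{EBIT}}{(\text{Current assets} - \text{Current liabilities}) + \text{NFA}} = \frac{120}{(500 - 300) + 400} = \frac{120}{600} = 20\%
\end{aligned}
$$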
For Korean filings, the inputs come from K-IFRS (Korean adoption of IFRS) financial statements. The K-IFRS line item names map to:
| Greenblatt | K-IFRS (in Korean) | DART field |
|---|---|---|
| EBIT (operating income) | 영업이익 | OperatingProfitLoss |
| Enterprise Value | (computed: market cap + debt − cash) | derived |
| Net Working Capital | (computed: current assets − current liab) | derived |
| Net Fixed Assets | 유형자산 + 무형자산 | PropertyPlantAndEquipment + IntangibleAssets |
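Pulling the directly reported rows of the table out of DART output can be sketched like this. It assumes rows shaped like OpenDartReader's finstate_all output (an account_nm column holding the Korean line-item name, a thstrm_amount column holding the current-period amount as a string). The label mapping and function names are illustrative; labels vary between filers (some use 영업이익(손실)), and with real data you'd also filter on sj_div first, since a label can appear in more than one statement.

```python
import pandas as pd

# Illustrative mapping from Greenblatt inputs to K-IFRS line-item names (account_nm).
# Labels vary between filers, so verify against real filings.
ACCOUNT_LABELS = {
    'ebit': '영업이익',
    'current_assets': '유동자산',
    'current_liabilities': '유동부채',
    'ppe': '유형자산',
    'intangibles': '무형자산',
}


def to_number(value):
    """Parse a DART amount string such as '1,234,567' into a float; None if blank."""
    text = str(value).strip()
    if text in ('', '-', 'nan', 'None'):
        return None
    return float(text.replace(',', ''))


def extract_inputs(statement: pd.DataFrame) -> dict:
    """Pull Greenblatt inputs from a frame with account_nm and thstrm_amount columns."""
    out = {}
    for key, label in ACCOUNT_LABELS.items():
        rows = statement.loc[statement['account_nm'] == label, 'thstrm_amount']
        out[key] = to_number(rows.iloc[0]) if not rows.empty else None

    # Derived rows: NWC = current assets - current liabilities,
    # net fixed assets = PP&E + intangibles (a missing item counts as zero)
    if out['current_assets'] is None or out['current_liabilities'] is None:
        out['net_working_capital'] = None
    else:
        out['net_working_capital'] = out['current_assets'] - out['current_liabilities']
    out['net_fixed_assets'] = (out['ppe'] or 0.0) + (out['intangibles'] or 0.0)
    return out


# Toy frame shaped like DART output (amounts as comma-formatted strings)
sample = pd.DataFrame({
    'account_nm': ['영업이익', '유동자산', '유동부채', '유형자산', '무형자산'],
    'thstrm_amount': ['120,000', '500,000', '300,000', '350,000', '50,000'],
})
inputs = extract_inputs(sample)
print(inputs['net_working_capital'], inputs['net_fixed_assets'])  # 200000.0 400000.0
```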
The “(computed: …)” rows are where most amateur implementations go wrong. Specifically:
- Enterprise Value needs current market cap (from a stock data source, not DART) plus total debt (from the DART balance sheet) minus cash. Many implementations forget the debt and cash adjustments and end up with P/E-equivalent ratios instead of EBIT/EV.
- Net Working Capital in IFRS terminology is 유동자산 − 유동부채 (current assets minus current liabilities). For some Korean companies (especially holding companies and chaebol affiliates), this is negative, which makes ROIC blow up to absurd values. You need to floor it.
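Both pitfalls show up with two invented companies (billions of KRW) that have identical EBIT and market cap but opposite balance sheets. EBIT/market cap scores them the same; EBIT/EV separates the cash-rich one from the indebted one. The last line shows what the zero floor on net working capital does to a holdco-style balance sheet. The helper names are illustrative.

```python
def enterprise_value(market_cap, total_debt, cash):
    # EV = market cap + total debt - cash
    return market_cap + total_debt - cash


def roic(ebit, net_working_capital, net_fixed_assets):
    # Floor negative working capital at zero so a thin capital base can't inflate ROIC
    invested_capital = max(net_working_capital, 0) + net_fixed_assets
    return ebit / invested_capital


companies = {
    # name: (ebit, market_cap, total_debt, cash)
    'cash_rich': (100, 1000, 0, 400),
    'indebted': (100, 1000, 600, 50),
}

for name, (ebit, mcap, debt, cash) in companies.items():
    naive = ebit / mcap
    ey = ebit / enterprise_value(mcap, debt, cash)
    print(f"{name}: EBIT/mcap={naive:.1%}  EBIT/EV={ey:.1%}")
# cash_rich: EBIT/mcap=10.0%  EBIT/EV=16.7%
# indebted: EBIT/mcap=10.0%  EBIT/EV=6.5%

# Negative working capital (-300) against 320 of fixed assets
print(f"floored ROIC={roic(50, -300, 320):.1%}, unfloored={50 / (-300 + 320):.0%}")
# floored ROIC=15.6%, unfloored=250%
```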
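```python
def compute_factors(corp_code, year):
    # Key accounts (주요계정) from the annual report: income statement and balance sheet rows
    fs = dart.finstate(corp_code, year, reprt_code='11011')  # 11011 = annual
    if fs is None or fs.empty:
        return None

    # finstate returns both consolidated (CFS) and separate (OFS) rows; prefer consolidated
    if (fs['fs_div'] == 'CFS').any():
        fs = fs[fs['fs_div'] == 'CFS']

    income_statement = fs[fs['sj_div'] == 'IS']
    balance_sheet = fs[fs['sj_div'] == 'BS']

    # Operating income (영업이익); amounts arrive as strings like "1,234,567,890"
    ebit_rows = income_statement.loc[
        income_statement['account_nm'] == '영업이익', 'thstrm_amount'
    ]
    if ebit_rows.empty:
        return None
    ebit = float(ebit_rows.iloc[0].replace(',', ''))

    # Total assets, current assets/liabilities, debt:
    # parse balance_sheet for the required items the same way (elided here)
    return {'ebit': ebit}  # extend with the parsed balance-sheet items
```

The full implementation runs 100+ lines because every K-IFRS field needs string parsing and null-handling. The DART API returns numbers as comma-formatted strings ("1,234,567,890") that need stripping. Also note that finstate returns only the key accounts, so items such as PP&E, intangibles, cash and borrowings generally need the full statements from finstate_all.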
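Between parsing and ranking, the per-company numbers have to be joined with market caps (which come from a price source, not DART) into the columns the ranking step expects. A minimal sketch, assuming each company's parsed items arrive as a dict with the hypothetical keys listed in the docstring; the corp codes below are placeholders.

```python
import pandas as pd


def build_universe(raw_items: dict, market_caps: dict) -> pd.DataFrame:
    """Combine parsed statement items with market caps into the ranking input frame.

    raw_items: corp_code -> dict with hypothetical keys ebit, current_assets,
               current_liabilities, total_debt, cash, ppe, intangibles (or None)
    market_caps: corp_code -> market cap in KRW from a price data source
    """
    rows = []
    for corp_code, items in raw_items.items():
        if items is None or corp_code not in market_caps:
            continue  # no usable filing or no price: leave it out of the universe
        rows.append({
            'corp_code': corp_code,
            'ebit': items['ebit'],
            'ev': market_caps[corp_code] + items['total_debt'] - items['cash'],
            'net_working_capital': items['current_assets'] - items['current_liabilities'],
            'net_fixed_assets': items['ppe'] + items['intangibles'],
        })
    return pd.DataFrame(
        rows,
        columns=['corp_code', 'ebit', 'ev', 'net_working_capital', 'net_fixed_assets'],
    )


# Hypothetical inputs (billions of KRW); the second company has no usable filing
raw = {
    '00000001': {'ebit': 120, 'current_assets': 500, 'current_liabilities': 300,
                 'total_debt': 400, 'cash': 300, 'ppe': 350, 'intangibles': 50},
    '00000002': None,
}
universe_df = build_universe(raw, {'00000001': 900, '00000002': 700})
print(universe_df)
```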
The factor computation
Once you have the raw numbers:
```python
def magic_formula_rank(universe_df):
    # universe_df has columns: corp_code, ebit, ev, net_working_capital, net_fixed_assets
    df = universe_df.copy()

    # Earnings yield
    df['earnings_yield'] = df['ebit'] / df['ev']

    # ROIC, with net working capital floored at zero to prevent blowups
    invested_capital = (
        df['net_working_capital'].clip(lower=0)
        + df['net_fixed_assets']
    )
    df['roic'] = df['ebit'] / invested_capital

    # Drop rows where a ratio is meaningless (non-positive EV or invested capital)
    df = df[(df['ev'] > 0) & (invested_capital > 0)].copy()

    # Rank: 1 = best on each factor; the lowest combined rank is the most attractive
    df['ey_rank'] = df['earnings_yield'].rank(ascending=False)
    df['roic_rank'] = df['roic'].rank(ascending=False)
    df['combined_rank'] = df['ey_rank'] + df['roic_rank']

    return df.sort_values('combined_rank')
```

Korean-specific gotchas
These are the things that bite you running Magic Formula on KOSPI/KOSDAQ that don’t bite you on US markets:
Gotcha 1: Holding company structures
Korean chaebol have complex holding-subsidiary structures. The parent company’s financials look unusual — high investment income, low operating income, large equity stakes in subsidiaries. Magic Formula will rank these poorly even when the underlying group is healthy.
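Fix: exclude holding companies by industry code. DART's company overview reports a KSIC industry code (induty_code); holding companies are typically 64992, and filtering on the broader prefix 64 also drops banks and other financial-services firms. Or use consolidated financials and accept that the screen biases against pure holdcos.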
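A minimal filter, assuming each row carries DART's induty_code as a string. Prefix matching lets the same function drop either the whole 64 division or just the holding-company code; the rows below are made up, and only 64992 is meant as a holding-company code (worth verifying against your own data).

```python
import pandas as pd


def exclude_industries(df: pd.DataFrame, prefixes=('64',)) -> pd.DataFrame:
    """Drop rows whose industry code (DART's induty_code) starts with any given prefix."""
    prefixes = tuple(prefixes)
    codes = df['induty_code'].fillna('').astype(str)
    is_excluded = codes.apply(lambda code: code.startswith(prefixes))
    return df[~is_excluded]


# Illustrative rows
universe = pd.DataFrame({
    'corp_name': ['Holdco A', 'Financial B', 'Maker C'],
    'induty_code': ['64992', '64199', '264'],
})
print(exclude_industries(universe)['corp_name'].tolist())              # ['Maker C']
print(exclude_industries(universe, ('64992',))['corp_name'].tolist())  # ['Financial B', 'Maker C']
```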
Gotcha 2: Treasury stock / dual-class shares
Some Korean companies (especially Samsung Electronics, the LG affiliates) have preferred shares trading separately from common. Magic Formula on the preferred shares produces nonsense because the “EBIT” applies to the whole company, not just the preferred share class.
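Fix: filter to common shares only. Most data sources mark preferred shares with "우" (short for 우선주, preferred stock) at the end of the name, sometimes followed by a class letter, as in 삼성전자우 or 현대차2우B. Exclude any issue whose name ends that way.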
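A sketch of that name filter, assuming your price source uses the "우" naming and stores the issue name in a column called name (a hypothetical column). A company whose actual name happens to end in 우 would be a false positive, so cross-check against share codes if your source provides them.

```python
import re

import pandas as pd

# Names ending in 우, optionally followed by a class letter (삼성전자우, 현대차2우B)
PREFERRED_SUFFIX = re.compile(r'우[A-Z]?$')


def common_shares_only(df: pd.DataFrame, name_col: str = 'name') -> pd.DataFrame:
    is_preferred = df[name_col].astype(str).apply(lambda n: bool(PREFERRED_SUFFIX.search(n)))
    return df[~is_preferred]


listings = pd.DataFrame({'name': ['삼성전자', '삼성전자우', '현대차', '현대차2우B']})
print(common_shares_only(listings)['name'].tolist())  # ['삼성전자', '현대차']
```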
Gotcha 3: Small-cap data quality
DART’s coverage for small-cap KOSDAQ companies has occasional gaps. A company might file its annual report 6 months late, or restate prior years without updating DART’s API endpoints promptly. About 5-10% of small-cap data points need manual cleaning.
Fix: spot-check the top of your ranked list. The "highest ROIC" stocks there are sometimes companies with data errors (e.g., a one-time gain that wasn’t backed out of EBIT properly).
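The spot-check can be made a little more systematic by flagging implausible rows near the top before trusting them. A sketch, assuming the ranked frame also carries a hypothetical prior-year EBIT column (ebit_prev); the 100% ROIC and 3x EBIT-jump thresholds are arbitrary placeholders to tune, not established cutoffs.

```python
import pandas as pd


def flag_for_review(ranked: pd.DataFrame, top_n: int = 30,
                    max_roic: float = 1.0, max_ebit_jump: float = 3.0) -> pd.DataFrame:
    """Return rows among the top_n whose ROIC or EBIT growth looks implausible."""
    top = ranked.head(top_n).copy()
    prev = top['ebit_prev'].where(top['ebit_prev'] > 0)  # ignore non-positive bases
    top['roic_flag'] = top['roic'] > max_roic
    top['jump_flag'] = (top['ebit'] / prev) > max_ebit_jump  # NaN compares as False
    return top[top['roic_flag'] | top['jump_flag']]


ranked = pd.DataFrame({
    'corp_code': ['A', 'B', 'C'],
    'roic': [2.5, 0.4, 0.3],
    'ebit': [100, 90, 400],
    'ebit_prev': [80, 85, 50],
})
print(flag_for_review(ranked)['corp_code'].tolist())  # ['A', 'C']
```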
Gotcha 4: Year-end timing
Korean companies have fiscal years ending in December (~80%) but some end in March or June. If you screen on December 31 using year-end data, ~20% of companies have stale data from 6+ months ago.
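Fix: use trailing twelve months (TTM) data when available, not point-in-time year-end. DART has quarterly reports (reprt_code='11013' for Q1, '11012' for the half-year, '11014' for Q3, alongside '11011' for the annual report). Sum the trailing four quarters for TTM; there's no separate Q4 filing, so derive Q4 as the annual figure minus the first three quarters.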
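The TTM arithmetic is easiest to get right from year-to-date (cumulative) figures: TTM equals the latest YTD figure plus the prior full year minus the prior year's YTD figure for the same period, which is the same as summing the last four quarters. A sketch with invented quarterly operating income; the function names are illustrative.

```python
def ttm(latest_ytd: float, prior_full_year: float, prior_ytd_same_period: float) -> float:
    """Trailing-twelve-month figure from cumulative (year-to-date) reports.

    latest_ytd: e.g. Jan-Sep 2025 operating income (Q3 report, 11014)
    prior_full_year: FY2024 operating income (annual report, 11011)
    prior_ytd_same_period: Jan-Sep 2024 operating income (Q3 report, 11014)
    """
    return latest_ytd + prior_full_year - prior_ytd_same_period


def q4_standalone(full_year: float, q3_ytd: float) -> float:
    """Q4 has no separate filing: derive it as annual minus the first three quarters."""
    return full_year - q3_ytd


# Hypothetical quarterly operating income (billions of KRW)
q = {'2024Q1': 10, '2024Q2': 12, '2024Q3': 11, '2024Q4': 15,
     '2025Q1': 13, '2025Q2': 14, '2025Q3': 16}
fy2024 = q['2024Q1'] + q['2024Q2'] + q['2024Q3'] + q['2024Q4']  # 48
ytd_q3_2024 = q['2024Q1'] + q['2024Q2'] + q['2024Q3']            # 33
ytd_q3_2025 = q['2025Q1'] + q['2025Q2'] + q['2025Q3']            # 43
print(ttm(ytd_q3_2025, fy2024, ytd_q3_2024))  # 58, i.e. 15 + 13 + 14 + 16
print(q4_standalone(fy2024, ytd_q3_2024))      # 15
```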
Gotcha 5: Survivorship bias is worse
Korean delisting is more frequent than US delisting (Korean exchanges actively delist non-compliant companies). The KOSDAQ delisted ~30-40 companies in 2024 alone. If your universe is “currently-listed Korean stocks,” you’ve eliminated the worst-performing companies systematically.
Fix: maintain a historical universe list that records companies delisted in each year, and include those in your historical backtests even if they’re not in today’s universe. KRX's KIND disclosure site lists delistings you can build this from.
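Once you have listing and delisting dates per company, a point-in-time universe is a simple date filter. A sketch, assuming a hypothetical table with listed_on and delisted_on columns (NaT for companies still listed), however you source the delisting history.

```python
import pandas as pd


def universe_as_of(history: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Companies that were listed on the given date, including ones delisted later."""
    date = pd.Timestamp(as_of)
    listed_by_then = history['listed_on'] <= date
    still_listed = history['delisted_on'].isna() | (history['delisted_on'] > date)
    return history[listed_by_then & still_listed]


history = pd.DataFrame({
    'corp_code': ['A', 'B', 'C'],
    'listed_on': pd.to_datetime(['2010-03-02', '2016-07-15', '2021-11-01']),
    'delisted_on': pd.to_datetime(['2019-05-20', None, None]),
})
print(universe_as_of(history, '2018-12-31')['corp_code'].tolist())  # ['A', 'B']
print(universe_as_of(history, '2024-12-31')['corp_code'].tolist())  # ['B', 'C']
```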
Backtest results (illustrative)
Running this on 2015-2024 data with these constraints:
- KOSPI + KOSDAQ universe
- Exclude holding companies, financial sector (banks/insurance), real estate
- Market cap > 50 billion KRW (~$35M USD) for liquidity
- Top decile by combined Magic Formula rank, equal-weighted
- Monthly rebalance with 15 bps transaction cost
- Survivorship-bias-free universe (using delisting list)
Results: ~13.8% annualized return, vs. KOSPI ~6.2% over the same period. Sharpe ratio 0.72 vs. 0.41 for the index.
These numbers are illustrative — they depend on the specific universe filters, the exact rebalance dates, and the handling of the gotchas above. The point isn’t the specific number, it’s that the Magic Formula has more residual edge in Korean markets than in US large-caps because there’s less institutional arbitrage activity.
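The portfolio rules in the constraint list above can be sketched as one selection step plus a cost adjustment. This assumes a ranked frame with hypothetical market_cap (KRW) and sector columns; the sector labels are placeholders for however your data tags holdcos, financials and real estate, and the cost function assumes the 15 bps applies to every unit of value traded.

```python
import pandas as pd

MIN_MARKET_CAP = 50_000_000_000  # 50 billion KRW liquidity floor
EXCLUDED_SECTORS = {'holding', 'bank', 'insurance', 'real_estate'}  # placeholder labels
COST_BPS = 15  # transaction cost per unit of traded value


def select_portfolio(ranked: pd.DataFrame) -> pd.DataFrame:
    """Apply the universe filters, then take the top decile at equal weight."""
    eligible = ranked[
        (ranked['market_cap'] > MIN_MARKET_CAP)
        & ~ranked['sector'].isin(EXCLUDED_SECTORS)
    ].sort_values('combined_rank')
    n = max(1, len(eligible) // 10)  # top decile, at least one name
    picks = eligible.head(n).copy()
    picks['weight'] = 1.0 / n
    return picks


def net_return(gross_return: float, turnover: float, cost_bps: float = COST_BPS) -> float:
    """Subtract costs; turnover is traded value as a fraction of the portfolio."""
    return gross_return - turnover * cost_bps / 10_000


ranked = pd.DataFrame({
    'corp_code': [f'C{i:02d}' for i in range(30)],
    'combined_rank': range(30),
    'market_cap': [80e9] * 30,
    'sector': ['manufacturing'] * 30,
})
ranked.loc[0, 'sector'] = 'bank'        # excluded sector
ranked.loc[1, 'market_cap'] = 10e9      # below the liquidity floor
print(select_portfolio(ranked)['corp_code'].tolist())  # ['C02', 'C03']
```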
What you need to deploy this
- DART API key: Free at opendart.fss.or.kr (Korean signup; takes 5-10 minutes)
- OpenDartReader Python wrapper: pip install opendartreader
- Stock price data for Korean equities: KIS Developers (free, requires a Korean brokerage account), Yahoo Finance for basic coverage, or scrapers of Naver Finance (a legal gray area)
- Historical universe list: maintain it yourself from listing and delisting records across past years
The whole stack is free. The friction is the Korean-language UI of DART’s signup and the need to handle K-IFRS specifics.
Verdict
The Magic Formula in Korean markets in 2026 is a viable retail strategy, especially for someone who can read Korean financial filings or is willing to deal with the K-IFRS adaptation work. The expected edge is meaningfully larger than running the same screen on US large-caps.
The hard part isn’t the formula — it’s the data infrastructure. Once you have a clean DART pipeline, you can run this and several other factor strategies (value, quality, momentum) on the same universe with similar logic. Most of the work is in the data layer, not the strategy logic.
Korean markets reward retail quants willing to do the K-IFRS work that US researchers can’t (or won’t) do. The Magic Formula is the simplest place to start.