Apify Fingerprint Suite: Open-Source Browser Fingerprinting for Stealth Scrapers
Apify's fingerprint-suite generates statistically consistent browser fingerprints and injects them into Playwright or Puppeteer. How it works, how to wire it in, and when a scraper actually needs it.
You launch a headless Chrome instance, point it at a target site, and the first response is a 403 or a CAPTCHA. The IP is residential. The user-agent string looks like ordinary Chrome. The block still lands. The signal that gave you away is usually the fingerprint: the cluster of browser properties an anti-bot service reads before it ever evaluates how you click or scroll.
Apify’s fingerprint-suite is an MIT-licensed TypeScript toolkit that generates realistic browser fingerprints and injects them into Playwright or Puppeteer. We look at how it builds those fingerprints, how you wire it in, and where it stops being the answer.
How fingerprinting flags your scraper
A fingerprint is not one value. An anti-bot service assembles it from dozens of signals the browser exposes: navigator.userAgent, navigator.platform, navigator.hardwareConcurrency, navigator.deviceMemory, the list of declared languages, screen and window dimensions, the WebGL vendor and renderer strings, audio-context output, available fonts, and the order and casing of the HTTP headers themselves.
A default headless browser leaks on several of these at once. navigator.webdriver reads true. The user-agent can carry a HeadlessChrome token. Viewports default to a small fixed size. Header order differs from what the same browser sends when a person drives it.
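The leaks listed above can be sketched as a small check, written as a pure function so it runs outside a browser. The `BrowserSignals` shape, the `findHeadlessLeaks` name, and the sample user-agent string are hypothetical, and 800x600 (Puppeteer's default viewport) is used as an assumed stand-in for "a small fixed size". Real anti-bot services combine far more signals than these three.

```typescript
// Hypothetical shape: a few of the signals a detection script can read.
interface BrowserSignals {
  webdriver: boolean;
  userAgent: string;
  viewport: { width: number; height: number };
}

// Returns a human-readable list of the default-headless tells that were found.
function findHeadlessLeaks(signals: BrowserSignals): string[] {
  const leaks: string[] = [];
  if (signals.webdriver) {
    leaks.push('navigator.webdriver is true');
  }
  if (signals.userAgent.includes('HeadlessChrome')) {
    leaks.push('user-agent contains HeadlessChrome');
  }
  // Assumption: 800x600 stands in for "a small fixed size".
  if (signals.viewport.width === 800 && signals.viewport.height === 600) {
    leaks.push('viewport is the 800x600 default');
  }
  return leaks;
}

// Illustrative values for an unmodified headless session.
const defaultHeadless: BrowserSignals = {
  webdriver: true,
  userAgent:
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36',
  viewport: { width: 800, height: 600 },
};

console.log(findHeadlessLeaks(defaultHeadless));
```

Each rule here is trivially fixable on its own, which is why single-value checks are only the first layer of detection.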
The harder problem is consistency. Spoofing a single value is easy: you can set any user-agent string you want. But if you announce Safari on macOS while WebGL reports an ANGLE (NVIDIA...) renderer string of the kind normally seen on Windows, the contradiction itself is the detection signal. Anti-bot models are trained on how real fingerprints co-occur, so an attribute that is individually plausible but inconsistent with its neighbors is what trips the flag.
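To make the contradiction concrete, here is a minimal sketch of one cross-attribute rule. The `FingerprintSlice` type, the function name, and both sample strings are hypothetical illustrations. A real detector learns many such relationships statistically instead of hard-coding them.

```typescript
// Hypothetical subset of a fingerprint: only the two fields compared here.
interface FingerprintSlice {
  userAgent: string;
  webglRenderer: string;
}

// Flags the contradiction described in the text: a macOS user-agent paired
// with an NVIDIA ANGLE renderer string. One illustrative rule, not a model.
function hasPlatformContradiction(fp: FingerprintSlice): boolean {
  const claimsMac = fp.userAgent.includes('Macintosh');
  const nvidiaAngle = fp.webglRenderer.startsWith('ANGLE (NVIDIA');
  return claimsMac && nvidiaAngle;
}

// Illustrative values only: a spoofed Safari user-agent on a Windows GPU stack.
const spoofed: FingerprintSlice = {
  userAgent:
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
  webglRenderer: 'ANGLE (NVIDIA, NVIDIA GeForce GTX 1660 Direct3D11 vs_5_0 ps_5_0, D3D11)',
};

console.log(hasPlatformContradiction(spoofed)); // true
```

Both values pass inspection individually. Only the pair is implausible.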
What the suite generates and injects
The toolkit splits the job into two packages. fingerprint-generator builds a fingerprint; fingerprint-injector applies it to a browser context.
The generator is backed by a Bayesian network — the generative-bayesian-network package — fitted on a corpus of real browser fingerprints. Instead of stitching values together at random, it samples a fingerprint where each attribute is conditioned on the others. Ask it for Chrome on Windows and you get Windows-plausible screen resolutions, font lists, and WebGL strings, not a grab-bag that no real machine would produce.
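The idea of conditioning one attribute on another can be shown with a toy two-node example. This is not the generative-bayesian-network package or its API. The table, weights, and function names below are invented for illustration, and the real network covers many more attributes.

```typescript
type Os = 'windows' | 'macos';

// Toy conditional table: screen widths that are plausible *given* the OS.
// Values and weights are invented for illustration, not taken from the real model.
const screenWidthGivenOs: Record<Os, Array<{ value: number; weight: number }>> = {
  windows: [
    { value: 1920, weight: 0.7 },
    { value: 1366, weight: 0.3 },
  ],
  macos: [
    { value: 1440, weight: 0.6 },
    { value: 1728, weight: 0.4 },
  ],
};

// Picks one option by weight, using a caller-supplied random source in [0, 1).
function sampleWeighted<T>(
  options: Array<{ value: T; weight: number }>,
  rng: () => number,
): T {
  const total = options.reduce((sum, o) => sum + o.weight, 0);
  let threshold = rng() * total;
  for (const option of options) {
    threshold -= option.weight;
    if (threshold < 0) return option.value;
  }
  return options[options.length - 1].value; // guard against rounding error
}

// Samples the child attribute conditioned on the requested parent value.
function sampleFingerprint(os: Os, rng: () => number = Math.random) {
  return { os, screenWidth: sampleWeighted(screenWidthGivenOs[os], rng) };
}

console.log(sampleFingerprint('windows'));
```

Because the width is drawn from the row for the requested OS, a Windows request can never come back with a width that only the macOS row contains.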
It generates matching HTTP headers in the same pass, through the header-generator package. That covers header names, their order, and values consistent with the browser and OS you requested — header order being a fingerprint signal in its own right. You can constrain generation by browser, operating system, device type, browser version range, and locale.
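Why header order matters can be sketched from the detector's side: compare the order a client sent against the order expected for the browser it claims to be. The function name and the template below are hypothetical. The template is an invented ordering, not a recorded one from any real browser, and this sketch compares names case-insensitively, so it checks order only.

```typescript
// Returns true when every header from `expectedOrder` that appears in
// `observed` shows up in the same relative order. Headers that the template
// does not mention are ignored.
function matchesHeaderOrder(observed: string[], expectedOrder: string[]): boolean {
  const rank = new Map<string, number>();
  expectedOrder.forEach((name, index) => rank.set(name.toLowerCase(), index));

  let lastRank = -1;
  for (const name of observed) {
    const r = rank.get(name.toLowerCase());
    if (r === undefined) continue; // not covered by the template
    if (r < lastRank) return false; // appeared earlier than the template allows
    lastRank = r;
  }
  return true;
}

// Assumption: an invented template, used only to exercise the function.
const template = ['user-agent', 'accept', 'accept-encoding', 'accept-language'];

console.log(matchesHeaderOrder(['User-Agent', 'Accept', 'Accept-Language'], template)); // true
console.log(matchesHeaderOrder(['Accept-Language', 'User-Agent'], template)); // false
```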
fingerprint-injector then overrides the JavaScript-visible surface inside the page: navigator properties, declared languages, screen and window geometry, WebGL vendor and renderer, supported codecs, and the navigator.webdriver flag. The fingerprint and its headers travel together, so what the page reads in JavaScript lines up with what the server saw in the request.
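The general override mechanism can be sketched with `Object.defineProperty` on a stand-in object. This is a conceptual illustration, not the injector's actual implementation. `applyOverrides` and `fakeNavigator` are hypothetical names, and in a real page a script like this would have to run before any site code, for example through Playwright's `addInitScript`.

```typescript
// Redefines each listed property on `target` as a getter returning the
// spoofed value, so later reads see the override instead of the original.
function applyOverrides(target: object, overrides: Record<string, unknown>): void {
  for (const [key, value] of Object.entries(overrides)) {
    Object.defineProperty(target, key, {
      get: () => value,
      configurable: true,
      enumerable: true,
    });
  }
}

// Stand-in for the page's navigator object so the sketch runs outside a browser.
const fakeNavigator = {
  webdriver: true,
  hardwareConcurrency: 2,
  languages: ['en'],
};

applyOverrides(fakeNavigator, {
  webdriver: false,
  hardwareConcurrency: 8,
  languages: ['en-US', 'en'],
});

console.log(fakeNavigator.webdriver, fakeNavigator.hardwareConcurrency); // false 8
```

The values passed in would come from the generated fingerprint, which is what keeps the overridden properties consistent with each other and with the request headers.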
Wiring it into Playwright and Puppeteer
For Playwright, the suite ships a newInjectedContext helper that creates a browser context with a fresh fingerprint already applied:
```typescript
import { chromium } from 'playwright';
import { newInjectedContext } from 'fingerprint-injector';

const browser = await chromium.launch();

// Create a context with a generated fingerprint and matching headers applied.
const context = await newInjectedContext(browser, {
  // Constrain generation to desktop Windows fingerprints.
  fingerprintOptions: {
    devices: ['desktop'],
    operatingSystems: ['windows'],
  },
});

const page = await context.newPage();
await page.goto('https://example.com');
```
Every page opened from that context inherits the same fingerprint and header set. For Puppeteer, the FingerprintInjector class exposes attachFingerprintToPuppeteer to apply a generated fingerprint to a page.
If you already run Crawlee — Apify’s scraping library — you are using this stack without wiring anything: Crawlee generates and injects fingerprints by default. The standalone packages matter when you drive Playwright or Puppeteer directly and want the same treatment.
When to reach for it in an AI data pipeline
If your pipeline feeds an LLM — scraping training or evaluation data, ingesting pages for retrieval, monitoring prices or competitors, or backing an agent that browses — the fingerprint suite earns its place when you are driving a real browser against a site with active anti-bot defenses or heavy client-side rendering.
It is the wrong tool when you do not need a browser at all. If the target exposes a public API or serves static HTML, a plain HTTP request is faster, cheaper, and harder to flag than a headless browser with an injected fingerprint. Reach for fingerprinting because JavaScript execution forced you into a browser — not by default.
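The decision rule in the last two paragraphs can be written down as a small function. The `Target` fields and strategy names are hypothetical labels for this sketch, and the rule is deliberately simplified.

```typescript
// Hypothetical description of what is known about a scraping target.
interface Target {
  hasPublicApi: boolean;
  servesStaticHtml: boolean;
}

type Strategy = 'api' | 'plain-http' | 'browser-with-fingerprint';

// Prefer the cheapest option that works; fall back to a fingerprinted
// browser only when JavaScript execution forces a browser.
function chooseStrategy(target: Target): Strategy {
  if (target.hasPublicApi) return 'api';
  if (target.servesStaticHtml) return 'plain-http';
  return 'browser-with-fingerprint';
}

console.log(chooseStrategy({ hasPublicApi: false, servesStaticHtml: false }));
// browser-with-fingerprint
```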
Two more limits are worth setting expectations on. The suite addresses the fingerprint layer only; CAPTCHA solving, IP rotation, and human-like interaction are separate problems you still have to handle. And the model is trained on a fingerprint corpus, so its realism depends on how well that corpus keeps up with the browser versions running in the wild today.