Apify Fingerprint Suite: Open-Source Browser Fingerprinting for Stealth Scrapers

You launch a headless Chrome instance, point it at a target site, and the first response is a 403 or a CAPTCHA. The IP is residential. The user-agent string looks like a normal Chrome. The block still lands. The signal that gave you away is usually the fingerprint — the cluster of browser properties an anti-bot service reads before it ever evaluates how you click or scroll.

Apify’s fingerprint-suite is an MIT-licensed TypeScript toolkit that generates realistic browser fingerprints and injects them into Playwright or Puppeteer. This article covers how it builds those fingerprints, how you wire it in, and where it stops being the answer.

How fingerprinting flags your scraper

A fingerprint is not one value. An anti-bot service assembles it from dozens of signals the browser exposes: navigator.userAgent, navigator.platform, navigator.hardwareConcurrency, navigator.deviceMemory, the list of declared languages, screen and window dimensions, the WebGL vendor and renderer strings, audio-context output, available fonts, and the order and casing of the HTTP headers themselves.

A default headless browser leaks on several of these at once. navigator.webdriver reads true. The user-agent can carry a HeadlessChrome token. Viewports default to a small fixed size. Header order differs from what the same browser sends when a person drives it.
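As a rough illustration, the tells listed above can be written as checks over a snapshot of browser state. This is a sketch, not how any particular anti-bot service works: the `BrowserSnapshot` shape and the `findHeadlessTells` name are invented here, and the 800x600 check assumes Puppeteer's default viewport.

```typescript
// Hypothetical snapshot of values a page script could read from the browser.
interface BrowserSnapshot {
  webdriver: boolean; // navigator.webdriver
  userAgent: string; // navigator.userAgent
  viewportWidth: number; // window.innerWidth
  viewportHeight: number; // window.innerHeight
}

// Returns a readable list of default-headless tells found in the snapshot.
function findHeadlessTells(snapshot: BrowserSnapshot): string[] {
  const tells: string[] = [];
  if (snapshot.webdriver) {
    tells.push('navigator.webdriver is true');
  }
  if (snapshot.userAgent.includes('HeadlessChrome')) {
    tells.push('user-agent contains the HeadlessChrome token');
  }
  // Assumption: 800x600 is the untouched Puppeteer default viewport.
  if (snapshot.viewportWidth === 800 && snapshot.viewportHeight === 600) {
    tells.push('viewport is the 800x600 default');
  }
  return tells;
}
```

A real detector weighs many more signals, including header order, but the shape is the same: each default left untouched is one more item in the list.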

The harder problem is consistency. Spoofing a single value is easy: you can set any user-agent string you want. But if you announce Safari on macOS while WebGL reports a Direct3D-backed ANGLE (NVIDIA...) renderer string, which only Windows produces, the contradiction is the detection. Anti-bot models are trained on how real fingerprints co-occur, so an attribute that is individually plausible but inconsistent with its neighbors is what trips the flag.
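To make the consistency problem concrete, here is a minimal sketch of a cross-attribute check. The `FingerprintClaim` type, the `findContradictions` function, and the three rules are illustrative assumptions; production systems learn these relationships statistically rather than hard-coding them. The rules rely on two facts: `navigator.platform` reads `MacIntel` on macOS and `Win32` on Windows, and Direct3D exists only on Windows.

```typescript
// Hypothetical subset of a fingerprint: three values that must agree.
interface FingerprintClaim {
  userAgent: string; // navigator.userAgent
  platform: string; // navigator.platform
  webglRenderer: string; // unmasked WebGL renderer string
}

// Returns a description of every pair of attributes that disagree.
function findContradictions(fp: FingerprintClaim): string[] {
  const problems: string[] = [];
  const claimsMac = fp.userAgent.includes('Macintosh');
  const claimsWindows = fp.userAgent.includes('Windows NT');

  if (claimsMac && fp.platform !== 'MacIntel') {
    problems.push('user-agent says macOS but platform is ' + fp.platform);
  }
  if (claimsWindows && fp.platform !== 'Win32') {
    problems.push('user-agent says Windows but platform is ' + fp.platform);
  }
  // Direct3D is a Windows-only graphics API, so it cannot back WebGL on macOS.
  if (claimsMac && fp.webglRenderer.includes('Direct3D')) {
    problems.push('user-agent says macOS but WebGL renderer uses Direct3D');
  }
  return problems;
}

// Sample values for illustration: a Safari user-agent pasted over a Windows GPU string.
const spoofedFingerprint: FingerprintClaim = {
  userAgent:
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
  platform: 'MacIntel',
  webglRenderer: 'ANGLE (NVIDIA, NVIDIA GeForce GTX 1060 Direct3D11 vs_5_0 ps_5_0, D3D11)',
};
const spoofedProblems = findContradictions(spoofedFingerprint);
```

Here the user-agent and platform agree with each other, so a check on either alone passes; only the comparison against the renderer string exposes the spoof.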

What the suite generates and injects

The toolkit splits the job into two packages. fingerprint-generator builds a fingerprint; fingerprint-injector applies it to a browser context.

The generator is backed by a Bayesian network — the generative-bayesian-network package — fitted on a corpus of real browser fingerprints. Instead of stitching values together at random, it samples a fingerprint where each attribute is conditioned on the others. Ask it for Chrome on Windows and you get Windows-plausible screen resolutions, font lists, and WebGL strings, not a grab-bag that no real machine would produce.
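The idea of conditioning each attribute on the others can be shown with a toy two-level network. This is not the generative-bayesian-network package or its API: the probability tables, renderer strings, and function names below are invented placeholders, and the real model is fitted on collected fingerprints with far more nodes.

```typescript
type Os = 'windows' | 'macos';
type Weighted<T> = Array<[T, number]>;

// Small seeded PRNG (mulberry32-style) so a run can be reproduced.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draws one value from a weighted list.
function pick<T>(options: Weighted<T>, rand: () => number): T {
  const total = options.reduce((sum, [, weight]) => sum + weight, 0);
  let r = rand() * total;
  for (const [value, weight] of options) {
    r -= weight;
    if (r < 0) return value;
  }
  return options[options.length - 1][0];
}

// Invented tables: P(os), P(renderer | os), P(screen | os).
const osPrior: Weighted<Os> = [['windows', 0.7], ['macos', 0.3]];
const rendererGivenOs: Record<Os, Weighted<string>> = {
  windows: [['ANGLE (NVIDIA, Direct3D11)', 0.6], ['ANGLE (Intel, Direct3D11)', 0.4]],
  macos: [['ANGLE (Apple, Metal)', 0.5], ['Apple GPU', 0.5]],
};
const screenGivenOs: Record<Os, Weighted<string>> = {
  windows: [['1920x1080', 0.7], ['1366x768', 0.3]],
  macos: [['1440x900', 0.5], ['2560x1440', 0.5]],
};

// Samples the parent (os) first, then each child conditioned on it.
// Passing `os` plays the role of a generation constraint.
function sampleFingerprint(rand: () => number, os?: Os) {
  const chosenOs = os ?? pick(osPrior, rand);
  return {
    os: chosenOs,
    renderer: pick(rendererGivenOs[chosenOs], rand),
    screen: pick(screenGivenOs[chosenOs], rand),
  };
}
```

Because the renderer and screen are drawn from tables indexed by the operating system, a macOS sample can never come back with a Direct3D renderer, which is the property independent random choices would not give you.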

It generates matching HTTP headers in the same pass, through the header-generator package. That covers header names, their order, and values consistent with the browser and OS you requested — header order being a fingerprint signal in its own right. You can constrain generation by browser, operating system, device type, browser version range, and locale.

fingerprint-injector then overrides the JavaScript-visible surface inside the page: navigator properties, declared languages, screen and window geometry, WebGL vendor and renderer, supported codecs, and the navigator.webdriver flag. The fingerprint and its headers travel together, so what the page reads in JavaScript lines up with what the server saw in the request.
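One way to picture that alignment is a check that compares the request headers with what a page script would read. The sketch below is an illustration only: `PageSurface`, `parseAcceptLanguage`, and `findHeaderMismatches` are hypothetical names, and comparing just the first language is a deliberate simplification.

```typescript
// Hypothetical view of what a page script reads from navigator.
interface PageSurface {
  userAgent: string; // navigator.userAgent
  languages: string[]; // navigator.languages
}

// Parses "en-US,en;q=0.9" into ["en-US", "en"], dropping the q-values.
function parseAcceptLanguage(header: string): string[] {
  return header
    .split(',')
    .map((part) => part.split(';')[0].trim())
    .filter((tag) => tag.length > 0);
}

// Lists the places where the request headers and the JS-visible surface disagree.
function findHeaderMismatches(headers: Record<string, string>, page: PageSurface): string[] {
  // HTTP header names are case-insensitive, so normalise before lookup.
  const lower: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    lower[name.toLowerCase()] = headers[name];
  }

  const mismatches: string[] = [];
  if (lower['user-agent'] !== page.userAgent) {
    mismatches.push('User-Agent header differs from navigator.userAgent');
  }
  const headerLanguages = parseAcceptLanguage(lower['accept-language'] ?? '');
  // Simplification: only the preferred (first) language is compared.
  if (headerLanguages[0] !== page.languages[0]) {
    mismatches.push('Accept-Language differs from navigator.languages');
  }
  return mismatches;
}
```

Overriding `navigator.userAgent` in the page while leaving the request headers at their defaults would fail the first check, which is why generating both from one fingerprint matters.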

Wiring it into Playwright and Puppeteer

For Playwright, the suite ships a newInjectedContext helper that creates a browser context with a fresh fingerprint already applied:

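import { chromium } from 'playwright';
import { newInjectedContext } from 'fingerprint-injector';

// Launch a browser, then create a context with a generated fingerprint applied.
const browser = await chromium.launch();
const context = await newInjectedContext(browser, {
  // Constrain generation to desktop Windows fingerprints.
  fingerprintOptions: {
    devices: ['desktop'],
    operatingSystems: ['windows'],
  },
});

// Pages opened from this context share the injected fingerprint.
const page = await context.newPage();
await page.goto('https://example.com');

// Release the browser when finished.
await browser.close();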

Every page opened from that context inherits the same fingerprint and header set. For Puppeteer, the FingerprintInjector class exposes attachFingerprintToPuppeteer to apply a generated fingerprint to a page.
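A minimal Puppeteer sketch might look like the following. It assumes `puppeteer`, `fingerprint-generator`, and `fingerprint-injector` are installed; `openInjectedPage` is a name made up for this example, and the packages are loaded inside the function only so the file can be imported without launching anything.

```typescript
// Opens `url` in Puppeteer with a generated fingerprint attached and returns the page title.
// Assumption: puppeteer, fingerprint-generator and fingerprint-injector are installed.
async function openInjectedPage(url: string): Promise<string> {
  const { default: puppeteer } = await import('puppeteer');
  const { FingerprintGenerator } = await import('fingerprint-generator');
  const { FingerprintInjector } = await import('fingerprint-injector');

  // Generate a fingerprint together with its matching headers.
  const generated = new FingerprintGenerator().getFingerprint({
    devices: ['desktop'],
    operatingSystems: ['windows'],
  });

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Attach before navigating so the overrides are in place on first load.
    await new FingerprintInjector().attachFingerprintToPuppeteer(page, generated);
    await page.goto(url);
    return await page.title();
  } finally {
    // Always release the browser, even if navigation throws.
    await browser.close();
  }
}
```

Unlike the Playwright helper, this attaches the fingerprint per page, so each new page needs its own call.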

If you already run Crawlee, Apify’s scraping library, you are using this stack without wiring anything: its browser-based crawlers generate and inject fingerprints by default. The standalone packages matter when you drive Playwright or Puppeteer directly and want the same treatment.

Building and tuning a scraper is tight iteration — adjust the fingerprint options, rerun, read the block response, adjust again. An AI-assisted editor keeps that loop short.

When to reach for it in an AI data pipeline

If your pipeline feeds an LLM — scraping training or evaluation data, ingesting pages for retrieval, monitoring prices or competitors, or backing an agent that browses — the fingerprint suite earns its place when you are driving a real browser against a site with active anti-bot defenses or heavy client-side rendering.

It is the wrong tool when you do not need a browser at all. If the target exposes a public API or serves static HTML, a plain HTTP request is faster, cheaper, and harder to flag than a headless browser with an injected fingerprint. Reach for fingerprinting because JavaScript execution forced you into a browser — not by default.
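That rule can be turned into a cheap probe: make one plain HTTP request, and escalate to a browser only if the data you need is not in the response. The sketch below assumes you know a `marker` string that appears in the page once the content is present; `decideFetchMode` and `FetchMode` are hypothetical names.

```typescript
type FetchMode = 'http' | 'browser';

// Decides whether a plain HTTP response is enough, given its status and body.
// `marker` is a string expected in the HTML when the target data is present,
// for example a product name or a known element id.
function decideFetchMode(status: number, body: string, marker: string): FetchMode {
  const succeeded = status >= 200 && status < 300;
  // If the request succeeded and the data is already in the HTML, no browser is needed.
  return succeeded && body.includes(marker) ? 'http' : 'browser';
}
```

A static page that already contains the marker stays on the plain-HTTP path, while an empty application shell or a blocked request is what justifies paying for a browser and a fingerprint.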

Two more limits are worth setting expectations on. The suite addresses the fingerprint layer only; CAPTCHA solving, IP rotation, and human-like interaction are separate problems you still have to handle. And the model is trained on a fingerprint corpus, so its realism depends on how closely that corpus tracks the browser versions running in the wild today.

FAQ

Does the fingerprint suite get past Cloudflare or DataDome?

Not on its own. It handles the browser-fingerprint layer — the navigator properties, WebGL strings, and header consistency that flag automation. Services like Cloudflare and DataDome also score IP reputation, request behavior, and CAPTCHA challenges. A consistent fingerprint removes one detection vector; it does not remove the others.

How is it different from the Puppeteer stealth plugin?

puppeteer-extra-plugin-stealth patches a known list of headless tells with hand-written evasions. fingerprint-suite samples a complete, statistically consistent fingerprint from a trained model and injects the whole set. The difference is cross-attribute consistency — values that agree with each other rather than individual patches. The two approaches are not mutually exclusive.

Is it legal to scrape with it?

The tool is neutral; legality depends on what you collect and how. A site's terms of service, robots.txt, the difference between public and personal data, and your jurisdiction all matter. Settle the legal question for your specific use case before scaling a pipeline on top of it.
