May 28, 2026 · 8 min read

Key takeaways

DataDome and PerimeterX score consistency and humanity, not just IP reputation.
A clean IP with a headless fingerprint still fails; fix every layer together.
Match timezone and locale to the proxy's geolocation to stay consistent.
Add human-like mouse movement and scrolling, especially against PerimeterX.

How to Scrape Sites Protected by DataDome and PerimeterX

DataDome and PerimeterX (now part of HUMAN) are among the toughest bot protection systems on the web. They go beyond the IP and header checks of a basic firewall and build a behavioral profile of every visitor. If your scraper passes Cloudflare but dies on these, this guide explains why and what actually works.

What makes them harder than a basic WAF

A simple firewall checks your IP reputation and a few headers. DataDome and PerimeterX collect far more signals and score them together with machine learning:

Deep browser fingerprinting. Canvas, WebGL, audio context, installed fonts, screen metrics, and dozens of JavaScript properties.
Behavioral biometrics. Mouse movement curves, scroll velocity, keystroke timing, and how naturally you navigate.
Device consistency. Whether your user agent, fingerprint, and TLS signature all agree with each other.
Session reputation. A score that builds over time, so a session that suddenly acts like a bot gets flagged even if it started clean.

The key insight: these systems look for consistency and humanity, not just a clean IP. A perfect residential IP attached to an obvious headless browser fails immediately.
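
To make the consistency idea concrete, here is a minimal sketch of a pre-flight check you could run on your own session settings before launching. The `SessionProfile` fields and the `find_inconsistencies` helper are hypothetical names invented for illustration, and the rules cover only a few obvious mismatches. Real detectors correlate far more signals than this.

```python
from dataclasses import dataclass


@dataclass
class SessionProfile:
    """Hypothetical bundle of the values one session will present."""
    user_agent: str
    platform: str      # what navigator.platform would report, e.g. "MacIntel"
    proxy_type: str    # "residential", "mobile", or "datacenter"


def find_inconsistencies(profile: SessionProfile) -> list[str]:
    """Return readable mismatches; an empty list means none were found."""
    problems = []
    ua = profile.user_agent
    if "HeadlessChrome" in ua:
        problems.append("user agent advertises HeadlessChrome")
    if "Macintosh" in ua and not profile.platform.startswith("Mac"):
        problems.append("user agent says macOS but platform does not")
    if "Windows" in ua and not profile.platform.startswith("Win"):
        problems.append("user agent says Windows but platform does not")
    if profile.proxy_type == "datacenter":
        problems.append("datacenter IP starts with low trust")
    return problems
```

A check like this will not predict what a detector scores, but it catches the self-inflicted mismatches before they cost you a proxy.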

Why most scrapers fail here

The common failure is fixing one layer and ignoring the rest. People add residential proxies and still get blocked because the browser fingerprint screams automation. Or they patch the fingerprint but run from a flagged datacenter IP. DataDome and PerimeterX correlate signals, so any single inconsistency is enough.

The second common failure is behavior. Even a flawless fingerprint and IP get caught if the session loads ten pages per second in a perfectly even rhythm no human could produce.
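
One way to avoid that even rhythm is to draw each pause from a skewed random distribution. The sketch below is an assumption about what "human-looking" pacing could mean, not a measured model: `page_delays`, the 4 second base, and the one second floor are all illustrative choices.

```python
import random
from typing import Optional


def page_delays(n_pages: int, base: float = 4.0, spread: float = 0.6,
                rng: Optional[random.Random] = None) -> list[float]:
    """Return n_pages pause lengths in seconds, each varied around `base`."""
    rng = rng or random.Random()
    delays = []
    for _ in range(n_pages):
        # Log-normal noise is right-skewed: most pauses land near `base`,
        # with an occasional much longer one.
        delay = base * rng.lognormvariate(0.0, spread)
        delays.append(max(1.0, delay))  # floor: never faster than one page per second
    return delays
```

Sleep for the next value in the list before each navigation, and tune `base` to how long a real visitor would plausibly spend on that kind of page.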

The layered approach that works

Getting through requires all of these together, not any one alone.

1. Residential or mobile proxies

Datacenter IPs start with a trust deficit you cannot overcome here. Use residential or, for the hardest targets, mobile proxies, and match the proxy country to the site's audience. See my guide on rotating proxies for the rotation and retry logic.
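
As a rough sketch of matching the proxy to the audience, the helper below picks a proxy by country and type and returns it in the dict shape that Playwright's `launch(proxy=...)` accepts (`server`, `username`, `password`). The pool entries, hostnames, and credentials are placeholders, and `pick_proxy` is a hypothetical name; substitute whatever your provider gives you.

```python
import random

# Hypothetical pool: replace hosts and credentials with your provider's values.
PROXY_POOL = [
    {"country": "US", "kind": "residential", "server": "http://us1.proxy.example:8000"},
    {"country": "US", "kind": "mobile", "server": "http://us2.proxy.example:8000"},
    {"country": "DE", "kind": "residential", "server": "http://de1.proxy.example:8000"},
]


def pick_proxy(country, kind="residential", username="user", password="pass",
               pool=PROXY_POOL):
    """Pick a random matching proxy and return Playwright-style proxy settings."""
    candidates = [p for p in pool if p["country"] == country and p["kind"] == kind]
    if not candidates:
        # Refuse to fall back to another country: a mismatch is worse than a delay.
        raise LookupError(f"no {kind} proxy available for {country}")
    choice = random.choice(candidates)
    return {"server": choice["server"], "username": username, "password": password}
```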

2. A genuinely patched browser fingerprint

The browser must present a consistent, realistic fingerprint with no automation tells. This means a real user agent that matches the actual browser build, correct WebGL vendor strings, a populated plugins array, and navigator.webdriver removed. Purpose-built tools like Camoufox and nodriver handle much of this, but they need updates as detection evolves.

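from playwright.async_api import async_playwright

async def stealth_context(p, proxy):
    # p is the object yielded by `async with async_playwright() as p`.
    # proxy is a dict such as {"server": "http://host:port", "username": ..., "password": ...}.
    browser = await p.chromium.launch(
        # Headless mode can itself be a detection signal; try headless=False on hard targets.
        headless=True,
        # Stops Chromium from setting navigator.webdriver to true.
        args=["--disable-blink-features=AutomationControlled"],
        proxy=proxy,
    )
    ctx = await browser.new_context(
        # Keep this string in sync with the Chromium build you actually run.
        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/131.0.0.0 Safari/537.36",
        viewport={"width": 1440, "height": 900},
        # Locale and timezone should match the proxy's location.
        locale="en-US",
        timezone_id="America/New_York",
    )
    return ctx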

Note the timezone and locale. DataDome checks whether your timezone matches your IP geolocation, so a US proxy with a European timezone is a red flag.
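
A small lookup keeps these values tied to the proxy instead of hardcoded. The sketch below is illustrative only: `GEO_SETTINGS` and `context_options_for` are hypothetical names, and the table is deliberately coarse. A country such as the US spans several timezones, so if your provider exposes the proxy's city or region, map from that instead.

```python
# Illustrative mapping only; extend it for the countries your proxies cover.
GEO_SETTINGS = {
    "US": {"locale": "en-US", "timezone_id": "America/New_York"},
    "GB": {"locale": "en-GB", "timezone_id": "Europe/London"},
    "DE": {"locale": "de-DE", "timezone_id": "Europe/Berlin"},
    "FR": {"locale": "fr-FR", "timezone_id": "Europe/Paris"},
}


def context_options_for(country_code: str) -> dict:
    """Return locale and timezone_id keyword arguments for a proxy's country."""
    try:
        # Copy so callers cannot mutate the shared table.
        return dict(GEO_SETTINGS[country_code.upper()])
    except KeyError:
        # Failing loudly beats silently sending a mismatched timezone.
        raise ValueError(f"no locale/timezone mapping for {country_code!r}") from None
```

The result can be unpacked straight into the context call, for example `browser.new_context(**context_options_for("DE"))`.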

3. Human-like behavior

Add realistic interaction before extracting data. Move the mouse, scroll gradually, and vary your timing.

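import random

async def human_warmup(page):
    # Move in many small steps instead of teleporting the cursor.
    await page.mouse.move(200, 300, steps=25)
    # Randomize each pause so the rhythm is never identical between sessions.
    await page.wait_for_timeout(random.randint(600, 1200))
    # Scroll down, pause for a varied interval, then move again.
    await page.mouse.wheel(0, 600)
    await page.wait_for_timeout(random.randint(900, 1800))
    await page.mouse.move(500, 450, steps=30)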

This is not optional on PerimeterX, which weighs behavioral biometrics heavily. A session that never moves the mouse is an obvious bot.
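
Straight-line cursor jumps are easy to spot, so a curved path is a common refinement. The sketch below generates points along a quadratic Bezier curve with a randomly offset control point. It is one plausible approximation of a hand movement, not a model of what PerimeterX actually measures, and `bezier_path` with its `wobble` parameter is a hypothetical helper.

```python
import random


def bezier_path(start, end, steps=30, wobble=80, rng=None):
    """Return `steps` integer points along a quadratic Bezier curve from start to end."""
    rng = rng or random.Random()
    (x0, y0), (x2, y2) = start, end
    # A randomly offset control point bends the path away from a straight line.
    cx = (x0 + x2) / 2 + rng.uniform(-wobble, wobble)
    cy = (y0 + y2) / 2 + rng.uniform(-wobble, wobble)
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((round(x), round(y)))
    return points
```

Each point can then be passed to `page.mouse.move(x, y)` with a short randomized pause between calls.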

Both systems issue a cookie that carries your trust score. Once you earn a good score, reuse that session. Throwing away cookies and re-solving on every request wastes effort and looks suspicious. Persist the session, and rotate to a new one only when the score degrades.
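
The score itself is not visible to the client, so "degrades" has to be inferred. A minimal sketch, assuming the recent block rate is a usable stand-in: the `SessionHealth` class, its 10-request window, and the 30% threshold below are illustrative choices, not values taken from either vendor.

```python
from dataclasses import dataclass, field


@dataclass
class SessionHealth:
    """Tracks recent outcomes for one session (cookies plus proxy)."""
    window: int = 10              # how many recent requests to remember
    max_block_rate: float = 0.3   # rotate once more than 30% were blocked
    outcomes: list = field(default_factory=list)

    def record(self, blocked: bool) -> None:
        self.outcomes.append(blocked)
        # Keep only the most recent `window` outcomes.
        del self.outcomes[:-self.window]

    def should_rotate(self) -> bool:
        """True once the share of blocked requests exceeds the threshold."""
        if len(self.outcomes) < 3:  # too little data to judge
            return False
        return sum(self.outcomes) / len(self.outcomes) > self.max_block_rate
```

For the persistence half, Playwright can save a context's cookies and local storage with `context.storage_state(path=...)` and restore them with `browser.new_context(storage_state=...)`.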

Detecting when you are blocked

These systems often return a 200 with a block page or a challenge, not an obvious error. Always validate the body.

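def is_blocked(html: str, status: int) -> bool:
    """Heuristic check for a block or challenge page, including ones served with a 200."""
    if status in (403, 429):
        return True
    # Substring markers are a rough heuristic. Tune them per target, since some
    # may also appear on normal pages that load the vendor's script.
    markers = ["datadome", "px-captcha", "_px", "blocked by"]
    lowered = html.lower()
    return any(m in lowered for m in markers)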

When blocked, rotate the proxy and session together, back off, and retry. Hammering with the same flagged session escalates a soft block into a hard ban.
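
The rotate, back off, and retry loop can be sketched as follows. This is one possible shape, using exponential backoff with full jitter; `fetch_with_rotation` and both callables it takes are hypothetical. `fetch(session)` is assumed to return a `(blocked, body)` pair, and `new_session()` is assumed to return a fresh proxy and cookie jar together.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def fetch_with_rotation(fetch, session, new_session, max_attempts=4, sleep=time.sleep):
    """Try the current session first; after each block, wait and switch sessions.

    Returns (body, session) so the caller can keep reusing the session that worked.
    """
    for attempt in range(max_attempts):
        blocked, body = fetch(session)
        if not blocked:
            return body, session
        sleep(backoff_delay(attempt))   # wait longer after each consecutive block
        session = new_session()         # never retry with the flagged session
    raise RuntimeError(f"still blocked after {max_attempts} attempts")
```

Returning the working session alongside the body keeps this compatible with session reuse: rotation happens only after a block, never on a healthy request.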

A realistic expectation

DataDome and PerimeterX update their detection continuously. A setup that works this month may need adjustment next month. Scraping these sites reliably is an ongoing engineering effort with monitoring and maintenance, not a one-time script. Anyone promising a permanent bypass is overselling.

Need a hard target scraped reliably?

I build and maintain scrapers that get through DataDome, PerimeterX, Cloudflare, and Akamai, with the stealth, proxy, and monitoring infrastructure to keep them running. Hard targets like these are the core of my website scraping service. If you have a tough target, hire me on Upwork or reach out through the contact form. I respond within 24 hours.
