[{"data":1,"prerenderedAt":525},["ShallowReactive",2],{"blog-\u002Fblog\u002Fbypass-cloudflare-web-scraping":3},{"id":4,"title":5,"body":6,"date":504,"description":505,"draft":506,"extension":507,"meta":508,"navigation":123,"path":509,"readingTime":510,"seo":511,"stem":512,"tags":513,"takeaways":518,"updated":523,"__hash__":524},"blog\u002Fblog\u002Fbypass-cloudflare-web-scraping.md","How to Scrape Cloudflare-Protected Sites in 2026 (A Practical Approach)",{"type":7,"value":8,"toc":494},"minimark",[9,13,22,27,30,76,83,87,101,167,175,179,186,342,356,360,367,376,380,383,398,401,405,412,457,464,468,471,475,490],[10,11,5],"h1",{"id":12},"how-to-scrape-cloudflare-protected-sites-in-2026-a-practical-approach",[14,15,16,17,21],"p",{},"Cloudflare protects a large share of the web, and its bot management has gotten much harder to beat. If you've hit the \"Checking your browser\" interstitial, a Turnstile challenge, or a silent ",[18,19,20],"code",{},"403",", this is what's actually happening and how to get through it reliably.",[23,24,26],"h2",{"id":25},"what-cloudflare-actually-checks","What Cloudflare actually checks",[14,28,29],{},"Cloudflare doesn't rely on one signal. It scores you across several layers, and failing any one can flag you:",[31,32,33,48,54,64,70],"ul",{},[34,35,36,40,41,44,45,47],"li",{},[37,38,39],"strong",{},"TLS fingerprint (JA3\u002FJA4)."," The way your HTTP client negotiates TLS reveals whether you're a real browser or a Python ",[18,42,43],{},"requests"," session. 
This is why plain ",[18,46,43],{}," gets blocked instantly, before any JavaScript runs.",[34,49,50,53],{},[37,51,52],{},"HTTP\u002F2 fingerprint."," Header order, pseudo-header order, and frame settings differ between real Chrome and automation libraries.",[34,55,56,59,60,63],{},[37,57,58],{},"Browser fingerprint."," JavaScript challenges probe ",[18,61,62],{},"navigator.webdriver",", WebGL, canvas, installed fonts, screen properties, and dozens of other values.",[34,65,66,69],{},[37,67,68],{},"Behavioral signals."," Mouse movement, timing, and navigation patterns.",[34,71,72,75],{},[37,73,74],{},"IP reputation."," Datacenter IPs start with a low trust score.",[14,77,78,79,82],{},"The takeaway: ",[37,80,81],{},"a scraper that fixes only one layer still fails."," Clean IP with a headless fingerprint? Blocked. Perfect fingerprint from a flagged datacenter IP? Blocked.",[23,84,86],{"id":85},"why-plain-http-clients-cant-win","Why plain HTTP clients can't win",[14,88,89,90,92,93,96,97,100],{},"A request from ",[18,91,43],{}," or ",[18,94,95],{},"httpx"," is rejected at the TLS layer before Cloudflare even serves the challenge. 
Libraries like ",[18,98,99],{},"curl_cffi"," help by impersonating a real browser's TLS fingerprint:",[102,103,108],"pre",{"className":104,"code":105,"language":106,"meta":107,"style":107},"language-python shiki shiki-themes github-light github-dark","from curl_cffi import requests\n\n# Impersonate a real Chrome TLS + HTTP2 fingerprint\nresp = requests.get(\n    \"https:\u002F\u002Fprotected-site.com\",\n    impersonate=\"chrome131\",\n    timeout=20,\n)\nprint(resp.status_code)\n","python","",[18,109,110,118,125,131,137,143,149,155,161],{"__ignoreMap":107},[111,112,115],"span",{"class":113,"line":114},"line",1,[111,116,117],{},"from curl_cffi import requests\n",[111,119,121],{"class":113,"line":120},2,[111,122,124],{"emptyLinePlaceholder":123},true,"\n",[111,126,128],{"class":113,"line":127},3,[111,129,130],{},"# Impersonate a real Chrome TLS + HTTP2 fingerprint\n",[111,132,134],{"class":113,"line":133},4,[111,135,136],{},"resp = requests.get(\n",[111,138,140],{"class":113,"line":139},5,[111,141,142],{},"    \"https:\u002F\u002Fprotected-site.com\",\n",[111,144,146],{"class":113,"line":145},6,[111,147,148],{},"    impersonate=\"chrome131\",\n",[111,150,152],{"class":113,"line":151},7,[111,153,154],{},"    timeout=20,\n",[111,156,158],{"class":113,"line":157},8,[111,159,160],{},")\n",[111,162,164],{"class":113,"line":163},9,[111,165,166],{},"print(resp.status_code)\n",[14,168,169,170,174],{},"This gets you past the TLS check and works on Cloudflare's ",[171,172,173],"em",{},"lower"," security settings. But on sites running a managed challenge or Turnstile, you need a real browser to execute the JavaScript.",[23,176,178],{"id":177},"the-reliable-approach-a-stealth-browser","The reliable approach: a stealth browser",[14,180,181,182,185],{},"For managed challenges, run an actual browser with anti-detection patches. 
With Playwright, the base setup looks like this, but the stock launch is ",[171,183,184],{},"not"," enough:",[102,187,189],{"className":104,"code":188,"language":106,"meta":107,"style":107},"from playwright.async_api import async_playwright\n\nasync def scrape(url: str):\n    async with async_playwright() as p:\n        browser = await p.chromium.launch(\n            headless=True,\n            args=[\n                \"--disable-blink-features=AutomationControlled\",\n            ],\n            proxy={\n                \"server\": \"http:\u002F\u002Fgateway.provider.com:7000\",\n                \"username\": \"USER\",\n                \"password\": \"PASS\",\n            },\n        )\n        ctx = await browser.new_context(\n            user_agent=\"Mozilla\u002F5.0 (Windows NT 10.0; Win64; x64) \"\n                       \"AppleWebKit\u002F537.36 (KHTML, like Gecko) \"\n                       \"Chrome\u002F131.0.0.0 Safari\u002F537.36\",\n            viewport={\"width\": 1920, \"height\": 1080},\n            locale=\"en-US\",\n        )\n        page = await ctx.new_page()\n        await page.goto(url, wait_until=\"domcontentloaded\")\n        # Wait out the challenge, then read the real content\n        await page.wait_for_load_state(\"networkidle\")\n        return await page.content()\n",[18,190,191,196,200,205,210,215,220,225,230,235,241,247,253,259,265,271,277,283,289,295,301,307,312,318,324,330,336],{"__ignoreMap":107},[111,192,193],{"class":113,"line":114},[111,194,195],{},"from playwright.async_api import async_playwright\n",[111,197,198],{"class":113,"line":120},[111,199,124],{"emptyLinePlaceholder":123},[111,201,202],{"class":113,"line":127},[111,203,204],{},"async def scrape(url: str):\n",[111,206,207],{"class":113,"line":133},[111,208,209],{},"    async with async_playwright() as p:\n",[111,211,212],{"class":113,"line":139},[111,213,214],{},"        browser = await p.chromium.launch(\n",[111,216,217],{"class":113,"line":145},[111,218,219],{},"      
      headless=True,\n",[111,221,222],{"class":113,"line":151},[111,223,224],{},"            args=[\n",[111,226,227],{"class":113,"line":157},[111,228,229],{},"                \"--disable-blink-features=AutomationControlled\",\n",[111,231,232],{"class":113,"line":163},[111,233,234],{},"            ],\n",[111,236,238],{"class":113,"line":237},10,[111,239,240],{},"            proxy={\n",[111,242,244],{"class":113,"line":243},11,[111,245,246],{},"                \"server\": \"http:\u002F\u002Fgateway.provider.com:7000\",\n",[111,248,250],{"class":113,"line":249},12,[111,251,252],{},"                \"username\": \"USER\",\n",[111,254,256],{"class":113,"line":255},13,[111,257,258],{},"                \"password\": \"PASS\",\n",[111,260,262],{"class":113,"line":261},14,[111,263,264],{},"            },\n",[111,266,268],{"class":113,"line":267},15,[111,269,270],{},"        )\n",[111,272,274],{"class":113,"line":273},16,[111,275,276],{},"        ctx = await browser.new_context(\n",[111,278,280],{"class":113,"line":279},17,[111,281,282],{},"            user_agent=\"Mozilla\u002F5.0 (Windows NT 10.0; Win64; x64) \"\n",[111,284,286],{"class":113,"line":285},18,[111,287,288],{},"                       \"AppleWebKit\u002F537.36 (KHTML, like Gecko) \"\n",[111,290,292],{"class":113,"line":291},19,[111,293,294],{},"                       \"Chrome\u002F131.0.0.0 Safari\u002F537.36\",\n",[111,296,298],{"class":113,"line":297},20,[111,299,300],{},"            viewport={\"width\": 1920, \"height\": 1080},\n",[111,302,304],{"class":113,"line":303},21,[111,305,306],{},"            locale=\"en-US\",\n",[111,308,310],{"class":113,"line":309},22,[111,311,270],{},[111,313,315],{"class":113,"line":314},23,[111,316,317],{},"        page = await ctx.new_page()\n",[111,319,321],{"class":113,"line":320},24,[111,322,323],{},"        await page.goto(url, wait_until=\"domcontentloaded\")\n",[111,325,327],{"class":113,"line":326},25,[111,328,329],{},"        # Wait out the challenge, then read the 
real content\n",[111,331,333],{"class":113,"line":332},26,[111,334,335],{},"        await page.wait_for_load_state(\"networkidle\")\n",[111,337,339],{"class":113,"line":338},27,[111,340,341],{},"        return await page.content()\n",[14,343,344,345,347,348,351,352,355],{},"The hidden work is in the patches that hide automation: removing ",[18,346,62],{},", spoofing the permissions API, faking plugins and WebGL vendor strings, and matching the user-agent to the actual browser build. Tools like ",[18,349,350],{},"playwright-stealth",", ",[18,353,354],{},"undetected-chromedriver",", or the Camoufox\u002Fnodriver projects automate much of this, but they need maintenance as Cloudflare updates its detection.",[23,357,359],{"id":358},"residential-proxies-are-not-optional-here","Residential proxies are not optional here",[14,361,362,363,366],{},"On Cloudflare-protected sites, datacenter IPs start with a trust deficit you usually can't overcome. Pair the stealth browser with residential or mobile proxies, and ",[37,364,365],{},"match the proxy country to the site's audience",". A US store accessed through a foreign IP often triggers extra verification even when everything else is perfect.",[14,368,369,370,375],{},"See my detailed guide on ",[371,372,374],"a",{"href":373},"\u002Fblog\u002Frotating-proxies-for-web-scraping","integrating rotating proxies"," for the rotation and retry logic.",[23,377,379],{"id":378},"handling-turnstile-challenges","Handling Turnstile challenges",[14,381,382],{},"When a Turnstile or interactive challenge appears, you have two paths:",[384,385,386,392],"ol",{},[34,387,388,391],{},[37,389,390],{},"Let the stealth browser solve it passively."," With a clean fingerprint and good IP, Turnstile often passes without interaction.",[34,393,394,397],{},[37,395,396],{},"Use a solver service"," (2Captcha, CapSolver) for the token when passive solving fails. 
The solver returns a token you inject into the form submission.",[14,399,400],{},"In practice, a well-configured stealth browser passes most non-interactive challenges on its own, and the solver is the fallback for the hardest cases.",[23,402,404],{"id":403},"validate-the-response-not-just-the-status","Validate the response, not just the status",[14,406,407,408,411],{},"A ",[18,409,410],{},"200"," response can still be a block page. Always check the body:",[102,413,415],{"className":104,"code":414,"language":106,"meta":107,"style":107},"def is_blocked(html: str) -> bool:\n    markers = [\n        \"cf-challenge\",\n        \"Checking your browser\",\n        \"Just a moment\",\n        \"cf-turnstile\",\n    ]\n    return any(m in html for m in markers)\n",[18,416,417,422,427,432,437,442,447,452],{"__ignoreMap":107},[111,418,419],{"class":113,"line":114},[111,420,421],{},"def is_blocked(html: str) -> bool:\n",[111,423,424],{"class":113,"line":120},[111,425,426],{},"    markers = [\n",[111,428,429],{"class":113,"line":127},[111,430,431],{},"        \"cf-challenge\",\n",[111,433,434],{"class":113,"line":133},[111,435,436],{},"        \"Checking your browser\",\n",[111,438,439],{"class":113,"line":139},[111,440,441],{},"        \"Just a moment\",\n",[111,443,444],{"class":113,"line":145},[111,445,446],{},"        \"cf-turnstile\",\n",[111,448,449],{"class":113,"line":151},[111,450,451],{},"    ]\n",[111,453,454],{"class":113,"line":157},[111,455,456],{},"    return any(m in html for m in markers)\n",[14,458,459,460,463],{},"If ",[18,461,462],{},"is_blocked()"," returns true, rotate the proxy, back off, and retry. Do not treat it as success.",[23,465,467],{"id":466},"when-this-gets-hard","When this gets hard",[14,469,470],{},"Cloudflare updates its detection continuously, so a setup that works today can break next month. A production scraper needs monitoring, alerting on block-rate spikes, and a maintenance plan, not a one-off script. 
That ongoing reliability is the real deliverable, and it's where most DIY scrapers fall apart.",[23,472,474],{"id":473},"need-a-cloudflare-protected-site-scraped-reliably","Need a Cloudflare-protected site scraped reliably?",[14,476,477,478,484,485,489],{},"I build and maintain production scrapers that get through Cloudflare, DataDome, and Akamai, with the stealth, proxy, and monitoring infrastructure to keep them running. If you have a project, ",[371,479,483],{"href":480,"rel":481},"https:\u002F\u002Fwww.upwork.com\u002Ffreelancers\u002Fphanvuong2",[482],"nofollow","hire me on Upwork"," or reach out via the ",[371,486,488],{"href":487},"\u002F#contact","contact form",". I respond within 24 hours.",[491,492,493],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: 
var(--shiki-dark-text-decoration);}",{"title":107,"searchDepth":120,"depth":120,"links":495},[496,497,498,499,500,501,502,503],{"id":25,"depth":120,"text":26},{"id":85,"depth":120,"text":86},{"id":177,"depth":120,"text":178},{"id":358,"depth":120,"text":359},{"id":378,"depth":120,"text":379},{"id":403,"depth":120,"text":404},{"id":466,"depth":120,"text":467},{"id":473,"depth":120,"text":474},"2026-06-10","What Cloudflare actually checks, why most scrapers fail against it, and the layered approach of stealth browsers, fingerprinting, and residential proxies that reliably gets through.",false,"md",{},"\u002Fblog\u002Fbypass-cloudflare-web-scraping","7 min read",{"title":5,"description":505},"blog\u002Fbypass-cloudflare-web-scraping",[514,515,516,517,106],"web scraping","cloudflare","anti-bot","playwright",[519,520,521,522],"Cloudflare scores you across TLS, HTTP\u002F2, browser fingerprint, behavior, and IP reputation.","Plain HTTP clients fail at the TLS layer; curl_cffi can impersonate a real browser.","Managed challenges need a real, patched stealth browser, not a stock headless launch.","Residential proxies matched to the site's country are required, and a 200 response can still be a block page.",null,"t9vyXQhugYzXupp7mEvMSK7IHv5_iFwn9OJWAdEY8jQ",1781254278206]