[{"data":1,"prerenderedAt":686},["ShallowReactive",2],{"blog-\u002Fblog\u002Frotating-proxies-for-web-scraping":3},{"id":4,"title":5,"body":6,"date":665,"description":666,"draft":667,"extension":668,"meta":669,"navigation":148,"path":670,"readingTime":671,"seo":672,"stem":673,"tags":674,"takeaways":679,"updated":684,"__hash__":685},"blog\u002Fblog\u002Frotating-proxies-for-web-scraping.md","How to Integrate Rotating Proxies for Web Scraping (Without Getting Blocked)",{"type":7,"value":8,"toc":654},"minimark",[9,13,22,27,30,34,37,112,119,123,126,208,216,220,223,369,383,387,390,474,485,489,492,524,527,531,534,588,591,595,630,634,650],[10,11,5],"h1",{"id":12},"how-to-integrate-rotating-proxies-for-web-scraping-without-getting-blocked",[14,15,16,17,21],"p",{},"If your scraper works for the first hundred requests and then starts returning ",[18,19,20],"code",{},"403",", empty pages, or CAPTCHAs, you have an IP reputation problem, not a code problem. The fix is rotating proxies. This guide covers how to choose the right proxy type, integrate it into a Python scraper, and build the rotation and retry logic that keeps a job running at scale.",[23,24,26],"h2",{"id":25},"why-a-single-ip-gets-blocked","Why a single IP gets blocked",[14,28,29],{},"Every request you send carries your IP address. Anti-bot systems (Cloudflare, DataDome, Akamai, PerimeterX) track request volume, timing, and behavior per IP. A datacenter IP sending 500 requests a minute to a product page looks nothing like a human, so it gets rate-limited or banned. Rotating proxies spread your requests across many IPs so no single address crosses the threshold.",[23,31,33],{"id":32},"proxy-types-and-when-to-use-each","Proxy types, and when to use each",[14,35,36],{},"There are three categories, and picking the wrong one is the most common reason a scrape fails.",[38,39,40,59],"table",{},[41,42,43],"thead",{},[44,45,46,50,53,56],"tr",{},[47,48,49],"th",{},"Type",[47,51,52],{},"Cost",[47,54,55],{},"Detection risk",[47,57,58],{},"Best for",[60,61,62,80,96],"tbody",{},[44,63,64,71,74,77],{},[65,66,67],"td",{},[68,69,70],"strong",{},"Datacenter",[65,72,73],{},"Cheapest",[65,75,76],{},"High",[65,78,79],{},"Unprotected sites, internal tools, high volume where bans are cheap",[44,81,82,87,90,93],{},[65,83,84],{},[68,85,86],{},"Residential",[65,88,89],{},"Mid to high",[65,91,92],{},"Low",[65,94,95],{},"E-commerce, sites behind Cloudflare\u002FDataDome",[44,97,98,103,106,109],{},[65,99,100],{},[68,101,102],{},"Mobile (4G\u002F5G)",[65,104,105],{},"Highest",[65,107,108],{},"Lowest",[65,110,111],{},"The hardest targets like Instagram, sneaker sites, aggressive WAFs",[14,113,114,115,118],{},"Rule of thumb: ",[68,116,117],{},"start with datacenter, escalate to residential only when you see blocks."," Paying for residential on a site that doesn't need it just burns budget.",[23,120,122],{"id":121},"basic-integration-in-python-requests","Basic integration in Python (requests)",[14,124,125],{},"Most providers give you a single gateway endpoint that rotates the IP for you on every request:",[127,128,133],"pre",{"className":129,"code":130,"language":131,"meta":132,"style":132},"language-python shiki shiki-themes github-light github-dark","import requests\n\nPROXY = \"http:\u002F\u002FUSER:PASS@gateway.provider.com:7000\"\n\nproxies = {\"http\": PROXY, \"https\": PROXY}\n\nresp = requests.get(\n    \"https:\u002F\u002Fexample.com\u002Fproducts\",\n    proxies=proxies,\n    timeout=20,\n)\nprint(resp.status_code, resp.url)\n","python","",[18,134,135,143,150,156,161,167,172,178,184,190,196,202],{"__ignoreMap":132},[136,137,140],"span",{"class":138,"line":139},"line",1,[136,141,142],{},"import requests\n",[136,144,146],{"class":138,"line":145},2,[136,147,149],{"emptyLinePlaceholder":148},true,"\n",[136,151,153],{"class":138,"line":152},3,[136,154,155],{},"PROXY = \"http:\u002F\u002FUSER:PASS@gateway.provider.com:7000\"\n",[136,157,159],{"class":138,"line":158},4,[136,160,149],{"emptyLinePlaceholder":148},[136,162,164],{"class":138,"line":163},5,[136,165,166],{},"proxies = {\"http\": PROXY, \"https\": PROXY}\n",[136,168,170],{"class":138,"line":169},6,[136,171,149],{"emptyLinePlaceholder":148},[136,173,175],{"class":138,"line":174},7,[136,176,177],{},"resp = requests.get(\n",[136,179,181],{"class":138,"line":180},8,[136,182,183],{},"    \"https:\u002F\u002Fexample.com\u002Fproducts\",\n",[136,185,187],{"class":138,"line":186},9,[136,188,189],{},"    proxies=proxies,\n",[136,191,193],{"class":138,"line":192},10,[136,194,195],{},"    timeout=20,\n",[136,197,199],{"class":138,"line":198},11,[136,200,201],{},")\n",[136,203,205],{"class":138,"line":204},12,[136,206,207],{},"print(resp.status_code, resp.url)\n",[14,209,210,211,215],{},"This is the simplest setup: the provider's gateway hands you a fresh IP per request. It works, but it gives you no control over ",[212,213,214],"em",{},"when"," to rotate or how to react to a ban.",[23,217,219],{"id":218},"manual-rotation-with-a-proxy-pool","Manual rotation with a proxy pool",[14,221,222],{},"When you need control, for instance keeping the same IP across a multi-step login flow before rotating, manage the pool yourself:",[127,224,226],{"className":129,"code":225,"language":131,"meta":132,"style":132},"import random\nimport requests\n\nPROXY_POOL = [\n    \"http:\u002F\u002FUSER:PASS@p1.provider.com:8000\",\n    \"http:\u002F\u002FUSER:PASS@p2.provider.com:8000\",\n    \"http:\u002F\u002FUSER:PASS@p3.provider.com:8000\",\n]\n\ndef fetch(url: str, max_retries: int = 3) -> requests.Response | None:\n    tried = set()\n    for _ in range(max_retries):\n        proxy = random.choice([p for p in PROXY_POOL if p not in tried])\n        tried.add(proxy)\n        try:\n            resp = requests.get(\n                url,\n                proxies={\"http\": proxy, \"https\": proxy},\n                timeout=20,\n            )\n            if resp.status_code == 200:\n                return resp\n            # 403\u002F429 → this IP is burned, rotate\n        except requests.RequestException:\n            continue  # dead proxy, try the next one\n    return None\n",[18,227,228,233,237,241,246,251,256,261,266,270,275,280,285,291,297,303,309,315,321,327,333,339,345,351,357,363],{"__ignoreMap":132},[136,229,230],{"class":138,"line":139},[136,231,232],{},"import random\n",[136,234,235],{"class":138,"line":145},[136,236,142],{},[136,238,239],{"class":138,"line":152},[136,240,149],{"emptyLinePlaceholder":148},[136,242,243],{"class":138,"line":158},[136,244,245],{},"PROXY_POOL = [\n",[136,247,248],{"class":138,"line":163},[136,249,250],{},"    \"http:\u002F\u002FUSER:PASS@p1.provider.com:8000\",\n",[136,252,253],{"class":138,"line":169},[136,254,255],{},"    \"http:\u002F\u002FUSER:PASS@p2.provider.com:8000\",\n",[136,257,258],{"class":138,"line":174},[136,259,260],{},"    \"http:\u002F\u002FUSER:PASS@p3.provider.com:8000\",\n",[136,262,263],{"class":138,"line":180},[136,264,265],{},"]\n",[136,267,268],{"class":138,"line":186},[136,269,149],{"emptyLinePlaceholder":148},[136,271,272],{"class":138,"line":192},[136,273,274],{},"def fetch(url: str, max_retries: int = 3) -> requests.Response | None:\n",[136,276,277],{"class":138,"line":198},[136,278,279],{},"    tried = set()\n",[136,281,282],{"class":138,"line":204},[136,283,284],{},"    for _ in range(max_retries):\n",[136,286,288],{"class":138,"line":287},13,[136,289,290],{},"        proxy = random.choice([p for p in PROXY_POOL if p not in tried])\n",[136,292,294],{"class":138,"line":293},14,[136,295,296],{},"        tried.add(proxy)\n",[136,298,300],{"class":138,"line":299},15,[136,301,302],{},"        try:\n",[136,304,306],{"class":138,"line":305},16,[136,307,308],{},"            resp = requests.get(\n",[136,310,312],{"class":138,"line":311},17,[136,313,314],{},"                url,\n",[136,316,318],{"class":138,"line":317},18,[136,319,320],{},"                proxies={\"http\": proxy, \"https\": proxy},\n",[136,322,324],{"class":138,"line":323},19,[136,325,326],{},"                timeout=20,\n",[136,328,330],{"class":138,"line":329},20,[136,331,332],{},"            )\n",[136,334,336],{"class":138,"line":335},21,[136,337,338],{},"            if resp.status_code == 200:\n",[136,340,342],{"class":138,"line":341},22,[136,343,344],{},"                return resp\n",[136,346,348],{"class":138,"line":347},23,[136,349,350],{},"            # 403\u002F429 → this IP is burned, rotate\n",[136,352,354],{"class":138,"line":353},24,[136,355,356],{},"        except requests.RequestException:\n",[136,358,360],{"class":138,"line":359},25,[136,361,362],{},"            continue  # dead proxy, try the next one\n",[136,364,366],{"class":138,"line":365},26,[136,367,368],{},"    return None\n",[14,370,371,372,382],{},"The key ideas: ",[68,373,374,375,377,378,381],{},"track which proxies you've already tried for a given request, treat ",[18,376,20],{},"\u002F",[18,379,380],{},"429"," as a signal to rotate, and silently skip dead proxies."," Without retry logic, a single bad IP fails the whole job.",[23,384,386],{"id":385},"proxies-with-a-headless-browser-playwright","Proxies with a headless browser (Playwright)",[14,388,389],{},"For JavaScript-rendered sites you need a real browser. Playwright takes a proxy per context, which lets you isolate sessions:",[127,391,393],{"className":129,"code":392,"language":131,"meta":132,"style":132},"from playwright.async_api import async_playwright\n\nasync def scrape(url: str, proxy: str):\n    async with async_playwright() as p:\n        browser = await p.chromium.launch(\n            proxy={\n                \"server\": \"http:\u002F\u002Fgateway.provider.com:7000\",\n                \"username\": \"USER\",\n                \"password\": \"PASS\",\n            },\n        )\n        page = await browser.new_page()\n        await page.goto(url, wait_until=\"networkidle\")\n        html = await page.content()\n        await browser.close()\n        return html\n",[18,394,395,400,404,409,414,419,424,429,434,439,444,449,454,459,464,469],{"__ignoreMap":132},[136,396,397],{"class":138,"line":139},[136,398,399],{},"from playwright.async_api import async_playwright\n",[136,401,402],{"class":138,"line":145},[136,403,149],{"emptyLinePlaceholder":148},[136,405,406],{"class":138,"line":152},[136,407,408],{},"async def scrape(url: str, proxy: str):\n",[136,410,411],{"class":138,"line":158},[136,412,413],{},"    async with async_playwright() as p:\n",[136,415,416],{"class":138,"line":163},[136,417,418],{},"        browser = await p.chromium.launch(\n",[136,420,421],{"class":138,"line":169},[136,422,423],{},"            proxy={\n",[136,425,426],{"class":138,"line":174},[136,427,428],{},"                \"server\": \"http:\u002F\u002Fgateway.provider.com:7000\",\n",[136,430,431],{"class":138,"line":180},[136,432,433],{},"                \"username\": \"USER\",\n",[136,435,436],{"class":138,"line":186},[136,437,438],{},"                \"password\": \"PASS\",\n",[136,440,441],{"class":138,"line":192},[136,442,443],{},"            },\n",[136,445,446],{"class":138,"line":198},[136,447,448],{},"        )\n",[136,450,451],{"class":138,"line":204},[136,452,453],{},"        page = await browser.new_page()\n",[136,455,456],{"class":138,"line":287},[136,457,458],{},"        await page.goto(url, wait_until=\"networkidle\")\n",[136,460,461],{"class":138,"line":293},[136,462,463],{},"        html = await page.content()\n",[136,465,466],{"class":138,"line":299},[136,467,468],{},"        await browser.close()\n",[136,470,471],{"class":138,"line":305},[136,472,473],{},"        return html\n",[14,475,476,477,480,481,484],{},"One critical detail: ",[68,478,479],{},"match your proxy's geolocation to the site's expected audience."," Scraping a US retailer through a German residential IP often triggers extra verification. Most residential providers let you pin a country (",[18,482,483],{},"gateway.provider.com:7000?country=us",").",[23,486,488],{"id":487},"combining-proxies-with-fingerprint-stealth","Combining proxies with fingerprint stealth",[14,490,491],{},"Rotating IPs alone is not enough on aggressively protected sites. A fresh residential IP paired with an obvious headless-Chrome fingerprint still gets flagged. The full stack looks like:",[493,494,495,502,512,518],"ol",{},[496,497,498,501],"li",{},[68,499,500],{},"Residential\u002Fmobile proxy"," for a clean IP reputation.",[496,503,504,507,508,511],{},[68,505,506],{},"Fingerprint spoofing"," with realistic ",[18,509,510],{},"navigator"," properties, WebGL, canvas, fonts.",[496,513,514,517],{},[68,515,516],{},"Human-like timing"," using randomized delays, no perfectly even request intervals.",[496,519,520,523],{},[68,521,522],{},"Session persistence"," that reuses cookies and the same IP within a logical session, rotating between sessions.",[14,525,526],{},"Skip any one layer and the others can't compensate. This is why \"just add proxies\" often fails on Cloudflare-protected targets: the IP was clean, but the fingerprint gave it away.",[23,528,530],{"id":529},"a-retry-pattern-that-survives-real-jobs","A retry pattern that survives real jobs",[14,532,533],{},"In production I wrap every request in exponential backoff with proxy rotation on hard failures:",[127,535,537],{"className":129,"code":536,"language":131,"meta":132,"style":132},"import time\n\ndef fetch_with_backoff(url: str, max_attempts: int = 5):\n    for attempt in range(max_attempts):\n        resp = fetch(url)  # rotates proxy internally\n        if resp is not None:\n            return resp\n        sleep = min(2 ** attempt, 30)  # cap backoff at 30s\n        time.sleep(sleep)\n    raise RuntimeError(f\"Failed after {max_attempts} attempts: {url}\")\n",[18,538,539,544,548,553,558,563,568,573,578,583],{"__ignoreMap":132},[136,540,541],{"class":138,"line":139},[136,542,543],{},"import time\n",[136,545,546],{"class":138,"line":145},[136,547,149],{"emptyLinePlaceholder":148},[136,549,550],{"class":138,"line":152},[136,551,552],{},"def fetch_with_backoff(url: str, max_attempts: int = 5):\n",[136,554,555],{"class":138,"line":158},[136,556,557],{},"    for attempt in range(max_attempts):\n",[136,559,560],{"class":138,"line":163},[136,561,562],{},"        resp = fetch(url)  # rotates proxy internally\n",[136,564,565],{"class":138,"line":169},[136,566,567],{},"        if resp is not None:\n",[136,569,570],{"class":138,"line":174},[136,571,572],{},"            return resp\n",[136,574,575],{"class":138,"line":180},[136,576,577],{},"        sleep = min(2 ** attempt, 30)  # cap backoff at 30s\n",[136,579,580],{"class":138,"line":186},[136,581,582],{},"        time.sleep(sleep)\n",[136,584,585],{"class":138,"line":192},[136,586,587],{},"    raise RuntimeError(f\"Failed after {max_attempts} attempts: {url}\")\n",[14,589,590],{},"Exponential backoff prevents you from hammering a site that's already rate-limiting you, which on some WAFs escalates a soft block into a hard ban.",[23,592,594],{"id":593},"common-mistakes-to-avoid","Common mistakes to avoid",[596,597,598,608,618,624],"ul",{},[496,599,600,603,604,607],{},[68,601,602],{},"Rotating too aggressively."," A new IP on every single request can look ",[212,605,606],{},"more"," suspicious than a stable session. Match rotation to the site's tolerance.",[496,609,610,613,614,617],{},[68,611,612],{},"Ignoring response bodies."," A ",[18,615,616],{},"200"," status with a CAPTCHA page in the body is still a block. Validate content, not just status codes.",[496,619,620,623],{},[68,621,622],{},"Leaking your real IP."," WebRTC, DNS, and direct API calls can bypass the proxy. Test with an IP-check endpoint before trusting your setup.",[496,625,626,629],{},[68,627,628],{},"Buying the cheapest residential pool."," Oversold pools have burned IPs already flagged across thousands of sites.",[23,631,633],{"id":632},"need-this-built-for-your-project","Need this built for your project?",[14,635,636,637,644,645,649],{},"I build production scraping systems with proxy integration, anti-bot bypass, and the retry infrastructure to keep them running at scale, across Cloudflare, DataDome, and Akamai-protected sites. If you have a scraping or automation project, ",[638,639,643],"a",{"href":640,"rel":641},"https:\u002F\u002Fwww.upwork.com\u002Ffreelancers\u002Fphanvuong2",[642],"nofollow","hire me on Upwork"," or get in touch through the ",[638,646,648],{"href":647},"\u002F#contact","contact form",". I reply within 24 hours with a scope and quote.",[651,652,653],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":132,"searchDepth":145,"depth":145,"links":655},[656,657,658,659,660,661,662,663,664],{"id":25,"depth":145,"text":26},{"id":32,"depth":145,"text":33},{"id":121,"depth":145,"text":122},{"id":218,"depth":145,"text":219},{"id":385,"depth":145,"text":386},{"id":487,"depth":145,"text":488},{"id":529,"depth":145,"text":530},{"id":593,"depth":145,"text":594},{"id":632,"depth":145,"text":633},"2026-06-12","A practical guide to integrating residential and rotating proxies into a Python scraper: proxy types, rotation strategies, retry logic, and how to avoid IP bans on protected sites.",false,"md",{},"\u002Fblog\u002Frotating-proxies-for-web-scraping","8 min read",{"title":5,"description":666},"blog\u002Frotating-proxies-for-web-scraping",[675,676,131,677,678],"web scraping","proxies","anti-bot","playwright",[680,681,682,683],"Datacenter proxies are cheapest but blocked fast; residential and mobile cost more but pass protected sites.","Start with datacenter and escalate to residential only when you actually see blocks.","Treat 403 and 429 responses as a signal to rotate, and silently skip dead proxies.","Match proxy geolocation to the site's audience, and pair proxies with fingerprint stealth and human-like timing.",null,"1Ocj1ZSzLA0gcR97EZs8BvMRpVTWAsngaK8NHENlbtM",1781254278206]