[{"data":1,"prerenderedAt":443},["ShallowReactive",2],{"blog-\u002Fblog\u002Fscrape-amazon-product-data":3},{"id":4,"title":5,"body":6,"date":422,"description":423,"draft":424,"extension":425,"meta":426,"navigation":87,"path":427,"readingTime":428,"seo":429,"stem":430,"tags":431,"takeaways":436,"updated":441,"__hash__":442},"blog\u002Fblog\u002Fscrape-amazon-product-data.md","How to Scrape Amazon Product Data Reliably",{"type":7,"value":8,"toc":411},"minimark",[9,13,17,22,25,47,50,54,57,61,64,212,215,219,222,255,263,267,274,299,302,306,309,374,378,381,385,388,392,407],[10,11,5],"h1",{"id":12},"how-to-scrape-amazon-product-data-reliably",[14,15,16],"p",{},"Amazon is one of the most requested scraping targets and one of the most defended. Product data, pricing, and reviews drive competitive intelligence, repricing, and market research. This guide covers how to extract that data reliably, what breaks, and when to use the official channels instead.",[18,19,21],"h2",{"id":20},"what-you-can-extract","What you can extract",[14,23,24],{},"A typical Amazon product scrape pulls:",[26,27,28,32,35,38,41,44],"ul",{},[29,30,31],"li",{},"Title, brand, and ASIN",[29,33,34],{},"Current price, list price, and any deal price",[29,36,37],{},"Star rating and review count",[29,39,40],{},"Availability and Buy Box seller",[29,42,43],{},"Images and bullet point features",[29,45,46],{},"Review text and ratings",[14,48,49],{},"Each of these lives in a predictable spot in the page, but Amazon changes its markup often and serves different layouts to different regions and visitors, which is the first thing that breaks naive scrapers.",[18,51,53],{"id":52},"the-legal-and-policy-reality","The legal and policy reality",[14,55,56],{},"Scraping publicly visible product data is common, but Amazon's terms of service prohibit it, and Amazon actively defends against it. 
Be clear-eyed: respect robots directives where it matters to you, do not scrape personal data, throttle your requests, and consider the official API for anything where compliance is a hard requirement. This guide is about the technical how, not a claim that it is permitted by Amazon.",[18,58,60],{"id":59},"basic-extraction-with-selectors","Basic extraction with selectors",[14,62,63],{},"For a single region and layout, the selectors are straightforward. The challenge is that Amazon uses several layouts, so robust code tries multiple selectors per field.",[65,66,71],"pre",{"className":67,"code":68,"language":69,"meta":70,"style":70},"language-python shiki shiki-themes github-light github-dark","from playwright.async_api import async_playwright\n\nasync def scrape_product(url: str):\n    async with async_playwright() as p:\n        browser = await p.chromium.launch(headless=True)\n        page = await browser.new_page()\n        await page.goto(url, wait_until=\"domcontentloaded\")\n\n        title = await page.text_content(\"#productTitle\")\n\n        # Price lives in different spots depending on layout\n        price = None\n        for sel in [\".a-price .a-offscreen\", \"#priceblock_ourprice\",\n                    \"#corePrice_feature_div .a-offscreen\"]:\n            el = await page.query_selector(sel)\n            if el:\n                price = (await el.text_content()).strip()\n                break\n\n        rating = await page.text_content(\"span[data-hook=rating-out-of-text]\")\n        await browser.close()\n        return {\"title\": title.strip() if title else None,\n                \"price\": price, \"rating\": rating}\n","python","",[72,73,74,82,89,95,101,107,113,119,124,130,135,141,147,153,159,165,171,177,183,188,194,200,206],"code",{"__ignoreMap":70},[75,76,79],"span",{"class":77,"line":78},"line",1,[75,80,81],{},"from playwright.async_api import 
async_playwright\n",[75,83,85],{"class":77,"line":84},2,[75,86,88],{"emptyLinePlaceholder":87},true,"\n",[75,90,92],{"class":77,"line":91},3,[75,93,94],{},"async def scrape_product(url: str):\n",[75,96,98],{"class":77,"line":97},4,[75,99,100],{},"    async with async_playwright() as p:\n",[75,102,104],{"class":77,"line":103},5,[75,105,106],{},"        browser = await p.chromium.launch(headless=True)\n",[75,108,110],{"class":77,"line":109},6,[75,111,112],{},"        page = await browser.new_page()\n",[75,114,116],{"class":77,"line":115},7,[75,117,118],{},"        await page.goto(url, wait_until=\"domcontentloaded\")\n",[75,120,122],{"class":77,"line":121},8,[75,123,88],{"emptyLinePlaceholder":87},[75,125,127],{"class":77,"line":126},9,[75,128,129],{},"        title = await page.text_content(\"#productTitle\")\n",[75,131,133],{"class":77,"line":132},10,[75,134,88],{"emptyLinePlaceholder":87},[75,136,138],{"class":77,"line":137},11,[75,139,140],{},"        # Price lives in different spots depending on layout\n",[75,142,144],{"class":77,"line":143},12,[75,145,146],{},"        price = None\n",[75,148,150],{"class":77,"line":149},13,[75,151,152],{},"        for sel in [\".a-price .a-offscreen\", \"#priceblock_ourprice\",\n",[75,154,156],{"class":77,"line":155},14,[75,157,158],{},"                    \"#corePrice_feature_div .a-offscreen\"]:\n",[75,160,162],{"class":77,"line":161},15,[75,163,164],{},"            el = await page.query_selector(sel)\n",[75,166,168],{"class":77,"line":167},16,[75,169,170],{},"            if el:\n",[75,172,174],{"class":77,"line":173},17,[75,175,176],{},"                price = (await el.text_content()).strip()\n",[75,178,180],{"class":77,"line":179},18,[75,181,182],{},"                break\n",[75,184,186],{"class":77,"line":185},19,[75,187,88],{"emptyLinePlaceholder":87},[75,189,191],{"class":77,"line":190},20,[75,192,193],{},"        rating = await 
page.text_content(\"span[data-hook=rating-out-of-text]\")\n",[75,195,197],{"class":77,"line":196},21,[75,198,199],{},"        await browser.close()\n",[75,201,203],{"class":77,"line":202},22,[75,204,205],{},"        return {\"title\": title.strip() if title else None,\n",[75,207,209],{"class":77,"line":208},23,[75,210,211],{},"                \"price\": price, \"rating\": rating}\n",[14,213,214],{},"The multi-selector fallback for price is the single most important reliability trick. Amazon shows at least four price layouts, and hard-coding one guarantees breakage.",[18,216,218],{"id":217},"handling-amazons-anti-bot-defenses","Handling Amazon's anti-bot defenses",[14,220,221],{},"Amazon serves CAPTCHAs and the \"Robot Check\" page when it suspects automation. To stay below that threshold:",[26,223,224,237,243,249],{},[29,225,226,230,231,236],{},[227,228,229],"strong",{},"Use residential proxies"," and rotate them. Datacenter IPs get the robot page fast. See my ",[232,233,235],"a",{"href":234},"\u002Fblog\u002Frotating-proxies-for-web-scraping","rotating proxies guide",".",[29,238,239,242],{},[227,240,241],{},"Match the region."," Use a proxy in the same country as the Amazon domain you are scraping, or you get redirected and see wrong prices.",[29,244,245,248],{},[227,246,247],{},"Slow down."," Amazon tolerates a steady, human-like rate. 
Bursts trigger the check.",[29,250,251,254],{},[227,252,253],{},"Persist sessions."," Reuse cookies once you have a clean session rather than starting fresh each request.",[14,256,257,258,262],{},"When the robot check does appear, you can solve it with a CAPTCHA service, covered in my guide on ",[232,259,261],{"href":260},"\u002Fblog\u002Fsolving-captchas-2captcha-capsolver","solving CAPTCHAs",", but reducing how often it appears is cheaper than solving it.",[18,264,266],{"id":265},"detecting-the-robot-check","Detecting the robot check",[14,268,269,270,273],{},"Like most protected sites, Amazon returns a ",[72,271,272],{},"200"," with the block page rather than an error. Validate the content.",[65,275,277],{"className":67,"code":276,"language":69,"meta":70,"style":70},"def is_robot_check(html: str) -> bool:\n    markers = [\"Robot Check\", \"Enter the characters you see below\",\n               \"automated access\"]\n    return any(m in html for m in markers)\n",[72,278,279,284,289,294],{"__ignoreMap":70},[75,280,281],{"class":77,"line":78},[75,282,283],{},"def is_robot_check(html: str) -> bool:\n",[75,285,286],{"class":77,"line":84},[75,287,288],{},"    markers = [\"Robot Check\", \"Enter the characters you see below\",\n",[75,290,291],{"class":77,"line":91},[75,292,293],{},"               \"automated access\"]\n",[75,295,296],{"class":77,"line":97},[75,297,298],{},"    return any(m in html for m in markers)\n",[14,300,301],{},"If detected, rotate the proxy and session, back off, and retry.",[18,303,305],{"id":304},"scraping-reviews-and-pagination","Scraping reviews and pagination",[14,307,308],{},"Reviews span many pages. 
Follow the next page link and respect a delay between requests so the review crawl does not spike your rate.",[65,310,312],{"className":67,"code":311,"language":69,"meta":70,"style":70},"async def scrape_reviews(page, max_pages=10):\n    reviews = []\n    for _ in range(max_pages):\n        for r in await page.query_selector_all(\"div[data-hook=review]\"):\n            body = await r.query_selector(\"span[data-hook=review-body]\")\n            reviews.append((await body.text_content()).strip() if body else \"\")\n        nxt = await page.query_selector(\"li.a-last a\")\n        if not nxt:\n            break\n        await nxt.click()\n        await page.wait_for_timeout(2000)\n    return reviews\n",[72,313,314,319,324,329,334,339,344,349,354,359,364,369],{"__ignoreMap":70},[75,315,316],{"class":77,"line":78},[75,317,318],{},"async def scrape_reviews(page, max_pages=10):\n",[75,320,321],{"class":77,"line":84},[75,322,323],{},"    reviews = []\n",[75,325,326],{"class":77,"line":91},[75,327,328],{},"    for _ in range(max_pages):\n",[75,330,331],{"class":77,"line":97},[75,332,333],{},"        for r in await page.query_selector_all(\"div[data-hook=review]\"):\n",[75,335,336],{"class":77,"line":103},[75,337,338],{},"            body = await r.query_selector(\"span[data-hook=review-body]\")\n",[75,340,341],{"class":77,"line":109},[75,342,343],{},"            reviews.append((await body.text_content()).strip() if body else \"\")\n",[75,345,346],{"class":77,"line":115},[75,347,348],{},"        nxt = await page.query_selector(\"li.a-last a\")\n",[75,350,351],{"class":77,"line":121},[75,352,353],{},"        if not nxt:\n",[75,355,356],{"class":77,"line":126},[75,357,358],{},"            break\n",[75,360,361],{"class":77,"line":132},[75,362,363],{},"        await nxt.click()\n",[75,365,366],{"class":77,"line":137},[75,367,368],{},"        await page.wait_for_timeout(2000)\n",[75,370,371],{"class":77,"line":143},[75,372,373],{},"    return 
reviews\n",[18,375,377],{"id":376},"the-official-alternative-amazons-api","The official alternative: Amazon's API",[14,379,380],{},"If your use case allows it, Amazon's Product Advertising API and the Selling Partner API provide structured data without scraping. They have strict eligibility rules and rate limits, and they do not expose everything the website shows, but for compliant, stable access they are worth evaluating before building a scraper. For price monitoring at scale where the API does not fit, scraping remains the common path.",[18,382,384],{"id":383},"keeping-it-reliable-over-time","Keeping it reliable over time",[14,386,387],{},"Amazon changes its markup and tightens its defenses regularly. A scraper that works today will break, so build for maintenance: monitor your success rate, alert when extraction returns nulls, and keep the selector fallbacks updated. The real deliverable is a pipeline that stays working, not a script that ran once.",[18,389,391],{"id":390},"need-amazon-or-e-commerce-data-at-scale","Need Amazon or e-commerce data at scale?",[14,393,394,395,401,402,406],{},"I build e-commerce scrapers for price monitoring, catalog extraction, and competitor tracking, with the proxy and anti-bot infrastructure to run reliably. If you need product data at scale, ",[232,396,400],{"href":397,"rel":398},"https:\u002F\u002Fwww.upwork.com\u002Ffreelancers\u002Fphanvuong2",[399],"nofollow","hire me on Upwork"," or reach out through the ",[232,403,405],{"href":404},"\u002F#contact","contact form",". 
I respond within 24 hours.",[408,409,410],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":70,"searchDepth":84,"depth":84,"links":412},[413,414,415,416,417,418,419,420,421],{"id":20,"depth":84,"text":21},{"id":52,"depth":84,"text":53},{"id":59,"depth":84,"text":60},{"id":217,"depth":84,"text":218},{"id":265,"depth":84,"text":266},{"id":304,"depth":84,"text":305},{"id":376,"depth":84,"text":377},{"id":383,"depth":84,"text":384},{"id":390,"depth":84,"text":391},"2026-05-24","A practical guide to scraping Amazon product listings, prices, and reviews at scale. 
Covers selectors, anti-bot handling, the official API alternative, and staying reliable.",false,"md",{},"\u002Fblog\u002Fscrape-amazon-product-data","8 min read",{"title":5,"description":423},"blog\u002Fscrape-amazon-product-data",[432,433,434,435,69],"amazon","e-commerce","web scraping","price monitoring",[437,438,439,440],"Amazon serves several price layouts, so use multiple fallback selectors per field.","Use residential proxies matched to the marketplace country to get correct prices.","The robot check returns a 200 with a block page, so validate the content.","Consider Amazon's official APIs where compliance is a hard requirement.",null,"gmag89HmtLX-KNDs3LaFsWqF7-xNsBDmU6f3YfLQ-Ts",1781254278234]