
2025 Guide to Scraping Amazon with Python: prices, product data, categories

18 min read · Created: January 03, 2025 · Updated: January 03, 2025

Amazon, the world’s largest marketplace, holds information on millions of products from hundreds of thousands of vendors, and that data can provide invaluable competitive insights to any ecommerce business.

This guide provides a step-by-step tutorial on how to scrape Amazon using Python.

We’ll go over:

  • legal considerations,
  • technical solutions like bypassing Web Application Firewalls (WAF),
  • and practical implementations such as exporting data to structured formats.

Whether you’re a novice or experienced developer, this guide will equip you with the skills to tackle Amazon’s dynamic pages effectively.

To solidify your skills in web scraping, check out our guide to web scraping in Python.

Does Amazon Allow Scraping?

Yes, scraping Amazon is possible and completely legal as long as you follow the guidelines and stay away from any login-protected personal information.

By respecting these rules and adhering to ethical scraping practices, you can gather publicly available data without crossing any laws or regulations.

Web scraping is a widely used and legitimate method for collecting publicly available data.

However, every website, including Amazon, establishes boundaries to protect its resources. These are outlined in Amazon’s robots.txt file.

This file specifies which parts of the website are accessible to automated tools like web crawlers.

When scraping Amazon, it’s important to:

  • Avoid login-protected content like customer reviews. These areas are restricted by Amazon’s Terms of Service and often require authentication.
  • Respect the structure and limitations outlined in robots.txt to ensure compliance; a quick programmatic check is sketched below.
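Here is a minimal sketch of that check using Python’s built-in urllib.robotparser; the search URL is just an illustrative example:

import urllib.robotparser

# Load and parse Amazon's robots.txt
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.amazon.com/robots.txt")
rp.read()

# Check whether a generic crawler is allowed to fetch a given path
url = "https://www.amazon.com/s?k=laptop+stands"
print(rp.can_fetch("*", url))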

In recent updates, Amazon moved customer reviews behind login walls, making them inaccessible without proper authorization. While these changes may pose challenges, they also clarify which areas of Amazon’s site are off-limits. Staying within these boundaries ensures your scraping efforts remain ethical and compliant.

Ethical Guidelines

Scraping ethically means balancing your goals with respect for the website’s integrity and policies.

Ethical scraping doesn’t just help you avoid potential legal pitfalls; it also fosters a responsible approach to data collection.

Here are a few best practices:

  1. Respect Rate Limits: Sending too many requests in a short period can overwhelm Amazon’s servers. Use delays or throttling to mimic real user behavior.
  2. Stick to Publicly Available Data: Avoid scraping content that requires login credentials or is marked as off-limits in the robots.txt file.
  3. Use Data Responsibly: Whether for analysis or application development, ensure that the data you collect is used in ways that align with Amazon’s guidelines.

To learn more about ethical scraping and the legal landscape, check out this detailed guide on the legality of web scraping.

First Things First: Bypass Amazon WAF

Amazon’s WAF (Web Application Firewall) is one of the biggest challenges in scraping its platform.

It’s designed to detect and block automated traffic, making manual or automated scraping more complex. To scrape Amazon successfully, you need strategies to bypass these defenses effectively.

Manual Methods

Manually configuring headers, introducing time delays, and rotating proxies can help you mimic human browsing behavior.

By regularly updating your approach to reflect typical user patterns, you can reduce the risk of detection. However, these methods require consistent effort and monitoring, making them less ideal for large-scale projects.

Each adjustment requires testing and iteration to keep up with the WAF’s evolving security measures.
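As a rough illustration of this manual approach, here is a minimal sketch that rotates a small pool of User-Agent strings and adds random delays between requests. The header values and delay range are arbitrary examples, and a proxy pool could be passed through the proxies argument of requests in the same spirit:

import random
import time
import requests

# A small pool of example User-Agent strings to rotate through
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

urls = ["https://us.amazon.com/Amazon-Basics-Portable-Adjustable-Notebook/dp/B0BLRJ4R8F/"]

for url in urls:
    headers = {"User-Agent": random.choice(USER_AGENTS), "Accept-Language": "en-US,en;q=0.9"}
    # Add proxies={"http": ..., "https": ...} here to rotate IPs as well
    response = requests.get(url, headers=headers)
    print(url, response.status_code)
    time.sleep(random.uniform(2, 5))  # throttle requests to mimic human browsing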

Using Stealth Plugins

Stealth plugins like scrapy-fake-useragent or scrapy-rotating-proxies offer a semi-automated way to handle WAF defenses. These plugins can randomize headers and rotate IPs, mimicking real user behavior.
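For reference, wiring these plugins into a Scrapy project is typically a matter of a few settings. The middleware paths and priorities below follow each plugin’s usual documentation and are only a sketch; verify them against the versions you install:

# settings.py (sketch; confirm middleware paths against each plugin's docs)
DOWNLOADER_MIDDLEWARES = {
    # scrapy-fake-useragent: replace the default User-Agent middleware
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy_fake_useragent.middleware.RandomUserAgentMiddleware": 400,
    # scrapy-rotating-proxies: rotate through a proxy pool and detect bans
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}

# Proxy pool used by scrapy-rotating-proxies (placeholder addresses)
ROTATING_PROXY_LIST = [
    "proxy1.example.com:8000",
    "proxy2.example.com:8031",
]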

However, they often fall short against Amazon’s sophisticated detection algorithms, which are updated regularly.

Web Scraping APIs

A simpler, more reliable approach is to use a paid service like Scrape.do.

These APIs handle complex tasks like IP rotation, header management, and CAPTCHA solving, allowing you to focus solely on extracting the data you need.

For this tutorial, we’ll be using Scrape.do and will demonstrate how you can integrate it into your scraping workflow.

Scraping Product Data from Amazon

Extracting information from an Amazon product page will be our starting point for this tutorial.

From product prices and images to descriptions and ratings, these pages contain data that can be invaluable for market analysis or application development.

For this step, we will be using this product as an example.

Prerequisites

Libraries like Requests and BeautifulSoup simplify the process of sending HTTP requests and parsing HTML content.

pip install requests beautifulsoup4

Now that we have the necessary libraries installed, we can start making our first request.

As mentioned earlier, we will be using Scrape.do to bypass the Amazon WAF, so the last thing we need is the API token provided by Scrape.do.

To start, we will focus on getting a successful HTTP response from Amazon’s servers. We will route our request through the Scrape.do API, which lets us skip the Amazon WAF effortlessly.

This step would normally be quite complex because of the advanced protection systems Amazon uses, but thanks to Scrape.do we don’t need to handle any bypass mechanisms ourselves!

After defining our token and the product url, we can try sending our first request using something like this:

import requests
import urllib.parse

# Our token provided by 'Scrape.do'
token = "<your_token>"

# Amazon product url
targetUrl = urllib.parse.quote_plus("https://us.amazon.com/Amazon-Basics-Portable-Adjustable-Notebook/dp/B0BLRJ4R8F/")

# Use Scrape.do to route our request
apiUrl = "http://api.scrape.do?token={}&url={}".format(token, targetUrl)
response = requests.request("GET", apiUrl)

print(response)

At this point we should have a Response object stored in our response variable, and printing it to the console should show this:

<Response [200]>

A 200 status code means our request was successful. If you are getting any other response, check the list of HTTP status codes to troubleshoot the problem.
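If you’d rather have the script fail loudly than silently continue with an error page, a small guard right after the request helps; this simply reuses the response object from the snippet above:

# Stop early on anything other than a successful response
if response.status_code != 200:
    raise RuntimeError(f"Request failed with status {response.status_code}")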

We can access the contents of the response by using:

response.text

This will include all of the HTML content of the page, including any scripts.

If you want, you can print this information to your console to check it, or even write it into an HTML file to keep a local copy of the product page on your computer.
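For example, here is a quick way to keep a local copy for offline inspection, reusing the response from the previous step:

# Write the raw HTML to disk so you can open it in a browser later
with open("product_page.html", "w", encoding="utf-8") as f:
    f.write(response.text)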

Our next step will be extracting the information we need from this raw text.

Scrape Amazon Product Price

Price is the most important information about a product, so it will be a nice starting target for us.

For this step we will start using BeautifulSoup because it helps us parse the data we need.

Before starting the extraction process, we need to locate the information we want inside the product page we just downloaded. We can either search through the response.text we downloaded in the previous step, or open the product page and use developer tools to inspect the elements.

In Google Chrome, press F12 to open the developer tools, then Ctrl + Shift + C to inspect the element you want. We will select and inspect the price element, since it is our first target.

We are looking for an HTML element that encapsulates the information we want. For this example, the span element with the priceToPay class contains everything we need for the price.

Now that we know which element stores the product price, we will use BeautifulSoup to access this HTML element and retrieve the text inside it.

from bs4 import BeautifulSoup

<-- same as previous step -->

# Parse the request using BS
soup = BeautifulSoup(response.text, "html.parser")

price = soup.find(class_="priceToPay").text.strip()
print("Price:", price)

After running this script we should be getting the product price printed on our console.

Price: $28.57

If you’re getting this output, it means you’ve successfully scraped and parsed price information on an Amazon product.

Now let’s continue with other key product data.

Scrape Amazon Product Details

The following steps will be pretty similar to getting the price information.

We will inspect each piece of information we are interested in using the developer tools and look for an encapsulating HTML element that holds the complete value.

Then with the help of BeautifulSoup we will locate this information inside the response we are getting.

Product Name

We can see that the product name is stored inside a span element with the id productTitle. Again, we will access this element with BeautifulSoup and print the text inside it.

name = soup.find(id="productTitle").text.strip()
print("Product Name:", name)

We should be getting product name printed on our console.

Product Name: Amazon Basics Ergonomic and Foldable Laptop Stand for Desk, Adjustable Riser, Fits all Laptops and Notebooks up to 17.3 Inch, 10 x 8.7 x 6 in, Silver

Product Image

It is time for the product image; we will start by inspecting the element again to find the HTML element containing it.

landingImage is the id of the element that contains the image.

This time, however, we are interested in the element’s src attribute rather than its text, since the image URL is stored there. We can access the src attribute like this:

image = soup.find("img", {"id": "landingImage"})["src"]
print("Image URL:", image)

We should be getting the image url printed on our console!

Image URL: https://m.media-amazon.com/images/I/51KyaTB1EKL.__AC_SX300_SY300_QL70_FMwebp_.jpg

It is also possible to download and save these images to your drive, but for the scope of this tutorial we will stop here and store product images as URLs.
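If you ever do want the image file itself, a simple sketch is to request the URL we just extracted and write the bytes to disk (requests is already imported above). Amazon’s image CDN usually serves these directly, but you could route the request through Scrape.do the same way as before if it gets blocked:

# Download the product image and save it next to the script (optional)
img_response = requests.get(image)
with open("product_image.jpg", "wb") as f:
    f.write(img_response.content)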

Product Rating

And finally, the product rating: the element that showcases the success of an ecommerce product. As always, our first step is to inspect the element.

As we can see, the product rating is not wrapped in an element we can refer to directly. The a-size-medium and a-color-base classes are generic and would match other elements if we tried to use them.

The div element with the AverageCustomerReviews class is easy to access and contains everything we need, but it also includes text we don’t need.

We will grab this text and strip it down to the part we are interested in.

rating = soup.find(class_="AverageCustomerReviews").text.strip()
print("Rating:", rating)

This will return the following:

Rating: 4.6 out of 5 stars4.6 out of 5

This string contains the rating twice, and we don’t need the “out of 5” part either.

So we are going to split this string and get a single rating score. Let’s update our code as follows:

rating = soup.find(class_="AverageCustomerReviews").text.strip().split(" out")[0]
print("Rating:", rating)

Now we get this:

Rating: 4.6

Export Product Data

To make your scraped data actionable, it’s essential to save it in a structured data format. This allows for easy sharing, visualization, or further processing.

We will export it to CSV, one of the most widely used formats for sharing tabular data.

You can use libraries like pandas to easily save your data frames into CSV files, but in this tutorial we will do it without the help of a library.

CSV (Comma-Separated Values) files can be built with Python’s standard file handling. We will separate each product’s fields with commas and write them to the file line by line.

We will also use two additional products to test our code on multiple URLs.

Let’s update our code to include these new products and loop through them while writing their information into a csv file.

Here’s the final code:

import requests
from bs4 import BeautifulSoup
import urllib.parse

# Our token provided by 'scrape.do'
token = "<your_token>"

# Amazon product urls
targetUrls = ["https://us.amazon.com/Amazon-Basics-Portable-Adjustable-Notebook/dp/B0BLRJ4R8F/",
              "https://us.amazon.com/Urmust-Ergonomic-Adjustable-Ultrabook-Compatible/dp/B081YXWDTQ/",
              "https://us.amazon.com/Ergonomic-Compatible-Notebook-Soundance-LS1/dp/B07D74DT3B/"]

for targetUrl in targetUrls:
    # Use Scrape.do to route our request

    targetUrl_encoded = urllib.parse.quote_plus(targetUrl)
    apiUrl = "http://api.scrape.do?token={}&url={}".format(token, targetUrl_encoded)
    response = requests.request("GET", apiUrl)

    # Parse the request using BS
    soup = BeautifulSoup(response.text, "html.parser")

    name = soup.find(id="productTitle").text.strip()
    price = soup.find(class_="priceToPay").text.strip()
    image = soup.find("img", {"id": "landingImage"})["src"]
    rating = soup.find(class_="AverageCustomerReviews").text.strip().split(" out of")[0]
    with open("output.csv", "a") as f:
        f.write('"' + name + '", "' + price + '", "' + image + '","' + rating + '" \n')
    f.close()

And voila, information about these 3 products should be saved into our CSV file.

Each row of output.csv now holds one product’s name, price, image URL, and rating.
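One caveat: if a product name ever contains commas or quotes, hand-built CSV lines like the one above can break. The pandas library mentioned earlier handles quoting and escaping for you; here is a minimal sketch, assuming you collect each product’s fields in a list inside the loop instead of writing lines directly:

import pandas as pd

# Inside the loop, collect each product as a row instead of writing it by hand:
#   rows.append([name, price, image, rating])
rows = []

# After the loop, pandas takes care of quoting and escaping
df = pd.DataFrame(rows, columns=["Name", "Price", "Image", "Rating"])
df.to_csv("output.csv", index=False)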

Scrape Amazon Prices Behind Size or Select Input

Some Amazon products, like clothing or shoes, often have multiple size or color options, each with potentially different prices.

These options may not display a clear price until a specific size is selected, like Hanes Comfortsoft Tagless Underwear.

We will need to scrape the price for each size or option by interacting with the dropdown or size buttons dynamically.

Luckily, the Scrape.do API allows us to interact with the elements without requiring additional libraries.

Interact with Input Elements

As we have done in previous steps we need to start with inspecting elements that we are trying to access.

1. Locate the Size Dropdown:

We can access the select element using its ID, dropdown_selected_size_name, and we will click it using the playWithBrowser parameter provided by Scrape.do.

2. Iterate Through Size Options:

We will also import urllib.parse for this step, because we will be constructing our URL with the parameters we want to pass along with our request.

After constructing our request, we will route it through the Scrape.do API with the parameters required to interact with the element we need; in this case, the dropdown we inspected in the previous step.

We can do this as follows:


import requests
import urllib.parse

# Our token provided by 'scrape.do'
token = "<your_token>"

targetUrl = urllib.parse.quote_plus("https://us.amazon.com/Calvin-Klein-Classics-Multipack-T-Shirts/dp/B08SM3QZJG/")

jsonData = '[{"Action": "Click","Selector":"#dropdown_selected_size_name"},' \
           '{ "Action": "Wait", "Timeout": 5000 }]'
encodedJsonData = urllib.parse.quote_plus(jsonData)
render = "true"

url = f"http://api.scrape.do?token={token}&url={targetUrl}&render={render}&playWithBrowser={encodedJsonData}&geoCode=us"

response = requests.request("GET", url)

This should return the product page with the specified dropdown clicked and its options visible.

Now we will store the options that became visible, and separately request each option’s page and scrape the information from them.

Let’s start with inspecting these dropdown options so we can access them in our response.

All of the dropdown options share the a-dropdown-link class, which we can use to select them all at once. Each option’s ASIN, which we will use to build its URL, is embedded in its data-value attribute.

We will be using this information to access every available option.

Let’s continue our code from where we left off; we will also bring in the BeautifulSoup library to parse our response:


<----- Previous section ----->

from bs4 import BeautifulSoup

# Parse the request using BS
soup = BeautifulSoup(response.text, "html.parser")
# This will return all <a> tags with the given class name as a list
size_options = soup.find_all("a", {"class": "a-dropdown-link"})

size_options_list = []
for size_option in size_options:
    # Skip the "Select" placeholder entry
    if size_option.text.strip() != "Select":
        # data-value embeds the option's ASIN; pull it out alongside the size label
        size_options_list.append([size_option.text, size_option.get("data-value").split(",")[1][:-2]])

Our size_options_list variable should now look like this:

[['Small', 'B08SM3QZJG'], ['Medium', 'B08SM3RGMG'], ['Large', 'B08SM2XF46'], ['X-Large', 'B08SM33PNR'], ['XX-Large', 'B09GPGVN3Z']]

Now we have our options stored in our size_options_list variable and it is time to iterate over this list.

When we navigate to any size option through the dropdown, the ?th=1&psc=1 parameters are appended to the URL.

We will also update our request with these parameters:


<----- Previous section ----->

prices = {}

for option in size_options_list:
    targetUrl = urllib.parse.quote_plus(f"https://us.amazon.com/Calvin-Klein-Classics-Multipack-T-Shirts/dp/"
                                        f"{option[1]}/?th=1&psc=1")
    urlOptions = f"http://api.scrape.do?token={token}&url={targetUrl}&geoCode=us"
    responseOptions = requests.request("GET", urlOptions)
    soupOptions = BeautifulSoup(responseOptions.text, "html.parser")
    price = soupOptions.find(class_="priceToPay").text.strip()

    size = option[0].strip()
    prices[size] = price

With this code added, we iterate over each option, request its specific product page, and store its price in our prices dictionary.

We should end up with a dictionary with prices for each option:

{'Small': '$28.42', 'Medium': '$31.65', 'Large': '$37.95', 'X-Large': '$34.95', 'XX-Large': '$34.99'}

Export Data

We have collected all of the price information for the different sizes, and the only thing left to do is export it in an actionable format.

We will use CSV again and export all of the size options and their prices.

import csv

<----- Previous section ----->

# Scrape product details
product_name = soup.find(id="productTitle").text.strip()

# Export to CSV
header = ["Name"] + [f"{size} Price" for size in prices.keys()]
row = [product_name] + [price for price in prices.values()]

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows([header, row])

We will end up with a CSV file containing a header row and a row of prices for each size.

And that’s it: we have gone through all of the necessary steps and produced output for each size option.

Final Code Summary

For ease of use, here are the previous sections merged together into one complete script:

import csv
import requests
from bs4 import BeautifulSoup
import urllib.parse

# Our token provided by 'scrape.do'
token = "<your_token>"

targetUrl = urllib.parse.quote_plus("https://us.amazon.com/Calvin-Klein-Classics-Multipack-T-Shirts/dp/B08SM3QZJG/")

jsonData = '[{"Action": "Click","Selector":"#dropdown_selected_size_name"},' \
           '{ "Action": "Wait", "Timeout": 5000 }]'
encodedJsonData = urllib.parse.quote_plus(jsonData)
render = "true"
url = f"http://api.scrape.do?token={token}&url={targetUrl}&render={render}&playWithBrowser={encodedJsonData}&geoCode=us"
response = requests.request("GET", url)

# Parse the request using BS
soup = BeautifulSoup(response.text, "html.parser")
size_options = soup.find_all("a", {"class": "a-dropdown-link"})

size_options_list = []

for size_option in size_options:
    # Skip the "Select" placeholder entry
    if size_option.text.strip() != "Select":
        # data-value embeds the option's ASIN; pull it out alongside the size label
        size_options_list.append([size_option.text, size_option.get("data-value").split(",")[1][:-2]])

prices = {}

for option in size_options_list:
    targetUrl = urllib.parse.quote_plus(f"https://us.amazon.com/Calvin-Klein-Classics-Multipack-T-Shirts/dp/"
                                        f"{option[1]}/?th=1&psc=1")
    urlOptions = f"http://api.scrape.do?token={token}&url={targetUrl}&geoCode=us"
    responseOptions = requests.request("GET", urlOptions)
    soupOptions = BeautifulSoup(responseOptions.text, "html.parser")
    price = soupOptions.find(class_="priceToPay").text.strip()

    size = option[0].strip()
    prices[size] = price

# Scrape product details
product_name = soup.find(id="productTitle").text.strip()

# Export to CSV
header = ["Name"] + [f"{size} Price" for size in prices.keys()]
row = [product_name] + [price for price in prices.values()]

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows([header, row])

Scrape Amazon Categories and Search Results

Amazon’s product category pages and search result pages work exactly the same way, so we’ll cover both processes in one tutorial.

We’ll scrape product information and navigate through multiple pages of search results, parsing individual product data and handling pagination.

For this example we will use the laptop stands category.

We can either use the next-page button to iterate through the results, or request each result page directly by adding the page number to the URL.

Using the next-page button, while it sounds more natural, is more likely to get blocked, because Amazon checks these navigation elements for bot traffic.

To make sure we are not blocked while moving between pages, we will request each search page one by one until we reach the maximum number of pages.

Let’s start with our definitions:

import urllib.parse
import csv
import requests
from bs4 import BeautifulSoup

# Our token provided by 'scrape.do'
token = "<your_token>"

current_result_page = 1
max_result_page = 20

# Initialize list to store product data
all_products = []

Since this category has 20 result pages, we set our max_result_page variable to 20.

Scrape Result Pages

We now have everything we need, so we will loop through the result pages, breaking out of the loop once we reach the maximum page number.

Each result element has the s-result-item class. We will go through all of these results and scrape their names, prices, links, and images.

You can see the necessary selectors for each attribute in the code below:


<----- same as previous step ------>
# Loop through all result pages
while True:
    # break the loop when max page number is reached
    if current_result_page > max_result_page:
        break

    targetUrl = urllib.parse.quote("https://www.amazon.com/s?k=laptop+stands&page={}".format(current_result_page))
    apiUrl = "https://api.scrape.do?token={}&url={}".format(token, targetUrl)
    response = requests.request("GET", apiUrl)

    soup = BeautifulSoup(response.text, "html.parser")

    # Parse products on the current page
    product_elements = soup.find_all("div", {"class": "s-result-item"})

    for product in product_elements:
        try:
            # Extract product details
            name = product.select_one("h2 span").text
            price_element = product.select_one("span.a-price span.a-offscreen")
            price = price_element.text if price_element else "Price not available"
            link = product.select_one(".a-link-normal").get("href")
            image = product.select_one("img").get("src")
            # Append data to the list
            if name:
                all_products.append({"Name": name, "Price": price, "Link": link, "Image": image})
        except AttributeError:
            # Skip entries (ads, separators) that are missing the expected elements
            continue
    current_result_page += 1

And here it is: we have stored all of the product information in the all_products variable. The only thing left to do now is export it to a file.

Export Data to CSV

We will use the csv library to save all of this information into a file, which we can do as follows:

# Export the data to a CSV file
<----- same as previous step ------>
csv_file = "amazon_search_results.csv"
headers = ["Name", "Price", "Link", "Image"]

with open(csv_file, "w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=headers)
    writer.writeheader()
    writer.writerows(all_products)

In the end, our amazon_search_results.csv file will contain one row per product, with its name, price, link, and image URL.

Conclusion

Amazon has millions of products competing with each other, and if you’re trying to compete in the e-commerce stage you need the right data to generate actionable insights or set up automations.

This guide aimed to help you understand the basics of scraping Amazon, different ways you can get valuable data, and how to overcome simple challenges.

However, at the end of the day, whether you’re scraping just Amazon or multiple e-commerce sites, getting blocked is the last thing you’ll want.

This is where Scrape.do comes into the picture.

While you focus on taking impactful actions on the data you’ve scraped, Scrape.do handles:

  • Automated proxy rotation with 100M+ datacenter, mobile, and residential IPs,
  • Avoiding or solving CAPTCHAs,
  • Handling TLS fingerprinting, header rotation, and user agents,
  • Monitoring and validating responses.

All at a fraction of the cost of doing all of this yourself.

Start scraping today with 1000 free credits.

Frequently Asked Questions

Can you scrape Amazon for prices?

Answering this question from two angles: yes, it is legal to scrape product prices on Amazon because they are public data; and yes, even a beginner developer can easily scrape them with Python by following this guide.

How do you retrieve data from Amazon?

You can easily retrieve data from Amazon in 3 steps:

  1. Use a web scraping API or a stealth plugin to bypass Amazon’s firewall,
  2. Write code in your preferred programming language that retrieves the HTML response and parses the data you want (more info in this guide),
  3. Input your target URLs and automate the process using workflows.