
Here's Why You're Getting 403 in Python Requests and How to Fix It


Python requests throwing 403 Forbidden errors isn’t just annoying. It’s a scraper killer.

If you’re automating data collection, hitting APIs, or scraping sites in 2025, you’ve already hit this wall: instant blocks, mysterious rejections, and that dreaded 403 Forbidden response that stops your code dead.

You need solutions that actually work. Not theories. Not decade-old Stack Overflow answers. Not random header combinations that fail on the third request.

This guide breaks down the real causes and gives you working fixes with code that runs today.

Why 403 Forbidden Breaks Your Scraper (And Why It’s Not Going Away)

403 Forbidden means the server understood your request perfectly but refused to authorize it. Unlike 401 Unauthorized, this isn’t about missing authentication—it’s about being rejected despite the server knowing exactly what you want.

Think of it like reaching a club door: the bouncer sees you’re a person but still refuses entry based on other factors.
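
In code, the distinction shows up in the status code alone, so you can branch on it directly (a minimal sketch using httpbin's /status endpoints, which return whatever code you ask for):

import requests

# httpbin's /status/<code> endpoints return the requested status on demand
for code in (401, 403):
    response = requests.get(f"https://httpbin.org/status/{code}")
    if response.status_code == 401:
        print("401: authentication is missing or invalid - log in and retry")
    elif response.status_code == 403:
        print("403: the server understood but refused - fix headers, IP, or cookies")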

The stakes are real. When your scraper hits 403s, you lose:

  • Data collection pipelines that feed business intelligence
  • Price monitoring systems that track competitor moves
  • Market research automation that finds opportunities
  • API integrations that power your applications

Here’s what’s actually blocking you and how to fix each one.

The Real Blockers Behind 403 Errors

Missing or Wrong Headers

Sites expect browser-like headers: User-Agent, Accept-Language, Referer, and others that signal “real user.”

Bare-bones requests are flagged instantly because they look nothing like actual browser traffic.

The problem:

import requests

# Default requests headers scream "automation" - protected sites reject this outright
# (httpbin.org just echoes the request back, but a real protected site returns 403)
response = requests.get("https://httpbin.org/user-agent")
print(response.status_code)

The fix:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
}

response = requests.get("https://httpbin.org/user-agent", headers=headers)
print(response.status_code)  # Returns 200
print(response.json())  # Shows your user agent was accepted

Result: Status Code 200 with clean response data.

IP-Based Blocking

Many sites block entire IP ranges, especially datacenter IPs that are commonly used by bots and scrapers.

Residential IPs get through because they look like real users browsing from home.

The problem:

# Your datacenter IP gets flagged
response = requests.get("https://httpbin.org/ip")
# Site sees: "52.91.45.123" (AWS datacenter) → Block

The fix with Scrape.do:

import requests
import urllib.parse

token = "your-scrape-do-token"
target_url = "https://httpbin.org/ip"
encoded_url = urllib.parse.quote_plus(target_url)

# Route through residential proxies automatically
api_url = f"http://api.scrape.do/?token={token}&url={encoded_url}&super=true&geoCode=us"
response = requests.get(api_url)

print(response.json())  # Shows residential IP like "98.142.34.67"

Result: Site sees residential IP instead of datacenter → Allow.
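
If you manage your own proxy pool instead, plain requests can route traffic through it with the proxies argument (a minimal sketch; the proxy address and credentials below are placeholders):

import requests

# Placeholder proxy - swap in your own provider's host, port, and credentials
proxy = "http://username:password@proxy.example.com:8080"
proxies = {"http": proxy, "https": proxy}

response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10)
print(response.json())  # Should show the proxy's IP, not your own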

Missing Cookies and Session Data

Many sites require session cookies, CSRF tokens, or authentication cookies to be present on every request.

Ignoring cookie flows leads to instant 403s.

The problem:

# Direct request without session context
response = requests.get("https://example.com/protected-page")
# Returns 403 because no session cookies

The fix with session handling:

import requests

# Use session to persist cookies automatically
session = requests.Session()

# First request establishes session
session.get("https://example.com/login-page")

# Subsequent requests carry session cookies
response = session.get("https://example.com/protected-page")
print(response.status_code)  # Returns 200 with session active

Result: Session cookies automatically included, access granted.
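
If the site also expects a CSRF token, you usually have to read it out of the page and send it back with the next request (a sketch assuming the token sits in a hidden input named csrf_token; the URLs and field names are illustrative):

import re
import requests

session = requests.Session()

# Load the page that embeds the token (URL and field name are illustrative)
login_page = session.get("https://example.com/login-page")
match = re.search(r'name="csrf_token" value="([^"]+)"', login_page.text)
csrf_token = match.group(1) if match else None

# Send the token back alongside the form data
response = session.post(
    "https://example.com/login",
    data={"username": "user", "password": "pass", "csrf_token": csrf_token},
)
print(response.status_code)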

WAFs and Bot Protection Systems

Cloudflare, DataDome, Akamai, and other Web Application Firewalls analyze:

  • TLS fingerprints - How your client negotiates SSL
  • Request patterns - Timing, frequency, behavior
  • JavaScript execution - Browser environment checks
  • Header consistency - Whether headers match real browsers

Plain Python requests fails these checks because it doesn’t behave like a real browser.

The problem:

# Gets blocked by Cloudflare
response = requests.get("https://protected-site.com")
# Returns challenge page or 403

The fix using Scrape.do:

import requests
import urllib.parse

token = "your-scrape-do-token"
target_url = "https://protected-site.com"
encoded_url = urllib.parse.quote_plus(target_url)

# Scrape.do handles WAF bypass automatically
api_url = f"http://api.scrape.do/?token={token}&url={encoded_url}&super=true"
response = requests.get(api_url)

print(response.status_code)  # Returns 200
print("Success! WAF bypassed")

Result: Clean HTML content with no challenge pages or blocks.
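
If you prefer to tackle the TLS-fingerprint layer yourself, a library like curl_cffi can impersonate a real browser's handshake (a minimal sketch, assuming curl_cffi is installed; the available impersonation targets vary by version):

# pip install curl_cffi
from curl_cffi import requests as cffi_requests

# Negotiate TLS like Chrome instead of like a Python script
response = cffi_requests.get("https://protected-site.com", impersonate="chrome")
print(response.status_code)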

Working Solutions for Each 403 Scenario

Build Realistic Browser Headers

Send headers that match real browser patterns, not generic requests defaults.

import requests
from fake_useragent import UserAgent

def get_browser_headers():
    ua = UserAgent()
    return {
        'User-Agent': ua.random,
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Cache-Control': 'max-age=0',
    }

# Test with realistic headers
headers = get_browser_headers()
response = requests.get("https://httpbin.org/headers", headers=headers)
print(f"Status: {response.status_code}")
print(f"Headers accepted: {response.json()['headers']['User-Agent']}")

This generates realistic browser headers that pass basic bot detection.

Handle Sessions and Cookies Properly

Use requests.Session() to maintain state across requests and handle cookie requirements.

import requests

def scrape_with_session(base_url):
    session = requests.Session()

    # Set realistic headers for the session
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    })

    # Visit homepage first to get session cookies
    homepage = session.get(f"{base_url}/")
    print(f"Homepage status: {homepage.status_code}")
    print(f"Cookies received: {len(session.cookies)}")

    # Now access protected content with session
    protected = session.get(f"{base_url}/protected")
    print(f"Protected page status: {protected.status_code}")

    return protected

# Example usage
result = scrape_with_session("https://httpbin.org")

Sessions automatically handle cookies, maintaining state between requests.

Add Request Delays and Randomization

Avoid triggering rate limits by spacing requests and varying timing patterns.

import requests
import time
import random

def scrape_with_delays(urls):
    session = requests.Session()
    results = []

    for i, url in enumerate(urls):
        # Random delay between 1-3 seconds
        delay = random.uniform(1.0, 3.0)
        print(f"Request {i+1}/{len(urls)}, waiting {delay:.1f}s...")
        time.sleep(delay)

        try:
            response = session.get(url, timeout=10)
            results.append({
                'url': url,
                'status': response.status_code,
                'success': response.status_code == 200
            })
            print(f"✓ {url}: {response.status_code}")
        except requests.RequestException as e:
            print(f"✗ {url}: {e}")
            results.append({'url': url, 'status': 'error', 'success': False})

    return results

# Test with multiple URLs
urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/status/200",
    "https://httpbin.org/json"
]

results = scrape_with_delays(urls)
success_rate = sum(1 for r in results if r['success']) / len(results)
print(f"Success rate: {success_rate:.1%}")

This approach mimics human browsing patterns and avoids rate limiting.

Use Scrape.do for Complete WAF Bypass

For production-level scraping, use a service that handles all protection layers automatically.

import requests
import time
import urllib.parse

class ScrapeDOClient:
    def __init__(self, token):
        self.token = token
        self.base_url = "http://api.scrape.do/"

    def scrape(self, url, **params):
        # Default parameters for best success rate
        default_params = {
            'token': self.token,
            'url': urllib.parse.quote_plus(url),
            'super': 'true',  # Premium proxy rotation
            'render': 'false',  # Set to 'true' for JS-heavy sites
        }

        # Merge with custom parameters
        default_params.update(params)

        # Build API URL
        param_string = '&'.join([f"{k}={v}" for k, v in default_params.items()])
        api_url = f"{self.base_url}?{param_string}"

        response = requests.get(api_url)
        return response

    def scrape_multiple(self, urls, delay=1):
        results = []
        for i, url in enumerate(urls):
            print(f"Scraping {i+1}/{len(urls)}: {url}")

            try:
                response = self.scrape(url)
                results.append({
                    'url': url,
                    'status_code': response.status_code,
                    'content_length': len(response.text),
                    'success': response.status_code == 200
                })
                print(f"✓ Status: {response.status_code}, Length: {len(response.text)}")
            except Exception as e:
                print(f"✗ Error: {e}")
                results.append({'url': url, 'success': False, 'error': str(e)})

            time.sleep(delay)

        return results

# Example usage
client = ScrapeDOClient("your-token-here")

# Single request
response = client.scrape("https://protected-site.com")
print(f"Status: {response.status_code}")

# Multiple requests
urls = ["https://site1.com", "https://site2.com", "https://site3.com"]
results = client.scrape_multiple(urls)
success_rate = sum(1 for r in results if r.get('success')) / len(results)
print(f"Overall success rate: {success_rate:.1%}")

Scrape.do handles proxy rotation, header spoofing, and WAF bypass automatically.

Complete Working Example: 403-Proof Scraper

Here’s a production-ready scraper that combines all techniques:

import requests
import time
import random
import urllib.parse
from fake_useragent import UserAgent

class RobustScraper:
    def __init__(self, scrape_do_token=None):
        self.session = requests.Session()
        self.scrape_do_token = scrape_do_token
        self.ua = UserAgent()
        self.setup_session()

    def setup_session(self):
        """Configure session with realistic headers"""
        self.session.headers.update({
            'User-Agent': self.ua.random,
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
        })

    def scrape_direct(self, url, retries=3):
        """Direct scraping with session and headers"""
        for attempt in range(retries):
            try:
                response = self.session.get(url, timeout=10)
                if response.status_code == 200:
                    return response
                elif response.status_code == 403:
                    print(f"403 error on attempt {attempt + 1}")
                    if attempt < retries - 1:
                        time.sleep(random.uniform(2, 5))
                        continue
            except requests.RequestException as e:
                print(f"Request error on attempt {attempt + 1}: {e}")
                if attempt < retries - 1:
                    time.sleep(random.uniform(1, 3))

        return None

    def scrape_with_scrape_do(self, url, **params):
        """Fallback to Scrape.do for protected sites"""
        if not self.scrape_do_token:
            raise ValueError("Scrape.do token required for protected sites")

        default_params = {
            'token': self.scrape_do_token,
            'url': urllib.parse.quote_plus(url),
            'super': 'true',
        }
        default_params.update(params)

        param_string = '&'.join([f"{k}={v}" for k, v in default_params.items()])
        api_url = f"http://api.scrape.do/?{param_string}"

        response = requests.get(api_url)
        return response

    def scrape(self, url, use_scrape_do=False, **kwargs):
        """Main scraping method with automatic fallback"""
        print(f"Scraping: {url}")

        if use_scrape_do:
            # Skip the direct attempt and go straight to Scrape.do
            result = self.scrape_with_scrape_do(url, **kwargs)
            print(f"✓ Scrape.do result: {result.status_code}")
            return result

        # Try a direct request first
        result = self.scrape_direct(url)
        if result and result.status_code == 200:
            print(f"✓ Direct scraping successful: {result.status_code}")
            return result

        if self.scrape_do_token:
            print("Direct scraping failed, trying Scrape.do...")
            result = self.scrape_with_scrape_do(url, **kwargs)
            print(f"✓ Scrape.do result: {result.status_code}")
            return result

        print("✗ Direct scraping failed and no Scrape.do token configured")
        return result

# Example usage
scraper = RobustScraper(scrape_do_token="your-token")

# Test URLs that commonly return 403
test_urls = [
    "https://httpbin.org/status/403",  # Always returns 403
    "https://httpbin.org/headers",     # Should work with headers
    "https://httpbin.org/user-agent",  # Should work with user agent
]

results = []
for url in test_urls:
    response = scraper.scrape(url)
    if response:
        results.append({
            'url': url,
            'status': response.status_code,
            'length': len(response.text)
        })
    else:
        results.append({'url': url, 'status': 'failed'})

    # Random delay between requests
    time.sleep(random.uniform(1, 2))

# Print results
print("\nScraping Results:")
for result in results:
    print(f"URL: {result['url']}")
    print(f"Status: {result['status']}")
    if 'length' in result:
        print(f"Content Length: {result['length']} chars")
    print("-" * 50)

This scraper automatically tries direct requests first, then falls back to Scrape.do for protected sites.

Troubleshooting Common 403 Issues

“403 Forbidden” with Correct Headers

Problem: Still getting 403 even with proper headers.

Diagnosis:

import requests

response = requests.get("https://example.com", headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
})

print(f"Status: {response.status_code}")
print(f"Headers sent: {response.request.headers}")
print(f"Response headers: {response.headers}")

Solution: Check if the site requires specific headers like Referer or session cookies.
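
A quick follow-up test is to add a Referer header and route the request through a session that has already visited the site (the URLs below are illustrative):

import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Referer': 'https://example.com/',  # Some sites reject requests that lack it
})

# Visit the homepage first so the session picks up any required cookies
session.get("https://example.com/")
response = session.get("https://example.com/some-page")
print(f"Status: {response.status_code}")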

“403 Forbidden” After Several Successful Requests

Problem: First few requests work, then 403s start appearing.

Cause: Rate limiting or IP-based throttling.

Solution:

import requests
import time
import random

def scrape_with_backoff(urls, base_delay=1, max_delay=60):
    delay = base_delay

    for url in urls:
        response = requests.get(url)

        if response.status_code == 403:
            print(f"Rate limited, backing off for {delay}s")
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # Exponential backoff
        else:
            delay = base_delay  # Reset delay on success
            print(f"Success: {response.status_code}")

        time.sleep(random.uniform(0.5, 1.5))  # Random jitter

“403 Forbidden” on API Endpoints

Problem: API returns 403 even with valid authentication.

Diagnosis:

response = requests.get("https://api.example.com/data",
                       headers={'Authorization': 'Bearer your-token'})

if response.status_code == 403:
    try:
        error_details = response.json()
        print(f"API Error: {error_details}")
    except:
        print(f"Raw response: {response.text}")

Common causes (a header check for the rate-limit case is sketched after this list):

  • Expired tokens
  • Insufficient permissions
  • IP whitelist restrictions
  • API rate limits exceeded
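
For the rate-limit case, many APIs expose hints in the response headers (a sketch; the header names below are common conventions, not guarantees for any particular API):

import requests

response = requests.get("https://api.example.com/data",
                        headers={'Authorization': 'Bearer your-token'})

if response.status_code == 403:
    # Header names vary by API; these are common conventions, not guarantees
    for header in ('Retry-After', 'X-RateLimit-Remaining', 'X-RateLimit-Reset'):
        if header in response.headers:
            print(f"{header}: {response.headers[header]}")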

“403 Forbidden” with JavaScript-Heavy Sites

Problem: Site loads fine in browser but returns 403 in Python.

Solution: The site requires JavaScript execution for authentication.

# Use Scrape.do with rendering enabled
import requests
import urllib.parse

token = "your-scrape-do-token"
url = "https://js-protected-site.com"

api_url = f"http://api.scrape.do/?token={token}&url={urllib.parse.quote_plus(url)}&render=true&super=true"
response = requests.get(api_url)

print(f"Status: {response.status_code}")
print(f"Content length: {len(response.text)}")

The render=true parameter executes JavaScript and handles browser-like authentication flows.

When to Use Each Solution

Scenario                 Best Solution                 Why
Simple header blocking   Browser headers + session     Fast and lightweight
IP-based blocking        Scrape.do with super=true     Residential proxy rotation
Rate limiting            Request delays + backoff      Respects site limits
Session requirements     requests.Session()            Maintains cookies automatically
JavaScript challenges    Scrape.do with render=true    Full browser environment
WAF protection           Scrape.do with super=true     Professional WAF bypass
Production scraping      Scrape.do                     Handles all protection layers

Conclusion

403 Forbidden errors are a signal of protection, not a dead end.

Modern sites use multiple layers of bot detection: header analysis, IP reputation, behavioral patterns, and JavaScript challenges. Your scraper needs to handle all of them.

With browser-like headers, smart session handling, request pacing, and when needed, a service like Scrape.do that handles WAF bypass automatically, you can reliably get past 403 errors.

The key is understanding what’s blocking you and applying the right solution for each protection layer.

Get 1000 free credits and start scraping with Scrape.do