Here's Why You're Getting 403 in Python Requests and How to Fix It
Python requests throwing 403 Forbidden errors isn’t just annoying. It’s a scraper killer.
If you’re automating data collection, hitting APIs, or scraping sites in 2025, you’ve already hit this wall: instant blocks, mysterious rejections, and that dreaded 403 Forbidden response that stops your code dead.
You need solutions that actually work. Not theories. Not decade-old Stack Overflow answers. Not random header combinations that fail on the third request.
This guide breaks down the real causes and gives you working fixes with code that runs today.
Why 403 Forbidden Breaks Your Scraper (And Why It’s Not Going Away)
403 Forbidden means the server understood your request perfectly but refused to authorize it. Unlike 401 Unauthorized, this isn’t about missing authentication—it’s about being rejected despite the server knowing exactly what you want.
Think of it like reaching a club door: the bouncer sees you’re a person but still refuses entry based on other factors.
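A quick way to tell the two apart in code; a minimal sketch (the URL is a placeholder):
import requests

response = requests.get("https://example.com/some-page")  # placeholder URL

if response.status_code == 401:
    # 401: the server wants credentials you did not send
    print("Unauthorized: add or fix authentication")
elif response.status_code == 403:
    # 403: the server understood the request and still refused it
    print("Forbidden: you are being rejected, not just unauthenticated")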
The stakes are real. When your scraper hits 403s, you lose:
- Data collection pipelines that feed business intelligence
- Price monitoring systems that track competitor moves
- Market research automation that finds opportunities
- API integrations that power your applications
Here’s what’s actually blocking you and how to fix each one.
The Real Blockers Behind 403 Errors
Missing or Wrong Headers
Sites expect browser-like headers: User-Agent, Accept-Language, Referer, and others that signal “real user.”
Bare-bones requests are flagged instantly because they look nothing like actual browser traffic.
The problem:
import requests

# Bare requests like this look nothing like a browser and get flagged
response = requests.get("https://httpbin.org/user-agent")
print(response.status_code)  # On protected sites, a request like this often returns 403
The fix:
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
}
response = requests.get("https://httpbin.org/user-agent", headers=headers)
print(response.status_code) # Returns 200
print(response.json()) # Shows your user agent was accepted
Result: Status Code 200 with clean response data.
IP-Based Blocking
Many sites block entire IP ranges, especially datacenter IPs that are commonly used by bots and scrapers.
Residential IPs get through because they look like real users browsing from home.
The problem:
# Your datacenter IP gets flagged
response = requests.get("https://httpbin.org/ip")
# Site sees: "52.91.45.123" (AWS datacenter) → Block
The fix with Scrape.do:
import requests
import urllib.parse
token = "your-scrape-do-token"
target_url = "https://httpbin.org/ip"
encoded_url = urllib.parse.quote_plus(target_url)
# Route through residential proxies automatically
api_url = f"http://api.scrape.do/?token={token}&url={encoded_url}&super=true&geoCode=us"
response = requests.get(api_url)
print(response.json()) # Shows residential IP like "98.142.34.67"
Result: Site sees residential IP instead of datacenter → Allow.
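If you already control a residential or mobile proxy of your own instead of going through an API, requests can route traffic through it directly via the proxies argument; a minimal sketch (the proxy endpoint and credentials are placeholders):
import requests

# Placeholder endpoint for a residential proxy you have access to
proxy = "http://username:password@residential-proxy.example.com:8000"
proxies = {"http": proxy, "https": proxy}

response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10)
print(response.json())  # Should show the proxy's IP, not your datacenter IP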
Session and Cookie Requirements
Many sites require session cookies, CSRF tokens, or authentication cookies to be present on every request.
Ignoring cookie flows leads to instant 403s.
The problem:
# Direct request without session context
response = requests.get("https://example.com/protected-page")
# Returns 403 because no session cookies
The fix with session handling:
import requests
# Use session to persist cookies automatically
session = requests.Session()
# First request establishes session
session.get("https://example.com/login-page")
# Subsequent requests carry session cookies
response = session.get("https://example.com/protected-page")
print(response.status_code) # Returns 200 with session active
Result: Session cookies automatically included, access granted.
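Some sites also embed a CSRF token in the page that has to be sent back with the next request. A rough sketch of that flow, assuming a hidden input named csrf_token (the field name, form fields, and URLs are hypothetical):
import re
import requests

session = requests.Session()

# Load the form page first; the session stores any cookies the site sets
page = session.get("https://example.com/login-page")

# Pull the token out of the HTML (field name is an assumption for this example)
match = re.search(r'name="csrf_token"\s+value="([^"]+)"', page.text)
token = match.group(1) if match else None

# Send the token back along with the session cookies
response = session.post(
    "https://example.com/login",
    data={"csrf_token": token, "username": "user", "password": "pass"},
)
print(response.status_code)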
WAFs and Bot Protection Systems
Cloudflare, DataDome, Akamai, and other Web Application Firewalls analyze:
- TLS fingerprints - How your client negotiates SSL
- Request patterns - Timing, frequency, behavior
- JavaScript execution - Browser environment checks
- Header consistency - Whether headers match real browsers
Plain Python requests fails these checks because it doesn’t behave like a real browser.
The problem:
# Gets blocked by Cloudflare
response = requests.get("https://protected-site.com")
# Returns challenge page or 403
The fix using Scrape.do:
import requests
import urllib.parse
token = "your-scrape-do-token"
target_url = "https://protected-site.com"
encoded_url = urllib.parse.quote_plus(target_url)
# Scrape.do handles WAF bypass automatically
api_url = f"http://api.scrape.do/?token={token}&url={encoded_url}&super=true"
response = requests.get(api_url)
print(response.status_code) # Returns 200
print("Success! WAF bypassed")
Result: Clean HTML content with no challenge pages or blocks.
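If you are not sure which of these systems is doing the blocking, the response headers of the rejected request usually offer a clue; a rough heuristic sketch (the URL is a placeholder, and only Cloudflare markers are checked explicitly here):
import requests

response = requests.get("https://protected-site.com")  # placeholder URL
resp_headers = {k.lower(): v for k, v in response.headers.items()}

# Cloudflare responses typically carry a cf-ray header and "cloudflare" as the Server value
if "cf-ray" in resp_headers or "cloudflare" in resp_headers.get("server", "").lower():
    print("Looks like Cloudflare is in front of this site")
else:
    # Otherwise the Server header and any cookies set on the block page are the best clues
    print(f"Blocked with {response.status_code}, server: {resp_headers.get('server')}")
    print(f"Cookies set: {list(response.cookies.keys())}")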
Working Solutions for Each 403 Scenario
Build Realistic Browser Headers
Send headers that match real browser patterns, not generic requests defaults.
import requests
from fake_useragent import UserAgent
def get_browser_headers():
    ua = UserAgent()
    return {
        'User-Agent': ua.random,
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Cache-Control': 'max-age=0',
    }
# Test with realistic headers
headers = get_browser_headers()
response = requests.get("https://httpbin.org/headers", headers=headers)
print(f"Status: {response.status_code}")
print(f"Headers accepted: {response.json()['headers']['User-Agent']}")
This generates realistic browser headers that pass basic bot detection.
Handle Sessions and Cookies Properly
Use requests.Session() to maintain state across requests and handle cookie requirements.
import requests
def scrape_with_session(base_url):
    session = requests.Session()

    # Set realistic headers for the session
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    })

    # Visit homepage first to get session cookies
    homepage = session.get(f"{base_url}/")
    print(f"Homepage status: {homepage.status_code}")
    print(f"Cookies received: {len(session.cookies)}")

    # Now access protected content with session
    protected = session.get(f"{base_url}/protected")
    print(f"Protected page status: {protected.status_code}")

    return protected
# Example usage
result = scrape_with_session("https://httpbin.org")
Sessions automatically handle cookies, maintaining state between requests.
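If the site expects a cookie you obtained somewhere else, for example copied from a logged-in browser, you can also seed the session manually; a minimal sketch (the cookie name and value are placeholders):
import requests

session = requests.Session()

# Placeholder cookie copied from a real browser session
session.cookies.set("sessionid", "paste-your-cookie-value-here", domain="example.com")

response = session.get("https://example.com/protected-page")
print(response.status_code)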
Add Request Delays and Randomization
Avoid triggering rate limits by spacing requests and varying timing patterns.
import requests
import time
import random
def scrape_with_delays(urls):
    session = requests.Session()
    results = []

    for i, url in enumerate(urls):
        # Random delay between 1-3 seconds
        delay = random.uniform(1.0, 3.0)
        print(f"Request {i+1}/{len(urls)}, waiting {delay:.1f}s...")
        time.sleep(delay)

        try:
            response = session.get(url, timeout=10)
            results.append({
                'url': url,
                'status': response.status_code,
                'success': response.status_code == 200
            })
            print(f"✓ {url}: {response.status_code}")
        except requests.RequestException as e:
            print(f"✗ {url}: {e}")
            results.append({'url': url, 'status': 'error', 'success': False})

    return results
# Test with multiple URLs
urls = [
"https://httpbin.org/delay/1",
"https://httpbin.org/status/200",
"https://httpbin.org/json"
]
results = scrape_with_delays(urls)
success_rate = sum(1 for r in results if r['success']) / len(results)
print(f"Success rate: {success_rate:.1%}")
This approach mimics human browsing patterns and avoids rate limiting.
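If you'd rather not hand-roll the delay loop, requests can also delegate retries to urllib3's Retry class mounted on a session; a minimal sketch that retries 429 and 503 responses with growing waits (retrying a hard 403 from the same IP rarely helps, so it's left out here):
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

# Retry transient rejections (429, 503) with exponentially growing waits
retry = Retry(total=3, backoff_factor=1, status_forcelist=[429, 503])
adapter = HTTPAdapter(max_retries=retry)
session.mount("https://", adapter)
session.mount("http://", adapter)

response = session.get("https://httpbin.org/status/200", timeout=10)
print(response.status_code)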
Use Scrape.do for Complete WAF Bypass
For production-level scraping, use a service that handles all protection layers automatically.
import requests
import time
import urllib.parse

class ScrapeDOClient:
    def __init__(self, token):
        self.token = token
        self.base_url = "http://api.scrape.do/"

    def scrape(self, url, **params):
        # Default parameters for best success rate
        default_params = {
            'token': self.token,
            'url': urllib.parse.quote_plus(url),
            'super': 'true',    # Premium proxy rotation
            'render': 'false',  # Set to 'true' for JS-heavy sites
        }
        # Merge with custom parameters
        default_params.update(params)
        # Build API URL
        param_string = '&'.join([f"{k}={v}" for k, v in default_params.items()])
        api_url = f"{self.base_url}?{param_string}"
        response = requests.get(api_url)
        return response

    def scrape_multiple(self, urls, delay=1):
        results = []
        for i, url in enumerate(urls):
            print(f"Scraping {i+1}/{len(urls)}: {url}")
            try:
                response = self.scrape(url)
                results.append({
                    'url': url,
                    'status_code': response.status_code,
                    'content_length': len(response.text),
                    'success': response.status_code == 200
                })
                print(f"✓ Status: {response.status_code}, Length: {len(response.text)}")
            except Exception as e:
                print(f"✗ Error: {e}")
                results.append({'url': url, 'success': False, 'error': str(e)})
            time.sleep(delay)
        return results

# Example usage
client = ScrapeDOClient("your-token-here")

# Single request
response = client.scrape("https://protected-site.com")
print(f"Status: {response.status_code}")

# Multiple requests
urls = ["https://site1.com", "https://site2.com", "https://site3.com"]
results = client.scrape_multiple(urls)
success_rate = sum(1 for r in results if r.get('success')) / len(results)
print(f"Overall success rate: {success_rate:.1%}")
Scrape.do handles proxy rotation, header spoofing, and WAF bypass automatically.
Complete Working Example: 403-Proof Scraper
Here’s a production-ready scraper that combines all techniques:
import requests
import time
import random
import urllib.parse
from fake_useragent import UserAgent

class RobustScraper:
    def __init__(self, scrape_do_token=None):
        self.session = requests.Session()
        self.scrape_do_token = scrape_do_token
        self.ua = UserAgent()
        self.setup_session()

    def setup_session(self):
        """Configure session with realistic headers"""
        self.session.headers.update({
            'User-Agent': self.ua.random,
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
        })

    def scrape_direct(self, url, retries=3):
        """Direct scraping with session and headers"""
        for attempt in range(retries):
            try:
                response = self.session.get(url, timeout=10)
                if response.status_code == 200:
                    return response
                elif response.status_code == 403:
                    print(f"403 error on attempt {attempt + 1}")
                    if attempt < retries - 1:
                        time.sleep(random.uniform(2, 5))
                        continue
            except requests.RequestException as e:
                print(f"Request error on attempt {attempt + 1}: {e}")
                if attempt < retries - 1:
                    time.sleep(random.uniform(1, 3))
        return None

    def scrape_with_scrape_do(self, url, **params):
        """Fallback to Scrape.do for protected sites"""
        if not self.scrape_do_token:
            raise ValueError("Scrape.do token required for protected sites")
        default_params = {
            'token': self.scrape_do_token,
            'url': urllib.parse.quote_plus(url),
            'super': 'true',
        }
        default_params.update(params)
        param_string = '&'.join([f"{k}={v}" for k, v in default_params.items()])
        api_url = f"http://api.scrape.do/?{param_string}"
        response = requests.get(api_url)
        return response

    def scrape(self, url, use_scrape_do=False, **kwargs):
        """Main scraping method with automatic fallback"""
        print(f"Scraping: {url}")
        if not use_scrape_do and self.scrape_do_token:
            # Try direct first
            result = self.scrape_direct(url)
            if result and result.status_code == 200:
                print(f"✓ Direct scraping successful: {result.status_code}")
                return result
            else:
                print("Direct scraping failed, trying Scrape.do...")
                result = self.scrape_with_scrape_do(url, **kwargs)
                print(f"✓ Scrape.do result: {result.status_code}")
                return result
        else:
            # Use Scrape.do directly
            result = self.scrape_with_scrape_do(url, **kwargs)
            print(f"✓ Scrape.do result: {result.status_code}")
            return result
# Example usage
scraper = RobustScraper(scrape_do_token="your-token")
# Test URLs that commonly return 403
test_urls = [
    "https://httpbin.org/status/403",  # Always returns 403
    "https://httpbin.org/headers",     # Should work with headers
    "https://httpbin.org/user-agent",  # Should work with user agent
]

results = []
for url in test_urls:
    response = scraper.scrape(url)
    if response:
        results.append({
            'url': url,
            'status': response.status_code,
            'length': len(response.text)
        })
    else:
        results.append({'url': url, 'status': 'failed'})
    # Random delay between requests
    time.sleep(random.uniform(1, 2))

# Print results
print("\nScraping Results:")
for result in results:
    print(f"URL: {result['url']}")
    print(f"Status: {result['status']}")
    if 'length' in result:
        print(f"Content Length: {result['length']} chars")
    print("-" * 50)
This scraper automatically tries direct requests first, then falls back to Scrape.do for protected sites.
Troubleshooting Common 403 Issues
“403 Forbidden” with Correct Headers
Problem: Still getting 403 even with proper headers.
Diagnosis:
import requests
response = requests.get("https://example.com", headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
})
print(f"Status: {response.status_code}")
print(f"Headers sent: {response.request.headers}")
print(f"Response headers: {response.headers}")
Solution: Check if the site requires specific headers like Referer or session cookies.
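For example, some sites only serve pages that look like they were reached from the site itself, so adding a Referer is worth a try; a minimal sketch (the URLs are placeholders):
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Referer': 'https://example.com/',  # pretend we navigated from the homepage
}
response = requests.get("https://example.com/some-page", headers=headers)
print(response.status_code)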
“403 Forbidden” After Several Successful Requests
Problem: First few requests work, then 403s start appearing.
Cause: Rate limiting or IP-based throttling.
Solution:
import requests
import time
import random

def scrape_with_backoff(urls, base_delay=1, max_delay=60):
    delay = base_delay
    for url in urls:
        response = requests.get(url)
        if response.status_code == 403:
            print(f"Rate limited, backing off for {delay}s")
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # Exponential backoff
        else:
            delay = base_delay  # Reset delay on success
            print(f"Success: {response.status_code}")
        time.sleep(random.uniform(0.5, 1.5))  # Random jitter between requests
“403 Forbidden” on API Endpoints
Problem: API returns 403 even with valid authentication.
Diagnosis:
import requests

response = requests.get(
    "https://api.example.com/data",
    headers={'Authorization': 'Bearer your-token'}
)
if response.status_code == 403:
    try:
        error_details = response.json()
        print(f"API Error: {error_details}")
    except ValueError:  # Body is not valid JSON
        print(f"Raw response: {response.text}")
Common causes:
- Expired tokens
- Insufficient permissions
- IP whitelist restrictions
- API rate limits exceeded
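Several of these causes show up in the response itself; a rough sketch that prints common diagnostic headers (exact header names vary by API, so treat them as assumptions):
import requests

response = requests.get(
    "https://api.example.com/data",  # placeholder endpoint
    headers={'Authorization': 'Bearer your-token'},
)

if response.status_code == 403:
    # Hints about rejected or expired credentials
    print("WWW-Authenticate:", response.headers.get("WWW-Authenticate"))
    # Common (but not universal) rate-limit headers
    print("Retry-After:", response.headers.get("Retry-After"))
    print("X-RateLimit-Remaining:", response.headers.get("X-RateLimit-Remaining"))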
“403 Forbidden” with JavaScript-Heavy Sites
Problem: Site loads fine in browser but returns 403 in Python.
Cause: The site requires JavaScript execution before it serves the real content.
Solution: Render the page in a browser-like environment, for example via Scrape.do's render parameter.
# Use Scrape.do with rendering enabled
import requests
import urllib.parse
token = "your-scrape-do-token"
url = "https://js-protected-site.com"
api_url = f"http://api.scrape.do/?token={token}&url={urllib.parse.quote_plus(url)}&render=true&super=true"
response = requests.get(api_url)
print(f"Status: {response.status_code}")
print(f"Content length: {len(response.text)}")
The render=true parameter executes JavaScript and handles browser-like authentication flows.
When to Use Each Solution
| Scenario | Best Solution | Why |
|---|---|---|
| Simple header blocking | Browser headers + session | Fast and lightweight |
| IP-based blocking | Scrape.do with super=true | Residential proxy rotation |
| Rate limiting | Request delays + backoff | Respects site limits |
| Session requirements | requests.Session() | Maintains cookies automatically |
| JavaScript challenges | Scrape.do with render=true | Full browser environment |
| WAF protection | Scrape.do super=true | Professional WAF bypass |
| Production scraping | Scrape.do | Handles all protection layers |
Conclusion
403 Forbidden errors are a signal of protection, not a dead end.
Modern sites use multiple layers of bot detection: header analysis, IP reputation, behavioral patterns, and JavaScript challenges. Your scraper needs to handle all of them.
With browser-like headers, smart session handling, and request pacing, plus a service like Scrape.do to handle WAF bypass when you need it, you can reliably get past 403 errors.
The key is understanding what’s blocking you and applying the right solution for each protection layer.