Asynchronous API
Process large-scale scraping jobs with the Async API
What is Asynchronous Scrape.do?
Asynchronous Scrape.do lets you run scraping requests as background jobs: you submit one or more target URLs at once, and the requests are processed in parallel. This is particularly useful for:
- Scraping websites with large amounts of content
- Processing batch scraping operations efficiently
- Handling slow-loading websites without blocking your application
- Managing large-scale data extraction projects
Instead of waiting for each request to complete, you create a job, receive a job ID immediately, and poll for results when ready.
Base URL
https://q.scrape.do:8000

Authentication
All Async API requests require authentication via the X-Token header:
```bash
curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
--header 'X-Token: YOUR_TOKEN' \
--header 'Content-Type: application/json'
```

Creating a Job
POST /api/v1/jobs
Create a new asynchronous scraping job with specified targets and options.
Request Body:
```json
{
"Targets": ["https://httpbin.co/anything"],
"Method": "GET",
"Body": "optional post data",
"GeoCode": "us",
"RegionalGeoCode": "europe",
"Super": true,
"Headers": {
"Content-Type": "application/json"
},
"ForwardHeaders": false,
"SessionID": "12345",
"Device": "desktop",
"SetCookies": "cookie1=value1; cookie2=value2",
"Timeout": 30000,
"RetryTimeout": 5000,
"DisableRetry": false,
"TransparentResponse": false,
"DisableRedirection": false,
"Output": "raw",
"Render": {
"BlockResources": false,
"WaitUntil": "domcontentloaded",
"CustomWait": 1000,
"WaitSelector": ".content-loaded",
"PlayWithBrowser": [
{ "Action": "Click", "Selector": "#button_id" },
{ "Action": "Wait", "Timeout": 5000 }
],
"ReturnJSON": true,
"ShowWebsocketRequests": false,
"ShowFrames": false,
"Screenshot": false,
"FullScreenshot": false,
"ParticularScreenshot": "#home"
},
"WebhookURL": "https://example.com/callback",
"WebhookHeaders": {
"Authorization": "Bearer your-token-here",
"X-Custom-Header": "custom-value"
}
}
```

Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| Targets | array | Yes | - | Array of URLs to scrape |
| Method | string | No | GET | HTTP method: GET, POST, PUT, PATCH, HEAD, DELETE |
| Body | string | No | - | HTTP request body for POST, PUT, and PATCH requests |
| GeoCode | string | No | - | Country code for geo-targeting |
| RegionalGeoCode | string | No | - | Regional code for geo-targeting |
| Super | boolean | No | false | Use residential/mobile proxies |
| Headers | object | No | - | Custom HTTP headers |
| ForwardHeaders | boolean | No | false | Use only provided headers (don't merge with Scrape.do headers) |
| SessionID | string | No | - | Sticky session ID to reuse same IP address |
| Device | string | No | desktop | Device type: desktop, mobile, tablet |
| SetCookies | string | No | - | Cookies to include with the request |
| Timeout | integer | No | 60000 | Total request timeout in milliseconds |
| RetryTimeout | integer | No | 15000 | Retry timeout per request in milliseconds (not applied when Render is used) |
| DisableRetry | boolean | No | false | Disable automatic retry mechanism |
| TransparentResponse | boolean | No | false | Return raw target website response |
| DisableRedirection | boolean | No | false | Disable following redirects |
| Output | string | No | raw | Output format: raw or markdown (for LLM use) |
| WebhookURL | string | No | - | Webhook URL to send results to |
| WebhookHeaders | object | No | - | Additional headers to send with webhook request |
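For example, a job that combines Method, Body, and webhook delivery from the table above might look like this (a sketch; the target and callback URLs are placeholders):

```bash
# Sketch: create a POST job whose results are pushed to a webhook.
# The target and webhook URLs below are placeholders.
curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
--header 'X-Token: YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
  "Targets": ["https://httpbin.co/anything"],
  "Method": "POST",
  "Body": "key=value",
  "Headers": { "Content-Type": "application/x-www-form-urlencoded" },
  "WebhookURL": "https://example.com/callback"
}'
```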
Render Object Parameters
When using the Render parameter, you can configure headless browser behavior:
| Parameter | Type | Default | Description |
|---|---|---|---|
| BlockResources | boolean | true | Block loading of resources (can't use with PlayWithBrowser or Screenshot) |
| WaitUntil | string | domcontentloaded | Event to wait for: domcontentloaded, networkidle0, networkidle2 |
| CustomWait | integer | 0 | Custom wait time in milliseconds (0-35000) |
| WaitSelector | string | - | CSS selector to wait for |
| PlayWithBrowser | array | - | Array of browser interaction actions |
| ReturnJSON | boolean | false | Return response as JSON with network requests |
| ShowWebsocketRequests | boolean | false | Include websocket requests in response |
| ShowFrames | boolean | false | Include iframe content in response |
| Screenshot | boolean | false | Include screenshot in response |
| FullScreenshot | boolean | false | Include full page screenshot |
| ParticularScreenshot | string | - | CSS selector for partial screenshot |
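A minimal sketch of a job that renders the page in a headless browser and waits for a selector before returning (the .content-loaded selector is illustrative):

```bash
# Sketch: a rendered job that waits for a CSS selector.
# The ".content-loaded" selector is illustrative.
curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
--header 'X-Token: YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
  "Targets": ["https://httpbin.co/anything"],
  "Render": {
    "BlockResources": false,
    "WaitUntil": "networkidle0",
    "WaitSelector": ".content-loaded"
  }
}'
```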
Response
```json
{
"JobID": "550e8400-e29b-41d4-a716-446655440000",
"Message": "Job created successfully",
"TaskIDs": [
"660e8400-e29b-41d4-a716-446655440001",
"660e8400-e29b-41d4-a716-446655440002"
]
}
```

Retrieving Job Details
GET /api/v1/jobs/{jobID}
Get details about a specific job, including all associated tasks.
Response:
```json
{
"JobID": "550e8400-e29b-41d4-a716-446655440000",
"TaskIDs": [
"660e8400-e29b-41d4-a716-446655440001"
],
"Status": "success",
"StartTime": "2024-01-01T10:00:00Z",
"EndTime": "2024-01-01T10:00:05Z",
"AcquiredConcurrency": 5,
"LimitConcurrency": 10,
"Canceled": false,
"Tasks": [
{
"TaskID": "660e8400-e29b-41d4-a716-446655440001",
"URL": "https://httpbin.co/anything",
"Status": "success"
}
]
}
```

Job Status Values
- queuing - Job is being prepared
- queued - Job is in queue waiting to be processed
- pending - Job is currently being processed
- rotating - Job is retrying with different proxies
- success - Job completed successfully
- error - Job failed
- canceled - Job was canceled by user
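A minimal polling sketch that treats success, error, and canceled as terminal states, with exponential backoff between checks (assumes jq is installed; the job ID is a placeholder):

```bash
# Sketch: poll a job until it reaches a terminal status, backing off
# exponentially between checks. Requires jq; JOB_ID is a placeholder.
JOB_ID="550e8400-e29b-41d4-a716-446655440000"
DELAY=2
while true; do
  STATUS=$(curl -s "https://q.scrape.do:8000/api/v1/jobs/$JOB_ID" \
    --header 'X-Token: YOUR_TOKEN' | jq -r '.Status')
  case "$STATUS" in
    success|error|canceled)
      echo "Job finished: $STATUS"; break ;;
    *)
      echo "Job is $STATUS; retrying in ${DELAY}s"
      sleep "$DELAY"
      DELAY=$((DELAY * 2))
      [ "$DELAY" -gt 60 ] && DELAY=60   # cap the backoff at 60s
      ;;
  esac
done
```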
Retrieving Task Details
GET /api/v1/jobs/{jobID}/{taskID}
Get detailed results from a specific task, including the scraped content.
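A sketch of fetching a task and saving its Content, decoding it first when Base64EncodedContent is true (assumes jq and base64 are available; the IDs are placeholders):

```bash
# Sketch: fetch one task and extract its scraped content.
# IDs are placeholders; requires jq (and base64 for encoded bodies).
JOB_ID="550e8400-e29b-41d4-a716-446655440000"
TASK_ID="660e8400-e29b-41d4-a716-446655440001"
RESPONSE=$(curl -s "https://q.scrape.do:8000/api/v1/jobs/$JOB_ID/$TASK_ID" \
  --header 'X-Token: YOUR_TOKEN')
# If Base64EncodedContent is true, decode before use.
if [ "$(echo "$RESPONSE" | jq -r '.Base64EncodedContent')" = "true" ]; then
  echo "$RESPONSE" | jq -r '.Content' | base64 --decode > content.html
else
  echo "$RESPONSE" | jq -r '.Content' > content.html
fi
```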
Response:
```json
{
"TaskID": "660e8400-e29b-41d4-a716-446655440001",
"JobID": "550e8400-e29b-41d4-a716-446655440000",
"URL": "https://httpbin.co/anything",
"Status": "success",
"StartTime": "2024-01-01T10:00:00Z",
"EndTime": "2024-01-01T10:00:05Z",
"ExpiresAt": "2024-01-02T10:00:05Z",
"UpdateTime": "2024-01-01T10:00:05Z",
"Base64EncodedContent": false,
"StatusCode": 200,
"ResponseHeaders": {
"Content-Type": "text/html"
},
"Scrape.do": {
"Credits-Used": "1",
"Remaining-Credits": "9999"
},
"Content": "<html>...</html>",
"ErrorMessage": ""
}
```

Listing All Jobs
GET /api/v1/jobs
Get a paginated list of all your jobs.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| page_size | integer | 10 | Number of items per page (max 100) |
| page | integer | 1 | Page number (minimum 1) |
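For example, to fetch the second page with 50 jobs per page:

```bash
# Sketch: request page 2 of the job list, 50 items per page.
curl --location 'https://q.scrape.do:8000/api/v1/jobs?page_size=50&page=2' \
--header 'X-Token: YOUR_TOKEN'
```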
Response:
```json
{
"Jobs": [
{
"JobID": "550e8400-e29b-41d4-a716-446655440000",
"Status": "success",
"TaskIDs": ["..."],
"StartTime": "2024-01-01T10:00:00Z",
"EndTime": "2024-01-01T10:00:05Z"
}
],
"TotalCount": 150,
"PageSize": 10,
"PageNumber": 1,
"TotalPages": 15
}
```

Canceling a Job
DELETE /api/v1/jobs/{jobID}
Cancel a running job. Note that jobs that are already completed or canceled cannot be canceled.
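For example (the job ID is a placeholder):

```bash
# Sketch: cancel a running job by its ID.
curl --location --request DELETE \
  'https://q.scrape.do:8000/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000' \
  --header 'X-Token: YOUR_TOKEN'
```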
Response:
```json
{
"JobID": "550e8400-e29b-41d4-a716-446655440000",
"Status": "canceled",
"Canceled": true
}
```

Status Codes:
- 200 - Job canceled successfully
- 404 - Job not found
- 406 - Job already completed or canceled
Getting User Information
GET /api/v1/me
Get information about your account, including available concurrency and credits.
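A sketch that checks FreeConcurrency before submitting a new batch (assumes jq is installed):

```bash
# Sketch: check free concurrency before queuing more work. Requires jq.
FREE=$(curl -s 'https://q.scrape.do:8000/api/v1/me' \
  --header 'X-Token: YOUR_TOKEN' | jq -r '.FreeConcurrency')
if [ "$FREE" -gt 0 ]; then
  echo "Free concurrency available: $FREE"
else
  echo "No free concurrency; wait for active jobs to finish"
fi
```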
Response:
```json
{
"TotalConcurrency": 10,
"FreeConcurrency": 7,
"ActiveJobs": 3,
"AvaliableCredits": 9999
}
```

Error Responses
All endpoints may return error responses in the following format:
```json
{
"Error": "Error message description",
"Code": 400
}
```

Common Status Codes:
- 400 - Invalid request (bad parameters)
- 401 - Unauthorized (invalid or missing token)
- 404 - Resource not found
- 406 - Not acceptable (e.g., trying to cancel completed job)
- 429 - Too many requests (rate limited)
- 500 - Internal server error
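A sketch of basic error handling around a request, retrying once on 429 and surfacing the documented Error field otherwise (assumes jq; the retry delay is arbitrary):

```bash
# Sketch: capture the HTTP status code, retry once on 429 (rate limit),
# and print the Error field from the documented error format otherwise.
HTTP_CODE=$(curl -s -o /tmp/response.json -w '%{http_code}' \
  'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN')
if [ "$HTTP_CODE" = "429" ]; then
  sleep 5   # arbitrary pause before a single retry
  curl -s 'https://q.scrape.do:8000/api/v1/jobs' --header 'X-Token: YOUR_TOKEN'
elif [ "$HTTP_CODE" -ge 400 ]; then
  jq -r '.Error' /tmp/response.json
fi
```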
Complete Example
Here's a complete example workflow:
```bash
# 1. Create a job
curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
--header 'Content-Type: application/json' \
--header 'X-Token: YOUR_TOKEN' \
--data '{
"Targets": ["https://httpbin.co/anything"],
"Super": true,
"GeoCode": "us"
}'
# Response: {"JobID": "550e8400...", "TaskIDs": ["660e8400..."]}
# 2. Check job status
curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400...' \
--header 'X-Token: YOUR_TOKEN'
# 3. Get task results
curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400.../660e8400...' \
--header 'X-Token: YOUR_TOKEN'
# 4. Check your account status
curl --location 'https://q.scrape.do:8000/api/v1/me' \
--header 'X-Token: YOUR_TOKEN'
```

Best Practices
- Polling: When checking job status, implement exponential backoff to avoid excessive API calls
- Webhooks: For production use, configure WebhookURL to receive results automatically instead of polling
- Error Handling: Always check the Status field in task responses and handle errors appropriately
- Concurrency: Monitor your FreeConcurrency to ensure you don't exceed your account limits
- Task Expiration: Retrieve task results before the ExpiresAt timestamp (results are stored temporarily)

