
Asynchronous API

Process large-scale scraping jobs with the Async API

What is Asynchronous Scrape.do?

Asynchronous Scrape.do lets you scrape websites asynchronously: you submit one or more target URLs as a job, and the requests are processed in parallel rather than one at a time. This is particularly useful for:

  • Scraping websites with large amounts of content
  • Processing batch scraping operations efficiently
  • Handling slow-loading websites without blocking your application
  • Managing large-scale data extraction projects

Instead of waiting for each request to complete, you create a job, receive a job ID immediately, and poll for results when ready.


Base URL

https://q.scrape.do:8000

Authentication

All Async API requests require authentication via the X-Token header:

curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN' \
  --header 'Content-Type: application/json'

Creating a Job

POST /api/v1/jobs

Create a new asynchronous scraping job with specified targets and options.

Request Body:

{
  "Targets": ["https://httpbin.co/anything"],
  "Method": "GET",
  "Body": "optional post data",
  "GeoCode": "us",
  "RegionalGeoCode": "europe",
  "Super": true,
  "Headers": {
    "Content-Type": "application/json"
  },
  "ForwardHeaders": false,
  "SessionID": "12345",
  "Device": "desktop",
  "SetCookies": "cookie1=value1; cookie2=value2",
  "Timeout": 30000,
  "RetryTimeout": 5000,
  "DisableRetry": false,
  "TransparentResponse": false,
  "DisableRedirection": false,
  "Output": "raw",
  "Render": {
    "BlockResources": false,
    "WaitUntil": "domcontentloaded",
    "CustomWait": 1000,
    "WaitSelector": ".content-loaded",
    "PlayWithBrowser": [
      { "Action": "Click", "Selector": "#button_id" },
      { "Action": "Wait", "Timeout": 5000 }
    ],
    "ReturnJSON": true,
    "ShowWebsocketRequests": false,
    "ShowFrames": false,
    "Screenshot": false,
    "FullScreenshot": false,
    "ParticularScreenshot": "#home"
  },
  "WebhookURL": "https://example.com/callback",
  "WebhookHeaders": {
    "Authorization": "Bearer your-token-here",
    "X-Custom-Header": "custom-value"
  }
}

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| Targets | array | Yes | - | Array of URLs to scrape |
| Method | string | No | GET | HTTP method: GET, POST, PUT, PATCH, HEAD, DELETE |
| Body | string | No | - | HTTP request body for POST requests |
| GeoCode | string | No | - | Country code for geo-targeting |
| RegionalGeoCode | string | No | - | Regional code for geo-targeting |
| Super | boolean | No | false | Use residential/mobile proxies |
| Headers | object | No | - | Custom HTTP headers |
| ForwardHeaders | boolean | No | false | Use only the provided headers (do not merge with Scrape.do headers) |
| SessionID | string | No | - | Sticky session ID to reuse the same IP address |
| Device | string | No | desktop | Device type: desktop, mobile, tablet |
| SetCookies | string | No | - | Cookies to include with the request |
| Timeout | integer | No | 60000 | Total request timeout in milliseconds |
| RetryTimeout | integer | No | 15000 | Retry timeout per request in milliseconds (not applied when Render is used) |
| DisableRetry | boolean | No | false | Disable the automatic retry mechanism |
| TransparentResponse | boolean | No | false | Return the raw target website response |
| DisableRedirection | boolean | No | false | Disable following redirects |
| Output | string | No | raw | Output format: raw or markdown (for LLM use) |
| WebhookURL | string | No | - | Webhook URL to send results to |
| WebhookHeaders | object | No | - | Additional headers to send with the webhook request |
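
For example, a job that sends a POST request to the target and delivers results to a webhook can combine several of the parameters above. This is a minimal sketch: the target URL, webhook endpoint, and tokens are placeholders, and only a handful of parameters are shown.

curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "Targets": ["https://httpbin.co/anything"],
    "Method": "POST",
    "Body": "{\"query\": \"example\"}",
    "Headers": { "Content-Type": "application/json" },
    "WebhookURL": "https://example.com/callback",
    "WebhookHeaders": { "Authorization": "Bearer your-token-here" }
  }'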

Render Object Parameters

When using the Render parameter, you can configure headless browser behavior:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| BlockResources | boolean | true | Block loading of resources (cannot be used with PlayWithBrowser or Screenshot) |
| WaitUntil | string | domcontentloaded | Event to wait for: domcontentloaded, networkidle0, networkidle2 |
| CustomWait | integer | 0 | Custom wait time in milliseconds (0-35000) |
| WaitSelector | string | - | CSS selector to wait for |
| PlayWithBrowser | array | - | Array of browser interaction actions |
| ReturnJSON | boolean | false | Return response as JSON with network requests |
| ShowWebsocketRequests | boolean | false | Include websocket requests in the response |
| ShowFrames | boolean | false | Include iframe content in the response |
| Screenshot | boolean | false | Include a screenshot in the response |
| FullScreenshot | boolean | false | Include a full-page screenshot |
| ParticularScreenshot | string | - | CSS selector for a partial screenshot |
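
As a sketch, a job that renders the page in a headless browser, waits for a selector, and captures a screenshot could combine these options as shown below. The target URL and selector are placeholders; BlockResources is set to false explicitly because it cannot be combined with Screenshot.

curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "Targets": ["https://httpbin.co/anything"],
    "Render": {
      "BlockResources": false,
      "WaitUntil": "networkidle0",
      "WaitSelector": ".content-loaded",
      "CustomWait": 1000,
      "Screenshot": true
    }
  }'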

Response

{
  "JobID": "550e8400-e29b-41d4-a716-446655440000",
  "Message": "Job created successfully",
  "TaskIDs": [
    "660e8400-e29b-41d4-a716-446655440001",
    "660e8400-e29b-41d4-a716-446655440002"
  ]
}
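
The JobID and TaskIDs returned here are what you pass to the endpoints below. As a minimal sketch (assuming jq is installed), you can capture the job ID directly from the create call:

# Create a job and store its JobID for later polling (sketch; assumes jq)
JOB_ID=$(curl --silent --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{"Targets": ["https://httpbin.co/anything"]}' \
  | jq -r '.JobID')
echo "Created job: $JOB_ID"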

Retrieving Job Details

GET /api/v1/jobs/{jobID}

Get details about a specific job, including all associated tasks.
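
Example request (the job ID below is a placeholder):

curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000' \
  --header 'X-Token: YOUR_TOKEN'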

Response:

{
  "JobID": "550e8400-e29b-41d4-a716-446655440000",
  "TaskIDs": [
    "660e8400-e29b-41d4-a716-446655440001"
  ],
  "Status": "success",
  "StartTime": "2024-01-01T10:00:00Z",
  "EndTime": "2024-01-01T10:00:05Z",
  "AcquiredConcurrency": 5,
  "LimitConcurrency": 10,
  "Canceled": false,
  "Tasks": [
    {
      "TaskID": "660e8400-e29b-41d4-a716-446655440001",
      "URL": "https://httpbin.co/anything",
      "Status": "success"
    }
  ]
}

Job Status Values

  • queuing - Job is being prepared
  • queued - Job is in queue waiting to be processed
  • pending - Job is currently being processed
  • rotating - Job is retrying with different proxies
  • success - Job completed successfully
  • error - Job failed
  • canceled - Job was canceled by user

Retrieving Task Details

GET /api/v1/jobs/{jobID}/{taskID}

Get detailed results from a specific task, including the scraped content.
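
Example request (the job ID and task ID below are placeholders):

curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/660e8400-e29b-41d4-a716-446655440001' \
  --header 'X-Token: YOUR_TOKEN'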

Response:

{
  "TaskID": "660e8400-e29b-41d4-a716-446655440001",
  "JobID": "550e8400-e29b-41d4-a716-446655440000",
  "URL": "https://httpbin.co/anything",
  "Status": "success",
  "StartTime": "2024-01-01T10:00:00Z",
  "EndTime": "2024-01-01T10:00:05Z",
  "ExpiresAt": "2024-01-02T10:00:05Z",
  "UpdateTime": "2024-01-01T10:00:05Z",
  "Base64EncodedContent": false,
  "StatusCode": 200,
  "ResponseHeaders": {
    "Content-Type": "text/html"
  },
  "Scrape.do": {
    "Credits-Used": "1",
    "Remaining-Credits": "9999"
  },
  "Content": "<html>...</html>",
  "ErrorMessage": ""
}
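
When Base64EncodedContent is true, the Content field is Base64-encoded rather than plain text (for example, for binary output such as screenshots). As a sketch assuming jq and the standard base64 tool, you could extract and decode it like this (JOB_ID and TASK_ID are placeholders):

# Fetch a task result and decode its Base64-encoded content to a file
curl --silent --location 'https://q.scrape.do:8000/api/v1/jobs/JOB_ID/TASK_ID' \
  --header 'X-Token: YOUR_TOKEN' \
  | jq -r '.Content' | base64 --decode > content.bin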

Listing All Jobs

GET /api/v1/jobs

Get a paginated list of all your jobs.

Query Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page_size | integer | 10 | Number of items per page (max 100) |
| page | integer | 1 | Page number (minimum 1) |
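
Example request listing the first 20 jobs:

curl --location 'https://q.scrape.do:8000/api/v1/jobs?page=1&page_size=20' \
  --header 'X-Token: YOUR_TOKEN'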

Response:

{
  "Jobs": [
    {
      "JobID": "550e8400-e29b-41d4-a716-446655440000",
      "Status": "success",
      "TaskIDs": ["..."],
      "StartTime": "2024-01-01T10:00:00Z",
      "EndTime": "2024-01-01T10:00:05Z"
    }
  ],
  "TotalCount": 150,
  "PageSize": 10,
  "PageNumber": 1,
  "TotalPages": 15
}

Canceling a Job

DELETE /api/v1/jobs/{jobID}

Cancel a running job. Jobs that have already completed or been canceled cannot be canceled again.
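
Example request (the job ID below is a placeholder):

curl --location --request DELETE 'https://q.scrape.do:8000/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000' \
  --header 'X-Token: YOUR_TOKEN'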

Response:

{
  "JobID": "550e8400-e29b-41d4-a716-446655440000",
  "Status": "canceled",
  "Canceled": true
}

Status Codes:

  • 200 - Job canceled successfully
  • 404 - Job not found
  • 406 - Job already completed or canceled

Getting User Information

GET /api/v1/me

Get information about your account, including available concurrency and credits.

Response:

{
  "TotalConcurrency": 10,
  "FreeConcurrency": 7,
  "ActiveJobs": 3,
  "AvaliableCredits": 9999
}

Error Responses

All endpoints may return error responses in the following format:

{
  "Error": "Error message description",
  "Code": 400
}

Common Status Codes:

  • 400 - Invalid request (bad parameters)
  • 401 - Unauthorized (invalid or missing token)
  • 404 - Resource not found
  • 406 - Not acceptable (e.g., trying to cancel completed job)
  • 429 - Too many requests (rate limited)
  • 500 - Internal server error

Complete Example

Here's a complete example workflow:

# 1. Create a job
curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'Content-Type: application/json' \
  --header 'X-Token: YOUR_TOKEN' \
  --data '{
    "Targets": ["https://httpbin.co/anything"],
    "Super": true,
    "GeoCode": "us"
  }'

# Response: {"JobID": "550e8400...", "TaskIDs": ["660e8400..."]}

# 2. Check job status
curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400...' \
  --header 'X-Token: YOUR_TOKEN'

# 3. Get task results
curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400.../660e8400...' \
  --header 'X-Token: YOUR_TOKEN'

# 4. Check your account status
curl --location 'https://q.scrape.do:8000/api/v1/me' \
  --header 'X-Token: YOUR_TOKEN'

Best Practices

  1. Polling: When checking job status, implement exponential backoff to avoid excessive API calls (see the sketch after this list)
  2. Webhooks: For production use, configure WebhookURL to receive results automatically instead of polling
  3. Error Handling: Always check the Status field in task responses and handle errors appropriately
  4. Concurrency: Monitor your FreeConcurrency to ensure you don't exceed your account limits
  5. Task Expiration: Retrieve task results before the ExpiresAt timestamp (results are stored temporarily)
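
As a rough sketch of point 1 (assuming jq is installed and the job ID is a placeholder), the loop below polls a job's status with exponential backoff and stops once the job reaches a terminal state (success, error, or canceled):

# Poll a job with exponential backoff (sketch; assumes jq is installed)
JOB_ID="550e8400-e29b-41d4-a716-446655440000"   # placeholder job ID
DELAY=2
while true; do
  STATUS=$(curl --silent --location "https://q.scrape.do:8000/api/v1/jobs/$JOB_ID" \
    --header 'X-Token: YOUR_TOKEN' | jq -r '.Status')
  echo "Job status: $STATUS"
  case "$STATUS" in
    success|error|canceled) break ;;   # terminal states: stop polling
  esac
  sleep "$DELAY"
  DELAY=$((DELAY * 2))                 # exponential backoff: 2s, 4s, 8s, ...
  [ "$DELAY" -gt 60 ] && DELAY=60      # cap the wait at 60 seconds
done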