
Asynchronous API

Process large-scale scraping jobs with the Async API

What is Asynchronous Scrape.do?

Asynchronous Scrape.do lets you scrape websites asynchronously: you submit one or more target URLs as a job, and the requests are processed in parallel rather than one at a time. This is particularly useful for:

  • Scraping websites with large amounts of content
  • Processing batch scraping operations efficiently
  • Handling slow-loading websites without blocking your application
  • Managing large-scale data extraction projects

Instead of waiting for each request to complete, you create a job, receive a job ID immediately, and poll for results when ready.


Base URL

https://q.scrape.do:8000

Authentication

All Async API requests require authentication via the X-Token header:

curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN' \
  --header 'Content-Type: application/json'

Creating a Job

POST /api/v1/jobs

Create a new asynchronous scraping job with specified targets and options.

Request Body:

{
  "Targets": ["https://httpbin.co/anything"],
  "Method": "GET",
  "Body": "optional post data",
  "GeoCode": "us",
  "RegionalGeoCode": "europe",
  "Super": true,
  "Headers": {
    "Content-Type": "application/json"
  },
  "ForwardHeaders": false,
  "SessionID": "12345",
  "Device": "desktop",
  "SetCookies": "cookie1=value1; cookie2=value2",
  "Timeout": 30000,
  "RetryTimeout": 5000,
  "DisableRetry": false,
  "TransparentResponse": false,
  "DisableRedirection": false,
  "Output": "raw",
  "Render": {
    "BlockResources": false,
    "WaitUntil": "domcontentloaded",
    "CustomWait": 1000,
    "WaitSelector": ".content-loaded",
    "PlayWithBrowser": [
      { "Action": "Click", "Selector": "#button_id" },
      { "Action": "Wait", "Timeout": 5000 }
    ],
    "ReturnJSON": true,
    "ShowWebsocketRequests": false,
    "ShowFrames": false,
    "Screenshot": false,
    "FullScreenshot": false,
    "ParticularScreenshot": "#home"
  },
  "WebhookURL": "https://example.com/callback",
  "WebhookHeaders": {
    "Authorization": "Bearer your-token-here",
    "X-Custom-Header": "custom-value"
  }
}

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| Targets | array | Yes | - | Array of URLs to scrape |
| Method | string | No | GET | HTTP method: GET, POST, PUT, PATCH, HEAD, DELETE |
| Body | string | No | - | HTTP request body for POST requests |
| GeoCode | string | No | - | Country code for geo-targeting |
| RegionalGeoCode | string | No | - | Regional code for geo-targeting |
| Super | boolean | No | false | Use residential/mobile proxies |
| Headers | object | No | - | Custom HTTP headers |
| ForwardHeaders | boolean | No | false | Use only the provided headers (do not merge with Scrape.do headers) |
| SessionID | string | No | - | Sticky session ID to reuse the same IP address |
| Device | string | No | desktop | Device type: desktop, mobile, tablet |
| SetCookies | string | No | - | Cookies to include with the request |
| Timeout | integer | No | 60000 | Total request timeout in milliseconds |
| RetryTimeout | integer | No | 15000 | Retry timeout per request in milliseconds (not applied when Render is used) |
| DisableRetry | boolean | No | false | Disable the automatic retry mechanism |
| TransparentResponse | boolean | No | false | Return the raw target website response |
| DisableRedirection | boolean | No | false | Disable following redirects |
| Output | string | No | raw | Output format: raw or markdown (for LLM use) |
| WebhookURL | string | No | - | Webhook URL to send results to |
| WebhookHeaders | object | No | - | Additional headers to send with the webhook request |
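
For example, a job that sends a POST request to the target and delivers results to a webhook can combine several of the parameters above. This is a minimal sketch: the target URL, webhook endpoint, and tokens are placeholders, and only a handful of parameters are shown.

curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "Targets": ["https://httpbin.co/anything"],
    "Method": "POST",
    "Body": "{\"query\": \"example\"}",
    "Headers": { "Content-Type": "application/json" },
    "WebhookURL": "https://example.com/callback",
    "WebhookHeaders": { "Authorization": "Bearer your-token-here" }
  }'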

Render Object Parameters

When using the Render parameter, you can configure headless browser behavior:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| BlockResources | boolean | true | Block loading of resources (cannot be used with PlayWithBrowser or Screenshot) |
| WaitUntil | string | domcontentloaded | Event to wait for: domcontentloaded, networkidle0, networkidle2 |
| CustomWait | integer | 0 | Custom wait time in milliseconds (0-35000) |
| WaitSelector | string | - | CSS selector to wait for |
| PlayWithBrowser | array | - | Array of browser interaction actions |
| ReturnJSON | boolean | false | Return response as JSON with network requests |
| ShowWebsocketRequests | boolean | false | Include websocket requests in the response |
| ShowFrames | boolean | false | Include iframe content in the response |
| Screenshot | boolean | false | Include a screenshot in the response |
| FullScreenshot | boolean | false | Include a full-page screenshot |
| ParticularScreenshot | string | - | CSS selector for a partial screenshot |
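
As a sketch, a job that renders the page in a headless browser, waits for a selector, and captures a screenshot could combine these options as shown below. The target URL and selector are placeholders; BlockResources is set to false explicitly because it cannot be combined with Screenshot.

curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "Targets": ["https://httpbin.co/anything"],
    "Render": {
      "BlockResources": false,
      "WaitUntil": "networkidle0",
      "WaitSelector": ".content-loaded",
      "CustomWait": 1000,
      "Screenshot": true
    }
  }'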

Response

{
  "JobID": "550e8400-e29b-41d4-a716-446655440000",
  "Message": "Job created successfully",
  "TaskIDs": [
    "660e8400-e29b-41d4-a716-446655440001",
    "660e8400-e29b-41d4-a716-446655440002"
  ]
}
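
The JobID and TaskIDs returned here are what you pass to the endpoints below. As a minimal sketch (assuming jq is installed), you can capture the job ID directly from the create call:

# Create a job and store its JobID for later polling (sketch; assumes jq)
JOB_ID=$(curl --silent --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'X-Token: YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{"Targets": ["https://httpbin.co/anything"]}' \
  | jq -r '.JobID')
echo "Created job: $JOB_ID"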

Retrieving Job Details

GET /api/v1/jobs/{jobID}

Get details about a specific job, including all associated tasks.
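
Example request (the job ID below is a placeholder):

curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000' \
  --header 'X-Token: YOUR_TOKEN'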

Response:

{
  "JobID": "550e8400-e29b-41d4-a716-446655440000",
  "TaskIDs": [
    "660e8400-e29b-41d4-a716-446655440001"
  ],
  "Status": "success",
  "StartTime": "2024-01-01T10:00:00Z",
  "EndTime": "2024-01-01T10:00:05Z",
  "AcquiredConcurrency": 5,
  "LimitConcurrency": 10,
  "Canceled": false,
  "Tasks": [
    {
      "TaskID": "660e8400-e29b-41d4-a716-446655440001",
      "URL": "https://httpbin.co/anything",
      "Status": "success"
    }
  ]
}

Job Status Values

  • queuing - Job is being prepared
  • queued - Job is in queue waiting to be processed
  • pending - Job is currently being processed
  • rotating - Job is retrying with different proxies
  • success - Job completed successfully
  • error - Job failed
  • canceled - Job was canceled by user

Retrieving Task Details

GET /api/v1/jobs/{jobID}/{taskID}

Get detailed results from a specific task, including the scraped content.
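
Example request (the job ID and task ID below are placeholders):

curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/660e8400-e29b-41d4-a716-446655440001' \
  --header 'X-Token: YOUR_TOKEN'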

Response:

{
  "TaskID": "660e8400-e29b-41d4-a716-446655440001",
  "JobID": "550e8400-e29b-41d4-a716-446655440000",
  "URL": "https://httpbin.co/anything",
  "Status": "success",
  "StartTime": "2024-01-01T10:00:00Z",
  "EndTime": "2024-01-01T10:00:05Z",
  "ExpiresAt": "2024-01-02T10:00:05Z",
  "UpdateTime": "2024-01-01T10:00:05Z",
  "Base64EncodedContent": false,
  "StatusCode": 200,
  "ResponseHeaders": {
    "Content-Type": "text/html"
  },
  "Scrape.do": {
    "Credits-Used": "1",
    "Remaining-Credits": "9999"
  },
  "Content": "<html>...</html>",
  "ErrorMessage": ""
}
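
When Base64EncodedContent is true, the Content field is Base64-encoded rather than plain text (for example, for binary output such as screenshots). As a sketch assuming jq and the standard base64 tool, you could extract and decode it like this (JOB_ID and TASK_ID are placeholders):

# Fetch a task result and decode its Base64-encoded content to a file
curl --silent --location 'https://q.scrape.do:8000/api/v1/jobs/JOB_ID/TASK_ID' \
  --header 'X-Token: YOUR_TOKEN' \
  | jq -r '.Content' | base64 --decode > content.bin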

Listing All Jobs

GET /api/v1/jobs

Get a paginated list of all your jobs.

Query Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page_size | integer | 10 | Number of items per page (max 100) |
| page | integer | 1 | Page number (minimum 1) |
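
Example request listing the first 20 jobs:

curl --location 'https://q.scrape.do:8000/api/v1/jobs?page=1&page_size=20' \
  --header 'X-Token: YOUR_TOKEN'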

Response:

{
  "Jobs": [
    {
      "JobID": "550e8400-e29b-41d4-a716-446655440000",
      "Status": "success",
      "TaskIDs": ["..."],
      "StartTime": "2024-01-01T10:00:00Z",
      "EndTime": "2024-01-01T10:00:05Z"
    }
  ],
  "TotalCount": 150,
  "PageSize": 10,
  "PageNumber": 1,
  "TotalPages": 15
}

Canceling a Job

DELETE /api/v1/jobs/{jobID}

Cancel a running job. Jobs that have already completed or been canceled cannot be canceled again.
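
Example request (the job ID below is a placeholder):

curl --location --request DELETE 'https://q.scrape.do:8000/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000' \
  --header 'X-Token: YOUR_TOKEN'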

Response:

{
  "JobID": "550e8400-e29b-41d4-a716-446655440000",
  "Status": "canceled",
  "Canceled": true
}

Status Codes:

  • 200 - Job canceled successfully
  • 404 - Job not found
  • 406 - Job already completed or canceled

Getting User Information

GET /api/v1/me

Get information about your account, including available concurrency and credits.

Response:

{
  "TotalConcurrency": 10,
  "FreeConcurrency": 7,
  "ActiveJobs": 3,
  "AvaliableCredits": 9999
}

Error Responses

All endpoints may return error responses in the following format:

{
  "Error": "Error message description",
  "Code": 400
}

Common Status Codes:

  • 400 - Invalid request (bad parameters)
  • 401 - Unauthorized (invalid or missing token)
  • 404 - Resource not found
  • 406 - Not acceptable (e.g., trying to cancel completed job)
  • 429 - Too many requests (rate limited)
  • 500 - Internal server error

Complete Example

Here's a complete example workflow:

# 1. Create a job
curl --location 'https://q.scrape.do:8000/api/v1/jobs' \
  --header 'Content-Type: application/json' \
  --header 'X-Token: YOUR_TOKEN' \
  --data '{
    "Targets": ["https://httpbin.co/anything"],
    "Super": true,
    "GeoCode": "us"
  }'

# Response: {"JobID": "550e8400...", "TaskIDs": ["660e8400..."]}

# 2. Check job status
curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400...' \
  --header 'X-Token: YOUR_TOKEN'

# 3. Get task results
curl --location 'https://q.scrape.do:8000/api/v1/jobs/550e8400.../660e8400...' \
  --header 'X-Token: YOUR_TOKEN'

# 4. Check your account status
curl --location 'https://q.scrape.do:8000/api/v1/me' \
  --header 'X-Token: YOUR_TOKEN'

Best Practices

  1. Polling: When checking job status, implement exponential backoff to avoid excessive API calls (see the sketch after this list)
  2. Webhooks: For production use, configure WebhookURL to receive results automatically instead of polling
  3. Error Handling: Always check the Status field in task responses and handle errors appropriately
  4. Concurrency: Monitor your FreeConcurrency to ensure you don't exceed your account limits
  5. Task Expiration: Retrieve task results before the ExpiresAt timestamp (results are stored temporarily)
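
As a rough sketch of point 1 (assuming jq is installed and the job ID is a placeholder), the loop below polls a job's status with exponential backoff and stops once the job reaches a terminal state (success, error, or canceled):

# Poll a job with exponential backoff (sketch; assumes jq is installed)
JOB_ID="550e8400-e29b-41d4-a716-446655440000"   # placeholder job ID
DELAY=2
while true; do
  STATUS=$(curl --silent --location "https://q.scrape.do:8000/api/v1/jobs/$JOB_ID" \
    --header 'X-Token: YOUR_TOKEN' | jq -r '.Status')
  echo "Job status: $STATUS"
  case "$STATUS" in
    success|error|canceled) break ;;   # terminal states: stop polling
  esac
  sleep "$DELAY"
  DELAY=$((DELAY * 2))                 # exponential backoff: 2s, 4s, 8s, ...
  [ "$DELAY" -gt 60 ] && DELAY=60      # cap the wait at 60 seconds
done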