Category:Scraping Use Cases

How to Scrape Realtor.com: Property Listings, Prices, and Agent Data

10 Mins Read

Created Date: May 22, 2026

Updated Date: May 22, 2026

Serhat Kurtulus

Software Engineer

Realtor.com is a high-signal source for US real estate data: active inventory, price changes, agent attribution, and the history that makes comps and market tracking possible. The platform has also expanded into AI-powered home search and launched a ChatGPT plugin for querying listings conversationally.

The catch is access. Plain requests hits 429s fast, and the HTML we get back is mostly a shell with none of the card data we expected. That is not a dead end. It is a hint that the real payload is already structured.

We will route requests through Scrape.do to stay unblocked, extract search listings from embedded JSON, then enrich a single listing via GraphQL into clean artifacts: a CSV of search results and a JSON record with agent, brokerage, history, tax data, and view and save windows.

The win condition is simple: the site stops being “a page” and becomes rows.

Full working code on GitHub ⚙

Get 1000 free credits and start scraping with Scrape.do

Scraping Realtor.com Search Results

The first obstacle is not selectors. It is access. Realtor.com will happily serve an HTML shell while rate limiting the requests that look automated.

Prerequisites

Install the only libraries we need to scrape search results pages:

pip install requests beautifulsoup4

Set a Scrape.do token in TOKEN and start with a single search URL. Manhattan is our test bench. Once we see rows, we can swap the URL to any other city.

Extracting Listings from Embedded JSON

Here is the constraint: the “result cards” are not a stable data source in the HTML we receive. Realtor.com ships the listing rows as embedded JSON, and that is what we extract.

We fetch the search page through Scrape.do with super=true to avoid 429s. This is the stable request shape we keep.

import os
import urllib.parse

import requests

TOKEN = "<your_token>"
SEARCH_URL = "https://www.realtor.com/realestateandhomes-search/Manhattan_NY"
MAX_PAGES = 3
REQUEST_DELAY_SEC = 1.5

def fetch(target_url):
    api_url = f"http://api.scrape.do/?token={TOKEN}&url={urllib.parse.quote(target_url, safe='')}&super=true"
    r = requests.get(api_url, timeout=120)
    if r.status_code != 200:
        print(f"  HTTP {r.status_code}")
        return None
    return r

When this prints HTTP 200, we are looking at real Realtor.com HTML. Any other status code means we did not get usable content yet, so we fix the request before moving on.

The card data we want is not in the visible HTML. Realtor.com ships it as a JSON object inside a <script> tag, under props.pageProps.properties. We scan every script tag, try to parse it as JSON, and grab the first one that contains listings:

response = fetch(page_url)
if not response:
    break

soup = BeautifulSoup(response.text, "html.parser")

properties = None
for script in soup.find_all("script"):
    raw = (script.string or "").strip()
    if not raw.startswith("{"):
        continue
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        continue
    props = data.get("props", {}).get("pageProps", {}).get("properties")
    if props:
        properties = props
        break

On a healthy search results page, properties is a non-empty list. In practice, it is often around 42 listings per page.

That is the moment we want. One request in, structured rows out.

At this point, we have a list of dicts that already contains price, address, bed and bath counts, sqft, status, and the permalink needed to build a detail URL. We shape it into a CSV with these columns:

price, address, beds, baths, sqft, lot_size, property_type, listing_status, listing_url, search_page_url

Before we build each CSV row, we define three small helpers to keep formatting consistent.

format_price() normalizes list prices into a consistent $1,234,567 string:

def format_price(amount):
    if amount is None:
        return ""
    return f"${int(amount):,}"

build_address() turns the structured address object into one stable string:

def build_address(loc):
    line = (loc.get("line") or "").strip()
    city = (loc.get("city") or "").strip()
    st = (loc.get("state_code") or "").strip()
    postal = (loc.get("postal_code") or "").strip()
    right = f"{city}, {st} {postal}".strip() if city and st else " ".join(x for x in [city, f"{st} {postal}".strip()] if x)
    return ", ".join(x for x in [line, right] if x)

status_label() turns MLS status plus a few flags into a readable label for a CSV:

def status_label(status, flags):
    # Check boolean flags first, then fall back to the MLS status string
    for flag, label in [("is_pending", "Pending"), ("is_contingent", "Contingent"), ("is_coming_soon", "Coming soon"), ("is_foreclosure", "Foreclosure")]:
        if flags.get(flag):
            return label
    return STATUS_MAP.get(status, (status or "").replace("_", " ").title())

Here is the exact row shaping that fills those columns:

desc = p.get("description") or {}
loc = (p.get("location") or {}).get("address") or {}
permalink = (p.get("permalink") or "").strip()

sqft = desc.get("sqft")
lot = desc.get("lot_sqft")
beds = desc.get("beds")
baths = desc.get("baths_consolidated") or ""

all_rows.append({
    "price": format_price(p.get("list_price")),
    "address": build_address(loc),
    "beds": f"{beds} bed" if beds is not None else "",
    "baths": f"{baths} bath" if baths else "",
    "sqft": f"{sqft:,} sqft" if isinstance(sqft, int) else (str(sqft) if sqft else ""),
    "lot_size": f"{lot:,} sqft lot" if isinstance(lot, int) else "",
    "property_type": (desc.get("sub_type") or desc.get("type") or "").replace("_", " "),
    "listing_status": status_label(p.get("status") or "", p.get("flags") or {}),
    "listing_url": f"https://www.realtor.com/realestateandhomes-detail/{permalink}" if permalink else "",
    "search_page_url": page_url,
})

Following Pagination Links

Page numbers are not the cleanest surface here. The reliable move is to follow the footer’s Next link, because it tracks whatever Realtor.com decides the next search results URL should be.

This is the selector pattern we will use:

nxt = soup.select_one("a[aria-label='Go to next page']") or soup.select_one("a[aria-label^='Go to next']")
href = (nxt.get("href") or "").strip() if nxt else ""
if not href or href.startswith("#"):
    print("  No next page link.")
    break
page_url = href if href.startswith("http") else urllib.parse.urljoin("https://www.realtor.com/", href)
time.sleep(REQUEST_DELAY_SEC)

If the page has more results, page_url usually looks like the same search path with /pg-2, /pg-3, and so on. If there is no next link, we stop without guessing.

Exporting to CSV

At the end, we write everything to a single CSV with a fixed column order:

with open(OUTPUT_CSV, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=CSV_FIELDS)
    writer.writeheader()
    writer.writerows(all_rows)

print(f"Wrote {len(all_rows)} rows to {OUTPUT_CSV}")

After it runs, realtor-search-results.csv contains one row per listing with the same fields we mapped above. The final line tells us how many rows were written, which is the quickest sanity check before we scale the crawl.

If we want the complete runnable version in one place, we use the GitHub link in the intro.

If you're looking to scrape Australian property listings, check out our RealEstate.com.au scraping guide.

Scraping Realtor.com Listing Details

Search pages are about breadth. Listing pages are about depth.

Here is the split: the listing page has two structured sources.

Embedded JSON: core fields like property_id, listing_id, beds, baths, sqft, price, agent, brokerage
GraphQL: property_history, tax_history, and popularity windows (views and saves over time)

Fetching a Listing Page and Finding `propertyDetails`

We fetch a single listing URL through Scrape.do with super=true:

LISTING_URL = "https://www.realtor.com/realestateandhomes-detail/33-Park-Row-1_New-York_NY_10038_M93744-96076"

print(f"Fetching listing: {LISTING_URL}")
api_url = f"http://api.scrape.do/?token={TOKEN}&url={urllib.parse.quote(LISTING_URL, safe='')}&super=true"
response = requests.get(api_url, timeout=120)
if response.status_code != 200:
    raise SystemExit(f"HTTP {response.status_code}")

The listing page uses a different JSON path than the search page. Search listings live at props.pageProps.properties. Listing details live deeper, at props.pageProps.initialReduxState.propertyDetails. We scan the same way but look for a different key:

pd = None
for script in BeautifulSoup(response.text, "html.parser").find_all("script"):
    raw = (script.string or "").strip()
    if not raw.startswith("{"):
        continue
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        continue
    candidate = data.get("props", {}).get("pageProps", {}).get("initialReduxState", {}).get("propertyDetails")
    if candidate and (candidate.get("listing_id") or candidate.get("property_id")):
        pd = candidate
        break

If pd is still None, we did not get the payload we need. We stop here and fix the request before trying GraphQL.

Shaping Core Listing Fields

We pull the title from the SEO block (it is always present), with an address fallback. format_price() was defined earlier. Now we shape the core record:

desc = pd.get("description") or {}
sqft = desc.get("sqft")
pps = pd.get("price_per_sqft")

# Title from SEO data, fallback to address
seo = pd.get("seo") or {}
title = (seo.get("seo_title") or seo.get("title") or "").strip()

record = {
    "listing_url": LISTING_URL,
    "property_id": str(pd.get("property_id") or ""),
    "listing_id": str(pd.get("listing_id") or ""),
    "title": title,
    "price": format_price(pd.get("list_price")),
    "price_currency": str(pd.get("list_price_currency") or "").strip(),
    "beds": str(desc.get("beds") or ""),
    "baths": str(desc.get("baths_consolidated") or ""),
    "sqft": f"{sqft:,} sqft" if isinstance(sqft, int) else (str(sqft) if sqft else ""),
    "unit_type": (desc.get("sub_type") or desc.get("type") or "").replace("_", " "),
    "price_per_sqft": f"${pps:,}/sqft" if isinstance(pps, int) else (str(pps) if pps else ""),
}

Now we have the IDs we need for enrichment, especially property_id. That ID is the key for both GraphQL calls.

Property History via GraphQL

The GraphQL endpoint is https://www.realtor.com/frontdoor/graphql. We send the POST through Scrape.do, with customHeaders=true so we can include the RDC headers Realtor expects.

Before we call it, we need three pieces from DevTools Network on a real listing page: a client version header, the history query text, and the persisted query hash for urgency. These values change when Realtor deploys a new frontend version, so if the GraphQL calls start failing after working for weeks, refreshing these from DevTools is the first thing to check.

Open the listing page in Chrome, open DevTools, then go to the Network tab and filter for graphql. Click the request that contains PropertyAndTaxHistory. We want two things from that one request: the rdc-client-version header and the raw GraphQL query text.

RDC_CLIENT_VERSION: copy the rdc-client-version request header
HISTORY_QUERY: copy the query string from the Request Payload

Chrome DevTools Network headers showing rdc-client-version for Realtor.com GraphQL

Chrome DevTools Request Payload showing PropertyAndTaxHistory query text

For popularity, click the request with operationName: GetUrgencyData. That one uses a persisted query. We copy the hash from extensions.persistedQuery.sha256Hash.

GET_URGENCY_HASH: copy extensions.persistedQuery.sha256Hash from the Request Payload

Chrome DevTools Request Payload showing GetUrgencyData persistedQuery sha256Hash

def graphql_post(body, label):
    # POST to Realtor GraphQL via Scrape.do with custom RDC headers
    api_url = "http://api.scrape.do/?" + urllib.parse.urlencode(
        {"token": TOKEN, "url": GRAPHQL_URL, "super": "true", "customHeaders": "true"}
    )
    headers = {
        "Content-Type": "application/json",
        "Referer": LISTING_URL,
        "Origin": "https://www.realtor.com",
        "rdc-client-name": "RDC_WEB_DETAILS_PAGE",
        "rdc-client-version": RDC_CLIENT_VERSION,
    }
    r = requests.post(api_url, data=json.dumps(body), headers=headers, timeout=120)
    if r.status_code != 200:
        print(f"  {label}: HTTP {r.status_code}")
        return None
    return r.json()

Now we call PropertyAndTaxHistory using the listing’s property_id:

property_id = record["property_id"]
record["property_history"] = []
record["tax_history"] = []

if property_id:
    result = graphql_post({
        "operationName": "PropertyAndTaxHistory",
        "variables": {"propertyId": property_id},
        "query": HISTORY_QUERY,
    }, "PropertyAndTaxHistory")
    home = (result or {}).get("data", {}).get("home") or {}
    record["property_history"] = home.get("property_history") or []
    record["tax_history"] = home.get("tax_history") or []

If record["property_history"] is non-empty, we know the GraphQL call is working and we are collecting the listing’s event timeline and tax rows.

If we can print a non-zero history count, we have crossed the hardest part.

Popularity and View Counts

Popularity comes from a persisted query called GetUrgencyData. We pass the hash in extensions.persistedQuery.sha256Hash and then normalize the output a bit.

record["urgency"] = None
if property_id:
    result = graphql_post({
        "operationName": "GetUrgencyData",
        "variables": {"propertyId": property_id},
        "extensions": {"persistedQuery": {"version": 1, "sha256Hash": GET_URGENCY_HASH}},
    }, "GetUrgencyData")
    popularity = (result or {}).get("data", {}).get("home", {}).get("popularity")
    if popularity:
        for row in popularity.get("periods") or []:
            row.pop("ldp_views_total", None)
        record["urgency"] = popularity

When this succeeds, record["urgency"]["periods"] contains the rolling windows with views and saves, which is useful for tracking demand changes over time.

This is the kind of signal we do not get from HTML scraping.

Exporting Listing Data to JSON

Finally, we write one JSON file for the listing:

with open(OUTPUT_JSON, "w", encoding="utf-8") as f:
    json.dump(record, f, indent=2, ensure_ascii=False)

print(f"Wrote {OUTPUT_JSON}")

Realtor.com listing exported to JSON

Conclusion

Realtor.com gets much easier when we stop treating it like HTML. For search results, we pull listings from embedded JSON and write a clean CSV. For a single listing, we extract propertyDetails from embedded Redux state, then enrich it with two GraphQL calls for history, tax rows, and popularity windows.

From here, we scale it the boring way. Store outputs per city and per day, keep delays modest, and keep extraction pinned to structured payloads instead of brittle selectors.

If you're looking to scrape UK property listings, check out our Rightmove scraping guide. If you're looking to scrape Spanish or Portuguese property listings, check out our Idealista scraping guide.

Get 1000 free credits and start scraping with Scrape.do

FAQ

Does Realtor.com block scraping?

Yes. In this environment, direct requests runs into 429 quickly. Routing requests through Scrape.do with super=true keeps both the HTML fetch and the GraphQL calls stable enough to build a repeatable pipeline.

How do I get property price history from Realtor.com?

Extract the listing property_id from propertyDetails, then call the GraphQL operation PropertyAndTaxHistory. The price timeline is in data.home.property_history.

Can I scrape Realtor.com without rendering JavaScript?

Yes. The approach here does not use render=true. We rely on embedded JSON that ships in the HTML plus direct GraphQL responses.

What data is available from Realtor.com search results?

The CSV export includes price, address, beds, baths, sqft, lot size, property type, listing status, the detail URL, and the search page URL that each row came from.