20 C# Web Scraping Libraries (What Pros Use)
Most developers start with Python for web scraping.
But C# has a quietly mature scraping ecosystem that handles production workloads better than you’d expect. It certainly exceeded my expectations when I first gave it a try.
- Strong typing catches errors before deployment. When your scraper breaks at 3 AM, LINQ queries and async/await patterns make debugging straightforward.
- Memory management handles long-running processes. C# scrapers can run for weeks without memory leaks. .NET’s garbage collector just works for this use case.
- Selenium bindings match Python’s API. Most Stack Overflow solutions translate with minor syntax changes. The ecosystem is identical.
- Windows integration requires zero configuration. Task Scheduler, Windows Services, and Active Directory authentication work out of the box. Docker Selenium Grid scales horizontally without Python’s GIL bottleneck.
I’ll cover 20 libraries I tested in production environments over the years. Some are daily drivers. Others fill specific niches.
All have proven reliable under real workloads.
HTTP Client Libraries
1. RestSharp (9.8k ⭐)
RestSharp’s been around since 2009. It’s still the most popular HTTP client in .NET because it handles the boring stuff automatically.
Fluent interface makes building requests readable. Automatic JSON/XML serialization means less boilerplate. OAuth, JWT, and Basic Auth work out of the box. Headers, cookies, parameters—all chainable.
Use RestSharp when you’re hitting APIs that return clean JSON or XML. Works great for auth flows and quick scraping tasks that don’t need DOM parsing.
Basic example:
// RestSharp v107+ syntax
var client = new RestClient("https://api.example.com");
var request = new RestRequest("/products")
.AddParameter("category", "electronics")
.AddHeader("User-Agent", "Mozilla/5.0");
var response = await client.ExecuteAsync<List<Product>>(request);
// Or use GetJsonAsync for automatic deserialization
var products = await client.GetJsonAsync<List<Product>>("/products?category=electronics");
The v107+ syntax cleaned up a lot of cruft from earlier versions. Chainable methods make code easier to read and maintain.
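The authenticators mentioned above live in RestSharp.Authenticators. A minimal sketch, assuming a JWT-protected API (the token value and endpoint are placeholders; the exact wiring moved from client.Authenticator to RestClientOptions in recent versions):
using RestSharp;
using RestSharp.Authenticators;
// Attach an authenticator once; RestSharp applies it to every request from this client.
var options = new RestClientOptions("https://api.example.com")
{
    Authenticator = new JwtAuthenticator("YOUR_JWT_TOKEN")
    // Basic auth is the same pattern: new HttpBasicAuthenticator("user", "pass")
};
var client = new RestClient(options);
var request = new RestRequest("/orders");
var response = await client.ExecuteAsync(request);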
2. Flurl (4.4k ⭐)
Flurl combines URL building with HTTP requests in one fluent chain. No more manual URL encoding. No more string concatenation hell.
Async-first from the ground up. Cancellation tokens work naturally. Mocking HTTP calls for tests is trivial. The syntax reads like English.
Reach for Flurl when you’re building complex URLs with lots of query parameters. Especially useful in projects where everything’s already async/await.
Basic example:
var products = await "https://api.example.com"
.AppendPathSegment("products")
.SetQueryParams(new { category = "electronics", limit = 50 })
.WithHeader("User-Agent", "Mozilla/5.0")
.GetJsonAsync<List<Product>>();
The fluent syntax is genuinely pleasant to write. Fewer bugs from typos and encoding mistakes.
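The painless test mocking comes from the Flurl.Http.Testing package: an HttpTest in scope intercepts every Flurl call. A minimal sketch, reusing the Product placeholder model from above:
using Flurl.Http;
using Flurl.Http.Testing;
// Any Flurl call made while HttpTest is alive gets the faked response instead of hitting the network.
using var httpTest = new HttpTest();
httpTest.RespondWithJson(new[] { new { Name = "Widget", Price = "9.99" } });
var products = await "https://api.example.com/products"
    .GetJsonAsync<List<Product>>();
// Verify the scraper issued the request we expected.
httpTest.ShouldHaveCalled("https://api.example.com/products")
    .WithVerb(HttpMethod.Get);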
HTML Parsing Engines
3. AngleSharp (5.4k ⭐)
AngleSharp is a standards-compliant HTML/CSS/SVG parser written entirely in C#. Full W3C spec implementation. DOM behaves like an actual browser.
Lots of developers start with HtmlAgilityPack and switch to AngleSharp when they hit selector limitations. The migration’s painless—both use similar DOM patterns. CSS selectors here are significantly more powerful than XPath for complex queries.
Parses HTML5, CSS3, and SVG into queryable structures. Supports full CSS selector syntax. Extensible with JavaScript execution and CSS parsing plugins. Zero external dependencies.
Best for modern websites with clean markup. If you need W3C compliance or plan to add browser-like features (JS execution, CSSOM), start with AngleSharp.
Basic example:
var config = Configuration.Default;
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));
var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
var name = product.QuerySelector("h2.title")?.TextContent;
var price = product.QuerySelector("span.price")?.TextContent;
}
The DOM API matches browser JavaScript. If you know querySelector, you already know AngleSharp.
4. HtmlAgilityPack (2.8k ⭐)
HtmlAgilityPack’s been around since 2003. It has one killer feature: parses broken HTML without throwing errors.
This is practically mandatory for real-world scraping. Websites have malformed markup everywhere—unclosed tags, mismatched attributes, complete disasters. HtmlAgilityPack handles them all. Pair it with Selenium and you’ve got the C# version of Python’s Selenium + BeautifulSoup stack (a sketch of that pairing follows the basic example below). Battle-tested. Reliable. Boring in the best way.
XPath and XSLT support out of the box. LINQ integration works naturally. The parser forgives everything.
Use HtmlAgilityPack for legacy sites with messy markup. When standards-compliant parsers fail, this one succeeds. XPath queries work better here than CSS selectors. For simple scraping where you don’t need fancy selectors, start here.
Basic example:
var web = new HtmlWeb();
var doc = web.Load("https://example.com/products");
var products = doc.DocumentNode.SelectNodes("//div[@class='product']");
foreach (var product in products)
{
var name = product.SelectSingleNode(".//h2[@class='title']")?.InnerText;
var price = product.SelectSingleNode(".//span[@class='price']")?.InnerText;
}
If the HTML looks like it was written by a drunk intern, HtmlAgilityPack still parses it.
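A quick sketch of the Selenium pairing mentioned above: the browser renders the page (JavaScript included), then HtmlAgilityPack parses the resulting HTML. URL and selectors are placeholders:
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
var chromeOptions = new ChromeOptions();
chromeOptions.AddArgument("--headless");
using var driver = new ChromeDriver(chromeOptions);
// Let Chrome do the rendering...
driver.Navigate().GoToUrl("https://example.com/products");
// ...then hand the rendered HTML to HtmlAgilityPack for parsing.
var doc = new HtmlDocument();
doc.LoadHtml(driver.PageSource);
var names = doc.DocumentNode
    .SelectNodes("//div[@class='product']//h2[@class='title']")
    ?.Select(n => n.InnerText.Trim())
    .ToList();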
5. CsQuery (1.2k ⭐)
CsQuery brings jQuery syntax to C#. If you’ve written $("div.product").find("h2") in JavaScript, you’ll recognize the API immediately.
Fair warning: CsQuery hasn’t been actively maintained since 2015. It still works for legacy projects, but for new work, AngleSharp or HtmlAgilityPack with CSS selector extensions are safer bets for long-term maintenance.
What it does:
- jQuery-style API for DOM manipulation and querying
- Full CSS2 and CSS3 selector support
- Standards-compliant parser
- Familiar syntax for JavaScript developers
When to use it:
- Maintaining existing projects that already use it
- Teams with strong JavaScript/jQuery backgrounds (though consider AngleSharp instead)
- Rapid prototyping where jQuery muscle memory accelerates development
Basic example:
CQ dom = html;
var products = dom["div.product"];
foreach (var product in products)
{
var name = dom[product].Find("h2.title").Text();
var price = dom[product].Find("span.price").Text();
}
The syntax feels natural if you’ve spent years writing jQuery selectors.
Browser Automation Tools
6. Selenium WebDriver (33.5k ⭐)
Selenium is the actual cornerstone of browser automation. Every major browser. Multiple language bindings. Massive ecosystem. The C# bindings match Python’s API almost exactly.
Stack Overflow solutions translate with minor tweaks. For production, run Docker Selenium Grid: isolated containers, built-in monitoring, horizontal scaling (a RemoteWebDriver sketch closes out this section). This setup handles Google Play Console scraping, high-volume parallel crawls, whatever you throw at it.
What it does:
- Cross-browser automation (Chrome, Firefox, Safari, Edge)
- Full control over browser actions (clicks, scrolling, form submission)
- JavaScript execution within the browser context
- Screenshot and video recording capabilities
When to use it:
- Multi-browser testing and scraping
- Sites with complex JavaScript interactions
- Projects requiring extensive community support and documentation
- Production environments where Docker Grid provides scalability
Basic example:
var options = new ChromeOptions();
options.AddArgument("--headless");
using var driver = new ChromeDriver(options);
driver.Navigate().GoToUrl("https://example.com/products");
var products = driver.FindElements(By.CssSelector("div.product"));
foreach (var product in products)
{
var name = product.FindElement(By.CssSelector("h2.title")).Text;
var price = product.FindElement(By.CssSelector("span.price")).Text;
}
// driver automatically disposed at end of using block
Selenium’s maturity means most JavaScript rendering problems have documented solutions.
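For the Docker Grid setup mentioned earlier, the only code change is pointing a RemoteWebDriver at the Grid hub instead of launching a local ChromeDriver. A minimal sketch, assuming a Grid running on the default local port:
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;
var gridOptions = new ChromeOptions();
gridOptions.AddArgument("--headless");
// The hub schedules the session on whichever Chrome node has capacity.
using var driver = new RemoteWebDriver(
    new Uri("http://localhost:4444/wd/hub"),
    gridOptions);
driver.Navigate().GoToUrl("https://example.com/products");
var products = driver.FindElements(By.CssSelector("div.product"));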
7. PuppeteerSharp (3.8k ⭐)
PuppeteerSharp is Google’s Puppeteer for .NET. High-level API over Chrome DevTools Protocol.
Authentication flows are where PuppeteerSharp shines. Multi-step logins, OAuth redirects, CAPTCHA challenges—handling these manually is painful. Replicating cookie exchanges? Nightmare. PuppeteerSharp just… logs in. Like a human would. For simple scraping, HtmlAgilityPack suffices. When login gets complex, reach for Puppeteer.
The hybrid approach: Many devs run PuppeteerSharp to render JavaScript-heavy pages, then pass the HTML to AngleSharp for parsing. Best of both worlds. Reliable rendering + powerful selectors. Particularly useful for sites with lazy loading or dynamic content that defeats static parsers (a sketch of this hybrid closes out this section).
What it does:
- Headless Chrome automation via DevTools Protocol
- Fast page navigation and JavaScript execution
- Screenshot and PDF generation
- Network interception and request mocking
When to use it:
- Chrome-specific scraping tasks
- Sites with complex authentication or login flows
- Generating PDFs or full-page screenshots
- Projects requiring DevTools Protocol features
- Hybrid setups where you render with Puppeteer and parse with AngleSharp
Basic example:
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();
await page.GoToAsync("https://example.com/products");
var products = await page.QuerySelectorAllAsync("div.product");
foreach (var product in products)
{
var name = await product.QuerySelectorAsync("h2.title");
// Guard the null case: awaiting a null task would throw at runtime.
var nameText = name != null
    ? await name.EvaluateFunctionAsync<string>("el => el.textContent")
    : null;
}
// browser automatically disposed
If Chrome is your target browser, PuppeteerSharp offers better performance than Selenium.
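A sketch of that hybrid, assuming a hypothetical login form with #username, #password, and #login selectors: PuppeteerSharp handles the login and rendering, AngleSharp handles the parsing:
using AngleSharp;
using PuppeteerSharp;
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();
// Log in the way a human would: fill the form, click, wait for the redirect.
await page.GoToAsync("https://example.com/login");
await page.TypeAsync("#username", "user@example.com");
await page.TypeAsync("#password", "hunter2");
await Task.WhenAll(
    page.WaitForNavigationAsync(),
    page.ClickAsync("#login"));
// Grab the fully rendered HTML and hand it to AngleSharp for parsing.
await page.GoToAsync("https://example.com/products");
var html = await page.GetContentAsync();
var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(html));
foreach (var product in document.QuerySelectorAll("div.product"))
{
    var name = product.QuerySelector("h2.title")?.TextContent;
}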
8. Playwright for .NET (2.8k ⭐)
Playwright is Microsoft’s modern answer to browser automation. It supports Chromium, Firefox, and WebKit with a unified API and built-in reliability features.
What it does:
- Cross-browser automation with a single API
- Auto-waiting for elements (no manual waits needed)
- Network interception and mocking
- Built-in tracing and debugging tools
When to use it:
- Modern scraping projects starting from scratch
- Multi-browser testing across rendering engines
- Projects requiring reliable element handling without manual waits
Basic example:
using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.LaunchAsync(new() { Headless = true });
var page = await browser.NewPageAsync();
await page.GotoAsync("https://example.com/products");
var products = await page.QuerySelectorAllAsync("div.product");
foreach (var product in products)
{
var name = await product.QuerySelectorAsync("h2.title");
// Guard the null case: awaiting a null task would throw at runtime.
var nameText = name != null ? await name.TextContentAsync() : null;
}
// browser and playwright automatically disposed
Playwright’s auto-waiting eliminates the brittle Thread.Sleep() calls that plague Selenium scripts.
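That auto-waiting is easiest to see with the Locator API: actions on a locator retry until the element is ready, so there’s no polling code to write. A short sketch (selectors are placeholders):
using Microsoft.Playwright;
using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.LaunchAsync(new() { Headless = true });
var page = await browser.NewPageAsync();
await page.GotoAsync("https://example.com/products");
// Locators are lazy; actions on them wait for the element to be attached and actionable.
var titles = page.Locator("div.product h2.title");
var count = await titles.CountAsync();
for (var i = 0; i < count; i++)
{
    var name = await titles.Nth(i).TextContentAsync();
}
// Clicking pagination also auto-waits for the button to become clickable.
await page.Locator("button.next-page").ClickAsync();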
CSS Selector & Query Tools
These libraries add CSS selector support to parsers that don’t have it natively, or provide jQuery-like querying on top of existing DOM structures.
9. Fizzler (135 ⭐)
Fizzler implements W3C CSS selectors for HtmlAgilityPack. If you prefer CSS selectors over XPath but want to use HtmlAgilityPack’s robust parsing, Fizzler bridges that gap.
What it does:
- W3C CSS3 selector engine
- Integrates directly with HtmlAgilityPack
- Converts CSS selectors to XPath internally
When to use it:
- HtmlAgilityPack projects requiring CSS selector syntax
- Teams more comfortable with CSS than XPath
- Migrating from other parsers that use CSS selectors
Basic example:
var web = new HtmlWeb();
var doc = web.Load("https://example.com/products");
var products = doc.DocumentNode.QuerySelectorAll("div.product");
foreach (var product in products)
{
var name = product.QuerySelector("h2.title")?.InnerText;
}
This gives you modern selector syntax with HtmlAgilityPack’s battle-tested parser.
10. ScrapySharp (353 ⭐)
ScrapySharp wraps HtmlAgilityPack with a higher-level API inspired by Python’s Scrapy framework. It adds browser simulation, form handling, and CSS selectors with LINQ integration.
What it does:
- Simulates browser behavior (cookies, sessions, redirects)
- CSS selectors with LINQ queries
- Form submission and interaction
- Built on HtmlAgilityPack’s reliable parser
When to use it:
- Form submissions and multi-step scraping flows
- Session management across multiple requests
- Projects requiring both parsing and navigation
Basic example:
var browser = new ScrapingBrowser();
var page = browser.NavigateToPage(new Uri("https://example.com/products"));
var products = page.Html.CssSelect("div.product");
foreach (var product in products)
{
var name = product.CssSelect("h2.title").FirstOrDefault()?.InnerText;
}
ScrapySharp handles cookies and session state automatically, reducing boilerplate.
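For the form-handling side, here’s a sketch of a login submission. It assumes ScrapySharp’s PageWebForm API (FindFormById, indexer assignment, Submit) and a hypothetical form with id "login"; treat the field names as placeholders:
using ScrapySharp.Extensions;
using ScrapySharp.Network;
var browser = new ScrapingBrowser();
var loginPage = browser.NavigateToPage(new Uri("https://example.com/login"));
// Fill and submit the form; the browser keeps the resulting cookies for later requests.
var form = loginPage.FindFormById("login");
form["username"] = "user@example.com";
form["password"] = "hunter2";
var resultPage = form.Submit();
// The same browser instance now carries the authenticated session.
var products = browser.NavigateToPage(new Uri("https://example.com/products"))
    .Html.CssSelect("div.product");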
Full Crawling Frameworks
When you need to scrape hundreds of pages or entire sites, these frameworks provide the infrastructure for distributed, multi-threaded crawling operations.
11. DotnetSpider (4.1k ⭐)
DotnetSpider is a high-performance web crawling and scraping framework modeled after Java’s WebMagic. It supports parallel crawling, data pipelines, and distributed architectures.
What it does:
- Multi-threaded crawling with configurable parallelism
- Data extraction pipelines (parse → validate → transform → store)
- Distributed crawling across multiple machines
- Built-in support for databases and message queues
When to use it:
- Large-scale scraping projects (thousands of pages)
- Projects requiring ETL pipelines
- Distributed scraping across multiple servers
Basic example:
class ProductSpider : Spider
{
protected override void Initialize()
{
AddRequest("https://example.com/products");
}
protected override void OnResponse(Response response)
{
var products = response.Document.QuerySelectorAll("div.product");
foreach (var product in products)
{
AddDataItem(new
{
Name = product.QuerySelector("h2.title")?.TextContent,
Price = product.QuerySelector("span.price")?.TextContent
});
}
}
}
DotnetSpider handles URL deduplication, retry logic, and rate limiting automatically.
12. Abot (2.3k ⭐)
Abot is a fast, flexible web crawler with an event-driven architecture. You hook into crawl events (page crawled, link found, error occurred) and write custom logic for each.
What it does:
- Multi-threaded crawling with configurable thread count
- Event-driven architecture (subscribe to crawl lifecycle events)
- Politeness features (crawl delays, robots.txt respect)
- Extensible through custom implementations
When to use it:
- Projects requiring fine-grained control over crawl behavior
- Event-driven architectures
- Crawling with custom logic per event
Basic example:
var config = new CrawlConfiguration
{
MaxConcurrentThreads = 10,
MaxPagesToCrawl = 1000
};
var crawler = new PoliteWebCrawler(config);
crawler.PageCrawlCompleted += (sender, args) =>
{
var products = args.CrawledPage.AngleSharpHtmlDocument.QuerySelectorAll("div.product");
foreach (var product in products)
{
var name = product.QuerySelector("h2.title")?.TextContent;
}
};
await crawler.CrawlAsync(new Uri("https://example.com"));
Abot’s event model gives you precise control over what happens at each crawl stage.
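For finer-grained control than the events alone, Abot also accepts custom crawl decisions. A hedged sketch using Abot 2.x’s ShouldCrawlPageDecisionMaker hook (the /products filter is a placeholder):
using Abot2.Crawler;
using Abot2.Poco;
var config = new CrawlConfiguration
{
    MaxConcurrentThreads = 10,
    MaxPagesToCrawl = 1000,
    MinCrawlDelayPerDomainMilliSeconds = 500 // politeness delay between requests
};
var crawler = new PoliteWebCrawler(config);
// Only follow links under /products; everything else gets skipped with a reason.
crawler.ShouldCrawlPageDecisionMaker = (pageToCrawl, crawlContext) =>
    pageToCrawl.Uri.AbsolutePath.StartsWith("/products")
        ? new CrawlDecision { Allow = true }
        : new CrawlDecision { Allow = false, Reason = "Outside product catalog" };
await crawler.CrawlAsync(new Uri("https://example.com"));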
13. InfinityCrawler (251 ⭐)
InfinityCrawler emphasizes polite, ethical crawling. It respects robots.txt, parses sitemaps, and auto-throttles requests to avoid overwhelming target servers.
What it does:
- Automatic robots.txt parsing and respect
- Sitemap discovery and parsing
- Auto-throttling based on server response times
- Async crawling with configurable concurrency
When to use it:
- Ethical crawling where respecting site rules is critical
- Large sites with comprehensive sitemaps
- Projects requiring automatic politeness features
Basic example:
var crawler = new Crawler(new Uri("https://example.com"), async (crawledUri, response) =>
{
var doc = await response.Content.ReadAsStringAsync();
// Parse and extract data
},
politeDelay: TimeSpan.FromSeconds(1));
await crawler.CrawlAsync();
InfinityCrawler handles the ethics side of scraping automatically.
14. NCrawler (157 ⭐)
NCrawler is a pipeline-based crawler that processes different document types (HTML, PDFs, Word docs) through configurable pipeline steps.
What it does:
- Pipeline architecture (download → parse → extract → store)
- Multi-threaded with customizable pipelines
- Support for various document formats
- Extensible through custom pipeline steps
When to use it:
- Scraping non-HTML documents (PDFs, Office files)
- Projects requiring custom processing pipelines
- Multi-format data extraction
Basic example:
var crawler = new Crawler(
new Uri("https://example.com"),
new HtmlDocumentProcessor(),
new MyCustomPipeline());
crawler.MaximumThreadCount = 5;
crawler.Crawl();
NCrawler excels when you’re extracting data from diverse document types.
Content Extraction & Post-Processing
15. SmartReader (175 ⭐)
SmartReader implements Mozilla’s Readability algorithm in C#. It extracts the main article content from web pages, stripping away navigation, ads, and sidebars.
What it does:
- Article content extraction using readability algorithm
- Metadata extraction (author, publication date, reading time)
- Removes boilerplate HTML (nav, footer, ads)
- Returns clean, readable content
When to use it:
- News sites and blog scraping
- Article archiving and content migration
- Projects requiring main content without clutter
Basic example:
var uri = new Uri("https://example.com/article");
var article = await Reader.ParseArticleAsync(uri.AbsoluteUri);
Console.WriteLine($"Title: {article.Title}");
Console.WriteLine($"Author: {article.Byline}");
Console.WriteLine($"Published: {article.PublicationDate}");
Console.WriteLine($"Reading time: {article.TimeToRead} minutes");
Console.WriteLine($"Content: {article.TextContent}");
SmartReader saves you from manually identifying and stripping page chrome.
16. ReverseMarkdown.Net (342 ⭐)
ReverseMarkdown.Net converts HTML into Markdown. If you’re migrating content to Markdown-based systems or building documentation from scraped HTML, this handles the conversion.
What it does:
- Converts HTML elements to Markdown syntax
- Handles nested lists, tables, and code blocks
- Preserves links and image references
- Configurable conversion rules
When to use it:
- Content migration to Markdown-based CMS
- Documentation generation from HTML sources
- Archiving web content in a readable format
Basic example:
var converter = new Converter();
var markdown = converter.Convert(html);
File.WriteAllText("output.md", markdown);
This bridges the gap between HTML scraping and Markdown publishing workflows.
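The conversion rules live on a Config object passed to the Converter. A hedged sketch of the options most useful for scraped HTML (option names as documented by ReverseMarkdown; treat the exact values as assumptions):
using ReverseMarkdown;
var config = new Config
{
    // Emit GitHub-flavored Markdown (tables, fenced code blocks).
    GithubFlavored = true,
    // Scraped pages contain tags Markdown can't represent; drop them instead of failing.
    UnknownTags = Config.UnknownTagsOption.Drop,
    // Strip HTML comments left behind by CMSs and ad scripts.
    RemoveComments = true
};
var converter = new Converter(config);
var markdown = converter.Convert(html); // html = previously scraped page content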
Extensions & Specialized Add-ons
17. AngleSharp.Js (109 ⭐)
AngleSharp.Js integrates the Jint JavaScript engine into AngleSharp, allowing you to execute JavaScript within the DOM context.
What it does:
- Embeds Jint JavaScript engine
- ES5 JavaScript execution
- DOM script evaluation
- JavaScript event handling
When to use it:
- Sites requiring simple JavaScript execution for data exposure
- Projects already using AngleSharp
- Lightweight JavaScript evaluation without a full browser
Basic example:
var config = Configuration.Default
.WithJs();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));
// JavaScript in the HTML will now execute
var result = document.QuerySelector("#dynamic-content")?.TextContent;
This gives you JavaScript execution without spinning up a full browser.
18. AngleSharp.Css (82 ⭐)
AngleSharp.Css adds a complete CSS Object Model to AngleSharp, including computed styles, media queries, and responsive design features.
What it does:
- Full CSSOM implementation
- Computed style values
- Media query evaluation
- CSS parsing and validation
When to use it:
- Scraping sites where CSS affects rendered content
- Analyzing responsive design behavior
- Projects requiring computed CSS values
Basic example:
var config = Configuration.Default
.WithCss();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));
var element = document.QuerySelector("div.product");
var computedStyle = element.ComputeCurrentStyle();
var backgroundColor = computedStyle.GetPropertyValue("background-color");
This unlocks CSS-aware scraping scenarios.
19. HtmlAgilityPack.CssSelectors.NetCore (35 ⭐)
This extension adds modern CSS selector methods directly to HtmlAgilityPack’s nodes, giving you QuerySelector and QuerySelectorAll methods.
What it does:
- CSS selector methods for HtmlAgilityPack
- QuerySelector and QuerySelectorAll APIs
- Familiar syntax from browser DOM APIs
When to use it:
- HtmlAgilityPack projects requiring CSS selectors
- Migrating from browser-based scraping to server-side
- Teams familiar with DOM APIs
Basic example:
var doc = new HtmlDocument();
doc.LoadHtml(html);
var products = doc.DocumentNode.QuerySelectorAll("div.product");
foreach (var product in products)
{
var name = product.QuerySelector("h2.title")?.InnerText;
}
This modernizes HtmlAgilityPack without switching parsers.
20. Flurl.Http.Xml (Niche extension)
Flurl.Http.Xml extends Flurl with XML serialization support, adding GetXmlAsync and PostXmlAsync methods for XML-based APIs.
What it does:
- XML request/response handling for Flurl
- Automatic XML serialization/deserialization
- Fluent syntax for XML endpoints
When to use it:
- SOAP API scraping
- XML-based web services
- Legacy systems that return XML instead of JSON
Basic example:
var response = await "https://api.example.com/products"
.WithHeader("User-Agent", "Mozilla/5.0")
.GetXmlAsync<ProductList>();
This rounds out Flurl’s format support for older web services.
Bonus: When Local Libraries Aren’t Enough
You’ve set up the libraries. Written the scraper. Parsed the data successfully.
Then Cloudflare blocks everything.
A $#@&ing bummer…
These aren’t library problems. They’re infrastructure problems.
Scrape.do - Managed Scraping API
When your C# scraper can’t get past anti-bot defenses, Scrape.do handles the infrastructure so you can focus on extraction logic.
When you need it:
- Cloudflare & bot detection: Advanced WAFs detect even the best headless browser configurations. Scrape.do maintains browser fingerprints and TLS profiles that pass detection.
- IP rotation at scale: Building and maintaining proxy pools is expensive and brittle. Scrape.do rotates through 100M+ residential IPs automatically.
- Geographic targeting: Need requests from Saudi Arabia for HungerStation? Germany for MediaMarkt? The US for Best Buy? Scrape.do routes through the right region.
- JavaScript rendering: Without managing browser infrastructure, memory leaks, or zombie processes. Just set render=true and get fully-rendered HTML.
- CAPTCHA handling: Automated solving without integrating separate CAPTCHA services or managing solving credits.
- Production reliability: 99.9% uptime with automatic retries, failover, and monitoring. No 3 AM alerts about crashed crawlers.
Integration with C# is pretty straightforward:
using System.Net;
var token = "YOUR_TOKEN";
var targetUrl = "https://httpbin.co/ip";
var apiUrl = $"https://api.scrape.do/?token={token}&url={WebUtility.UrlEncode(targetUrl)}";
using var client = new HttpClient();
var response = await client.GetAsync(apiUrl);
var content = await response.Content.ReadAsStringAsync();
Console.WriteLine(content);
// Or integrate with existing libraries:
// var doc = new HtmlDocument();
// doc.LoadHtml(content);
// var data = doc.DocumentNode.SelectNodes("//div[@class='product']");
Scrape.do sits between your C# code and the target site. You still use RestSharp, AngleSharp, or HtmlAgilityPack for parsing—Scrape.do just handles the hostile infrastructure layer.