20 C# Web Scraping Libraries (What Pros Use)
Most developers start with Python for web scraping.
But C# has a quietly mature scraping ecosystem that handles production workloads better than you’d expect. It certainly exceeded my expectations when I first gave it a try.
- Strong typing catches errors before deployment. When your scraper breaks at 3 AM, LINQ queries and async/await patterns make debugging straightforward.
- Memory management handles long-running processes. C# scrapers can run for weeks without memory leaks. .NET’s garbage collector just works for this use case.
- Selenium bindings match Python’s API. Most Stack Overflow solutions translate with minor syntax changes. The ecosystem is identical.
- Windows integration requires zero configuration. Task Scheduler, Windows Services, and Active Directory authentication work out of the box. Docker Selenium Grid scales horizontally without Python’s GIL bottleneck.
I’ll cover 20 libraries I tested in production environments over the years. Some are daily drivers. Others fill specific niches.
All have proven reliable under real workloads.
HTTP Client Libraries
1. RestSharp (9.8k ⭐)
RestSharp’s been around since 2009. It’s still the most popular HTTP client in .NET because it handles the boring stuff automatically.
Fluent interface makes building requests readable. Automatic JSON/XML serialization means less boilerplate. OAuth, JWT, and Basic Auth work out of the box. Headers, cookies, parameters—all chainable.
Use RestSharp when you’re hitting APIs that return clean JSON or XML. Works great for auth flows and quick scraping tasks that don’t need DOM parsing.
Basic example:
// RestSharp v107+ syntax
var client = new RestClient("https://api.example.com");
var request = new RestRequest("/products")
.AddParameter("category", "electronics")
.AddHeader("User-Agent", "Mozilla/5.0");
var response = await client.ExecuteAsync<List<Product>>(request);
// Or use GetJsonAsync for automatic deserialization
var products = await client.GetJsonAsync<List<Product>>("/products?category=electronics");
The v107+ syntax cleaned up a lot of cruft from earlier versions. Chainable methods make code easier to read and maintain.
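The authenticators mentioned above live in RestSharp.Authenticators. A minimal sketch, assuming a JWT-protected API (the token value and endpoint are placeholders; the exact wiring moved from client.Authenticator to RestClientOptions in recent versions):
using RestSharp;
using RestSharp.Authenticators;
// Attach an authenticator once; RestSharp applies it to every request from this client.
var options = new RestClientOptions("https://api.example.com")
{
    Authenticator = new JwtAuthenticator("YOUR_JWT_TOKEN")
    // Basic auth is the same pattern: new HttpBasicAuthenticator("user", "pass")
};
var client = new RestClient(options);
var request = new RestRequest("/orders");
var response = await client.ExecuteAsync(request);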
2. Flurl (4.4k ⭐)
Flurl combines URL building with HTTP requests in one fluent chain. No more manual URL encoding. No more string concatenation hell.
Async-first from the ground up. Cancellation tokens work naturally. Mocking HTTP calls for tests is trivial. The syntax reads like English.
Reach for Flurl when you’re building complex URLs with lots of query parameters. Especially useful in projects where everything’s already async/await.
Basic example:
var products = await "https://api.example.com"
.AppendPathSegment("products")
.SetQueryParams(new { category = "electronics", limit = 50 })
.WithHeader("User-Agent", "Mozilla/5.0")
.GetJsonAsync<List<Product>>();
The fluent syntax is genuinely pleasant to write. Fewer bugs from typos and encoding mistakes.
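The painless test mocking comes from the Flurl.Http.Testing package: an HttpTest in scope intercepts every Flurl call. A minimal sketch, reusing the Product placeholder model from above:
using Flurl.Http;
using Flurl.Http.Testing;
// Any Flurl call made while HttpTest is alive gets the faked response instead of hitting the network.
using var httpTest = new HttpTest();
httpTest.RespondWithJson(new[] { new { Name = "Widget", Price = "9.99" } });
var products = await "https://api.example.com/products"
    .GetJsonAsync<List<Product>>();
// Verify the scraper issued the request we expected.
httpTest.ShouldHaveCalled("https://api.example.com/products")
    .WithVerb(HttpMethod.Get);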
HTML Parsing Engines
3. AngleSharp (5.4k ⭐)
AngleSharp is a standards-compliant HTML/CSS/SVG parser written entirely in C#. Full W3C spec implementation. DOM behaves like an actual browser.
Lots of developers start with HtmlAgilityPack and switch to AngleSharp when they hit selector limitations. The migration’s painless—both use similar DOM patterns. CSS selectors here are significantly more powerful than XPath for complex queries.
Parses HTML5, CSS3, and SVG into queryable structures. Supports full CSS selector syntax. Extensible with JavaScript execution and CSS parsing plugins. Zero external dependencies.
Best for modern websites with clean markup. If you need W3C compliance or plan to add browser-like features (JS execution, CSSOM), start with AngleSharp.
Basic example:
var config = Configuration.Default;
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));
var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
var name = product.QuerySelector("h2.title")?.TextContent;
var price = product.QuerySelector("span.price")?.TextContent;
}
The DOM API matches browser JavaScript. If you know querySelector, you already know AngleSharp.
4. HtmlAgilityPack (2.8k ⭐)
HtmlAgilityPack’s been around since 2003. It has one killer feature: parses broken HTML without throwing errors.
This is practically mandatory for real-world scraping. Websites have malformed markup everywhere—unclosed tags, mismatched attributes, complete disasters. HtmlAgilityPack handles them all. Pair it with Selenium and you’ve got the C# version of Python’s Selenium + BeautifulSoup stack (a sketch of that pairing follows the basic example below). Battle-tested. Reliable. Boring in the best way.
XPath and XSLT support out of the box. LINQ integration works naturally. The parser forgives everything.
Use HtmlAgilityPack for legacy sites with messy markup. When standards-compliant parsers fail, this one succeeds. XPath queries work better here than CSS selectors. For simple scraping where you don’t need fancy selectors, start here.
Basic example:
var web = new HtmlWeb();
var doc = web.Load("https://example.com/products");
var products = doc.DocumentNode.SelectNodes("//div[@class='product']");
foreach (var product in products)
{
var name = product.SelectSingleNode(".//h2[@class='title']")?.InnerText;
var price = product.SelectSingleNode(".//span[@class='price']")?.InnerText;
}
If the HTML looks like it was written by a drunk intern, HtmlAgilityPack still parses it.
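A quick sketch of the Selenium pairing mentioned above: the browser renders the page (JavaScript included), then HtmlAgilityPack parses the resulting HTML. URL and selectors are placeholders:
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
var chromeOptions = new ChromeOptions();
chromeOptions.AddArgument("--headless");
using var driver = new ChromeDriver(chromeOptions);
// Let Chrome do the rendering...
driver.Navigate().GoToUrl("https://example.com/products");
// ...then hand the rendered HTML to HtmlAgilityPack for parsing.
var doc = new HtmlDocument();
doc.LoadHtml(driver.PageSource);
var names = doc.DocumentNode
    .SelectNodes("//div[@class='product']//h2[@class='title']")
    ?.Select(n => n.InnerText.Trim())
    .ToList();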
5. CsQuery (1.2k ⭐)
CsQuery brings jQuery syntax to C#. If you’ve written $("div.product").find("h2") in JavaScript, you’ll recognize the API immediately.
Fair warning: CsQuery hasn’t been actively maintained since 2015. It still works for legacy projects, but for new work, AngleSharp or HtmlAgilityPack with CSS selector extensions are safer bets for long-term maintenance.
What it does:
- jQuery-style API for DOM manipulation and querying
- Full CSS2 and CSS3 selector support
- Standards-compliant parser
- Familiar syntax for JavaScript developers
When to use it:
- Maintaining existing projects that already use it
- Teams with strong JavaScript/jQuery backgrounds (though consider AngleSharp instead)
- Rapid prototyping where jQuery muscle memory accelerates development
Basic example:
CQ dom = html;
var products = dom["div.product"];
foreach (var product in products)
{
var name = dom[product].Find("h2.title").Text();
var price = dom[product].Find("span.price").Text();
}
The syntax feels natural if you’ve spent years writing jQuery selectors.
Browser Automation Tools
6. Selenium WebDriver (33.5k ⭐)
Selenium is the actual cornerstone of browser automation. Every major browser. Multiple language bindings. Massive ecosystem. The C# bindings match Python’s API almost exactly.
Stack Overflow solutions translate with minor tweaks. For production, run Docker Selenium Grid: isolated containers, built-in monitoring, horizontal scaling (a RemoteWebDriver sketch closes out this section). This setup handles Google Play Console scraping, high-volume parallel crawls, whatever you throw at it.
What it does:
- Cross-browser automation (Chrome, Firefox, Safari, Edge)
- Full control over browser actions (clicks, scrolling, form submission)
- JavaScript execution within the browser context
- Screenshot and video recording capabilities
When to use it:
- Multi-browser testing and scraping
- Sites with complex JavaScript interactions
- Projects requiring extensive community support and documentation
- Production environments where Docker Grid provides scalability
Basic example:
var options = new ChromeOptions();
options.AddArgument("--headless");
using var driver = new ChromeDriver(options);
driver.Navigate().GoToUrl("https://example.com/products");
var products = driver.FindElements(By.CssSelector("div.product"));
foreach (var product in products)
{
var name = product.FindElement(By.CssSelector("h2.title")).Text;
var price = product.FindElement(By.CssSelector("span.price")).Text;
}
// driver automatically disposed at end of using block
Selenium’s maturity means most JavaScript rendering problems have documented solutions.
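For the Docker Grid setup mentioned earlier, the only code change is pointing a RemoteWebDriver at the Grid hub instead of launching a local ChromeDriver. A minimal sketch, assuming a Grid running on the default local port:
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;
var gridOptions = new ChromeOptions();
gridOptions.AddArgument("--headless");
// The hub schedules the session on whichever Chrome node has capacity.
using var driver = new RemoteWebDriver(
    new Uri("http://localhost:4444/wd/hub"),
    gridOptions);
driver.Navigate().GoToUrl("https://example.com/products");
var products = driver.FindElements(By.CssSelector("div.product"));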
7. PuppeteerSharp (3.8k ⭐)
PuppeteerSharp is Google’s Puppeteer for .NET. High-level API over Chrome DevTools Protocol.
Authentication flows are where PuppeteerSharp shines. Multi-step logins, OAuth redirects, CAPTCHA challenges—handling these manually is painful. Replicating cookie exchanges? Nightmare. PuppeteerSharp just… logs in. Like a human would. For simple scraping, HtmlAgilityPack suffices. When login gets complex, reach for Puppeteer.
The hybrid approach: Many devs run PuppeteerSharp to render JavaScript-heavy pages, then pass the HTML to AngleSharp for parsing. Best of both worlds. Reliable rendering + powerful selectors. Particularly useful for sites with lazy loading or dynamic content that defeats static parsers (a sketch of this hybrid closes out this section).
What it does:
- Headless Chrome automation via DevTools Protocol
- Fast page navigation and JavaScript execution
- Screenshot and PDF generation
- Network interception and request mocking
When to use it:
- Chrome-specific scraping tasks
- Sites with complex authentication or login flows
- Generating PDFs or full-page screenshots
- Projects requiring DevTools Protocol features
- Hybrid setups where you render with Puppeteer and parse with AngleSharp
Basic example:
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();
await page.GoToAsync("https://example.com/products");
var products = await page.QuerySelectorAllAsync("div.product");
foreach (var product in products)
{
var name = await product.QuerySelectorAsync("h2.title");
// Guard the null case: awaiting a null task would throw at runtime.
var nameText = name != null
    ? await name.EvaluateFunctionAsync<string>("el => el.textContent")
    : null;
}
// browser automatically disposed
If Chrome is your target browser, PuppeteerSharp offers better performance than Selenium.
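A sketch of that hybrid, assuming a hypothetical login form with #username, #password, and #login selectors: PuppeteerSharp handles the login and rendering, AngleSharp handles the parsing:
using AngleSharp;
using PuppeteerSharp;
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
var page = await browser.NewPageAsync();
// Log in the way a human would: fill the form, click, wait for the redirect.
await page.GoToAsync("https://example.com/login");
await page.TypeAsync("#username", "user@example.com");
await page.TypeAsync("#password", "hunter2");
await Task.WhenAll(
    page.WaitForNavigationAsync(),
    page.ClickAsync("#login"));
// Grab the fully rendered HTML and hand it to AngleSharp for parsing.
await page.GoToAsync("https://example.com/products");
var html = await page.GetContentAsync();
var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(html));
foreach (var product in document.QuerySelectorAll("div.product"))
{
    var name = product.QuerySelector("h2.title")?.TextContent;
}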
8. Playwright for .NET (2.8k ⭐)
Playwright is Microsoft’s modern answer to browser automation. It supports Chromium, Firefox, and WebKit with a unified API and built-in reliability features.
What it does:
- Cross-browser automation with a single API
- Auto-waiting for elements (no manual waits needed)
- Network interception and mocking
- Built-in tracing and debugging tools
When to use it:
- Modern scraping projects starting from scratch
- Multi-browser testing across rendering engines
- Projects requiring reliable element handling without manual waits
Basic example:
using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.LaunchAsync(new() { Headless = true });
var page = await browser.NewPageAsync();
await page.GotoAsync("https://example.com/products");
var products = await page.QuerySelectorAllAsync("div.product");
foreach (var product in products)
{
var name = await product.QuerySelectorAsync("h2.title");
// Guard the null case: awaiting a null task would throw at runtime.
var nameText = name != null ? await name.TextContentAsync() : null;
}
// browser and playwright automatically disposed
Playwright’s auto-waiting eliminates the brittle Thread.Sleep() calls that plague Selenium scripts.
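That auto-waiting is easiest to see with the Locator API: actions on a locator retry until the element is ready, so there’s no polling code to write. A short sketch (selectors are placeholders):
using Microsoft.Playwright;
using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.LaunchAsync(new() { Headless = true });
var page = await browser.NewPageAsync();
await page.GotoAsync("https://example.com/products");
// Locators are lazy; actions on them wait for the element to be attached and actionable.
var titles = page.Locator("div.product h2.title");
var count = await titles.CountAsync();
for (var i = 0; i < count; i++)
{
    var name = await titles.Nth(i).TextContentAsync();
}
// Clicking pagination also auto-waits for the button to become clickable.
await page.Locator("button.next-page").ClickAsync();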
CSS Selector & Query Tools
These libraries add CSS selector support to parsers that don’t have it natively, or provide jQuery-like querying on top of existing DOM structures.
9. Fizzler (135 ⭐)
Fizzler implements W3C CSS selectors for HtmlAgilityPack. If you prefer CSS selectors over XPath but want to use HtmlAgilityPack’s robust parsing, Fizzler bridges that gap.
What it does:
- W3C CSS3 selector engine
- Integrates directly with HtmlAgilityPack
- Converts CSS selectors to XPath internally
When to use it:
- HtmlAgilityPack projects requiring CSS selector syntax
- Teams more comfortable with CSS than XPath
- Migrating from other parsers that use CSS selectors
Basic example:
var web = new HtmlWeb();
var doc = web.Load("https://example.com/products");
var products = doc.DocumentNode.QuerySelectorAll("div.product");
foreach (var product in products)
{
var name = product.QuerySelector("h2.title")?.InnerText;
}
This gives you modern selector syntax with HtmlAgilityPack’s battle-tested parser.
10. ScrapySharp (353 ⭐)
ScrapySharp wraps HtmlAgilityPack with a higher-level API inspired by Python’s Scrapy framework. It adds browser simulation, form handling, and CSS selectors with LINQ integration.
What it does:
- Simulates browser behavior (cookies, sessions, redirects)
- CSS selectors with LINQ queries
- Form submission and interaction
- Built on HtmlAgilityPack’s reliable parser
When to use it:
- Form submissions and multi-step scraping flows
- Session management across multiple requests
- Projects requiring both parsing and navigation
Basic example:
var browser = new ScrapingBrowser();
var page = browser.NavigateToPage(new Uri("https://example.com/products"));
var products = page.Html.CssSelect("div.product");
foreach (var product in products)
{
var name = product.CssSelect("h2.title").FirstOrDefault()?.InnerText;
}
ScrapySharp handles cookies and session state automatically, reducing boilerplate.
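For the form-handling side, here’s a sketch of a login submission. It assumes ScrapySharp’s PageWebForm API (FindFormById, indexer assignment, Submit) and a hypothetical form with id "login"; treat the field names as placeholders:
using ScrapySharp.Extensions;
using ScrapySharp.Network;
var browser = new ScrapingBrowser();
var loginPage = browser.NavigateToPage(new Uri("https://example.com/login"));
// Fill and submit the form; the browser keeps the resulting cookies for later requests.
var form = loginPage.FindFormById("login");
form["username"] = "user@example.com";
form["password"] = "hunter2";
var resultPage = form.Submit();
// The same browser instance now carries the authenticated session.
var products = browser.NavigateToPage(new Uri("https://example.com/products"))
    .Html.CssSelect("div.product");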
Full Crawling Frameworks
When you need to scrape hundreds of pages or entire sites, these frameworks provide the infrastructure for distributed, multi-threaded crawling operations.
11. DotnetSpider (4.1k ⭐)
DotnetSpider is a high-performance web crawling and scraping framework modeled after Java’s WebMagic. It supports parallel crawling, data pipelines, and distributed architectures.
What it does:
- Multi-threaded crawling with configurable parallelism
- Data extraction pipelines (parse → validate → transform → store)
- Distributed crawling across multiple machines
- Built-in support for databases and message queues
When to use it:
- Large-scale scraping projects (thousands of pages)
- Projects requiring ETL pipelines
- Distributed scraping across multiple servers
Basic example:
class ProductSpider : Spider
{
protected override void Initialize()
{
AddRequest("https://example.com/products");
}
protected override void OnResponse(Response response)
{
var products = response.Document.QuerySelectorAll("div.product");
foreach (var product in products)
{
AddDataItem(new
{
Name = product.QuerySelector("h2.title")?.TextContent,
Price = product.QuerySelector("span.price")?.TextContent
});
}
}
}
DotnetSpider handles URL deduplication, retry logic, and rate limiting automatically.
12. Abot (2.3k ⭐)
Abot is a fast, flexible web crawler with an event-driven architecture. You hook into crawl events (page crawled, link found, error occurred) and write custom logic for each.
What it does:
- Multi-threaded crawling with configurable thread count
- Event-driven architecture (subscribe to crawl lifecycle events)
- Politeness features (crawl delays, robots.txt respect)
- Extensible through custom implementations
When to use it:
- Projects requiring fine-grained control over crawl behavior
- Event-driven architectures
- Crawling with custom logic per event
Basic example:
var config = new CrawlConfiguration
{
MaxConcurrentThreads = 10,
MaxPagesToCrawl = 1000
};
var crawler = new PoliteWebCrawler(config);
crawler.PageCrawlCompleted += (sender, args) =>
{
var products = args.CrawledPage.AngleSharpHtmlDocument.QuerySelectorAll("div.product");
foreach (var product in products)
{
var name = product.QuerySelector("h2.title")?.TextContent;
}
};
await crawler.CrawlAsync(new Uri("https://example.com"));
Abot’s event model gives you precise control over what happens at each crawl stage.
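For finer-grained control than the events alone, Abot also accepts custom crawl decisions. A hedged sketch using Abot 2.x’s ShouldCrawlPageDecisionMaker hook (the /products filter is a placeholder):
using Abot2.Crawler;
using Abot2.Poco;
var config = new CrawlConfiguration
{
    MaxConcurrentThreads = 10,
    MaxPagesToCrawl = 1000,
    MinCrawlDelayPerDomainMilliSeconds = 500 // politeness delay between requests
};
var crawler = new PoliteWebCrawler(config);
// Only follow links under /products; everything else gets skipped with a reason.
crawler.ShouldCrawlPageDecisionMaker = (pageToCrawl, crawlContext) =>
    pageToCrawl.Uri.AbsolutePath.StartsWith("/products")
        ? new CrawlDecision { Allow = true }
        : new CrawlDecision { Allow = false, Reason = "Outside product catalog" };
await crawler.CrawlAsync(new Uri("https://example.com"));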
13. InfinityCrawler (251 ⭐)
InfinityCrawler emphasizes polite, ethical crawling. It respects robots.txt, parses sitemaps, and auto-throttles requests to avoid overwhelming target servers.
What it does:
- Automatic robots.txt parsing and respect
- Sitemap discovery and parsing
- Auto-throttling based on server response times
- Async crawling with configurable concurrency
When to use it:
- Ethical crawling where respecting site rules is critical
- Large sites with comprehensive sitemaps
- Projects requiring automatic politeness features
Basic example:
var crawler = new Crawler(new Uri("https://example.com"), async (crawledUri, response) =>
{
var doc = await response.Content.ReadAsStringAsync();
// Parse and extract data
},
politeDelay: TimeSpan.FromSeconds(1));
await crawler.CrawlAsync();
InfinityCrawler handles the ethics side of scraping automatically.
14. NCrawler (157 ⭐)
NCrawler is a pipeline-based crawler that processes different document types (HTML, PDFs, Word docs) through configurable pipeline steps.
What it does:
- Pipeline architecture (download → parse → extract → store)
- Multi-threaded with customizable pipelines
- Support for various document formats
- Extensible through custom pipeline steps
When to use it:
- Scraping non-HTML documents (PDFs, Office files)
- Projects requiring custom processing pipelines
- Multi-format data extraction
Basic example:
var crawler = new Crawler(
new Uri("https://example.com"),
new HtmlDocumentProcessor(),
new MyCustomPipeline());
crawler.MaximumThreadCount = 5;
crawler.Crawl();
NCrawler excels when you’re extracting data from diverse document types.
Content Extraction & Post-Processing
15. SmartReader (175 ⭐)
SmartReader implements Mozilla’s Readability algorithm in C#. It extracts the main article content from web pages, stripping away navigation, ads, and sidebars.
What it does:
- Article content extraction using readability algorithm
- Metadata extraction (author, publication date, reading time)
- Removes boilerplate HTML (nav, footer, ads)
- Returns clean, readable content
When to use it:
- News sites and blog scraping
- Article archiving and content migration
- Projects requiring main content without clutter
Basic example:
var uri = new Uri("https://example.com/article");
var article = await Reader.ParseArticleAsync(uri.AbsoluteUri);
Console.WriteLine($"Title: {article.Title}");
Console.WriteLine($"Author: {article.Byline}");
Console.WriteLine($"Published: {article.PublicationDate}");
Console.WriteLine($"Reading time: {article.TimeToRead} minutes");
Console.WriteLine($"Content: {article.TextContent}");
SmartReader saves you from manually identifying and stripping page chrome.
16. ReverseMarkdown.Net (342 ⭐)
ReverseMarkdown.Net converts HTML into Markdown. If you’re migrating content to Markdown-based systems or building documentation from scraped HTML, this handles the conversion.
What it does:
- Converts HTML elements to Markdown syntax
- Handles nested lists, tables, and code blocks
- Preserves links and image references
- Configurable conversion rules
When to use it:
- Content migration to Markdown-based CMS
- Documentation generation from HTML sources
- Archiving web content in a readable format
Basic example:
var converter = new Converter();
var markdown = converter.Convert(html);
File.WriteAllText("output.md", markdown);
This bridges the gap between HTML scraping and Markdown publishing workflows.
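The conversion rules live on a Config object passed to the Converter. A hedged sketch of the options most useful for scraped HTML (option names as documented by ReverseMarkdown; treat the exact values as assumptions):
using ReverseMarkdown;
var config = new Config
{
    // Emit GitHub-flavored Markdown (tables, fenced code blocks).
    GithubFlavored = true,
    // Scraped pages contain tags Markdown can't represent; drop them instead of failing.
    UnknownTags = Config.UnknownTagsOption.Drop,
    // Strip HTML comments left behind by CMSs and ad scripts.
    RemoveComments = true
};
var converter = new Converter(config);
var markdown = converter.Convert(html); // html = previously scraped page content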
Extensions & Specialized Add-ons
17. AngleSharp.Js (109 ⭐)
AngleSharp.Js integrates the Jint JavaScript engine into AngleSharp, allowing you to execute JavaScript within the DOM context.
What it does:
- Embeds Jint JavaScript engine
- ES5 JavaScript execution
- DOM script evaluation
- JavaScript event handling
When to use it:
- Sites requiring simple JavaScript execution for data exposure
- Projects already using AngleSharp
- Lightweight JavaScript evaluation without a full browser
Basic example:
var config = Configuration.Default
.WithJs();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));
// JavaScript in the HTML will now execute
var result = document.QuerySelector("#dynamic-content")?.TextContent;
This gives you JavaScript execution without spinning up a full browser.
18. AngleSharp.Css (82 ⭐)
AngleSharp.Css adds a complete CSS Object Model to AngleSharp, including computed styles, media queries, and responsive design features.
What it does:
- Full CSSOM implementation
- Computed style values
- Media query evaluation
- CSS parsing and validation
When to use it:
- Scraping sites where CSS affects rendered content
- Analyzing responsive design behavior
- Projects requiring computed CSS values
Basic example:
var config = Configuration.Default
.WithCss();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));
var element = document.QuerySelector("div.product");
var computedStyle = element.ComputeCurrentStyle();
var backgroundColor = computedStyle.GetPropertyValue("background-color");
This unlocks CSS-aware scraping scenarios.
19. HtmlAgilityPack.CssSelectors.NetCore (35 ⭐)
This extension adds modern CSS selector methods directly to HtmlAgilityPack’s nodes, giving you QuerySelector and QuerySelectorAll methods.
What it does:
- CSS selector methods for HtmlAgilityPack
- QuerySelector and QuerySelectorAll APIs
- Familiar syntax from browser DOM APIs
When to use it:
- HtmlAgilityPack projects requiring CSS selectors
- Migrating from browser-based scraping to server-side
- Teams familiar with DOM APIs
Basic example:
var doc = new HtmlDocument();
doc.LoadHtml(html);
var products = doc.DocumentNode.QuerySelectorAll("div.product");
foreach (var product in products)
{
var name = product.QuerySelector("h2.title")?.InnerText;
}
This modernizes HtmlAgilityPack without switching parsers.
20. Flurl.Http.Xml (Niche extension)
Flurl.Http.Xml extends Flurl with XML serialization support, adding GetXmlAsync and PostXmlAsync methods for XML-based APIs.
What it does:
- XML request/response handling for Flurl
- Automatic XML serialization/deserialization
- Fluent syntax for XML endpoints
When to use it:
- SOAP API scraping
- XML-based web services
- Legacy systems that return XML instead of JSON
Basic example:
var response = await "https://api.example.com/products"
.WithHeader("User-Agent", "Mozilla/5.0")
.GetXmlAsync<ProductList>();
This rounds out Flurl’s format support for older web services.
Bonus: When Local Libraries Aren’t Enough
You’ve set up the libraries. Written the scraper. Parsed the data successfully.
Then Cloudflare blocks everything.
A $#@&ing bummer…
These aren’t library problems. They’re infrastructure problems.
Scrape.do - Managed Scraping API
When your C# scraper can’t get past anti-bot defenses, Scrape.do handles the infrastructure so you can focus on extraction logic.
When you need it:
- Cloudflare & bot detection: Advanced WAFs detect even the best headless browser configurations. Scrape.do maintains browser fingerprints and TLS profiles that pass detection.
- IP rotation at scale: Building and maintaining proxy pools is expensive and brittle. Scrape.do rotates through 100M+ residential IPs automatically.
- Geographic targeting: Need requests from Saudi Arabia for HungerStation? Germany for MediaMarkt? The US for Best Buy? Scrape.do routes through the right region.
- JavaScript rendering: Without managing browser infrastructure, memory leaks, or zombie processes. Just set render=true and get fully-rendered HTML.
- CAPTCHA handling: Automated solving without integrating separate CAPTCHA services or managing solving credits.
- Production reliability: 99.9% uptime with automatic retries, failover, and monitoring. No 3 AM alerts about crashed crawlers.
Integration with C# is pretty straightforward:
using System.Net;
var token = "YOUR_TOKEN";
var targetUrl = "https://httpbin.co/ip";
var apiUrl = $"https://api.scrape.do/?token={token}&url={WebUtility.UrlEncode(targetUrl)}";
using var client = new HttpClient();
var response = await client.GetAsync(apiUrl);
var content = await response.Content.ReadAsStringAsync();
Console.WriteLine(content);
// Or integrate with existing libraries:
// var doc = new HtmlDocument();
// doc.LoadHtml(content);
// var data = doc.DocumentNode.SelectNodes("//div[@class='product']");
Scrape.do sits between your C# code and the target site. You still use RestSharp, AngleSharp, or HtmlAgilityPack for parsing—Scrape.do just handles the hostile infrastructure layer.