WebReaper 9.0.0
dotnet add package WebReaper --version 9.0.0
NuGet\Install-Package WebReaper -Version 9.0.0
<PackageReference Include="WebReaper" Version="9.0.0" />
<PackageVersion Include="WebReaper" Version="9.0.0" />
<PackageReference Include="WebReaper" />
paket add WebReaper --version 9.0.0
#r "nuget: WebReaper, 9.0.0"
#:package WebReaper@9.0.0
#addin nuget:?package=WebReaper&version=9.0.0
#tool nuget:?package=WebReaper&version=9.0.0

WebReaper
Overview
WebReaper is a declarative, high-performance web scraper, crawler and parser in C#. Crawl any web site, parse the data, and save the structured result to a file, a database, or pretty much anywhere you want — with a simple, extensible fluent API.
As of 7.0.0 the core WebReaper package is dependency-light, Native-AOT-ready and Newtonsoft-free:
a plain HTTP → file crawl pulls only AngleSharp, Microsoft.Extensions.* and Polly. Heavier capabilities
(headless browser, MongoDB, Redis, Azure Cosmos DB, Azure Service Bus, SQLite-backed local durable
scheduler/tracker) ship as optional satellite packages you add only when you need them — see
Packages.
Quick start
dotnet add package WebReaper
using WebReaper.Builders;
var engine = await new ScraperEngineBuilder()
    .Get("https://www.alexpavlov.dev/blog")
    .Follow("a.text-gray-900.transition")
    .Parse(new()
    {
        new("title", ".text-3xl.font-bold"),
        new("text", ".max-w-max.prose.prose-dark")
    })
    .WriteToJsonFile("output.json")
    .PageCrawlLimit(10)
    .WithParallelismDegree(30)
    .LogToConsole()
    .BuildAsync();
await engine.RunAsync();
That example is pure HTTP — no browser, no extra packages. For JavaScript-rendered pages, add
WebReaper.Puppeteer (see Parsing dynamic pages).
Install
Core package — HTTP crawling/parsing, in-memory and file-backed state, Console/CSV/JSON-Lines sinks:
dotnet add package WebReaper
Add a satellite only for the capability you need (each brings its own SDK so the core stays light):
dotnet add package WebReaper.Puppeteer # headless-browser (SPA / JS) pages
dotnet add package WebReaper.Mongo # MongoDB sink + config/cookie storage
dotnet add package WebReaper.Redis # Redis scheduler, tracker, sink, storage
dotnet add package WebReaper.AzureServiceBus # Azure Service Bus distributed scheduler
dotnet add package WebReaper.Cosmos # Azure Cosmos DB sink
dotnet add package WebReaper.Sqlite # SQLite local durable scheduler + visited-link tracker
Packages
The core and the original satellite set were released in lockstep at 7.0.0; WebReaper.Sqlite was
added afterwards as 7.1.0 (it depends on core 7.0.0). The core itself is now at 9.0.0. All packages
are GPL-3.0-or-later, and every satellite wires itself in through the builder's public registration seam.
| Package | Add it for | Key builder calls |
|---|---|---|
| WebReaper | Core. HTTP crawl/parse, in-memory + file scheduler / visited-link tracker / cookie & config storage, Console / CSV / JSON-Lines sinks. Dependency-light, Native-AOT-ready, Newtonsoft-free. | .Get(...) .Follow(...) .Paginate(...) .Parse(...) .WriteToJsonFile(...) .WriteToCsvFile(...) .WriteToConsole() |
| WebReaper.Puppeteer | Headless-browser loading of SPA / JavaScript pages | .WithPuppeteerPageLoader() + GetWithBrowser / FollowWithBrowser / PaginateWithBrowser |
| WebReaper.Mongo | MongoDB result sink and MongoDB-backed config / cookie storage | .WriteToMongoDb(...) .WithMongoDbConfigStorage(...) .WithMongoDbCookieStorage(...) |
| WebReaper.Redis | Redis scheduler, visited-link tracker, result sink, config / cookie storage | .WithRedisScheduler(...) .TrackVisitedLinksInRedis(...) .WriteToRedis(...) .WithRedisConfigStorage(...) .WithRedisCookieStorage(...) |
| WebReaper.AzureServiceBus | Distributed scheduler over an Azure Service Bus queue | .WithAzureServiceBusScheduler(...) |
| WebReaper.Cosmos | Azure Cosmos DB result sink | .WriteToCosmosDb(...) |
| WebReaper.Sqlite | Local durable scheduler & visited-link tracker on an embedded SQLite store — resume is a query, no position file. Opt-in robust-local tier (no server, unlike Redis). | .WithSqliteScheduler(...) .TrackVisitedLinksInSqlite(...) |
The core default page loader is HTTP-only. Crawling a dynamic page (GetWithBrowser / FollowWithBrowser /
PaginateWithBrowser) without WebReaper.Puppeteer registered throws an InvalidOperationException telling
you to add the package and call .WithPuppeteerPageLoader().
Requirements
.NET 10. The core package is IsAotCompatible — it Native-AOT-publishes with zero trim/AOT warnings
(proven by the AOT smoke test in CI). Satellites carry their own SDK dependencies and are not AOT-clean by
design; reference one only when you use it.
Features
- ⚡ High crawling speed through parallelism and asynchrony
- 🗒 Declarative and easy to use
- 🪶 Dependency-light, Native-AOT-ready, Newtonsoft-free core
- 💾 Console, CSV and JSON-Lines sinks out of the box; MongoDB, Redis and Azure Cosmos DB via satellites
- 🌎 Scalable: run on cloud VMs, serverless functions or on-prem; go distributed with Redis or Azure Service Bus
- 🐙 Crawl and parse Single Page Applications with Puppeteer (WebReaper.Puppeteer)
- 🖥 Proxy support
- 🌀 Extensible: replace any out-of-the-box seam with your own implementation
Usage examples
- Data mining
- Gathering data for machine learning
- Online price-change monitoring and price comparison
- News aggregation
- Product-review scraping (to watch the competition)
- Tracking online presence and reputation
API overview
Parsing dynamic pages (SPA)
Parsing Single Page Applications is simple: use GetWithBrowser and/or FollowWithBrowser, add the
WebReaper.Puppeteer package, and register it with .WithPuppeteerPageLoader(). Puppeteer then loads
those pages in a headless browser.
dotnet add package WebReaper.Puppeteer
using WebReaper.Builders;
using WebReaper.Puppeteer;
var engine = await new ScraperEngineBuilder()
    .GetWithBrowser("https://www.alexpavlov.dev/blog")
    .FollowWithBrowser("a.text-gray-900.transition")
    .Parse(new()
    {
        new("title", ".text-3xl.font-bold"),
        new("text", ".max-w-max.prose.prose-dark")
    })
    .WithPuppeteerPageLoader()
    .WriteToJsonFile("output.json")
    .PageCrawlLimit(10)
    .WithParallelismDegree(30)
    .LogToConsole()
    .BuildAsync();
await engine.RunAsync();
.WithPuppeteerPageLoader() is parameterless and reproduces the pre-7.0 behaviour exactly (one shared
cookie container, and any configured proxy applied through the browser's own mechanism). The first
dynamic-page run downloads Chromium via Puppeteer.
Running JavaScript / page actions
You can run JavaScript and drive the page as it loads in the headless browser. Pass an actions lambda
(e.g. .ScrollToEnd()) — useful when the content you need appears only after clicks, scrolls, etc.
using WebReaper.Builders;
using WebReaper.Puppeteer;
var engine = await new ScraperEngineBuilder()
    .GetWithBrowser("https://www.reddit.com/r/dotnet/", actions => actions
        .ScrollToEnd()
        .Build())
    .Follow("a.SQnoC3ObvgnGjWt90zD9Z._2INHSNB8V5eaWp4P0rY_mE")
    .Parse(new()
    {
        new("title", "._eYtD2XCVieq6emjKBH3m"),
        new("text", "._3xX726aBn29LDbsDtzr_6E._1Ap4F5maDtT1E1YuCiaO0r.D3IL3FD0RFy_mkKLPwL4")
    })
    .WithPuppeteerPageLoader()
    .WriteToJsonFile("output.json")
    .LogToConsole()
    .BuildAsync();
await engine.RunAsync();
Console.ReadLine();
PageActionBuilder exposes Click, Wait, ScrollToEnd, WaitForSelector, WaitForNetworkIdle,
EvaluateExpression, Repeat/RepeatWithDelay, and Build().
Persist the progress locally
To persist the job queue and visited links locally — so you can resume where you left off — use
WithTextFileScheduler and TrackVisitedLinksInFile:
using WebReaper.Builders;
// `logger` (a logger instance) and `blackList` (the URLs to skip) are assumed to be defined by your app.
var engine = await new ScraperEngineBuilder()
    .WithLogger(logger)
    .Get("https://rutracker.org/forum/index.php?c=33")
    .Follow("#cf-33 .forumlink>a")
    .Follow(".forumlink>a")
    .Paginate("a.torTopic", ".pg")
    .Parse(new()
    {
        new("name", "#topic-title"),
        new("category", "td.nav.t-breadcrumb-top.w100.pad_2>a:nth-child(3)"),
        new("subcategory", "td.nav.t-breadcrumb-top.w100.pad_2>a:nth-child(5)"),
        new("torrentSize", "div.attach_link.guest>ul>li:nth-child(2)"),
        new("torrentLink", ".magnet-link", "href"),   // third argument: read this attribute instead of the text
        new("coverImageUrl", ".postImg", "src")
    })
    .WriteToJsonFile("result.json")
    .IgnoreUrls(blackList)
    .WithTextFileScheduler("jobs.txt", "currentJob.txt")   // job file + current-position file
    .TrackVisitedLinksInFile("links.txt")
    .BuildAsync();
await engine.RunAsync();
The file scheduler is the zero-dependency persistent option in core: an append-only job file, a 300 ms
poll and a sidecar position file. For a long single-machine crawl that must survive kill -9 and resume by
query, without standing up a Redis server, add the WebReaper.Sqlite satellite and swap the two
local backends. "Resume" becomes a SELECT over an indexed table; there is no position file to keep
in sync (the visited-link table is the set, with no in-memory mirror). The core file adapters are
unchanged and remain the default persistent option; SQLite is opt-in:
using WebReaper.Builders;
using WebReaper.Sqlite; // dotnet add package WebReaper.Sqlite
var engine = await new ScraperEngineBuilder()
    .Get("https://rutracker.org/forum/index.php?c=33")
    .Follow(".forumlink>a")
    .Paginate("a.torTopic", ".pg")
    .Parse(new() { new("name", "#topic-title") })
    .WriteToJsonFile("result.json")
    .WithSqliteScheduler("crawl/state.db")        // resume is a query, not a position file
    .TrackVisitedLinksInSqlite("crawl/state.db")  // the table is the set
    .BuildAsync();
await engine.RunAsync();
Pass dataCleanupOnStart: true to either call to start a fresh crawl (clears that table at start).
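To make the file scheduler's resume mechanics concrete (an append-only job file plus a sidecar position file), here is a minimal sketch. `AppendOnlyJobQueue` is a hypothetical type written for illustration only: it is not WebReaper's file scheduler (which is internal as of 9.0.0), and it leaves out the 300 ms polling and any cross-process locking.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical illustration of an append-only job file with a sidecar
// position file. NOT WebReaper's internal file scheduler.
public sealed class AppendOnlyJobQueue
{
    private readonly string _jobsPath;     // one job per line, only ever appended to
    private readonly string _positionPath; // index of the next unread line

    public AppendOnlyJobQueue(string jobsPath, string positionPath)
    {
        _jobsPath = jobsPath;
        _positionPath = positionPath;
    }

    // Enqueue only appends; earlier lines are never rewritten.
    public void Enqueue(string job) => File.AppendAllLines(_jobsPath, new[] { job });

    // Reads the line at the stored position, then advances the sidecar file.
    // A restarted process constructed with the same paths resumes from there.
    public bool TryDequeue(out string? job)
    {
        job = null;
        if (!File.Exists(_jobsPath)) return false;

        int position = File.Exists(_positionPath)
            ? int.Parse(File.ReadAllText(_positionPath))
            : 0;

        string? next = File.ReadLines(_jobsPath).Skip(position).FirstOrDefault();
        if (next is null) return false; // caught up: no unread jobs

        job = next;
        File.WriteAllText(_positionPath, (position + 1).ToString());
        return true;
    }
}
```

Because this sketch advances the position before the job is processed, a crash right after a dequeue loses that job (at-most-once). Keeping two files consistent like this is the bookkeeping the SQLite backend replaces with a query over one table.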
Authorization
If the site needs authorization, call SetCookies and fill the CookieContainer with the cookies
required. You perform the login yourself; WebReaper only uses the cookies you provide.
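using System.Net;
using WebReaper.Builders;
var engine = await new ScraperEngineBuilder()
    .WithLogger(logger) // `logger` is assumed to be defined by your app
    .Get("https://rutracker.org/forum/index.php?c=33")
    .SetCookies(cookies =>
    {
        // CookieContainer.Add(Cookie) throws if the cookie has no domain, so set path and domain
        cookies.Add(new Cookie("AuthToken", "123", "/", "rutracker.org"));
    })
    // ...
    .BuildAsync();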
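The container SetCookies hands you is the standard System.Net.CookieContainer, and CookieContainer.Add(Cookie) throws ArgumentException when the cookie's Domain is null or empty, so give every cookie a domain (and path). The stdlib-only sketch below checks which Cookie header a URL would receive before you wire the cookies into WebReaper; `CookieCheck` is a hypothetical helper, and the token and domain are the placeholder values from the example.

```csharp
using System;
using System.Net;

// Hypothetical helper for checking cookies outside WebReaper.
public static class CookieCheck
{
    // Fills a container the same way a SetCookies callback would.
    public static CookieContainer BuildAuthCookies()
    {
        var cookies = new CookieContainer();
        // name, value, path, domain: Add(Cookie) rejects an empty domain
        cookies.Add(new Cookie("AuthToken", "123", "/", "rutracker.org"));
        return cookies;
    }

    // Returns the Cookie header value that would be sent to the given URL.
    public static string HeaderFor(CookieContainer cookies, string url) =>
        cookies.GetCookieHeader(new Uri(url));
}
```

For example, `CookieCheck.HeaderFor(CookieCheck.BuildAuthCookies(), "https://rutracker.org/forum/index.php?c=33")` should contain `AuthToken=123`; an empty result usually means the domain or path does not match the crawled URL.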
How to disable headless mode
When scraping with a browser (GetWithBrowser / FollowWithBrowser, via WebReaper.Puppeteer) the
default is headless — you don't see the browser. Seeing it can help with debugging; disable headless mode
with .HeadlessMode(false):
using WebReaper.Builders;
using WebReaper.Puppeteer;
var engine = await new ScraperEngineBuilder()
    .GetWithBrowser("https://www.reddit.com/r/dotnet/", actions => actions
        .ScrollToEnd()
        .Build())
    .HeadlessMode(false)
    .WithPuppeteerPageLoader()
    // ...
    .BuildAsync();
Cleaning data from a previous run
To start fresh, pass dataCleanupOnStart: true to the relevant builder method.
// Result file — note: WriteToJsonFile already defaults dataCleanupOnStart to TRUE
.WriteToJsonFile("output.json", dataCleanupOnStart: true)
// Visited-link tracker
.TrackVisitedLinksInFile("visited.txt", dataCleanupOnStart: true)
// Job queue / scheduler
.WithTextFileScheduler("jobs.txt", "currentJob.txt", dataCleanupOnStart: true)
The dataCleanupOnStart parameter exists on the satellite sinks too (e.g. WriteToMongoDb,
WriteToRedis, WriteToCosmosDb). Note WriteToJsonFile defaults it to true (it wipes the file on
start) — the opposite of the other sinks, which default to false. The "JSON" file sink writes
JSON Lines (one compact JSON object per line), not a JSON array.
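Because the "JSON" file sink writes JSON Lines rather than a single array, a whole-file JSON parse fails on the second record; read the output one line at a time instead. A minimal stdlib sketch, assuming each non-blank line holds one JSON object; `JsonLinesReader` is a hypothetical helper, not part of WebReaper.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json.Nodes;

// Hypothetical reader for a JSON Lines file such as the one WriteToJsonFile produces.
public static class JsonLinesReader
{
    // Streams the file and yields one JsonObject per non-blank line.
    public static IEnumerable<JsonObject> Read(string path)
    {
        foreach (var line in File.ReadLines(path))
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            // Each line is one complete JSON value, so it parses on its own.
            yield return JsonNode.Parse(line)!.AsObject();
        }
    }
}
```

Streaming with File.ReadLines keeps memory flat for large crawls; call `.ToList()` only if you really need every record at once.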
Distributed and serverless scraping
Swap the scheduler, config storage and link tracker to Redis or Azure Service Bus and multiple
workers / serverless functions can share one crawl. Examples/WebReaper.AzureFuncs shows the serverless
shape with two functions:
- StartScraping builds the scraper configuration, seeds the distributed Outstanding-work latch, and enqueues the first job (the start URL) onto the queue (e.g. Azure Service Bus).
- WebReaperSpider is the distributed Crawl driver, triggered by each queued job. It gets a bare ISpider from new ScraperEngineBuilder()...BuildSpider() (load → Crawl step → JobReport), then interprets the report: an atomic visited-link test-and-set gates duplicates/redeliveries, a parsed page is fanned out to the sink, discovered child jobs are enqueued back onto the queue, and a distributed Outstanding-work latch detects when all work has drained. It never throws to signal the crawl limit, so the queue is never poisoned (ADR-0022).
BuildSpider() (the ADR-0009 distributed-worker seam) returns an ISpider without building or
persisting a ScraperConfig, so — unlike BuildAsync() — it does not require Get/Parse; the worker's
config is persisted separately and read from storage at crawl time. See also
Examples/WebReaper.DistributedScraperWorkerService.
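The duplicate/redelivery gate works only if marking a URL as visited is an atomic test-and-set: of all concurrent attempts to record the same URL, exactly one may win. A minimal in-process sketch using ConcurrentDictionary.TryAdd; `VisitedGate` and `GateDemo` are hypothetical names, not WebReaper's tracker, and a distributed tracker needs the shared store's own atomic operation instead (the 8.0.0 release notes cite Redis SADD).

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-process stand-in for an atomic visited-link test-and-set.
public sealed class VisitedGate
{
    private readonly ConcurrentDictionary<string, byte> _visited = new();

    // True only for the first caller that records this URL; duplicates and
    // redelivered messages get false and should skip the work.
    public bool TryAddVisited(string url) => _visited.TryAdd(url, 0);
}

public static class GateDemo
{
    // Simulates `deliveries` concurrent workers receiving the same job and
    // returns how many of them were allowed to process it.
    public static int CountWinners(string url, int deliveries)
    {
        var gate = new VisitedGate();
        int winners = 0;
        Parallel.For(0, deliveries, _ =>
        {
            if (gate.TryAddVisited(url))
                Interlocked.Increment(ref winners);
        });
        return winners;
    }
}
```

A check-then-add (Contains followed by Add) would let two workers both see "not visited" and both crawl the page; TryAdd folds the check and the write into one step, so `GateDemo.CountWinners(url, 64)` returns 1.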
Storage and scheduler backends
Every backend is a swappable seam. In-memory is the default; file-backed lives in core; the rest come from satellites.
| Seam | Core (in-memory default + file) | Satellite options |
|---|---|---|
| Scheduler | in-memory, WithTextFileScheduler | WithSqliteScheduler (SQLite, local durable), WithRedisScheduler (Redis), WithAzureServiceBusScheduler (Azure Service Bus) |
| Visited-link tracker | in-memory, TrackVisitedLinksInFile | TrackVisitedLinksInSqlite (SQLite, local durable), TrackVisitedLinksInRedis (Redis) |
| Config storage | in-memory, WithFileConfigStorage | WithMongoDbConfigStorage, WithRedisConfigStorage |
| Cookie storage | in-memory, WithFileCookieStorage | WithMongoDbCookieStorage, WithRedisCookieStorage |
| Result sink | WriteToConsole, WriteToCsvFile, WriteToJsonFile | WriteToMongoDb, WriteToRedis, WriteToCosmosDb |
| Page loader | HTTP (default) | WithPuppeteerPageLoader() (headless browser) |
Extensibility: adding a sink
Out of the box the core package sends parsed data to the Console, CSV and JSON-Lines sinks; MongoDB,
Redis and Cosmos DB sinks come from satellites. Add your own by implementing IScraperSink:
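// For reference: the interface as WebReaper declares it (implement it; don't redeclare it).
// IScraperSink lives in WebReaper.Sinks.Abstract; ParsedData lives in WebReaper.Sinks.Models.
public interface IScraperSink
{
    bool DataCleanupOnStart { get; set; }
    Task EmitAsync(ParsedData entity, CancellationToken cancellationToken = default);
}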
ParsedData is record ParsedData(string Url, JsonObject Data) — Data is a
System.Text.Json.Nodes.JsonObject (no Newtonsoft). A minimal console sink:
using System.Text.Json.Nodes;
using WebReaper.Sinks.Abstract;
using WebReaper.Sinks.Models;
public class ConsoleSink : IScraperSink
{
    public bool DataCleanupOnStart { get; set; }
    public Task EmitAsync(ParsedData entity, CancellationToken cancellationToken = default)
    {
        Console.WriteLine(entity.Data.ToJsonString());
        return Task.CompletedTask;
    }
}
Register it with AddSink:
using WebReaper.Builders;
var engine = await new ScraperEngineBuilder()
    .AddSink(new ConsoleSink())
    .Get("https://rutracker.org/forum/index.php?c=33")
    .Follow("#cf-33 .forumlink>a")
    .Follow(".forumlink>a")
    .Paginate("a.torTopic", ".pg")
    .Parse(new()
    {
        new("name", "#topic-title"),
    })
    .BuildAsync();
await engine.RunAsync();
For result callbacks without a custom sink, use .Subscribe(Action<ParsedData>) or
.PostProcess(Func<Metadata, JsonObject, Task>).
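A custom sink that writes to a file should assume EmitAsync may be called concurrently when WithParallelismDegree is greater than 1 (an assumption; the contract above does not say). The sketch below is a hypothetical JSON Lines file sink that serializes writers with a SemaphoreSlim. So that it compiles on its own, it re-declares IScraperSink and ParsedData with the shapes documented above; in a real project delete those two declarations and use WebReaper's types. It also assumes the sink itself is responsible for honouring DataCleanupOnStart.

```csharp
using System.IO;
using System.Text.Json.Nodes;
using System.Threading;
using System.Threading.Tasks;

// Local copies of the documented shapes (delete these when referencing WebReaper).
public record ParsedData(string Url, JsonObject Data);

public interface IScraperSink
{
    bool DataCleanupOnStart { get; set; }
    Task EmitAsync(ParsedData entity, CancellationToken cancellationToken = default);
}

// Hypothetical sink: appends one compact JSON object per result to a file.
public sealed class JsonLinesFileSink : IScraperSink
{
    private readonly string _path;
    private readonly SemaphoreSlim _gate = new(1, 1); // one writer at a time
    private bool _started;

    public JsonLinesFileSink(string path) => _path = path;

    public bool DataCleanupOnStart { get; set; }

    public async Task EmitAsync(ParsedData entity, CancellationToken cancellationToken = default)
    {
        var line = new JsonObject
        {
            ["url"] = entity.Url,
            ["data"] = entity.Data.DeepClone() // clone: a JsonNode can have only one parent
        }.ToJsonString();

        await _gate.WaitAsync(cancellationToken);
        try
        {
            if (!_started)
            {
                // Assumption: cleanup happens lazily, on the first write.
                if (DataCleanupOnStart && File.Exists(_path)) File.Delete(_path);
                _started = true;
            }
            await File.AppendAllTextAsync(_path, line + "\n", cancellationToken);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Register it like any other sink, for example `.AddSink(new JsonLinesFileSink("results.jsonl") { DataCleanupOnStart = true })`.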
Interfaces
| Interface | Description |
|---|---|
| IScheduler | Reads and writes the job queue. Default is in-memory; file, SQLite, Redis and Azure Service Bus implementations are available. |
| IVisitedLinkTracker | Tracks visited links. Default is in-memory; file, SQLite and Redis implementations are available. |
| IPageLoader | Turns a PageRequest into a page's HTML, dispatching on PageType to one load transport. The Spider holds one and is loader-blind (it does not depend on which transport served the page). |
| IPageLoadTransport | The per-mechanism adapter behind IPageLoader: HTTP (core) or headless browser (WebReaper.Puppeteer). The only home for that mechanism's client/launch quirks and proxy application. |
| IJsonContentParser | Takes a document + Schema and returns its System.Text.Json.Nodes.JsonObject representation. The shipped HTML/CSS, HTML/XPath (WithXPathContentParser()) and JSON (WithJsonContentParser()) parsers are thin shells over one shared Schema fold. |
| ISchemaBackend<TNode> | The per-document-shape seam the shared fold calls: parse a root, select many / one by selector, extract a leaf's raw value. The shipped CSS, XPath and JSON backends implement this. |
| ILinkParser | Takes HTML and returns the page's links. |
| IScraperSink | A destination for scraping results. Receives ParsedData (Url + JsonObject). |
| ICrawlStep | The crawl-step decision: maps a Job + loaded page + Schema to a CrawlOutcome (parse the page, follow links, or paginate). Swap it to customize crawl-vs-parse behavior. |
| ISpider | The per-Job I/O shell around ICrawlStep: loads one page, runs the Crawl step, and returns a JobReport, nothing else. The Crawl driver (in-process ScraperEngine or the distributed worker) owns the visited-link tracker, the crawl-limit stop, sink fan-out and the callbacks. Obtained from ScraperEngineBuilder.BuildSpider(). |
| IOutstandingWorkLatch | The Crawl driver's termination detector (ADR-0022): a unit-credit counter that trips exactly once when all work is drained. In-memory Interlocked adapter (in-process) and a distributed-atomic Redis adapter (WebReaper.Redis). |
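The unit-credit rule behind IOutstandingWorkLatch fits in a few lines: one credit per queued job, a job adds credits for the children it discovers before releasing its own credit, and the counter trips exactly once when it reaches zero. `UnitCreditLatch` and `LatchDemo` are hypothetical names for this in-process sketch (Interlocked-based, like the in-memory adapter in the table); it is not WebReaper's code.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical unit-credit latch: trips exactly once when outstanding work reaches zero.
public sealed class UnitCreditLatch
{
    private long _outstanding;
    private int _tripped; // 0 = not yet tripped, 1 = tripped (one-shot fence)

    public void AddCredits(long count) => Interlocked.Add(ref _outstanding, count);

    // Releases one credit; returns true only for the single call that drains the counter.
    public bool ReleaseOne()
    {
        if (Interlocked.Decrement(ref _outstanding) != 0) return false;
        return Interlocked.CompareExchange(ref _tripped, 1, 0) == 0;
    }
}

public static class LatchDemo
{
    // Walks an in-memory link tree (no cycles, so no dedup is needed) and
    // reports how many jobs ran and how many times the latch tripped.
    public static (int JobsRun, int Trips) Run(string start, IReadOnlyDictionary<string, string[]> links)
    {
        var latch = new UnitCreditLatch();
        var queue = new Queue<string>();
        int jobsRun = 0, trips = 0;

        latch.AddCredits(1); // seed: one credit for the start job
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            var url = queue.Dequeue();
            jobsRun++;
            var children = links.TryGetValue(url, out var found) ? found : new string[0];

            latch.AddCredits(children.Length); // credit children BEFORE releasing this job,
            foreach (var child in children)    // so the count cannot hit zero early
                queue.Enqueue(child);

            if (latch.ReleaseOne()) trips++;
        }
        return (jobsRun, trips);
    }
}
```

The same ordering rule is what makes the distributed version safe; per the 8.0.0 release notes, the Redis adapter replaces Interlocked with atomic INCRBY/DECRBY plus a SET-NX one-shot fence.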
Main entities
- Job: a record representing one unit of work for the spider.
- LinkPathSelector: a selector for links to be crawled.
- CrawlOutcome: the closed result of a crawl step (a parsed target page, followed links, or paginated pages).
- Schema fold: the single recursive Schema interpreter (SchemaContentParser<TNode>); every backend reuses it instead of re-implementing the walk.
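The "one shared fold, many backends" split can be illustrated with a toy recursive interpreter that touches the document only through parse-root / select-many / select-one / extract-leaf primitives. Everything here (`Field`, `IToyBackend<TNode>`, `XPathBackend`, `ToySchemaFold`) is hypothetical and far simpler than WebReaper's Schema and SchemaContentParser<TNode>; the backend uses System.Xml.Linq with XPath, so it only accepts well-formed markup, not real-world HTML.

```csharp
using System.Collections.Generic;
using System.Text.Json.Nodes;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical schema node: a leaf (Selector, optional Attr) or, when Children
// is set, a repeated group evaluated once per match of Selector.
public sealed record Field(string Name, string Selector, string? Attr = null, Field[]? Children = null);

// Hypothetical backend primitives, playing the role described for ISchemaBackend<TNode>.
public interface IToyBackend<TNode> where TNode : class
{
    TNode ParseRoot(string document);
    IEnumerable<TNode> SelectMany(TNode node, string selector);
    TNode? SelectOne(TNode node, string selector);
    string? ExtractLeaf(TNode node, string? attr);
}

// XPath over System.Xml.Linq (well-formed markup only).
public sealed class XPathBackend : IToyBackend<XElement>
{
    public XElement ParseRoot(string document) => XDocument.Parse(document).Root!;
    public IEnumerable<XElement> SelectMany(XElement node, string selector) => node.XPathSelectElements(selector);
    public XElement? SelectOne(XElement node, string selector) => node.XPathSelectElement(selector);
    public string? ExtractLeaf(XElement node, string? attr) =>
        attr is null ? node.Value.Trim() : node.Attribute(attr)?.Value;
}

// The fold is written once and works with any backend.
public static class ToySchemaFold
{
    public static JsonObject Parse<TNode>(IToyBackend<TNode> backend, string document, IEnumerable<Field> schema)
        where TNode : class
        => Fold(backend, backend.ParseRoot(document), schema);

    private static JsonObject Fold<TNode>(IToyBackend<TNode> backend, TNode node, IEnumerable<Field> schema)
        where TNode : class
    {
        var result = new JsonObject();
        foreach (var field in schema)
        {
            if (field.Children is { } children)
            {
                var items = new JsonArray();
                foreach (var match in backend.SelectMany(node, field.Selector))
                    items.Add((JsonNode)Fold(backend, match, children)); // recurse per repeated match
                result[field.Name] = items;
            }
            else
            {
                var match = backend.SelectOne(node, field.Selector);
                result[field.Name] = match is null ? null : backend.ExtractLeaf(match, field.Attr);
            }
        }
        return result;
    }
}
```

Adding a JSON or CSS backend in this model means implementing the four primitives only; the recursion, nesting and null handling stay in `ToySchemaFold`.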
Repository structure
| Project | Description |
|---|---|
| WebReaper | The core library (the WebReaper NuGet package). |
| WebReaper.Puppeteer | Satellite: headless-browser page loader. |
| WebReaper.Mongo | Satellite: MongoDB sink + config/cookie storage. |
| WebReaper.Redis | Satellite: Redis scheduler, tracker, sink, config/cookie storage. |
| WebReaper.AzureServiceBus | Satellite: Azure Service Bus distributed scheduler. |
| WebReaper.Cosmos | Satellite: Azure Cosmos DB sink. |
| Examples/WebReaper.ConsoleApplication | Using WebReaper in a console application. |
| Examples/WebReaper.ScraperWorkerService | Using WebReaper in a .NET Worker Service. |
| Examples/WebReaper.DistributedScraperWorkerService | Distributed crawl across workers sharing crawl state. |
| Examples/WebReaper.AzureFuncs | Serverless crawl with Azure Functions + Azure Service Bus. |
| Examples/BrownsfashionScraper | A real-world e-commerce scraper example. |
| Misc/WebReaper.ProxyProviders | Example proxy-provider implementations. |
License
See the LICENSE file for license rights and limitations (GNU GPLv3).
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
- net10.0
  - AngleSharp (>= 1.4.0)
  - AngleSharp.XPath (>= 2.0.6)
  - Microsoft.Extensions.Http (>= 10.0.8)
  - Microsoft.Extensions.Logging.Abstractions (>= 10.0.8)
  - Polly (>= 8.6.6)
NuGet packages (6)
Showing the top 5 NuGet packages that depend on WebReaper:
| Package | Description |
|---|---|
| WebReaper.Puppeteer | Headless-browser (Puppeteer/Chromium) page-load transport for WebReaper, for scraping JavaScript-rendered pages (GetWithBrowser / FollowWithBrowser / PaginateWithBrowser). Adds ScraperEngineBuilder.WithPuppeteerPageLoader() over the core WithLoadTransport registration seam. Satellite package (ADR-0009) so the WebReaper core stays dependency-light and AOT-clean; core is HTTP-only by default and no longer references PuppeteerSharp/PuppeteerExtraSharp or the Chromium provisioning path. |
| WebReaper.Mongo | MongoDB adapters for WebReaper: the MongoDbSink, plus MongoDB-backed scraper-config and cookie storage. Adds ScraperEngineBuilder.WriteToMongoDb / WithMongoDbConfigStorage / WithMongoDbCookieStorage. Satellite package (ADR-0009) so the WebReaper core stays dependency-light and AOT-clean. |
| WebReaper.Redis | Redis adapters for WebReaper: the Redis scheduler, visited-link tracker, sink, and Redis-backed scraper-config and cookie storage, all sharing one connection pool (ADR-0005). Adds ScraperEngineBuilder.WithRedisScheduler / TrackVisitedLinksInRedis / WriteToRedis / WithRedisConfigStorage / WithRedisCookieStorage. Satellite package (ADR-0009) so the WebReaper core stays dependency-light and AOT-clean. |
| WebReaper.Cosmos | Azure Cosmos DB sink for WebReaper. Adds ScraperEngineBuilder.WriteToCosmosDb(...). Satellite package (ADR-0009) so the WebReaper core stays dependency-light and AOT-clean. |
| WebReaper.AzureServiceBus | Azure Service Bus scheduler for WebReaper: a distributed job queue backed by an Azure Service Bus queue, for sharing crawl state across workers and serverless functions. Adds ScraperEngineBuilder.WithAzureServiceBusScheduler. Satellite package (ADR-0009) so the WebReaper core stays dependency-light and AOT-clean. |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 9.0.0 | 130 | 5/19/2026 |
| 8.0.0 | 133 | 5/19/2026 |
| 7.0.0 | 155 | 5/17/2026 |
| 6.0.0 | 94 | 5/17/2026 |
| 5.1.0 | 83 | 5/16/2026 |
| 5.0.0 | 82 | 5/16/2026 |
| 4.1.0 | 82 | 5/16/2026 |
| 4.0.0 | 99 | 5/16/2026 |
| 3.5.2 | 3,101 | 10/19/2024 |
| 3.5.1 | 2,992 | 8/15/2023 |
| 3.5.0 | 289 | 8/9/2023 |
| 3.4.0 | 430 | 4/17/2023 |
| 3.3.0 | 395 | 4/3/2023 |
| 3.2.0 | 382 | 4/2/2023 |
| 3.1.0 | 475 | 2/28/2023 |
| 3.0.8 | 774 | 11/12/2022 |
| 3.0.7 | 527 | 11/4/2022 |
| 3.0.6 | 520 | 11/3/2022 |
| 3.0.5 | 535 | 10/31/2022 |
| 3.0.4 | 533 | 10/29/2022 |
9.0.0 (breaking, major): the core public surface is now exactly the documented contract (ADR-0023). Every Tier-1 type carries XML doc; every Tier-2 implementation type is internal (the File* and InMemory* leaves, sinks and formats, parsers and loaders, Spider, CrawlStep, ValidatedProxyProvider, Executor, ColorConsoleLogger, the Timer and Counter helpers); the ScraperEngine constructor is internal (the class stays public, and BuildAsync is the construction contract). WarningsAsErrors=CS1591 keeps the documented surface from regressing. No shipped package is affected; fluent-API consumers, custom-seam implementers and the distributed-worker pattern need no changes. Only code that constructed a core implementation adapter by name migrates to the builder method or the interface. SchemaContentParser (the generic custom-backend fold) stays public. Rationale and migration: docs/adr/0023-core-doc-contract.md and CHANGELOG.md.

8.0.0 (breaking, major): the per-Job Spider shell stops leaking its result through side channels and stops throwing to terminate (ADR-0022). ISpider.CrawlAsync now returns a closed JobReport (the ADR-0001 CrawlOutcome + the loaded document); Spider is reduced to load -> Crawl step -> report. The Crawl driver (in-process ScraperEngine or the distributed worker) owns what the shell used to own: the visited-link tracker, the crawl-limit stop, sink fan-out, and the PostProcessor/ScrapedData callbacks. The crawl limit is now a value the driver checks (Scheduler.Complete()), no longer a thrown PageCrawlLimitException; that exception type is removed. IVisitedLinkTracker.TryAddVisitedLinkAsync is a new atomic test-and-set, the single idempotency authority (default-interface-method; InMemory is a lock-free CAS, Redis an atomic SADD).
Termination detection is one IOutstandingWorkLatch seam — a unit-credit counter that trips exactly once when all work has drained, with an in-memory Interlocked adapter and a distributed Redis adapter (atomic INCRBY/DECRBY + a SET-NX one-shot fence). Examples/WebReaper.AzureFuncs is now a real distributed Crawl driver that never throws to terminate, so the queue is no longer poisoned at the crawl-limit boundary. Closed by construction: the retry-amplified limit exception, the racy discovery dedup, the distributed poison message. No compat shell (a List<Job>-returning forwarder would reinstate the side channels removed here; ADR-0009 precedent). The fluent builder API is unchanged — .PageCrawlLimit / .Subscribe / .PostProcess / .StopWhenAllLinksProcessed keep their signatures and behaviour; only a direct ISpider/BuildSpider() consumer migrates to the JobReport shape (re-enqueue report.Outcome.NextJobs; fan a CrawlOutcome.Parsed page out to its sink itself) and drops any PageCrawlLimitException handling. Rationale and migration: docs/adr/0022-crawl-driver-and-outstanding-work-latch.md, research/distributed-crawl-termination.md, and CHANGELOG.md.
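For a direct ISpider/BuildSpider() consumer, the 8.0.0 change means the driver, not the spider, acts on each outcome: fan a parsed page out to the sinks, and re-enqueue the discovered jobs. The sketch below models that with a hypothetical closed record hierarchy (`ToyOutcome`, `ToyDriver`); it is not WebReaper's CrawlOutcome or JobReport, whose exact members (for example, where NextJobs lives) may differ, and it omits the paginated case.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical closed outcome: the private constructor means only the nested
// cases below can derive from it.
public abstract record ToyOutcome
{
    private ToyOutcome() { }
    public sealed record Parsed(string Url, JsonObject Data) : ToyOutcome;
    public sealed record Followed(IReadOnlyList<string> NextJobs) : ToyOutcome;
}

public static class ToyDriver
{
    // Driver-side handling of one outcome: emit parsed data, or re-enqueue new jobs.
    public static void Handle(ToyOutcome outcome, Action<string, JsonObject> emit, Queue<string> queue)
    {
        switch (outcome)
        {
            case ToyOutcome.Parsed p:
                emit(p.Url, p.Data);
                break;
            case ToyOutcome.Followed f:
                foreach (var job in f.NextJobs) queue.Enqueue(job);
                break;
            default:
                throw new InvalidOperationException("Unknown outcome type.");
        }
    }
}
```

Returning a closed value instead of throwing keeps termination in the driver's hands, which is the property the release notes rely on to avoid poisoning the queue at the crawl-limit boundary.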