Private beta CLI / HTTP / Docker / Skill / Hosted beta

Evidence-grade acquisition

Turn messy web pages into citable, downstream-ready records.

Scout is a local-first acquisition workbench for AI workflows. It does more than fetch markdown: it preserves source evidence, blocked pages, citations, validation, and typed records your agents, search indexes, and intelligence pipelines can trust.

Scrape Crawl Map Browser capture Product records Company intelligence Artifacts Citations

Scout beta demo

URL to evidence to records in under a minute.

Watch the beta-safe flow: Scout starts from a public category URL, preserves the evidence trail, produces product records, and exports them to downstream tools.

URL Evidence Records Exports

No hard-site bypass guarantee.

Animated Scout beta demo showing a URL becoming source evidence, product records, and export artifacts.
Fictional demo data. The production promise is evidence and artifact discipline, not guaranteed access to every protected site.

Beta operating model

Free local utility. Metered hosted convenience.

01 / LOCAL

Local free

Run Scout on your machine with CLI, HTTP, Docker, or skill usage. You bring compute, browser, storage, and optional keys.

02 / HOSTED

Hosted metered credits

Hosted beta is invite-only and capped while unit economics are validated. It is not unlimited crawling, lifetime API usage, or public self-serve scale.

03 / EVIDENCE

Artifacts owned by you

Every run writes records, source pages, blocked-page evidence, validation, and reports so downstream workflows can inspect the proof.

What Scout produces

A run is not done until the evidence is reusable.

01

Source registry

Every fetched, rendered, saved, blocked, or API-provided source gets a stable identity, URL, provider, status, timestamp, and hash.

02

Cited records

Records carry citations back to source evidence. Missing citations become validation warnings instead of hidden risk.

03

Portable artifacts

Outputs are files you can inspect, version, export, hand to agents, load into search, or replay without re-crawling.

Why not just another crawler?

Hosted crawlers return content. Scout returns an evidence trail.

Firecrawl-style APIs are excellent when you need hosted markdown from a URL. Scout is for the next step: turning acquisition into records that can be audited, exported, and reused.

Use Scout when the question is not only "can I fetch this page?" but "can I prove where this field came from, explain what was blocked, and ship the result into a downstream workflow?"

Acquisition ladder

Choose the least expensive mode that works.

01

Direct / Crawl4AI

Default local acquisition for normal pages, sitemaps, crawls, and screenshots.

02

Scout browser

Launch a controlled browser, capture screenshots, DOM, links, and source evidence.

03

User browser

When your browser can see the page, capture authorized session evidence locally.

04

Saved evidence

Replay saved HTML, DOM, markdown, and artifacts without refetching the website.

Artifact contract

Every run leaves a trail.

run/
  manifest.json
  records.json
  records.jsonl
  source_pages.json
  blocked_pages.json
  validation.json
  extraction_report.md
  raw/
  screenshots/

Typed outputs

Records, not blobs.

product.v1 company.v1 investor_asset.v1 career_site.v1 job_posting.v1 news_signal.v1 research_record.v1 website_quality.v1

Exports beyond Algolia

Product data should not be trapped in one destination.

Search-ready

Prepare object IDs, names, prices, categories, URLs, images, variants, citations, and completeness warnings for search indexes.

Spreadsheet-ready

Export product and intelligence records to CSV or Google Sheets workflows for review, enrichment, and lightweight operations.

Database-ready

Keep local SQLite/JSONL datasets for repeatable research, diffing, replay, and agent consumption.

Distribution model

Local first. Hosted optional.

Local Scout

Free for beta users. Run on your machine, choose a workdir, keep artifacts locally, and bring your own optional keys.

  • CLI, HTTP API, Docker, and skill usage
  • No hosted dependency for normal runs
  • Best for private workflows and browser capture

Hosted Scout

Convenience API for teams that want managed workers and API keys. Hosted usage must be metered.

  • API key generation and usage credits
  • Rate limits, retention limits, and tenant isolation
  • Beta pass before full subscription plans

Launch status

Private beta can move. Public launch still has a security gate.

Blocked

Public launch is blocked

Crawl4AI currently resolves lxml 5.4.0, and the dependency audit reports a known vulnerability fixed in lxml 6.1.0. Scout will not claim public production readiness until the dependency audit is clean, or a formal exception is approved.

Limited

Private beta is limited

Hosted beta access is capped, approved-tester only, and metered. Pricing is not approved yet; hosted access is limited credits, not unlimited hosted scraping or lifetime API usage.

Primary path

Local install remains the primary path

Scout is safest to test locally: your machine, your workdir, your artifacts, your optional keys. No clean security claim is made for hosted production until the remaining launch gates close.

What people use it for

Acquisition primitives for real workflows.

Product catalogs

Extract listing records, preserve source evidence, and prepare exports for Algolia, JSONL, CSV, SQLite, or spreadsheet adapters.

Competitive intelligence

Collect company, investor, careers, news, docs, and website-quality evidence into repeatable artifact folders.

Agent tools

Give local agents a controlled acquisition layer with provenance instead of raw, untraceable fetches.

Pricing direction

Free local. Metered hosted.

$0

Local beta

Install Scout locally and run with your own compute, browser, storage, and optional provider keys.

PAYG

Pay-as-you-go candidate

Prepaid credits are the leading paid model. Monthly plans come later only if real usage patterns justify them.

Quickstart

Try Scout locally first.

Local Scout is the cleanest beta path because you own the machine, browser, artifacts, and keys. Hosted Scout comes after tenancy, quotas, and payment are production-ready.

git clone https://github.com/arijitchowdhury80/scout.git
cd scout
git checkout codex/scout-platform-foundation
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
playwright install chromium
scout serve

Private beta

Bring one website you currently research or scrape by hand.

The first beta should test whether Scout makes acquisition more inspectable, reusable, and trustworthy. Not whether it can promise infinite hosted crawling.

Local Scout stays free. Hosted beta creates a metered convenience API key after Stripe confirms payment.

Hosted beta pricing is being validated from unit economics. Limited credits, not unlimited crawling.

View GitHub View pricing