Scout Examples
Start with small, inspectable runs. Scout is strongest when every record can point back to source evidence and every failure leaves a useful artifact trail.
Workflows
Page to markdown
Use this when you need one public page captured as readable text with source metadata.
mkdir -p ./scout-runs/example-page
scout scrape https://example.com \
> ./scout-runs/example-page/scrape.json
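A quick sanity check after the capture can catch an empty or truncated file before anything downstream reads it. The sketch below assumes the redirected output is JSON (the `.json` filename suggests this, but the page does not confirm it) and that `python3` is on the PATH. `is_valid_json` is a hypothetical helper, not a Scout command.

```shell
# Hypothetical helper: succeeds only when the file exists, is non-empty,
# and parses as JSON. Uses Python's standard json.tool module for parsing.
is_valid_json() {
  [ -s "$1" ] && python3 -m json.tool "$1" > /dev/null 2>&1
}

# Output path used by the scrape example in this workflow.
out=./scout-runs/example-page/scrape.json

if is_valid_json "$out"; then
  echo "capture looks usable: $out"
else
  echo "capture is missing, empty, or not valid JSON: $out" >&2
fi
```

If the scrape output turns out not to be JSON, the `[ -s "$1" ]` non-empty test alone is still a reasonable first check.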
Use this for a category or listing page when you want product records plus blocked-page evidence.
scout products books \
--site books.toscrape.com \
--start-url https://books.toscrape.com/ \
--output-dir ./scout-runs/books-products
Use this for a small company packet with overview, key URLs, and citable source pages.
scout run company \
--query Adobe \
--mode auto \
--workdir ./scout-runs/adobe-company
Use saved mode when you already have HTML or browser-captured evidence and want reproducible extraction.
scout run research \
--mode saved \
--query ./captures/page.html \
--workdir ./scout-runs/captured-html
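When several captures need the same treatment, the saved-mode command can be wrapped in a loop. This is a minimal sketch that reuses only the flags shown in this workflow. The `./captures/*.html` layout, the `captured-<name>` workdir naming, and the `workdir_for` helper are assumptions made for illustration, and whether Scout creates a missing workdir on its own is not stated here.

```shell
# Hypothetical helper: derive a per-capture workdir from the capture's file
# name, e.g. ./captures/page.html -> ./scout-runs/captured-page
workdir_for() {
  name=$(basename "$1")
  printf './scout-runs/captured-%s\n' "${name%.*}"
}

# Run saved-mode extraction once per capture.
for capture in ./captures/*.html; do
  [ -e "$capture" ] || continue   # the glob matched nothing
  if command -v scout > /dev/null 2>&1; then
    scout run research \
      --mode saved \
      --query "$capture" \
      --workdir "$(workdir_for "$capture")"
  else
    echo "scout not found; would process $capture" >&2
  fi
done
```

Keeping one workdir per capture means a failed extraction leaves its evidence next to the input that caused it, rather than mixed into a shared directory.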
Artifact contract
Structured records created from the run. Product records can be prepared for search ingestion.
Source registry with URLs, providers, fetched timestamps, content hashes, and evidence pointers.
Blocked, sparse, or failed pages with provider attempts and user-facing reasons where available.
A human-readable summary of what worked, what failed, and what should be reviewed next.
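One low-effort way to review a run is to list what it actually wrote. The sketch below makes no assumption about artifact file names. It walks a run directory and prints each file with its size in bytes, so empty or unexpectedly small artifacts are easy to spot. `list_artifacts` is a hypothetical helper, and the run directory is the one used in the product example.

```shell
# Hypothetical helper: print "<bytes><TAB><path>" for every file under a run
# directory, sorted by path.
list_artifacts() {
  find "$1" -type f | sort | while IFS= read -r f; do
    # tr strips the padding some wc implementations add around the count.
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done
}

run_dir=./scout-runs/books-products
if [ -d "$run_dir" ]; then
  list_artifacts "$run_dir"
else
  echo "no run directory at $run_dir" >&2
fi
```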
Exports
Exporting product records to JSONL, CSV, SQLite, Google Sheets, or Algolia is the intended handoff path. Start with preview and validation before pushing to any hosted or production index.
scout product-export \
./scout-runs/books-products/records.json \
--format jsonl \
--output-dir ./scout-runs/books-products/exports
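A local check of the JSONL export is one way to do the validation step before anything is pushed. The sketch below assumes the export writes one or more `.jsonl` files into the output directory (the exact file names are not given here) and that `python3` is available. `validate_jsonl` is a hypothetical helper, not part of Scout.

```shell
# Hypothetical helper: succeeds only when every non-blank line of the file
# parses as JSON on its own. Reports the first offending line on stderr.
validate_jsonl() {
  python3 - "$1" <<'PY'
import json
import sys

path = sys.argv[1]
with open(path, encoding="utf-8") as handle:
    for number, line in enumerate(handle, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as error:
            print(f"{path}:{number}: {error}", file=sys.stderr)
            sys.exit(1)
PY
}

# Check whatever .jsonl files the export wrote; no file name is assumed.
for file in ./scout-runs/books-products/exports/*.jsonl; do
  [ -e "$file" ] || continue   # the glob matched nothing
  if validate_jsonl "$file"; then
    echo "ok: $file"
  else
    echo "invalid: $file" >&2
  fi
done
```

This only confirms that each line is well-formed JSON. It does not check that records carry the fields a downstream index expects.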
Boundaries
Browser and saved-evidence modes improve visibility, but Scout does not promise access to every protected site.
The beta website, CLI, HTTP API, artifacts, and hosted limits are the current product surface for launch claims.