Siteprobe — a concise command-line audit for every URL in your sitemap

When a site is redeployed, migrated, or tuned, site maintainers need hard evidence that every public page still loads quickly and returns the correct status. Siteprobe supplies that evidence in one step: it reads your sitemap.xml, requests each URL in parallel, and produces an immediate summary plus an optional CSV for future analysis.

Siteprobe’s workflow is straightforward. First, it downloads the sitemap (or sitemap index) and extracts every location tag. A configurable worker pool then issues HTTP requests, recording status codes, final targets after redirects, response-time metrics, and payload sizes. When the queue is empty, the tool prints a compact table showing success and redirect rates, mean and percentile latencies, minimum and maximum response times, and the share of pages that exceeded a user-defined “slow” threshold. Any outliers, non-200 responses, or slow pages are listed so that engineers can act on them immediately.
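The two ends of that workflow, pulling locations out of a sitemap and condensing timings into a summary, can be sketched in a few lines of Python. This is an illustration only, not Siteprobe's actual code (the tool itself is written in Rust); the function names, the nearest-rank percentile method, and the choice of the 90th percentile are all assumptions made for the example.

```python
import math
import statistics
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locations(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap or sitemap index."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def summarize(durations: list[float], slow_threshold: float) -> dict[str, float]:
    """Condense response times (in seconds) into a small summary (hypothetical helper)."""
    if not durations:
        raise ValueError("no durations to summarize")
    ordered = sorted(durations)

    def percentile(p: float) -> float:
        # Nearest-rank method: the smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "p90": percentile(90),
        # Share of responses strictly slower than the threshold.
        "slow_share": sum(d > slow_threshold for d in ordered) / len(ordered),
    }
```

Because a sitemap index also uses loc elements, the same extraction function yields the child sitemap URLs when it is given an index document.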

Key capabilities are intentionally focused:

Complete sitemap coverage ensures the test set matches what search engines and users see.
Document storage retrieves and saves every linked document for offline access or static site generation workflows.
Parallel retrieval with a configurable concurrency ceiling (--concurrency-limit) lets you run gentle staging checks or full-speed production sweeps without overloading origin servers.
Flexible rate limiting, for example --rate-limit=100r/5min, lets you define a precise throughput ceiling (here, no more than 100 requests every 5 minutes). This helps prevent rate-based blocking or server strain, especially with fragile staging environments, partner APIs, or third-party hosts that enforce strict quotas.
Request options such as Basic Auth support, cache-bypass query strings, custom user agents, and per-request timeouts adapt the probe to private, cached, or security-restricted environments.
Structured output via --report-path creates a CSV that drops straight into Grafana, Looker Studio, Excel, or any BI pipeline for trend visualisation and alerting. If you prefer JSON, --report-path-json produces the same statistics as a compact JSON document, which suits CI jobs or scripts that need to parse results and make pass/fail decisions automatically.
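To make the rate-limit arithmetic concrete, the following Python sketch turns a value such as 100r/5min into the average pause between requests that it implies. The grammar is an assumption for illustration: it accepts an optional "r" after the request count and the units s, m, min, and h. The exact syntax Siteprobe accepts may differ, so check the tool's own help output.

```python
import re

# Seconds per unit; the set of accepted units is an assumption for this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600}
_RATE_RE = re.compile(r"^(\d+)r?/(\d*)(s|min|m|h)$")


def min_interval_seconds(rate_limit: str) -> float:
    """Return the average gap between requests implied by a rate limit string."""
    match = _RATE_RE.match(rate_limit.strip())
    if not match:
        raise ValueError(f"unrecognised rate limit: {rate_limit!r}")
    requests, count, unit = match.groups()
    if int(requests) == 0:
        raise ValueError("request count must be positive")
    # A missing window length (as in "10/s") is treated as one unit.
    window_seconds = int(count or 1) * _UNIT_SECONDS[unit]
    return window_seconds / int(requests)
```

Under these assumptions, 100 requests per 5 minutes works out to one request every 3 seconds on average, which is a useful number to compare against the quota of the host you are probing.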

Typical usage after a release might look like this:

siteprobe https://lincolnloop.com/sitemap.xml \
		--concurrency-limit=8 --slow-threshold=0.1 \
		--report-path="report.csv"

Within a minute or two, depending on sitemap size, you get a human-readable digest, followed by the exact URLs requiring attention.

Where does Siteprobe fit? As a quick sanity check to confirm that everything still works after a change; in CI/CD pipelines as a post-deploy gate that fails when errors appear or latency rises beyond policy; in nightly cron jobs that feed latency data into dashboards; in SEO health checks that reveal unintended 301/302 redirect chains; and in infrastructure-tuning sessions when you need before-and-after evidence of performance impact.
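A post-deploy gate of the kind described above could be a short script that reads the CSV report and decides whether the pipeline should fail. The Python sketch below is hypothetical: the column names url, status, and response_time, the assumption that times are in seconds, and the 5% slow-page budget are all placeholders, so adjust them to match the header row of the report you actually get.

```python
import csv
import io


def evaluate_report(
    csv_text: str, slow_threshold: float, max_slow_share: float = 0.05
) -> list[str]:
    """Return a list of policy violations; an empty list means the gate passes.

    Assumes hypothetical columns: url, status, response_time (seconds).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    failures = []

    # Any 4xx or 5xx response counts as an error; redirects are tolerated here.
    errors = [row["url"] for row in rows if int(row["status"]) >= 400]
    if errors:
        failures.append(f"{len(errors)} URL(s) returned an error status")

    # Fail when too large a share of pages exceeded the slow threshold.
    slow = [row["url"] for row in rows if float(row["response_time"]) > slow_threshold]
    if rows and len(slow) / len(rows) > max_slow_share:
        failures.append(f"{len(slow)} of {len(rows)} URL(s) exceeded {slow_threshold}s")

    return failures
```

In a CI job, the script would read report.csv, print each returned message, and exit with a non-zero status when the list is not empty, which is what makes the pipeline step fail.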

Siteprobe’s value is its focus: verify that every URL advertised by your site is reachable, fast, and delivering the expected status; no guesswork, no manual clicking, just data you can act on immediately.

Installation

The source code is available at https://github.com/bartTC/siteprobe. If you already have Rust and Cargo installed, you can add Siteprobe to your toolchain with a single command:

cargo install siteprobe

Once the build finishes, the siteprobe binary is on your $PATH, ready to validate your next deployment.

Martin Mahner