Siteprobe. A concise command-line audit for every URL in your sitemap

Siteprobe is a command-line tool that verifies every URL in your sitemap. At Lincoln Loop, it has become a staple in my daily toolchain for complex deployments and migrations. It provides a level of verification that standard tests often miss: confirmation that every user-facing page listed in the sitemap is still reachable and working correctly.

After a deployment, you need to know that the whole site is functional, not just the homepage or the primary user flows covered by your test suite. Siteprobe automates this sanity check by reading your sitemap.xml, visiting every URL it finds, and reporting back on the health of the entire site.
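To make the first step concrete, here is a minimal Python sketch of pulling URLs out of a sitemap. It is an illustration only, not Siteprobe's own code; the function name `extract_urls` and the sample document are made up for this example, while the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Inline sample so the sketch runs without network access.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

urls = extract_urls(SAMPLE)
print(urls)
```

A real run would fetch the sitemap over HTTP first and then request each URL in the list.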

Siteprobe results

Why use it?

It’s designed to answer two simple questions: “Is it working?” and “Is it fast?”

1. Performance & Health Metrics

Siteprobe doesn’t just check for “200 OK” status codes; it measures the Time to First Byte (TTFB), total download time, and payload size for every single page. This helps you identify performance bottlenecks or unexpectedly large pages that wouldn’t show up in a simple uptime check.

It also tracks redirect chains, helping you spot unintended 301/302 loops or incorrect targets. This is essential for SEO health during a migration.
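As a rough illustration of what spotting a redirect loop involves, the sketch below walks a chain of redirects and stops when it revisits a URL. The redirect table, the `follow_redirects` name, and the `max_hops` default are all hypothetical; how Siteprobe tracks chains internally is not described here.

```python
def follow_redirects(start: str, redirects: dict[str, str], max_hops: int = 10):
    """Walk a redirect table from `start`; return (chain, looped).

    `redirects` maps a URL to its redirect target. It stands in for the
    Location headers a real HTTP client would see (an assumption for this sketch).
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects and len(chain) - 1 < max_hops:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True  # revisited a URL, so this is a loop
        seen.add(current)
    return chain, False


# Hypothetical redirect table: one clean redirect and one loop.
table = {"/old": "/new", "/a": "/b", "/b": "/a"}

print(follow_redirects("/old", table))
print(follow_redirects("/a", table))
```

Capping the number of hops keeps a long but non-looping chain from running forever; such a chain is returned truncated with `looped` set to False.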

By setting a “slow threshold” (e.g., --slow-threshold=0.5), you can flag pages that are technically working but effectively failing your performance standards.
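The idea behind a slow threshold can be sketched in a few lines of Python. This is not Siteprobe's implementation: the `ProbeResult` type, the `classify` function, and the three labels are invented for the example, and it assumes the threshold is expressed in seconds.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    url: str
    status: int
    seconds: float  # measured response time, assumed to be in seconds


def classify(result: ProbeResult, slow_threshold: float = 0.5) -> str:
    """Label a result as 'error', 'slow', or 'ok' (labels are hypothetical)."""
    if result.status >= 400:
        return "error"
    if result.seconds > slow_threshold:
        return "slow"  # reachable, but over the performance budget
    return "ok"


# Made-up measurements for illustration.
results = [
    ProbeResult("https://example.com/", 200, 0.12),
    ProbeResult("https://example.com/search/", 200, 1.80),
    ProbeResult("https://example.com/gone/", 404, 0.05),
]
labels = {r.url: classify(r) for r in results}
print(labels)
```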

2. Controlled Load & Stress Testing

Siteprobe gives you precise control over how hard you hit your server. You can use it for gentle health checks or turn up the dials for a simple stress test.

  • Concurrency: Control how many requests run at once (--concurrency-limit). Keep it low to be gentle, or increase it to see how your server handles simultaneous traffic.
  • Rate Limiting: Set a specific throughput ceiling, like “100 requests every 5 minutes” (--rate-limit=100r/5min). This is perfect for verifying your own rate-limiting configurations or ensuring you don’t overwhelm fragile staging environments.
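To show what these two dials mean in practice, here is a small Python sketch under stated assumptions. `min_interval` turns a spec such as `100r/5min` into the spacing between requests, and `probe_all` caps simultaneous work with a semaphore. Both names are hypothetical, the set of accepted units (`s`, `min`, `h`) is a guess rather than Siteprobe's documented grammar, and the HTTP request is replaced by a short sleep.

```python
import asyncio
import re

# Units this sketch understands; Siteprobe's actual grammar may differ.
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600}


def min_interval(rate: str) -> float:
    """Seconds between requests for a spec like '100r/5min'."""
    match = re.fullmatch(r"(\d+)r/(\d*)(s|min|h)", rate)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"unrecognized rate: {rate!r}")
    count = int(match.group(1))
    period = int(match.group(2) or 1) * UNIT_SECONDS[match.group(3)]
    return period / count


async def probe_all(urls: list[str], concurrency_limit: int) -> int:
    """Run fake probes with bounded concurrency; return the peak in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)
    in_flight = 0
    peak = 0

    async def probe(url: str) -> None:
        nonlocal in_flight, peak
        async with semaphore:  # at most `concurrency_limit` probes pass this point
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the real HTTP request
            in_flight -= 1

    await asyncio.gather(*(probe(u) for u in urls))
    return peak


interval = min_interval("100r/5min")
peak = asyncio.run(probe_all([f"https://example.com/{i}" for i in range(20)], 4))
print(interval, peak)
```

Concurrency bounds how many requests overlap, while the rate limit bounds how many start per unit of time, so the two settings complement each other.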

3. Testing “Real” Conditions

Sometimes checking the public URL isn’t enough. You might need to peek behind a CDN or access a protected staging site.

  • Bypass Caches: Force the server to generate a fresh page by appending random timestamps to requests, ensuring you’re testing the origin, not the cache.
  • Authentication: Basic Auth support lets you audit private staging environments.
  • Custom Identity: Change your User-Agent string to mimic a mobile device or a specific bot.
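The cache-bypass technique is easy to picture as a URL rewrite. The sketch below appends a timestamp as an extra query parameter so that a CDN treats each request as a new cache key. The parameter name `_cb` and the helper `add_cache_buster` are assumptions for illustration; the parameter Siteprobe actually sends may differ.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url: str, param: str = "_cb") -> str:
    """Append a timestamp query parameter (hypothetical name) to a URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, str(time.time_ns())))  # nanosecond timestamp as the value
    return urlunsplit(parts._replace(query=urlencode(query)))


busted = add_cache_buster("https://example.com/page?a=1#top")
print(busted)
```

Existing query parameters and the fragment are preserved, so the origin still receives the request it expects.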

4. Data & Mirroring

For engineers and data analysts, Siteprobe offers structured output in CSV or JSON formats. This makes it straightforward to integrate with CI/CD pipelines, visualize in Grafana, or process in Excel.

The reports include detailed statistics like mean and percentile latencies, providing the data needed for automated pass/fail decisions in a build pipeline.
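As one possible shape for such a gate, the Python sketch below reads a CSV report, computes the mean and 95th-percentile response time, and derives a pass/fail flag. The column names (`url`, `status`, `response_time`), the sample rows, and the 0.5-second budget are all assumptions made for this example; check the real report's header before adapting it.

```python
import csv
import io
import statistics

# Made-up report with hypothetical column names.
REPORT = """url,status,response_time
https://example.com/,200,0.10
https://example.com/about/,200,0.20
https://example.com/blog/,200,0.30
https://example.com/contact/,200,0.40
"""


def summarize(report_csv: str) -> dict:
    """Compute mean, p95, and failing URLs from a CSV report."""
    rows = list(csv.DictReader(io.StringIO(report_csv)))
    times = [float(row["response_time"]) for row in rows]
    return {
        "mean": statistics.mean(times),
        # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile.
        "p95": statistics.quantiles(times, n=100, method="inclusive")[94],
        "failures": [row["url"] for row in rows if int(row["status"]) >= 400],
    }


summary = summarize(REPORT)
# Gate: no failing URLs and p95 within a hypothetical 0.5 s budget.
passed = not summary["failures"] and summary["p95"] <= 0.5
print(summary, passed)
```

In a build pipeline, the script would exit with a non-zero status when `passed` is false so that the job fails.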

Siteprobe can also save every downloaded page to disk, which is useful for creating static mirrors or offline backups of a dynamic site.

Usage

Run a quick check on your sitemap:

uvx siteprobe https://example.com/sitemap.xml \
  --concurrency-limit=8 \
  --slow-threshold=0.5

You’ll get a concise summary of success rates and response times, followed by a list of any URLs that need attention.

Installation

The source code is available at github.com/bartTC/siteprobe.

Using uv

uvx siteprobe

Other methods

pip install siteprobe    # Python
cargo install siteprobe  # Rust
brew install bartTC/siteprobe/siteprobe # macOS

Binary releases are also available on the GitHub releases page.

Martin Mahner

About the author

Martin is an active member and contributor to the Django community where he is mostly known as bartTC. It's likely that you have stumbled over one of his apps or snippets. Besides coding, Martin also has …
