The Sovereignty Protocol

Project Spectre: AI-Powered Site Intelligence at Scale

Project Spectre is a fully sovereign web intelligence engine, like Firecrawl or Tavily but self-hosted, agent-callable, and tightly integrated with the rest of the Sovereignty Protocol. Here is what it does and why it exists.

30 April 2026 · 7 min read · The Sovereignty Protocol Team

The Web Is Your Data Source

The most valuable intelligence is often sitting on a public website. Competitor pricing. News coverage. Technical documentation. Job listings that signal what a company is building next. Domain registrations that reveal expansion plans.

The problem is that getting from a URL to structured, usable data, at any scale, is genuinely hard. You need to handle pagination, rate limits, JavaScript rendering, link discovery, and the messy reality of inconsistent HTML structure. Then you need to pipe the results somewhere useful.

Project Spectre is the Sovereignty Protocol's answer to that problem. It is a fully sovereign web intelligence engine built on the same infrastructure as the rest of the platform: callable by your agents, auditable in your logs, and fully under your control.


What Spectre Does

At its core, Spectre extracts structured content from any website in clean JSON and Markdown. For each page it crawls, you get:

  • Title and H1-H3 heading structure: the semantic skeleton of the page
  • Clean body text: stripped of navigation, ads, and boilerplate
  • Internal links: the full link graph within the site
  • External links: domains this site references, which feed into the Spectre Hopper for chaining
  • Word count and crawl depth: metadata for filtering and prioritisation
  • A site map tree: built automatically as pages are discovered

The output format is your choice: JSON for machine processing, Markdown for direct LLM ingestion, or both. The results are ready to pipe directly into a vector store, a Nexus Report, an LLM prompt, or any downstream system you operate.
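
To make the per-page output concrete, here is a minimal sketch of what one record might look like and how it could be rendered as JSON or Markdown. The `PageRecord` class and its field names are assumptions made for illustration; they mirror the list above but are not Spectre's documented schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PageRecord:
    """Hypothetical shape of one crawled page; field names are illustrative."""
    url: str
    title: str
    headings: list[str]  # H1-H3 text, in document order
    body_text: str
    internal_links: list[str] = field(default_factory=list)
    external_links: list[str] = field(default_factory=list)
    crawl_depth: int = 0

    @property
    def word_count(self) -> int:
        # Whitespace-separated token count of the cleaned body text.
        return len(self.body_text.split())

    def to_json(self) -> str:
        data = asdict(self)
        data["word_count"] = self.word_count
        return json.dumps(data, indent=2)

    def to_markdown(self) -> str:
        lines = [f"# {self.title}", ""]
        lines += [f"- {h}" for h in self.headings]
        lines += ["", self.body_text]
        return "\n".join(lines)


page = PageRecord(
    url="https://example.com/pricing",
    title="Pricing",
    headings=["Pricing", "Plans"],
    body_text="Three plans are available.",
    external_links=["https://partner.example.org/"],
)
print(page.to_json())
print(page.to_markdown())
```

The JSON form suits a vector store or downstream service, while the Markdown form can be dropped straight into an LLM prompt.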


The Four Spectre Routes

Spectre Extract

The foundation. Give it a URL and a depth budget to extract a single page, a section, or an entire domain. The output is clean, structured, and immediately useful. This is what the mcp_spectre_extract MCP tool calls under the hood when your agents use Spectre autonomously.
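
One way to picture a depth budget is as a breadth-first traversal that stops expanding pages once they are a given number of hops from the start URL. The sketch below illustrates that idea on a toy in-memory site; `crawl`, `get_links`, and `SITE` are hypothetical names, and this is not a description of how Spectre is implemented internally.

```python
from collections import deque


def crawl(start, max_depth, get_links):
    """Breadth-first traversal from `start`, following links up to `max_depth` hops.

    `get_links` is a caller-supplied function (hypothetical here) that returns
    the internal links found on a page. Returns a {url: depth} mapping.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depths[url] >= max_depth:
            continue  # depth budget spent; record the page but do not expand it
        for link in get_links(url):
            if link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths


# Toy in-memory site standing in for real HTTP fetches.
SITE = {
    "/": ["/docs", "/pricing"],
    "/docs": ["/docs/api", "/"],
    "/docs/api": ["/docs/api/auth"],
    "/pricing": [],
}


def site_links(url):
    return SITE.get(url, [])


print(crawl("/", 0, site_links))  # depth 0: the single start page
print(crawl("/", 2, site_links))  # depth 2: the start page plus two hops
```

A budget of 0 yields only the start page, while a larger budget widens the crawl toward the whole domain.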

Spectre Campaigns

Run intelligence operations against multiple targets in parallel. A campaign defines a set of URLs, a crawl depth, a schedule, and an output handler. Results accumulate over time, so you can track changes across pages: a competitor's pricing page on Monday versus Friday, for example.

Campaigns can trigger Nexus Cascades when they complete, so fresh intelligence flows automatically into your downstream workflows.
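
Change tracking between two campaign runs can be sketched with a plain text diff. The example below uses Python's standard `difflib` on two stored snapshots of the same page; the `diff_snapshots` helper and the sample pricing text are invented for illustration, and Spectre's own diffing may work differently.

```python
import difflib


def diff_snapshots(old_text, new_text, label="page"):
    """Return unified-diff lines between two stored versions of a page's text."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{label}@previous",
        tofile=f"{label}@latest",
        lineterm="",
    ))


# Invented sample data: the same pricing page captured on two different days.
monday = "Starter: $10/month\nTeam: $40/month"
friday = "Starter: $12/month\nTeam: $40/month"

changes = diff_snapshots(monday, friday, label="pricing")
print("\n".join(changes))
```

An empty result means the page did not change between runs, which is a cheap signal for deciding whether a downstream workflow needs to fire at all.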

Spectre Domain Suite

Domain-level intelligence: link health scanning, external domain discovery, and outreach target identification from a single crawl. The Domain Suite builds a picture of a site's connectivity: what it links to, who links to it (via outbound reference analysis), and where broken links are degrading SEO.
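
The core of this kind of analysis is separating internal from external links and flagging the ones that fail. Here is a minimal sketch using only the standard library; the function names are hypothetical, and the HTTP statuses are assumed to have been collected by an earlier fetch step that is not shown.

```python
from urllib.parse import urljoin, urlparse


def classify_links(page_url, hrefs):
    """Resolve hrefs against page_url and split them into (internal, external)."""
    base_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if parts.netloc.lower() == base_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


def external_domains(external_links):
    """Sorted, de-duplicated hosts referenced by the external links."""
    return sorted({urlparse(u).netloc.lower() for u in external_links})


def broken(links, status_by_url):
    """Links with a recorded 4xx/5xx status; links with no record are not flagged."""
    return [u for u in links if status_by_url.get(u, 200) >= 400]


hrefs = ["/about", "contact.html", "https://other.example.org/x", "mailto:hi@example.com"]
internal, external = classify_links("https://example.com/blog/post", hrefs)
print(internal)
print(external_domains(external))
print(broken(external, {"https://other.example.org/x": 404}))
```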

Spectre Email

Outreach discovery integrated with the crawl infrastructure. Spectre Email finds contact signals from target domains (email patterns, contact page structures, LinkedIn references) and enriches them with the context gathered during the crawl.

Combined with a Nexus Cascade automation trigger, this powers fully governed outreach pipelines: crawl a domain, enrich the contact, generate a personalised message, send via Sovereign Mail.
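
As a rough illustration of the simplest contact signal, the sketch below pulls addresses on a target domain out of already-extracted page text. The regular expression is a deliberately loose approximation rather than a full address validator, and `contact_signals` is a hypothetical helper, not part of Spectre.

```python
import re

# Loose pattern for things that look like email addresses in body text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def contact_signals(page_text, domain):
    """Return de-duplicated addresses on `domain` found in page_text, in order seen."""
    found = []
    for match in EMAIL_RE.findall(page_text):
        addr = match.lower()
        if addr.endswith("@" + domain.lower()) and addr not in found:
            found.append(addr)
    return found


text = "Write to Sales@example.com or press@example.com. Elsewhere: bob@other.org. sales@example.com"
print(contact_signals(text, "example.com"))
```

Filtering by domain keeps addresses that merely appear on the page, such as a third-party vendor's contact, out of the outreach queue.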


How It Compares

Spectre is in the same category as Firecrawl and Tavily, but with some important differences:

|                   | Spectre                                    | Firecrawl / Tavily          |
|-------------------|--------------------------------------------|-----------------------------|
| Hosting           | Fully self-hosted, your infrastructure     | Hosted service, third-party |
| Data custody      | Stays in your PocketBase instance          | Sent to external servers    |
| Agent integration | Native MCP tool, callable by any agent     | API only                    |
| Governance        | Subject to Sovereignty Protocol audit logs | External, not auditable     |
| Cascade triggers  | Built-in; results can trigger workflows    | Not available               |

If data sovereignty matters to your use case (financial research, competitive intelligence, client data), Spectre is the answer. Nothing leaves your environment unless you explicitly export it.


Calling Spectre From Your Agents

Spectre is callable in three ways:

  1. MCP tool: mcp_spectre_extract is available to any agent with tool access. Your Librarian can crawl a source and fold the content into a research synthesis in a single workflow step.
  2. API key: make authenticated POST requests from any external system
  3. Nexus Cascades: use the http step type to trigger a Spectre extraction as part of a multi-step cascade

The MCP integration is the most powerful path. It means your agents can decide at runtime which URLs to crawl based on what they find in previous steps: adaptive, autonomous web intelligence rather than static extraction jobs.
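
For external systems, the API-key route is the easiest to sketch. The example below only builds the request and does not send it. The `/api/spectre/extract` path, the Bearer header scheme, and the payload keys (`url`, `depth`, `format`) are all assumptions for illustration; check your own deployment for the real endpoint and parameters.

```python
import json
import urllib.request


def build_extract_request(base_url, api_key, target_url, depth=1, output="json"):
    """Build (but do not send) a POST request for a Spectre extraction.

    The endpoint path, auth header scheme, and payload keys are assumed
    for illustration; they are not documented Spectre API.
    """
    payload = {"url": target_url, "depth": depth, "format": output}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/spectre/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_extract_request(
    "https://spectre.internal.example",  # placeholder host
    "YOUR_API_KEY",                      # placeholder credential
    "https://example.com/docs",
    depth=2,
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```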


The Spectre Hopper

When Spectre crawls a page, it discovers external domains linked from that page. Those domains are automatically added to the Spectre Hopper, a queue of discovered targets waiting for further investigation.

You can review the Hopper, select domains to crawl next, and chain extractions without specifying the next URL manually. The crawl infrastructure propagates outward through the web graph, guided by your configuration and your agents' decisions.
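
Conceptually, the Hopper behaves like a de-duplicating queue keyed by domain. The sketch below models that behaviour in a few lines; the `Hopper` class and its method names are hypothetical and say nothing about how the real queue is stored or governed.

```python
from collections import deque
from urllib.parse import urlparse


class Hopper:
    """FIFO queue of discovered domains; each domain is queued at most once."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def add_links(self, external_links):
        """Queue the host of each link unless that host was already seen."""
        for link in external_links:
            domain = urlparse(link).netloc.lower()
            if domain and domain not in self._seen:
                self._seen.add(domain)
                self._queue.append(domain)

    def pending(self):
        """Domains awaiting review, oldest first."""
        return list(self._queue)

    def select(self, domain):
        """Remove a reviewed domain from the queue and return it for crawling."""
        self._queue.remove(domain)  # raises ValueError if the domain is not queued
        return domain


hopper = Hopper()
hopper.add_links(["https://a.example.org/x", "https://b.example.net/", "https://a.example.org/y"])
print(hopper.pending())
next_target = hopper.select("b.example.net")
hopper.add_links(["https://b.example.net/again"])  # already seen, so not re-queued
print(next_target, hopper.pending())
```

Tracking seen domains separately from the queue means a domain that has already been selected is not rediscovered and queued again on a later crawl.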

This is how Spectre moves from a single-page extraction tool to a genuine intelligence platform.


What Teams Use Spectre For

  • Competitive monitoring: weekly crawls of competitor pricing and feature pages, filed as Nexus Reports with diff summaries
  • Market research: domain suite runs across industry verticals to map who is building what
  • SEO health audits: link scanner reports identifying broken outbound links across a client's domain
  • Lead enrichment: Spectre Email + Cascade pipelines that turn a domain list into a qualified outreach queue
  • Documentation ingestion: extract third-party API docs and push them into the Smart Memory System so agents have accurate context without manual copy-paste

Project Spectre is a Pro-tier feature. If you are on a Free plan and want to see it in action, start a trial: full access for 31 days, no card required.
