Most product tools help you store work. spark2build is designed to reason about it. This post explains exactly how that works — the data pipeline, the AI models, the retrieval architecture, and why those technical choices matter for enterprise product managers. We also compare directly with the tools teams most commonly use instead.
The problem we are solving
Enterprise product managers carry an enormous amount of context. They have user research from last quarter, a regulatory briefing from legal, three competitor analyses, a roadmap approved six months ago, and a product vision that was rewritten twice. All of it lives in different places. None of these sources talk to each other.
When a PM has a new product idea — what the team calls a Spark — they need to answer a specific set of questions before they can move forward:
- Does this conflict with anything in the regulatory landscape?
- Has a similar idea already been validated or rejected?
- Is there user research that supports or contradicts this direction?
- Does this align with the current product vision and roadmap?
- What do we know about competitors doing something similar?
In practice, answering these questions takes hours of manual searching. PMs open ten tabs, ask three colleagues, and often miss something important anyway. spark2build automates exactly this cross-referencing — not as a search interface, but as a structured reasoning layer over a curated knowledge base.
The cost of disconnected knowledge
Architecture overview: the five-stage pipeline
Every piece of knowledge added to spark2build flows through five stages before it can influence a Spark analysis. Understanding these stages explains why the system produces confident, explainable results rather than generic AI suggestions.
Stage 1 — Knowledge ingestion
Knowledge enters spark2build as structured entries organised into six domains: User Research, Regulations, Competitors, Vision, Roadmap, and Product Documents. Teams can type directly, import from files (PDF, DOCX, PPTX, HTML, Markdown, plain text up to 25 MB), or sync documents from the built-in Product Documents editor.
Each entry carries a valid_to date — an expiry timestamp that tells the system when that piece of knowledge should be treated as stale. Regulations change. Competitor analyses age. User research from three years ago describes a different market. The valid_to field is not optional decoration; it is input to the confidence scoring model described below.
Stage 2 — Chunking and embedding
Long entries are split into overlapping semantic chunks. Each chunk is embedded using OpenAI's text-embedding-3-small model, which produces a 1536-dimensional vector. These vectors are stored in PostgreSQL with the pgvector extension via Supabase.
When a Spark is analysed, its title and context are also embedded. The system then computes cosine distance between the Spark embedding and all knowledge entry embeddings to find semantically similar content — even when the words used are completely different.
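The retrieval step above can be sketched in a few lines of Python. This is an illustrative sketch only — in production the comparison runs inside Postgres via pgvector's cosine distance operator, and the function names and toy vectors here are assumptions, not the real implementation.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_entries(spark_vec, entries, top_k=5):
    """Rank knowledge entries by semantic closeness to a Spark embedding."""
    scored = [(cosine_similarity(spark_vec, vec), entry_id) for entry_id, vec in entries]
    return [entry_id for _, entry_id in sorted(scored, reverse=True)[:top_k]]

# Toy 3-dimensional vectors stand in for the real 1536-dimensional embeddings.
entries = [("gdpr_note", [0.9, 0.1, 0.0]), ("pricing_doc", [0.0, 0.2, 0.9])]
closest = rank_entries([1.0, 0.0, 0.0], entries, top_k=1)  # most similar entry first
```

The key property is the one the post describes: ranking depends on vector geometry, not shared vocabulary, so a Spark about "card vaulting" can still surface an entry about "payment tokenisation".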
Why cosine distance and not keyword search
Stage 3 — Entity extraction and the knowledge graph
This is where spark2build diverges significantly from tools that simply store and search text. Every entry is processed by GPT-4o to extract named entities: products, organisations, regulations, technologies, people, and domain concepts.
Before inserting a new entity, the system checks for near-duplicates using embedding similarity (threshold: 0.9 cosine similarity). If a matching entity already exists — for example, "GDPR" and "General Data Protection Regulation" — the new surface form is merged into the existing entity's alias list rather than creating a duplicate node. This co-reference resolution keeps the graph clean as the knowledge base grows.
The result is two tables: graph_entities (named concept nodes) and graph_entity_mentions (which entries mention which entities, with a confidence score). These form a relational knowledge graph that enables a retrieval technique called GraphRAG.
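The co-reference merge step can be sketched as follows. The 0.9 threshold comes from the post; the data shapes and function are illustrative assumptions, not the production schema.

```python
import math

SIMILARITY_THRESHOLD = 0.9  # from the post: cosine similarity above which two entities merge

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def upsert_entity(name, embedding, graph):
    """Return the canonical entity name for `name`, merging near-duplicates.
    graph: canonical name -> {"embedding": vector, "aliases": set of surface forms}.
    """
    for canonical, node in graph.items():
        if cosine(embedding, node["embedding"]) >= SIMILARITY_THRESHOLD:
            node["aliases"].add(name)  # merge the surface form instead of creating a duplicate node
            return canonical
    graph[name] = {"embedding": list(embedding), "aliases": set()}
    return name

graph = {}
upsert_entity("General Data Protection Regulation", [0.8, 0.6], graph)
canonical = upsert_entity("GDPR", [0.81, 0.59], graph)  # near-identical embedding, so merged
```

After the second call the graph still contains a single node, with "GDPR" recorded as an alias — which is exactly what keeps the graph clean as the knowledge base grows.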
Stage 4 — Domain synthesis
Domain synthesis runs as a background cron job every 15 minutes. It reads all entries in each domain and uses GPT-4o to produce a compiled, coherent summary of what the team collectively knows in that area. This summary is stored as a domain_syntheses row.
Synthesis is incremental by default: when a new entry arrives, the system gives GPT-4o the existing synthesis plus the new entry and asks it to integrate the new knowledge — rather than reprocessing the entire domain from scratch. This keeps costs predictable as the knowledge base scales.
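The incremental path amounts to a prompt-assembly decision. The sketch below is an illustrative assumption — the wording and function are not spark2build's actual prompt — but it shows why cost tracks the size of the synthesis rather than the size of the domain.

```python
def build_synthesis_prompt(existing_synthesis, new_entry, domain):
    """Assemble a domain-synthesis prompt: incremental when a synthesis already
    exists, a full build only when the domain has never been synthesised."""
    if existing_synthesis is None:
        return f"Synthesise the team's {domain} knowledge from this entry:\n{new_entry}"
    # Incremental path: the model integrates one new entry instead of re-reading
    # every entry in the domain, so token cost stays roughly flat as entries accumulate.
    return (
        f"Current {domain} synthesis:\n{existing_synthesis}\n\n"
        f"New entry:\n{new_entry}\n\n"
        "Integrate the new entry into the synthesis. Flag any contradiction you find."
    )
```

The contradiction flag in the final instruction corresponds to the knowledge_conflicts rows described below: the same synthesis pass that integrates new knowledge is the one that notices when it clashes with the old.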
Synthesis also detects contradictions. When GPT-4o identifies that a new entry conflicts with existing domain knowledge, it writes a knowledge_conflicts row with a severity rating (high / medium / low) and a description of the conflicting claims. PMs see these conflicts surfaced in the Radar feed, where they can resolve, dismiss, or mark them as intentional.
Stage 5 — Spark validation with decomposed confidence
When a PM creates or re-analyses a Spark, the system runs a multi-phase retrieval and reasoning pipeline. The output is not a list of search results — it is a set of scored relationships between the Spark and specific knowledge entries, each with a decomposed confidence score.
Three independent signals contribute to every relationship score:
- Structural confidence — based on entity co-occurrence in the knowledge graph. If a Spark mentions "payment tokenisation" and a compliance entry also mentions that entity, that is a structural signal that they are related.
- Semantic confidence — cosine distance between the Spark embedding and the entry embedding. This catches conceptual relationships even when no shared entities are mentioned.
- Temporal confidence — a freshness decay score derived from valid_to dates and domain-specific staleness thresholds. A regulation that expired two years ago is weighted less than one that is current. A competitor analysis from last month scores higher than one from three years ago.
The weights shift depending on which retrieval mode is active.
The system auto-selects the retrieval mode based on knowledge base size. A team just starting out gets fast, precise vector search. A mature team with hundreds of entries gets full graph traversal — where the system first extracts entities from the Spark itself, traverses the graph to find connected entries, then re-ranks using all three confidence signals.
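Blending the three signals might look like the sketch below. The weights and decay curve are illustrative assumptions — the post says the weights shift by retrieval mode, but does not publish the actual values.

```python
import math

# Illustrative weights per retrieval mode — the real values are internal to spark2build.
MODE_WEIGHTS = {
    "vector":        {"structural": 0.0, "semantic": 0.8, "temporal": 0.2},
    "graphrag_full": {"structural": 0.5, "semantic": 0.3, "temporal": 0.2},
}

def temporal_confidence(days_until_expiry, half_life_days=365.0):
    """Freshness: full weight while an entry is current, exponential decay past valid_to."""
    if days_until_expiry >= 0:
        return 1.0
    return math.exp(math.log(0.5) * (-days_until_expiry) / half_life_days)

def relationship_score(structural, semantic, temporal, mode):
    """Weighted blend of the three confidence signals for the active retrieval mode."""
    w = MODE_WEIGHTS[mode]
    return w["structural"] * structural + w["semantic"] * semantic + w["temporal"] * temporal

# A regulation that expired two years ago is weighted below an otherwise identical current one.
stale = relationship_score(0.7, 0.8, temporal_confidence(-730), "graphrag_full")
fresh = relationship_score(0.7, 0.8, temporal_confidence(30), "graphrag_full")
```

Note how the "vector" mode zeroes out the structural weight: with a small knowledge base there is not yet enough of a graph for entity co-occurrence to be a reliable signal, which is the rationale for auto-selecting the mode by knowledge base size.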
Regulations are always force-injected
What the PM actually sees
The output of all this machinery is designed to be immediately actionable for a PM who does not care about vector databases or graph traversal.
Each Spark receives a set of relationship cards grouped by domain. Every card shows the linked knowledge entry, a confidence percentage, a plain-English explanation of why the AI thinks they are related, and a thumbs up / thumbs down feedback button. PM feedback is stored in relationship_feedback and used to improve synthesis signal counts over time.
Knowledge gaps are surfaced explicitly. If a Spark touches an area where the knowledge base has thin coverage — for example, a Spark about a new payment product where the team has no competitor research in that segment — the system flags that gap by domain. Gaps are not failures; they are signals that tell the PM exactly where to invest research effort.
The Decision Brief is the final output: a structured one-page summary of the Spark, the supporting evidence, the identified gaps, and the PM's validated position. It can be exported and saved back to the Knowledge Hub as a new entry — creating a feedback loop where validated decisions become part of the knowledge base for future Sparks.
The Knowledge Health Score
Most knowledge management tools have no way to tell you whether your knowledge base is actually useful. spark2build computes a Knowledge Health Score (0–100) from six signals measured in pure SQL — no LLM call, target latency under 800ms.
- Coverage — are all six domains populated?
- Freshness — what percentage of entries have a valid_to date, and how many are stale?
- Contradiction health — what is the ratio of unresolved conflicts to total entries?
- Synthesis currency — how recently was each domain synthesised?
- Spark linkage — what fraction of knowledge entries are actually referenced in Spark analyses?
- Insight reinjection — are validated Sparks and Decision Briefs being saved back into the Knowledge Hub?
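Rolling the six signals up into a single 0–100 number could look like the sketch below. The weights and the normalised-signal convention are assumptions for illustration — the post only states that the score is derived from these six inputs in pure SQL.

```python
# Illustrative weights — the real formula is internal to spark2build.
HEALTH_WEIGHTS = {
    "coverage": 0.25,
    "freshness": 0.20,
    "contradiction_health": 0.15,
    "synthesis_currency": 0.15,
    "spark_linkage": 0.15,
    "insight_reinjection": 0.10,
}

def health_score(signals):
    """Weighted roll-up of six signals (each pre-normalised to 0..1) into a 0-100 score."""
    raw = sum(HEALTH_WEIGHTS[name] * signals[name] for name in HEALTH_WEIGHTS)
    return round(raw * 100)

score = health_score({
    "coverage": 1.0,             # all six domains populated
    "freshness": 0.8,            # most entries carry a current valid_to
    "contradiction_health": 0.9, # few unresolved conflicts relative to entry count
    "synthesis_currency": 1.0,   # every domain synthesised recently
    "spark_linkage": 0.5,        # half of entries referenced in Spark analyses
    "insight_reinjection": 0.3,  # few Decision Briefs saved back so far
})
```

Because each input is a simple aggregate (counts, ratios, timestamps), the whole computation stays in SQL with no LLM call — which is what makes the sub-800ms latency target realistic.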
The score is computed on demand when a PM visits the Health dashboard, with a lazy weekly snapshot written to a knowledge_health_snapshots table for trend tracking. A sparkline of up to eight weekly snapshots shows whether the team's collective knowledge is improving or degrading over time.
Why health scores matter for enterprise
How we compare to other tools
There are good tools for storing product knowledge. There are good tools for managing product ideas. spark2build is the only tool that does both and actively connects them with AI reasoning. Here is an honest breakdown.
spark2build vs Notion AI
Notion is excellent for flexible document storage. Notion AI adds summarisation and writing assistance on top of that storage. What it does not do is cross-reference your product ideas against your knowledge base, extract a knowledge graph from your documents, or tell you which regulatory entries are relevant to a new feature you are considering.
Notion AI answers the question: "Can you summarise this page?" spark2build answers the question: "Is this product idea safe to build given everything my team knows?"
Notion has no concept of knowledge freshness, domain structure, or confidence scoring. A two-year-old compliance note looks identical to one written last week. In financial services, healthcare, or telecoms, that is not a minor limitation — it is a compliance risk.
spark2build vs Confluence
Confluence is the dominant enterprise wiki. It is powerful for documentation and process management. Its search is keyword-based, and Atlassian Intelligence (its AI layer) is primarily a writing and summarisation tool.
Most enterprise PMs already use Confluence. The problem is that Confluence has no product intelligence layer. You cannot ask Confluence: "What does our knowledge base say about the regulatory implications of this product idea?" You can search for keywords, read documents, and draw your own conclusions — manually.
spark2build integrates with Jira (Atlassian's issue tracker) to link Sparks to Jira tickets bidirectionally. Teams can keep Confluence as their documentation layer and use spark2build as the reasoning layer on top.
spark2build vs Productboard
Productboard is the category leader for product management — roadmapping, feature prioritisation, customer feedback collection, and stakeholder communication. It is a strong tool for its intended purpose.
Productboard does not have a structured knowledge base. It captures customer feedback and feature requests, but it does not connect those features to regulatory knowledge, competitor intelligence, or product vision in an AI-reasoned way. Its "insights" feature links feedback to features, but there is no cross-domain confidence scoring or knowledge graph. There is also no concept of knowledge health or synthesis currency.
The positioning is different: Productboard is a roadmap and prioritisation tool. spark2build is a product intelligence tool. In practice, they are often complementary — spark2build validates whether a feature idea is well-grounded before it enters the Productboard roadmap.
spark2build vs Dovetail
Dovetail is the best-in-class user research repository. It is excellent at storing, tagging, and surfacing qualitative user research — interview transcripts, survey responses, usability study notes. Its AI highlights themes and patterns across research sessions.
Dovetail covers one domain out of six. It does not handle competitor intelligence, regulatory knowledge, roadmap history, or product vision. It has no mechanism for cross-referencing a new product idea against all six domains simultaneously.
If your team uses Dovetail, spark2build does not replace it. It complements it: the key research findings from Dovetail become structured entries in the User Research domain of the spark2build Knowledge Hub, where they can be cross-referenced against Sparks alongside regulatory and competitive knowledge.
spark2build vs Aha!
Aha! is a comprehensive product management suite covering strategy, roadmaps, ideas, and releases. It is a strong operational tool for managing product work end-to-end.
Aha! has an Ideas portal for capturing product ideas and a strategy section for vision and goals. Neither is backed by a knowledge graph, AI cross-referencing, or decomposed confidence scoring. Ideas in Aha! are voted on by stakeholders — which is a valid signal, but a different one to "does our knowledge base support this idea?"
What we have deliberately not built
Transparency matters. Here is what spark2build is not, and why.
We are not a document editor. There are better tools for writing and collaborating on documents. spark2build has a Product Documents section, but it is designed for structured capture, not rich editing. Teams write their documents in the tools they already use; spark2build is where the intelligence about those documents lives.
We are not a project management tool. We integrate with Jira to link Sparks to tracked work, but we do not replace sprint planning, backlog management, or delivery tracking.
We are not a generic AI assistant. The AI in spark2build is narrowly scoped to product intelligence tasks: cross-referencing ideas against structured knowledge, detecting contradictions, synthesising domain knowledge, and generating decision briefs. We do not provide a general chat interface over your knowledge base. That choice is intentional — a chat interface over unstructured documents produces unpredictable outputs that enterprise teams cannot rely on for consequential decisions.
The reasoning behind scoped AI
The enterprise design choices
Several technical choices in spark2build were made specifically because the target users are enterprise product managers in regulated industries.
Workspace isolation. Every piece of data is scoped to a workspace ID via row-level security in Postgres. No cross-tenant data leakage is possible at the query level — it is enforced in the database, not just in application code.
Role-based access. Admin, PM Creator, and Viewer roles are enforced at both the API and UI level. Viewers cannot write data; write attempts return HTTP 403. This matters for financial services teams where not everyone should have edit access to regulatory knowledge.
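The write-path check reduces to a role-to-permission map. The three role names come from the post; the permission sets and function below are an illustrative sketch, not spark2build's actual middleware.

```python
from http import HTTPStatus

# Roles from the post; the permission sets are an illustrative assumption.
ROLE_PERMISSIONS = {
    "admin":      {"read", "write", "manage"},
    "pm_creator": {"read", "write"},
    "viewer":     {"read"},
}

def authorize(role, action):
    """Return HTTP 200 when the role may perform the action, 403 otherwise."""
    if action in ROLE_PERMISSIONS.get(role, set()):
        return HTTPStatus.OK.value
    return HTTPStatus.FORBIDDEN.value  # a Viewer's write attempt is rejected with HTTP 403
```

Enforcing the same map at both the API and the UI level means the 403 is a backstop, not the primary experience: the UI hides write controls from Viewers, and the API rejects anything that slips through.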
Temporal knowledge management. The valid_to date system, the staleness indicators, and the Knowledge Health Score freshness signal all exist because enterprise product knowledge has a shelf life. A knowledge management system that treats a three-year-old market analysis the same as a current one is not safe to use for regulated product decisions.
Explainable AI outputs. Every relationship has a plain-English confidence explanation. Every confidence score is broken into three components. Every knowledge gap is labelled by domain. spark2build is not a black box — it is designed to show its reasoning so that PMs can exercise judgement, not just accept outputs.
Current scale and what comes next
The current architecture handles enterprise knowledge bases well into the hundreds of entries per workspace, with GraphRAG Full mode active at 100 entries and above. Co-reference resolution keeps the entity graph clean as it grows, and incremental synthesis means costs scale sub-linearly with knowledge base size.
The product roadmap includes a deeper graph intelligence layer — Phases 2 through 4 of the knowledge graph PRD — which will enable multi-hop relationship reasoning (connecting a Spark to an entry two or three entity links away, not just direct mentions). This will be especially valuable for large enterprise teams where a single Spark might be relevant to dozens of interconnected regulatory and competitive signals.
For teams evaluating product intelligence tools in 2026: the question is not whether AI will be part of product management. It already is. The question is whether the AI your team uses can show its work.