Technical Deep Dive
The foundational technologies that transform unstructured data into searchable, connected intelligence.
A knowledge graph is a structured representation of information as a network of entities (nodes) connected by relationships (edges), with semantic meaning encoded in the structure itself.
Nodes represent real-world objects (customers, products, reports, metrics) and abstract concepts (churn, profitability, seasonality); each node has properties.
Edges are directed connections between nodes, with typed relationships such as ANALYZED_BY, CORRELATES_WITH, and CAUSED_BY.
Labels are descriptive tags categorizing nodes and edges, and properties add context: timestamps, confidence scores, source information.
The fundamental unit of a knowledge graph is the triple:
// Triples from an analytics knowledge graph
(Customer_Churn_Report, FOUND, 4.2%_Monthly_Churn_Rate)
(4.2%_Monthly_Churn_Rate, CORRELATES_WITH, Support_Ticket_Volume)
(Support_Ticket_Volume, ANALYZED_BY, Operations_Team_Q3_Report)
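To make the structure concrete, here is a minimal Python sketch of the same triples as a property graph, using the networkx library. The node and edge properties (labels, confidence scores, dates) are illustrative assumptions, not values from a real report.

```python
# Minimal property-graph sketch of the triples above (networkx).
# Property values are illustrative assumptions.
import networkx as nx

g = nx.MultiDiGraph()

# Nodes carry labels and properties that add context
g.add_node("Customer_Churn_Report", label="Report", created="2024-09-30")
g.add_node("4.2%_Monthly_Churn_Rate", label="Metric", value=0.042)
g.add_node("Support_Ticket_Volume", label="Metric")
g.add_node("Operations_Team_Q3_Report", label="Report")

# Edges are directed and typed; properties record confidence and provenance
g.add_edge("Customer_Churn_Report", "4.2%_Monthly_Churn_Rate",
           relation="FOUND", confidence=0.98)
g.add_edge("4.2%_Monthly_Churn_Rate", "Support_Ticket_Volume",
           relation="CORRELATES_WITH", confidence=0.87)
g.add_edge("Support_Ticket_Volume", "Operations_Team_Q3_Report",
           relation="ANALYZED_BY")

# Every edge is a (subject, relation, object) triple
for subject, obj, attrs in g.edges(data=True):
    print(subject, attrs["relation"], obj)
```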
While relational databases excel at transactional data with known query patterns, knowledge graphs are optimized for connected data and discovery.
| Aspect | Relational Database | Knowledge Graph |
|---|---|---|
| Data Model | Tables with rows and columns | Nodes and edges (graph) |
| Schema | Fixed schema defined upfront | Schema-flexible, evolves with data |
| Relationships | Foreign keys + JOINs (computed at query) | First-class citizens (stored explicitly) |
| Query Language | SQL | SPARQL, Cypher, Gremlin |
| Best For | Transactional data, known queries | Connected data, discovery, inference |
Relational databases answer "What is X?" efficiently. Knowledge graphs answer "How is X related to Y, Z, and everything else?"—which is what you need when searching across analytical insights.
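As a rough illustration of that difference, the sketch below rebuilds the small graph from the triples above and answers "how is the churn report related to everything else?" with a traversal. In a relational store, the same question would require a chain of JOINs whose depth you must know in advance.

```python
# Multi-hop discovery by traversal (toy graph from the triples above).
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("Customer_Churn_Report", "4.2%_Monthly_Churn_Rate", relation="FOUND")
g.add_edge("4.2%_Monthly_Churn_Rate", "Support_Ticket_Volume", relation="CORRELATES_WITH")
g.add_edge("Support_Ticket_Volume", "Operations_Team_Q3_Report", relation="ANALYZED_BY")

# Everything reachable from the churn report within two hops, with hop counts
reachable = nx.single_source_shortest_path_length(g, "Customer_Churn_Report", cutoff=2)
for node, hops in sorted(reachable.items(), key=lambda item: item[1]):
    if hops > 0:
        print(f"{node}: {hops} hop(s) from Customer_Churn_Report")
```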
Knowledge graphs power some of the most sophisticated information systems in the world.
Google's Knowledge Graph contains over 500 billion facts about people, places, and things. It powers the "knowledge panel" in search results and disambiguates queries, knowing whether "jaguar" means the animal, the car, or the NFL team based on context.
Netflix's knowledge graph connects movies, actors, directors, genres, themes, and viewer behavior. It powers approximately 80% of content consumption through recommendations and understands sequel relationships, franchise membership, and thematic similarities.
Amazon's product graph connects products, customer reviews, purchase behaviors, and attributes. It enables personalized recommendations and powers natural language product search.
A vector embedding is a numerical representation of data as a point in high-dimensional space, where semantic similarity corresponds to geometric proximity.
// Text converted to vectors (simplified - actual vectors have 1536 dimensions)
"Churn analysis" → [0.82, -0.15, 0.67, 0.23, ..., -0.41]
"Customer attrition" → [0.79, -0.18, 0.71, 0.19, ..., -0.38] // similar meaning = close vectors
"Weather forecast" → [-0.34, 0.56, -0.12, 0.89, ..., 0.23] // different = far apart
"Churn analysis" and "Customer attrition" have different words but similar meanings—their vectors are close together in embedding space. "Weather forecast" is semantically unrelated, so its vector points in a different direction.
Modern embedding models use transformer neural networks trained on massive text corpora. The model learns to map semantically similar content to nearby points in vector space.
First, the text is split into tokens (subwords), and each token gets an initial vector.
Then transformer layers let each token attend to all the others, capturing context.
Finally, the token vectors are combined into a single vector representing the full text.
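The final combining step is easy to see in miniature. Below is a toy numpy sketch that assumes mean pooling; real models may combine token vectors differently, for example by weighting tokens or using a dedicated summary token.

```python
import numpy as np

# Toy contextualized token vectors for a three-token text (3 tokens x 4 dims);
# real models produce hundreds of tokens x 1536 dimensions.
token_vectors = np.array([
    [0.8, -0.1, 0.6, 0.2],
    [0.7, -0.2, 0.7, 0.1],
    [0.9, -0.1, 0.5, 0.3],
])

# Mean pooling: average the token vectors into one vector for the full text,
# then L2-normalize so that cosine similarity reduces to a dot product.
text_vector = token_vectors.mean(axis=0)
text_vector = text_vector / np.linalg.norm(text_vector)
print(text_vector)
```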
Cosine similarity measures the angle between two vectors, determining how semantically related they are.
cosine_similarity(A, B) = (A · B) / (‖A‖ × ‖B‖): the dot product divided by the product of the magnitudes.
| Score | Meaning | Example |
|---|---|---|
| 0.95+ | Nearly identical meaning | "Customer churn" vs "Client attrition" |
| 0.7-0.95 | Strongly related | "Customer churn" vs "Retention strategy" |
| 0.5-0.7 | Somewhat related | "Customer churn" vs "Customer satisfaction" |
| < 0.5 | Unrelated | "Customer churn" vs "Weather forecast" |
Cosine similarity measures angle, not magnitude. A short query and a long document can have high similarity if they point the same direction—their lengths don't matter.
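In code the formula is a single line. The sketch below uses the toy four-dimensional vectors from earlier (made-up values, far shorter than real embeddings) and also demonstrates the magnitude point: scaling one vector leaves the score unchanged.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

churn     = np.array([0.82, -0.15, 0.67, 0.23])   # "Churn analysis" (toy values)
attrition = np.array([0.79, -0.18, 0.71, 0.19])   # "Customer attrition"
weather   = np.array([-0.34, 0.56, -0.12, 0.89])  # "Weather forecast"

print(cosine_similarity(churn, attrition))        # close to 1.0: similar meaning
print(cosine_similarity(churn, weather))          # much lower: unrelated

# Angle, not magnitude: a vector scaled 10x (think: a much longer document)
# produces exactly the same similarity score.
print(cosine_similarity(churn, 10 * attrition))
```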
Semantic search finds relevant content based on meaning rather than keyword matching.
// 1. Index: Embed all documents once
Report_1 → embed() → [0.82, -0.15, ...] → Store
Report_2 → embed() → [0.45, 0.33, ...] → Store
Report_3 → embed() → [-0.21, 0.67, ...] → Store
// 2. Query: Embed the question, find nearest vectors
"What do we know about customer churn?"
→ embed() → [0.79, -0.18, ...]
→ cosine_similarity() with all stored vectors
→ Return: Report_1 (0.94), Report_47 (0.87), Report_203 (0.82)
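The same two-phase flow as a runnable numpy sketch. Here embed() is a hypothetical stand-in for a real embedding model, and the stored vectors are toy four-dimensional values.

```python
import numpy as np

# 1. Index: one vector per report (toy 4-dim values; real embeddings have 1536 dims)
report_ids = ["Report_1", "Report_2", "Report_3"]
index = np.array([
    [0.82, -0.15, 0.67, 0.23],
    [0.45,  0.33, 0.10, 0.51],
    [-0.21, 0.67, -0.30, 0.12],
])

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a real embedding model call."""
    return np.array([0.79, -0.18, 0.71, 0.19])  # pretend embedding of the query

# 2. Query: embed the question, score it against every stored vector at once
query_vec = embed("What do we know about customer churn?")
scores = index @ query_vec / (np.linalg.norm(index, axis=1) * np.linalg.norm(query_vec))

# Return reports ranked by cosine similarity, best first
for i in np.argsort(-scores):
    print(report_ids[i], round(float(scores[i]), 2))
```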
Query: "churn"
Finds: Documents containing "churn"
Misses: Documents about "attrition", "customer loss", "retention failure"
Query: "churn"
Finds: All documents about customer loss, regardless of terminology
Understands synonyms, related concepts
MCP Analytics applies these technologies to make your analytical work searchable and connected.
Every completed analysis is processed through OpenAI's text-embedding-3-small model, converting metrics, insights, and findings into 1536-dimensional vectors.
Embeddings are stored in PostgreSQL using pgvector, enabling efficient similarity search across your entire analytical history.
Natural language questions are embedded and compared against all stored reports using cosine similarity to find relevant past analyses.
Every new analysis adds to your searchable knowledge base. The more you analyze, the more context is available for future queries.
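Put together, the loop looks roughly like the sketch below. This is a hedged illustration rather than the actual MCP Analytics implementation: the reports table, its columns, and the connection details are assumptions, while the OpenAI embeddings call and pgvector's cosine-distance operator (<=>) are standard usage.

```python
# Hedged sketch of the embed -> store -> search loop with OpenAI and pgvector.
# Table name, columns, and connection details are assumptions for illustration.
import psycopg2
from openai import OpenAI

client = OpenAI()                            # reads OPENAI_API_KEY from the environment
conn = psycopg2.connect("dbname=analytics")  # assumed local database

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding            # 1536-dimensional vector

def to_pgvector(vec: list[float]) -> str:
    return "[" + ",".join(str(x) for x in vec) + "]"  # pgvector text format

def store_report(title: str, body: str) -> None:
    with conn, conn.cursor() as cur:
        # embedding column assumed to be declared as vector(1536)
        cur.execute(
            "INSERT INTO reports (title, body, embedding) VALUES (%s, %s, %s::vector)",
            (title, body, to_pgvector(embed(body))),
        )

def search_reports(question: str, k: int = 5):
    qvec = to_pgvector(embed(question))
    with conn, conn.cursor() as cur:
        # <=> is pgvector's cosine-distance operator; similarity = 1 - distance
        cur.execute(
            "SELECT title, 1 - (embedding <=> %s::vector) AS similarity "
            "FROM reports ORDER BY embedding <=> %s::vector LIMIT %s",
            (qvec, qvec, k),
        )
        return cur.fetchall()

print(search_reports("What do we know about customer churn?"))
```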
Every analysis you run becomes part of your organization's growing intelligence.