Technical Deep Dive

Knowledge Graphs &
Vector Embeddings

The foundational technologies that transform unstructured data into searchable, connected intelligence.

01. What is a Knowledge Graph?

A knowledge graph is a structured representation of information as a network of entities (nodes) connected by relationships (edges), with semantic meaning encoded in the structure itself.

Nodes (Entities)

Real-world objects: customers, products, reports, metrics. Abstract concepts: churn, profitability, seasonality. Each node has properties.

Edges (Relationships)

Directed connections between nodes. Typed relationships: ANALYZED_BY, CORRELATES_WITH, CAUSED_BY.

Labels & Properties

Descriptive tags categorizing nodes and edges. Properties add context: timestamps, confidence scores, source information.

The Triple: (Subject, Predicate, Object)

The fundamental unit of a knowledge graph is the triple:

// Triples from an analytics knowledge graph
(Customer_Churn_Report, FOUND, 4.2%_Monthly_Churn_Rate)
(4.2%_Monthly_Churn_Rate, CORRELATES_WITH, Support_Ticket_Volume)
(Support_Ticket_Volume, ANALYZED_BY, Operations_Team_Q3_Report)
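To make the structure concrete, here is a minimal sketch of those same triples as plain Python tuples, with a hypothetical find_facts helper that gathers every fact touching a given entity. A real graph database indexes and traverses this far more efficiently; this is only an illustration.

# Illustrative only: the triples above as plain Python tuples.
triples = [
    ("Customer_Churn_Report", "FOUND", "4.2%_Monthly_Churn_Rate"),
    ("4.2%_Monthly_Churn_Rate", "CORRELATES_WITH", "Support_Ticket_Volume"),
    ("Support_Ticket_Volume", "ANALYZED_BY", "Operations_Team_Q3_Report"),
]

def find_facts(entity):
    """Return every (subject, predicate, object) triple that mentions the entity."""
    return [t for t in triples if entity in (t[0], t[2])]

# Everything we know about support ticket volume:
for subject, predicate, obj in find_facts("Support_Ticket_Volume"):
    print(subject, predicate, obj)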
Sources: IBM, Stardog

02. Knowledge Graph vs. Relational Database

While relational databases excel at transactional data with known query patterns, knowledge graphs are optimized for connected data and discovery.

Aspect | Relational Database | Knowledge Graph
Data Model | Tables with rows and columns | Nodes and edges (graph)
Schema | Fixed schema defined upfront | Schema-flexible, evolves with data
Relationships | Foreign keys + JOINs (computed at query time) | First-class citizens (stored explicitly)
Query Language | SQL | SPARQL, Cypher, Gremlin
Best For | Transactional data, known queries | Connected data, discovery, inference

Why This Matters for Analytics

Relational databases answer "What is X?" efficiently. Knowledge graphs answer "How is X related to Y, Z, and everything else?"—which is what you need when searching across analytical insights.
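As a rough illustration of that difference, the sketch below uses the open-source networkx library to ask an open-ended "what does this report lead to?" question in a single call, something that would take a chain of JOINs in SQL. The node and edge names reuse the earlier triples and are purely illustrative.

# Illustrative sketch using networkx (pip install networkx).
import networkx as nx

G = nx.DiGraph()
G.add_edge("Customer_Churn_Report", "4.2%_Monthly_Churn_Rate", rel="FOUND")
G.add_edge("4.2%_Monthly_Churn_Rate", "Support_Ticket_Volume", rel="CORRELATES_WITH")
G.add_edge("Support_Ticket_Volume", "Operations_Team_Q3_Report", rel="ANALYZED_BY")

# Every node reachable from the churn report, however many hops away.
print(nx.descendants(G, "Customer_Churn_Report"))
# {'4.2%_Monthly_Churn_Rate', 'Support_Ticket_Volume', 'Operations_Team_Q3_Report'}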

Sources: Neo4j, Legislate

03. Who Uses Knowledge Graphs?

Knowledge graphs power some of the most sophisticated information systems in the world.

Google Knowledge Graph

Contains over 500 billion facts about people, places, and things. Powers the "knowledge panel" in search results and disambiguates queries—knowing whether "jaguar" means the animal, the car, or the NFL team based on context.

Netflix

Connects movies, actors, directors, genres, themes, and viewer behavior. Powers approximately 80% of content consumption through recommendations. Understands sequel relationships, franchise membership, and thematic similarities.

Amazon

Connects products, customer reviews, purchase behaviors, and attributes. Enables personalized recommendations and powers natural language product search.

04. What Are Vector Embeddings?

A vector embedding is a numerical representation of data as a point in high-dimensional space, where semantic similarity corresponds to geometric proximity.

// Text converted to vectors (simplified - actual vectors have 1536 dimensions)
"Churn analysis"     → [0.82, -0.15, 0.67, 0.23, ..., -0.41]
"Customer attrition" → [0.79, -0.18, 0.71, 0.19, ..., -0.38]   // similar meaning = close vectors
"Weather forecast"   → [-0.34, 0.56, -0.12, 0.89, ..., 0.23]   // different = far apart

"Churn analysis" and "Customer attrition" have different words but similar meanings—their vectors are close together in embedding space. "Weather forecast" is semantically unrelated, so its vector points in a different direction.

How Embeddings Are Created

Modern embedding models use transformer neural networks trained on massive text corpora. The model learns to map semantically similar content to nearby points in vector space through three broad stages, outlined below; a toy sketch of the final pooling step follows them.

1. Tokenization

Text is split into tokens (subwords). Each token gets an initial vector.

2. Attention

Transformer layers let each token attend to all others, capturing context.

3. Pooling

Token vectors are combined into a single vector representing the full text.
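The toy pooling sketch mentioned above: it fakes the per-token vectors with random numbers (a real transformer produces them via steps 1 and 2) and shows only the combine-and-normalize part.

# Toy illustration of pooling: average per-token vectors into one text vector.
import numpy as np

rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(5, 1536))   # pretend output of steps 1-2: 5 tokens

text_vector = token_vectors.mean(axis=0)         # mean pooling across tokens
text_vector /= np.linalg.norm(text_vector)       # unit-normalize for cosine similarity
print(text_vector.shape)                         # (1536,)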

Sources: Pinecone, OpenAI

05. Measuring Similarity

Cosine similarity measures the cosine of the angle between two vectors: the closer the score is to 1, the more semantically related they are.

Cosine Similarity
similarity = (A · B) / (||A|| × ||B||)

Dot product divided by the product of magnitudes

Score | Meaning | Example
0.95+ | Nearly identical meaning | "Customer churn" vs "Client attrition"
0.7-0.95 | Strongly related | "Customer churn" vs "Retention strategy"
0.5-0.7 | Somewhat related | "Customer churn" vs "Customer satisfaction"
< 0.5 | Unrelated | "Customer churn" vs "Weather forecast"

Why Cosine Over Distance?

Cosine similarity measures angle, not magnitude. A short query and a long document can have high similarity if they point in the same direction; their lengths don't matter.
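A minimal NumPy sketch of the formula above, including a check that rescaling one vector (changing its length but not its direction) leaves the score unchanged. The example vectors are made up:

# Cosine similarity = dot product divided by the product of magnitudes.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = [0.82, -0.15, 0.67]
doc   = [0.79, -0.18, 0.71]

print(cosine_similarity(query, doc))                     # close to 1.0: similar direction
print(cosine_similarity(query, [x * 10 for x in doc]))   # same score: magnitude is ignored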

Sources: Milvus

06. How MCP Analytics Uses This

MCP Analytics applies these technologies to make your analytical work searchable and connected.

1. Report Embedding

Every completed analysis is processed through OpenAI's text-embedding-3-small model, converting metrics, insights, and findings into 1536-dimensional vectors.

2. Vector Storage

Embeddings are stored in PostgreSQL using pgvector, enabling efficient similarity search across your entire analytical history.

3. Semantic Search

Natural language questions are embedded and compared against all stored reports using cosine similarity to find relevant past analyses.

4. Compounding Knowledge

Every new analysis adds to your searchable knowledge base. The more you analyze, the more context is available for future queries.

Technical Specs

Embedding Model: text-embedding-3-small
Dimensions: 1536
Vector Store: PostgreSQL + pgvector
Similarity Metric: Cosine similarity
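Putting those specs together, a minimal sketch of the store-and-search pattern could look like the following. The table and column names are hypothetical, not MCP Analytics' actual schema; it assumes the openai and psycopg packages, a PostgreSQL database with the pgvector extension already enabled, and OPENAI_API_KEY in the environment.

# Hypothetical sketch of embed-store-search with pgvector; not the product's real schema.
import psycopg
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> str:
    """Embed text and format it as a pgvector literal like '[0.1,0.2,...]'."""
    vec = client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
    return "[" + ",".join(str(x) for x in vec) + "]"

with psycopg.connect("dbname=analytics") as conn:   # hypothetical connection string
    conn.execute("CREATE TABLE IF NOT EXISTS reports ("
                 "id bigserial PRIMARY KEY, summary text, embedding vector(1536))")
    conn.execute("INSERT INTO reports (summary, embedding) VALUES (%s, %s::vector)",
                 ("Q3 churn driven by support ticket backlog", embed("Q3 churn analysis")))

    # Semantic search: <=> is pgvector's cosine-distance operator,
    # so 1 - distance is the cosine similarity.
    q = embed("why are customers leaving?")
    rows = conn.execute(
        "SELECT summary, 1 - (embedding <=> %s::vector) AS similarity "
        "FROM reports ORDER BY embedding <=> %s::vector LIMIT 5",
        (q, q),
    ).fetchall()
    print(rows)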

The more you analyze, the smarter your organization gets.


Turn Your Analytics Into Searchable Knowledge

Every analysis you run becomes part of your organization's growing intelligence.