Technical Deep Dive

Knowledge Graphs &
Vector Embeddings

The foundational technologies that transform unstructured data into searchable, connected intelligence.

01. What is a Knowledge Graph?

A knowledge graph is a structured representation of information as a network of entities (nodes) connected by relationships (edges), with semantic meaning encoded in the structure itself.

Nodes (Entities)

Real-world objects: customers, products, reports, metrics. Abstract concepts: churn, profitability, seasonality. Each node has properties.

Edges (Relationships)

Directed connections between nodes. Typed relationships: ANALYZED_BY, CORRELATES_WITH, CAUSED_BY.

Labels & Properties

Descriptive tags categorizing nodes and edges. Properties add context: timestamps, confidence scores, source information.

The Triple: (Subject, Predicate, Object)

The fundamental unit of a knowledge graph is the triple:

// Triples from an analytics knowledge graph
(Customer_Churn_Report, FOUND, 4.2%_Monthly_Churn_Rate)
(4.2%_Monthly_Churn_Rate, CORRELATES_WITH, Support_Ticket_Volume)
(Support_Ticket_Volume, ANALYZED_BY, Operations_Team_Q3_Report)
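To make the structure concrete, here is a minimal sketch of those same triples as plain Python tuples, with a hypothetical find_facts helper that gathers every fact touching a given entity. A real graph database indexes and traverses this far more efficiently; this is only an illustration.

# Illustrative only: the triples above as plain Python tuples.
triples = [
    ("Customer_Churn_Report", "FOUND", "4.2%_Monthly_Churn_Rate"),
    ("4.2%_Monthly_Churn_Rate", "CORRELATES_WITH", "Support_Ticket_Volume"),
    ("Support_Ticket_Volume", "ANALYZED_BY", "Operations_Team_Q3_Report"),
]

def find_facts(entity):
    """Return every (subject, predicate, object) triple that mentions the entity."""
    return [t for t in triples if entity in (t[0], t[2])]

# Everything we know about support ticket volume:
for subject, predicate, obj in find_facts("Support_Ticket_Volume"):
    print(subject, predicate, obj)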
Sources: IBM, Stardog

02. Knowledge Graph vs. Relational Database

While relational databases excel at transactional data with known query patterns, knowledge graphs are optimized for connected data and discovery.

Aspect | Relational Database | Knowledge Graph
Data Model | Tables with rows and columns | Nodes and edges (graph)
Schema | Fixed schema defined upfront | Schema-flexible, evolves with data
Relationships | Foreign keys + JOINs (computed at query time) | First-class citizens (stored explicitly)
Query Language | SQL | SPARQL, Cypher, Gremlin
Best For | Transactional data, known queries | Connected data, discovery, inference

Why This Matters for Analytics

Relational databases answer "What is X?" efficiently. Knowledge graphs answer "How is X related to Y, Z, and everything else?"—which is what you need when searching across analytical insights.
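As a rough illustration of that difference, the sketch below uses the open-source networkx library to ask an open-ended "what does this report lead to?" question in a single call, something that would take a chain of JOINs in SQL. The node and edge names reuse the earlier triples and are purely illustrative.

# Illustrative sketch using networkx (pip install networkx).
import networkx as nx

G = nx.DiGraph()
G.add_edge("Customer_Churn_Report", "4.2%_Monthly_Churn_Rate", rel="FOUND")
G.add_edge("4.2%_Monthly_Churn_Rate", "Support_Ticket_Volume", rel="CORRELATES_WITH")
G.add_edge("Support_Ticket_Volume", "Operations_Team_Q3_Report", rel="ANALYZED_BY")

# Every node reachable from the churn report, however many hops away.
print(nx.descendants(G, "Customer_Churn_Report"))
# {'4.2%_Monthly_Churn_Rate', 'Support_Ticket_Volume', 'Operations_Team_Q3_Report'}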

Sources: Neo4j, Legislate

03. Who Uses Knowledge Graphs?

Knowledge graphs power some of the most sophisticated information systems in the world.

Google Knowledge Graph

Contains over 500 billion facts about people, places, and things. Powers the "knowledge panel" in search results and disambiguates queries—knowing whether "jaguar" means the animal, the car, or the NFL team based on context.

Netflix

Connects movies, actors, directors, genres, themes, and viewer behavior. Powers approximately 80% of content consumption through recommendations. Understands sequel relationships, franchise membership, and thematic similarities.

Amazon

Connects products, customer reviews, purchase behaviors, and attributes. Enables personalized recommendations and powers natural language product search.

04. What Are Vector Embeddings?

A vector embedding is a numerical representation of data as a point in high-dimensional space, where semantic similarity corresponds to geometric proximity.

// Text converted to vectors (simplified - actual vectors have 1536 dimensions)
"Churn analysis"     → [0.82, -0.15, 0.67, 0.23, ..., -0.41]
"Customer attrition" → [0.79, -0.18, 0.71, 0.19, ..., -0.38]   // similar meaning = close vectors
"Weather forecast"   → [-0.34, 0.56, -0.12, 0.89, ..., 0.23]   // different = far apart

"Churn analysis" and "Customer attrition" have different words but similar meanings—their vectors are close together in embedding space. "Weather forecast" is semantically unrelated, so its vector points in a different direction.

How Embeddings Are Created

Modern embedding models use transformer neural networks trained on massive text corpora. The model learns to map semantically similar content to nearby points in vector space through three broad stages, outlined below; a toy sketch of the final pooling step follows them.

1. Tokenization

Text is split into tokens (subwords). Each token gets an initial vector.

2. Attention

Transformer layers let each token attend to all others, capturing context.

3. Pooling

Token vectors are combined into a single vector representing the full text.
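The toy pooling sketch mentioned above: it fakes the per-token vectors with random numbers (a real transformer produces them via steps 1 and 2) and shows only the combine-and-normalize part.

# Toy illustration of pooling: average per-token vectors into one text vector.
import numpy as np

rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(5, 1536))   # pretend output of steps 1-2: 5 tokens

text_vector = token_vectors.mean(axis=0)         # mean pooling across tokens
text_vector /= np.linalg.norm(text_vector)       # unit-normalize for cosine similarity
print(text_vector.shape)                         # (1536,)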

Sources: Pinecone, OpenAI

05. Measuring Similarity

Cosine similarity measures the cosine of the angle between two vectors: the closer the score is to 1, the more semantically related they are.

Cosine Similarity
similarity = (A · B) / (||A|| × ||B||)

Dot product divided by the product of magnitudes

Score | Meaning | Example
0.95+ | Nearly identical meaning | "Customer churn" vs "Client attrition"
0.7-0.95 | Strongly related | "Customer churn" vs "Retention strategy"
0.5-0.7 | Somewhat related | "Customer churn" vs "Customer satisfaction"
< 0.5 | Unrelated | "Customer churn" vs "Weather forecast"

Why Cosine Over Distance?

Cosine similarity measures angle, not magnitude. A short query and a long document can have high similarity if they point in the same direction; their lengths don't matter.
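A minimal NumPy sketch of the formula above, including a check that rescaling one vector (changing its length but not its direction) leaves the score unchanged. The example vectors are made up:

# Cosine similarity = dot product divided by the product of magnitudes.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = [0.82, -0.15, 0.67]
doc   = [0.79, -0.18, 0.71]

print(cosine_similarity(query, doc))                     # close to 1.0: similar direction
print(cosine_similarity(query, [x * 10 for x in doc]))   # same score: magnitude is ignored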

Sources: Milvus

06. How MCP Analytics Uses This

MCP Analytics applies these technologies to make your analytical work searchable and connected.

1. Report Embedding

Every completed analysis is processed through OpenAI's text-embedding-3-small model, converting metrics, insights, and findings into 1536-dimensional vectors.

2. Vector Storage

Embeddings are stored in PostgreSQL using pgvector, enabling efficient similarity search across your entire analytical history.

3. Semantic Search

Natural language questions are embedded and compared against all stored reports using cosine similarity to find relevant past analyses.

4. Compounding Knowledge

Every new analysis adds to your searchable knowledge base. The more you analyze, the more context is available for future queries.

Technical Specs

Embedding Model: text-embedding-3-small
Dimensions: 1536
Vector Store: PostgreSQL + pgvector
Similarity Metric: Cosine similarity
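Putting those specs together, a minimal sketch of the store-and-search pattern could look like the following. The table and column names are hypothetical, not MCP Analytics' actual schema; it assumes the openai and psycopg packages, a PostgreSQL database with the pgvector extension already enabled, and OPENAI_API_KEY in the environment.

# Hypothetical sketch of embed-store-search with pgvector; not the product's real schema.
import psycopg
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> str:
    """Embed text and format it as a pgvector literal like '[0.1,0.2,...]'."""
    vec = client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
    return "[" + ",".join(str(x) for x in vec) + "]"

with psycopg.connect("dbname=analytics") as conn:   # hypothetical connection string
    conn.execute("CREATE TABLE IF NOT EXISTS reports ("
                 "id bigserial PRIMARY KEY, summary text, embedding vector(1536))")
    conn.execute("INSERT INTO reports (summary, embedding) VALUES (%s, %s::vector)",
                 ("Q3 churn driven by support ticket backlog", embed("Q3 churn analysis")))

    # Semantic search: <=> is pgvector's cosine-distance operator,
    # so 1 - distance is the cosine similarity.
    q = embed("why are customers leaving?")
    rows = conn.execute(
        "SELECT summary, 1 - (embedding <=> %s::vector) AS similarity "
        "FROM reports ORDER BY embedding <=> %s::vector LIMIT 5",
        (q, q),
    ).fetchall()
    print(rows)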

The more you analyze, the smarter your organization gets.


Turn Your Analytics Into Searchable Knowledge

Every analysis you run becomes part of your organization's growing intelligence.