Search Type | Speed | Accuracy | Flexibility | Storage | Best For |
---|---|---|---|---|---|
Exact Match | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐ | ⭐⭐⭐⭐⭐ | IDs, codes, filters |
Pattern Match | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | Autocomplete, prefixes |
Full-Text | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | Documents, articles |
Vector / Semantic | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | Recommendations, concepts |
Fuzzy | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | Typos, data cleaning |
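
To make the trade-offs in the table concrete, here is a minimal Python sketch contrasting exact, prefix (pattern), and fuzzy matching on a tiny in-memory list. The `NAMES` list and the function names are hypothetical, and `difflib` stands in for a real fuzzy index; this illustrates the behavior only, not how a database implements it.

```python
import difflib

# Hypothetical catalogue entries, used only for illustration.
NAMES = ["wireless mouse", "wireless keyboard", "wired mouse", "usb hub"]


def exact_match(query, items):
    """Exact match: the whole value must be equal to the query."""
    return [item for item in items if item == query]


def prefix_match(query, items):
    """Pattern match (prefix form), the shape used for autocomplete."""
    return [item for item in items if item.startswith(query)]


def fuzzy_match(query, items, cutoff=0.8):
    """Fuzzy match: tolerate typos by ranking on a similarity ratio."""
    return difflib.get_close_matches(query, items, n=3, cutoff=cutoff)


if __name__ == "__main__":
    print(exact_match("usb hub", NAMES))        # only the identical string
    print(prefix_match("wire", NAMES))          # every name starting with "wire"
    print(fuzzy_match("wireles mouse", NAMES))  # best candidates despite the typo
```

Exact matching returns nothing for the misspelled query, while the fuzzy version still ranks the intended name first. That is the flexibility-versus-accuracy trade the star ratings summarize.
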
-- CREATE EXTENSION IF NOT EXISTS postgis;
-- CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector is installed under the extension name "vector"
------------------------------ Exercise 1 ------------------------------
-- Table setup
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    sku VARCHAR(20) UNIQUE,
    name VARCHAR(200),
    category VARCHAR(50),

Scenario | Best Choice | Alternative / Avoid |
---|---|---|
User login/auth | Exact Match | All others |
Product SKU lookup | Exact Match | All others |
Autocomplete | Pattern Match (prefix) | Fuzzy, Vector |
Blog search | Full-Text | Vector + Full-Text, Pattern |
Recommendation | Vector + Full-Text | Pattern |
Exact data with typos | Fuzzy | Pattern, Exact |
Multi-language content | Vector + Full-Text | Pattern |
Real-time search | Exact / Pattern | Full-Text, Vector |

Search Type | Additional Storage | Index Size | Notes |
---|---|---|---|
Exact Match | None | ~2–5% of data | B-tree indexes |
Pattern Match | None | ~10–20% | GIN trigram indexes |
Full-Text | ~20–50% | ~10–30% | tsvector + GIN index |
Vector Search | ~50–200% | ~20–100% | Depends on dimensions |
Fuzzy Search | None | ~10–20% | Uses trigram indexes |
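
Pattern and fuzzy search both rely on trigram indexes, so it helps to see what a trigram set looks like. The sketch below approximates the padding and similarity rules described in the `pg_trgm` documentation (each word gets two leading spaces and one trailing space; similarity is shared trigrams divided by total distinct trigrams). The function names are hypothetical and the tokenization is simplified to ASCII letters and digits, so treat it as an illustration rather than a reimplementation.

```python
import re


def trigrams(text):
    """Return the set of 3-character grams, padding each word pg_trgm-style."""
    grams = set()
    # Simplifying assumption: words are runs of ASCII letters and digits.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    print(round(similarity("word", "two words"), 6))
```

Every indexed value contributes one entry per distinct trigram, which is one way to see why a trigram index is larger than a B-tree on the same column.
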
Analyzer | Description | Example Use |
---|---|---|
Standard | Default; breaks text by word boundaries, removes most punctuation, lowercases tokens. | English prose, general search |
Simple | Splits on non-letter characters and lowercases tokens. | Part numbers, technical terms |
Whitespace | Splits on whitespace only, preserves case. | Code, serial numbers |
Keyword | Does not split; treats entire text as a single token. | Exact match fields, IDs, tags |
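
The four analyzers above can be approximated in a few lines of Python to show how the same input tokenizes differently. These functions are rough stand-ins written for this comparison (the names and regular expressions are assumptions, not the search engine's actual implementation, which follows Unicode segmentation rules).

```python
import re


def standard_analyzer(text):
    """Approximation: split on word boundaries, drop punctuation, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def simple_analyzer(text):
    """Approximation: keep runs of letters only, lowercase."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are preserved."""
    return text.split()


def keyword_analyzer(text):
    """No splitting: the entire input is a single token."""
    return [text]


if __name__ == "__main__":
    sample = "Part AB-123 is Ready!"
    for analyzer in (standard_analyzer, simple_analyzer,
                     whitespace_analyzer, keyword_analyzer):
        print(analyzer.__name__, analyzer(sample))
```

In this sketch the simple analyzer drops `123` entirely because digits are not letters, which is worth checking against your data before choosing it for identifiers that mix letters and numbers.
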
Feature | TF-IDF | BM25 |
---|---|---|
Full Form | Term Frequency – Inverse Document Frequency | Best Matching 25 |
Default in Elasticsearch | Only before v5.0 | ✅ (v5.0 and later) |
Term Frequency Handling | Linear | Saturated (diminishing returns) |
Document Length Normalization | Minimal | Tunable and robust |
Tunable Parameters | No | Yes (`k1`, `b`) |
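
The difference between linear and saturated term frequency is easier to see numerically. The sketch below compares a raw term-frequency weight with the term-frequency component of the classical BM25 formula, using the commonly cited defaults `k1 = 1.2` and `b = 0.75`. The IDF factor is left out on purpose, and the function names are hypothetical, so this isolates only the two rows about term frequency and length normalization.

```python
def tf_linear(tf):
    """Raw term frequency: every extra occurrence adds the same weight."""
    return float(tf)


def tf_bm25(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of classical BM25 (IDF factor omitted).

    k1 controls how quickly repeated terms saturate.
    b controls how strongly document length is normalized (0 disables it).
    """
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Saturation: the BM25 weight approaches k1 + 1 instead of growing forever.
    for tf in (1, 2, 5, 10, 100):
        print(tf, tf_linear(tf), round(tf_bm25(tf, 100, 100), 3))

    # Length normalization: same tf, but the longer document scores lower.
    print(round(tf_bm25(3, 50, 100), 3), round(tf_bm25(3, 400, 100), 3))

    # With b = 0, document length no longer affects the weight.
    print(tf_bm25(3, 50, 100, b=0.0) == tf_bm25(3, 400, 100, b=0.0))
```

Raising `k1` lets repeated terms keep contributing for longer, and lowering `b` toward 0 reduces the penalty on long documents.
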
Use Case | TF-IDF | BM25 |
---|---|---|
Simple scoring model | ✅ | ✅ |
Accurate relevance for modern search | ❌ | ✅ |
Normalize for document length | ❌ | ✅ |
Tune scoring behavior with parameters | ❌ | ✅ |