PostgreSQL Full-Text Search: tsvector, tsquery, Ranking and Multilingual Search
PostgreSQL has a sophisticated built-in full-text search engine that handles stemming, stop words, ranking, and multiple languages. For many applications, it eliminates the need for an external search service like Elasticsearch. This guide shows you how to build production-quality search using only PostgreSQL.
Core Concepts
tsvector
A tsvector is a preprocessed, sorted list of unique words (lexemes) extracted from a document, with their positions. It handles:
- Normalization β lowercasing, removing punctuation
- Stemming β "running" β "run", "cats" β "cat"
- Stop words β removing "the", "is", "a", etc.
sqlSELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog'); -- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2 -- stop words (the, over) removed, words stemmed and sorted
tsquery
A tsquery is a search expression using operators:
&β AND (both terms must match)|β OR (either term matches)!β NOT (term must not match)<->β phrase search (adjacent words)
sqlSELECT to_tsquery('english', 'javascript & (react | vue)'); -- 'javascript' & ( 'react' | 'vue' ) SELECT to_tsquery('english', 'full <-> text <-> search'); -- phrase: "full text search" (words must be adjacent in order) -- plainto_tsquery: natural language input (no operators needed from user) SELECT plainto_tsquery('english', 'full text search postgresql'); -- 'full' & 'text' & 'search' & 'postgresql' -- websearch_to_tsquery: Google-like syntax SELECT websearch_to_tsquery('english', 'react hooks -class "use effect"'); -- 'react' & 'hook' & !'class' & 'use' <-> 'effect'
The @@ operator
Test if a tsvector matches a tsquery:
sqlSELECT to_tsvector('english', 'PostgreSQL full text search') @@ to_tsquery('english', 'postgresql & search'); -- true SELECT to_tsvector('english', 'PostgreSQL full text search') @@ to_tsquery('english', 'mysql'); -- false
Setting Up Full-Text Search on a Table
Adding a tsvector column
sqlALTER TABLE articles ADD COLUMN search_vector tsvector; -- Populate the column UPDATE articles SET search_vector = to_tsvector( 'english', COALESCE(title, '') || ' ' || COALESCE(body, '') || ' ' || COALESCE(tags, '') ); -- Auto-update with a trigger CREATE OR REPLACE FUNCTION articles_search_trigger() RETURNS TRIGGER AS $$ BEGIN NEW.search_vector := to_tsvector( 'english', COALESCE(NEW.title, '') || ' ' || COALESCE(NEW.body, '') || ' ' || COALESCE(NEW.tags, '') ); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER articles_search_update BEFORE INSERT OR UPDATE ON articles FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();
GIN Index for fast search
Without an index, full-text search scans every row. A GIN index makes it fast:
sqlCREATE INDEX idx_articles_search ON articles USING GIN(search_vector); -- Now this query uses the index SELECT id, title FROM articles WHERE search_vector @@ to_tsquery('english', 'postgresql & indexing') LIMIT 20;
Weighted Search
Different parts of a document can have different importance. Weight A (highest) β D (lowest):
sqlUPDATE articles SET search_vector = setweight(to_tsvector('english', COALESCE(title, '')), 'A') || setweight(to_tsvector('english', COALESCE(summary, '')), 'B') || setweight(to_tsvector('english', COALESCE(body, '')), 'C') || setweight(to_tsvector('english', COALESCE(tags, '')), 'A');
Matches in the title and tags rank higher than matches in the body.
Ranking Results
sqlSELECT id, title, ts_rank(search_vector, query) AS rank, ts_rank_cd(search_vector, query) AS rank_cd -- cover density variant FROM articles, to_tsquery('english', 'postgresql & performance') query WHERE search_vector @@ query ORDER BY rank DESC LIMIT 10;
ts_rank considers term frequency. ts_rank_cd also considers how close terms appear to each other β often better for multi-term queries.
Normalize by document length
sql-- Divide rank by 1 + log(document length) to avoid long documents dominating SELECT id, title, ts_rank(search_vector, query, 1) AS rank -- normalization option 1 FROM articles, to_tsquery('english', 'database') query WHERE search_vector @@ query ORDER BY rank DESC;
Headline (Excerpt with Highlighting)
Show matching context with highlighted terms:
sqlSELECT title, ts_headline( 'english', body, to_tsquery('english', 'postgresql & search'), 'MaxWords=30, MinWords=15, StartSel=<mark>, StopSel=</mark>' ) AS excerpt FROM articles WHERE search_vector @@ to_tsquery('english', 'postgresql & search');
Result:
...using <mark>PostgreSQL</mark>'s built-in <mark>search</mark> capabilities...Complete Search Function
sqlCREATE OR REPLACE FUNCTION search_articles( query_text TEXT, result_limit INT DEFAULT 10, result_offset INT DEFAULT 0 ) RETURNS TABLE( id INT, title TEXT, excerpt TEXT, rank FLOAT4, total_count BIGINT ) AS $$ DECLARE search_query tsquery; BEGIN search_query := websearch_to_tsquery('english', query_text); RETURN QUERY SELECT a.id, a.title, ts_headline( 'english', a.body, search_query, 'MaxWords=25, MinWords=10, StartSel=<b>, StopSel=</b>' ) AS excerpt, ts_rank_cd(a.search_vector, search_query) AS rank, COUNT(*) OVER() AS total_count FROM articles a WHERE a.search_vector @@ search_query AND a.published = true ORDER BY rank DESC LIMIT result_limit OFFSET result_offset; END; $$ LANGUAGE plpgsql; -- Usage SELECT * FROM search_articles('postgresql full text search', 10, 0);
Multilingual Search
sql-- Check available text search configurations SELECT cfgname FROM pg_ts_config; -- english, french, german, spanish, portuguese, ... -- Per-row language UPDATE articles SET search_vector = to_tsvector(language::regconfig, body) WHERE language IN ('english', 'french', 'german'); -- Search in a specific language SELECT title FROM articles WHERE to_tsvector('french', body) @@ to_tsquery('french', 'recherche');
Fuzzy Search with pg_trgm
For typo-tolerant search, combine full-text search with trigram similarity:
sqlCREATE EXTENSION pg_trgm; CREATE INDEX idx_articles_title_trgm ON articles USING GIN(title gin_trgm_ops); -- Find titles similar to the query (handles typos) SELECT title, similarity(title, 'postgresq') AS sim FROM articles WHERE title % 'postgresq' -- similarity threshold ORDER BY sim DESC LIMIT 10; -- Combined: full-text + fuzzy fallback WITH fts AS ( SELECT id, title, ts_rank(search_vector, query) AS rank FROM articles, to_tsquery('english', 'postgresql') query WHERE search_vector @@ query ), fuzzy AS ( SELECT id, title, similarity(title, 'postgresql') AS rank FROM articles WHERE title % 'postgresql' AND id NOT IN (SELECT id FROM fts) ) SELECT * FROM fts UNION ALL SELECT * FROM fuzzy ORDER BY rank DESC;
Common Interview Questions
Q: What is the difference between LIKE and full-text search in PostgreSQL?
LIKE '%term%' does a simple substring match β it cannot use a B-tree index on the interior, does not understand language (no stemming or stop words), and cannot rank results. Full-text search with tsvector/tsquery supports GIN indexes for fast search, handles stemming (matching "run" finds "running"), removes stop words, ranks results by relevance, and supports phrase and proximity search.
Q: What is a GIN index and why is it used for full-text search?
GIN (Generalized Inverted Index) stores an index mapping each unique lexeme in the corpus to the set of documents containing it β the inverse of a document pointing to its words. This makes it extremely fast for looking up "which documents contain this word" queries. B-tree indexes store values in sorted order, which is efficient for range queries but not for searching inside complex types like tsvector.
Q: How do you keep the search_vector column up to date?
The standard approach is a BEFORE INSERT OR UPDATE trigger that recalculates and stores the tsvector whenever a row is inserted or updated. This keeps the index always current at the cost of slightly slower writes. For very write-heavy tables, an alternative is to not store tsvector at all and compute it inline:
WHERE to_tsvector('english', body) @@ queryPractice SQL on Froquiz
Full-text search and advanced PostgreSQL features appear in backend and data engineering interviews. Test your SQL knowledge on Froquiz across all difficulty levels.
Summary
tsvectoris a preprocessed, stemmed word list;tsqueryis a boolean search expression@@matches atsvectoragainst atsqueryβ the core search operator- GIN index on
search_vectormakes full-text search fast β always add it to production tables setweightassigns A-D importance to different parts (title, body, tags)ts_rankandts_rank_cdscore relevance; sort by rank descending for best resultsts_headlinegenerates excerpts with highlighted matching termswebsearch_to_tsqueryaccepts natural language input β ideal for user-facing search boxes- Add
pg_trgmfor fuzzy/typo-tolerant matching alongside full-text search