🐘 The Ultimate PostgreSQL Interview Mastery Guide
From Beginner to Most Expert — Master PL/pgSQL, DBA, Cloud, pgvector & AI Integration. Walk into your interview with unstoppable confidence.
🐣 PostgreSQL Developer — Beginner 0-2 Yrs
PostgreSQL is an advanced, open-source object-relational database system. It emphasizes extensibility, standards compliance (ACID, SQL), and supports modern features like JSONB, full-text search, and custom types. It's often called the "world's most advanced open-source database."
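PostgreSQL uses a multi-process model: the postmaster spawns one backend process per connection. Key background processes include the background writer, checkpointer, WAL writer, and autovacuum launcher (plus the stats collector in versions before 15). Shared memory includes the buffer cache (shared_buffers) and WAL buffers. Data is stored in 8KB pages, with MVCC implemented via tuple versions.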
CREATE DATABASE ecommerce OWNER app_admin; CREATE USER app_user WITH PASSWORD 'Str0ng!Pass'; GRANT CONNECT ON DATABASE ecommerce TO app_user; GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
Multi-Version Concurrency Control keeps multiple versions of a row. Readers don't block writers and vice versa. Each transaction sees a snapshot of the database, providing high concurrency without heavy locking. VACUUM eventually cleans up old versions.
PostgreSQL offers rich types. NUMERIC(precision, scale) for exact money, TEXT for large strings, VARCHAR(n) for limited text. JSONB stores binary JSON with indexing. DATE and TIMESTAMP (with/without time zone) are crucial for global apps.
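As a rough illustration, the types above might be combined in one table like this. The table and column names are hypothetical and not taken from any schema in this guide.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE invoice_demo (
    id          INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    amount      NUMERIC(12, 2) NOT NULL,            -- exact decimal, 2 digits after the point
    currency    VARCHAR(3) NOT NULL,                -- length-limited text
    notes       TEXT,                               -- unbounded text
    attributes  JSONB DEFAULT '{}'::jsonb,          -- binary JSON, indexable with GIN
    due_date    DATE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()  -- stored as an absolute instant
);
```

TIMESTAMPTZ (timestamp with time zone) is usually the safer default for applications that serve several time zones.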
CREATE FUNCTION get_total_orders(cust_id INT) RETURNS INT AS $$ BEGIN RETURN (SELECT COUNT(*) FROM orders WHERE customer_id = cust_id); END; $$ LANGUAGE plpgsql;
A primary key uniquely identifies a row. SERIAL creates an auto-incrementing integer backed by a sequence. In modern PostgreSQL (10+), prefer identity columns (GENERATED ALWAYS AS IDENTITY or GENERATED BY DEFAULT AS IDENTITY).
SELECT c.name, o.amount FROM customers c INNER JOIN orders o ON c.id = o.cust_id;
B-tree for equality/range. GIN for full-text/arrays/JSONB. GiST for geometric data. Appropriate indexes can turn a full scan into an index lookup.
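DELETE removes the rows matching a WHERE clause and can be rolled back. TRUNCATE removes all rows quickly; in PostgreSQL it is transactional and can be rolled back, although it is not MVCC-safe: once the truncating transaction commits, concurrent transactions using an older snapshot see the table as empty. DROP removes the table itself.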
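A minimal sketch of the rollback behaviour, using a throwaway table whose name is made up for this example:

```sql
CREATE TABLE truncate_demo (id INT);
INSERT INTO truncate_demo VALUES (1), (2), (3);

BEGIN;
TRUNCATE truncate_demo;
SELECT count(*) FROM truncate_demo;   -- 0 inside the transaction
ROLLBACK;

SELECT count(*) FROM truncate_demo;   -- 3 again after the rollback
```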
SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000;
A view is a saved query. Use for security (hide columns), abstraction, and reusable logic. Materialized views cache results.
EXPLAIN SELECT * FROM orders WHERE amount > 1000;
SELECT COALESCE(phone, 'N/A') FROM contacts; SELECT NULLIF(division, 0) FROM stats;
BEGIN; UPDATE ...; SAVEPOINT sp1; ... ROLLBACK TO sp1; COMMIT;
\copy products FROM '/data/products.csv' CSV HEADER
pg_dump mydb > backup.sql
Schemas organize objects. search_path determines which schema is checked first for unqualified names. Typical pattern: `public` for app, separate schemas per module.
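A short sketch of how search_path resolves unqualified names. The `billing` schema and `invoices` table are hypothetical.

```sql
CREATE SCHEMA billing;
CREATE TABLE billing.invoices (id INT);

-- Schemas are searched left to right for unqualified names.
SET search_path TO billing, public;
SELECT * FROM invoices;      -- resolves to billing.invoices
SHOW search_path;
```

SET changes the path for the current session only. A persistent default can be attached to a role or database with ALTER ROLE ... SET search_path or ALTER DATABASE ... SET search_path.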
VARCHAR(n) checks length; TEXT is unlimited. Performance is identical; prefer TEXT for flexibility unless you need a constraint.
CREATE TABLE users (id SERIAL PRIMARY KEY); -- old
CREATE TABLE users (id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY); -- modern
VACUUM reclaims storage occupied by dead tuples; VACUUM ANALYZE (or a separate ANALYZE) also refreshes the planner's statistics. Without it, tables bloat and performance degrades. Autovacuum automates both.
SELECT SPLIT_PART(email, '@', 2) AS domain FROM users;
SELECT NOW() - INTERVAL '7 days'; SELECT order_time AT TIME ZONE 'UTC' AT TIME ZONE 'US/Eastern' FROM orders;
SELECT * FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept);
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40;
CREATE DOMAIN positive_int AS INT CHECK (VALUE > 0); ALTER TABLE products ADD CONSTRAINT price_positive CHECK (price > 0);
SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;
WITH recent_orders AS (SELECT * FROM orders WHERE order_date > CURRENT_DATE - 30) SELECT customer_id, SUM(amount) FROM recent_orders GROUP BY customer_id;
ALTER TABLE old_name RENAME TO new_name; ALTER TABLE orders RENAME COLUMN amt TO amount;
UNION removes duplicates; UNION ALL keeps all. Use UNION ALL unless you need distinct results; it's faster.
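The difference is easy to see with two small inline row sets, so no tables are needed:

```sql
SELECT x FROM (VALUES (1), (2)) AS a(x)
UNION
SELECT x FROM (VALUES (2), (3)) AS b(x);       -- three rows: 1, 2, 3 (order not guaranteed)

SELECT x FROM (VALUES (1), (2)) AS a(x)
UNION ALL
SELECT x FROM (VALUES (2), (3)) AS b(x);       -- four rows: the value 2 appears twice
```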
COMMENT ON TABLE employees IS 'Stores employee data'; COMMENT ON COLUMN employees.salary IS 'Annual salary in USD';
🐥 PostgreSQL Developer — Intermediate 2-5 Yrs
Each row has hidden columns xmin (creating transaction) and xmax (deleting/updating transaction). A transaction sees rows where xmin is committed before its snapshot and xmax is not committed or later. This provides isolation without read locks.
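The hidden columns can be inspected directly. This sketch uses a throwaway table (the name is an assumption) and relies on each statement running in its own transaction under autocommit:

```sql
CREATE TABLE mvcc_demo (id INT, val TEXT);
INSERT INTO mvcc_demo VALUES (1, 'a');

-- System columns are not returned by SELECT * and must be named explicitly.
SELECT xmin, xmax, ctid, * FROM mvcc_demo;

UPDATE mvcc_demo SET val = 'b' WHERE id = 1;

-- The visible row is now a new tuple version: xmin and ctid have changed.
SELECT xmin, xmax, ctid, * FROM mvcc_demo;
```

The old version stays in the page as a dead tuple until VACUUM removes it.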
CREATE FUNCTION apply_bonus(dept_id INT, pct DECIMAL) RETURNS void AS $$
DECLARE emp RECORD;
BEGIN
FOR emp IN SELECT id, salary FROM employees WHERE department_id = dept_id LOOP
UPDATE employees SET salary = salary * (1 + pct/100) WHERE id = emp.id;
END LOOP;
END; $$ LANGUAGE plpgsql;
SELECT sales_rep, revenue, RANK() OVER (ORDER BY revenue DESC) as rank, SUM(revenue) OVER () as total_revenue FROM quarterly_sales;
SELECT data->>'name' AS name FROM users WHERE data @> '{"vip": true}';
CREATE INDEX idx_gin ON users USING GIN (data jsonb_path_ops);
WITH RECURSIVE org AS ( SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.name, e.manager_id FROM employees e JOIN org o ON e.manager_id = o.id ) SELECT * FROM org;
CREATE FUNCTION audit_func() RETURNS trigger AS $$
BEGIN
INSERT INTO audit_log VALUES (OLD.*, NOW());
RETURN NEW;
END; $$ LANGUAGE plpgsql;
CREATE TRIGGER audit_emp AFTER UPDATE ON employees FOR EACH ROW EXECUTE FUNCTION audit_func();
SELECT department, STRING_AGG(name, ', ' ORDER BY name) FROM employees GROUP BY department;
CREATE INDEX idx_active ON orders (order_date) WHERE status = 'active'; CREATE INDEX idx_cover ON orders (customer_id) INCLUDE (amount, order_date);
EXISTS stops at first match, often faster for existence checks. JOIN returns duplicate rows if many matches. Use EXISTS for semi-joins, JOIN when you need columns from both sides.
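A sketch of the two forms side by side, reusing the hypothetical `customers` and `orders` tables from the join example earlier (with `orders.cust_id` pointing at `customers.id`):

```sql
-- Semi-join: one row per customer that has at least one order.
SELECT c.id, c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.cust_id = c.id);

-- Join: one row per matching order, so a customer with three orders appears three times.
SELECT c.id, c.name
FROM customers c
JOIN orders o ON o.cust_id = c.id;
```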
CREATE TABLE orders (
order_id INT, amount NUMERIC, order_date DATE
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2026q1 PARTITION OF orders FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE EXTENSION postgres_fdw; CREATE SERVER remote_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '10.0.0.5', dbname 'remote'); CREATE USER MAPPING FOR local_user SERVER remote_server OPTIONS (user 'remote_user', password 'secret'); CREATE FOREIGN TABLE remote_orders (...) SERVER remote_server OPTIONS (schema_name 'public', table_name 'orders');
SELECT * FROM orders WHERE id > :last_id ORDER BY id LIMIT 20;
CREATE MATERIALIZED VIEW monthly_sales AS SELECT ...; REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales; -- CONCURRENTLY requires a unique index on the materialized view
SELECT tags[1] FROM articles WHERE 'postgres' = ANY(tags);
INSERT INTO users (name) VALUES ('John') RETURNING id;
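UPDATE products SET price = price*1.1 WHERE id=1 RETURNING price;
SELECT pg_advisory_lock(123); -- application-level lock for mutual exclusion
READ COMMITTED (the default) gives each statement a fresh snapshot, so it sees changes committed by concurrent transactions before the statement began. SERIALIZABLE uses predicate locking (Serializable Snapshot Isolation) to guarantee a result equivalent to some serial order, aborting transactions with a serialization failure that the application must retry. Most apps use READ COMMITTED; SERIALIZABLE suits financial consistency.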
SELECT date, amount, SUM(amount) OVER (PARTITION BY account_id ORDER BY date) FROM transactions;
ALTER TABLE employees ADD full_name TEXT GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED;
COPY (SELECT * FROM orders WHERE order_date = CURRENT_DATE) TO '/tmp/orders.csv' CSV HEADER;
Heap-Only Tuple updates avoid index bloat when the update does not change indexed columns. The new tuple stays in the same page and is linked from the old one. Reduces VACUUM work.
A Block Range INdex (BRIN) stores the min/max value per block range. It is very small and ideal for large tables with natural ordering (e.g., time-series). On huge tables with high physical correlation it can rival a B-tree for range scans at a fraction of the index size.
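A minimal sketch of creating a BRIN index and checking whether the column is a good candidate. The table and index names are hypothetical, and `pages_per_range = 64` is only an example value.

```sql
-- Hypothetical append-only table where rows arrive in timestamp order.
CREATE TABLE sensor_readings (
    recorded_at TIMESTAMPTZ NOT NULL,
    sensor_id   INT NOT NULL,
    reading     NUMERIC
);

CREATE INDEX idx_readings_brin
    ON sensor_readings USING brin (recorded_at) WITH (pages_per_range = 64);

-- A correlation near 1 or -1 suggests BRIN will prune block ranges well.
-- The row only appears after the table has been analyzed.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'sensor_readings' AND attname = 'recorded_at';
```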
WITH next AS ( SELECT id FROM job_queue WHERE status='pending' ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED ) UPDATE job_queue SET status='processing' FROM next WHERE job_queue.id = next.id RETURNING *;
SQL functions can be inlined by the optimizer, often faster for simple queries. PL/pgSQL adds procedural logic (loops, conditionals) but may have overhead. Choose SQL when possible.
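As an illustration, a single-SELECT SQL function like the hypothetical helper below is the kind the planner can consider for inlining:

```sql
-- Hypothetical helper function, not part of any schema in this guide.
CREATE FUNCTION net_price(price NUMERIC, tax_rate NUMERIC)
RETURNS NUMERIC
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT price * (1 + tax_rate);
$$;

SELECT net_price(100, 0.2);   -- 120.0
```

Marking the function IMMUTABLE is appropriate here because the result depends only on its arguments.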
They fire on DDL commands (CREATE, ALTER, DROP). Useful for auditing schema changes or enforcing naming conventions. Example: `CREATE EVENT TRIGGER ... ON ddl_command_start EXECUTE FUNCTION ...`.
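A fuller sketch of the pattern, assuming a superuser session. The function and trigger names are made up for this example.

```sql
-- Event trigger functions return the special type event_trigger.
CREATE FUNCTION log_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- TG_TAG holds the command tag, for example 'CREATE TABLE'.
    RAISE NOTICE 'DDL about to run: %', tg_tag;
END;
$$;

CREATE EVENT TRIGGER trg_log_ddl
    ON ddl_command_start
    EXECUTE FUNCTION log_ddl();
```

To enforce a rule rather than just report, the function could use RAISE EXCEPTION, which aborts the DDL statement.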
CREATE EXTENSION pg_stat_statements; SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; -- columns are named mean_time and total_time before PostgreSQL 13
NOW() returns the transaction start time (stable within transaction). clock_timestamp() returns current wall-clock time. Use NOW() for consistent auditing.
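The difference shows up as soon as a transaction takes measurable time:

```sql
BEGIN;
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(2);
-- now() is unchanged; clock_timestamp() has advanced by about two seconds.
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
COMMIT;
```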
PostgreSQL supports table inheritance (child tables inherit columns). Mostly replaced by declarative partitioning, but still useful for certain partitioning patterns with custom rules.
INSERT INTO users (id, email) VALUES (1, 'a@b.com') ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email;
SELECT * FROM articles WHERE to_tsvector('english', body) @@ to_tsquery('postgres & interview');
🦅 PostgreSQL Developer — Expert 5-10 Yrs
All modifications are logged to WAL before they reach the data files. During recovery, WAL is replayed. For replication, WAL records are streamed to standbys, which requires wal_level to be 'replica' or 'logical'. Key parameters: wal_level, max_wal_senders.
-- Publisher
CREATE PUBLICATION pub1 FOR TABLE users;
-- Subscriber
CREATE SUBSCRIPTION sub1 CONNECTION 'host=pub dbname=mydb' PUBLICATION pub1;
PostgreSQL can use parallel workers for sequential scans, joins, aggregates. Set max_parallel_workers_per_gather. Useful for large datasets. Monitor with EXPLAIN (ANALYZE, VERBOSE).
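JIT compiles expressions to machine code. It helps CPU-bound analytical queries with complex expressions. It is controlled by the jit setting (on by default since PostgreSQL 12) and cost thresholds such as jit_above_cost. It does not benefit short, simple queries, so test before relying on it.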
Check max_connections and superuser_reserved_connections. Increase max_connections or use connection pooling (PgBouncer). Find idle connections with `SELECT * FROM pg_stat_activity WHERE state = 'idle';`
shared_buffers (25% RAM), effective_cache_size (50-75% RAM), work_mem (per operation), maintenance_work_mem, wal_buffers, random_page_cost (1.1 for SSD), effective_io_concurrency.
SELECT pg_prewarm('big_table'); -- loads the table into shared buffers by default; mode 'prefetch' targets the OS cache instead
Use COPY (not INSERT), drop indexes before load then recreate, increase maintenance_work_mem, disable triggers, use UNLOGGED tables if acceptable. Parallelize with pg_bulkload.
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM ANALYZE');
RLS is built-in: `CREATE POLICY ... USING (tenant_id = current_setting('app.tenant_id')::int)`. Schema-per-tenant provides stronger isolation at the cost of management overhead. Choose based on tenant count and security needs.
PostgreSQL does not cache row counts because, under MVCC, the answer depends on each transaction's snapshot, so count(*) must scan the table or an index. Use estimates via `pg_class.reltuples`, or a trigger-maintained counter table. Once VACUUM has marked pages all-visible in the visibility map, count(*) can use an index-only scan and avoid most heap fetches.
PgBouncer acts as a lightweight pooler. Modes: session, transaction, statement. Transaction pooling works best for web apps. Configure default_pool_size and max_client_conn. Note that `server_reset_query = DISCARD ALL` applies to session pooling and is not run in transaction mode by default.
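A sketch of the estimate-based approach. The table name `public.orders` is a placeholder.

```sql
-- Fast estimate from the catalog; accuracy depends on how recently
-- the table was vacuumed or analyzed.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'public.orders'::regclass;
```

On PostgreSQL 14 and later, a value of -1 means the table has never been vacuumed or analyzed, so no estimate exists yet.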
It tracks which pages contain only tuples visible to all transactions. VACUUM can skip those pages, and index-only scans can use it to avoid fetching heap tuples. Reduces I/O.
Streaming replication ships WAL as is (physical). Logical decoding transforms WAL into a higher-level representation of changes, allowing selective replication, multi-version compatibility, and integration with other systems (Debezium).
pg_repack -t bloated_table -d mydb
CHECK constraints are declarative, faster, and enforced at the engine level. They are evaluated before row insertion, making them more efficient than trigger-based validation.
Debezium captures changes from PostgreSQL WAL and streams to Kafka. Consumers (materialized views, caches) are updated in real time. Excellent for CDC without polling.
Synchronous commit waits for WAL to be flushed to standby before acknowledging the client. Ensures zero data loss but adds latency. Use only for critical transactions; others use asynchronous.
SELECT application_name, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes FROM pg_stat_replication;
Patroni is a template for high availability using etcd/Consul/ZooKeeper for leader election and configuration. It automates failover and replication management. Often used with pgBouncer and HAProxy.
SELECT pid, query, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event IS NOT NULL;
PostgreSQL uses 32-bit transaction IDs. When they approach max, it can cause data loss. VACUUM (especially aggressive) freezes old tuples, preventing wraparound. Monitor age: `SELECT max(age(datfrozenxid)) FROM pg_database;`
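The database-level check can be narrowed down to individual tables. A sketch, to be run in each database of interest:

```sql
-- Relations with the oldest unfrozen transaction IDs in the current database.
SELECT c.oid::regclass     AS relation,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')   -- tables, materialized views, TOAST tables
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;

-- The age at which autovacuum forces an anti-wraparound vacuum.
SHOW autovacuum_freeze_max_age;
```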
Add `deleted_at TIMESTAMP` and index it. Use partial index `WHERE deleted_at IS NULL` for active queries. Query with `WHERE deleted_at IS NULL`. Consider partitioning by deleted status.
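A minimal sketch of the soft-delete pattern, assuming a hypothetical `accounts` table with `id` and `email` columns:

```sql
ALTER TABLE accounts ADD COLUMN deleted_at TIMESTAMPTZ;

-- Index only the live rows, so the index stays small as deleted rows accumulate.
CREATE INDEX idx_accounts_email_live
    ON accounts (email)
    WHERE deleted_at IS NULL;

-- Soft delete, then a query whose predicate matches the partial index.
UPDATE accounts SET deleted_at = now() WHERE id = 42;
SELECT * FROM accounts WHERE email = 'a@b.com' AND deleted_at IS NULL;
```

The planner can only use the partial index when the query repeats the `deleted_at IS NULL` condition.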
CREATE DOMAIN email_address AS TEXT CHECK (VALUE ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
# TYPE DATABASE USER ADDRESS METHOD
host all all 192.168.1.0/24 scram-sha-256
🐉 PostgreSQL Developer — Most Expert 10+ Yrs
Use sharding via Citus or built-in partitioning + FDW. Combine logical replication for cross-region. Use connection pooling (PgBouncer) and read replicas. pgvector for AI search, and PostgresML for in-database ML.
CREATE EXTENSION vector; CREATE TABLE products (id SERIAL, embedding vector(512)); SELECT * FROM products ORDER BY embedding <-> '[...]' LIMIT 10;
SELECT pgml.train('churn_model', 'classification', 'churn_data', 'is_churn');
SELECT pgml.predict('churn_model', ARRAY[age, income, ...]);
pg_squeeze reorganizes tables online without blocking, similar to pg_repack but scheduled. It's useful for high-write tables where autovacuum can't keep up.
BDR (Bi-Directional Replication) allows writes on multiple nodes. Conflict resolution is application-driven. pglogical provides logical replication with conflict handling. Complex but useful for distributed apps.
SELECT ts_rank(to_tsvector('english', body), query) AS rank, * FROM articles ...
TimescaleDB adds automatic partitioning by time (hypertables), compression, and continuous aggregates. Ideal for IoT, monitoring, and financial tick data.
Enable SSL, use certificate authentication. Integrate LDAP for user management. Implement RLS policies for patient data isolation. Encrypt columns with pgcrypto. Audit with pgAudit.
Zheap (work in progress) aims to reduce bloat and the need for VACUUM by updating rows in place. It uses undo logs, as Oracle and MySQL's InnoDB do. Currently experimental.
CREATE FUNCTION call_ml(input_text text) RETURNS float AS $$
import some_ml_lib  # placeholder module name
return some_ml_lib.predict(input_text)
$$ LANGUAGE plpython3u;
Oracle's packages become schemas with functions. Sequences need adjustment. PL/SQL to PL/pgSQL: differences in exception handling, autonomous transactions (use dblink), and performance.
pg_ivm (extension) allows materialized views to be updated incrementally as base tables change, avoiding full refresh. Great for real-time dashboards.
Use logical replication: set up new version replica, replicate, switch application traffic, then promote. Or use pg_upgrade with link mode for fast in-place upgrade (requires downtime, but minutes).
Active-active replication improvements, better parallel query, AI integration with pgvector and PostgresML, autonomous tuning, and cloud-native storage engines.
Follow official blogs, Planet PostgreSQL, attend PGConf, and experiment with extensions and beta versions. Contribute to community tools.
🗄️ PostgreSQL DBA — Beginner
sudo apt install postgresql postgresql-contrib
sudo pg_ctlcluster 14 main start
pg_wal (WAL segments), pg_stat, base (per-database data files), global (cluster-wide catalogs), pg_tblspc (links to tablespaces), pg_logical, and configuration files (postgresql.conf, pg_hba.conf).
CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';
Archive mode copies WAL files to a safe location. Enable archive_mode and set archive_command. Combined with base backup, allows PITR to any point.
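One way to apply these settings from SQL is ALTER SYSTEM, sketched below. The archive directory `/mnt/wal_archive` is a hypothetical path, and the plain `cp` command is only suitable for a demonstration.

```sql
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = 'on';
-- %p is the path of the WAL file to archive, %f is its file name.
-- The test guards against overwriting a file that is already archived.
ALTER SYSTEM SET archive_command =
    'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f';
-- Changing wal_level or archive_mode needs a server restart;
-- archive_command alone only needs a configuration reload.
```

Production setups usually delegate archive_command to a dedicated tool such as pgBackRest instead of a shell copy.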
pg_basebackup -D /backup/base -Ft -z -P
Autovacuum reclaims storage and updates statistics. Tune with autovacuum_vacuum_scale_factor and autovacuum_naptime. Monitor in pg_stat_user_tables.
SELECT pg_database_size('mydb');
SELECT pg_total_relation_size('table_name');
CREATE ROLE analyst LOGIN PASSWORD 'secret'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst;
SELECT pid, usename, state, query FROM pg_stat_activity;
SELECT pg_terminate_backend(pid);
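The cumulative statistics system (the stats collector process before PostgreSQL 15) tracks database activity such as table and index scans and row counts. The data is exposed through the pg_stat_* views and drives monitoring and autovacuum decisions. Planner statistics are separate: they are gathered by ANALYZE and stored in pg_statistic.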
log_min_duration_statement = 1000 # log queries >1s
Checkpoint writes dirty pages to disk. Frequent checkpoints reduce recovery time but increase I/O. Tune with checkpoint_timeout and max_wal_size.
pg_dump for single database; pg_dumpall for entire cluster including roles and tablespaces.
CREATE EXTENSION pg_stat_statements;
CREATE USER readonly WITH PASSWORD 'pass'; GRANT CONNECT ON DATABASE mydb TO readonly; GRANT USAGE ON SCHEMA public TO readonly; GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;
Host-Based Authentication file defines which users can connect from where using which method (password, md5, scram-sha-256, peer).
`pg_ctl reload` sends SIGHUP to reload configuration files without restart. Some changes require a full restart.
Physical: ships WAL, byte-for-byte copy. Logical: decodes WAL into row changes, allows selective replication, can cross versions.
SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) FROM pg_database;
🗄️ PostgreSQL DBA — Intermediate
-- On primary
CREATE USER replicator REPLICATION LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'secret';
-- On standby: run pg_basebackup, create standby.signal with primary_conninfo set (recovery.conf before PostgreSQL 12), then start PostgreSQL.
Synchronous waits for standby to acknowledge WAL flush before committing; ensures zero data loss but adds latency. Asynchronous is faster but may lose last transactions on failover.
pg_ctl promote -D /var/lib/postgresql/data
If a standby was promoted and you want to bring back the old primary as a standby, pg_rewind synchronizes the data directories by copying changed blocks, avoiding a full resync.
Limits concurrent WAL sender processes for replication. Set to number of standbys + some buffer for backups. Default 10, increase for multiple replicas.
SELECT * FROM pg_create_physical_replication_slot('standby_slot');
Slots ensure WAL is retained until consumed by standbys, but can fill the disk if standbys lag.
[databases]
mydb = host=127.0.0.1 port=5432
[pgbouncer]
pool_mode = transaction
listen_port = 6432
Tune autovacuum to be more aggressive (reduce scale factor). Monitor dead tuples. Schedule VACUUM FREEZE periodically. Use pg_repack for severe bloat.
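Autovacuum can be made more aggressive for a single table without touching the global defaults. A sketch, with a hypothetical `orders` table and example values:

```sql
-- Trigger a vacuum after roughly 1% of rows are dead, instead of the default 20%.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);
```

Per-table storage parameters like these are a common choice for a few very large or very busy tables, where a percentage-based global default triggers too late.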
/usr/lib/postgresql/15/bin/pg_upgrade \
  --old-datadir /var/lib/postgresql/14/data \
  --new-datadir /var/lib/postgresql/15/data \
  --old-bindir /usr/lib/postgresql/14/bin \
  --new-bindir /usr/lib/postgresql/15/bin
SELECT schemaname, relname, n_dead_tup, n_live_tup, last_vacuum FROM pg_stat_user_tables;
🗄️ PostgreSQL DBA — Expert
Use streaming replication to a different region with asynchronous mode. Combine with WAL archiving to S3 for point-in-time recovery. Test failover regularly.
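When enabled, the entire page is written to WAL on its first modification after each checkpoint. This prevents corruption from partially written (torn) pages but increases WAL volume. Keep it enabled unless the storage stack guarantees atomic page writes (for example ZFS).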
Kubernetes offers dynamic scaling and self-healing, but requires careful storage and network setup. Use Crunchy Data or Zalando operators. Performance overhead is minimal with proper configuration.
pgbackrest --stanza=main --type=full backup
pgbackrest --stanza=main --type=diff backup
Check `age(datfrozenxid)` > 1 billion. Run aggressive VACUUM FREEZE. Ensure autovacuum is working. Set `vacuum_freeze_min_age` appropriately.
☁️ Cloud & Advanced
Aurora offers faster storage, up to 15 read replicas, automatic scaling, and global database. RDS is standard PostgreSQL with managed ops. Aurora is more expensive but higher performance.
Deploy the operator, then create a custom resource (an instance of the operator's PostgreSQL CRD) describing the cluster. The operator manages HA, backups, and cloning. It uses Patroni under the hood.
Allows safe deployment of PL extensions without full superuser. Enables custom functions in restricted environments.
Supabase provides a real-time API, authentication, and storage on top of PostgreSQL. It uses PostgREST and row-level security heavily. Great for rapid app development.
Use logical replication from on-prem to cloud. Cloud replica can serve analytics. Ensure secure connection (VPN/SSL).
🚚 Migration & Upgradation
Use Ora2Pg for schema conversion. Handle sequences, PL/SQL to PL/pgSQL, data types (NUMBER to NUMERIC). Test extensively. Use foreign data wrappers for incremental migration.
Use pgloader for initial load. Set up logical replication from MySQL (via Debezium) to PostgreSQL, then switch over after validation.
pgloader mysql://user@localhost/sourcedb postgresql://user@localhost/targetdb
🤖 AI/ML with PostgreSQL
SELECT * FROM documents ORDER BY embedding <=> query_embedding LIMIT 5;
Store document chunks with embeddings. On query, retrieve relevant chunks via vector similarity, construct a prompt with context, call LLM API via PL/Python or external service, return answer.
It integrates popular ML libraries (XGBoost, Scikit-learn) into SQL functions. `SELECT pgml.train(...)`. No data movement needed.
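CREATE FUNCTION get_embedding(input_text text) RETURNS vector AS $$
# openai.Embedding.create is the pre-1.0 client API; openai 1.0+ uses OpenAI().embeddings.create instead
import openai
return openai.Embedding.create(input=input_text, model='text-embedding-3-small')['data'][0]['embedding']
$$ LANGUAGE plpython3u;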
String similarity, fuzzy matching, and deduplication using Jaccard, Cosine, etc. Useful for data quality tasks.
🎭 Scenarios
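Check the execution plan with EXPLAIN ANALYZE and compare the plan before and after. The cause is usually stale statistics or a planner change. Use `pg_stat_statements` to find the regression. Fix with ANALYZE or, if needed, the pg_hint_plan extension (core PostgreSQL has no query hints).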
Set autovacuum_vacuum_cost_limit higher, adjust cost delay, and maybe throttle during business hours using cron to disable/enable.
Check network, WAL generation rate, apply latency. Increase max_wal_senders, use replication slots, or upgrade standby hardware.
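First find out why WAL is being retained: a failing archive_command or an inactive replication slot is the usual cause. Never delete files from pg_wal by hand. Fix archiving, drop unused slots, temporarily add disk, and set up monitoring. `pg_archivecleanup` can prune the archive directory.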
Audit policies with `pg_policies`. Ensure default deny, test with `SET ROLE`. Implement comprehensive test suite.
Increase max_connections cautiously. Implement PgBouncer. Reduce idle timeouts. Scale out with read replicas.
Convert DATETIME2 to TIMESTAMP, handle time zone. Use pgloader transformation rules.
Use GIN index, tune `gin_fuzzy_search_limit`, or consider external search engines like Elasticsearch if beyond PostgreSQL's scope.
Use `pg_resetwal` as last resort; restore from backup. Enable data checksums to detect early.
Check random_page_cost (lower for SSD). Check statistics; maybe increase statistics target. Use `SET enable_seqscan = off` temporarily to test.
🧪 Hands-On Labs
Follow step-by-step in official docs or use Docker: `docker run --name pg1 -e POSTGRES_PASSWORD=pass -d postgres:16` and configure replication.
CREATE FUNCTION compound_interest(principal NUMERIC, rate NUMERIC, years INT) RETURNS NUMERIC ...
Generate embeddings using a Python script, load into table, query with `<=>`.
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON projects USING (tenant_id = current_setting('app.current_tenant')::int);
pgloader mysql://root@localhost/mydb postgresql://user@localhost/mydb
💻 Code Exercises
SELECT month, product, revenue FROM (
SELECT date_trunc('month', sale_date) as month, product, SUM(amount) revenue,
RANK() OVER (PARTITION BY date_trunc('month', sale_date) ORDER BY SUM(amount) DESC) rnk
FROM sales GROUP BY 1,2
) sub WHERE rnk <= 3;
CREATE FUNCTION check_projects() RETURNS trigger AS $$
BEGIN
IF EXISTS (SELECT 1 FROM projects WHERE manager_id = OLD.id AND status = 'active') THEN
RAISE EXCEPTION 'Cannot delete employee with active projects';
END IF;
RETURN OLD;
END; $$ LANGUAGE plpgsql;
⚡ Performance Tuning Deep Dive
Look at actual time vs planned, rows vs estimated, loops. High startup or total time indicates bottleneck. Buffers show I/O. Use `FORMAT JSON` for tools.
It's memory per operation. Too low causes disk spills (slow). Too high can cause OOM. Increase for large analytical queries, but set per session if needed.
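One way to raise the limit for a single heavy query without changing the server default is SET LOCAL inside a transaction. The query, table, and the 256MB value below are placeholders.

```sql
BEGIN;
SET LOCAL work_mem = '256MB';   -- reverts automatically at COMMIT or ROLLBACK

-- With EXPLAIN (ANALYZE), "Sort Method: external merge" in the output
-- indicates a sort that spilled to disk because work_mem was too small.
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC;
COMMIT;
```

Because the limit applies per sort or hash operation, a single complex query can use several multiples of work_mem at once.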
SELECT query, calls, mean_exec_time, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; -- named mean_time and total_time before PostgreSQL 13
LOAD 'auto_explain'; SET auto_explain.log_min_duration = '500ms'; SET auto_explain.log_analyze = on;
SELECT c.relname, count(*) AS buffers FROM pg_buffercache b JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) GROUP BY c.relname ORDER BY 2 DESC LIMIT 10;
It estimates how much memory is available for caching (OS + shared buffers). Planner uses it to decide between index scans and sequential scans. Set to ~75% of total RAM.
If query cost exceeds this, JIT compilation is considered. Increase if JIT overhead is too high for simple queries, decrease to benefit complex ones.
SELECT pg_prewarm('important_table');
Default 4.0 assumes HDD. For SSD, set to 1.0-1.5 to reflect that random access is nearly as fast as sequential. This encourages index use.
Limits parallel workers per gather node. Increase for large tables. Set based on CPU cores. Too high can oversubscribe CPU.
SELECT l.pid, l.mode, l.granted, a.query FROM pg_locks l JOIN pg_stat_activity a ON l.pid = a.pid;
Too high `shared_buffers`, large number of connections, or huge `work_mem` settings. Reduce or adjust kernel parameters.
./postgres_exporter --web.listen-address=":9187"
Scrape metrics into Prometheus, visualize with Grafana.
Resets statistics counters. Use after major changes to see fresh baselines. Not for routine monitoring.
Limits temporary file usage per session (for sorts/hashes). Prevents runaway queries from filling disk.
Set to 0.7-0.9 to spread checkpoint writes over time, reducing spikes. Higher value means longer checkpoints but smoother.
Compresses full-page images written to WAL, reducing WAL volume, I/O, and replication traffic. CPU overhead is moderate. Enable if WAL writes are a bottleneck.
Check `top`, `pg_stat_activity`. Look for high CPU queries with `pg_stat_statements`. Tune queries, add indexes, or scale up.
It reduces commit latency because it doesn't wait for WAL flush. Risk of losing a few committed transactions on crash. Acceptable for many analytics workloads.
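The setting can be relaxed for individual transactions rather than the whole server. A sketch, with a hypothetical `page_views` table:

```sql
BEGIN;
SET LOCAL synchronous_commit = off;   -- affects only this transaction
INSERT INTO page_views (url, viewed_at) VALUES ('/home', now());
COMMIT;   -- returns without waiting for the WAL flush to disk
```

Unlike turning off fsync, this does not risk corruption: a crash can lose the most recent commits, but the database stays consistent.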
SELECT * FROM pgstattuple('table_name');
Deploy PgBouncer in transaction pooling mode. Adjust default_pool_size. This multiplexes client connections into a smaller number of DB connections.
Database-level stats (commits, rollbacks, blocks) vs background writer efficiency (buffers cleaned, checkpoints).
HAProxy checks Patroni's health endpoint to route traffic to the primary. Patroni manages the PostgreSQL cluster. This provides automatic failover for applications.
Set to a reasonable number (e.g., 200-500) and use connection pooler. Too many connections cause context switching and memory pressure. Monitor `pg_stat_activity` for idle connections.
Master EXPLAIN, understand the planner, learn internal architecture (WAL, MVCC, buffer management), and practice with real-world workloads. Contribute to the community and stay curious.

Md. Mominul Islam