Wednesday, January 29, 2025

Optimize PostgreSQL Query with pg_hint_plan

 Query optimization in PostgreSQL can sometimes benefit from manual intervention. The pg_hint_plan extension provides developers with the ability to influence the query planner's decisions. Let's explore this powerful tool through a practical example.

Setting Up the Test Environment

First, let's create a simple test scenario with a table containing an integer primary key and a text column:

create table t(i int primary key, c text);
insert into t select x,rpad('x',100,'x') from generate_series(1,10) x;
vacuum analyze t;

Default Query Behavior

Without any hints, let's examine how PostgreSQL handles a simple query:

explain analyze select * from t where i=1;
postgres=# explain analyze select * from t where i=1;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..1.12 rows=1 width=105) (actual time=0.023..0.033 rows=1 loops=1)
   Filter: (i = 1)
   Rows Removed by Filter: 9
 Planning Time: 0.155 ms
 Execution Time: 0.065 ms
(5 rows)

The planner chose a Sequential Scan despite having a primary key index available. This decision was made because:

The table is small (only 10 rows)

The cost of reading the entire table (cost=0.00..1.12) was deemed cheaper than using the index

Implementing pg_hint_plan

-- Create extension is not mandatory, load the library 'pg_hint_plan' is mandatory.
postgres=# create extension pg_hint_plan;
CREATE EXTENSION

After installing the pg_hint_plan extension, we can influence the query plan. There are two ways to specify hints with EXPLAIN:

  1. Hint before EXPLAIN:
postgres=# explain analyze /*+ IndexScan(t) */ select * from t where i=1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.14..8.15 rows=1 width=105) (actual time=0.065..0.067 rows=1 loops=1)
   Index Cond: (i = 1)
 Planning Time: 0.311 ms
 Execution Time: 0.142 ms
(4 rows)
  1. Hint after EXPLAIN:
postgres=# /*+ IndexScan(t) */ explain analyze select * from t where i=1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.14..8.15 rows=1 width=105) (actual time=0.017..0.018 rows=1 loops=1)
   Index Cond: (i = 1)
 Planning Time: 0.104 ms
 Execution Time: 0.032 ms
(4 rows)

Both methods successfully forced an Index Scan using the primary key (t_pkey), though with slightly different execution times.

Using the Hint Table

It works well for suitations you can't modify the SQL to add the hint.

  1. Enable the hint table feature:
postgres=# set pg_hint_plan.enable_hint_table=on;
SET
  1. Get query identifier, store the hint for it:
postgres=# explain (verbose) select * from t where i=1;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on public.t  (cost=0.00..1.12 rows=1 width=105)
   Output: i, c
   Filter: (t.i = 1)
 Query Identifier: -3099684296572641299
(4 rows)
postgres=# insert into hint_plan.hints(query_id, application_name, hints) values ('-3099684296572641299','','IndexScan(t)');
INSERT 0 1
postgres=# explain (verbose) select * from t where i=1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using t_pkey on public.t  (cost=0.14..8.15 rows=1 width=105)
   Output: i, c
   Index Cond: (t.i = 1)
 Query Identifier: -3099684296572641299
(4 rows)

After storing the hint, the planner consistently uses the index scan strategy for matching queries.

Debugging Hints

For troubleshooting, pg_hint_plan provides debugging capabilities:

postgres=# set pg_hint_plan.debug_print=on;
2025-01-29 23:28:34.223 +08 [2935] postgres@postgres LOG:  pg_hint_plan:
        used hint:
        IndexScan(t)
        not used hint:
        duplication hint:
        error hint:

2025-01-29 23:28:34.223 +08 [2935] postgres@postgres STATEMENT:  explain (verbose) select * from t where i=1;

Key Takeaways

  • pg_hint_plan provides multiple methods to influence query planning
  • Hints can be specified inline with queries or stored in a hint table
  • Debug mode helps verify hint application

Monday, January 27, 2025

Understanding Index Behavior in PostgreSQL Partitioned Tables

PostgreSQL's partitioning feature becomes more powerful with each release, and understanding how indexes work with partitioned tables is crucial for optimal performance. Let's explore some interesting behaviors through practical examples.

Basic Setup and Index Inheritance

Consider a simple partitioned table structure:

CREATE TABLE t (id int, c1 text, c2 text) PARTITION BY RANGE(id); 

Scenario 1: Creating Indexes Before Partition Attachment

When you create an index on a table before attaching it as a partition, the index remains independent:

CREATE TABLE t_p1 (id int, c1 text, c2 text);
CREATE INDEX idx_t_p1_c1 ON t_p1(c1);
ALTER TABLE t ATTACH PARTITION t_p1 FOR VALUES FROM (1) TO (100);

The index idx_t_p1_c1 continues to exist on the partition after attachment.

Scenario 2: Creating Indexes on Parent Table

When you create an index on the parent table:

CREATE INDEX t_c1_idx ON t(c1);

PostgreSQL automatically:

  • Creates a partitioned index on the parent table
  • Creates corresponding indexes on all existing partitions
  • Names the automatically created indexes systematically For example, when we had partition t_p2, PostgreSQL automatically created t_p2_c1_idx.

Scenario 3: Adding New Partitions

The behavior differs based on how you create new partitions:

  1. Using PARTITION OF syntax:
CREATE TABLE t_p2 PARTITION OF t FOR VALUES FROM (100) TO (200);

The new partition automatically gets matching indexes for any indexes on the parent table.

  1. Using ATTACH PARTITION with existing table:
CREATE TABLE t_p3 (id int, c1 text, c2 text);
CREATE INDEX idx_t_p3_c1 ON t_p3(c1);
ALTER TABLE t ATTACH PARTITION t_p3 FOR VALUES FROM (200) TO (300);

The existing index on the attached partition remains, and it functions as the local index for the parent's partitioned index.

Scenario 4: Attaching Plain Table Without Pre-existing Index

When attaching a table that doesn't have any indexes to a parent table that has indexes, PostgreSQL automatically creates matching indexes:

CREATE TABLE t_p4 (id int, c1 text, c2 text);
ALTER TABLE t ATTACH PARTITION t_p4 FOR VALUES FROM (300) TO (400);

After attachment, examining the partition shows that PostgreSQL automatically created the index t_p4_c1_idx on the partition to match the parent table's index structure. This demonstrates that:

  • PostgreSQL ensures index consistency across all partitions
  • The naming convention follows the pattern: {partition_name}_{parent_index_suffix}
  • The index is created automatically during the attachment process

Sunday, January 26, 2025

UUIDv7 in PostgreSQL 18 (Still in dev phase as time of this post)

 

Check these uuid functions are available


mytest=# select version();
                                                       version
---------------------------------------------------------------------------------------------------------------------
 PostgreSQL 18devel on x86_64-apple-darwin24.2.0, compiled by Apple clang version 16.0.0 (clang-1600.0.26.6), 64-bit
(1 row)

mytest=# select uuidv4(),uuidv7() from generate_series(1,5);;
                uuidv4                |                uuidv7
--------------------------------------+--------------------------------------
 6cd7e3c8-06f3-4459-9e43-02d83983e337 | 0194a08d-88d0-778a-ba8b-66c65ea88774
 cd9e75fe-6864-4af4-94ef-3a0329246c2e | 0194a08d-88d0-79f4-b239-2d755157f618
 d50cbff5-cfae-4df5-b045-389e37adb978 | 0194a08d-88d0-7bc1-a09d-0461661379f7
 eeb39221-9c79-4f9b-99c5-b451dc38b644 | 0194a08d-88d0-7d97-b11c-c59d1dbdd9fe
 8cfaf63b-14f1-48a1-86bd-feffb61e1b2f | 0194a08d-88d1-71d0-9286-342eca29aa6c
(5 rows)

Create table


mytest=# create table t_uuidv4 (id uuid primary key, c text);
CREATE TABLE

mytest=# create table t_uuidv7 (id uuid primary key, c text);
CREATE TABLE

Populate data for uuidv4 and uuidv7 tables


insert into t_uuidv4
select uuidv4(), rpad('x',100,'x') from generate_series(1,1000000);

mytest=# insert into t_uuidv4
mytest-# select uuidv4(), rpad('x',100,'x') from generate_series(1,1000000);
INSERT 0 1000000
Time: 35709.216 ms (00:35.709)

insert into t_uuidv7
select uuidv7(), rpad('x',100,'x') from generate_series(1,1000000);

-- The time difference of 2 seconds is test margin of error,
--  dones't mean uuidv7 is faster based on this test.
mytest=# insert into t_uuidv7
select uuidv7(), rpad('x',100,'x') from generate_series(1,1000000);
INSERT 0 1000000
Time: 33109.727 ms (00:33.110)

Check the storage size

mytest=# \dt+
                                     List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------+-------+----------+-------------+---------------+--------+-------------
 public | t_uuidv4 | table | donghual | permanent   | heap          | 142 MB |
 public | t_uuidv7 | table | donghual | permanent   | heap          | 142 MB |
(2 rows)

mytest=# \di+
                                            List of relations
 Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method | Size  | Description
--------+---------------+-------+----------+----------+-------------+---------------+-------+-------------
 public | t_uuidv4_pkey | index | donghual | t_uuidv4 | permanent   | btree         | 38 MB |
 public | t_uuidv7_pkey | index | donghual | t_uuidv7 | permanent   | btree         | 30 MB |
(2 rows)

t_uuidv7_pkey is 21% smaller than t_uuidv4_pkey, further more, it would have smaller memory footprint if working set is hot data based on time.