Thursday, February 13, 2025

How to fix Yum/DNF issue in CentOS 8

Symptom:

[root@~]# yum update -y
CentOS Stream 8 - AppStream                                                             0.0  B/s |   0  B     00:00
Errors during downloading metadata for repository 'appstream':
  - Curl error (6): Couldn't resolve host name for http://mirrorlist.centos.org/?release=8-stream&arch=x86_64&repo=AppStream&infra=ec2 [Could not resolve host: mirrorlist.centos.org]
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for http://mirrorlist.centos.org/?release=8-stream&arch=x86_64&repo=AppStream&infra=ec2 [Could not resolve host: mirrorlist.centos.org]

Solution:

cd /etc/yum.repos.d/
sudo sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*
sudo sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*
sudo yum update -y
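The reason this works: mirrorlist.centos.org no longer resolves because CentOS 8 reached end of life, while the archived packages remain available on vault.centos.org. To see the two substitutions in isolation, here is a minimal Python sketch of the same transformation (the function name is made up for illustration):

```python
def point_repo_to_vault(text: str) -> str:
    """Mirror of the two sed substitutions: comment out every
    'mirrorlist' occurrence, then re-enable baseurl pointing at
    vault.centos.org (where the archived CentOS 8 content lives)."""
    lines = []
    for line in text.splitlines():
        # sed 's/mirrorlist/#mirrorlist/g'
        line = line.replace("mirrorlist", "#mirrorlist")
        # sed 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g'
        line = line.replace("#baseurl=http://mirror.centos.org",
                            "baseurl=http://vault.centos.org")
        lines.append(line)
    return "\n".join(lines)
```

Note that, like the original sed, the global substitution also rewrites the hostname inside the mirrorlist URL; that is harmless because the whole line ends up commented out.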


Wednesday, January 29, 2025

Optimize PostgreSQL Query with pg_hint_plan

 Query optimization in PostgreSQL can sometimes benefit from manual intervention. The pg_hint_plan extension provides developers with the ability to influence the query planner's decisions. Let's explore this powerful tool through a practical example.

Setting Up the Test Environment

First, let's create a simple test scenario with a table containing an integer primary key and a text column:

create table t(i int primary key, c text);
insert into t select x,rpad('x',100,'x') from generate_series(1,10) x;
vacuum analyze t;

Default Query Behavior

Without any hints, let's examine how PostgreSQL handles a simple query:

postgres=# explain analyze select * from t where i=1;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..1.12 rows=1 width=105) (actual time=0.023..0.033 rows=1 loops=1)
   Filter: (i = 1)
   Rows Removed by Filter: 9
 Planning Time: 0.155 ms
 Execution Time: 0.065 ms
(5 rows)

The planner chose a Sequential Scan despite having a primary key index available. This decision was made because:

  • The table is small (only 10 rows)
  • The cost of reading the entire table (cost=0.00..1.12) was deemed cheaper than using the index
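The 1.12 figure can be reproduced by hand. A rough Python sketch, assuming the default planner cost settings and that the ten ~100-byte rows fit in a single heap page:

```python
# Reproduce the Seq Scan estimate from PostgreSQL's default cost
# parameters (the stock postgresql.conf values).
seq_page_cost = 1.0         # cost to read one page sequentially
cpu_tuple_cost = 0.01       # cost to emit one tuple
cpu_operator_cost = 0.0025  # cost to evaluate the "i = 1" filter once

pages, rows = 1, 10         # ten ~100-byte rows fit in one 8 kB page
total = (seq_page_cost * pages
         + cpu_tuple_cost * rows
         + cpu_operator_cost * rows)
print(round(total, 3))  # 1.125
```

EXPLAIN prints costs with two decimals, so 1.125 shows up as 1.12; that is barely above a single page read, so the planner prefers it to an index scan whose startup cost alone is 0.14.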

Implementing pg_hint_plan

-- Creating the extension is optional here; loading the 'pg_hint_plan'
-- library (via LOAD or shared_preload_libraries) is what is mandatory.
postgres=# create extension pg_hint_plan;
CREATE EXTENSION

After installing the pg_hint_plan extension, we can influence the query plan. There are two ways to specify hints with EXPLAIN:

  1. Hint placed after the EXPLAIN keyword:
postgres=# explain analyze /*+ IndexScan(t) */ select * from t where i=1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.14..8.15 rows=1 width=105) (actual time=0.065..0.067 rows=1 loops=1)
   Index Cond: (i = 1)
 Planning Time: 0.311 ms
 Execution Time: 0.142 ms
(4 rows)
  2. Hint placed before the EXPLAIN keyword:
postgres=# /*+ IndexScan(t) */ explain analyze select * from t where i=1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.14..8.15 rows=1 width=105) (actual time=0.017..0.018 rows=1 loops=1)
   Index Cond: (i = 1)
 Planning Time: 0.104 ms
 Execution Time: 0.032 ms
(4 rows)

Both methods successfully forced an Index Scan using the primary key (t_pkey), though with slightly different execution times.

Using the Hint Table

The hint table works well in situations where you can't modify the SQL to add the hint.

  1. Enable the hint table feature:
postgres=# set pg_hint_plan.enable_hint_table=on;
SET
  2. Get the query identifier and store a hint for it:
postgres=# explain (verbose) select * from t where i=1;
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on public.t  (cost=0.00..1.12 rows=1 width=105)
   Output: i, c
   Filter: (t.i = 1)
 Query Identifier: -3099684296572641299
(4 rows)
postgres=# insert into hint_plan.hints(query_id, application_name, hints) values ('-3099684296572641299','','IndexScan(t)');
INSERT 0 1
postgres=# explain (verbose) select * from t where i=1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using t_pkey on public.t  (cost=0.14..8.15 rows=1 width=105)
   Output: i, c
   Index Cond: (t.i = 1)
 Query Identifier: -3099684296572641299
(4 rows)

After storing the hint, the planner consistently uses the index scan strategy for matching queries.

Debugging Hints

For troubleshooting, pg_hint_plan provides debugging capabilities:

postgres=# set pg_hint_plan.debug_print=on;
2025-01-29 23:28:34.223 +08 [2935] postgres@postgres LOG:  pg_hint_plan:
        used hint:
        IndexScan(t)
        not used hint:
        duplication hint:
        error hint:

2025-01-29 23:28:34.223 +08 [2935] postgres@postgres STATEMENT:  explain (verbose) select * from t where i=1;

Key Takeaways

  • pg_hint_plan provides multiple methods to influence query planning
  • Hints can be specified inline with queries or stored in a hint table
  • Debug mode helps verify hint application

Monday, January 27, 2025

Understanding Index Behavior in PostgreSQL Partitioned Tables

PostgreSQL's partitioning feature becomes more powerful with each release, and understanding how indexes work with partitioned tables is crucial for optimal performance. Let's explore some interesting behaviors through practical examples.

Basic Setup and Index Inheritance

Consider a simple partitioned table structure:

CREATE TABLE t (id int, c1 text, c2 text) PARTITION BY RANGE(id); 

Scenario 1: Creating Indexes Before Partition Attachment

When you create an index on a table before attaching it as a partition, the index remains independent:

CREATE TABLE t_p1 (id int, c1 text, c2 text);
CREATE INDEX idx_t_p1_c1 ON t_p1(c1);
ALTER TABLE t ATTACH PARTITION t_p1 FOR VALUES FROM (1) TO (100);

The index idx_t_p1_c1 continues to exist on the partition after attachment.

Scenario 2: Creating Indexes on Parent Table

When you create an index on the parent table:

CREATE INDEX t_c1_idx ON t(c1);

PostgreSQL automatically:

  • Creates a partitioned index on the parent table
  • Creates corresponding indexes on all existing partitions
  • Names the automatically created indexes systematically. For example, for partition t_p2, PostgreSQL automatically created t_p2_c1_idx.

Scenario 3: Adding New Partitions

The behavior differs based on how you create new partitions:

  1. Using PARTITION OF syntax:
CREATE TABLE t_p2 PARTITION OF t FOR VALUES FROM (100) TO (200);

The new partition automatically gets matching indexes for any indexes on the parent table.

  2. Using ATTACH PARTITION with an existing table:
CREATE TABLE t_p3 (id int, c1 text, c2 text);
CREATE INDEX idx_t_p3_c1 ON t_p3(c1);
ALTER TABLE t ATTACH PARTITION t_p3 FOR VALUES FROM (200) TO (300);

The existing index on the attached partition remains, and it functions as the local index for the parent's partitioned index.

Scenario 4: Attaching Plain Table Without Pre-existing Index

When attaching a table that doesn't have any indexes to a parent table that has indexes, PostgreSQL automatically creates matching indexes:

CREATE TABLE t_p4 (id int, c1 text, c2 text);
ALTER TABLE t ATTACH PARTITION t_p4 FOR VALUES FROM (300) TO (400);

After attachment, examining the partition shows that PostgreSQL automatically created the index t_p4_c1_idx on the partition to match the parent table's index structure. This demonstrates that:

  • PostgreSQL ensures index consistency across all partitions
  • The naming convention follows the pattern: {partition_name}_{parent_index_suffix}
  • The index is created automatically during the attachment process

Sunday, January 26, 2025

UUIDv7 in PostgreSQL 18 (still in development as of this post)

 

Check that these UUID functions are available


mytest=# select version();
                                                       version
---------------------------------------------------------------------------------------------------------------------
 PostgreSQL 18devel on x86_64-apple-darwin24.2.0, compiled by Apple clang version 16.0.0 (clang-1600.0.26.6), 64-bit
(1 row)

mytest=# select uuidv4(),uuidv7() from generate_series(1,5);
                uuidv4                |                uuidv7
--------------------------------------+--------------------------------------
 6cd7e3c8-06f3-4459-9e43-02d83983e337 | 0194a08d-88d0-778a-ba8b-66c65ea88774
 cd9e75fe-6864-4af4-94ef-3a0329246c2e | 0194a08d-88d0-79f4-b239-2d755157f618
 d50cbff5-cfae-4df5-b045-389e37adb978 | 0194a08d-88d0-7bc1-a09d-0461661379f7
 eeb39221-9c79-4f9b-99c5-b451dc38b644 | 0194a08d-88d0-7d97-b11c-c59d1dbdd9fe
 8cfaf63b-14f1-48a1-86bd-feffb61e1b2f | 0194a08d-88d1-71d0-9286-342eca29aa6c
(5 rows)

Create table


mytest=# create table t_uuidv4 (id uuid primary key, c text);
CREATE TABLE

mytest=# create table t_uuidv7 (id uuid primary key, c text);
CREATE TABLE

Populate data for uuidv4 and uuidv7 tables


insert into t_uuidv4
select uuidv4(), rpad('x',100,'x') from generate_series(1,1000000);

mytest=# insert into t_uuidv4
mytest-# select uuidv4(), rpad('x',100,'x') from generate_series(1,1000000);
INSERT 0 1000000
Time: 35709.216 ms (00:35.709)

insert into t_uuidv7
select uuidv7(), rpad('x',100,'x') from generate_series(1,1000000);

-- The ~2-second difference is within this test's margin of error;
-- it doesn't mean uuidv7 is faster based on this test.
mytest=# insert into t_uuidv7
select uuidv7(), rpad('x',100,'x') from generate_series(1,1000000);
INSERT 0 1000000
Time: 33109.727 ms (00:33.110)

Check the storage size

mytest=# \dt+
                                     List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------+-------+----------+-------------+---------------+--------+-------------
 public | t_uuidv4 | table | donghual | permanent   | heap          | 142 MB |
 public | t_uuidv7 | table | donghual | permanent   | heap          | 142 MB |
(2 rows)

mytest=# \di+
                                            List of relations
 Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method | Size  | Description
--------+---------------+-------+----------+----------+-------------+---------------+-------+-------------
 public | t_uuidv4_pkey | index | donghual | t_uuidv4 | permanent   | btree         | 38 MB |
 public | t_uuidv7_pkey | index | donghual | t_uuidv7 | permanent   | btree         | 30 MB |
(2 rows)

t_uuidv7_pkey is 21% smaller than t_uuidv4_pkey (30 MB vs 38 MB). Furthermore, it would have a smaller memory footprint when the hot working set is recent data, since time-ordered keys cluster in the same part of the index.
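The space savings follow from the format: the first 48 bits of a UUIDv7 are a Unix millisecond timestamp, so consecutive keys land on the same rightmost B-tree pages instead of scattering across the index. As a sanity check, this Python snippet decodes the timestamp prefix of the first uuidv7() sample shown above:

```python
from datetime import datetime, timedelta, timezone

# First uuidv7() value from the sample output above
sample = "0194a08d-88d0-778a-ba8b-66c65ea88774"

# The first 48 bits (12 hex digits) are the Unix epoch in milliseconds
ms = int(sample.replace("-", "")[:12], 16)
dt = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms)
print(dt.isoformat())  # 2025-01-26T03:00:17.744000+00:00
```

The decoded timestamp matches the date of this post (January 26, 2025, morning in UTC).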

Monday, July 8, 2024

Performance validation for UUIDv7 implemented using SQL in PostgreSQL v16

Source code for UUIDv7 for PostgreSQL: https://gist.github.com/fabiolimace/515a0440e3e40efeb234e12644a6a346

create or replace function uuid7() returns uuid as $$
declare
begin
    return uuid7(clock_timestamp());
end $$ language plpgsql;
create or replace function uuid7(p_timestamp timestamp with time zone) returns uuid as $$
declare

    v_time double precision := null;

    v_unix_t bigint := null;
    v_rand_a bigint := null;
    v_rand_b bigint := null;

    v_unix_t_hex varchar := null;
    v_rand_a_hex varchar := null;
    v_rand_b_hex varchar := null;

    c_milli double precision := 10^3;  -- 1 000
    c_micro double precision := 10^6;  -- 1 000 000
    c_scale double precision := 4.096; -- 4.0 * (1024 / 1000)
    
    c_version bigint := x'0000000000007000'::bigint; -- RFC-9562 version: b'0111...'
    c_variant bigint := x'8000000000000000'::bigint; -- RFC-9562 variant: b'10xx...'

begin

    v_time := extract(epoch from p_timestamp);

    v_unix_t := trunc(v_time * c_milli);
    v_rand_a := trunc((v_time * c_micro - v_unix_t * c_milli) * c_scale);
    v_rand_b := trunc(random() * 2^30)::bigint << 32 | trunc(random() * 2^32)::bigint;

    v_unix_t_hex := lpad(to_hex(v_unix_t), 12, '0');
    v_rand_a_hex := lpad(to_hex((v_rand_a | c_version)::bigint), 4, '0');
    v_rand_b_hex := lpad(to_hex((v_rand_b | c_variant)::bigint), 16, '0');

    return (v_unix_t_hex || v_rand_a_hex || v_rand_b_hex)::uuid;
    
end $$ language plpgsql;
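As a cross-check of the bit layout, here is a hedged Python port of the same packing scheme: a 48-bit millisecond timestamp, 12 scaled sub-millisecond bits under the version nibble, and 62 random bits under the variant bits. The name uuid7_py and the use of time.time_ns() are my own choices, not part of the original gist:

```python
import random
import time

def uuid7_py() -> str:
    """Pack a UUIDv7 the same way as the PL/pgSQL function above."""
    ns = time.time_ns()
    unix_ms = ns // 1_000_000                      # 48-bit Unix ms (v_unix_t)
    sub_ms = (ns % 1_000_000) * 4096 // 1_000_000  # 12 bits via the 4.096 scale (v_rand_a)
    rand_b = random.getrandbits(62)                # 62 random bits (v_rand_b)
    h = (f"{unix_ms:012x}"
         f"{0x7000 | sub_ms:04x}"                  # OR in version nibble b'0111'
         f"{(0b10 << 62) | rand_b:016x}")          # OR in variant bits b'10'
    return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"
```

Like the gist's random(), random.getrandbits() is not a cryptographic source; this sketch only demonstrates the layout.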

Performance Testing

Testing for UUID generation

Setup

do
$$
declare 
u uuid;
begin
    for i in 1..10000000
    loop
        select gen_random_uuid() into u;
--        raise notice '%', u;
    end loop;
end;
$$;
do
$$
declare 
u uuid;
begin
    for i in 1..10000000
    loop
        select uuid7() into u;
--        raise notice '%', u;
    end loop;
end;
$$;

Test

mytest=# do
mytest-# $$
mytest$# declare
mytest$# u uuid;
mytest$# begin
mytest$#     for i in 1..10000000
mytest$#     loop
mytest$#         select gen_random_uuid() into u;
mytest$# --        raise notice '%', u;
mytest$#     end loop;
mytest$# end;
mytest$# $$;
DO
Time: 31743.056 ms (00:31.743)
-- about 0.0032 ms per call
mytest=# do
mytest-# $$
mytest$# declare
mytest$# u uuid;
mytest$# begin
mytest$#     for i in 1..10000000
mytest$#     loop
mytest$#         select uuid7() into u;
mytest$# --        raise notice '%', u;
mytest$#     end loop;
mytest$# end;
mytest$# $$;
DO
Time: 54132.676 ms (00:54.133)
-- about 0.0054 ms per call

Testing for UUID insertion

Setup

create table tbl_uuidv4 
(u uuid primary key, c text);
insert into tbl_uuidv4
select gen_random_uuid(), 'additional data' from generate_series(1,10000000);
create table tbl_uuidv7 
(u uuid primary key, c text);
insert into tbl_uuidv7
select uuid7(), 'additional data' from generate_series(1,10000000);

Test

mytest=# insert into tbl_uuidv4
mytest-# select gen_random_uuid(), 'additional data' from generate_series(1,10000000);
INSERT 0 10000000
Time: 60405.272 ms (01:00.405)
mytest=# insert into tbl_uuidv7
mytest-# select uuid7(), 'additional data' from generate_series(1,10000000);

INSERT 0 10000000
Time: 56902.619 ms (00:56.903)
mytest=# \dt+ tbl_uuid*
                                      List of relations
 Schema |    Name    | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+------------+-------+----------+-------------+---------------+--------+-------------
 public | tbl_uuidv4 | table | donghua  | permanent   | heap          | 575 MB |
 public | tbl_uuidv7 | table | donghua  | permanent   | heap          | 575 MB |
(2 rows)
mytest=# \di+ tbl_uuid*
                                               List of relations
 Schema |      Name       | Type  |  Owner   |   Table    | Persistence | Access method |  Size  | Description
--------+-----------------+-------+----------+------------+-------------+---------------+--------+-------------
 public | tbl_uuidv4_pkey | index | donghua  | tbl_uuidv4 | permanent   | btree         | 392 MB |
 public | tbl_uuidv7_pkey | index | donghua  | tbl_uuidv7 | permanent   | btree         | 301 MB |
(2 rows)

Testing how many buffers dirtied for UUID insertion

Setup

explain (analyze,buffers)
insert into tbl_uuidv4
select gen_random_uuid(), 'additional data' from generate_series(1,100);
explain (analyze,buffers)
insert into tbl_uuidv7
select uuid7(), 'additional data' from generate_series(1,100);

Test

mytest=# explain (analyze,buffers)
mytest-# insert into tbl_uuidv4
mytest-# select gen_random_uuid(), 'additional data' from generate_series(1,100);
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Insert on tbl_uuidv4  (cost=0.00..2.25 rows=0 width=0) (actual time=40.337..40.338 rows=0 loops=1)
   Buffers: shared hit=231 read=186 dirtied=106 written=1
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..2.25 rows=100 width=48) (actual time=0.038..1.379 rows=100 loops=1)
         ->  Function Scan on generate_series  (cost=0.00..1.25 rows=100 width=48) (actual time=0.037..1.306 rows=100 loops=1)
 Planning Time: 0.048 ms
 Execution Time: 40.365 ms
(6 rows)

Time: 41.754 ms
mytest=# explain (analyze,buffers)
mytest-# insert into tbl_uuidv7
mytest-# select uuid7(), 'additional data' from generate_series(1,100);
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Insert on tbl_uuidv7  (cost=0.00..27.00 rows=0 width=0) (actual time=1.204..1.205 rows=0 loops=1)
   Buffers: shared hit=219 dirtied=1 written=1
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..27.00 rows=100 width=48) (actual time=0.077..0.614 rows=100 loops=1)
         ->  Function Scan on generate_series  (cost=0.00..26.00 rows=100 width=48) (actual time=0.076..0.594 rows=100 loops=1)
 Planning Time: 0.036 ms
 Execution Time: 1.227 ms
(6 rows)

Time: 1.904 ms

Inserting 100 random uuidv4 keys dirtied 106 buffers, because each key lands on a different B-tree leaf page; the 100 time-ordered uuid7 keys dirtied only 1 buffer, and the statement ran over 30x faster (1.227 ms vs 40.365 ms execution time).