Monday, March 31, 2025

Observation about PostgreSQL's Index Only Scan behavior

 

Test Case

mytest=# \dt pgbench_accounts;
              List of relations
 Schema |       Name       | Type  |  Owner
--------+------------------+-------+----------
 public | pgbench_accounts | table | donghual
(1 row)
mytest=# delete from pgbench_accounts where aid=1;
DELETE 1
Time: 16.197 ms

Observation 1

mytest=# explain (buffers,analyze) select aid from pgbench_accounts where aid=1;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.29..8.31 rows=1 width=4) (actual time=1.692..1.693 rows=0 loops=1)
   Index Cond: (aid = 1)
   Heap Fetches: 1
   Buffers: shared hit=3 read=1 dirtied=1
 Planning Time: 0.079 ms
 Execution Time: 1.758 ms
(6 rows)

Time: 5.305 ms

Observation 2

mytest=# explain (buffers,analyze) select aid from pgbench_accounts where aid=1;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.29..8.31 rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=1)
   Index Cond: (aid = 1)
   Heap Fetches: 0
   Buffers: shared hit=2
 Planning Time: 0.056 ms
 Execution Time: 0.066 ms
(6 rows)

Time: 0.414 ms

Why does the first execution show "Heap Fetches: 1" while the second shows "Heap Fetches: 0"?

The difference comes down to how an Index Only Scan uses the visibility map and the heap:

  1. First execution (Heap Fetches: 1):
    • The row with aid=1 had just been deleted
    • The index entry for aid=1 was still present, so the Index Only Scan found a match in the index
    • The DELETE had cleared the all-visible bit for that heap page in the visibility map, so the index alone could not prove whether the row was still visible
    • PostgreSQL therefore had to fetch the heap (table) page to check visibility - that is the "Heap Fetches: 1"
    • During that fetch it found the tuple was dead and marked the matching index entry as killed; the extra work shows up as "Buffers: shared hit=3 read=1 dirtied=1"

  2. Second execution (Heap Fetches: 0):
    • The index entry for aid=1 had already been marked dead by the first scan
    • The scan could skip that entry without consulting the heap at all, returning rows=0 straight from the index, hence "Heap Fetches: 0"
    • Buffer activity drops to just the index pages: "Buffers: shared hit=2"

This is one of the key optimizations around PostgreSQL's Index Only Scans. The first query after a data modification often has to fetch the heap to verify visibility, but subsequent queries can avoid those fetches once dead index entries are marked and, after VACUUM, once the visibility map flags the pages as all-visible again.

The execution time difference (1.758ms vs 0.066ms) clearly shows the performance benefit of avoiding those heap fetches.
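
As a side check (not part of the original test), the pg_visibility contrib module can show which heap pages have lost their all-visible bit after the DELETE; once VACUUM removes the dead tuple, the bit is set again and Index Only Scans can skip the heap without relying on killed index entries. A minimal sketch, assuming the pg_visibility extension is available:

-- requires the pg_visibility contrib extension
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- pages with all_visible = false are the ones that force heap fetches
-- during an Index Only Scan
SELECT blkno, all_visible, all_frozen
FROM pg_visibility_map('pgbench_accounts')
WHERE NOT all_visible;

-- VACUUM removes the dead tuple and re-sets the all-visible bit
VACUUM pgbench_accounts;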

Thursday, March 20, 2025

Installing Oracle Database 23ai on Amazon Linux 2023

This guide provides steps to install Oracle Database 23ai on Amazon Linux 2023 by resolving dependency issues with the Oracle preinstall package. The steps documented here are purely for testing purposes and should never be considered a recommendation for a production setup.

Problem Overview

The Oracle Database 23ai preinstall package (oracle-database-preinstall-23ai) has dependencies that conflict with Amazon Linux 2023:

  1. It requires /etc/redhat-release which doesn't exist on Amazon Linux
  2. It requires xorg-x11-utils which has been split into individual packages in Amazon Linux 2023

Solution: Create Custom RPMs

We'll create two custom RPMs to satisfy these dependencies without conflicting with existing packages.

1. Create fake-redhat-release RPM

This RPM provides the /etc/redhat-release file that Oracle's preinstall package checks for.

Step 1: Set up the RPM build environment

# Install RPM build tools
sudo yum install -y rpm-build rpmdevtools

# Set up the RPM build directory structure
rpmdev-setuptree

Step 2: Create the RPM spec file

cd ~
cat > fake-redhat-release.spec << 'EOF'
Name:           fake-redhat-release
Version:        9.0
Release:        1%{?dist}
Summary:        Fake Red Hat Enterprise Linux release file
License:        MIT

BuildArch:      noarch
Provides:       redhat-release = 9.0
Provides:       system-release = 9.0

%description
This package provides a fake /etc/redhat-release file to satisfy dependencies
for packages that require it on non-RHEL systems like Amazon Linux.

%prep
# Nothing to do

%build
# Nothing to build

%install
mkdir -p %{buildroot}/etc
echo "Red Hat Enterprise Linux release 9.0 (Plow)" > %{buildroot}/etc/redhat-release

%files
%attr(0644,root,root) /etc/redhat-release

%changelog
* Thu Mar 20 2025 User <user@example.com> - 9.0-1
- Initial package
EOF

Step 3: Build the RPM

rpmbuild -bb fake-redhat-release.spec

Step 4: Install the RPM

sudo rpm -ivh ~/rpmbuild/RPMS/noarch/fake-redhat-release-9.0-1*.rpm

2. Create dummy-xorg-x11-utils RPM

This RPM provides the virtual capability xorg-x11-utils without containing any files that would conflict with the individual X11 utility packages.

Step 1: Create the RPM spec file

cd ~
cat > dummy-xorg-x11-utils.spec << 'EOF'
Name:           dummy-xorg-x11-utils
Version:        7.5
Release:        39%{?dist}
Summary:        Dummy package that provides xorg-x11-utils
License:        MIT

BuildArch:      noarch
Provides:       xorg-x11-utils = 7.5-39
# Conflict with the real xorg-x11-utils package; the split-out utilities in Amazon Linux 2023 do not provide it, so they are unaffected
Conflicts:      xorg-x11-utils

%description
This is a dummy package that provides the xorg-x11-utils virtual capability
without containing any files that would conflict with the individual X11
utility packages in Amazon Linux 2023.

%prep
# Nothing to do

%build
# Nothing to build

%install
# Create an empty directory structure
mkdir -p %{buildroot}

%files
# No files

%changelog
* Thu Mar 20 2025 User <user@example.com> - 7.5-39
- Initial dummy package
EOF

Step 2: Build the RPM

rpmbuild -bb dummy-xorg-x11-utils.spec

Step 3: Install the dummy package

sudo rpm -ivh ~/rpmbuild/RPMS/noarch/dummy-xorg-x11-utils-7.5-39*.rpm

3. Install Oracle Database 23ai Preinstall Package

After installing both custom RPMs, you can install the Oracle preinstall package:

sudo yum install https://yum.oracle.com/repo/OracleLinux/OL9/appstream/x86_64/getPackage/oracle-database-preinstall-23ai-1.0-2.el9.x86_64.rpm

4. Install Required X11 Utility Packages

Even with the dummy package, you should install the actual X11 utilities that Oracle might need:

sudo yum install -y xdpyinfo xev xwininfo xprop xkill xlsclients xlsatoms

After this step, you can install the Oracle Database 23ai Free RPM, oracle-database-free-23ai-1.0-1.el9.x86_64.rpm.

Notes

  • This approach is specifically for Amazon Linux 2023 and Oracle Database 23ai
  • For production environments, consider using Oracle Linux, which is fully supported
  • Always refer to Oracle's official documentation for the latest requirements

[root@ip-x-2-2-x ~]# cat /etc/amazon-linux-release
Amazon Linux release 2023.6.20250303 (Amazon Linux)
[root@ip-x-2-2-x ~]# 
[root@ip-x-2-2-x ~]# rpm -qa|grep oracle-database-free-23ai-1.0-1.x86_64
oracle-database-free-23ai-1.0-1.x86_64


Tuesday, February 25, 2025

Tips to minimize table locking for DDL on large tables in PostgreSQL

Here are some tips to minimize table locking when running DDL on large tables:

  • NOT VALID for FOREIGN KEY/CHECK CONSTRAINT
  • CREATE INDEX CONCURRENTLY
  • UNIQUE/PRIMARY KEY CONSTRAINT using pre-created index


Examples:

mytest=# alter table t add constraint t_fk foreign key (id) references p(id) not valid;
ALTER TABLE
Time: 4.706 ms

mytest=# alter table t validate constraint t_fk;
ALTER TABLE
Time: 279.145 ms
mytest=# create unique index concurrently t_pk on t(id) ;
CREATE INDEX
Time: 201.014 ms

mytest=# alter table t add constraint t_pk primary key using index t_pk;
ALTER TABLE
Time: 4.386 ms
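
As a quick sanity check (not part of the timed runs above), the catalog can confirm that constraints created this way end up fully valid, using the constraint names from the example:

-- convalidated turns true once VALIDATE CONSTRAINT completes;
-- contype 'p' confirms t_pk was promoted to a primary key constraint
SELECT conname, contype, convalidated
FROM pg_constraint
WHERE conrelid = 't'::regclass;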

Sunday, February 23, 2025

Compare different PostgreSQL index types for text column

Prepare test case data

-- Create the table
CREATE TABLE website_data (
    table_storage_id BIGINT,
    uri TEXT,
    page_viewer_counter INTEGER
);

-- Function to generate a random number within a given range
CREATE OR REPLACE FUNCTION random_between(low INT, high INT) RETURNS INT AS $$
BEGIN
    RETURN floor(random() * (high - low + 1) + low)::INT;
END;
$$ LANGUAGE plpgsql;

-- Function to generate a random nested path for the URI
CREATE OR REPLACE FUNCTION generate_random_uri() RETURNS TEXT AS $$
DECLARE
    top_tier TEXT[] := ARRAY['/doc', '/prod', '/sales', '/customer'];
    num_layers INT;
    layer TEXT;
    result_uri TEXT;
    i INT;
BEGIN
    -- Select a random top tier
    result_uri := top_tier[random_between(1, array_length(top_tier, 1))];
    -- Generate a random number of layers between 3 and 5
    num_layers := random_between(3, 5);
    FOR i IN 1..num_layers LOOP
        -- Generate a random layer name (a simple alphanumeric string for example)
        layer := md5(random()::text || clock_timestamp()::text)::text;
        result_uri := result_uri || '/' || left(layer, 8);
    END LOOP;
    RETURN result_uri;
END;
$$ LANGUAGE plpgsql;

-- Insert 1 million rows of dummy data
INSERT INTO website_data (table_storage_id, uri, page_viewer_counter)
SELECT 
    generate_series(1, 1000000) AS table_storage_id,
    generate_random_uri() AS uri,
    random_between(0, 1000) AS page_viewer_counter;

mytest=# select * from website_data limit 10;
 table_storage_id |                          uri                           | page_viewer_counter
------------------+--------------------------------------------------------+---------------------
                1 | /sales/667ba163/ef63ee69/28750da6                      |                 516
                2 | /sales/9614802b/d5b5cc57/150aaa58/1a199920             |                 946
                3 | /customer/a577d741/e0bc954d/c6e0fd20/fead2dbf/6e0f8889 |                 788
                4 | /doc/8863091a/ee194b9d/716852a4/bd3ff291               |                 221
                5 | /doc/8e2df3f4/5d9856b4/eeed2bd2/97b16c45/0c3fc22b      |                 141
                6 | /doc/57068bd4/74eabae3/20a5fb9b                        |                  54
                7 | /customer/811aa406/f293f18b/c9d05b7e                   |                  36
                8 | /doc/7c9f200a/0b20f9eb/f4f3b778/63be4322               |                 398
                9 | /customer/85bd643e/22850ba5/59a57598/b27f3572          |                 509
               10 | /doc/7ec5ef49/757faacc/af257a5a/fe7ea09e               |                 662
(10 rows)

Create different index types:


CREATE INDEX idx_uri_btree ON website_data USING btree (uri);
Time: 711.671 ms

CREATE INDEX idx_uri_hash ON website_data USING hash (uri);
Time: 1010.717 ms (00:01.011)

mytest=# create extension btree_gist;
CREATE EXTENSION

CREATE INDEX idx_uri_gist ON website_data USING gist (uri);
Time: 16012.289 ms (00:16.012)

CREATE INDEX idx_uri_spgist ON website_data USING spgist (uri);
Time: 2257.327 ms (00:02.257)

VACUUM ANALYZE;

\di+ idx_uri_*
                                               List of relations
 Schema |      Name      | Type  |  Owner  |    Table     | Persistence | Access method |  Size  | Description
--------+----------------+-------+---------+--------------+-------------+---------------+--------+-------------
 public | idx_uri_btree  | index | donghua | website_data | permanent   | btree         | 64 MB  |
 public | idx_uri_gist   | index | donghua | website_data | permanent   | gist          | 102 MB |
 public | idx_uri_hash   | index | donghua | website_data | permanent   | hash          | 32 MB  |
 public | idx_uri_spgist | index | donghua | website_data | permanent   | spgist        | 66 MB  |


mytest=# \dt+ website_data;
                                      List of relations
 Schema |     Name     | Type  |  Owner  | Persistence | Access method | Size  | Description
--------+--------------+-------+---------+-------------+---------------+-------+-------------
 public | website_data | table | donghua | permanent   | heap          | 83 MB |

load 'pg_hint_plan';
create extension pg_hint_plan;

Compare execution plans (take the 2nd execution output to avoid cold cache effects)

mytest=# explain (analyze,buffers) /*+ SeqScan(website_data) */ select * from website_data where uri='/customer/811aa406/f293f18b/c9d05b7e';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..16792.43 rows=1 width=55) (actual time=0.236..27.291 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=513 read=10071
   ->  Parallel Seq Scan on website_data  (cost=0.00..15792.33 rows=1 width=55) (actual time=11.457..19.653 rows=0 loops=3)
         Filter: (uri = '/customer/811aa406/f293f18b/c9d05b7e'::text)
         Rows Removed by Filter: 333333
         Buffers: shared hit=513 read=10071
 Planning Time: 0.232 ms
 Execution Time: 27.308 ms
(10 rows)

Time: 27.985 ms
mytest=# explain (analyze,buffers) /*+ IndexScan(website_data idx_uri_btree) */ select * from website_data where uri='/customer/811aa406/f293f18b/c9d05b7e';
                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_uri_btree on website_data  (cost=0.42..8.44 rows=1 width=55) (actual time=0.021..0.022 rows=1 loops=1)
   Index Cond: (uri = '/customer/811aa406/f293f18b/c9d05b7e'::text)
   Buffers: shared hit=4
 Planning Time: 0.157 ms
 Execution Time: 0.033 ms
(5 rows)

Time: 0.538 ms
mytest=# explain (analyze,buffers) /*+ IndexScan(website_data idx_uri_hash) */ select * from website_data where uri='/customer/811aa406/f293f18b/c9d05b7e';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_uri_hash on website_data  (cost=0.00..8.02 rows=1 width=55) (actual time=0.021..0.021 rows=1 loops=1)
   Index Cond: (uri = '/customer/811aa406/f293f18b/c9d05b7e'::text)
   Buffers: shared hit=2
 Planning Time: 0.224 ms
 Execution Time: 0.036 ms
(5 rows)

Time: 2.500 ms
mytest=# explain (analyze,buffers) /*+ IndexScan(website_data idx_uri_gist) */ select * from website_data where uri='/customer/811aa406/f293f18b/c9d05b7e';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_uri_gist on website_data  (cost=0.41..8.43 rows=1 width=55) (actual time=0.068..0.143 rows=1 loops=1)
   Index Cond: (uri = '/customer/811aa406/f293f18b/c9d05b7e'::text)
   Buffers: shared hit=21
 Planning Time: 0.149 ms
 Execution Time: 0.157 ms
(5 rows)

Time: 0.652 ms
mytest=# explain (analyze,buffers) /*+ IndexScan(website_data idx_uri_spgist) */ select * from website_data where uri='/customer/811aa406/f293f18b/c9d05b7e';
                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_uri_spgist on website_data  (cost=0.29..8.30 rows=1 width=55) (actual time=0.083..0.084 rows=1 loops=1)
   Index Cond: (uri = '/customer/811aa406/f293f18b/c9d05b7e'::text)
   Buffers: shared hit=6
 Planning Time: 0.165 ms
 Execution Time: 0.096 ms
(5 rows)

Time: 0.629 ms
 Index Type | Storage Size (MB) | Planning Time (ms) | Execution Time (ms)
------------+-------------------+--------------------+---------------------
 BTree      |                64 |              0.157 |               0.033
 Hash       |                32 |              0.224 |               0.036
 GiST       |               102 |              0.149 |               0.157
 SP-GiST    |                66 |              0.165 |               0.096
 No Index   |                 - |              0.232 |              27.308
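
For completeness, one can also check which of these indexes the planner would pick on its own for this equality predicate when no pg_hint_plan hint is supplied (a follow-up sketch; output not shown):

explain (costs off)
select * from website_data
where uri = '/customer/811aa406/f293f18b/c9d05b7e';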