Friday, April 3, 2026

Building Production-Grade Oracle 19c High Availability on AWS with Pacemaker and DRBD

 Building on my PostgreSQL HA implementation, this guide adapts the same Pacemaker + DRBD + Route53 stack to Oracle 19c, with one key difference in how binaries and data are laid out:

  • ✅ Oracle binaries: Local on each node (/u01) - no replication needed
  • ✅ Oracle data: Replicated synchronously via DRBD protocol C (/oradata) - committed writes survive the loss of a node
  • ✅ Automatic failover: ~90 seconds with Route53 DNS updates
  • ✅ Multi-AZ: Works across different availability zones

    Key Insight: Oracle's architecture allows binaries and data to be separated, making HA implementation simpler than you might think.

Why This Approach?

Oracle vs PostgreSQL Architecture

PostgreSQL: Binaries and data tightly coupled

  • Solution: Replicate everything via DRBD

Oracle: Binaries and data can be separated

  • Binaries: /u01/app/oracle (local, identical on both nodes)
  • Data: /oradata (DRBD-replicated, shared)
  • Advantage: Faster failover, less data to replicate

What's Different from PostgreSQL?

Component       | PostgreSQL              | Oracle 19c
Binary location | /drbd-data (replicated) | /u01 (local, not replicated)
Data location   | /drbd-data (replicated) | /oradata (DRBD-replicated)
Installation    | Once on DRBD            | Twice (on each node)
Failover time   | ~60s                    | ~90s (Oracle startup slower)
Resource agent  | pgsql                   | oracle

Prerequisites

Before starting, read the PostgreSQL HA guide for:

  • Infrastructure setup (VPC, subnets, security groups)
  • DRBD installation and configuration
  • Pacemaker cluster setup
  • Route53 Private DNS configuration
  • STONITH fencing setup

This guide focuses only on Oracle-specific differences.


Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    VPC (10.0.0.0/16)                        │
│                                                             │
│  ┌──────────────────────┐  ┌──────────────────────┐         │
│  │  Subnet AZ-a         │  │  Subnet AZ-b         │         │
│  │  10.0.1.0/24         │  │  10.0.2.0/24         │         │
│  │                      │  │                      │         │
│  │  ┌────────────────┐  │  │  ┌────────────────┐  │         │
│  │  │  ora-primary   │  │  │  │  ora-secondary │  │         │
│  │  │  10.0.1.10     │◄─┼──┼─►│  10.0.2.10     │  │         │
│  │  │                │  │  │  │                │  │         │
│  │  │ /u01 (local)   │  │  │  │ /u01 (local)   │  │         │
│  │  │ /oradata (DRBD)│  │  │  │ /oradata (DRBD)│  │         │
│  │  │ Oracle ACTIVE  │  │  │  │ (standby)      │  │         │
│  │  └────────────────┘  │  │  └────────────────┘  │         │
│  └──────────────────────┘  └──────────────────────┘         │
│                                                             │
│  Route53: db.oraclha.internal → 10.0.1.10 (active)          │
│  DRBD: /oradata only (not /u01)                             │
│  Pacemaker: Manages Oracle + DNS                            │
└─────────────────────────────────────────────────────────────┘

Oracle-Specific Implementation

1. Infrastructure Differences

Instance Requirements:

# Larger instances for Oracle
Instance Type: t3.large or m5.large (minimum)
Memory: 8GB+ (sized for the 4G SGA + 2G PGA configured later in this guide)
Storage:
  - Root: 50GB (for OS)
  - /u01: 50GB local EBS (Oracle binaries)
  - DRBD volume: 100GB+ (for /oradata)

Additional Security Group Ports:

TCP 1521  - Oracle Listener
TCP 5500  - Oracle EM Express (optional)
# Plus all ports from PostgreSQL guide (22, 2224, 7789, 5404-5405)

2. Oracle Installation (Both Nodes)

Key Point: Install the Oracle binaries locally on BOTH nodes, but create the database only on the primary.

# On BOTH nodes - Install Oracle binaries to /u01

# 1. Create Oracle user and groups
sudo groupadd -g 54321 oinstall
sudo groupadd -g 54322 dba
sudo groupadd -g 54323 oper
sudo useradd -u 54321 -g oinstall -G dba,oper oracle

# 2. Create directories
sudo mkdir -p /u01/app/oracle/product/19.0.0/dbhome_1
sudo chown -R oracle:oinstall /u01
sudo chmod -R 775 /u01

# 3. Set kernel parameters
sudo tee -a /etc/sysctl.conf << EOF
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
EOF
sudo sysctl -p

# 4. Install Oracle prerequisites
sudo dnf install -y oracle-database-preinstall-19c

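# 5. Install Oracle binaries (as oracle user)
# Download Oracle 19c from oracle.com
# 19c is an image-based install: extract the zip directly into ORACLE_HOME
# and run the installer from there (not from a staging directory)
su - oracle
cd /u01/app/oracle/product/19.0.0/dbhome_1
unzip -q /tmp/oracle19c/LINUX.X64_193000_db_home.zip   # adjust to your download location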
./runInstaller -silent \
    -responseFile /tmp/db_install.rsp \
    oracle.install.option=INSTALL_DB_SWONLY \
    UNIX_GROUP_NAME=oinstall \
    INVENTORY_LOCATION=/u01/app/oraInventory \
    ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1 \
    ORACLE_BASE=/u01/app/oracle \
    oracle.install.db.InstallEdition=EE \
    oracle.install.db.OSDBA_GROUP=dba \
    oracle.install.db.OSOPER_GROUP=oper \
    DECLINE_SECURITY_UPDATES=true

# 6. Run root scripts (as root)
sudo /u01/app/oraInventory/orainstRoot.sh
sudo /u01/app/oracle/product/19.0.0/dbhome_1/root.sh

Important: Repeat steps 1-6 on BOTH nodes. Binaries must be identical.
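
One way to check that the two homes really are identical is to compare their patch inventories. The sketch below assumes you have saved the output of `$ORACLE_HOME/OPatch/opatch lspatches` from each node into a file; the function name and file paths are illustrative, not part of the original setup.

```shell
# Sketch: compare two saved `opatch lspatches` listings, ignoring order and blank lines.
# The file names used in the usage example are hypothetical.

# same_patch_set FILE_A FILE_B -> exit status 0 if both files list the same patches
same_patch_set() {
    local a b
    a=$(grep -v '^[[:space:]]*$' "$1" | sort)
    b=$(grep -v '^[[:space:]]*$' "$2" | sort)
    [ "$a" = "$b" ]
}

# Usage (run the first command as oracle on each node, then copy one file across):
#   $ORACLE_HOME/OPatch/opatch lspatches > /tmp/patches_$(hostname -s).txt
#   same_patch_set /tmp/patches_ora-primary.txt /tmp/patches_ora-secondary.txt \
#       && echo "Patch levels match" || echo "Patch levels differ"
```

A mismatch here is worth fixing before the first failover test, since the standby node would otherwise open the datafiles with different binaries.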

3. DRBD Configuration for Oracle

Differences from PostgreSQL:

  • Larger volume (100GB+ for Oracle data)
  • Only /oradata is on DRBD (not /u01)

# /etc/drbd.d/oradata.res
resource oradata {
  protocol C;
  
  disk {
    resync-rate 100M;  # Faster for larger volumes
  }
  
  net {
    max-buffers 8000;
    max-epoch-size 8000;
  }
  
  on ora-primary {
    device /dev/drbd0;
    disk /dev/nvme1n1;
    address 10.0.1.10:7789;
    meta-disk internal;
  }
  
  on ora-secondary {
    device /dev/drbd0;
    disk /dev/nvme1n1;
    address 10.0.2.10:7789;
    meta-disk internal;
  }
}

Initialize DRBD:

# On both nodes
sudo drbdadm create-md oradata --force
sudo drbdadm up oradata

# On primary only
sudo drbdadm primary oradata --force

# Create filesystem (primary only)
sudo mkfs.ext4 /dev/drbd0
sudo mkdir -p /oradata
sudo mount /dev/drbd0 /oradata
sudo chown -R oracle:oinstall /oradata
sudo chmod 775 /oradata
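
The initial sync of a 100GB+ volume takes a while, and a failover attempted before it finishes will not have usable data on the secondary. A minimal sketch for waiting on the sync follows. It assumes `drbdadm dstate` prints the local and peer disk states in the form `UpToDate/UpToDate`, as DRBD 8.4 does; the function names are hypothetical.

```shell
# drbd_in_sync "STATE" -> exit status 0 only when local and peer disks are both UpToDate
drbd_in_sync() {
    case "$1" in
        UpToDate/UpToDate) return 0 ;;
        *) return 1 ;;
    esac
}

# wait_for_drbd_sync RESOURCE [POLL_SECONDS] -> blocks until the resource is in sync
wait_for_drbd_sync() {
    local res="$1" poll="${2:-10}" state
    until state=$(sudo drbdadm dstate "$res") && drbd_in_sync "$state"; do
        echo "$(date +%T) $res disk state: ${state:-unknown}"
        sleep "$poll"
    done
    echo "$res is fully synchronised"
}

# Usage: wait_for_drbd_sync oradata 30
```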

4. Create Oracle Database (Primary Only)

Key Point: Create the database on the DRBD-mounted /oradata, on the primary node only.

# On primary node as oracle user
su - oracle

# Set environment
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=ORCL
export PATH=$ORACLE_HOME/bin:$PATH

# Create database using DBCA
dbca -silent \
    -createDatabase \
    -templateName General_Purpose.dbc \
    -gdbname ORCL \
    -sid ORCL \
    -responseFile NO_VALUE \
    -characterSet AL32UTF8 \
    -sysPassword SysPassword123 \
    -systemPassword SysPassword123 \
    -createAsContainerDatabase false \
    -databaseType MULTIPURPOSE \
    -automaticMemoryManagement false \
    -storageType FS \
    -datafileDestination /oradata \
    -redoLogFileSize 100 \
    -emConfiguration NONE \
    -ignorePreReqs

# Verify database created
sqlplus / as sysdba << EOF
SELECT name, open_mode FROM v\$database;
EXIT;
EOF
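
Because /u01 is local, anything DBCA writes there exists only on the primary. With the DBCA options above and default settings, that includes the spfile and password file in $ORACLE_HOME/dbs and the audit directory under $ORACLE_BASE/admin, and the instance on the secondary cannot start without them. The sketch below lists the files to copy; it assumes DBCA's default locations and passwordless SSH between the nodes, and the helper name is hypothetical.

```shell
# instance_files ORACLE_HOME SID -> prints the per-instance files that live outside /oradata
# Assumes DBCA defaults (spfile and password file under $ORACLE_HOME/dbs).
instance_files() {
    local home="$1" sid="$2"
    printf '%s\n' "$home/dbs/spfile${sid}.ora" "$home/dbs/orapw${sid}"
}

# Usage (as oracle on the primary, after DBCA completes):
#   for f in $(instance_files "$ORACLE_HOME" ORCL); do
#       scp -p "$f" ora-secondary:"$f"
#   done
#   # audit_file_dest must exist on the secondary or startup fails
#   ssh ora-secondary "mkdir -p /u01/app/oracle/admin/ORCL/adump"
```

Repeat the copy whenever the SYS password changes. Parameters changed with SCOPE=SPFILE also need the spfile copied again, unless you relocate the spfile to /oradata.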

5. Configure Oracle Listener

Important: Listener must bind to all interfaces for failover.

# On both nodes - /u01/app/oracle/product/19.0.0/dbhome_1/network/admin/listener.ora
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 0.0.0.0)(PORT = 1521))
    )
  )

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = ORCL)
      (ORACLE_HOME = /u01/app/oracle/product/19.0.0/dbhome_1)
      (SID_NAME = ORCL)
    )
  )

# On both nodes - /u01/app/oracle/product/19.0.0/dbhome_1/network/admin/tnsnames.ora
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db.oraclha.internal)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ORCL)
    )
  )

6. Pacemaker Resources for Oracle

Oracle Resource Agent: Use the oracle OCF resource agent, ocf:heartbeat:oracle (included in the resource-agents package).

# Create DRBD resource (same as PostgreSQL)
sudo pcs resource create oradata_drbd ocf:linbit:drbd \
    drbd_resource=oradata \
    op monitor interval=10s

sudo pcs resource promotable oradata_drbd \
    promoted-max=1 promoted-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true

# Create Filesystem resource
sudo pcs resource create oradata_fs Filesystem \
    device=/dev/drbd0 \
    directory=/oradata \
    fstype=ext4 \
    op monitor interval=20s

# Create Oracle resource
sudo pcs resource create oracle ocf:heartbeat:oracle \
    sid=ORCL \
    home=/u01/app/oracle/product/19.0.0/dbhome_1 \
    user=oracle \
    ipcrm=instance \
    op start timeout=120s \
    op stop timeout=120s \
    op monitor interval=30s timeout=30s

# Create Route53 DNS resource (same as PostgreSQL)
sudo pcs resource create cluster_dns ocf:heartbeat:route53-private \
    hosted_zone_id=/hostedzone/Z0858847QEI548E3AXOM \
    hostname=db.oraclha.internal \
    ttl=30 \
    op start timeout=60s \
    op stop timeout=60s \
    op monitor interval=30s timeout=20s

# Set constraints
sudo pcs constraint colocation add oradata_fs with oradata_drbd-clone INFINITY with-rsc-role=Master
sudo pcs constraint order promote oradata_drbd-clone then start oradata_fs
sudo pcs constraint colocation add oracle with oradata_fs INFINITY
sudo pcs constraint order oradata_fs then oracle
sudo pcs constraint colocation add cluster_dns with oracle INFINITY
sudo pcs constraint order oracle then cluster_dns
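
The resources above start the database but not the listener configured in step 5. One option, not part of the original setup, is to let Pacemaker manage it with the `ocf:heartbeat:oralsnr` agent from the same resource-agents package. The sketch below is a suggestion under that assumption; the resource name `ora_listener` and the `PCS` dry-run variable are hypothetical.

```shell
# PCS is the command prefix. Set PCS=echo to preview the commands without touching the cluster.
PCS="${PCS:-sudo pcs}"

# add_listener_resource -> creates a listener resource that follows the oracle resource
add_listener_resource() {
    # $PCS is intentionally unquoted so "sudo pcs" splits into two words
    $PCS resource create ora_listener ocf:heartbeat:oralsnr \
        sid=ORCL \
        home=/u01/app/oracle/product/19.0.0/dbhome_1 \
        user=oracle \
        listener=LISTENER \
        op monitor interval=30s timeout=30s
    $PCS constraint colocation add ora_listener with oracle INFINITY
    $PCS constraint order oracle then ora_listener
    # Only publish the DNS record once clients can actually connect
    $PCS constraint order ora_listener then cluster_dns
}

# Usage: PCS=echo add_listener_resource   # preview
#        add_listener_resource            # apply
```

An alternative is to keep the listener running permanently on both nodes, which works because the binaries are local and the SID is statically registered in listener.ora.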

7. Oracle-Specific Environment Setup

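Create an environment script that interactive shells and cron jobs can source (the oracle resource agent itself takes its settings from the sid, home, and user parameters above):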

# On both nodes - /usr/local/bin/oracle_env.sh
#!/bin/bash
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=ORCL
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH

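# Make executable (the script only exports variables, so load it with
# ". /usr/local/bin/oracle_env.sh" for the settings to reach your shell)
sudo chmod +x /usr/local/bin/oracle_env.sh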

Update oracle user's profile (both nodes):

# /home/oracle/.bash_profile
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=ORCL
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH

Testing Oracle HA

Test 1: Manual Failover

# Check current status
sudo pcs status resources

# Connect to database
sqlplus system/SysPassword123@db.oraclha.internal/ORCL
SQL> SELECT instance_name, host_name FROM v$instance;

# Trigger failover
sudo pcs node standby ora-primary

# Wait 90 seconds
sleep 90

# Verify failover
sqlplus system/SysPassword123@db.oraclha.internal/ORCL
SQL> SELECT instance_name, host_name FROM v$instance;
# Should show ora-secondary

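# Failback: bring ora-primary back into the cluster
# (resources stay on ora-secondary unless you move them or a location preference pulls them back)
sudo pcs node unstandby ora-primary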
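
A fixed `sleep 90` hides how long the failover actually took. As a rough sketch, a polling helper can report the elapsed time instead. The helper names are hypothetical, and the `grep` pattern assumes `pcs status resources` prints the resource name and `Started <node>` on one line, which may vary between pcs versions.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second; returns 0 when it succeeds, 1 on timeout.
wait_until() {
    local timeout="$1" start now
    shift
    start=$(date +%s)
    until "$@" >/dev/null 2>&1; do
        now=$(date +%s)
        if [ $((now - start)) -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
    echo "ready after $(( $(date +%s) - start ))s"
}

# oracle_on_node NODE -> succeeds when pcs reports the oracle resource started on NODE
oracle_on_node() {
    sudo pcs status resources | grep -q "oracle.*Started $1"
}

# Usage:
#   sudo pcs node standby ora-primary
#   wait_until 180 oracle_on_node ora-secondary || echo "failover did not finish in 180s"
```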

Test 2: Instance Stop (Automatic Failover)

# Stop primary instance
aws ec2 stop-instances --instance-ids i-xxx --region ap-southeast-2

# Wait 90 seconds for automatic failover

# Verify database accessible
sqlplus system/SysPassword123@db.oraclha.internal/ORCL
SQL> SELECT instance_name, host_name FROM v$instance;
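
Before connecting, it can help to confirm that the Route53 record has actually moved. The following sketch maps whatever address the name resolves to back to a node name, using the IPs from the architecture diagram; both function names are hypothetical.

```shell
# node_for_ip IP -> maps an address from the architecture diagram to its node name
node_for_ip() {
    case "$1" in
        10.0.1.10) echo ora-primary ;;
        10.0.2.10) echo ora-secondary ;;
        *) echo unknown; return 1 ;;
    esac
}

# dns_target HOSTNAME -> first address the local resolver returns for HOSTNAME
dns_target() {
    getent hosts "$1" | awk 'NR==1 {print $1}'
}

# Usage: node_for_ip "$(dns_target db.oraclha.internal)"
```

With a TTL of 30 seconds on the record, clients may keep resolving the old address for up to that long after the cluster updates it.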

Test 3: Data Integrity

# Before failover
sqlplus system/SysPassword123@db.oraclha.internal/ORCL << EOF
CREATE TABLE test_ha (id NUMBER, test_data VARCHAR2(100));
INSERT INTO test_ha VALUES (1, 'Before failover');
COMMIT;
SELECT * FROM test_ha;
EXIT;
EOF

# Trigger failover
sudo pcs node standby ora-primary
sleep 90

# After failover - verify data
sqlplus system/SysPassword123@db.oraclha.internal/ORCL << EOF
SELECT * FROM test_ha;
INSERT INTO test_ha VALUES (2, 'After failover');
COMMIT;
SELECT * FROM test_ha;
EXIT;
EOF

Key Differences Summary

Installation

Aspect          | PostgreSQL        | Oracle 19c
Binary location | DRBD (/drbd-data) | Local (/u01)
Data location   | DRBD (/drbd-data) | DRBD (/oradata)
Install count   | Once              | Twice (both nodes)
Binary size     | ~200MB            | ~8GB
Data replicated | Everything        | Data only

Failover

Aspect         | PostgreSQL | Oracle 19c
Startup time   | ~15s       | ~30s
Total failover | ~60s       | ~90s
Resource agent | pgsql      | oracle
Complexity     | Lower      | Higher

Resource Configuration

PostgreSQL:

DRBD → Filesystem → PostgreSQL → DNS

Oracle:

DRBD → Filesystem → Oracle → DNS
(Oracle binaries already on local /u01)

Performance Considerations

DRBD Tuning for Oracle

# /etc/drbd.d/oradata.res - Performance tuning
resource oradata {
  protocol C;
  
  disk {
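    resync-rate 200M;        # Starting resync rate; the dynamic controller adjusts it
    c-plan-ahead 20;         # Enables the dynamic resync controller (units of 0.1s)
    c-fill-target 10M;       # Amount of resync data to keep in flight
    c-max-rate 700M;         # Upper bound on resync bandwidth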
  }
  
  net {
    max-buffers 8000;
    max-epoch-size 8000;
    sndbuf-size 1M;
    rcvbuf-size 1M;
  }
}

Oracle Memory Configuration

# For 8GB instance
SQL> ALTER SYSTEM SET sga_target=4G SCOPE=SPFILE;
SQL> ALTER SYSTEM SET pga_aggregate_target=2G SCOPE=SPFILE;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP;
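
The 8GB example above gives roughly half of RAM to the SGA and a quarter to the PGA. For other instance sizes, the same split can be computed with a small helper. This is only a sketch of that ratio, not an Oracle sizing rule, and the function name is hypothetical.

```shell
# suggest_memory_mb TOTAL_RAM_MB -> prints "sga_mb pga_mb" using the 50% / 25% split above
suggest_memory_mb() {
    local total="$1"
    echo "$((total / 2)) $((total / 4))"
}

# Usage on a live node:
#   total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
#   suggest_memory_mb "$total_mb"
```

Whatever values you choose, set them identically for both nodes, since the standby must be able to start the same instance.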

Troubleshooting Oracle-Specific Issues

Issue: Oracle Won't Start After Failover

Check:

# Verify /oradata mounted
df -h | grep oradata

# Check Oracle processes
ps -ef | grep ora_

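# Check alert log (it lives under diagnostic_dest, which defaults to ORACLE_BASE on local /u01,
# so look on the node where the startup was attempted)
tail -100 /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log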

# Verify ORACLE_HOME
ls -la /u01/app/oracle/product/19.0.0/dbhome_1/bin/oracle

Fix:

# Manually start to see errors
su - oracle
sqlplus / as sysdba
SQL> STARTUP;

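# Check for a stale instance lock file in $ORACLE_HOME/dbs
# (remove it only if no ora_ processes for this SID are running)
ls -l /u01/app/oracle/product/19.0.0/dbhome_1/dbs/lkORCL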

Issue: Listener Not Starting

Check:

# Verify listener configuration
cat /u01/app/oracle/product/19.0.0/dbhome_1/network/admin/listener.ora

# Check if port 1521 is in use
sudo ss -tlnp | grep 1521

# Start listener manually
su - oracle
lsnrctl start
lsnrctl status

Issue: DRBD Sync Slow

Check:

# Monitor DRBD sync
watch -n 1 'cat /proc/drbd'

# Check network throughput
iperf3 -s  # On one node
iperf3 -c 10.0.2.10  # On other node

# Increase resync rate
sudo drbdadm disk-options --resync-rate=200M oradata
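
To judge whether a sync is genuinely slow, compare it with the best case for the configured rate. The arithmetic is volume size divided by resync rate, sketched below with a hypothetical helper. It treats the rate as MiB per second and ignores network and EBS throughput limits, which often cap the real figure lower.

```shell
# estimate_resync_seconds SIZE_GB RATE_MB_PER_S -> whole seconds for a full resync
estimate_resync_seconds() {
    local size_gb="$1" rate_mb="$2"
    echo $(( size_gb * 1024 / rate_mb ))
}

# Usage:
#   estimate_resync_seconds 100 100   # 100GB volume at resync-rate 100M
#   estimate_resync_seconds 100 200   # same volume at resync-rate 200M
```

For the 100GB volume in this guide, that is about 17 minutes at 100M and under 9 minutes at 200M, assuming the link between the two availability zones can sustain the rate.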

Production Checklist

Before Go-Live

  • [ ] Oracle binaries installed identically on both nodes
  • [ ] DRBD synced and tested (100GB+ takes time)
  • [ ] Oracle database created on DRBD /oradata
  • [ ] Listener configured to bind to 0.0.0.0
  • [ ] Pacemaker resources configured and tested
  • [ ] Route53 DNS working
  • [ ] STONITH fencing tested
  • [ ] Manual failover tested (3+ times)
  • [ ] Automatic failover tested (instance stop)
  • [ ] Data integrity verified
  • [ ] Backup procedures implemented (RMAN + EBS snapshots)
  • [ ] Monitoring configured (CloudWatch + OEM)
  • [ ] Performance tuning completed
  • [ ] Documentation updated

Oracle-Specific Checks

  • [ ] Archive log mode enabled (for RMAN backups)
  • [ ] Fast Recovery Area configured on /oradata
  • [ ] Oracle patches applied (both nodes)
  • [ ] TNS configuration identical on both nodes
  • [ ] Oracle user environment identical
  • [ ] Alert log monitoring configured
  • [ ] AWR/ADDM reports scheduled

Backup Strategy

RMAN Configuration

# On active node
su - oracle
rman target /

RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
RMAN> CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET;
RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '/oradata/backup/%U';

# Daily backup script
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
RMAN> DELETE NOPROMPT OBSOLETE;
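
Since either node can be active, a scheduled backup has to run only where /oradata is currently mounted. A minimal sketch of such a wrapper follows. The function names are hypothetical, and it assumes the script runs as the oracle user with ORACLE_HOME, ORACLE_SID, and PATH already set.

```shell
# rman_daily_commands -> prints the daily RMAN commands from the section above
rman_daily_commands() {
    cat <<'RMAN'
BACKUP DATABASE PLUS ARCHIVELOG;
DELETE NOPROMPT OBSOLETE;
EXIT;
RMAN
}

# run_daily_backup -> runs the backup only on the node that has /oradata mounted
run_daily_backup() {
    if ! mountpoint -q /oradata; then
        echo "/oradata not mounted here; skipping backup"
        return 0
    fi
    rman_daily_commands | rman target /
}

# Usage (from cron on BOTH nodes, as oracle):
#   . /usr/local/bin/oracle_env.sh && run_daily_backup
```

Installing the same cron entry on both nodes keeps the schedule intact after a failover without any manual change.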

EBS Snapshots

# Snapshot DRBD volumes daily
aws ec2 create-snapshot \
    --volume-id vol-xxx \
    --description "DRBD oradata backup $(date +%Y%m%d)" \
    --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=oradata-backup}]'

Monitoring

Oracle-Specific Metrics

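# Target metrics for the Oracle-HA namespace.
# Note: the stock CloudWatch agent has no built-in "oracle" collector, so treat this as the
# list of metrics to publish with a custom script (next section), not as a drop-in agent config.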
{
  "metrics": {
    "namespace": "Oracle-HA",
    "metrics_collected": {
      "oracle": {
        "measurement": [
          {"name": "sessions", "unit": "Count"},
          {"name": "active_sessions", "unit": "Count"},
          {"name": "db_block_gets", "unit": "Count"},
          {"name": "physical_reads", "unit": "Count"}
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}

Custom Oracle Monitoring Script

#!/bin/bash
# /usr/local/bin/oracle_metrics.sh

export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=ORCL
export PATH=$ORACLE_HOME/bin:$PATH

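# Get metrics (strip whitespace so the value is a bare number)
SESSIONS=$(sqlplus -s / as sysdba << EOF | tr -d '[:space:]'
SET PAGESIZE 0 FEEDBACK OFF VERIFY OFF HEADING OFF ECHO OFF
SELECT COUNT(*) FROM v\$session;
EXIT;
EOF
)

# Skip the upload if sqlplus returned an error instead of a number
case "$SESSIONS" in
    ''|*[!0-9]*) echo "Unexpected sqlplus output: $SESSIONS" >&2; exit 1 ;;
esac

# Send to CloudWatch
aws cloudwatch put-metric-data \
    --namespace Oracle-HA \
    --metric-name Sessions \
    --value "$SESSIONS" \
    --unit Count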

Conclusion

Adapting the PostgreSQL HA solution for Oracle 19c is straightforward with these key differences:

✅ Install Oracle binaries locally on both nodes (/u01)
✅ Replicate only data via DRBD (/oradata)
✅ Use oracle resource agent instead of pgsql
✅ Expect 90-second failover (vs 60s for PostgreSQL)
✅ Same infrastructure (DRBD, Pacemaker, Route53, STONITH)

When to Use This Solution

Good Fit:

  • Need Oracle HA without RAC licensing costs
  • Multi-AZ requirement with different subnets
  • Can tolerate 90-second failover
  • Want full control over Oracle configuration

Not a Good Fit:

  • Need sub-second failover (use Oracle RAC)
  • Want fully managed solution (use RDS Oracle)
  • Need active-active configuration
  • Can't tolerate any downtime

Next Steps

  1. Follow the PostgreSQL guide for infrastructure setup
  2. Implement Oracle-specific changes from this guide
  3. Test thoroughly in non-production
  4. Document your specific Oracle configuration
  5. Deploy to production with confidence

Resources

Base Infrastructure Guide: https://www.dbaglobe.com/2026/04/building-production-grade-postgresql.html

Oracle Documentation:

  • Oracle 19c Installation Guide
  • Oracle HA Best Practices
  • RMAN Backup and Recovery Guide

DRBD: https://linbit.com/drbd-user-guide/
Pacemaker: https://clusterlabs.org/pacemaker/