Thursday, April 2, 2026

Zero Data Loss Database Migration with RMAN Active Duplication

 

Introduction

When migrating Oracle databases, achieving zero data loss while minimizing downtime is critical. This article demonstrates how to use RMAN's DUPLICATE FROM ACTIVE DATABASE feature with the NOOPEN clause to achieve zero data loss migration while keeping the source database online.

The Challenge

Traditional RMAN active database duplication opens the duplicate database immediately after completion, which means:

  • Any transactions committed after duplication starts may be lost
  • For large databases (TB+), duplication can take hours
  • Data loss window = time from duplication start to completion

Question: Can we recover all transactions up to a planned cutover time, even if duplication takes hours?

Answer: Yes, using the NOOPEN clause.

The Solution: NOOPEN Clause

The key is to prevent RMAN from opening the duplicate database immediately. Instead, keep it in MOUNT mode and open it at your planned cutover time.

How It Works

  1. Duplication Phase (source stays online): RMAN duplicates datafiles and applies archive logs up to duplication completion
  2. Mount Mode: Database remains in MOUNT mode, not yet opened
  3. Cutover: Stop application, force log switch
  4. Roll Forward: Catalog and apply all remaining archive logs generated during duplication
  5. Open with RESETLOGS: Open database to complete recovery
  6. Result: Zero data loss

Implementation

Prerequisites

  • Source database in ARCHIVELOG mode
  • Network connectivity between source and target
  • Sufficient disk space on target server
  • Password file with matching SYS password on both servers

Step 1: Prepare Auxiliary Instance

#!/bin/bash
# prepare_auxiliary.sh
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=dupdb
export PATH=$ORACLE_HOME/bin:$PATH

# Create directories
mkdir -p /u02/oradata/dupdb
mkdir -p /u02/fra/dupdb
mkdir -p /u01/app/oracle/admin/dupdb/adump

# Create password file (match source password)
$ORACLE_HOME/bin/orapwd file=$ORACLE_HOME/dbs/orapwdupdb password=<YOUR_PASSWORD> force=yes

# Create init parameter file
cat > $ORACLE_HOME/dbs/initdupdb.ora <<'EOF'
db_name=dupdb
db_unique_name=dupdb
control_files='/u02/oradata/dupdb/control01.ctl','/u02/oradata/dupdb/control02.ctl'
db_block_size=8192
sga_target=16G
pga_aggregate_target=8G
processes=1000
db_recovery_file_dest=/u02/fra/dupdb
db_recovery_file_dest_size=500G
diagnostic_dest=/u01/app/oracle
enable_pluggable_database=true
EOF

# Start auxiliary instance
sqlplus / as sysdba <<SQL
startup nomount pfile='$ORACLE_HOME/dbs/initdupdb.ora'
exit
SQL

Step 2: Configure TNS (if on different servers)

On both source and target servers, add TNS entries:

# tnsnames.ora
SOURCEDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = source-server)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = sourcedb))
  )

DUPDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = target-server)(PORT = 1521))
    (CONNECT_DATA = (SID = dupdb))
  )

On target server, add static listener entry for auxiliary instance:

# listener.ora
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = dupdb)
      (ORACLE_HOME = /u01/app/oracle/product/19.0.0/dbhome_1)
      (SID_NAME = dupdb)
    )
  )

Reload listener: lsnrctl reload

Step 3: Execute Duplication with NOOPEN

#!/bin/bash
# rman_duplicate_noopen.sh
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=sourcedb  # Run from source server
export PATH=$ORACLE_HOME/bin:$PATH

LOG_FILE="/backup/rman_duplicate_$(date +%Y%m%d_%H%M%S).log"
echo "Starting duplication at: $(date)"
echo "Log file: $LOG_FILE"

$ORACLE_HOME/bin/rman msglog=$LOG_FILE <<'EOF'
connect target sys/<PASSWORD>@sourcedb
connect auxiliary sys/<PASSWORD>@dupdb

run {
  allocate channel c1 device type disk connect 'sys/<PASSWORD>@sourcedb';
  allocate channel c2 device type disk connect 'sys/<PASSWORD>@sourcedb';
  allocate auxiliary channel aux1 device type disk;
  allocate auxiliary channel aux2 device type disk;
  
  duplicate target database to dupdb from active database
  spfile
    set db_unique_name='dupdb'
    set db_name='dupdb'
    set control_files='/u02/oradata/dupdb/control01.ctl','/u02/oradata/dupdb/control02.ctl'
    set db_recovery_file_dest='/u02/fra/dupdb'
    set db_recovery_file_dest_size='500G'
    set diagnostic_dest='/u01/app/oracle'
    set db_file_name_convert='/u01/oradata/sourcedb','/u02/oradata/dupdb'
    set log_file_name_convert='/u01/oradata/sourcedb','/u02/oradata/dupdb'
  nofilenamecheck
  noopen;    # KEY: Keep database in MOUNT mode
}

exit
EOF

echo "Duplication completed at: $(date)"
echo "Database is in MOUNT mode, ready for cutover"

Important: nofilenamecheck tells RMAN to skip verifying that the duplicate's file names differ from the source's. It is required when duplicating to a different server that reuses the same directory paths; when paths are remapped with db_file_name_convert (as above), it is harmless but optional. Never use it for same-server duplication into identical paths, as that risks overwriting the source datafiles.

Step 4: Cutover Process

At your planned cutover time:

#!/bin/bash
# cutover.sh

# 1. Stop application
echo "Stopping application..."
# <your application stop commands>

# 2. Force log switch on source to generate final archive logs
export ORACLE_SID=sourcedb
sqlplus / as sysdba <<EOF
alter system switch logfile;
alter system archive log current;
exit
EOF

echo "Waiting for archive logs to be written..."
sleep 5

# 3. On target server: Apply remaining archive logs
export ORACLE_SID=dupdb

# Option A: If archive logs are accessible via NFS or shared storage
rman target / <<EOF
catalog start with '/mnt/source_archive/' noprompt;
recover database;
exit
EOF

# Option B: If archive logs were copied to target
rman target / <<EOF
catalog start with '/target/archive/' noprompt;
recover database;
exit
EOF

# 4. Open duplicate database with RESETLOGS
sqlplus / as sysdba <<EOF
alter database open resetlogs;
exit
EOF

# 5. Open PDBs
sqlplus / as sysdba <<EOF
alter pluggable database all open;
exit
EOF

echo "Cutover completed at: $(date)"

Key Steps Explained:

  1. Force log switch: Ensures all transactions are in archive logs
  2. Catalog archive logs: Makes RMAN aware of new archive logs from source
  3. Recover database: Applies all archive logs up to the latest available
  4. Open resetlogs: Creates new incarnation and completes recovery

Step 5: Verification

-- On duplicate database
ALTER SESSION SET CONTAINER = pdb1;
SELECT COUNT(*) FROM critical_table;

-- Compare with source
-- Counts should match exactly

Key Considerations

Archive Log Management

For large databases with long duplication times:

  • Archive log retention: Ensure sufficient space for archive logs during duplication
  • Archive log destination: Must be accessible from duplicate database
  • Monitoring: Watch archive log generation rate

-- Check archive log space usage
SELECT SUM(blocks * block_size)/1024/1024/1024 AS gb_used
FROM v$archived_log
WHERE first_time > SYSDATE - 1;

Network Bandwidth

For different servers:

  • Bandwidth requirement: ~100-500 MB/s for efficient transfer
  • Compression: Consider using SECTION SIZE for parallel transfer
  • Monitoring: Track network utilization during duplication
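The SECTION SIZE suggestion above can be sketched as follows. This is illustrative only: SECTION SIZE with DUPLICATE requires backup set based active duplication (USING BACKUPSET or USING COMPRESSED BACKUPSET, Oracle 12c and later), and the 32G section size is an assumption to tune against your largest datafiles and available bandwidth.

run {
  allocate channel c1 device type disk;
  allocate channel c2 device type disk;
  allocate auxiliary channel aux1 device type disk;
  allocate auxiliary channel aux2 device type disk;
  duplicate target database to dupdb
    from active database
    using compressed backupset   # backup set method; compresses over the wire
    section size 32G             # split large datafiles across channels
    noopen;
}

With the push-based image copy method shown in Step 3, large datafiles are copied by a single channel each; the backup set method trades some CPU for parallelism within a file.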

Downtime Estimation

Actual downtime = Time to:

  1. Stop application (seconds)
  2. Force log switch (seconds)
  3. Roll forward: Copy, catalog and apply remaining archive logs (60-300 seconds)
  4. Open database with RESETLOGS (1-3 minutes for TB databases)
  5. Verify data (1-2 minutes)

Typical downtime: 3-10 minutes regardless of database size
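As a rough sanity check on the roll-forward step, here is a back-of-envelope estimate. All rates below are assumptions for illustration; measure your own archive generation, copy, and apply rates before committing to a cutover window.

```shell
#!/bin/bash
# Hypothetical roll-forward estimate (inputs are assumptions, not measurements)
ARCHIVE_GB_DURING_DUP=40      # archive log volume generated while duplication ran
COPY_RATE_GB_PER_MIN=20       # scp/NFS read rate to the target
APPLY_RATE_GB_PER_MIN=10      # single-instance media recovery apply rate

# Ceiling division so partial minutes round up
copy_min=$(( (ARCHIVE_GB_DURING_DUP + COPY_RATE_GB_PER_MIN - 1) / COPY_RATE_GB_PER_MIN ))
apply_min=$(( (ARCHIVE_GB_DURING_DUP + APPLY_RATE_GB_PER_MIN - 1) / APPLY_RATE_GB_PER_MIN ))

echo "Estimated roll-forward: $((copy_min + apply_min)) min (copy $copy_min + apply $apply_min)"
```

If the estimate exceeds your downtime budget, shrink the archive backlog by scheduling duplication closer to cutover or by shipping logs continuously during duplication.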

Data Loss Risk

Zero data loss IF:

  • Archive logs are retained and accessible
  • No archive log gaps
  • Duplicate database can access source archive log location

Verify before cutover:

-- Check for archive log gaps (a missing sequence# breaks recovery)
SELECT thread#, sequence# + 1 AS gap_starts_after
FROM (SELECT thread#, sequence#,
             LEAD(sequence#) OVER (PARTITION BY thread# ORDER BY sequence#) AS next_seq
      FROM v$archived_log
      WHERE deleted = 'NO')
WHERE next_seq > sequence# + 1;

Different Server Considerations

Network Configuration

  1. Firewall rules: Open port 1521 between servers
  2. DNS/hosts: Ensure hostname resolution
  3. TNS connectivity: Test with tnsping before duplication

Archive Log Access

Option 1: NFS Mount (Recommended)

# On target server, mount source archive log location
mount source-server:/archive /mnt/source_archive

# In RMAN, Oracle will automatically find logs

Option 2: Copy Archive Logs

# After duplication, before opening
scp source-server:/archive/* /target/archive/

# Then open database

Option 3: Standby Redo Logs (Advanced)

Configure standby redo logs for real-time log shipping during duplication.

File Path Mapping

Ensure correct path conversion:

-- Check source paths
SELECT name FROM v$datafile;
SELECT member FROM v$logfile;

-- Use in duplicate command
set db_file_name_convert='/source/path','/target/path'
set log_file_name_convert='/source/path','/target/path'
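To sanity-check the conversion pairs before running DUPLICATE, you can preview the mapping with a simple REPLACE. The paths here match the earlier scripts; adjust to your environment. Run on the source:

-- Preview where each datafile would land on the duplicate
SELECT name AS source_file,
       REPLACE(name, '/u01/oradata/sourcedb', '/u02/oradata/dupdb') AS duplicate_file
FROM v$datafile;

Any row whose duplicate_file equals its source_file indicates a path the convert pair does not cover.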

Finding Recovery SCN

After duplication completes, verify the recovery point:

-- Checkpoint SCN (recovery point)
SELECT checkpoint_change#, 
       TO_CHAR(checkpoint_time, 'YYYY-MM-DD HH24:MI:SS') as checkpoint_time
FROM v$database;

-- Datafile checkpoint SCN
SELECT file#, name, checkpoint_change#,
       TO_CHAR(checkpoint_time, 'YYYY-MM-DD HH24:MI:SS') as checkpoint_time
FROM v$datafile 
ORDER BY file#;

-- Convert SCN to timestamp
SELECT current_scn,
       TO_CHAR(scn_to_timestamp(current_scn), 'YYYY-MM-DD HH24:MI:SS.FF6') as scn_timestamp
FROM v$database;

Monitoring Script

#!/bin/bash
# monitor_duplication.sh
# Usage: ./monitor_duplication.sh <rman_log_file>
LOG_FILE="$1"

while true; do
  echo "=== $(date) ==="
  
  # Check RMAN progress
  tail -20 "$LOG_FILE" | grep -E 'channel|datafile|archive'
  
  # Check archive log generation
  sqlplus -s / as sysdba <<EOF
  SELECT COUNT(*) as "Archive logs last hour"
  FROM v\$archived_log
  WHERE first_time > SYSDATE - 1/24;
  exit
EOF
  
  sleep 300  # Check every 5 minutes
done

Troubleshooting

Issue: ORA-01017 during duplication

Cause: Password file mismatch
Solution: Recreate password file on auxiliary with same password as source

Issue: Archive logs not found during OPEN RESETLOGS

Cause: Archive log location not accessible
Solution:

  • Verify archive log destination parameter
  • Check file permissions
  • For different servers, ensure NFS mount or copy logs

Issue: Database opens but PDB not recovered

Cause: PDB archive logs not applied
Solution:

ALTER PLUGGABLE DATABASE pdb1 OPEN;
-- If fails, check alert log for specific archive log needed

Comparison with Alternatives

Method                      Downtime    Data Loss  Complexity  Cost
--------------------------  ----------  ---------  ----------  -------
DUPLICATE + NOOPEN          3-10 min    0          Low         EE/SE2
Data Guard Switchover       <1 min      0          Medium      EE
Export/Import               Hours       N/A        Low         EE/SE2
Transportable Tablespaces   10-30 min   N/A        Medium      EE
GoldenGate                  <1 min      0          High        OGG

Conclusion

RMAN's DUPLICATE FROM ACTIVE DATABASE with NOOPEN provides a simple, effective solution for zero data loss migrations:

✓ Zero data loss - All committed transactions recovered
✓ Minimal downtime - 3-10 minutes regardless of database size
✓ Source stays online - No impact during duplication
✓ No additional licensing - Uses standard RMAN features
✓ Works across servers - With proper network configuration

The key is the NOOPEN clause, which keeps the database in MOUNT mode until your planned cutover time, allowing Oracle to automatically apply all remaining archive logs when you open it.

References

  • Oracle Database Backup and Recovery User's Guide
  • MOS Note 452868.1: How to Duplicate a Database Using RMAN
  • MOS Note 1526162.1: RMAN DUPLICATE FROM ACTIVE DATABASE

Tested on Oracle Database 19c and 26ai. Procedure applies to Oracle 11g and above.

Friday, March 20, 2026

Blog Post Series: Kafka to S3 Iceberg Data Lake

 

Overview

This series provides comprehensive guidance on streaming CDC data from Kafka to S3 Iceberg tables, comparing two approaches: MSK Connect with Iceberg Kafka Connect vs Amazon Kinesis Data Firehose.

Blog Posts Created

Blog Post #5: Building a Real-Time Data Lake with MSK Connect + Iceberg

Link: https://www.dbaglobe.com/2026/03/building-real-time-data-lake-kafka-to.html

Implementation: Orders table using MSK Connect

Topics Covered:

  • Apache Iceberg benefits and features
  • Creating Iceberg Kafka Connect custom plugin
  • AWS Glue Catalog setup
  • IAM permissions configuration
  • Connector configuration and deployment
  • Verification and monitoring
  • Advanced features (time travel, schema evolution, partition evolution)
  • Performance optimization
  • Troubleshooting

Key Results:

  • ✅ Latency: 15-30 seconds
  • ✅ Throughput: 8 MB/sec sustained
  • ✅ Native CDC operations (INSERT/UPDATE/DELETE)
  • ✅ Automatic schema evolution
  • ⚠️ Cost: ~$320/month

Best For:

  • Real-time analytics (< 1 minute latency)
  • High throughput (> 5 MB/sec)
  • Complex CDC operations
  • Exactly-once semantics

Blog Post #6: Streaming CDC Data to S3 Iceberg with Kinesis Firehose

Link: https://www.dbaglobe.com/2026/03/streaming-kafka-cdc-to-s3-iceberg-with.html

Implementation: Customers table using Firehose

Topics Covered:

  • Kinesis Data Firehose benefits
  • Lambda transformation for CDC events
  • IAM permissions setup
  • Firehose delivery stream configuration
  • Soft delete pattern implementation
  • Periodic compaction strategy
  • Monitoring and optimization
  • Cost analysis

Key Results:

  • ✅ Latency: 5-7 minutes
  • ✅ Throughput: 2 MB/sec
  • ✅ Cost: ~$6/month (98% cheaper than MSK Connect)
  • ⚠️ Soft deletes require compaction
  • ⚠️ Schema changes need Lambda updates

Best For:

  • Cost-sensitive projects
  • Serverless architecture
  • Moderate throughput (< 5 MB/sec)
  • Simple CDC patterns (mostly inserts)

Blog Post #7: MSK Connect vs Firehose - Comprehensive Comparison

Link: https://www.dbaglobe.com/2026/03/kafka-to-s3-iceberg-msk-connect-vs.html

Topics Covered:

  • Detailed feature comparison
  • Setup complexity analysis
  • Performance benchmarks
  • Cost comparison (100 GB and 1 TB scenarios)
  • CDC operation support
  • Use case suitability
  • Real-world implementation results
  • Decision matrix
  • Hybrid approach recommendation
  • Migration paths
  • Best practices for both approaches

Key Findings:

Cost Comparison (100 GB/month):

  • MSK Connect: $324/month
  • Firehose: $6/month
  • Savings: 98%

Latency Comparison:

  • MSK Connect: 10-30 seconds
  • Firehose: 1-5 minutes

Recommendation: Hybrid approach

  • Use MSK Connect for critical, high-frequency tables
  • Use Firehose for reference data and append-mostly tables

Complete Architecture

┌─────────────────────┐
│   PostgreSQL        │
│   (RDS/Aurora)      │
└──────────┬──────────┘
           │
           │ Logical Replication
           ▼
┌─────────────────────┐
│   Debezium CDC      │
│   (MSK Connect)     │
└──────────┬──────────┘
           │
           │ CDC Events
           ▼
┌─────────────────────┐
│   Kafka Topics      │
│   (Amazon MSK)      │
└──────────┬──────────┘
           │
           ├─────────────────────┐
           │                     │
           ▼                     ▼
┌──────────────────┐   ┌──────────────────┐
│  Iceberg Sink    │   │  Kinesis         │
│  (MSK Connect)   │   │  Firehose        │
│                  │   │  + Lambda        │
│  • orders        │   │  • customers     │
│  • transactions  │   │  • products      │
└────────┬─────────┘   └────────┬─────────┘
         │                      │
         └──────────┬───────────┘
                    │
                    ▼
         ┌─────────────────────┐
         │  S3 Iceberg Tables  │
         │  (AWS Glue Catalog) │
         └──────────┬──────────┘
                    │
                    ▼
         ┌─────────────────────┐
         │  Query Engines      │
         │  • Athena           │
         │  • Spark            │
         │  • Trino            │
         └─────────────────────┘

Comparison Summary

Setup Complexity

Aspect          MSK Connect  Firehose    Winner
--------------  -----------  ----------  -----------
Setup Time      45 minutes   15 minutes  ✅ Firehose
Configuration   Complex      Simple      ✅ Firehose
Learning Curve  Steep        Moderate    ✅ Firehose

Performance

Metric      MSK Connect  Firehose   Winner
----------  -----------  ---------  --------------
Latency     10-30 sec    1-5 min    ✅ MSK Connect
Throughput  Unlimited    5 MB/sec   ✅ MSK Connect
Real-time   Yes          No         ✅ MSK Connect

Cost (100 GB/month)

Component  MSK Connect  Firehose  Savings
---------  -----------  --------  -------
Total      $324         $6        98%

CDC Operations

Operation  MSK Connect    Firehose     Winner
---------  -------------  -----------  --------------
INSERT     Native         Via Lambda   ✅ MSK Connect
UPDATE     Native upsert  Soft update  ✅ MSK Connect
DELETE     Native delete  Soft delete  ✅ MSK Connect

Decision Framework

Use MSK Connect When:

✅ Real-time latency required (< 1 minute)
✅ High throughput needed (> 5 MB/sec)
✅ Complex CDC operations (native upserts/deletes)
✅ Exactly-once semantics required
✅ Automatic schema evolution needed

Example Tables: orders, transactions, inventory, real-time events

Use Firehose When:

✅ Cost is primary concern
✅ Serverless architecture preferred
✅ Moderate throughput (< 5 MB/sec)
✅ Simple CDC patterns (mostly inserts)
✅ Latency of 1-5 minutes acceptable

Example Tables: customers, products, categories, audit_logs

Strategy: Use both approaches based on table characteristics

Implementation:

  1. Default to Firehose for cost savings (98% cheaper)
  2. Migrate to MSK Connect only when needed:
    • Real-time requirements
    • High throughput
    • Complex CDC operations

Benefits:

  • Optimize cost (Firehose where possible)
  • Maintain performance (MSK Connect where needed)
  • Reduce operational complexity
  • Flexibility per table

Real-World Results

Orders Table (MSK Connect)

Configuration: 2 MCU, 2 workers, 5-min commit
Results:
  - Latency: 15-30 seconds
  - Throughput: 8 MB/sec
  - Native CDC: ✅
  - Cost: $320/month

Customers Table (Firehose)

Configuration: 128 MB buffer, 5-min interval, Lambda 256 MB
Results:
  - Latency: 5-7 minutes
  - Throughput: 2 MB/sec
  - Soft deletes: ⚠️ (requires compaction)
  - Cost: $6/month

Key Takeaways

  1. Cost vs Performance Trade-off:

    • Firehose: 98% cheaper but 10x higher latency
    • MSK Connect: Real-time but 50x more expensive
  2. CDC Operation Support:

    • MSK Connect: Native upserts/deletes
    • Firehose: Soft deletes + compaction jobs
  3. Operational Complexity:

    • Firehose: Fully managed, serverless
    • MSK Connect: Requires worker management
  4. Hybrid Approach Best:

    • Use Firehose as default
    • Use MSK Connect for critical tables
    • Optimize cost/performance balance

Migration Paths

Starting with Firehose:

  1. Implement all tables with Firehose
  2. Monitor latency and CDC requirements
  3. Migrate high-priority tables to MSK Connect
  4. Keep low-priority tables on Firehose

Starting with MSK Connect:

  1. Implement critical tables with MSK Connect
  2. Monitor costs and usage patterns
  3. Migrate low-priority tables to Firehose
  4. Optimize cost/performance balance

Best Practices

For MSK Connect:

  • Right-size workers (start with 1 MCU × 1 worker)
  • Tune commit interval (5-10 minutes)
  • Monitor consumer lag
  • Use date partitioning
  • Enable Iceberg compaction

For Firehose:

  • Optimize buffer (3-5 minutes)
  • Keep Lambda transformation simple
  • Implement soft delete pattern
  • Schedule periodic compaction
  • Monitor error prefix in S3
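A sketch of the soft delete and compaction pattern, assuming the Lambda transform tags deleted rows with an is_deleted flag (the database, table, and column names are illustrative; Athena engine v3 with Iceberg tables supports both statements):

-- Purge rows soft-deleted by the Lambda transform
DELETE FROM iceberg_db.customers WHERE is_deleted = true;

-- Rewrite the small files that frequent Firehose deliveries accumulate
OPTIMIZE iceberg_db.customers REWRITE DATA USING BIN_PACK;

Scheduling both statements together (for example, from a nightly Step Functions or EventBridge job) keeps query performance stable between compactions.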

Cost Optimization

At Different Scales:

100 GB/month:

  • MSK Connect: $324
  • Firehose: $6
  • Savings: $318 (98%)

1 TB/month:

  • MSK Connect: $373
  • Firehose: $52
  • Savings: $321 (86%)

10 TB/month:

  • MSK Connect: $500 (scale workers)
  • Firehose: $290
  • Savings: $210 (42%)

Recommendation: Firehose's cost advantage narrows at very high scale (> 10 TB/month); it remains cheaper in absolute terms, but re-evaluate the trade-off per workload at that volume.
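The savings percentages above can be reproduced from the monthly dollar figures. This is a sketch: the inputs are the illustrative costs quoted in this post, not a pricing formula.

```shell
#!/bin/bash
# Recompute "Savings" percentages from monthly costs (integer percent)
savings_pct () {
  # $1 = MSK Connect cost, $2 = Firehose cost
  echo $(( ( ($1 - $2) * 100 ) / $1 ))
}

echo "100 GB/month: $(savings_pct 324 6)% saved"
echo "1 TB/month:   $(savings_pct 373 52)% saved"
echo "10 TB/month:  $(savings_pct 500 290)% saved"
```

The declining percentage is what drives the hybrid recommendation: the larger the volume, the more it pays to evaluate MSK Connect for the heaviest tables.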

Target Audience

  • Data Engineers: Building data lake pipelines
  • Solution Architects: Designing CDC architectures
  • DevOps Engineers: Operating data infrastructure
  • Data Platform Teams: Choosing technologies

Prerequisites

Readers should have completed:

  • Blog Posts #1-4 (Debezium CDC setup)
  • Understanding of Kafka and CDC concepts
  • AWS experience (MSK, S3, Glue, Lambda)
  • Basic SQL and Python knowledge

Conclusion

This blog post series provides a complete, production-ready guide to streaming CDC data from Kafka to S3 Iceberg tables. By implementing both MSK Connect and Firehose approaches, we provide real-world comparison data to help readers make informed decisions.

Key Insight: There's no one-size-fits-all solution. The hybrid approach—using Firehose as the default and MSK Connect for critical tables—provides the optimal balance of cost, performance, and operational simplicity.

These posts will help readers:

  • Understand both approaches deeply
  • Make data-driven technology choices
  • Implement production-ready solutions
  • Optimize cost and performance
  • Avoid common pitfalls

Resources