Automatic Failback of a Service in an Oracle 19c RAC Database

2019-05-29 By Markus Flechtner

High availability of database services has been a feature of Oracle Real Application Clusters (RAC) for many versions. When a database instance fails, a service that has this instance as a preferred instance fails over to another available instance. Unfortunately, the service did not fail back to the original instance once that instance was up again; the administrator had to relocate the service manually. This has changed with Oracle Database 19c.
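Before 19c, moving the service back was a manual step. The following is a minimal sketch of that step. It assumes a POSIX shell, a service with exactly one running instance (cardinality 1), and the `srvctl status service` output format shown later in this post. The function names `current_instance` and `relocate_to_preferred` are hypothetical; `srvctl relocate service` with `-oldinst`/`-newinst` is the standard relocation command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a single-instance service back to its preferred
# instance, as administrators had to do manually before Oracle 19c.

# Print the instance a service runs on, parsed from a status line such as
# "Service FMATEST is running on instance(s) RAC1". Prints nothing if the
# service is not running.
current_instance() {
  local db=$1 service=$2
  srvctl status service -db "$db" -service "$service" \
    | sed -n 's/^Service .* is running on instance(s) \(.*\)$/\1/p'
}

# Relocate the service if it is not running on the preferred instance.
# Assumption: cardinality 1, i.e. exactly one running instance.
relocate_to_preferred() {
  local db=$1 service=$2 preferred=$3 current
  current=$(current_instance "$db" "$service")
  if [ -z "$current" ]; then
    echo "Service $service is not running" >&2
    return 1
  fi
  if [ "$current" = "$preferred" ]; then
    echo "Service $service already runs on $preferred"
    return 0
  fi
  srvctl relocate service -db "$db" -service "$service" \
    -oldinst "$current" -newinst "$preferred"
}
```

For the test setup below, this would be called as `relocate_to_preferred RACDB FMATEST RAC2` once instance RAC2 is back.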

I’m running a three-node Oracle 19c cluster. Both Grid Infrastructure and the RDBMS are at version 19.3.0.0.0. An administrator-managed database is running on all nodes:

oracle@green:~/ [RAC1] srvctl status database -db RACDB
Instance RAC1 is running on node green
Instance RAC2 is running on node white
Instance RAC3 is running on node red

Let’s create a simple service for this database:

oracle@green:~/ [RAC1] srvctl add service -db RACDB -service FMATEST -preferred RAC2 -available RAC1 -failback YES

Please note the new option “-failback YES”. It makes the service fail back to its preferred instance (in my case “RAC2”) once that instance is available again. The default is “NO”, i.e. by default Oracle keeps the old behaviour.

oracle@green:~/ [RAC1] srvctl start service -db RACDB -service FMATEST
oracle@green:~/ [RAC1] srvctl status service -db RACDB -service FMATEST
Service FMATEST is running on instance(s) RAC2
oracle@green:~/ [RAC1] srvctl config service -db RACDB -service FMATEST

Service name: FMATEST
Server pool:
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
Failover retries:
Failover delay:
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name:
Hub service:
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Failback :  true
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: RAC2
Available instances: RAC1
CSS critical: no
Service uses Java: false

The line “Failback : true” shows that failback is configured. Unfortunately, there is no corresponding “Failback : false” line when failback is not configured; the line is simply missing.
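Because the line is absent when failback is off, a script that checks the setting has to treat a missing line as “not configured”. The following is a minimal sketch. It assumes the `srvctl config service` output format shown above and matches the spacing around the colon loosely. The function name `failback_enabled` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether failback is configured for a service.
# "srvctl config service" prints a line like "Failback :  true" only when
# failback is on, so a missing line is reported as "false".
failback_enabled() {
  local db=$1 service=$2
  if srvctl config service -db "$db" -service "$service" \
      | grep -Eiq '^Failback[[:space:]]*:[[:space:]]*true'; then
    echo "true"
  else
    echo "false"
  fi
}
```

For the service created above, `failback_enabled RACDB FMATEST` would print `true`. A service created without “-failback YES” would print `false`.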

Let’s reboot node white (which hosts instance RAC2) from another session and see what happens.

While the instance on node white is down, the service is started on node green (instance RAC1). This is the expected, well-known behaviour:

oracle@green:~/ [RAC1] srvctl status service -db RACDB -service FMATEST
Service FMATEST is running on instance(s) RAC1

It takes a while for node white to reboot and for Grid Infrastructure to start all its resources, but eventually, without any intervention by the DBA:

oracle@green:~/ [RAC1] srvctl status service -db RACDB -service FMATEST
Service FMATEST is running on instance(s) RAC2

The service is back on instance RAC2.
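To watch the failback happen without repeatedly typing the status command, you can put the check in a loop. The following is a minimal sketch. It assumes a POSIX shell, a single-instance service, and the status line format shown above. The function name `wait_for_instance` is hypothetical, and the defaults (a 30-second interval, 20 attempts) are illustrative choices, not Oracle defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll "srvctl status service" until the service runs on
# the expected instance, or give up after a number of attempts.
# Arguments: db service expected_instance [interval_seconds] [max_attempts]
wait_for_instance() {
  local db=$1 service=$2 expected=$3
  local interval=${4:-30} attempts=${5:-20} i=1 status
  while [ "$i" -le "$attempts" ]; do
    status=$(srvctl status service -db "$db" -service "$service")
    # Timestamp each observation so the failback moment is visible.
    echo "$(date '+%H:%M:%S') $status"
    if [ "$status" = "Service $service is running on instance(s) $expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

Started right after the reboot, `wait_for_instance RACDB FMATEST RAC2` returns 0 once the service is back on RAC2. The loop only observes and never relocates anything, so what it reports is the behaviour of Grid Infrastructure alone.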

To sum up, a feature RAC administrators have wanted for years has finally been implemented in Oracle 19c, and in this test it works fine.