# How to do Failover [BCP]
Primary cluster paths (source of a stream):
- Data From Applications - /databus/data/...
- Local stream Path - /databus/streams_local/example_stream/...
- Merged Stream Path - /databus/streams/example_stream/...

Mirror cluster paths (replica of a stream):
- Data From Applications - /databus/data/...
- Local stream Path - /databus/streams_local/example_stream/...
- Mirror Stream Path - /databus/streams/example_stream/...
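The layout above can be browsed with standard `hadoop fs` commands. A minimal sketch, assuming placeholder namenode addresses and the illustrative stream name `example_stream`:

```bash
# On the primary cluster (dc1): raw data, local stream, and merged stream.
# Namenode addresses and the stream name are placeholders.
hadoop fs -ls hdfs://dc1namenode:port/databus/data/
hadoop fs -ls hdfs://dc1namenode:port/databus/streams_local/example_stream/
hadoop fs -ls hdfs://dc1namenode:port/databus/streams/example_stream/

# On the mirror cluster (dc2): the same stream appears as a mirror
# under /databus/streams/.
hadoop fs -ls hdfs://dc2namenode:port/databus/streams/example_stream/
```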
## Setup

In the following scenarios we run databus in two data centers (dc1, dc2) on two Hadoop clusters. Both dc1 and dc2 may host some primary streams and some mirror streams.
- Make all collectors across all DCs spool to local disk. [This can be achieved by putting a deliberately wrong config in the collectors.]
- Wait till all data for primary streams (DC1) is mirrored on their respective mirror clusters. [E.g., if DC2 is the mirror for a certain stream, check that /databus/system/mirrors/[dc2] and /databus/system/consumers/[dc1] are drained and that /databus/streams_local/... shows no new data. Also check databus.log on the DC1 cluster. A verification sketch follows this list.]
- Stop databus in all clusters.
- Stop Hadoop in DC1 for maintenance.
- Change the databus config to make DC2 the primary destination and DC1 the mirror for all streams that had DC1 as primary. DC1 should no longer be the source of any stream. [This config must be changed across all databus instances which interact with DC1.]
- Start databus in DC2 after this change.
- Configure all the collectors in DC2 to not spool anymore.
- Move all collector traffic from DC1 to DC2. [Fix the collector config and re-HUP the collectors so they reload it; a sketch follows below.] If you use any IP whitelisting at the collector level, ensure that the DC1 collectors' IP addresses are whitelisted on the DC2 collectors.
- Check that all spools on DC1 collectors are empty. Send empty messages to any collectors that still have spooled data to trigger despooling.
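A sketch of the catch-up check above, assuming the checkpoint directories live on the DC1 (source) cluster and that a stable `hadoop fs -count` file count is a reasonable drain signal; the namenode address is a placeholder:

```bash
#!/bin/bash
# Sketch: verify that DC1's primary streams are fully mirrored.
# Namenode address and directory names are placeholders.
DC1=hdfs://dc1namenode:port

check_drained() {
  # hadoop fs -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATH
  before=$(hadoop fs -count "$DC1$1" | awk '{print $2}')
  sleep 60
  after=$(hadoop fs -count "$DC1$1" | awk '{print $2}')
  if [ "$before" = "$after" ]; then
    echo "$1: no new data in the last minute."
  else
    echo "$1: still receiving data; wait and re-check."
  fi
}

check_drained /databus/system/mirrors/dc2
check_drained /databus/system/consumers/dc1
```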
Now all traffic from DC1 and DC2 applications is coming to DC2 collectors and all data is available on DC2 HDFS.
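For the collector steps above, a hedged sketch of the reload-and-despool check; the process-name pattern and spool path are hypothetical and must be adapted to your collector deployment:

```bash
# Reload the collector config without a restart ("re-HUP"); "collector" is
# a hypothetical process-name pattern -- substitute your daemon's name.
pkill -HUP -f collector

# Verify the local spool directory is empty before declaring failover done;
# /var/spool/collector is a placeholder for your configured spool path.
if [ -z "$(ls -A /var/spool/collector 2>/dev/null)" ]; then
  echo "Spool empty."
else
  echo "Spool still has data; send an empty message to trigger despooling."
fi
```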
- Stop databus in DC2.
- If the number of files in the DC2 cluster path /databus/system/mirrors/[dc1] is greater than 60, run `databus.sh collapse hdfs://dc2namenode:port /databus/system/mirrors/dc1/`. [Optimization to reduce bootstrap time; a sketch follows this list.]
- Start databus in DC1. [It pulls/bootstraps all data that was missed during the maintenance.]
- Wait till all data is pulled. [/databus/system/mirrors/[dc1] should be empty on DC2; the sketch after this list polls for this.]
- Stop databus in DC1.
- Revert the config back to the original: DC1 is primary and DC2 is mirror/secondary. [Again, change it across all databus instances which interact with DC1.]
- Start databus in DC1 and DC2.
- Move all traffic from DC1 apps back to the DC1 collectors.
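A sketch combining the collapse check and the wait step above. The `databus.sh collapse` invocation is taken from this runbook; the namenode address, the 5-minute poll interval, and the use of `hadoop fs -count` are assumptions:

```bash
#!/bin/bash
# Sketch: optionally collapse small files before bootstrap, then wait for
# DC1 to drain DC2's mirror checkpoint. Namenode address is a placeholder.
DC2=hdfs://dc2namenode:port
MIRROR_DIR=/databus/system/mirrors/dc1

# Collapse if the directory holds more than 60 files (bootstrap optimization).
files=$(hadoop fs -count "$DC2$MIRROR_DIR" | awk '{print $2}')
if [ "$files" -gt 60 ]; then
  databus.sh collapse "$DC2" "$MIRROR_DIR/"
fi

# After starting databus in DC1, poll until the checkpoint directory drains.
while [ "$(hadoop fs -count "$DC2$MIRROR_DIR" 2>/dev/null | awk '{print $2}')" -gt 0 ]; do
  echo "Waiting for DC1 to pull remaining data..."
  sleep 300
done
echo "All missed data pulled; safe to stop databus in DC1 and revert config."
```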
Now we are back to the original setup prior to BCP.
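As a final sanity check, one can confirm that a primary stream on DC1 is receiving fresh data again; the namenode address and stream name below are placeholders:

```bash
# List the newest entries of a primary stream on DC1; new data should
# keep appearing once traffic has moved back.
hadoop fs -ls hdfs://dc1namenode:port/databus/streams/example_stream/ | tail -5
```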