Monday, 31 December 2018

Backing Up and Restoring Cloudera Search (Oracle BDA)

Introduction

Recently I had to take a Solr backup on an Oracle Big Data Appliance (BDA) as part of a BDA upgrade. There are other scenarios as well in which Solr data needs to be backed up to a local or remote cluster. This post documents the commands and steps I used to back up Solr data to a remote cluster. More details can be found in the Cloudera links under References.

Create the backup folder and give the solr user access on the source cluster

 

# sudo -u hdfs hdfs dfs -mkdir /solr-backups
# sudo -u hdfs hdfs dfs -chown solr:solr /solr-backups 

Create snapshot


$ solrctl collection --create-snapshot <snapshotName> -c <collectionName>
# sudo -u solr solrctl collection --create-snapshot dic_snapshot_311218 -c dic (Example)
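The example snapshot name appears to follow a <collection>_snapshot_<ddmmyy> pattern. A minimal sketch of a helper that builds such a name is shown here; the function name snapshot_name and the date format are assumptions made for illustration, not part of solrctl.

```shell
# snapshot_name COLLECTION [DATESTAMP]
# Hypothetical helper: prints <collection>_snapshot_<datestamp>.
# When no datestamp is given, today's date in ddmmyy form is used (assumed convention).
snapshot_name() {
    collection=$1
    stamp=${2:-$(date +%d%m%y)}
    printf '%s_snapshot_%s\n' "$collection" "$stamp"
}

# Possible usage on the source cluster:
#   sudo -u solr solrctl collection --create-snapshot "$(snapshot_name dic)" -c dic
```

Passing the datestamp explicitly, for example snapshot_name dic 311218, keeps the name reproducible when the later steps run on a different day.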

Verify created snapshot


$ solrctl collection --list-snapshots <collectionName>
# sudo -u solr solrctl collection --list-snapshots dic (Example) 
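When this check is scripted, the listing can be piped through a small filter instead of being read by eye. The sketch below assumes that the snapshot name appears in the command output as a whole word; the exact output layout of --list-snapshots is not shown in this post, and has_snapshot is a hypothetical helper name.

```shell
# has_snapshot SNAPSHOT_NAME
# Hypothetical helper: reads a snapshot listing on stdin and succeeds (exit 0)
# only if SNAPSHOT_NAME occurs in it as a whole word.
has_snapshot() {
    # -q: quiet, -w: whole-word match, -F: treat the name as a fixed string
    grep -qwF -- "$1"
}

# Possible usage on the source cluster:
#   sudo -u solr solrctl collection --list-snapshots dic | has_snapshot dic_snapshot_311218 \
#       && echo "snapshot found"
```

The whole-word match keeps a longer name that merely starts with the same characters from being counted as a hit.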

Create backup directory in the target cluster

# sudo -u hdfs hdfs dfs -mkdir /solr-backups
# sudo -u hdfs hdfs dfs -chown solr:solr /solr-backups

Prepare snapshot for export to a remote cluster


$ solrctl collection --prepare-snapshot-export <snapshotName> -c <collectionName> -d <destDir>
# sudo -u solr solrctl collection --prepare-snapshot-export dic_snapshot_311218 -c dic -d /solr-backups (Example)

Export the snapshot to remote cluster

The snapshot is exported from the source cluster to the target cluster. Run the command on the source cluster.

$ solrctl collection --export-snapshot <snapshotName> -s <sourceDir> -d <protocol>://<namenode>:<port>/<destDir>
# sudo -u hdfs solrctl collection --export-snapshot dic_snapshot_311218 -s /solr-backups -d webhdfs://target_cluster_name_node:50070/solr-backups (Example)
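The destination argument is assembled from four pieces: protocol, NameNode host, port and destination directory. A minimal sketch of a helper that joins them with exactly one slash between the port and the path is shown here; build_dest_uri is a hypothetical name used only for illustration.

```shell
# build_dest_uri PROTOCOL NAMENODE PORT DEST_DIR
# Hypothetical helper: prints <protocol>://<namenode>:<port>/<destDir>,
# stripping any leading slashes from DEST_DIR so the path starts with a single "/".
build_dest_uri() {
    protocol=$1
    namenode=$2
    port=$3
    dest_dir=$4
    # Remove leading slashes one at a time until none are left.
    while [ "${dest_dir#/}" != "$dest_dir" ]; do
        dest_dir=${dest_dir#/}
    done
    printf '%s://%s:%s/%s\n' "$protocol" "$namenode" "$port" "$dest_dir"
}

# Possible usage on the source cluster:
#   dest=$(build_dest_uri webhdfs target_cluster_name_node 50070 /solr-backups)
#   sudo -u hdfs solrctl collection --export-snapshot dic_snapshot_311218 -s /solr-backups -d "$dest"
```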

Restore the collection from the snapshot on the target cluster


$ solrctl collection --restore <restoreCollectionName> -l <backupLocation> -b <snapshotName> -i <requestId>
# sudo -u solr solrctl collection --restore dic -l /solr-backups -b dic_snapshot_311218 -i dic_snapshot_3112181 (Example)
Here, requestId is simply an identifier that can be used later to monitor the restore.

Monitor status of restore


The following command can be used to monitor the progress of the restore.

$ solrctl collection --request-status <requestId>
# sudo -u solr solrctl collection --request-status dic_snapshot_3112181 (Example; use the same requestId that was passed to --restore)
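For long restores the status check can be wrapped in a polling loop. The sketch below assumes that the status output contains the word completed or failed, as the Solr Collections API request states do; the exact output of solrctl is not shown in this post, so adjust the patterns to what your cluster prints. The function wait_for_restore and the STATUS_CMD override are hypothetical names introduced for illustration.

```shell
# wait_for_restore REQUEST_ID [MAX_TRIES] [INTERVAL_SECONDS]
# Hypothetical helper: polls the request status until it reports completed or failed.
# STATUS_CMD (assumed override) replaces the status command, e.g. for a dry run.
# Prints completed/failed/timeout and returns 0/1/2 respectively.
wait_for_restore() {
    request_id=$1
    max_tries=${2:-60}
    interval=${3:-10}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Left unquoted on purpose so the default command splits into words.
        status=$(${STATUS_CMD:-solrctl collection --request-status} "$request_id" 2>&1)
        case $status in
            *completed*) echo "completed"; return 0 ;;
            *failed*)    echo "failed";    return 1 ;;
        esac
        tries=$((tries + 1))
        sleep "$interval"
    done
    echo "timeout"
    return 2
}

# Possible usage on the target cluster, run as the solr user:
#   wait_for_restore dic_snapshot_3112181 120 15
```

Any output that matches neither pattern is treated as still in progress, so the loop only ends early on an explicit completed or failed state.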


References

https://www.cloudera.com/documentation/enterprise/5-11-x/topics/search_backup_restore.html
https://www.cloudera.com/documentation/enterprise/5-11-x/topics/cdh_admin_distcp_data_cluster_migrate.html