Introduction
Recently I needed to take a Solr backup on an Oracle Big Data Appliance (BDA) as part of a BDA upgrade. There are many other scenarios where Solr data has to be backed up to a local or remote cluster. Below are the commands and steps I used to back up Solr data to a remote cluster. More details can be found in the Cloudera links under References.
Create the backup directory on the source cluster and give the solr user ownership of it
# sudo -u hdfs hdfs dfs -mkdir /solr-backups
# sudo -u hdfs hdfs dfs -chown solr:solr /solr-backups
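If you script this step, the sketch below is one way to make it safe to re-run. `hdfs dfs -mkdir -p` does not fail when the directory already exists, and `hdfs dfs -stat '%u:%g'` prints the owner and group so you can confirm that solr owns the directory. The function name `ensure_backup_dir` is hypothetical. The sketch assumes it runs as the hdfs superuser and that your Hadoop version supports the `%u`/`%g` stat format specifiers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the Solr backup directory in HDFS (idempotent)
# and give ownership to the solr user. Assumes it runs as the HDFS superuser.

ensure_backup_dir() {
  local dir="${1:-/solr-backups}"
  # -p: do not fail if the directory already exists
  hdfs dfs -mkdir -p "$dir" || return 1
  hdfs dfs -chown solr:solr "$dir" || return 1
  # Print owner:group so the result can be checked (expected: solr:solr)
  hdfs dfs -stat '%u:%g' "$dir"
}
```

For example, save it as solr_backup_dir.sh (the file name is arbitrary) and run `sudo -u hdfs bash -c '. ./solr_backup_dir.sh && ensure_backup_dir /solr-backups'`. The last line of output should read `solr:solr`. The same function can be used when creating the directory on the target cluster.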
Create snapshot
$ solrctl collection --create-snapshot <snapshotName> -c <collectionName>
# sudo -u solr solrctl collection --create-snapshot dic_snapshot_311218 -c dic (Example)
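The example snapshot name appears to follow a `<collection>_snapshot_<DDMMYY>` pattern (dic_snapshot_311218). If you want to generate such names consistently, a minimal sketch is below. The helper names `snapshot_name` and `create_snapshot` and the date pattern are assumptions inferred from the example, not something solrctl requires.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. The naming pattern <collection>_snapshot_<DDMMYY>
# is inferred from the example name dic_snapshot_311218.

snapshot_name() {
  # $1 = collection, $2 = optional date stamp (defaults to today as DDMMYY)
  local collection="$1"
  local stamp="${2:-$(date +%d%m%y)}"
  printf '%s_snapshot_%s\n' "$collection" "$stamp"
}

create_snapshot() {
  # Run as the solr user. solrctl output goes to stderr so that stdout
  # carries only the generated snapshot name.
  local collection="$1"
  local name
  name="$(snapshot_name "$collection")"
  solrctl collection --create-snapshot "$name" -c "$collection" >&2 || return 1
  printf '%s\n' "$name"
}
```

Running `snap="$(create_snapshot dic)"` as the solr user stores the generated name in `snap`, so it can be reused in the later steps without retyping it.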
Verify the created snapshot
$ solrctl collection --list-snapshots <collectionName>
# sudo -u solr solrctl collection --list-snapshots dic (Example)
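Before moving on, it can be worth failing fast if the snapshot is missing. The sketch below (hypothetical function `snapshot_exists`) only assumes that the snapshot name appears somewhere in the `--list-snapshots` output. It does not rely on any particular output format, and `grep -w` avoids a false match when the name you are checking is only a prefix of another snapshot's name.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the snapshot name appears as a whole
# word in the output of --list-snapshots. Run as the solr user.

snapshot_exists() {
  local collection="$1" name="$2"
  # -q: quiet, -w: whole-word match, -F: treat the name as a fixed string
  solrctl collection --list-snapshots "$collection" | grep -qwF -- "$name"
}
```

For example: `sudo -u solr bash -c '. ./check_snapshot.sh && snapshot_exists dic dic_snapshot_311218' && echo "snapshot found"` (the file name is arbitrary).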
Create the backup directory on the target cluster
# sudo -u hdfs hdfs dfs -mkdir /solr-backups
# sudo -u hdfs hdfs dfs -chown solr:solr /solr-backups
Prepare snapshot for export to a remote cluster
$ solrctl collection --prepare-snapshot-export <snapshotName> -c <collectionName> -d <destDir>
# sudo -u solr solrctl collection --prepare-snapshot-export dic_snapshot_311218 -c dic -d /solr-backups (Example)
Export the snapshot to the remote cluster
The snapshot is exported from the source cluster to the target cluster, so run the following command on the source cluster.
$ solrctl collection --export-snapshot <snapshotName> -s <sourceDir> -d <protocol>://<namenode>:<port>/<destDir>
# sudo -u hdfs solrctl collection --export-snapshot dic_snapshot_311218 -s /solr-backups -d webhdfs://target_cluster_name_node:50070/solr-backups (Example)
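Building the target URI by hand makes it easy to end up with a doubled slash between the port and the directory. A sketch that assembles the `webhdfs://` URI from its parts, stripping leading slashes from the destination directory, is below. `webhdfs_uri` and `export_snapshot` are hypothetical names. Port 50070 is the NameNode HTTP (WebHDFS) port used in the example and the Hadoop 2 / CDH 5 default, so adjust it if your target cluster uses a different port.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the export step. Run on the source cluster
# as the hdfs user, matching the example above.

webhdfs_uri() {
  # $1 = target NameNode host, $2 = port, $3 = destination directory
  local host="$1" port="$2" dir="$3"
  # Strip every leading slash so the URI never contains ":port//dir"
  while [ "${dir#/}" != "$dir" ]; do dir="${dir#/}"; done
  printf 'webhdfs://%s:%s/%s\n' "$host" "$port" "$dir"
}

export_snapshot() {
  # $1 = snapshot, $2 = source dir, $3 = target NameNode, $4 = port, $5 = target dir
  solrctl collection --export-snapshot "$1" -s "$2" -d "$(webhdfs_uri "$3" "$4" "$5")"
}
```

For example, on the source cluster: `sudo -u hdfs bash -c '. ./export_snapshot.sh && export_snapshot dic_snapshot_311218 /solr-backups target_cluster_name_node 50070 /solr-backups'` (the file name is arbitrary).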
Restore the collection from snapshot in the target
$ solrctl collection --restore <restoreCollectionName> -l <backupLocation> -b <snapshotName> -i <requestId>
# sudo -u solr solrctl collection --restore dic -l /solr-backups -b dic_snapshot_311218 -i dic_snapshot_3112181 (Example)
Here, requestId is an identifier that you choose; it is used later to monitor the restore.
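Solr keeps the status of async requests under their request ID, and a request ID that is already in use will generally be rejected, so reusing a fixed ID across runs can fail. The sketch below (hypothetical `restore_collection`) defaults the ID to `<snapshot>_restore_<epoch seconds>`. That naming scheme is an assumption; any unique string works. The function prints the ID so it can be passed to the monitoring step.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: restore a collection and print the request ID used.
# Run as the solr user on the target cluster.

restore_collection() {
  # $1 = collection to create, $2 = backup location, $3 = snapshot name,
  # $4 = optional request ID (default: <snapshot>_restore_<epoch seconds>)
  local collection="$1" location="$2" snapshot="$3"
  local request_id="${4:-${snapshot}_restore_$(date +%s)}"
  # solrctl output goes to stderr so stdout carries only the request ID
  solrctl collection --restore "$collection" -l "$location" -b "$snapshot" -i "$request_id" >&2 || return 1
  printf '%s\n' "$request_id"
}
```

For example, `req="$(sudo -u solr bash -c '. ./restore.sh && restore_collection dic /solr-backups dic_snapshot_311218')"` keeps the generated ID in `req` (the file name is arbitrary).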
Monitor status of restore
The following command can be used to monitor the progress of the restore.
$ solrctl collection --request-status <requestId>
# sudo -u solr solrctl collection --request-status dic_snapshot_3112181 (Example)
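Rather than re-running the status command by hand, you can poll it. The sketch below assumes the `--request-status` output contains Solr's async request state, either as JSON (`"state":"completed"`) or as XML (`<str name="state">completed</str>`), with one of the Solr states submitted, running, completed, failed or notfound. Check what your solrctl version actually prints before relying on it. `request_state` and `wait_for_request` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical polling loop around --request-status. Run as the solr user.
# Assumes the output contains the async state as JSON or XML (see above).

request_state() {
  # Print the state (submitted|running|completed|failed|notfound) or "unknown".
  local state
  state="$(solrctl collection --request-status "$1" 2>/dev/null \
    | grep -Eo '"state" *: *"[a-z]+"|name="state">[a-z]+<' \
    | head -n 1 \
    | grep -Eo 'submitted|running|completed|failed|notfound' || true)"
  printf '%s\n' "${state:-unknown}"
}

wait_for_request() {
  # $1 = request ID, $2 = seconds between checks (default 30),
  # $3 = maximum number of checks (default 120)
  local id="$1" interval="${2:-30}" max="${3:-120}" i=0 state
  while [ "$i" -lt "$max" ]; do
    state="$(request_state "$id")"
    echo "request $id: $state" >&2
    case "$state" in
      completed) return 0 ;;
      failed|notfound) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 2  # gave up while still submitted, running or unknown
}
```

For example, `wait_for_request dic_snapshot_3112181 30 120` checks every 30 seconds for up to an hour and returns 0 on completed, 1 on failed or notfound, and 2 if it gives up.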
References
https://www.cloudera.com/documentation/enterprise/5-11-x/topics/search_backup_restore.html
https://www.cloudera.com/documentation/enterprise/5-11-x/topics/cdh_admin_distcp_data_cluster_migrate.html