Monday, 31 December 2018

Backing Up and Restoring Cloudera Search (Oracle BDA)

Introduction

Recently I had to back up Solr data on an Oracle Big Data Appliance (BDA) as part of a BDA upgrade. There are several other scenarios in which Solr data needs to be backed up to a local or remote cluster. This post documents the commands and steps I used to back up Solr data to a remote cluster. More details can be found in the Cloudera links under References.

Create the backup folder and give the solr user access in the source cluster

 

# sudo -u hdfs hdfs dfs -mkdir /solr-backups
# sudo -u hdfs hdfs dfs -chown solr:solr /solr-backups 

Create snapshot


$ solrctl collection --create-snapshot <snapshotName> -c <collectionName>
# sudo -u solr solrctl collection --create-snapshot  dic_snapshot_311218 -c dic (Example)
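
The example snapshot name above appears to follow a <collection>_snapshot_<DDMMYY> pattern. A minimal sketch of a helper that builds such a name is shown below; the function name snapshot_name and the naming scheme are assumptions drawn from the example, not something solrctl requires.

```shell
# Hypothetical helper: build a snapshot name of the form
# <collection>_snapshot_<DDMMYY>, matching the example above.
# The second argument is optional and defaults to today's date.
snapshot_name() {
    sn_collection="$1"
    sn_stamp="${2:-$(date +%d%m%y)}"
    printf '%s_snapshot_%s\n' "$sn_collection" "$sn_stamp"
}
```

It could then be used as: sudo -u solr solrctl collection --create-snapshot "$(snapshot_name dic)" -c dic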

Verify created snapshot


$ solrctl collection --list-snapshots <collectionName>
# sudo -u solr solrctl collection --list-snapshots dic (Example) 
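
When this check is scripted, the listing can be piped through a small filter. The sketch below assumes only that the snapshot name appears somewhere in the command output as a whole word; the exact output layout of --list-snapshots is not assumed, and the function name snapshot_listed is hypothetical.

```shell
# Hypothetical helper: succeed if the given snapshot name appears as a
# whole word in the listing read from stdin, fail otherwise.
snapshot_listed() {
    grep -Fqw -- "$1"
}
```

For example: sudo -u solr solrctl collection --list-snapshots dic | snapshot_listed dic_snapshot_311218 && echo "snapshot found"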

Create backup directory in the target cluster

# sudo -u hdfs hdfs dfs -mkdir /solr-backups
# sudo -u hdfs hdfs dfs -chown solr:solr /solr-backups

Prepare snapshot for export to a remote cluster


$ solrctl collection --prepare-snapshot-export <snapshotName> -c <collectionName> -d <destDir>
# sudo -u solr solrctl collection --prepare-snapshot-export dic_snapshot_311218 -c dic -d /solr-backups (Example)

Export the snapshot to remote cluster

The snapshot needs to be exported from the source cluster to the target cluster; the commands are executed on the source.

$ solrctl collection --export-snapshot <snapshotName> -s <sourceDir> -d <protocol>://<namenode>:<port>/<destDir>
# sudo -u hdfs solrctl collection --export-snapshot dic_snapshot_311218 -s /solr-backups -d webhdfs://target_cluster_name_node:50070/solr-backups (Example)
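
The destination argument follows the <protocol>://<namenode>:<port>/<destDir> pattern, and it is easy to end up with a doubled slash when the directory already starts with one. A minimal sketch of a helper that assembles the URI is shown below; the function name build_dest_uri is hypothetical and the values are the ones from the example.

```shell
# Hypothetical helper: assemble <protocol>://<namenode>:<port>/<destDir>.
# Leading slashes on the directory are stripped so that exactly one slash
# separates the port from the path.
build_dest_uri() {
    bd_proto="$1"
    bd_host="$2"
    bd_port="$3"
    bd_path=$(printf '%s' "$4" | sed 's|^/*||')
    printf '%s://%s:%s/%s\n' "$bd_proto" "$bd_host" "$bd_port" "$bd_path"
}
```

For example, build_dest_uri webhdfs target_cluster_name_node 50070 /solr-backups produces a value that can be passed to -d.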

Restore the collection from snapshot in the target


$ solrctl collection --restore <restoreCollectionName> -l <backupLocation> -b <snapshotName> -i <requestId>
# sudo -u solr solrctl collection --restore dic -l /solr-backups -b dic_snapshot_311218 -i dic_snapshot_3112181 (Example)
Here, requestId is simply an identifier that can be used later to monitor the restore.

Monitor status of restore


The command below can be used to monitor the progress of the restore.

$ solrctl collection --request-status <requestId>
# sudo -u solr solrctl collection --request-status dic_snapshot_3112181 (Example)
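
For a long-running restore, the status check can be repeated in a loop. The sketch below is only an illustration: it assumes the status output contains the word "completed" or "failed" (the state names used by Solr's asynchronous request status), which should be verified against the actual solrctl output. The function name wait_for_request and the STATUS_CMD override are hypothetical.

```shell
# Hypothetical helper: wait_for_request <requestId> [interval_seconds] [max_attempts]
# Polls the request status until it reports completed or failed, or gives up.
# STATUS_CMD may be set to replace the status command (useful for dry runs);
# it is expanded unquoted so that it splits into words (sh/bash behaviour).
wait_for_request() {
    wr_id="$1"
    wr_interval="${2:-30}"
    wr_attempts="${3:-60}"
    wr_i=0
    while [ "$wr_i" -lt "$wr_attempts" ]; do
        wr_out=$(${STATUS_CMD:-sudo -u solr solrctl collection --request-status} "$wr_id" 2>&1) || true
        case "$wr_out" in
            *completed*) echo "completed"; return 0 ;;   # assumed state string
            *failed*)    echo "failed"; return 1 ;;      # assumed state string
        esac
        wr_i=$((wr_i + 1))
        sleep "$wr_interval"
    done
    echo "timeout"
    return 2
}
```

For example, wait_for_request dic_snapshot_3112181 60 30 checks once a minute for up to 30 minutes, using the request ID passed to -i in the restore command.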


References

https://www.cloudera.com/documentation/enterprise/5-11-x/topics/search_backup_restore.html
https://www.cloudera.com/documentation/enterprise/5-11-x/topics/cdh_admin_distcp_data_cluster_migrate.html


Monday, 19 February 2018

ORA-00600: Internal Error Code After Switchover to Standby and Switching Back to Primary

Introduction

We observed ORA-00600 internal errors after performing a Data Guard switchover to the standby and switching back to the primary.

Details

After we performed a Data Guard switchover to the standby and switched back to the primary, the application team started reporting problems. On checking the database alert log, we noticed ORA-00600 internal errors:
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []
 

DBVerify on the datafile reported the error below:

[oraprd@xla_idx]$ cat xxfah_xla_idx.3176.966695155.log
DBVERIFY: Release 12.1.0.2.0 - Production on Mon Feb 19 07:38:12 2018
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved
DBVERIFY - Verification starting : FILE = +DATAC1/EBSPRD/DATAFILE/xxfah_xla_idx.3176.966695155
itl[19] has higher commit scn(0x0968.cd0a7607) than block scn (0x0968.a58866e1)
Page 2476267 failed with check code 6056

DBVERIFY - Verification complete

Total Pages Examined         : 4194176
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 4185059
Total Pages Failing   (Index): 1
Total Pages Processed (Other): 4252
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 4865
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 0 (0.0)
[oraprd@xla_idx]$
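
When many datafiles are verified, scanning each log by eye is tedious. The sketch below sums the "Total Pages Failing" counters of a DBVERIFY log so that a non-zero result flags the file for a closer look. It assumes the summary layout shown above; the function name dbv_failing_pages is hypothetical.

```shell
# Hypothetical helper: print the sum of all "Total Pages Failing" counters
# (Data, Index, Seg) found in a DBVERIFY log read from stdin.
dbv_failing_pages() {
    awk -F: '/^Total Pages Failing/ { sum += $2 } END { print sum + 0 }'
}
```

For example: dbv_failing_pages < xxfah_xla_idx.3176.966695155.log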

RMAN cannot detect this condition.
It is not actual data corruption.
It can happen in other databases as well.

Solution

We need to put the workaround below in place and apply the patch.

alter system set "_ktb_debug_flags"=8 scope=both sid='*';
After setting this parameter, the affected index blocks self-heal whenever the index is updated.

To avoid future errors, we need to apply patch 22241601 (ORA-600 [KDSGRP1] IN ADG AFTER FAILOVER).

References

ALERT Bug 22241601 ORA-600 [kdsgrp1] / ORA-1555 / ORA-600 [ktbdchk1: bad dscn] / ORA-600 [2663] due to Invalid Commit SCN in INDEX (Doc ID 1608167.1)
On Standby, DBV Shows Pages Failing With Check Code 6056 (Doc ID 1523623.1)
ORA-1555 Zero Duration/ ORA-600 [ktbdchk1: bad dscn] / ORA-600 [2663] / after Physical Standby switchover (Doc ID 1577824.1)