Archive for April, 2016

Deferrable primary/unique keys constraints still might result in duplicate values due to updates (even in recent 12c)

Tuesday, April 19th, 2016

I’m still surprised that Oracle PSU (January 2016) is apparently still bugged by old zombie bug of the past .

BTW: On most recent PostgreSQL 9.4 series (9.4.6) it works OK (fails with error).

Oracle Secure Backup/volume corruption due to tape auto-rewinding on /dev/st* access

Thursday, April 14th, 2016

We’ve got interesting problem with RMAN+OSB 10.4 that resulted in observation that we were literally loosing backups. Co-worker discovered that our in-house solution for automatic Oracle database restores was sometimes failing due to missing backup pieces, later he nailed down the problem using RMAN’s crosschecks and discovered that backup pieces were in EXPIRED state without any pattern. Automatic restores most of the times worked due to RMAN failovers to previous backup, but with archivelog the situation was worse. We then tried to discover what can potentially cause it (across two DCs!) and somehow it was discovered that when we disabled Puppet and monitoring the problem magically disappeared. Generally speaking I do not like magic on our systems, so therefore I’ve used some spare time to explain the magic.

After analyzing some events and theories the problem was nailed down to corruption of running backups due to monitoring scripts (customized Zabbix agent with add-on scripts all of it managed by Puppet). One of the scripts contained lshw executions which under debugging session revealed opening /dev/st[0-9]+ tape devices via simple open(2) syscall. The man 4 st (Linux’s kernel SCSI tape device) describes the st device with the following characteristics:

Each device uses eight minor device numbers. The lowermost five bits in the minor numbers are assigned sequentially in the order of detection. In the 2.6 kernel, the bits above the eight lowermost bits are concatenated to the five lowermost bits to form the tape number. The minor numbers can be grouped into two sets of four numbers: the principal (auto-rewind) minor device numbers, n, and the “no-rewind” device numbers, (n + 128). Devices opened using the principal device number will be sent a REWIND command when they are closed. Devices opened using the “no-rewind” device number will not (..)

In order to test the theory I’ve performed set of simple experiments that was like this:

  1. Perform 1st single RMAN datafile backup, verify it’s integrity
  2. Perform 2nd single RMAN datafile backup, just to verify it hits the same OSB backup label (volume, barcode
  3. While no backups were running, execute lshw several times
  4. Perform 3rd single RMAN datafile backup, verify it’s integrity
  5. Start 4th single RMAN datafile backup, and it parallel start executing lshw
  6. Wait till 4th datafile backup finishes
  7. Perform 5th single RMAN datafile backup, verify it’s integrity

The test proved that backups #1-3 were OK, however execution of backup #5 raised my eyebrows.

oracle/12215.1: 2016/04/13.14:10:24 Info: original mount data for tape2 don’t match those read
oracle/12215.1: 2016/04/13.14:10:24 from the tape.
oracle/12215.1: 2016/04/13.14:10:24 (amh) qdv__automount_in_mh entered
oracle/12215.1: 2016/04/13.14:10:24 (amh) qdv__automount_in_mh tape2 at 2016/04/13.14:10:24, flags 0×100
oracle/12215.1: 2016/04/13.14:10:24 (amh) mount volume options list contains:
oracle/12215.1: 2016/04/13.14:10:24 (amh) vtype 3 (app), vid (null), vs_create 0, family XYZ-TE4-BACKUP, retain (null), size 0, mediainfo 2, scratch 0
oracle/12215.1: 2016/04/13.14:10:28 (amh) don’t preserve previous mh automount state
oracle/12215.1: 2016/04/13.14:10:31 (gep) getting reservation for element 0×4 (dte)
oracle/12215.1: 2016/04/13.14:10:31 (amh) pass 0 – trying to automount loaded volume
oracle/12215.1: 2016/04/13.14:10:31 (atv) qdv__automount_this_vol entered
oracle/12215.1: 2016/04/13.14:10:32 (atv) calling qdv__mount
oracle/12215.1: 2016/04/13.14:10:32 (mt) qdv__read_mount_db() succeeded, found vol_oid 0
oracle/12215.1: 2016/04/13.14:10:32 (mt) qdv__read_label() of 1048576 bytes failed – not at a tar file header (OB device mgr)
oracle/12215.1: 2016/04/13.14:10:32 (mt) qdb__label_event() returned vol_oid 839
oracle/12215.1: 2016/04/13.14:10:32 (mt) setting vol_oid in mount_info to 839
oracle/12215.1: 2016/04/13.14:10:32 (atv) qdv__mount succeeded
oracle/12215.1: 2016/04/13.14:10:32 (atv) qdv__automount_this_vol exited
oracle/12215.1: 2016/04/13.14:10:32 (amh) automount of loaded volume succeeded
oracle/12215.1: 2016/04/13.14:10:32 (amh) end of automount at 2016/04/13.14:10:32 (0×0)
oracle/12215.1: 2016/04/13.14:10:32 (amh) returning from qdv__automount_in_mh
oracle/12215.1: 2016/04/13.14:10:32 Info: volume in tape2 is usable for this operation.
oracle/12215.1: 2016/04/13.14:10:32 It must be overwritten at BOT.

oracle/12215.1: 2016/04/13.14:10:32 (pvfw) at BOT
oracle/12215.1: 2016/04/13.14:10:40 (pvfw) mounted ok
oracle/12215.1: 2016/04/13.14:10:40 (pvfw) pos_vol_cleanup not returning pstate
oracle/12215.1: Volume label:
oracle/12215.1: Volume tag: XYZ034L3E
oracle/12215.1: Volume UUID: ec287b98-e38d-1033-a482-e839352cd95c
oracle/12215.1: Volume ID: XYZ-TE4-BACKUP-000116

What’ was interesting was that there was some kind of desync between what OSB was expecting on the tape (obtar format) and what it got, then it proceeded directly into relabeling current tape in drive (XYZ-TE4-BACKUP-000115) to XYZ-TE4-BACKUP-000116 loosing all previous backup pieces on that volume with the same barcode. Any attempt to restore anything from XYZ-TE4-BACKUP-000115 failed from now on.

So it is clear that lshw opening /dev/st[0-9]+ devices even in read-only caused rewinding tape during backing up to that tape(!). The actual error messages may vary e.g. when tape is relabeled from different OSB media-family.

Testing revealed even more interesting testing results on different system (OL 6.7 with mhVTL to simulate SCSI tapes):

[root@db3 ~]# lshw -version
[root@db3 ~]# strace -ffe open /usr/sbin/lshw 2>&1 | grep /dev/st
[root@db3 ~]#

but afterwards after uninstaling lshw B02.17 (installed earlier from standard yum) and installing our older lshw B02.14 RPM (for OL5.11)

[root@db3 ~]# lshw -version
the latest version is B.02.17
[root@db3 ~]# strace -ffe open /usr/sbin/lshw 2>&1 | grep /dev/st
open("/dev/st0", O_RDONLY|O_NONBLOCK)   = 3
open("/dev/st0a", O_RDONLY|O_NONBLOCK)  = 3

So apparently there was bug in auto-rewind /dev/st[0-9]+ bug in lshw that was fixed. Actually lshw change-log didn’t clearly mention that (there was similar problem in years ago, however comparing source code of between those lshw version reveals:

diff -r -u B.02.14/ B.02.17/
--- B.02.14/     2008-05-07 23:23:31.000000000 +0200
+++ B.02.17/     2013-03-13 21:52:25.000000000 +0100
@@ -29,7 +29,7 @@

 __ID("@(#) $Id$");

-#define SG_X "/dev/sg%d"
+#define SG_X "/dev/sg*"
 #define SG_MAJOR 21

@@ -195,10 +195,10 @@
-  "/dev/st*", <-- our bugfix!!!
-  "/dev/tape*",
+  "/dev/ntape*",
+  "/dev/nrtape*",

.. and the actually the code in older lshw with open() looks like this:

static void scan_devices()
  int fd = -1;
  int i = 0;
  size_t j = 0;
  My_scsi_idlun m_idlun;

  for (i = 0; devices[i] != NULL; i++)
    glob_t entries;

    if(glob(devices[i], 0, NULL, &entries) == 0)
      for(j=0; j < entries.gl_pathc; j++)
        fd = open(entries.gl_pathv[j], O_RDONLY | O_NONBLOCK);

So the solution is fairly simple: upgrade your lshw RPM packge. Interesting question is which scsi/tape-related tools are safe and which are not. On OL 6.x the safe list contains at least those: lsscsi, obtool lsdev/dumpdev/lsvol/etc are safe, hpacucli (HP raid monitoring tool), sg_scan (sg3_utils)

But those might be not: obtool lsdev -ldg, sg_scan, mt -f /dev/st0 status, mtx, sg_inq.