Bug/ORA-00060 deadlock during concurrent Drop Table (purge) and INSERT

I’ve just found a rather new and not known bug during analysis for one of hangs in production system. This research ended up in full reproducible test case against the following Oracle versions on x86_64:

  • 11.2.0.3.8
  • 11.2.0.4.0
  • 12.1.0.1.0

First, let’s prepare the experimental schema to trigger this bug, we create TRANS table and the CHILD one, the target objective is to drop CHILD table from the system to release space:

drop table trans CASCADE CONSTRAINTS purge;
drop table child CASCADE CONSTRAINTS purge;
drop sequence trans_seq;
drop sequence child_seq;

CREATE TABLE TRANS (
  ID NUMBER(10,0) NOT NULL PRIMARY KEY,
  CDATE DATE NOT NULL,
  TXT VARCHAR2(64)
) PARTITION BY RANGE (cdate) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
(
   PARTITION p1 values LESS THAN (TO_DATE('01-NOV-2010','DD-MON-YYYY'))
);

CREATE TABLE CHILD (
  CHILD_ID NUMBER(10,0) NOT NULL PRIMARY KEY,
  TRANS_ID NUMBER(10,0) NOT NULL ENABLE,
  TXT VARCHAR2(64),
  CONSTRAINT "TRANS_ID_FK" FOREIGN KEY (TRANS_ID) REFERENCES TRANS (ID) ENABLE
);

create sequence trans_seq start with 1 increment by 1 cache 1000;
create sequence child_seq start with 269282 increment by 1 cache 1000;

create or replace procedure generate_load as
  i number;
begin
  for i in 1 .. 50000 loop
    insert into trans (id, cdate) values (trans_seq.nextval, to_date(trunc(sysdate-i)));
    commit write wait;
  end loop;
end;
/

exec generate_load;

-- load child with some vaules from parent (trans)
insert into child (child_id, trans_id) select child_seq.nextval, id from trans;
commit;

The trick is that CHILD has referential constraint to the TRANS table. Additional trick is that there are many (3-4) concurrent sessions doing concurrent simple INSERTs to that table followed immediately by COMMIT WRITE WAIT in PL/SQL loop. OK so let’s start our generator:

[oracle@t10050 fk_drop_table_locks]$ ./gen.sh
Mon Mar 24 14:44:50 CET 2014
Launching 3 workers with load.sql
...waiting to finish

When we attempt now to drop our CHILD table using this (what has been really performed by production DBA some time ago in our case):

alter session set ddl_lock_timeout=15;
drop table child purge;

In the experiment we have end up in INSERT sessions throwing famous ORA-00060 errors. You might wonder

BEGIN generate_load; END;
BEGIN generate_load; END;

*
*
ERROR at line 1:
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "DBA1.GENERATE_LOAD", line 5
ORA-06512: at "DBA1.GENERATE_LOAD", line 5
ORA-06512: at line 1
ORA-06512: at line 1

Elapsed: 00:00:15.06
Elapsed: 00:00:15.06
BEGIN generate_load; END;

*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "DBA1.GENERATE_LOAD", line 5
ORA-06512: at line 1

Elapsed: 00:00:15.10

Of course alert log will be full of deadlock errors and the sample single session deadlock graph looks similiar:

*** 2014-03-24 14:25:23.056
DEADLOCK DETECTED ( ORA-00060 )

[Transaction Deadlock]

The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-0000a079-00000000        20     138    SX             24       6           S
TM-0000a07c-00000000        24       6     X             20     138          SX

session 138: DID 0001-0014-00000BF2     session 6: DID 0001-0018-00000055
session 6: DID 0001-0018-00000055       session 138: DID 0001-0014-00000BF2 

Rows waited on:
  Session 138: obj - rowid = 0000A07C - AAAAAAAAAAAAAAAAAA
  (dictionary objn - 41084, file - 0, block - 0, slot - 0)
  Session 6: obj - rowid = 0000A079 - AAAAAAAAAAAAAAAAAA
  (dictionary objn - 41081, file - 0, block - 0, slot - 0)

From this we can say that two sessions {138,6} endup in rivilaization for single two TM enqueues but in incompatibile way for two tables. Session 138 was holding SX lock on TM enqueue for table 0000A079 (TRANS) – i.e. was protecting the table in shared exclusvie mode, because wanted to insert – and in parallel to that, session 6 was holding X lock on TM enqueue for table 0000a07c (CHILD) – i.e. was holding exclusive access to whole table , so it was that tried to DROP it. In the next processing cycles:

  • session 6 (the one with DROP TABLE PURGE) tried to accquire shared lock S on table TRANS due tue to FK constraint, and started waiting for it (S and SX are not compatibile according to MetaLink note Id Detecting and Resolving Locking Conflicts and Ora-00060 errors (Doc ID 15476.1))
  • session 138 (the one doing INSERT at the time) tried to accquire SX on the enqueue TM for table CHILD, but couldnt get it as CHILD has been eXclusivley locked by session 6
  • in the several seconds automatic DeadLock detection kicks in and kills sessions performing INSERTs causing downtime/error for application

Upon further investigation together with my DBA coworker Andrzej Nowicki and short trials and errors today, we’ve came to the conclusion that the main culprint of this error is the PURGE clause of the DROP TABLE statement. This has several interesting repercusions:

  • “DROP TABLE X PURGE” vs “DROP TABLE X” are technically two different things and the former actually does NOT results in bug.
  • Oracle parameter responsible for recyclebin affects whether “DROP TABLE X” results really in “DROP TABLE X PURGE”
  • This case is very similiar to the Bug 11693365: Concurrent Drop table and Select on Reference constraint table hangs (deadlock), however that one is stated to be fixed 12.1.0.1 (Base Release) and 11.2.0.3 (Server Patch Set), however the mentioned here bug is NOT.
  • The workarounds for bug 11693365 seem to also help here (“Disable and drop all the referential integrity constraints before dropping the child table.
    eg: alter table disable constraint ; / alter table drop constraint ; /drop table cascade constraints purge;”)

One Response to “Bug/ORA-00060 deadlock during concurrent Drop Table (purge) and INSERT”

  1. admin says:

    Oracle registered Bug 18662591: ORA-00060 DEADLOCK DURING CONCURRENT DROP TABLE (PURGE) AND INSERT and closed it as not a bug, referencing Bug 173624: CANNOT DROP A CHILD TABLE WITH CONSTRAINT DISABLED AND PARENT WITH DML LOCK as explanation which states this:

    *** 09/09/93 11:22 am ***
    The locking scheme used to update the row cache entries does not allow you
    to drop a child table (even if the constraint is disabled) until it can
    update the parent table row cache entries associated with this constraint.
    It is important to note that disabling a constraint only stops enforcement
    of the constraint, but is does NOT change semantics with respect to keeping
    dependencies between constraint definitions. (e.g., you cannot drop a
    primary key that is referenced by foreign keys).