SELECT Query Kills Logins on Oracle

Mikah B.

Member
Yesterday I was working on a query to summarize some data. It pulls data from 2 fairly static tables, each containing around 70k rows. As soon as I Executed the query - almost instantly - I got an "End of file on communication channel..." error. That is followed by a "Not connected to Oracle" error. So, then I try to log back in - and can't. Turns out that nobody else can either. The DB runs fine for those who are in, but nobody can log in. Oracle shows no errors in any logs, and the only way we can get it back is to stop and restart Oracle. I'll post the SQL, can't get the explain plan because that wipes the DB too. Has anybody ever seen anything like this? BTW - Oracle version 9.2.0.4.0 and the 9.2 client also. Current version of PL/SQL Dev.

select a.session_level,
a.session_id,
a.parent_session_id,
scans.action,
scans.num_scans
from inventory_sessions a left outer join (
select session_id,
decode(sc.perform_add,null,'0','1') ||
decode(sc.perform_activate,null,'0','1') ||
decode(sc.perform_transfer,null,'0','1') ||
decode(sc.perform_relocate,null,'0','1') ||
decode(sc.perform_missing,null,'0','1') ||
decode(sc.perform_none,null,'0','1') as ACTION,
count(*) as num_scans
from inventory_scans sc
where record_status not in ('X','O')
group by session_id,
decode(sc.perform_add,null,'0','1') ||
decode(sc.perform_activate,null,'0','1') ||
decode(sc.perform_transfer,null,'0','1') ||
decode(sc.perform_relocate,null,'0','1') ||
decode(sc.perform_missing,null,'0','1') ||
decode(sc.perform_none,null,'0','1')) scans on (scans.session_id = a.session_id)
connect by prior a.session_id = a.parent_session_id
start with a.session_id = 70
order by session_level,
session_id;
 
My suggestions
- Patch at least to 9.2.0.6
- don't use "left outer join" syntax. It's fairly new and slow, probably error prone. Use col1=col2(+) syntax instead
 
I thought Oracle was moving away from supporting the (+) syntax? I've been retraining my brain to use the ANSI-style join syntax.
 
BWendling:
- don't use "left outer join" syntax. It's fairly new and slow, probably error prone. Use col1=col2(+) syntax instead
hmm, I'm running following two statements, and I get same number of rows, same response time (on average), same execution plan.

Code:
SQL> select  d.dname, e.ename, e.sal
  2    from  dept d, emp e
  3   where  d.deptno = e.deptno(+)
  4     and  e.sal > 2000
  5  ;

DNAME          ENAME             SAL
-------------- ---------- ----------
ACCOUNTING     KING             5000
ACCOUNTING     CLARK            2450
RESEARCH       FORD             3000
RESEARCH       SCOTT            3000
RESEARCH       JONES            2975
SALES          BLAKE            2850

6 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=10 Bytes=260)
   1    0   HASH JOIN (Cost=5 Card=10 Bytes=260)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=2 Card=10 Bytes=130)
   3    1     TABLE ACCESS (FULL) OF 'DEPT' (Cost=2 Card=4 Bytes=52)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        354  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

Code:
SQL> select  d.dname, e.ename, e.sal
  2    from  dept d left outer join emp e on d.deptno = e.deptno
  3   where  e.sal > 2000
  4  ;

DNAME          ENAME             SAL
-------------- ---------- ----------
ACCOUNTING     KING             5000
ACCOUNTING     CLARK            2450
RESEARCH       FORD             3000
RESEARCH       SCOTT            3000
RESEARCH       JONES            2975
SALES          BLAKE            2850

6 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=10 Bytes=260)
   1    0   HASH JOIN (Cost=5 Card=10 Bytes=260)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=2 Card=10 Bytes=130)
   3    1     TABLE ACCESS (FULL) OF 'DEPT' (Cost=2 Card=4 Bytes=52)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        367  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed
 
It was just my personal experience with 9i, that the "new" (for Oracle) ANSI syntax of FULL OUTER JOIN was much slower than the equivalent old syntax.

A lot of new Oracle features have bugs when they first come out, e.g. tablespace resize, table move.
 
I appreciate the feedback BWending - didn't mean to seem like I didn't. I played with the join syntax, didn't seem to make a difference. The DBA had an "Oracle Expert" come out and look at it, and here were his findings:

Queries being executed with connect by prior are having a problem with the optimizer. Enormous amounts of group by's are occurring causing the Oracle data blocks to get flushed out and reloaded with inv_phys_snapshot result set. This is causing the Oracle processes (ie, replication, recovery manager, process manager, etc) to spin almost out of control. They are doing massive hard drive reads at this point.

If the process executing this huge query (Over 800mg result set being grouped - sorted) is killed or disconnected, the process runs until all PGA (program global area) is used (area used for each user login).

By changing the following around, we at least were able to get the query to execute and return a resultset

connect by prior a.session_id = a.parent_session_id
start with a.session_id = 70

changed to

start with a.session_id = 70
connect by prior a.session_id = a.parent_session_id

... To further optimize this situation, we got rid of one index on inv_phys_snapshot and created a new one to match the query

Now, me personally? I think the index issue was 100% of the problem - or very close. I have always used CONNECT BY before START WITH, and it's always worked great - comes back quickly and correctly. Anyway, thanks for those who replied - I appreciate the feedback. Looks like I'm done wiping out the database for this week... until the next Oracle bug!! :D
 
Back
Top