Thursday, December 5, 2013

How to Capture Additional Information when Your Temporary Tablespace was full with ORA-1652?

In the alert log file of one of the production Oracle databases I manage, we often see ORA-1652 error. The database is primarily an OLTP database. But it also serves application that allows database users to run ad-hoc Brio queries. When a query that access large tables without join condition or has incorrect join predicates, the query can easily take large chunk of free space in temporary tablespace.

There are two possible outcomes when it happens: the query itself could fail with ora-1652 error; or at the worst, many other transactions in the database could fail with ORA-1652 error.

Using event attribute functions of AFTER SERVERERROR database event, we can capture information about the database sessions that received the ORA-1652 error and save them in a table for later review. But what if the session that got error was not the one using large amount of space in temp tablespace? Luckily since Oracle version 11gR2, a new column: TEMP_SPACE_ALLOCATED in v$active_session_history has the information we are looking for.

%%%%%%%%%
% Setup:
%%%%%%%%%

sqlplus '/as sysdba'

drop sequence system.temp_mon_sequence;

create sequence system.temp_mon_sequence
  START WITH 100
  INCREMENT BY 1
  NOCYCLE
  NOCACHE;

create or replace table system.temp_monitor
(id number
constraint PK_TEMP_MONITOR primary key,
error_date varchar2(50),
login_user varchar2(100),
msg varchar2(4000))
tablespace users
/

create or replace trigger system.failed_to_extend_temp
  after servererror on database
   declare
      sql_text ora_name_list_t;
      error_msg varchar2(4000);
    begin
        if ( ora_is_servererror(1652) )
        then
 for i in 1 .. ora_sql_txt(sql_text) loop
     error_msg := error_msg || sql_text(i);
  end loop;
           insert into SYSTEM.temp_monitor(id,error_date,login_user,msg) values
          (SYSTEM.TEMP_MON_SEQUENCE.nextval,to_char(sysdate, 'dd-MON-yyyy,hh24:mi:ss'),ora_login_user,error_msg);
       end if;
end;
/

grant select on system.temp_monitor to public;

create public synonym temp_monitor for system.temp_monitor;

%%%%%%%%%%%%%%%%%%%%%%%%%%
% Run a test case:
%%%%%%%%%%%%%%%%%%%%%%%%%%

create temporary tablespace test_temp
tempfile '/tmp/test_temp.dbf'
size 512k reuse
extent management local
uniform size 64k
/

create user dan identified by dan
default tablespace users
temporary tablespace test_temp
/

grant connect,resource to dan;

grant select any table to dan;

Now find the largest table in the database and do a select * with order by:

connect dan/dan@orcl
SQL> select * from my_big_table
order by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27;
select * from my_big_table
                      *
ERROR at line 1:
ORA-01652: unable to extend temp segment by 8 in tablespace TEST_TEMP
sqlplus '/as sysdba'
select * from temp_monitor;
        ID
----------
ERROR_DATE
--------------------------------------------------------------------------------
LOGIN_USER
--------------------------------------------------------------------------------
MSG
-------------------------------------------------------------------------------------------------------------------------------------------------------------
       100
07-AUG-2013,21:28:15
DAN

select * from MY_BIG_TABLE
order by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27

Following query can be used to show sessions that have used large amount of space in temporary tablespace in last one hour. You can adjust minimum value of temp_space_allocated and sample_time to suit your needs:


SQL> select distinct session_id,sample_time,temp_space_allocated from v$active_session_history
where temp_space_allocated=(select max(temp_space_allocated) from v$active_session_history
where to_char(sample_time,'hh24') >= to_char(sysdate, 'hh24')-1)
and temp_space_allocated >1024
and to_char(sample_time,'hh24') >= to_char(sysdate, 'hh24')-1
/
SESSION_ID
----------
SAMPLE_TIME
---------------------------------------------------------------------------
TEMP_SPACE_ALLOCATED
--------------------
        18
06-AUG-13 10.00.12.331 PM
             2097152
        18
06-AUG-13 10.00.11.331 PM
             2097152

%%%%%%%%%%%%%%%%
% Automation:
%%%%%%%%%%%%%%%%

By updating the shell script presented here:

http://danthedba.blogspot.com/2009_06_01_archive.html

we can automate the process. Here is updated version of that shell script:

#!/bin/sh
#
# Author: Dan Zheng, danthedba.blogspot.com
# June 2009
# This script checks database alert/log.xml file and email ORA error.
# It can be used on a Oracle database version 11g and higher.
#
# Before running this script for the first time, please create directory:
# $ORACLE_BASE/admin/cronjob_scripts/dbmonitor for files used by this script.
#
# August 2013: updated to collect additional information related to ORA-1652.
# Please make sure following database objects are created and proper privileges granted:
# Sequence: system.temp_mon_sequence
# Table:    system.temp_monitor
# Trigger:  system.failed_to_extend_temp
# Set up environment.
# Make necessary change on following line based on your Unix flavor:
#set -x
if [ $# != 1 ]
then
echo run this script as following:
echo $0 ORACLE_SID
exit 1
fi
. ~oracle/.bash_profile
export ORACLE_SID=$1
export TIMESTAMP=`date +%M%H%d%m%y`;
export mon_dir=/u01/app/oracle/dan
export fn=$mon_dir/email_body_${ORACLE_SID}_2.tmp
export alertck=$mon_dir/alertck_${ORACLE_SID}_2.log
touch $alertck
touch $fn
touch $mon_dir/donot_delete_${ORACLE_SID}_2.log
EMAIL='dan_zheng@danthedba.blogspot.com'

ORACLE_BASE
oracle_sid=$( echo "${ORACLE_SID}" |tr '[:upper:]' '[:lower:]')
adrci <spool $mon_dir/alert_${TIMESTAMP}_${ORACLE_SID}.log
set termout off
set homepath diag/rdbms/$oracle_sid/${ORACLE_SID}
SHOW ALERT -P "MESSAGE_TEXT LIKE '%ORA-%'" -term
spool off
exit
EOF

export c_log=`wc -l $mon_dir/alert_${TIMESTAMP}_${ORACLE_SID}.log | awk '{ print $1 }'`
export c_tmp=`wc -l $mon_dir/donot_delete_${ORACLE_SID}_2.log | awk '{ print $1 }'`
echo $c_log
echo $c_tmp
if (($(($c_log)) > $(($c_tmp)))); then
comm -23 $mon_dir/alert_${TIMESTAMP}_${ORACLE_SID}.log $mon_dir/donot_delete_${ORACLE_SID}_2.log | grep ORA- |sort -u > $alertck
exec 3< $alertck
# read until the end of the file
 until [ $done ]
 do
   read <&3 ERR_LINE
   if [ $? != 0 ]
   then done=1
    continue
   fi
  echo $ERR_LINE >> $fn
  export ERR_NO=`echo $ERR_LINE |awk '{ print $1 }'|awk -F- '{ print $2 }'| awk -F: '{ print $1 }'`
 echo " Oracle error is : ORA-$ERR_NO"
  oerr ora $ERR_NO >> $fn
  echo "  " >> $fn
 if [ $ERR_NO -eq 1652 ]
 then
sqlplus -s "/as sysdba" <>$fn
column ora_login_user format a100 head 'Person who runs the offending query that caused ORA-1652 error:'
column msg format a150 head 'Offending query:'
column program format a30
set linesize 200


select ora_login_user from system.temp_monitor
where id=(select max(id) from system.temp_monitor);
select msg from system.temp_monitor
where id=(select max(id) from system.temp_monitor);
set heading off
select 'Users who are using more than 1GB of temporary tablespace in the last one hour:' from dual;
set heading on
select a.sid,a.serial#,a.username,a.program,b.temp_space_allocated/1024/1024 "temp space used (MB)",b.sample_time
from v\$active_session_history b,v\$session a
where b.temp_space_allocated=(select max(temp_space_allocated) from v\$active_session_history
where to_char(sample_time,'hh24') >= to_char(sysdate, 'hh24')-1)
and b.temp_space_allocated/1024/1024 >1000
and to_char(b.sample_time,'hh24') >= to_char(sysdate, 'hh24')-1
and b.session_id=a.sid
and b.session_serial#=a.serial#;
exit
EOF
 fi

  echo "-----------------------------------------------------" >> $fn
  echo "  " >> $fn
 done
echo "ERRORS: sending mail!"
SUBJECT="${ORACLE_SID} - ORA Error Message found in alert log file log.xml at `date`"
echo $SUBJECT
# From a HPUX server, use following line to send email:
#cat $fn | /usr/bin/mailx -s "$SUBJECT" $EMAIL
# from a Linux server, use following line to send email:
cat $fn | mail -s "$SUBJECT" $EMAIL
mv $mon_dir/alert_${TIMESTAMP}_${ORACLE_SID}.log $mon_dir/donot_delete_${ORACLE_SID}_2.log
else
echo "No Oracle error is found in alert log log.xml since last time this script was run."
fi

rm $fn
echo "The script was executed successfully."
%%%%%%%%%%%%%%%
% Clean up:
%%%%%%%%%%%%%%%
sqlplus '/as sysdba'
drop user dan cascade;
drop tablespace test_temp including contents and datafiles;
--- If you want to completely remove the objects related to this setup:
drop public synonym temp_monitor;
drop trigger system.failed_to_extend_temp;
drop table system.temp_monitor cascade constraints;
drop sequence system.temp_mon_sequence;

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
References:
1. http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_triggers.htm#CHDCFDJG
2. http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:374218170986

No comments: