domingo, 3 de noviembre de 2013

EM 12c : El Error ORA-00020 en la BD del Repositorio OMS

El viernes pasado me llego una alerta que en la Base de Datos repositorio de uno de los EM 12c que le doy soporte , esta alerta es ORA-00020, básicamente lo que quiere decir es que el numero máximo de procesos que puedes tener en tu base de datos llego a su limite. Y de haber sabido la solución, me ahorro un dolor de cabeza :)

Lo primero que trate de hacer es conectarme a la BD , pero esto no me fue posible por el mismo error

oracle $ sqlplus

SQL*Plus: Release 11.2.0.3.0 Production on Sat Nov 2 04:55:20 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Enter user-name: /as sysdba
ERROR:
ORA-00020: maximum number of processes (500) exceeded

Trate de conectarme con la opción -prelim, pero no me permitio ver que estaba corriendo

oracle $ sqlplus -prelim / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Nov 2 04:57:38 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL> select count(*) from v$process;
select count(*) from v$process
*
ERROR at line 1:
ORA-01012: not logged on
Process ID: 0
Session ID: 0 Serial number: 0

Entonces, sabiendo que era la BD del repositorio, mate un proceso de cliente y entre a la Base de Datos para ver que estaba corriendo

oracle $ kill -9 31438

oracle $ sqlplus

SQL*Plus: Release 11.2.0.3.0 Production on Sat Nov 2 04:58:28 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Enter user-name: /as sysdba

SQL> select count(1),USERNAME from gv$session group by USERNAME;

  COUNT(1) USERNAME
---------- ------------------------------
       486
  3 SYSMAN_MDS
  2 DBSNMP
  1 SYSMAN_APM
  2 SYSMAN_OPSS
  1 SYS

set lines 250
col inst_id for 99
col program for a60
col username for a15
col event for a35
col module for a30
col action for a30
col state for a10
select inst_id,sid,sql_id,module,action,username,status,state,event,p1,p2 from gv$session
where status='ACTIVE' and
event not in ('smon timer','pmon timer','rdbms ipc message','Null event',
                   'parallel query dequeue','pipe get','client message',
                   'SQL*Net message to client','SQL*Net message from client',
                   'SQL*Net more data from client','dispatcher timer',
                   'virtual circuit status','lock manager wait for remote message',
                   'PX Idle Wait','PX Deq: Execute Reply','PX Deq: Execution Msg',
                   'PX Deq: Table Q Normal','PX Deq Credit: send blkd','PX Deq: Execute Reply',
                   'PX Deq Credit: need buffer','PX Deq: Signal ACK','PX Deque wait',
                   'wakeup time manager','slave wait','i/o slave wait','jobq slave wait',
                   'null event','gcs remote message','gcs for action','ges remote message',
                   'queue messages','Streams AQ: waiting for messages in the queue',
                   'DIAG idle wait','VKTM Logical Idle Wait','ASM background timer','fbar timer',
                   'Streams AQ: waiting for time management or cleanup tasks',
                   'Streams AQ: qmn coordinator idle wait','Streams AQ: qmn slave idle wait',
                   'class slave wait','Space Manager: slave idle wait','JOX Jit Process Sleep',
                   'EMON slave idle wait','GCR sleep','LNS ASYNC end of log','PING',
                   'Streams AQ: emn coordinator idle wait','wait for unread message on broadcast channel')  

SQL> /

INST_ID        SID SQL_ID  MODULE    ACTION          USERNAME        STATUS STATE    EVENT           P1   P2
------- ---------- ------------- ------------------------------ ------------------------------ --------------- -------- ---------- ----------------------------------- ---------- ----------
      1  21   OMS                ACTIVE WAITING    library cache lock         5644182264 1937119896
      1  22   OMS                ACTIVE WAITING    library cache lock         5644182264 4874496520
      1  24   OMS                ACTIVE WAITING    library cache lock         5644182264 5032733856
      1  25   OMS                ACTIVE WAITING    library cache lock         5644182264 5549524880
      1  26   OMS                ACTIVE WAITING    library cache lock         5644182264 5543937040
      1  27   OMS                ACTIVE WAITING    library cache lock         5644182264 5362001912

Y lo que me di cuenta es que las 496 sesiones estaban esperando con library cache lock, así que la logica me llevo a que habia un objeto invalido, pero no fue el caso, entonces lo siguiente que hice fue tratar de averiguar que objeto era el causante de esto

SQL> SELECT sid, event, p1raw, seconds_in_wait, wait_time
FROM sys.v_$session_wait
WHERE event like 'library cache%'
AND state = 'WAITING'  2    3    4  ;

       SID EVENT
---------- ----------------------------------------------------------------
P1RAW   SECONDS_IN_WAIT  WAIT_TIME
---------------- --------------- ----------
 21 library cache lock
00000001506B62F8       48   0

 22 library cache lock
00000001506B62F8     1274   0

 24 library cache lock
00000001506B62F8      348   0

...

SQL>  tti "Object that is Blocking"
 COL owner format a8
 COL object format a70
 SELECT kglnaown AS owner, kglnaobj as Object
    FROM sys.x$kglob
   WHERE kglhdadr like '&P1RAW'
  /SQL> SQL> SQL>   2    3    4  
Enter value for p1raw: 00000001506B62F8
old   3:    WHERE kglhdadr like '&P1RAW'
new   3:    WHERE kglhdadr like '00000001506B62F8'

Sat Nov 02              page    1
       Object that is Blocking

OWNER  OBJECT
-------- ----------------------------------------------------------------------
  84

Como puedes ver el dueño y el objeto estaban raros, y confirmando este objeto obvio no existia, entonces lo que hice, fue tratar de subir y bajar el stack de OMS, y aquí fue donde salio todo el problema

Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0  
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
Stopping WebTier...
WebTier Successfully Stopped
Stopping Oracle Management Server...
Oracle Management Server Successfully Stopped
AdminServer Successfully Stopped
Oracle Management Server is Down
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0  
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.
Stopping agent ..... stopped.

Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0  
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
Starting WebTier...
WebTier Successfully Started
Starting Oracle Management Server...
Oracle Management Server Could Not Be Started
Oracle Management Server is not functioning because of the following reason:
Unexpected error occurred. Check error and log files.
Please check
EM_Instance_HOME/em/EMGC_OMS1/sysman/log/emctl.log for error details

Así que lo que hice es fue ir a los logs, y lo que vi fue que el password de sysman habia sido cambiado

<Nov 2, 2013 6:29:09 AM EST> <Notice> <Security> <BEA-090898> <Ignoring the trusted CA certificate "CN=KEYNECTIS ROOT CA,OU=ROOT,O=KEYNECTIS,C=FR". The loading of the trusted certificate list raised a certificate parsing exception PKIX: Unsupported OID in the AlgorithmIdentifier object: 1.2.840.113549.1.1.11.>

<Nov 2, 2013 6:29:09 AM EST> <Notice> <Security> <BEA-090898> <Ignoring the trusted CA certificate "CN=GeoTrust Primary Certification Authority - G3,OU=(c) 2008 GeoTrust Inc. - For authorized use only,O=GeoTrust Inc.,C=US". The loading of the trusted certificate list raised a certificate parsing exception PKIX: Unsupported OID in the AlgorithmIdentifier object: 1.2.840.113549.1.1.11.>

Invalid Connection Pool. ERROR = ORA-01017: invalid username/password; logon denied

Una vez que cambie el password en el repositorio correctamente, pude arrancar OMS sin ningun problema


emctl config oms -change_repos_pwd -change_in_db -use_sys_pwd -sys_pwd sys_pass -new_pwd sysman_pass

Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1.0  
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
Starting WebTier...
WebTier Successfully Started
Starting Oracle Management Server...
Oracle Management Server Successfully Started
Oracle Management Server is Up
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0  
Copyright (c) 1996, 2012 Oracle Corporation.  All rights reserved.

Lo que hay que entender de esto es que cambiar el password de SYSMAN en en el repositorio es un proceso, no es nada mas en la base de datos, y espero que esto te ahorre los dolores de cabeza que me dio este pequeño problema.