This screen helps you find current stalls that represent
potential deadlocks, which become database hot spots. The screen
also helps you determine what stalls were last encountered by any
user.
Active user stall messages are not saved in the binary statistics
file created by the Output qualifier. Therefore, this screen is
not available when you execute the RMU Show Statistics command in
replay mode (with the Input qualifier).
The Stall Messages screen and the Active User Stall Messages
screen have the same format. However, the Active User Stall
Messages screen provides information on currently stalled
processes and on processes that were stalled but are no longer
stalled. In contrast, the Stall Messages screen provides
information only on currently stalled processes. Like the Stall
Messages screen, the Active User Stall Messages screen shows
when a process writes a bugcheck dump; a bugcheck dump file name
longer than 53 characters is truncated.
If there are more stalled processes than can fit on one page, the
notation "Page 1 of n" appears, where n is the total number of
pages. You can display successive pages by pressing the right
angle bracket (>) key or the Next Screen key. To display a
previous page of the Active User Stall Messages screen, press
the left angle bracket (<) key or the Prev Screen key.
The information shown in the screen includes:
o process
The process ID and the Oracle Rdb stream ID of the database
user. Normally the stream ID is 1. However, if the user is
attached to multiple databases or has explicitly detached and
attached to different database sessions during the same image
activation, the stream ID will uniquely identify the database
session. Stream ID values greater than 99 display as "**" to
indicate an integer display overflow on the screen the [Z]oom
function can be used to display the full stream ID in this
case.
Optionally, a single character following the stream ID field
indicates additional information about the process:
D - Database Recovery (DBR)
R - Server for a remote user
s - Database server (such as ABS, ALS, LCS, LRS, or RCS)
u - Attached for utility access
* - User process on another node in the cluster
A - Available for per-process monitoring
G - Actively being monitored
o T
The current transaction state. R indicates a read-only
transaction and W indicates a read/write transaction.
o since
By default, the time at which the current stall started. If
the stall has already completed, the column headed "since" is
blank, but the stall reason remains on the screen.
The Config menu option allows you to display the elapsed stall
time or the actual stall time.
o stall reason
The reason for the stall. For example, "waiting for..."
messages indicate a stalled lock request (along with the
requested lock mode). If the stall is still in progress, the
"since" column indicates the time at which the stall started.
If the stall has already completed, the "since" column is
blank, but the stall reason remains on the screen until it is
replaced by information for another stall.
The following list describes all of the stall reason messages
that can appear on the Active User Stall Messages screen,
and a brief explanation of what causes each of them. In most
cases, the messages are informational and should cause little
concern:
o Extending .AIJ file
This message displays whenever the .aij file is logically
or physically extended, which should occur infrequently.
o Extending .RUJ file
This message displays whenever the .ruj file is physically
extended, which should occur infrequently.
o Extending storage area !UL
This message displays whenever a storage area file
(identified by its numeric identifier, which can be
determined using RMU Dump) is physically extended. You can
determine the numeric identifier for a database's storage
areas by using the RMU Dump command. This message should
occur infrequently. However, this message may occur more
frequently with WORM areas because WORM area pages cannot
be reused once they have been written.
o Reading .AIJ file
This message displays whenever the AIJ lock information
needs to be refreshed; this typically only occurs the first
time a user attaches to the database. The .aij file is read
to determine the AIJ logical EOF (not to be confused with
the OpenVMS logical EOF). It is also read by the database
recovery (DBR) process.
o Reading ROOT file
This message displays whenever the in-memory database root
information is determined to be out of date and must be
read again from the disk. This message normally occurs only
when a database parameter is modified by a user online or
some information in the database root is modified by the
system (such as the AIJ sequence number).
o Reading .RUJ file
This message displays whenever an undo operation needs to
read the next RUJ page to acquire the rollback information
necessary to complete the operation. The .ruj file is read
one block at a time.
Sometimes a process that is not being rolled back receives
this message because it was necessary to read the RUJ file
in order to refresh cached recovery information.
o Reading pages !UL:!UL to !UL:!UL
This message displays whenever one or more pages is read
into either a user's local buffer or the global buffer.
One buffer full of pages is being read. The format string
"!UL:!UL" identifies the physical area and the page number.
o waiting for !AD (!AC)
This message displays whenever a process requests a lock
"with wait" and another process is holding the lock in an
incompatible mode. This message may indicate a database
hot spot and should be investigated using the RMU Show
Locks utility. The format string "!AD" identifies the lock
type (that is, storage area, page, MEMBIT, etc.) and the
string "!AC" identifies the requested lock mode (PR, CR,
EX, etc.).
The following list contains information on "waiting for"
messages:
o waiting for record or page
The "waiting for record" and "waiting for page" messages
display a process ID, the time, and the DBKEY for a
record or a page.
The dbkeys in "waiting for record" messages are logical
dbkeys. For example:
waiting for record 1:0:-4 (CR)
waiting for record 91:155:-1 (CW)
In this example of the "waiting for record" message, the
first two fields of the "waiting for record" message are
not shown. The first field of a "waiting for record"
message is the process ID of the stalled process,
and the second field is the time the stall began.
The third field in the "waiting for record" message
(the field with the "XX:YY:ZZ" format) represents the
DBKEY, and you can usually interpret it as "logical
area number:page number:line number." However, only
positive numbers represent the line number. When a
negative number appears, it refers to the record ALG
(adjustable lock granularity) locking level. Negative
numbers are interpreted as follows:
o -4 indicates the complete logical area
o -3 normally indicates 1000 database pages range
o -2 normally indicates 100 database pages range
o -1 normally indicates 10 database pages range
For example, in the second line of the example, the
DBKEY occurs in logical area 91 in a range of 10
database pages, one of which is page 155.
When you have a logical area number and want to get the
physical area name for that logical area, follow these
steps:
o Issue the following command:
$ RMU/DUMP/LAREAS=RDB$AIP db-name
o Search the resulting dump for the logical area with
that number.
o Note the corresponding physical area number.
o Issue the following command:
$ RMU/DUMP/HEADER db-name
Look up the physical area number from the output of
the RMU Dump Header command to find the name of the
physical area.
You can also look up columns RDB$STORAGE_ID or
RDB$INDEX_ID in system relations RDB$RELATIONS,
RDB$INDICES and RDBVMS$STORAGE_MAP_AREAS to identify
the Oracle Rdb entity (table or index) that the DBKEY
represents. For a description of the system relations,
see the System_Relations help topic by issuing the
following command:
$ HELP SQL SYSTEM_RELATIONS
The page number field in the DBKEY is the number of the
page in the corresponding physical area; the line number
is the number of the record on that page.
The dbkeys in "waiting for page" messages are physical
dbkeys, for example:
waiting for page 1:727 (PW)
In this example of the "waiting for page" message, the
first two fields of the "waiting for page" message are
not shown. The first field of a "waiting for page"
message is the process ID of the stalled process, and
the second field is the time the stall began. The DBKEY
format for a "waiting for page" message is interpreted
as "physical area number:page number".
When you have a physical area number and want to get
the physical area name for the area, issue the RMU Dump
Header command. Then look up the physical area number
in the command output to find the name of the physical
area.
You can also get a conversion table by issuing the
following command:
$ RMU/ANALYZE/LAREAS/OPTION=DEBUG/END=1 -
_$ /OUTPUT=LAREA.LIS db-name
This command produces a printable file containing
all logical areas, logical id numbers and physical id
numbers for a database.
CR, CW, and PW in the previous examples are requested
lock modes of Concurrent Read, Concurrent Write, and
Protected Write. The following table shows the lock
compatibility between a current transaction and access
modes other transactions can specify:
Mode of Current Lock
Mode of ____________________
Requested SR SW PR PW EX
Lock
_______________________________
SR Y Y Y Y N
SW Y Y N N N
PR Y N Y N N
PW Y N N N N
EX N N N N N
_______________________________
Key to lock modes:
SR - Shared Read
SW - Shared Write
PR - Protected Read
PW - Protected Write
EX - Exclusive
Y - Locks are compatible
N - Locks are not compatible
______________________________
Shared Read (SR) and Shared Write (SW) in the table are
equivalent to Concurrent Read and Concurrent Write.
o waiting for DBKEY scope
This message displays when a database user who attached
using the DBKEY SCOPE IS TRANSACTION clause has a
read/write transaction in progress (giving the user the
database key scope lock in CW mode), and a second user
who specifies the DBKEY SCOPE IS ATTACH clause (which
would give the user the database key scope lock in PR
mode) tries to attach. In this situation, the second
user's process stalls until the first user's transaction
completes.
You can specify the database key scope at run time using
the DBKEY SCOPE IS clause of the SQL ATTACH statement.
If the DBKEY SCOPE IS clause is used with the SQL CREATE
DATABASE or SQL IMPORT statements, the setting is in
effect only for the duration of the session of the user
who issued the statement; the setting does not become a
database root file parameter.
o waiting for snapshot cursor
This message displays when a process tries to start a
read-only transaction when snapshots are deferred, there
is no current read-only transaction, and a read/write
transaction is active.
Waiting for snapshot cursor is a normal state if
snapshots are deferred. The waiting will end when all
read/write transactions started before the first read-
only transaction have finished.
o waiting for MEMBIT lock
For each database, a membership data structure is
maintained. The membership data structure keeps track of
the nodes that are accessing the database at any given
time. The membership data structure for a database is
updated when the first user process from a node attaches
to the database and when the last user process from a
node detaches from the database.
The "waiting for MEMBIT lock" message means that a
process is stalled because the database's membership
data structure is in the process of being updated.
o waiting for client lock
A client lock indicates that an Rdb metadata lock is in
use. The term client indicates that Rdb is a client of
the Rdb locking services. The metadata locks are used
to guarantee memory copies of the metadata (table, index
and column definitions) are consistent with the on-disk
versions.
The "waiting for client lock" message means the database
user is requesting an incompatible locking mode. For
example, when trying to drop a table which is in use,
the drop operation requests a PROTECTED WRITE lock
on the metadata object (such as a table) which is
incompatible with the existing PROTECTED READ lock
currently used by others of the table.
These metadata locks consist of three longwords. The
lock is displayed in text format first, followed by its
hexadecimal representation. The text version masks out
non-printable characters with a dot (.).
The leftmost value seen in the hexadecimal output
contains the id of the object. The id is described below
for tables, routines, modules and storage map areas.
o For tables and views, the id represents the unique
value found in the RDB$RELATION_ID column of the
RDB$RELATIONS system table for the given table.
o For routines, the id represents the unique
value found in the RDB$ROUTINE_ID column of the
RDB$ROUTINES system table for the given routine.
o For modules, the id represents the unique value found
in the RDB$MODULE_ID column of the RDB$MODULES system
table for the given module.
o For storage map areas, the id presents the physical
area id. The "waiting for client lock" message on
storage map areas is very rare. This may be raised
for databases which have been converted from versions
prior to Rdb 5.1.
The next value displayed signifies the object type. The
following table describes objects and their hexadecimal
type values.
Object Type Values
-------------------------------------
Object Hexadecimal Value
-------------------------------------
Tables or views 00000004
Routines 00000006
Modules 00000015
Storage map areas 0000000E
-------------------------------------
The last value in the hexadecimal output represents the
lock type. The value 55 indicates this is a client lock.
NOTE
Because the full client lock output can be long,
it may require more space than is allotted for
the Stall.reason column and therefore can be
overwritten by the Lock.ID. column output.
For more detailed lock information, perform the
following steps:
1. Press the L option from the horizontal menu to
display a menu of lock IDs.
2. Select the desired lock ID.
o Writing .AIJ file
This message displays whenever a group commit process
writes the commit information to the .aij file. In a high
throughput environment, the write buffer length will be as
close to 64K as possible.
o Writing ROOT file
This message displays whenever the in-memory database
root information is modified by a user on line or some
information in the database root is modified by the system
(such as the AIJ sequence number).
o Writing .RUJ file
This message displays whenever a user process writes
data page modification information to the .ruj file. This
message always precedes the next message.
o Writing pages back to database
This message displays whenever one or more data pages is
written to the database. This is typically caused by a
request to access those pages from another process or by
detaching from the database.
o lock ID
The optional lock ID field is displayed only when the stall
is the result of a lock request. When other types of stalls
occur, such as stalls due to I/O activity, the lock ID field
is cleared from the screen.
When displayed, the lock ID field shows the lock
identification of the resource that is stalled. You can use
the lock identification number as input to the RMU Show Locks
command to obtain information about processes that own, are
blocking, or are waiting for locks.
The Active User Stall Messages screen has the following
attributes:
o The Active User Stall Messages screen reserves an empty line
for each process that is accessing the database from another
node in the VMScluster. These empty lines are unavoidable
because it is always possible for a process on another node in
the VMScluster to attach to the database on the current node,
thus using the empty line.
o If a process is active and running on the same node as the
Performance Monitor, the process' PID displays.
o The process information remains on that line until it detaches
from the database.
o The process stall text remains on the screen until it is
replaced by a new stall message.
o An active stall is identified by the stall starting time
being displayed. If the stall is no longer active, the stall
starting time is not displayed, but the message text remains
displayed.
o It is possible to page to multiple screen displays; each
process' state is preserved and refreshed when the new page
is displayed. When there is more than one page of Active User
Stalled Messages output, the header section contains the page
number currently displayed and the total number of pages in
the screen. Use the left angle bracket (<) key to go to the
previous page and the right angle bracket (>) key to go to the
next page of the screen.
The Active User Stall Messages screen has several advantages over
the Stall Messages screen. The advantages are:
o The location of a process remains static; because it is fixed
on a given page, it is always easy to locate.
o The process' last stall message (and lock ID, if applicable)
are displayed, even if the process is not currently stalled;
this is useful for identifying possible hot spots.
However, the Active User Stall Messages screen does have the
following disadvantages:
o It is difficult (but possible) to isolate the source of
a potential deadlock or a long-duration stall; the Stall
Messages screen is more useful for this.
o It is difficult (but possible) to isolate the set of actively
stalled processes from the complete set of processes doing
normal database accesses.
The following compares the Stall Messages screen and the Active
User Stall Messages screen:
o Processes displayed?
o Stall Messages screen-Displays only actively stalled
processes on the current node.
o Active User Stall Messages screen-Displays all processes
attached to the database on the current node.
o Process location?
o Stall Messages screen-Dynamic. The position of the process
reflects the duration of the stall relative to other
processes.
o Active User Stall Messages screen-Static. The position of
the process remains fixed in the same location until the
process detaches from the database.
o Display sequence?
o Stall Messages screen-Processes are displayed in descending
stall-duration sequence.
o Active User Stall Messages screen-Processes are displayed
in a fixed but arbitrary sequence.
o Indication of active stall?
o Stall Messages screen-The process stall text is displayed.
o Active User Stall Messages screen-The process stall text
starting time is displayed.
o Indication of inactive stall?
o Stall Messages screen-The process stall text is not
displayed.
o Active User Stall Messages screen-The process stall text
starting time is erased (message text remains).
o Duration of display?
o Stall Messages screen-The stall message is displayed only
if the stall is active.
o Active User Stall Messages screen-The last stall message
remains displayed until the process stalls again)
You can force frequent screen updates by using a negative number
for the Time qualifier in the RMU Show Statistics command. For
example, Time=-10 refreshes the screen every 10/100 (1/10) of a
second. Note that you use a lot of system resources, particularly
on the smaller CPU machines, when you specify this time interval.
If there are more stalls in progress than can fit on your
screen, some current stalls might not be displayed. Oracle Rdb
attempts to place as many active stall messages on the screen as
possible.