This screen shows a summary of database users' stall activity.
A user stalls whenever Oracle Rdb issues a system call on behalf
of the user's process. For example, a stall occurs while a user
waits for a lock or for completion of a physical disk read or
write.
By default, the Stall Messages screen shows all stalls, including
those of millisecond duration. When a high-performance, high-
volume online transaction processing (OLTP) application issues a
large number of I/Os on a high-speed disk device, a DBA may find
it impossible to differentiate between the many short millisecond
stalls of the OLTP application and the longer, more important
stalls that may be encountered by other applications using the
database.
By typing A to select the Alarm option from the Stall Messages
horizontal menu, you can specify a duration, in seconds, that
a process must stall before it appears on the Stall Messages
screen. For example, if you specify an alarm interval of 5
seconds, then only stalls of 5 seconds or longer duration will
appear on the Stall Messages screen. If you specify a value of 0
as the alarm interval, the default, all stalls will appear on the
Stall Messages screen.
By typing B once to select the Bell option from the horizontal
menu, you can activate the alarm bell option and the option will
be highlighted. Entering B again will deactivate the alarm bell
and the option will not be highlighted.
The alarm bell, even if activated, will be rung only if the alarm
option has also been activated.
When both the alarm and the alarm bell are activated, the alarm
bell will be sounded once per screen refresh (specified by the
Set_rate option) if there are any displayed stalls.
By typing F to select the filter option, you can control the
type of stall messages that are displayed. For example, you can
display processes that are stalled while writing to a certain
storage area.
The filter option allows you to enter a search string which is
used to filter the stall messages. Only those stall messages that
contain the specified search string are displayed. The search
string may contain one or both of the wildcard characters. The
asterisk wildcard character is mapped to zero or more characters
and the percent wildcard character is mapped to exactly one
character. Note that the search string is not case sensitive.
The filter menu option is highlighted when a search string is
actively filtering stall messages. To disable filtering, press
the Return key at the search string prompt.
Stall messages are not saved in the binary statistics file
created by the Output qualifier. Consequently, the Stall Messages
screen is not available when you replay a binary file using the
Input qualifier.
This screen lists database user processes and describes the most
recent stalls executed by users on the node from which the Show
Statistics command was issued. Because the stall messages are
sampled only at the screen update interval, most stalls are
missed. If the same stall message for a process persists, it
could indicate a problem. The screen also shows when a process is
writing a bugcheck dump; a bugcheck dump file name longer than 53
characters is truncated.
The Stall Messages screen shows only processes that are actively
stalling. Once a process finishes stalling, it disappears from
the screen. Processes that are still stalling ripple up to the
top of the screen. This means that the longest stalling processes
appear at the top of the screen. Newer stalls are added to the
bottom of the screen. Therefore, all users on the same node
share the same stall screen lines, and only the actively stalling
processes show up on the stall screen. This allows you to monitor
a relatively large number of stalling processes.
If there are more stalled processes than can fit on one page, the
notation "Page 1 of n" appears, where n is the total number of
pages. You can display successive pages by pressing the right
angle bracket (>) key or the Next Screen key. To display a
previous stall message page, press the left angle bracket (<)
key or the Prev Screen key.
A database with no stalls has a blank stall display.
You can force frequent screen updates by using a negative number
for the Time qualifier in the RMU Show Statistics command. For
example, Time=-10 refreshes the screen every 10/100 (1/10) of a
second. Note that you use a lot of system resources, particularly
on the smaller CPU machines, when you specify this time interval.
If more stalls are in progress than can fit on your screen, some
active stalls might not be displayed. Oracle Rdb attempts to
place as many active stall messages on the screen as possible.
You can use the Config menu option to configure the Stall
Messages screen. Select this option, by typing the letter C,
to display the configuration submenu. The configuration submenu
provides the following options:
o Display XXX Stall Time
Display either the actual stall time or the elapsed stall
time. The option description changes depending on which stall
time is currently being displayed.
o Set alarm interval
Specify a duration, in seconds, that a process must stall
before it appears on the Stall Messages screen.
o XXX alarm bell
Activate or deactivate the alarm bell. The option description
changes depending on whether the bell is activated.
o Filter Stall Messages
Display stall messages that contain the specified search
string.
The information shown in the screen includes:
o process
The process ID and the Oracle Rdb stream ID of the database
user. Normally the stream ID will be one (1). However, if
the user is attached to multiple databases or has explicitly
detached and attached to different database sessions during
the same image activation, the stream ID will uniquely
identify the database session. Stream ID values greater than
99 display as "**" to indicate an integer display overflow
on the screen; the [Z]oom function can be used to display the
full stream ID in this case. The Config menu allows you to
select sorting of database users by process ID.
Optionally, a single character following the stream ID field
indicates additional information about the process:
D - Database Recovery (DBR)
R - Server for a remote user
s - Database server (such as ABS, ALS, LCS, LRS, or RCS)
u - Attached for utility access
* - User process on another node in the cluster
A - Available for per-process monitoring
G - Actively being monitored
o T
The current transaction state. R indicates a read-only
transaction and W indicates a read/write transaction.
o since
By default, the time at which the current stall started.
The Config menu option allows you to display the elapsed stall
time or the actual stall time.
o stall reason
The reason for the stall. For example, "waiting for..."
messages indicate a stalled lock request (along with the
requested lock mode).
The following list describes all of the stall reason messages
that can appear on the Stall Messages screen, and a brief
explanation of what causes each of them. In most cases, the
messages are informational and should cause little concern:
o Extending .AIJ file
This message displays whenever the .aij file is logically
or physically extended, which should occur infrequently.
o Extending .RUJ file
This message displays whenever the .ruj file is physically
extended, which should occur infrequently.
o Extending storage area !UL
This message displays whenever a storage area file
(identified by its numeric identifier, which can be
determined using RMU Dump) is physically extended. You can
determine the numeric identifier for a database's storage
areas by using the RMU Dump command. This message should
occur infrequently. However, this message may occur more
frequently with WORM areas because WORM area pages cannot
be reused once they have been written.
o Reading .AIJ file
This message displays whenever the AIJ lock information
needs to be refreshed; this typically only occurs the first
time a user attaches to the database. The .aij file is read
to determine the AIJ logical EOF (not to be confused with
the OpenVMS logical EOF). It is also read by the database
recovery (DBR) process.
o Reading ROOT file
This message displays whenever the in-memory database root
information is determined to be out-of-date and must be
read again from the disk. This message normally occurs only
when a database parameter is modified by a user on line or
some information in the database root is modified by the
system (such as the AIJ sequence number).
o Reading .RUJ file
This message displays whenever an undo operation needs to
read the next RUJ page to acquire the rollback information
necessary to complete the operation. The .ruj file is read
one block at a time.
Sometimes a process that is not being rolled back receives
this message because it was necessary to read the RUJ file
in order to refresh cached recovery information.
o Reading pages !UL:!UL to !UL:!UL
This message displays whenever one or more pages is read
into either a user's local buffer or the global buffer.
One buffer full of pages is being read. The format string
"!UL:!UL" identifies the physical area and the page number.
o waiting for !AD (!AC)
This message displays whenever a process requests a lock
"with wait" and another process is holding the lock in an
incompatible mode. This message may indicate a database
hot spot and should be investigated using the RMU Show
Locks utility. The format string "!AD" identifies the lock
type (that is, storage area, page, MEMBIT, etc.) and the
string "!AC" identifies the requested lock mode (PR, CR,
EX, etc.).
The following list contains information on "waiting for"
messages:
o waiting for record or page
The "waiting for record" and "waiting for page" messages
display a process ID, the time, and the DBKEY for a
record or a page.
The DBKEYs in "waiting for record" messages are logical
DBKEYs. For example:
waiting for record 1:0:-4 (CR)
waiting for record 91:155:-1 (CW)
In this example of the "waiting for record" message, the
first two fields of the "waiting for record" message are
not shown. The first field of a "waiting for record"
message is the process ID of the stalled process,
and the second field is the time the stall began.
The third field in the "waiting for record" message
(the field with the "XX:YY:ZZ" format) represents the
DBKEY, and you can usually interpret it as "logical area
number:page number:line number." However, only positive
numbers represent the line number. When a negative
number appears, it refers to the record ALG (adjustable
lock granularity) locking level. By default, there are
three page locking levels and the negative numbers are
interpreted as follows:
o -4 indicates the complete logical area
o -3 normally indicates 1000 database pages range
o -2 normally indicates 100 database pages range
o -1 normally indicates 10 database pages range
For example, in the second line of the example, the
DBKEY occurs in logical area 91 in a range of 10
database pages, one of which is page 155.
When you have a logical area number and want to get the
physical area name for that logical area, follow these
steps:
o Issue the following command:
$ RMU/DUMP/LAREAS=RDB$AIP db-name
o Search the resulting dump for the logical area with
that number.
o Note the corresponding physical area number.
o Issue the following command:
$ RMU/DUMP/HEADER db-name
Look up the physical area number from the output of
the RMU Dump Header command to find the name of the
physical area.
You can also look up columns RDB$STORAGE_ID or
RDB$INDEX_ID in system relations RDB$RELATIONS,
RDB$INDICES and RDBVMS$STORAGE_MAP_AREAS to identify
the Oracle Rdb entity (table or index) that the DBKEY
represents. For a description of the system relations,
see the System_Relations help topic by issuing the
following command:
$ HELP SQL SYSTEM_RELATIONS
The page number field in the DBKEY is the number of the
page in the corresponding physical area; the line number
is the number of the record on that page.
The dbkeys in "waiting for page" messages are physical
dbkeys, for example:
waiting for page 1:727 (PW)
In this example of the "waiting for page" message, the
first two fields of the "waiting for page" message are
not shown. The first field of a "waiting for page"
message is the process ID of the stalled process, and
the second field is the time the stall began. The DBKEY
format for a "waiting for page" message is interpreted
as "physical area number:page number."
When you have a physical area number and want to get
the physical area name for the area, issue the RMU Dump
Header command. Then look up the physical area number
in the command output to find the name of the physical
area.
You can also get a conversion table by issuing the
following command:
$ RMU/ANALYZE/LAREAS/OPTION=DEBUG/END=1 -
_$ /OUTPUT=LAREA.LIS db-name
This command produces a printable file containing
all logical areas, logical id numbers and physical id
numbers for a database.
CR, CW, and PW in the previous examples are requested
lock modes of Concurrent Read, Concurrent Write, and
Protected Write. The following table shows the lock
compatibility between a current transaction and access
modes other transactions can specify:
Mode of Current Lock
Mode of ____________________
Requested SR SW PR PW EX
Lock
_______________________________
SR Y Y Y Y N
SW Y Y N N N
PR Y N Y N N
PW Y N N N N
EX N N N N N
_______________________________
Key to lock modes:
SR - Shared Read
SW - Shared Write
PR - Protected Read
PW - Protected Write
EX - Exclusive
Y - Locks are compatible
N - Locks are not compatible
______________________________
Shared Read (SR) and Shared Write (SW) in the table are
equivalent to Concurrent Read (CR) and Concurrent Write
(CW).
o waiting for DBKEY scope
This message displays when a database user who attached
using the DBKEY SCOPE IS TRANSACTION clause has a
read/write transaction in progress (giving the user the
database key scope lock in CW mode), and a second user
who specifies the DBKEY SCOPE IS ATTACH clause (which
would give the user the database key scope lock in PR
mode) tries to attach. In this situation, the second
user's process stalls until the first user's transaction
completes.
You can specify the database key scope at run time using
the DBKEY SCOPE IS clause of the SQL ATTACH statement.
If the DBKEY SCOPE IS clause is used with the SQL CREATE
DATABASE or SQL IMPORT statements, the setting is in
effect only for the duration of the session of the user
who issued the statement; the setting does not become a
database root file parameter.
o waiting for snapshot cursor
This message displays when a process tries to start a
read-only transaction when snapshots are deferred, there
is no current read-only transaction, and a read/write
transaction is active.
Waiting for snapshot cursor is a normal state if
snapshots are deferred. The waiting will end when all
read/write transactions started before the first read-
only transaction have finished.
o waiting for MEMBIT lock
For each database, a membership data structure is
maintained. The membership data structure keeps track of
the nodes that are accessing the database at any given
time. The membership data structure for a database is
updated when the first user process from a node attaches
to the database and when the last user process from a
node detaches from the database.
The "waiting for MEMBIT lock" message means that a
process is stalled because the database's membership
data structure is in the process of being updated.
o waiting for client lock
A client lock indicates that an Rdb metadata lock is in
use. The term client indicates that Rdb is a client of
the Rdb locking services. The metadata locks are used
to guarantee memory copies of the metadata (table, index
and column definitions) are consistent with the on-disk
versions.
The "waiting for client lock" message means the database
user is requesting an incompatible locking mode. For
example, when trying to drop a table which is in use,
the drop operation requests a PROTECTED WRITE lock
on the metadata object (such as a table) which is
incompatible with the existing PROTECTED READ lock
currently used by others of the table.
These metadata locks consist of three longwords. The
lock is displayed in text format first, followed by its
hexadecimal representation. The text version masks out
non-printable characters with a dot (.).
The leftmost value seen in the hexadecimal output
contains the id of the object. The id is described below
for tables, routines, modules and storage map areas.
o For tables and views, the id represents the unique
value found in the RDB$RELATION_ID column of the
RDB$RELATIONS system table for the given table.
o For routines, the id represents the unique
value found in the RDB$ROUTINE_ID column of the
RDB$ROUTINES system table for the given routine.
o For modules, the id represents the unique value found
in the RDB$MODULE_ID column of the RDB$MODULES system
table for the given module.
o For storage map areas, the id presents the physical
area id. The "waiting for client lock" message on
storage map areas is very rare. This may be raised
for databases which have been converted from versions
prior to Rdb 5.1.
The next value displayed signifies the object type. The
following table describes objects and their hexadecimal
type values.
Object Type Values
-------------------------------------
Object Hexadecimal Value
-------------------------------------
Tables or views 00000004
Routines 00000006
Modules 00000015
Storage map areas 0000000E
-------------------------------------
The last value in the hexadecimal output represents the
lock type. The value 55 indicates this is a client lock.
NOTE
Because the full client lock output is long,
it may require more space than is allotted for
the Stall.reason column and therefore can be
overwritten by the Lock.ID. column output.
For more detailed lock information, perform the
following steps:
1. Press the L option from the horizontal menu to
display a menu of lock IDs.
2. Select the desired lock ID.
o Writing .AIJ file
This message displays whenever a group commit process
writes the commit information to the .aij file. In a high
throughput environment, the write buffer length will be as
close to 64K as possible.
o Writing ROOT file
This message displays whenever the in-memory database
root information is modified by a user on line or some
information in the database root is modified by the system
(such as the AIJ sequence number).
o Writing .RUJ file
This message displays whenever a user process writes
data page modification information to the .ruj file. This
message always precedes the next message.
o Writing pages back to database
This message displays whenever one or more data pages is
written to the database. This is typically caused by a
request to access those pages from another process or by
detaching from the database.
o lock ID
The optional lock ID field is displayed only when the stall
is the result of a lock request. When other types of stalls
occur, such as stalls due to I/O activity, the lock ID field
is cleared from the screen.
When displayed, the lock ID field shows the lock
identification of the resource that is stalled. You can use
the lock identification number as input to the RMU Show Locks
command to obtain information about processes that own, are
blocking, or are waiting for locks.