This screen helps you find current stalls that represent potential deadlocks, which become database hot spots. The screen also helps you determine what stalls were last encountered by any user.

Active user stall messages are not saved in the binary statistics file created by the Output qualifier. Therefore, this screen is not available when you execute the RMU Show Statistics command in replay mode (with the Input qualifier).

The Stall Messages screen and the Active User Stall Messages screen have the same format. However, the Active User Stall Messages screen provides information on currently stalled processes and on processes that were stalled but are no longer stalled. In contrast, the Stall Messages screen provides information only on currently stalled processes.

Like the Stall Messages screen, the Active User Stall Messages screen shows when a process writes a bugcheck dump; a bugcheck dump file name longer than 53 characters is truncated.

If there are more stalled processes than can fit on one page, the notation "Page 1 of n" appears, where n is the total number of pages. You can display successive pages by pressing the right angle bracket (>) key or the Next Screen key. To display a previous page of the Active User Stall Messages screen, press the left angle bracket (<) key or the Prev Screen key.

The information shown in the screen includes:

o process
   The process ID and the Oracle Rdb stream ID of the database user. Normally the stream ID is 1. However, if the user is attached to multiple databases or has explicitly detached and attached to different database sessions during the same image activation, the stream ID will uniquely identify the database session. Stream ID values greater than 99 display as "**" to indicate an integer display overflow on the screen; in this case, you can use the [Z]oom function to display the full stream ID.
   Optionally, a single character following the stream ID field indicates additional information about the process:

      D - Database Recovery (DBR)
      R - Server for a remote user
      s - Database server (such as ABS, ALS, LCS, LRS, or RCS)
      u - Attached for utility access
      * - User process on another node in the cluster
      A - Available for per-process monitoring
      G - Actively being monitored

o T
   The current transaction state. R indicates a read-only transaction and W indicates a read/write transaction.

o since
   By default, the time at which the current stall started. If the stall has already completed, the column headed "since" is blank, but the stall reason remains on the screen. The Config menu option allows you to display the elapsed stall time or the actual stall time.

o stall reason
   The reason for the stall. For example, "waiting for..." messages indicate a stalled lock request (along with the requested lock mode). If the stall is still in progress, the "since" column indicates the time at which the stall started. If the stall has already completed, the "since" column is blank, but the stall reason remains on the screen until it is replaced by information for another stall.

   The following list describes all of the stall reason messages that can appear on the Active User Stall Messages screen, with a brief explanation of what causes each of them. In most cases, the messages are informational and should cause little concern:

   o Extending .AIJ file
      This message displays whenever the .aij file is logically or physically extended, which should occur infrequently.

   o Extending .RUJ file
      This message displays whenever the .ruj file is physically extended, which should occur infrequently.

   o Extending storage area !UL
      This message displays whenever a storage area file (identified by its numeric identifier) is physically extended. You can determine the numeric identifier for a database's storage areas by using the RMU Dump command.
      This message should occur infrequently. However, this message may occur more frequently with WORM areas because WORM area pages cannot be reused once they have been written.

   o Reading .AIJ file
      This message displays whenever the AIJ lock information needs to be refreshed; this typically only occurs the first time a user attaches to the database. The .aij file is read to determine the AIJ logical EOF (not to be confused with the OpenVMS logical EOF). It is also read by the database recovery (DBR) process.

   o Reading ROOT file
      This message displays whenever the in-memory database root information is determined to be out of date and must be read again from the disk. This message normally occurs only when a database parameter is modified by a user online or some information in the database root is modified by the system (such as the AIJ sequence number).

   o Reading .RUJ file
      This message displays whenever an undo operation needs to read the next RUJ page to acquire the rollback information necessary to complete the operation. The .ruj file is read one block at a time. Sometimes a process that is not being rolled back receives this message because it was necessary to read the RUJ file in order to refresh cached recovery information.

   o Reading pages !UL:!UL to !UL:!UL
      This message displays whenever one or more pages are read into either a user's local buffer or the global buffer. One buffer full of pages is being read. The format string "!UL:!UL" identifies the physical area and the page number.

   o waiting for !AD (!AC)
      This message displays whenever a process requests a lock "with wait" and another process is holding the lock in an incompatible mode. This message may indicate a database hot spot and should be investigated using the RMU Show Locks utility. The format string "!AD" identifies the lock type (that is, storage area, page, MEMBIT, etc.) and the string "!AC" identifies the requested lock mode (PR, CR, EX, etc.).
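The "!UL", "!AD", and "!AC" tokens in the message names above are placeholders that the screen fills in with real values. The following Python sketch shows one way to match captured screen text against those templates. It is an illustration only, not part of Oracle Rdb: the function names are hypothetical, the text shape assumed for each placeholder is a guess based on the descriptions above, and the "Reading pages 3:120 to 3:125" sample in the usage note is made up.

```python
import re

# Stall reason templates copied from the help text. Only templates that
# contain placeholders are listed here.
TEMPLATES = [
    "Extending storage area !UL",
    "Reading pages !UL:!UL to !UL:!UL",
    "waiting for !AD (!AC)",
]

# Assumed shape of each placeholder once the screen has filled it in.
PLACEHOLDER_PATTERNS = {
    "!UL": r"(\d+)",     # unsigned decimal number
    "!AD": r"(.+?)",     # free text, such as a lock type and its identifier
    "!AC": r"([A-Z]+)",  # short lock mode name such as PR, CR, or EX
}


def template_to_regex(template):
    """Turn a help-text template into a compiled, anchored regular expression."""
    parts = re.split(r"(!UL|!AD|!AC)", template)
    pattern = "".join(PLACEHOLDER_PATTERNS.get(p, re.escape(p)) for p in parts)
    return re.compile("^" + pattern + "$")


def match_stall_reason(text):
    """Return (template, captured values) for the first match, or None."""
    for template in TEMPLATES:
        m = template_to_regex(template).match(text.strip())
        if m:
            return template, m.groups()
    return None
```

For example, match_stall_reason("waiting for page 1:727 (PW)") pairs the text with the "waiting for !AD (!AC)" template and captures "page 1:727" and "PW", while a made-up line such as "Reading pages 3:120 to 3:125" yields the four numbers of the page range.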
      The following list contains information on "waiting for" messages:

      o waiting for record or page
         The "waiting for record" and "waiting for page" messages display a process ID, the time, and the DBKEY for a record or a page. The dbkeys in "waiting for record" messages are logical dbkeys. For example:

            waiting for record 1:0:-4 (CR)
            waiting for record 91:155:-1 (CW)

         In this example of the "waiting for record" message, the first two fields of the message are not shown. The first field of a "waiting for record" message is the process ID of the stalled process, and the second field is the time the stall began.

         The third field in the "waiting for record" message (the field with the "XX:YY:ZZ" format) represents the DBKEY, and you can usually interpret it as "logical area number:page number:line number." However, only positive numbers represent the line number. When a negative number appears, it refers to the record ALG (adjustable lock granularity) locking level. Negative numbers are interpreted as follows:

         o -4 indicates the complete logical area
         o -3 normally indicates a range of 1000 database pages
         o -2 normally indicates a range of 100 database pages
         o -1 normally indicates a range of 10 database pages

         For example, in the second line of the example, the DBKEY occurs in logical area 91 in a range of 10 database pages, one of which is page 155.

         When you have a logical area number and want to get the physical area name for that logical area, follow these steps:

         o Issue the following command:

              $ RMU/DUMP/LAREAS=RDB$AIP db-name

         o Search the resulting dump for the logical area with that number.

         o Note the corresponding physical area number.

         o Issue the following command:

              $ RMU/DUMP/HEADER db-name

           Look up the physical area number from the output of the RMU Dump Header command to find the name of the physical area.
         You can also look up columns RDB$STORAGE_ID or RDB$INDEX_ID in system relations RDB$RELATIONS, RDB$INDICES and RDBVMS$STORAGE_MAP_AREAS to identify the Oracle Rdb entity (table or index) that the DBKEY represents. For a description of the system relations, see the System_Relations help topic by issuing the following command:

            $ HELP SQL SYSTEM_RELATIONS

         The page number field in the DBKEY is the number of the page in the corresponding physical area; the line number is the number of the record on that page.

         The dbkeys in "waiting for page" messages are physical dbkeys, for example:

            waiting for page 1:727 (PW)

         In this example of the "waiting for page" message, the first two fields of the message are not shown. The first field of a "waiting for page" message is the process ID of the stalled process, and the second field is the time the stall began. The DBKEY format for a "waiting for page" message is interpreted as "physical area number:page number".

         When you have a physical area number and want to get the physical area name for the area, issue the RMU Dump Header command. Then look up the physical area number in the command output to find the name of the physical area.

         You can also get a conversion table by issuing the following command:

            $ RMU/ANALYZE/LAREAS/OPTION=DEBUG/END=1 -
            _$ /OUTPUT=LAREA.LIS db-name

         This command produces a printable file containing all logical areas, logical id numbers and physical id numbers for a database.

         CR, CW, and PW in the previous examples are requested lock modes of Concurrent Read, Concurrent Write, and Protected Write.
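The DBKEY interpretation rules above can be sketched in Python as follows. This is a minimal illustration, not an Oracle Rdb tool: the names StallTarget and parse_waiting_for are hypothetical, the pattern assumes the message text looks exactly like the examples above, and a third field of zero is treated as a line number because the text only says that negative values select an ALG level.

```python
import re
from typing import NamedTuple, Optional

# ALG (adjustable lock granularity) levels as listed in the text. The page
# ranges are the "normal" values; other values are not described there.
ALG_LEVELS = {
    -4: "complete logical area",
    -3: "range of 1000 database pages",
    -2: "range of 100 database pages",
    -1: "range of 10 database pages",
}


class StallTarget(NamedTuple):
    kind: str                   # "record" (logical dbkey) or "page" (physical dbkey)
    area: int                   # logical area for records, physical area for pages
    page: int
    line: Optional[int]         # record line number, when one is given
    granularity: Optional[str]  # ALG level, when the third field is negative
    mode: str                   # requested lock mode, such as CR, CW, or PW


_PATTERN = re.compile(
    r"waiting for (record|page) (\d+):(\d+)(?::(-?\d+))? \(([A-Z]+)\)"
)


def parse_waiting_for(text: str) -> Optional[StallTarget]:
    """Split a "waiting for record" or "waiting for page" message into fields."""
    m = _PATTERN.search(text)
    if m is None:
        return None
    kind, area, page, third, mode = m.groups()
    # Record dbkeys have three fields; page dbkeys have exactly two.
    if (kind == "record") != (third is not None):
        return None
    line = None
    granularity = None
    if third is not None:
        value = int(third)
        if value < 0:
            granularity = ALG_LEVELS.get(value, "unknown ALG level")
        else:
            line = value
    return StallTarget(kind, int(area), int(page), line, granularity, mode)
```

Applied to "waiting for record 91:155:-1 (CW)", the sketch reports logical area 91, page 155, a granularity of "range of 10 database pages", and requested mode CW, which matches the worked example in the text.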
         The following table shows the lock compatibility between a current transaction and access modes other transactions can specify:

                           Mode of Current Lock
            Mode of        ____________________
            Requested
            Lock           SR   SW   PR   PW   EX
            _____________________________________
            SR             Y    Y    Y    Y    N
            SW             Y    Y    N    N    N
            PR             Y    N    Y    N    N
            PW             Y    N    N    N    N
            EX             N    N    N    N    N
            _____________________________________

            Key to lock modes:

            SR - Shared Read
            SW - Shared Write
            PR - Protected Read
            PW - Protected Write
            EX - Exclusive
            Y  - Locks are compatible
            N  - Locks are not compatible
            _____________________________________

         Shared Read (SR) and Shared Write (SW) in the table are equivalent to Concurrent Read and Concurrent Write.

      o waiting for DBKEY scope
         This message displays when a database user who attached using the DBKEY SCOPE IS TRANSACTION clause has a read/write transaction in progress (giving the user the database key scope lock in CW mode), and a second user who specifies the DBKEY SCOPE IS ATTACH clause (which would give the user the database key scope lock in PR mode) tries to attach. In this situation, the second user's process stalls until the first user's transaction completes.

         You can specify the database key scope at run time using the DBKEY SCOPE IS clause of the SQL ATTACH statement. If the DBKEY SCOPE IS clause is used with the SQL CREATE DATABASE or SQL IMPORT statements, the setting is in effect only for the duration of the session of the user who issued the statement; the setting does not become a database root file parameter.

      o waiting for snapshot cursor
         This message displays when a process tries to start a read-only transaction when snapshots are deferred, there is no current read-only transaction, and a read/write transaction is active. Waiting for snapshot cursor is a normal state if snapshots are deferred. The waiting will end when all read/write transactions started before the first read-only transaction have finished.

      o waiting for MEMBIT lock
         For each database, a membership data structure is maintained.
         The membership data structure keeps track of the nodes that are accessing the database at any given time. The membership data structure for a database is updated when the first user process from a node attaches to the database and when the last user process from a node detaches from the database. The "waiting for MEMBIT lock" message means that a process is stalled because the database's membership data structure is in the process of being updated.

      o waiting for client lock
         A client lock indicates that an Rdb metadata lock is in use. The term client indicates that Rdb is a client of the Rdb locking services. The metadata locks are used to guarantee that in-memory copies of the metadata (table, index and column definitions) are consistent with the on-disk versions.

         The "waiting for client lock" message means the database user is requesting an incompatible locking mode. For example, when trying to drop a table which is in use, the drop operation requests a PROTECTED WRITE lock on the metadata object (such as a table), which is incompatible with the existing PROTECTED READ lock held by other users of the table.

         These metadata locks consist of three longwords. The lock is displayed in text format first, followed by its hexadecimal representation. The text version masks out non-printable characters with a dot (.).

         The leftmost value seen in the hexadecimal output contains the id of the object. The id is described below for tables, routines, modules and storage map areas.

         o For tables and views, the id represents the unique value found in the RDB$RELATION_ID column of the RDB$RELATIONS system table for the given table.

         o For routines, the id represents the unique value found in the RDB$ROUTINE_ID column of the RDB$ROUTINES system table for the given routine.

         o For modules, the id represents the unique value found in the RDB$MODULE_ID column of the RDB$MODULES system table for the given module.

         o For storage map areas, the id represents the physical area id.
           The "waiting for client lock" message on storage map areas is very rare. This may be raised for databases which have been converted from versions prior to Rdb 5.1.

         The next value displayed signifies the object type. The following table describes objects and their hexadecimal type values.

                     Object Type Values
            -------------------------------------
            Object              Hexadecimal Value
            -------------------------------------
            Tables or views     00000004
            Routines            00000006
            Modules             00000015
            Storage map areas   0000000E
            -------------------------------------

         The last value in the hexadecimal output represents the lock type. The value 55 indicates this is a client lock.

                                   NOTE

            Because the full client lock output can be long, it may require more space than is allotted for the Stall.reason column and therefore can be overwritten by the Lock.ID. column output.

            For more detailed lock information, perform the following steps:

            1. Press the L option from the horizontal menu to display a menu of lock IDs.

            2. Select the desired lock ID.

   o Writing .AIJ file
      This message displays whenever a group commit process writes the commit information to the .aij file. In a high throughput environment, the write buffer length will be as close to 64K as possible.

   o Writing ROOT file
      This message displays whenever the in-memory database root information is modified by a user online or some information in the database root is modified by the system (such as the AIJ sequence number).

   o Writing .RUJ file
      This message displays whenever a user process writes data page modification information to the .ruj file. This message always precedes the next message.

   o Writing pages back to database
      This message displays whenever one or more data pages are written to the database. This is typically caused by a request to access those pages from another process or by detaching from the database.

o lock ID
   The optional lock ID field is displayed only when the stall is the result of a lock request.
   When other types of stalls occur, such as stalls due to I/O activity, the lock ID field is cleared from the screen. When displayed, the lock ID field shows the lock identification of the resource that is stalled. You can use the lock identification number as input to the RMU Show Locks command to obtain information about processes that own, are blocking, or are waiting for locks.

The Active User Stall Messages screen has the following attributes:

o The Active User Stall Messages screen reserves an empty line for each process that is accessing the database from another node in the VMScluster. These empty lines are unavoidable because it is always possible for a process on another node in the VMScluster to attach to the database on the current node, thus using the empty line.

o If a process is active and running on the same node as the Performance Monitor, the process' PID displays.

o The process information remains on that line until the process detaches from the database.

o The process stall text remains on the screen until it is replaced by a new stall message.

o An active stall is identified by the stall starting time being displayed. If the stall is no longer active, the stall starting time is not displayed, but the message text remains displayed.

o It is possible to page to multiple screen displays; each process' state is preserved and refreshed when the new page is displayed.

When there is more than one page of Active User Stall Messages output, the header section contains the page number currently displayed and the total number of pages in the screen. Use the left angle bracket (<) key to go to the previous page and the right angle bracket (>) key to go to the next page of the screen.

The Active User Stall Messages screen has several advantages over the Stall Messages screen. The advantages are:

o The location of a process remains static; because it is fixed on a given page, it is always easy to locate.
o The process' last stall message (and lock ID, if applicable) are displayed, even if the process is not currently stalled; this is useful for identifying possible hot spots.

However, the Active User Stall Messages screen does have the following disadvantages:

o It is difficult (but possible) to isolate the source of a potential deadlock or a long-duration stall; the Stall Messages screen is more useful for this.

o It is difficult (but possible) to isolate the set of actively stalled processes from the complete set of processes doing normal database accesses.

The following compares the Stall Messages screen and the Active User Stall Messages screen:

o Processes displayed?

   o Stall Messages screen - Displays only actively stalled processes on the current node.

   o Active User Stall Messages screen - Displays all processes attached to the database on the current node.

o Process location?

   o Stall Messages screen - Dynamic. The position of the process reflects the duration of the stall relative to other processes.

   o Active User Stall Messages screen - Static. The position of the process remains fixed in the same location until the process detaches from the database.

o Display sequence?

   o Stall Messages screen - Processes are displayed in descending stall-duration sequence.

   o Active User Stall Messages screen - Processes are displayed in a fixed but arbitrary sequence.

o Indication of active stall?

   o Stall Messages screen - The process stall text is displayed.

   o Active User Stall Messages screen - The process stall text starting time is displayed.

o Indication of inactive stall?

   o Stall Messages screen - The process stall text is not displayed.

   o Active User Stall Messages screen - The process stall text starting time is erased (message text remains).

o Duration of display?

   o Stall Messages screen - The stall message is displayed only if the stall is active.
   o Active User Stall Messages screen - The last stall message remains displayed until the process stalls again.

You can force frequent screen updates by using a negative number for the Time qualifier in the RMU Show Statistics command. For example, Time=-10 refreshes the screen every 10/100 (1/10) of a second. Note that such a short refresh interval uses a lot of system resources, particularly on smaller CPU machines.

If there are more stalls in progress than can fit on your screen, some current stalls might not be displayed. Oracle Rdb attempts to place as many active stall messages on the screen as possible.
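
The arithmetic behind a negative Time qualifier value, described above, can be written out as a short Python sketch. The function name is hypothetical. The text only covers negative values; treating a positive value as a whole number of seconds is an assumption made here for completeness.

```python
def refresh_interval_seconds(time_value: int) -> float:
    """Convert a Time qualifier value into a refresh interval in seconds."""
    if time_value == 0:
        raise ValueError("a Time value of zero is not described in the text")
    if time_value < 0:
        # Negative values count hundredths of a second: -10 means 10/100 s.
        return -time_value / 100.0
    # Assumption: positive values are whole seconds.
    return float(time_value)
```

With this sketch, refresh_interval_seconds(-10) gives 0.1, which is the 1/10 of a second quoted for Time=-10.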