PCA tells me that a large amount of time is being spent at a
CALL instruction. Why? The CALL instruction should only consume a
small part of the time spent executing the routine.
First, check page faulting. Sometimes the paging behavior of a
program causes a routine that is called only moderately often to
be paged out just before it is called. If that isn't the case,
check for JSB linkages to an RTL routine.
For performance reasons, some RTL routines use JSB linkages. This
can produce confusing results when the /MAIN_IMAGE qualifier is
used. The effect is most noticeable with PC sampling data, but it
can occur with any kind of data for which stack PCs are gathered.
Because a JSB linkage does not place a call frame on the stack,
the return address to the site of the call is lost to PCA.
Consequently, the first return address found by /MAIN_IMAGE is
the site of the call to the routine that called the RTL by means
of a JSB linkage. As an example, suppose routine MAIN calls
routine FOO, which in turn calls the RTL via a JSB linkage.
Then suppose that a PC sampling hit occurs in the RTL. The stack
PCs recorded for that hit are the PC of the call to FOO and the
PC of the call to MAIN; the site of the JSB inside FOO is not
recorded. Thus, in the presence of the /MAIN_IMAGE qualifier, the
first PC within the image is the PC of the call to FOO.
Consequently, the count for FOO's call site is inflated by the
number of data points that fall in RTL routines reached through
JSB linkages.
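
The attribution described above can be sketched as a small model.
This is only an illustration, not PCA code: the labels, the image
names USER, RTL and SYSTEM, and both function names are
hypothetical, and the model assumes only what the text states,
namely that a CALL linkage leaves a return address on the stack
and a JSB linkage does not.

```python
def stack_pcs(sampled_pc, call_sites):
    """Return the PCs visible for one sample, innermost first.

    sampled_pc: (label, image) of the PC where the sample hit.
    call_sites: call sites leading to that PC, innermost first,
                each given as (label, image, linkage).
    Only CALL linkages leave a call frame, so only their return
    addresses can be recovered from the stack.
    """
    pcs = [sampled_pc]
    for label, image, linkage in call_sites:
        if linkage == "CALL":
            pcs.append((label, image))
    return pcs


def main_image_pc(pcs, image="USER"):
    """Return the first PC inside the image, or None if the point is lost."""
    for label, pc_image in pcs:
        if pc_image == image:
            return label
    return None


# MAIN calls FOO with a CALL; FOO reaches the RTL with a JSB.
hit = ("RTL routine", "RTL")
chain = [
    ("FOO: JSB to RTL", "USER", "JSB"),
    ("MAIN: CALL FOO", "USER", "CALL"),
    ("startup: CALL MAIN", "SYSTEM", "CALL"),
]
charged_to = main_image_pc(stack_pcs(hit, chain))

# The same hit if the RTL routine had used a CALL linkage instead.
chain_call = [("FOO: JSB to RTL", "USER", "CALL")] + chain[1:]
charged_to_if_call = main_image_pc(stack_pcs(hit, chain_call))

# The RTL routine is reached by JSB directly from the main program.
chain_main = [
    ("MAIN: JSB to RTL", "USER", "JSB"),
    ("startup: CALL MAIN", "SYSTEM", "CALL"),
]
charged_to_from_main = main_image_pc(stack_pcs(hit, chain_main))
```

In this model the JSB case is charged to the call to FOO in MAIN
rather than to the site inside FOO, and a JSB routine reached
directly from the main program has no PC within the image at all.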
Note that this effect can yield useful information. If you
compare the time reported with /MAIN_IMAGE to the time reported
without it, you can tell how much time was spent in JSB linkage
routines. You cannot, however, separate the various JSB linkage
routines from one another. Note further that if the JSB routine
is called from the main program, the data points are lost because
there is no caller of the main program.
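
The comparison can be illustrated with a small tally. This is a
sketch with invented sample counts, not PCA output; the function
name tally, the labels, and the way samples are represented are
all assumptions made for the example.

```python
from collections import Counter

# Each hypothetical sample is (sampled PC, is that PC in the image?,
# first in-image return address found on the stack, or None).
SAMPLES = (
    [("MAIN: CALL FOO", True, None)] * 2               # hits on the CALL itself
    + [("FOO: body", True, None)] * 10                 # hits inside FOO
    + [("RTL routine", False, "MAIN: CALL FOO")] * 30  # JSB routines under FOO
    + [("RTL routine", False, None)] * 5               # JSB routine called from MAIN
)


def tally(samples, main_image):
    """Count data points per PC, with or without /MAIN_IMAGE-style mapping."""
    counts = Counter()
    for pc, in_image, fallback in samples:
        if in_image or not main_image:
            counts[pc] += 1          # charged to the sampled PC
        elif fallback is not None:
            counts[fallback] += 1    # charged to the first in-image return address
        # otherwise no in-image PC exists and the point is lost
    return counts


with_main = tally(SAMPLES, main_image=True)
without_main = tally(SAMPLES, main_image=False)

site = "MAIN: CALL FOO"
jsb_points = with_main[site] - without_main[site]
lost_points = sum(without_main.values()) - sum(with_main.values())
```

Here the difference at FOO's call site is the total for all JSB
linkage routines reached from FOO; nothing in the tally identifies
which routine each point came from, and the points for the routine
called from the main program drop out of the /MAIN_IMAGE total.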