Time measurements in different CPU views |
|
Most likely you'll be interested in the wall clock time, that is the duration between the entry and the exit of a method as measured with a clock. For the profiling agent this is a straightforward measurement. While it might seem at first glance that measuring times should not have any significant overhead, this is not so if you need a high resolution measurement. Operating systems offer different timers with different performance overheads.
For example, on Microsoft Windows, the standard timer with a granularity of 10 milliseconds is very fast, because the operating system "caches" the current time. However, the duration of method calls can be as low as a few nanoseconds, so a high resolution timer is needed. A high resolution timer works directly with a special hardware device and carries a noticeable performance overhead. In JProfiler, CPU recording is disabled by default, however, call tree collection is always enabled. If you compare the duration of the startup sequence of an application server with and without CPU recording, you will notice the difference.
Wall clock time is measured separately for each thread. In CPU views where the thread selection
includes multiple threads, the displayed times can be larger than the total execution time of
the application. If you have 10 parallel threads of the same class MyThreadClass
whose run() method take 1 second and "All threads" is selected in the invocation tree,
the MyThreadClass.run() node in the invocation tree will display 10 seconds, even though
only one second has passed.
Since the CPU might be handling many threads with different priorities, the wall clock time is not the time the CPU has actually spent in that method. The scheduler of the operating system can interrupt the execution of a method multiple times and perform other tasks. The real time that was spent in the method by the CPU is called the CPU time. In extreme cases, the CPU time and the wall clock time can differ by a large factor, especially if the executing thread has a low priority.
The standard time measurement in JProfiler is wall clock time. If you wish to see the CPU time in the CPU views, you can change the measurement type in the profiling settings. The problem with CPU time measurement is that most operating systems provide this information with the granularity of the standard timer - high resolution measurements would carry to much overhead. This means the CPU times are only statistically valid for method that have a CPU time bigger that the typical granularity of 10 milliseconds.
The notion of time measurement must be refined further, since not all times are equally interesting. Imagine a server application with a pool of threads that waiting to perform a task. Most of the time would then be in the method that keeps the threads waiting while the actual task will only get a small part of the overall time and will be hard to spot. The necessary refinement is done with the concept of thread status. There are 4 different thread statuses in JProfiler:
Object.wait() and the current thread
will only become runnable again when some other thread calls Object.notify()
on the same object.
When looking for performance bottlenecks, you're mostly interested in the "Runnable" thread state although it's always a good idea to have a look at the "Net I/O" and "Blocking" thread states in order to check if the network or synchronization issues are reducing the performance of your application.
Method nodes in the invocation tree are sorted by total time. This is the sum of all execution times of this method on the particular call path as given be the ancestor nodes. Only threads in the current thread selection are considered and only measurements with the currently selected thread status are shown.
Optionally, the invocation tree offers the possibility to show the inherent time of a method. The inherent time is defined as the total time of of a method minus the time of its child nodes. Since child nodes can only be unfiltered classes, calls into filtered classes go into the inherent time. If you change your call tree collection filters, the inherent times in the invocation tree can change.
While the invocation view shows all call stacks in your application, the hotspot view shows the methods that take most of the time. Each method can potentially be called through many different call stacks, so the invocation counts in the invocation tree and the hotspots view do not have to match. The hotspot view shows the inherent time rather than the total time. In addition, the hotspot view offers the option to include calls to filtered classes into the inherent time. Please see the article on hotspots and filters for a thorough discussion of this topic.
When you open a hotspot node, you see a reverse invocation tree. However, the times that are displayed in those backtraces do not have the same meaning as those in the invocation tree, since they do not express a time measurement for the corresponding method node. Rather, the time displayed at each method node indicates how much time that particular call tree contributes to the hot spot. If there is only one backtrace, you will see the hotspot time at each node.
The times that are shown for method nodes in the method graph are the same as those in the hotspots view, since the method graph is also method-centric. The times that are associated with the incoming arrows are the same as those in the first level of the hot spot backtrace, since they show all calling methods and the cumulated duration of their calls. The time on the outgoing arrows is a measurement that cannot be found in the invocation tree. It shows the cumulated duration of calls from this method, while the invocation tree shows the cumulated duration of calls from the current call stack.
The CPU statistics offers three types of statistics: method, class and package statistics. Time measurements in the method statistics are the same as those in the hotspot view, only all recorded methods are shown and not only hotspot methods. The class and package statistics are cumulations of the method statistics. It is important to remain aware of the fact that the filter settings influence the class and package statistics just as they influence the hotspot times.