Using HPROF with ColdFusion to troubleshoot slow code issues
I’m sure everyone has run into an issue that is difficult to isolate on a given system. Sometimes all the code looks fine, and none of the .cfm templates are really standing up and shouting “I’m your problem!” What to do when you are stumped and not sure where else to look for your performance woes?
Enter HPROF.
HPROF is a Java profiler that you can invoke by modifying your jvm.config file. On the ColdFusion Multiserver configuration, that config file lives by default at {ColdFusion root}/bin/jvm.config. When you open it up, you will see a few configuration variables being set, such as java.home (which tells ColdFusion which JVM to use when starting up), java.library.path (which sets the location for shared libraries to be loaded), and more, but the one we are concerned with is java.args. By default it will look something like this:
java.args=-server -Xmx512m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/
Once you modify your jvm.config file with the HPROF arguments (simply add the -agentlib:hprof=cpu=samples argument, or the heap=sites variant covered below, immediately after the -server argument), you are instructing the JVM to gather the profiler information you specify in the hprof args and then, once the JVM shuts down, dump that collected data to a file. By default this file is called java.hprof.txt, and it will be located in your {ColdFusion root}/bin folder.
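For example, starting from the default java.args line shown above, the modified entry might look like this (a sketch; your memory settings and other flags will of course differ):

java.args=-server -agentlib:hprof=cpu=samples -Xmx512m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/

So, to recap: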
- Modify your jvm.config file to add the -agentlib argument (the two most useful variants are detailed below)
- Restart ColdFusion with the modified config file
- Run a good sample of your application (my preference is to run some JMeter testing; see the example command after this list)
- Stop ColdFusion (which triggers the generation of the java.hprof.txt file)
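If you use JMeter, a non-GUI run against a saved test plan is an easy way to generate that sample load. A minimal sketch, assuming a hypothetical test plan named loadtest.jmx:

jmeter -n -t loadtest.jmx -l results.jtl

Here -n runs JMeter without the GUI, -t points at the test plan, and -l writes the sampler results to a file you can review afterwards.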
HPROF is a tool that is meant to be used locally for development, or on a test or staging server. Running most of these arguments *WILL* impact the performance of your application, so please do not run this on a production server unless all other avenues of investigation have been exhausted. HPROF has many different switches, but there are two that I have found to be the most useful so far.
-agentlib:hprof=cpu=samples
Adding this switch asks the JVM to periodically sample the running threads and record where CPU time is being spent. The report that comes back lists the methods consuming the most CPU, ordered from the highest consumer to the lowest. In the following output sample, this client has some database contention going on that is holding up other ColdFusion threads, causing scalability issues.
rank | self | accum | count | trace | method |
---|---|---|---|---|---|
1 | 36.64% | 36.64% | 104470 | 300685 | java.net.SocketInputStream.socketRead0 |
2 | 31.45% | 68.08% | 89676 | 301551 | java.net.PlainDatagramSocketImpl.receive0 |
3 | 20.35% | 88.43% | 58023 | 300553 | java.net.PlainSocketImpl.socketAccept |
4 | 10.40% | 98.83% | 29655 | 301093 | java.net.SocketInputStream.socketRead0 |
5 | 0.09% | 98.92% | 258 | 301720 | com.rsa.jsafe.crypto.ap.a |
As you can see, the report goes top down, from the most CPU time spent in a method down to the least (the rest is not listed here for brevity). It gives you the count (the number of samples in which this trace was found active), the trace (a stack trace ID; the full traces are listed earlier in the output file and can be helpful in seeing how these methods are being called), and the method itself (which quickly shows you what exactly is being invoked). To find the associated stack trace, simply copy the trace number from this table and search for it within the report. In this sample output all of the classes are core Java classes, but I have seen CFCs show up in a stack trace as large consumers of resources as well. This is a great investigative tool when you are not sure what is taking up so much CPU, or why your application is not running as quickly as it should.
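On a Unix-like system, grep makes that search quick. For example, to pull the stack for the top-ranked sample above (trace 300685), along with a few lines of context from the report:

grep -A 10 "TRACE 300685" java.hprof.txt

The -A 10 flag prints the ten lines after the match; adjust it to match the stack depth you asked hprof to record.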
-agentlib:hprof=heap=sites,depth=4
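Like the CPU switch, this one goes right after -server in java.args; you would swap it in for the cpu=samples argument, or combine the two option sets with a comma. A sketch, reusing the default line from earlier:

java.args=-server -agentlib:hprof=heap=sites,depth=4 -Xmx512m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/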
This jvm.args switch instructs hprof to provide an analysis of the contents of your heap memory at the time the application exits. The sites argument tells the JVM to group the output by the stack trace at which each object was allocated. The depth argument requests a specific depth of stack trace (each additional level makes the file take longer to generate and makes it that much larger, so only ask for the minimal depth you need to figure out where the code is originating from). This is a pretty poor sample, but if you have an object you are instantiating that is not going away, it will stand out very quickly when you view this output:
rank | self | accum | live bytes | live objs | alloc’ed bytes | alloc’ed objs | stack trace | class name |
---|---|---|---|---|---|---|---|---|
1 | 29.94% | 29.94% | 170401344 | 2812 | 172626464 | 3148 | 461666 | byte[] |
2 | 20.44% | 50.38% | 116307712 | 601 | 121153664 | 700 | 461595 | byte[] |
3 | 10.82% | 61.20% | 61554880 | 514 | 61554880 | 514 | 461600 | byte[] |
This output is telling you that roughly 30% of the live bytes in the heap belong to byte[] objects (arrays of bytes) allocated at one site, that fewer bytes are marked as live than were allocated (for the top site, 172,626,464 bytes across 3,148 objects were allocated, but only 170,401,344 bytes across 2,812 objects are still live, so roughly 2 MB spread over 336 objects has already been collected or is ready for the garbage collector), and that these byte[] objects were created by the methods detailed in stack trace 461666. That stack trace looks like this:
TRACE 461666:
	com.adobe.fontengine.font.FontByteArray.<init>(FontByteArray.java:56)
	com.adobe.fontengine.font.FontByteArray.<init>(FontByteArray.java:63)
	com.adobe.fontengine.font.opentype.FontFactory$TableToLoad.load(FontFactory.java:218)
	com.adobe.fontengine.font.opentype.FontFactory$TableToLoad.load(FontFactory.java:259)
Sometimes these traces can be helpful in isolating the source of an issue, or at least in giving you a direction on which to base your investigation. When you have a stumper, sometimes a direction is all you need to get on the path to a resolution.