The amazing adventures of Doug Hughes

I’m sure everyone has run into an issue that is difficult to isolate on a given system. Sometimes all the code looks fine, and none of the .cfm templates are really standing up and shouting “I’m your problem!” What do you do when you are stumped and not sure where else to look for your performance woes?
Enter HPROF.

HPROF is a Java profiler that you can invoke by modifying your jvm.config file. On a ColdFusion multiserver install, that config file lives by default at {coldfusion root}/bin/jvm.config. When you open it up, you will see a few configuration variables being set, such as java.home (which tells ColdFusion which JVM to use when starting up), java.library.path (which sets the location for shared libraries to be loaded), and more, but the one we are concerned with is java.args. By default it will look something like this:

java.args=-server -Xmx512m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/

Once you modify your jvm.config file with the HPROF arguments (simply add the -agentlib:hprof=cpu=samples switch, or the heap=sites variant covered below, after the -server argument), you are instructing the JVM to gather the profiler information you specify in the hprof arguments and then, once the JVM shuts down, dump the collected data to a file. By default, the file is called java.hprof.txt and is written to your {coldfusion root}/bin folder. So, to recap:

  1. Modify your jvm.config file to add the -agentlib argument, as shown in the example below
  2. Restart ColdFusion with the modified config file
  3. Run a good sample of your application (my preference is to run some JMeter testing)
  4. Stop ColdFusion (which triggers the generation of the java.hprof.txt file)
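
For example, assuming your java.args line still matches the default shown above, adding the CPU sampling switch would look something like this (this is simply the stock line from above with -agentlib:hprof=cpu=samples spliced in after -server; match it against whatever is already in your own jvm.config):

java.args=-server -agentlib:hprof=cpu=samples -Xmx512m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/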

HPROF is a tool that is meant to be used locally for development, or on a test or staging server. Running most of these arguments *WILL* impact the performance of your application, so please do not run this on a production server unless all other avenues of investigation have been exhausted. HPROF has many different switches, but there are two that I have found to be the most useful so far.
-agentlib:hprof=cpu=samples
Adding this switch asks the JVM to gather timing data for each method call by periodic sampling. The report that comes back lists the Java methods consuming the most CPU, ordered from highest consumer to lowest. In the following output sample, this client has some database contention going on that is holding up other ColdFusion threads, causing scalability issues.

rank   self    accum    count    trace    method
   1   36.64%  36.64%   104470   300685   java.net.SocketInputStream.socketRead0
   2   31.45%  68.08%    89676   301551   java.net.PlainDatagramSocketImpl.receive0
   3   20.35%  88.43%    58023   300553   java.net.PlainSocketImpl.socketAccept
   4   10.40%  98.83%    29655   301093   java.net.SocketInputStream.socketRead0
   5    0.09%  98.92%      258   301720   com.rsa.jsafe.crypto.ap.a

As you can see, the report goes top down, from the most CPU time spent in a method down to the least (not listed here for brevity). It gives you the count (the number of times the method showed up in the samples), the trace number (this refers to a stack trace ID; the full traces are listed earlier in the output file and can be helpful in finding how these methods are called), and the method itself (which quickly shows you exactly what is being invoked). To find the associated stack trace, simply copy the trace number from this table and search for it within the report. In this sample output all of the classes are core Java, but I have seen CFCs in a stack trace show up as large consumers of resources as well. This is a great investigative tool when you are not sure what is taking up so much CPU, or why your application is not running as quickly as it should.
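
If you have grep handy, the quickest way to pull up the trace behind a given entry is to search the report for its trace ID. For example, something along these lines (assuming a GNU-style grep, where -A simply prints that many lines following the match) would show the stack trace behind the top entry above:

grep -A 10 "TRACE 300685:" java.hprof.txt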
-agentlib:hprof=heap=sites,depth=4
This jvm.args switch instructs HPROF to provide an analysis of the contents of your heap memory at the time the application exits. The sites argument tells the JVM to group the output based on a stack trace of a specific depth. The depth argument requests a specific depth of stack trace (each additional depth level makes the report take that much longer to produce and that much larger, so only ask for stack traces of the minimal depth needed to figure out where the code is originating from). This is a pretty poor sample, but if you have an object you keep instantiating and it’s not going away, it will stand out very quickly when you view this output:

rank   self    accum    live bytes   live objs   alloc’ed bytes   alloc’ed objs   stack trace   class name
   1   29.94%  29.94%    170401344        2812        172626464            3148        461666   byte[]
   2   20.44%  50.38%    116307712         601        121153664             700        461595   byte[]
   3   10.82%  61.20%     61554880         514         61554880             514        461600   byte[]

This output is telling you that nearly 30% of the live bytes in the heap are held by objects of type byte[] (an array of bytes), that fewer bytes are live than were allocated (some of what was allocated has already been, or is about to be, garbage collected), and that these byte[] objects were created by the methods detailed in stack trace 461666. That stack trace looks like this:

TRACE 461666:
com.adobe.fontengine.font.FontByteArray.<init>(FontByteArray.java:56)
com.adobe.fontengine.font.FontByteArray.<init>(FontByteArray.java:63)
com.adobe.fontengine.font.opentype.FontFactory$TableToLoad.load(FontFactory.java:218)
com.adobe.fontengine.font.opentype.FontFactory$TableToLoad.load(FontFactory.java:259)

Sometimes these traces can be helpful in isolating the source of an issue, or at least give you a direction on which to base your investigation. When you have a stumper, sometimes a direction is all you need to get on the path to a resolution.

Comments on: "Using hprof with Coldfusion to troubleshoot slow code issues" (2)

  1. Doug Hughes said:

    Chris – Can you translate the two tables above and tell us what you see in them? The top one looks to me like there are a lot of network-related issues eating up CPU time. Could this be something like a slow database connection?

  2. Chris Peterson said:

    Yes, the top table is showing me that nearly 100% of the CPU time spent processing in this application (which comprises both ColdFusion base classes and my .cfm / .cfc pages) was spent talking to a socket object, aka communicating over TCP/IP. While this could be something like cfftp or cfhttp requests, my background knowledge of this application led me to further investigate their use of a PostgreSQL database. It turns out that this holdup was being caused by a cftransaction wrapping a database lock and a query. This was intended to ensure that users would not hit a race condition when talking to their user information table, but instead it was causing the entire application to basically stand in a single-file line to speak to the database. After some refactoring we got this issue worked out, while maintaining the integrity of their data, using some less-global locking.
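
    To give a rough idea of what I mean by less-global locking (this is only a sketch with made-up names, not the actual client code), the change was in the spirit of the following: lock on a name scoped to the individual user instead of taking a database-level lock, so unrelated requests no longer queue behind one another:

    <!--- Illustrative sketch only: the datasource "usersDSN", the users table and
          its columns, and the userID variable are hypothetical placeholders. --->
    <cflock name="userinfo_#userID#" type="exclusive" timeout="10">
        <cftransaction>
            <cfquery datasource="usersDSN">
                UPDATE users
                SET    last_visit = <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp">
                WHERE  user_id    = <cfqueryparam value="#userID#" cfsqltype="cf_sql_integer">
            </cfquery>
        </cftransaction>
    </cflock>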

    The bottom table, in this instance, really is not helping us very much (as the issue was with a TCP/IP connection, not a heap memory issue). If you are getting out-of-memory errors, or seeing gradual memory creep over time, this second method using hprof=heap=sites will help show you the objects that are staying in memory and where they came from.
