The amazing adventures of Doug Hughes

This is a real world and recent example of tuning a ColdFusion application built in ColdSpring, ModelGlue and Reactor. Here are the details of the system and software this application is running on.

  • Windows Server 2003 – IIS6
  • ColdFusion 8 Enterprise with Sun JVM 1.6
  • ColdSpring
  • ModelGlue 2.0.304(2)
  • Reactor revision 4.29

In the interests of really making this posting interesting I am going to start with two graphs. Both show the performance of the application under load using a load test tool with scripts modeled from user behavior. This first graphic shows response times in our first load tests and before any tuning.

Before

This graphic shows a spectrum of request response times during a 15 Minute, 50 vUser load test with 8 second think times between “clicks. The bottom left access shows request response times with the fastest on the left progressing to the slowest at the bottom right axis point. The bottom right to farthest right shows the duration of the test, from the start at the axis point to the end at the far right.

I realize the numbers are hard to read but by looking at the 3D colored bars we can see that most of the requests response times are in the mid to high range, many of them being in excess of 5 seconds in duration. The next graphic shows the performance results of the application running the same 15 Minute, 50 vUser load test with 8 second think times load test, after JVM, code and SQL tuning.

After

The bottom left access shows request response times with the fastest on the left progressing to the slowest at the bottom right axis point. The bottom right to farthest right shows the duration of the test, from the start at the axis point to the end at the far right. I realize the numbers are hard to read but by looking at the 3D colored bars we can see that most of the requests response times are now much faster, many of them under 1 second in duration. So that is the graphical representation of what we achieved, here is what we achieved and how we achieved it.

The JVM

I firmly believe in starting by optimally tuning the JVM before the application code, SQL etc. My reason is this; in the web application world, most applications consist of objects that are created and hopefully destroyed fairly quickly. That is because the request lifecycle is fairly short lived, as opposed to something like a client-server paradigm where objects can and often will remain referenced for considerable amounts of time. For this reason there are typically some overall good practices that I have found to be beneficial, here they are.

Before making any changes to the JVM I enabled verbose garbage collection logging and referred to that log throughout the testing. Full details of how to do that can be found in This previous blog posting.

  • Set a Start Size for the JVM Memory Space (The Heap). This is done by passing the Xms{number}m (where {number} is a numeric value) argument to the JVM at start up. I would typically set this to 50% of the size set in the Xmx setting. If there is no Xmx setting I have found that safe “rule of thumb” settings of -Xms512m and -Xmx1024m, where a server has a minimum of 2GB of Ram. I like to do this by adding it directly to the jvm.config file in the Arguments to VM. The reason that I favor directly editing the jvm.config file is because there have been occasions where doing this via the CF Admin GUI has caused problems with the jvm.config file not being written correctly. In fairness, this was in earlier versions of ColdFusion. For the server I am reporting on in this test the setting was set to -Xms512m -Xmx1024m.
  • Another good practice is the set a start size for the Permanent Generation of the JVM Heap. On looking at the verbose garbage collection logs I often see this generation, also known as the permgen; fills up. The permanent generation is where the Servlet Container (JRun) has it’s working objects and I like to set both a start and maximum size by passing these settings via the Arguments to VM – -XX:PermSize=128m -XX:MaxPermSize=192m. These are the settings for the server in this test and again are usually fine on a system with a minimum of 2GB of RAM. The permanent generation is in addition to the other maximum and minimum that we set in the paragraph above, not a part of it.
  • Full Garbage Full GC’s) collections are another issue which comes very much down to tuning to find the right balance of enough Full GC’s and not too many. From what I have seen and this is a general observation. the Sun 1.4x and prior JVM’s seem to run Full GC’s at 60 second intervals. The 1.5 and 1.6 JVM’s seem to run too few Full GC’s, I have seen almost out of memory issues and the Xmx (maximum heap size setting) does not always kick in properly on 1.5 and 1.6 JVM’s. Typically I set explicit Full GC’s to occur on a regular basis, I have found that setting them to occur at 10 minute intervals is a good interval. However, in the case of this particular system I was load-testing, I was still seeing the heap running low on memory. So I set Full GC’s to run at 5 minute intervals with the following via the Arguments to VM -Dsun.rmi.dgc.client.gcInterval=300000 -Dsun.rmi.dgc.server.gcInterval=300000. I am still working through what effects the 1.5 and 1.6 Sun JVM’s have on ColdFusion and Flex/Air based applications.

All the settings here were arrived at by testing different settings against both the default CF8 JVM (Sun 1.6) and also the 1.5 as suggested in other postings to be a better JVM for cfc heavy apps; as this one certainly is. In actual fact and after doing lots of load testing the default 1.6 JVM proved to be much more responsive; I will enumerate this shortly.

ColdFusion

The first thing I did in ColdFusion was to establish Metrics Logging and split out the JRun logs by using the log.level modifier. This is really useful in my opinion, as the state of the memory, number of threads in use and if enabled number of current sessions are shown at regular intervals in the metrics log and the out log. This latter log is particularly useful in troubleshooting as all errors end up there. Being able to see problems and the state of threads and memory at or around the same time can be really helpful in pinning down cause and effect.

There were two settings in CF Admin that I also set, the first was to turn on Trusted Cache which is recommended for Production servers. With a formal deployment process in place, Production code should only change as and when needed, so Trusted Cache is a very beneficial setting. In addition I turned off the saving of Class Files to disk. Once again, these changes were made in a controlled fashion and load tested after changing to determine their effects. At this point I had tuned the JVM and ColdFusion and as a result we saw average request response times drop from over 10 seconds to just over 4 seconds; so this was a good start achieved by server side tweaks only.

SeeFusion

This is a ColdFusion 8 installation so I am sure there are those who could rightly question why I would choose to use a third-party tool in addition. I have used SeeFusion since before it was officially launched; I was a beta tester. What I found is that SeeFusion would often continue to respond after ColdFusion had stopped responding altogether. This is because SeeFusion runs at the Java layer and the Servlet Container and the JVM will often continue responding and reporting after the Application server has failed. In addition, it is very easy to set up SeeFusion to log information to a database and I had set this up before starting the load testing. This information enabled us to now start homing in on code level-SQL issues. Here is a screen shot taken during the initial load test of the JVM 1.5 tuning, this may be hard to read but we are almost out of heap memory, indicated by the yellow bar graphic at top left and the bottom left graph; I have found that both 1.5 1nd 1.6 Sun JVM’s have different memory handling characteristics to 1.4 and prior in the sense that Full GC’s occur far less frequently; hence my comments above in the JVM tuning section.

Here are some blow-ups of the “almost-out-of-heap-memory” parts of the SeeFusion monitored load test. Firstly the bar chart showing a yellow warning status…

And here is the bottom right graph showing the rapid rise in memory usage as the tests progressed…

Code and SQL

At this point I consulted the considerable programming-architectural talents of Alagad. By using the SeeFusion data logged into a SQL Server database we began to focus on the slower responding requests. Here are some examples with some pointers to what we interpreted them to be.

In this sort of situation we can see that the “TotalTimeMS” and “TimeToFirstByteMs” are very close but that “TotalQueryTime” is low; these are all milliseconds by the way. What this suggests is that there is either a network-communication issue back to the browser or a lot of ColdFusion churn to produce the GUI.

Here we see a fairly long “TotalTimeMs” with “NumQueries” of 16. We identified this as possibly a lot of queries for a single request and were able to reduce this to 3 queries after investigation.

Here we see “TotalQueryTime” here shows as 19095 which of course is very long so we looked at the SQL involved in this query and were able to reduce the run time down to just over 2 seconds. We carried on identifying long running code and queries, doing this via the SeeFusion data, in reality there are usually not that many different slow requests and by using this methodology we improved performance dramatically by methodically dealing with each prioritized issue.

This was a typical, successful application tuning engagement. At the commencement of the assignment we had slow application response time. In our initial load testing a 50 concurrent vUser (virtual user) test produced average request times in excess of 10 seconds. By the end of the engagement we had average page response times at just over 500MS.

After tuning the JVM and the application, I wanted to go back and run some more tests on both the 1.5 and 1.6 JVM’s just to make sure that the results I had been getting throughout this assignment, were still borne out; that 1.5 was not more efficient than 1.6 in ongoing usage; I had NOT set out to prove that by the way, in fact it was the first thing I tried at the onset of load-testing was to switch to 1.5.

Firstly, here is the summary of the 50 vUser test on the default CF 8 JVM (Sun Java 6 or 1.6):

** Results per URL for complete test **
URL#1 (): Average Click Time 302 ms, 300 Clicks, 0 Errors
URL#2 (): Average Click Time 237 ms, 300 Clicks, 0 Errors
URL#3 (): Average Click Time 244 ms, 300 Clicks, 0 Errors
URL#4 (): Average Click Time 241 ms, 300 Clicks, 0 Errors
URL#5 (): Average Click Time 248 ms, 300 Clicks, 0 Errors
URL#6 (): Average Click Time 191 ms, 300 Clicks, 0 Errors
URL#7 (): Average Click Time 194 ms, 300 Clicks, 0 Errors
URL#8 (): Average Click Time 210 ms, 283 Clicks, 0 Errors
URL#9 (): Average Click Time 254 ms, 270 Clicks, 0 Errors
URL#10 (): Average Click Time 299 ms, 257 Clicks, 0 Errors
URL#11 (): Average Click Time 488 ms, 251 Clicks, 0 Errors
URL#12 (): Average Click Time 4,573 ms, 250 Clicks, 0 Errors
URL#13 (): Average Click Time 972 ms, 250 Clicks, 0 Errors
URL#14 (): Average Click Time 501 ms, 250 Clicks, 0 Errors
URL#15 (): Average Click Time 279 ms, 250 Clicks, 0 Errors
URL#16 (): Average Click Time 249 ms, 250 Clicks, 0 Errors
URL#17 (): Average Click Time 333 ms, 250 Clicks, 0 Errors
URL#18 (): Average Click Time 1,370 ms, 250 Clicks, 0 Errors
URL#19 (): Average Click Time 615 ms, 250 Clicks, 0 Errors
Total Number of Clicks: 5,161 (0 Errors)
Average Click Time of all URLs: 590 ms

Here is the summary of the same 50 vUser test on the Sun Java 5 or 1.5.

** Results per URL for complete test **
URL#1 (): Average Click Time 1,078 ms, 254 Clicks, 0 Errors
URL#2 (): Average Click Time 877 ms, 250 Clicks, 0 Errors
URL#3 (): Average Click Time 1,111 ms, 250 Clicks, 0 Errors
URL#4 (): Average Click Time 1,450 ms, 250 Clicks, 0 Errors
URL#5 (): Average Click Time 1,445 ms, 250 Clicks, 0 Errors
URL#6 (): Average Click Time 888 ms, 250 Clicks, 0 Errors
URL#7 (): Average Click Time 683 ms, 250 Clicks, 0 Errors
URL#8 (): Average Click Time 496 ms, 250 Clicks, 0 Errors
URL#9 (): Average Click Time 485 ms, 250 Clicks, 0 Errors
URL#10 (): Average Click Time 492 ms, 250 Clicks, 0 Errors
URL#11 (): Average Click Time 1,006 ms, 250 Clicks, 0 Errors
URL#12 (): Average Click Time 8,400 ms, 250 Clicks, 0 Errors
URL#13 (): Average Click Time 2,254 ms, 250 Clicks, 0 Errors
URL#14 (): Average Click Time 1,380 ms, 250 Clicks, 0 Errors
URL#15 (): Average Click Time 810 ms, 250 Clicks, 0 Errors
URL#16 (): Average Click Time 709 ms, 242 Clicks, 0 Errors
URL#17 (): Average Click Time 749 ms, 221 Clicks, 0 Errors
URL#18 (): Average Click Time 3,849 ms, 211 Clicks, 0 Errors
URL#19 (): Average Click Time 1,743 ms, 206 Clicks, 0 Errors
Total Number of Clicks: 4,634 (0 Errors)
Average Click Time of all URLs: 1,559 ms

I have found that the only way to really determine the correct garbage collector to use for your applications and what the correct arguments are to pass to the JVM, via the jvm.config file, to effectively manage garbage collections is to first determine how many Full GC’s are occurring and control that if there are too many. Then load-test the application whilst tuning the arguments passed to the JVM, typically via the jvm.config file.

Comments on: "Details of Tuning A ColdSpring, Model Glue, Reactor Application On ColdFusion 8" (12)

  1. Brian Kotek said:

    I’m interested to hear more. Without a doubt, application startup time on CFC-heavy applications is far, far faster on 1.5 than on 1.6. Since this seems to be the major pain point, and since application startup is single-threaded (meaning load isn’t an issue there that I can see), I think that is why people are making this change. If the actual running of the fully-initialized application is of similar or slightly slower speed on 1.5, that may well be a worthy tradeoff. I know of many people who were seeing startup times in the MINUTES on 1.6, which is clearly unacceptable unless you have the luxury of a cluster of servers where you initialize the app in isolation and then make it available in the cluster. I’ll stay tuned to see what you’ve found.

    Like

  2. Mike Brunt said:

    @Brian, thanks for your comment and I have read of the issues being faced in some cases with the 1.6 JVM. I will publish details of what we did and what the results were. My thought is thia, as we hopefully should not have to restart Production instances very often (if we do have to restart often I would want to know why), that ongoing performance is far more important to most site users.

    Like

  3. Jaime Metcher said:

    This isn’t just about startup.

    1. Frameworks that do significant amounts of on-the-fly generation of CFCs (like Transfer) will provoke the class loader slowdown any time they are initialized – which will usually, but not necessarily, be on server startup.

    2. The other situation in which this can be a problem is when classes are dropped from the tag cache and have to be reloaded – which, once again, heavy code generators are more likely to provoke, and by definition will happen after server startup.

    3. Finally, every cfm page is compiled into a class file when it is first requested, so if you have a lot of pages the slowdown might be triggered arbitrarily by any heavy access of a particular site.

    Like

  4. Mike Brunt said:

    @Jamie thanks for your comments and insights. I am actually going to follow this up with a detailed run through of a recent tuning engagement I was working on. All of the cfc based frameworks I mentioned here ColdSpring, Model Glue and Reactor do have Production modes, as you are probably aware. So in theory they should need to initialized only the first time they are used, the cfc’s that is. This can be overridden though either by hard coding the mode in other than Production or by passing in an initialization command typically via URL.

    So you are quite correct it may not only be at server-instance start-up. When I get the time I will re-test the application I was tuning and pass in an initialization command whilst a load test is running.

    Like

  5. Aaron Longnion said:

    Thanks for the thorough analysis and write-up. The CF community really needs more information like this for tuning our apps under load… especially if we want the rest of the world to respect CF as enterprise-class software.

    I’d also be interested to see what your findings are running similar tests against procedural, non-CFC-heavy CF code, as many apps in the real world are running like this in my experience.

    Thanks so much!

    Like

  6. Mike Brunt said:

    @Aaron, thank you very much for the comments and I fully agree with you about pushing the perception of ColdFusion or more correctly CFML higher in Enterprise environments. The other major aspect of Enterprise CF that I will be blogging more on is the HA-Clustering aspect both internally and when CF is clustered with Hardware Clustering Device.

    I also agree with you regarding non OO implementations of CF which are still the vast majority of architectures out there. The relevance of this particular article was driven by the fact that these three paradigms all exist in the application scope and as such will in theory create more classes in the JVM heap memory.

    Like

  7. Leon Miller-Out said:

    Mike, this is a great article! Thanks for sharing with us.

    I was wondering what tools you use for running your virtual users and generating those cool 3D graphs. Can you fill me in?

    Like

  8. Mike Brunt said:

    @Leon, thanks for the kind comments. I use a tool from a German company called Paessler GmbH their tool is “Web Server Stress Tool”. It is a great tool for the price, as higher end tools can cost into the tens of thousands of dollars and the Paessler tool is in the hundreds of dollars .

    Like

  9. Curious: Have any of you looked at this tool? http://www.microsoft.com/downloads/details.aspx?FamilyID=e2c0585a-062a-439e-a67d-75a89aa36495&DisplayLang=en

    And thank you guys at Alagad for these great posts.

    Like

  10. Mike Brunt said:

    @noname sorry we don’t have a name on your comments and thanks for your words. I did work with clients who used the Microsoft tool but have not used it myself. The Paessler tool is perfect for what I have needed because clients often want some sort of report and Paessler tool produces good looking reports fairly painlessly. It is also very easy to get up and running quickly yet can be extended with .csv, .txt and even scripts if needed.

    Like

  11. Brian Kotek said:

    Mike, out of curiosity, did you ever revisit this to check for the difference in startup times due to the Java 6 classloader bug? I’m still interested to know under what conditions it is worth leaving it at 1.6 or downgrading to 1.5. Thanks!

    Like

  12. Mike Brunt said:

    @Brian I am still sifting through this, the application that sparked part of the 1.6 to 1.5 debate has some serious overall issues which go being slow instantiation.

    There are a series of load tests I will be blogging about which ostensibly relate to clustering which will revisit this issue.

    Like

Comments are closed.

Tag Cloud

%d bloggers like this: