The amazing adventures of Doug Hughes

I am going to quickly summarize what we did to tune a client's CF instance over the past week. The work came out of load testing in our LA lab against the client application, which had been running very slowly. This was a Windows 2003, CF8 Enterprise install, and the application was built with Model-Glue, Reactor and ColdSpring. One point to note: we achieved a good improvement without making any code changes as yet; the gains came purely from server-side JVM tweaks.

The first thing we did was add verbose garbage collection logging. This showed me that we were getting almost out of memory before a major garbage collection (Full GC) ran. This was with the default 1.6 JVM which comes with CF8. At this point we switched to the 1.5 JVM, which is recommended in some circles for “cfc-heavy” applications. This gave no improvement.
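As a complement to the GC log, the same picture (how full each memory area is) can be read from inside a running JVM. The following is only a minimal sketch using the standard `java.lang.management` API; the class name `MemoryPoolReport` is made up for illustration, and the code as written needs Java 8 or later to compile, so it is not something that would have run on the CF8-era JVMs discussed here. Pool names differ between JVM versions and collectors: older HotSpot JVMs report a permanent generation pool, while Java 8 and later report Metaspace instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every memory pool the running JVM exposes.
class MemoryPoolReport {

    /** Returns one formatted line per memory pool (name, used MB, max MB). */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long usedMb = usage.getUsed() / (1024 * 1024);
            long max = usage.getMax(); // -1 means the maximum is undefined
            String maxText = max < 0 ? "undefined" : (max / (1024 * 1024)) + " MB";
            lines.add(String.format("%-30s used=%d MB max=%s",
                    pool.getName(), usedMb, maxText));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

A pool whose used figure sits close to its maximum between collections is the kind of symptom the verbose GC log showed here.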

Analyzing the verbose garbage collection statistics, we found that the permanent generation, “permgen” (which is where the JRun-CF classes sit), was running out of memory regularly. So we added a start size of 128MB to permgen and added an argument to run explicit Full GCs every 5 minutes, because with Java 1.5 and 1.6 Full GCs do not seem to run until we are almost out of memory.
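To make the two settings concrete, here is a small sketch that turns "128MB permgen start size" and "every 5 minutes" into argument strings. It assumes the interval is set through the `sun.rmi.dgc.client.gcInterval` and `sun.rmi.dgc.server.gcInterval` properties, which take milliseconds and which appear in the arguments posted in the comments below; the class and method names are hypothetical.

```java
// Hypothetical helper: builds the JVM arguments described in the paragraph above.
class GcIntervalArgs {

    /** Converts minutes to the millisecond value the gcInterval properties expect. */
    static long minutesToMillis(long minutes) {
        return minutes * 60L * 1000L;
    }

    /** Builds both RMI distributed-GC interval properties for the given interval. */
    static String rmiGcIntervalArgs(long minutes) {
        long ms = minutesToMillis(minutes);
        return "-Dsun.rmi.dgc.client.gcInterval=" + ms
                + " -Dsun.rmi.dgc.server.gcInterval=" + ms;
    }

    /** Builds the permgen start-size argument for a size given in megabytes. */
    static String permSizeArg(int megabytes) {
        return "-XX:PermSize=" + megabytes + "m";
    }

    public static void main(String[] args) {
        // Prints the permgen and 5-minute interval arguments on one line.
        System.out.println(permSizeArg(128) + " " + rmiGcIntervalArgs(5));
    }
}
```

Five minutes works out to 300000 ms, which matches the value in the posted `java.args` line. `-XX:PermSize` only applies to JVMs that still have a permanent generation (Java 7 and earlier).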

I also added a start size of 512MB to the overall heap. The permgen, by the way, is in addition to the overall heap, not part of it. I also made two changes in CF Admin: I turned off the saving of class files and turned on Trusted Cache. Trusted Cache is not the beast it used to be and is well worth using in production environments. The net effect was this: when I ran a 90-minute, 50-concurrent-user load test on the client application before the changes, we got these results…

  • Total Number of Clicks: 17,136 (10 Errors)
  • Average Click Time of all URLs: 7,638 ms

After the various changes shown above, the same load test produced these results…

  • Total Number of Clicks: 26,153 (0 Errors)
  • Average Click Time of all URLs: 2,061 ms
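For a sense of scale, the relative change between the two runs can be worked out directly from the four figures above. This is just illustrative arithmetic; the class and method names are invented for the example.

```java
// Hypothetical helper: relative change between the "before" and "after" load tests.
class LoadTestDelta {

    /** Percentage change from before to after (positive means an increase). */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures taken from the two result lists above.
        double clicks = percentChange(17136, 26153);
        double avgClickMs = percentChange(7638, 2061);
        System.out.println("Total clicks change (%): " + clicks);
        System.out.println("Average click time change (%): " + avgClickMs);
    }
}
```

That is roughly 53% more clicks served in the same 90 minutes, with the average click time down by about 73%.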

I believe that probably 90% of CF web sites out there need to have the server/JVM tuned (actually, whilst I was with Allaire-MM, every server/JVM from CFMX on needed to be tuned). The issue is that few people look in the logs at all, and if there is slowness they typically throw hardware at it and sometimes throw ColdFusion out. If you are having performance problems, or if you just want to know what is going on with the JVM and garbage collection, ping us; we really can help.

Comments on: "Don't Throw ColdFusion Out! We Can Help." (39)

  1. Marc Esher said:

    Great post, Mike.

    I’m curious: what do you think would’ve been the net effect if you had done all changes except the every-5-minute GC? And, if you had to say where you got the greatest mileage, in this specific case, could you pinpoint it?

    Finally, with the explicit GC every 5 minutes, did you notice that every 5 minutes the server bogged?

    Thanks.

    Marc


  2. Mike Brunt said:

    @Marc, thanks for your comment and questions. I actually tried many combinations, including a 10-minute explicit Full GC and none at all, with the other changes applied. There is a considerable difference between the 1.3.x and 1.4.x JVMs and 1.5 and 1.6. In 1.3 and 1.4 a default CF install always ended up with explicit Full GCs every 60 seconds; I typically would set this to every 10 minutes. In 1.5 and 1.6 it seems that in applications with a lot of classes and objects, such as those heavily based on CFCs, the default install has too few Full GCs, resulting in an almost-out-of-memory condition before a Full GC occurs. With this client application even 10 minutes was too infrequent, so I found 5 to be best.

    One takeaway from this is that different apps need different settings to perform optimally. In addition, it is imperative to tune the server/JVM before working on code fixes. Those are my findings.


  3. Marc Esher said:

    Fascinating, Mike. Were you able to gather about how long the Full GCs took after you put them down to 5 minutes? And have you gotten much mileage out of hooking up a profiler like Yourkit to a CF app?

    Thanks.

    Marc


  4. Mike Brunt said:

    @Marc, yes: regarding Full GC times, the average was 0.7 seconds; I always check that number. No regarding a profiler, because I use SeeFusion, which usually gives me enough information relating directly to requests, queries run and the stack. I will check out YourKit though.


  5. Can you post here what your JVM Arguments look like?

    For CF 7 is 1.5 recommended? Using 1.4.2_13 now.


  6. Mike Brunt said:

    @Marc, I tried to install YourKit and add JRun as a profiled server and hit this error:

    ERROR: JDWP unable to get necessary JVMTI capabilities.
    [YourKit Java Profiler 7.0.10] Using JVMTI (1.6.0_01-b06;Sun Microsystems Inc.;mixed mode, sharing;Windows;32 bit JVM).

    There is a known issue with YourKit on Java 5 but not 6. I forwarded this issue to their tech support; let’s see what happens.


  7. Mike Brunt said:

    @Brian here are the jvm arguments…

    java.args=-server -Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=192m -Dsun.rmi.dgc.client.gcInterval=300000 -Dsun.rmi.dgc.server.gcInterval=300000 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -verbose:gc -Xloggc:getchecked.log -Dsun.io.useCanonCaches=false -Dcoldfusion.rootDir={application.home}/ -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=28999,suspend=n

    I have never tried 1.5 with CFMX7. Do you have a particular reason to want to do that?


  8. Marc Esher said:

    I’ve had pretty limited success with using a profiler on CF apps, mostly because I think they take me down a wrong path. For me it’s been a last resort. You know, along the lines of “What variable(s) are holding all the data that’s causing these OOM errors”. And I think that’s just an offshoot of what you’re saying, which is not to look at the code first. B/c when you’re looking at the code, you’re in essence trying to figure out what variables are holding too many bytes.

    Very recently at work, a colleague had a problem with OOM errors and was getting nowhere because she was spending hours staring at code. I don’t know that they ever resolved it so I’m going to point her to your blog and see if she can use it.

    One question I did want to ask: with a max heap of 1024, how much total memory do you have on a machine like that? The reason I ask is that when we tried 1024 on a 2G machine, JRUN crashed less but would simply get unstable/unresponsive but not completely die.

    Again, great post. Your one the other day with all the JVM goodness was excellent, too. I really dig what you’re putting out there, Mike. Good stuff.

    Marc


  9. Mike Brunt said:

    @Marc, thanks for your comments they are appreciated. With regard to the 1024 memory this is a 4GB


  10. Mike Brunt said:

    @Marc, I’ll try that again. This is a 4GB box, but one thing to remember is that with only one jvm.config file, every instance in an Enterprise install will inherit the settings. Also, the 1024MB in this case is the maximum heap size; the start size is 512MB. So at a maximum of 3 instances on a 4GB box I would want to be adding more RAM or creating multiple .config files.

    In my experience, increasing the heap size is rarely the solution, it helps a little but if a J2EE container is hurting there are usually other significant causes. Finding those causes can sometimes be very easy and sometimes a series of painstaking, one change at a time, laborious steps.

    I have a lot more to blog about and will in the hope that it helps the community.
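The instance arithmetic in this comment can be sketched roughly as follows. It assumes every instance inherits the same maximums from the single jvm.config (1024MB heap plus the 192MB `-XX:MaxPermSize` from the posted arguments) and that each could eventually grow to them; the class and method names are hypothetical, and a real JVM also uses native memory beyond heap and permgen (thread stacks, for example), so the true footprint is higher.

```java
// Hypothetical helper: worst-case heap + permgen footprint across instances.
class InstanceMemoryBudget {

    /** Worst-case MB if every instance grows to its max heap and max permgen. */
    static long worstCaseMb(int instances, long maxHeapMb, long maxPermMb) {
        return instances * (maxHeapMb + maxPermMb);
    }

    /** MB of physical RAM left over (negative means over-committed). */
    static long headroomMb(long ramMb, int instances, long maxHeapMb, long maxPermMb) {
        return ramMb - worstCaseMb(instances, maxHeapMb, maxPermMb);
    }

    public static void main(String[] args) {
        long ramMb = 4096; // the 4GB box mentioned in the comment
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " instance(s): worst case "
                    + worstCaseMb(n, 1024, 192) + " MB, headroom "
                    + headroomMb(ramMb, n, 1024, 192) + " MB");
        }
    }
}
```

Under these assumptions three instances could claim 3648MB of the 4096MB, leaving very little for the operating system, and a fourth would over-commit the box.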


  11. David Stockton said:

    Mike,

    I read with interest your CF/JVM notes and usually agree with your postings. However there is one comment here I find unusual…

    “In addition it is imperative to tune server-JVM before working on code fixes.”

    I always tend to the opposite… Working with FusionReactor and other tools to monitor the platform memory profile, you will find that code changes can change an application’s memory profile. This will obviously change the effectiveness of any JVM tuning you do. Thus, the platform will require re-tuning after code changes. I also like to be wary of GC tuning as a first step, because you can “tune out” your code issues and potentially postpone the seriousness of an architecture design flaw to another day.

    Whilst JVM tuning will give you some “breathing space” for the short-term, it can mask any long-term problems that need to be investigated – this becomes even more pertinent to applications where you really do need over 1GB heap space.

    What are your thoughts on this and can you give reasons for why you prefer to JVM/GC tune before looking into application architecture?


  12. Mike Brunt said:

    @Dave, you make some great points here, and there is almost always more than one way to approach performance and/or stability issues. Pre CFMX-Java I would always start from the code, via the logs. Slowly I began to change as I came to realize that there are so many ways to optimize the “Engine” for JRun and CF apps. Because I see the JVM as the heart of all applications that run on it, I always want to tune that optimally first; that is my way. However, you do have a point that applications can have different characteristics.

    This in itself brings up a question: in a multi-instance install, should we have multiple jvm.config files or even multiple JVMs?

    Thanks for the comment it has given me food for thought.


  13. David Stockton said:

    @Mike,

    I appreciate the feedback and differing points of view but am still trying to understand why you feel “engine” tuning is a good place to start – please don’t take this the wrong way; I’m just trying to get as much knowledge out into the community as possible!

    Using a motor engine analogy, a Ferrari engine would be useless in a tractor & vice-versa although both may be very well tuned. Using this analogy on the CF/JVM problem domain I would suggest that you could have a “tractor” application and a “ferrari” application; both requiring different JVM characteristics; hence code before JVM optimization.

    I do however, take note that there are some generic JVM tuning options that will almost certainly help all CF applications.

    I’m sure you know that it’s possible to use multiple jvm.config files in a multi-instance install, so you can tune each individual instance and, as you mention, even move to using different JVMs.

    The reason I bring this up is that I find, as a consultant working with several different companies, you *could* almost always make a quick improvement by tuning GC options; but as I mention in my previous comment, this can mask deeper issues. So I prefer to educate the developers about the issues, using metrics (e.g. from FusionReactor logs), *as well as* tuning GC options for an immediate respite.

    I’d love to hear your thoughts or any comments you have about this methodology.


  14. Mike Brunt said:

    @Dave, thanks for another great set of points. Here is a concrete example of why I feel tuning the JVM should come before delving into the code, I realize this is in my blog post above but want to amplify this…

    “The net effect was this, when I started a 90 minute 50 concurrent user load test on the client application before the changes we got these results…

    Total Number of Clicks: 17,136 (10 Errors)
    Average Click Time of all URLs: 7,638 ms

    After the various changes shown above the same loadtests produced these results…

    Total Number of Clicks: 26,153 (0 Errors)
    Average Click Time of all URLs: 2,061 ms”

    That performance improvement came from load-testing whilst tuning the JVM with no changes in the code or SQL, those come next.

    As I say, Dave, there are different ways to approach this issue, and in tuning CFMX applications I have found that looking at the JVM first and getting those settings right for a slow-performing application is my best first step.

    Thanks again for your insights and time.


  15. David Stockton said:

    @Mike,

    Thanks for your quick replies – I’ll keep my options open on this one 🙂

    Dave


  16. Justin Lewis said:

    Heya,

    I am in the process of tuning our JVM and coming up with limited results. Seems Java 1.6 is fussy.

    We just got a brand new 8-core 16GB ram linux box to replace an older solaris 2GB ram box.

    Once I had CF installed, setting the max heap as large as it would go, which was 3584 MB, seemed like a good idea. I set the min heap to 1024.

    Once under load the thing just threw up all over the place with the usual Java out of memory error. Seems it wasn’t cleaning itself very well.

    I have spent the last 2 days reading pretty much everything there is about young gen / perm gen heap structure.

    We have cfc heavy apps but no framework per se.

    Do you have any tips on tuning for such a powerful box and keeping everything running nicely? I would hate to have to keep the heap at 512MB.

    Any advice is appreciated.


  17. Justin Lewis said:

    Oh and BTW

    here is a heap dump !

    http://www.8sided.com/heapDump.txt


  18. David Stockton said:

    Hi Justin,

    Personally I would not recommend setting CF up in this way (huge heap); I would switch to having more instances with smaller heap sizes. The reason is that the more memory there is to collect, the longer each Full GC takes to complete. You would likely see better performance from multiple smaller instances (e.g. 1 to 1.4GB each). The downside is obviously the overhead of having separate JVMs in memory. However, this adds stability in that you will have fail-over instances (you can configure clustering etc. depending on your needs).

    You also say you’re running on JVM1.6 – Is this Sun’s JVM? There is a bug in the class-loader for SunJVM in that classes have a long overhead to load. Many people have seen improvements by down-grading to JVM1.5 – especially where CFCs are concerned.

    If you’d like to mail me (david.j.stockton [at] gmail.com) I can offer professional consultancy services and help you get the most from your server.

    Regards,
    David


  19. Justin Lewis said:

    Thanks. Sent email.

    We switched back to Java 1.5 and it seemed to collect garbage better, but it seems that the ratio of young gen to perm gen is still all off.

    I will try to turn the heap down to 1.5Gish and see how it goes with the standard settings.


  20. Mike Brunt said:

    @Justin, sorry to hear of your problems. David is right that there is evidence that Sun SDK 1.6 (6) has classloading issues with CF. The interesting thing for me is that in thoroughly load-testing a client application on both 1.5 and 1.6 I found 1.6 to be better, and I ran those tests several times. This was a heavily CFC-based app, so it was a bit surprising. This was on a Windows 2003 server though. Hopefully David will be able to assist you. I don’t recall if he worked at Allaire and/or Macromedia with the rest of us in the server tuning group or not.


  21. Mike Brunt said:

    @Justin, I have a question: is the heap dump above from the 1.5 or the 1.6 JVM?

    Also, would you be prepared to post your current jvm.config arguments here so we can make tuning suggestions? I think that might help others in the community too.


  22. Justin Lewis said:

    That dump was with 1.6…

    At the time of that dump jvm settings were:

    -Xms=512m -Xmx=2048m MaxPermSize=256m

    After messing with it a bit:

    -server -Djava.awt.headless=true -Xms128m -Xmx1536m -Xbootclasspath/p:/opt/fusionreactor/etc/lib/fix6519088-1.0.0.jar -Dsun.io.useCanonCaches=false -XX:MaxPermSize=512m -XX:PermSize=128m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{application.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application.home}/../wwwroot/WEB-INF/cfform/jars

    :: Seems to work better…

    Server Tuning Settings :

    1000 cached queries,

    Request Tuning:
    Simultaneous request limit 20
    Flash Remoting request limit 10
    Web Service request limit 10
    CFC request limit 10
    CFThread Pool Size 20
    Maximum number of report threads 8
    Request Queue Timeout 75 seconds
    Maximum number of running JRun threads 100
    Maximum number of queued JRun threads 1000


  23. Mike Brunt said:

    @Justin thanks for this Info, I have a couple of questions.

    Is this CF8 Enterprise, and if so, did you install it in multiple instance mode?

    Can you re post a heap dump with these latest JVM arguments so we can see memory state with the new arguments?

    Another good tip for you would be to enable metrics logging so we can see the memory and thread states.

    Lastly, even if you are on CF8 with server monitoring, I still recommend using either SeeFusion or FusionReactor, because if CF is totally unresponsive the server monitoring GUI is not there, whereas SF or FR are still running after CF is dead.

    Btw, you applied some pretty good tweaks here, for instance I noticed from your current heap dump that you were running low on permgen space and you added that -XX:PermSize=128m which should start the JVM up with 128MB of permgen space.


  24. Justin Lewis said:

    Here are 2 dumps from this morning when the new box was in the load with the above settings.. The times on the logs don’t match but it’s something.

    http://www.8sided.com/fr_memlog.txt
    http://www.8sided.com/heapdump1.txt

    This is server configuration, not multi server.


  25. Mike Brunt said:

    @Justin, thanks, I will let you know what I find. As you have a server install of CF, you would not be able to do what Dave was suggesting, which is to run multiple instances.


  26. Mike Brunt said:

    @Justin, in http://www.8sided.com/fr_memlog.txt do you know what the column headings would be? I can make assumptions, but that is typically not a good idea.


  27. Justin Lewis said:

    No, I don’t, sorry. Those are FusionReactor memory logs.

    I thought I would send those too.


  28. Mike Brunt said:

    @Justin, no problem I will see if I can get some info from FusionReactor on that as all of this should benefit the community.

    I do recommend this though, set your start memory higher again as I notice the old generation gets near to full often; try

    -Xms512m -Xmx1536m


  29. David Stockton said:

    @Mike,

    Looks like a version2 log:

    Check:
    http://www.fusion-reactor.com/fr/help/memory_log.htm

    Dave


  30. Mike Brunt said:

    @Dave thanks for the help Dave, I am more familiar with SeeFusion.


  31. We are seeing a LOT of JRun Connector closed errors after migrating from CF MX 6.1 to CF 8. We’ve tried a lot of things, including downgrading the JVM to 1.5, but nothing seems to improve the situation. Could this potentially be a solution?


  32. David Stockton said:

    Hi Dave,

    – Do you have any errors in your JVM logs?
    – What’s your memory profile look like?
    – Do you have any GC data?

    D


  33. Solaris 10 box with a 64-bit CF8 installation with 16GB RAM. Min Heap size: 64MB Max Heap size: 2048MB

    -server -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib

    No GC log data – I still need to set that up (we also lack FusionReactor). I’m unfortunately not having much luck tracking down JVM logs….runtime/bin/*.logs?


  34. David Stockton said:

    Hi David,

    /opt/jrun4/logs/* by default. You may find some useful PID dumps in /opt/jrun4/bin/ if your JVM is dying too.

    This will probably get involved so you may want to contract Intergral (the guys behind FusionReactor – disclaimer I work for them) for some consulting. I’m sure we’ll be able to sort your problems out:

    http://www.fusion-reactor.com/services/cfservices.cfm

    … or I believe Mike also does consulting.

    Thanks,
    D


  35. Does anyone know what kinds of errors the author was originally referring to?


  36. Martin Parry said:

    I have just posted a new blog entry which shows how you can do programmatic garbage collection if your free memory gets below a certain threshold. Hope it helps:

    http://www.beetrootstreet.com/blog/index.cfm/2009/6/25/Clearing-ColdFusion-memory-using-garbage-collection-when-memory-gets-low
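The general idea of threshold-triggered collection can be sketched in plain Java as below. This is not taken from the linked post; it is only an illustration using the standard `Runtime` API, with a made-up class name and a threshold chosen by the caller. Note that `System.gc()` is only a request to the JVM, and it is ignored entirely when the JVM runs with `-XX:+DisableExplicitGC`.

```java
// Hypothetical helper: requests a GC when available heap drops below a threshold.
class LowMemoryGc {

    /** Fraction of the maximum heap that is still available, between 0 and 1. */
    static double freeFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long available = rt.maxMemory() - used;
        return (double) available / rt.maxMemory();
    }

    /** Requests a collection if the available fraction is below the threshold. */
    static boolean collectIfLow(double threshold) {
        if (freeFraction() < threshold) {
            System.gc(); // a hint only; the JVM may defer or ignore it
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Available heap fraction: " + freeFraction());
        System.out.println("GC requested: " + collectIfLow(0.10));
    }
}
```

A check like this would normally run on a timer or at the start of a request, and the threshold would need to be found by testing rather than guessed.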



