I received an email from a client this morning: “The Server is Down!” This is never what you want to see first thing in the morning.
So, I hopped on the server to check it out. Sure enough, the JRun instance the client’s site was running on had kicked the bucket. Users were seeing an error saying that the ColdFusion instance was unavailable (or something to that effect).
I tried to restart the ColdFusion instance in question, but it didn’t work. I simply received a message that the shutdown didn’t occur in a timely manner. Wonderful!
I have 6 instances of CF running on the server in question. The instance the client’s site is running on has had a bad habit of sucking up hundreds of megs of ram. So, it wasn’t hard to find it in the Task Manager and I was able to kill the process. Once the process was dead I could restart the service.
The thing is, recently, as I’ve restarted processes on this server, the applications in this instance have started timing out. For example, this morning a Mach-II application timed out in a cfloop tag. Last week it was a Model-Glue application. Next week it’ll probably be something without a framework. It seems impossible to predict.
I have a few “suspicious” sites in the instance. One Model-Glue + Reactor site, one Mach-II site, a few instances of Blog CFC and some other sites with a pseudo-framework I wrote years ago.
I would speculate that the Reactor site might be part of the problem. Reactor caches a lot of stuff in memory. Maybe it just sucked up too much ram, choked, and died. So I moved the site to another instance to see what would happen. So far, nothing. This leads me to my gripe:
I really hate that there’s no way to easily take a look at one ColdFusion application and see how much ram it’s taking up, its relative load, and other related information.
Yes, I know there’s perfmon, but let’s face it, it’s useless. I’m not too familiar with metric data logging. Maybe there’s a way to indicate which application is causing which load, but I don’t think so. Either way, neither of these tools makes this process easy.
I’d love to see a panel in the ColdFusion Administrator which reported the instance’s load, memory usage, database hits, etc. (Preferably in understandable language!) Then, I’d love to be able to drill down into that report to see which applications are causing the load. And, while I’m dreaming, I’d love to be able to watch running requests to see how long they’re taking and what load they’re causing. Perhaps the app would even be able to point out specific files, URLs or fragments of code that had the most impact on the server.
Ah, but I live in the real world where the best debugging tools we get are cfdump and cftimer.
I don’t have much of a strategy for fixing this problem. It looks like I’ll have to move my sites between CF instances and note both their apparent stability and memory usage. Maybe I can identify which application is sucking up all that ram. From there maybe I can track down which portion of the application is causing all that ram usage and I’ll be able to fix it.
Of course, that’s probably only one of many problems. (And heck, every change has an impact. If I reduce memory usage I’ll probably slow the site down, which might cause queued requests and timeouts, which would cause the server to do the same thing.)
Anyhow, I just wanted to get that out of my system. Any tips you care to share on how to beat this?
Comments on: "ColdFusion Complaint -or- Using Your Sixth Sense to Figure out Why the Server Crashes" (12)
Of course, to add insult to injury Reactor ate my comment system. It’s fixed now.
I have not used it myself, but would FusionReactor (http://www.fusion-reactor.com) allow you to monitor this? We have debated trying this out internally here for debugging these kinds of issues.
I tried posting Brian’s suggestion but hit the Reactor problem earlier. 😉 We’re looking at both FusionReactor and SeeFusion to better monitor these sorts of issues on our servers.
Read this article on TechRepublic just this morning:
“Measure CPU and memory consumption of a Java application using standard APIs”
Not sure if this solution will work with ColdFusion but it looks interesting nonetheless.
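For anyone curious what "standard APIs" could look like here, below is a minimal sketch using the JDK's `java.lang.management` package. It is not taken from the TechRepublic article, and the class and method names (`HeapSnapshot`, `describeHeap`, `toMegabytes`) are made up for illustration. It reports on the whole JVM it runs in, so at best it describes one CF instance, not an individual application inside it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; not part of ColdFusion or JRun.
final class HeapSnapshot {

    // Convert a byte count to whole megabytes (rounds down).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Build a one-line summary of the heap of the JVM this code runs in.
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        String maxText = (max < 0) ? "undefined" : toMegabytes(max) + " MB";
        return "heap used=" + toMegabytes(heap.getUsed()) + " MB"
                + ", committed=" + toMegabytes(heap.getCommitted()) + " MB"
                + ", max=" + maxText;
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

Whether and how you could call something like this from inside a CF instance is an open question; treat it as a starting point rather than a working monitor.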
IMHO, 6 CF Instances (especially using JRun) on one Wintel machine seems a bit extreme, even with a quad CPU machine with 2-4 GB of RAM. That’s quite a lot of context switching going on but mostly I’d think the instances (as well as the OS) were fairly RAM starved?
Maybe think about moving to fewer instances and/or moving to Tomcat and installing each instance as a separate webapp? Even then, CF needs quite a bit of RAM for each instance unless you start removing libraries that you aren’t using (which is hacky I think).
With JRun + CF I usually don’t go more than one instance per CPU (not including the main cfusion instance which I give a minimal JVM config to just to manage instances), but that’s just me.
Actually Doug, there are a couple of products on the market that do this. I can’t remember all of them off the top of my head. I know Raymond Camden wrote a free one (Starfish I believe) and Productivity Enhancement wrote LORCAT.
We use Fusion Reactor for monitoring requests and memory usage. It’s great for looking at long running requests. You can also kill a problem request if you need to.
However, it doesn’t show memory and load stats per application. I totally agree that this would really help. (although, thinking about it, it might do if you are running each app as a separate CF instance. not sure about this)
I can name a few tools that may help: FusionReactor, SeeFusion, Starfish, cfWatcher. The last two are freely available.
Thanks everyone. I need to take a look at some of those tools. I’ve been so busy I just haven’t had time.
I moved this blog off into its own instance and tuned it for one small site. I think now that if this blog is the issue (which it may be) I can at least be sure the other sites won’t go down when this one does.
Either way, the problem needs to be fixed… eventually.
Doug, since it took me a long time to figure this out, I’ll share it with you. This thread is really old, but hey, it might help others who find this page through search engines (like I did).
The moment your ColdFusion Instance hits the memory limit that you set in the CF admin as “maximum JVM heap size”, the entire Instance is dead. And I mean completely dead. When trying to restart it will take ages to be “stopped” (shutdown didn’t occur in a timely manner), but just wait, it will be “stopping” for a while, but it will stop eventually.
Want to see it? Just create a page taking up loads of memory (e.g. loop which puts extra data into a memory variable). Watch the memory usage of your jvm.exe in the task manager, it will grow rapidly, and when it is somewhere around the limit that you set, it will kill the entire server.
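The experiment above is described for a CFML page, but the same idea can be sketched in plain Java. This version is deliberately tame: it stops at a chunk limit and at a fraction of the maximum heap instead of running the JVM into the ground. The class name `HeapFiller` and its parameters are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names and limits are illustrative only.
final class HeapFiller {
    static final int CHUNK_BYTES = 1024 * 1024; // 1 MB per allocation

    // Adds up to maxChunks 1 MB blocks to sink, stopping early once the
    // next block would push used heap past stopFraction of the max heap.
    // Returns the number of blocks actually allocated.
    static int fill(List<byte[]> sink, int maxChunks, double stopFraction) {
        Runtime rt = Runtime.getRuntime();
        int allocated = 0;
        while (allocated < maxChunks) {
            long used = rt.totalMemory() - rt.freeMemory();
            if (used + CHUNK_BYTES > rt.maxMemory() * stopFraction) {
                break; // safety stop well before the heap limit
            }
            sink.add(new byte[CHUNK_BYTES]); // held in the list, so it cannot be collected
            allocated++;
        }
        return allocated;
    }

    public static void main(String[] args) {
        List<byte[]> sink = new ArrayList<>();
        int chunks = fill(sink, 64, 0.5);
        System.out.println("allocated " + chunks + " MB before stopping");
    }
}
```

Raising `stopFraction` toward 1.0 and removing the chunk limit gets you closer to the crash scenario, which is best tried on a machine you don't mind restarting.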
Don’t waste time on tools like FusionReactor. I tried it too. Nice to see long running requests, requests which are running now, etc, but the crash protection doesn’t work with a crash like the one I described.
Don’t set your maximum heap size too low! And 500 MB (default) is definitely too low. Also don’t set it too high. 1.8 GB is the maximum for normal processors due to memory addressing problems (again, a nice undocumented feature of CFMX).
Thanks Ivo. Amazingly, I just found this page through Google a few hours after you posted your comment, and your information was exactly what I was looking for.
Note that if CF is unable to allocate the amount of memory you specify on this page, CF will fail to start. This is inconvenient because with CF not starting, you can’t revert the setting via CF Administrator.
It took me a little time to find out how to fix this, so I thought I’d share: the setting (on my system) lives in C:\CFUSIONMX7\runtime\bin\jvm.config
In that file, the java.args variable contains an argument that reads -Xmx1024m, which sets the maximum heap size to 1024 megabytes. This is where you’d change it back. Save the file, and then you can start the CF service.
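If you'd rather check the value than eyeball it, here is a small sketch that pulls the `-Xmx` figure out of a `java.args` line and converts it to megabytes. The class name `JvmArgs` is hypothetical, and the sample line in `main` is made up for the example rather than copied from a real jvm.config.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration; not a ColdFusion or JRun API.
final class JvmArgs {
    // Matches -Xmx followed by digits and an optional k, m or g suffix.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    // Returns the maximum heap in megabytes, or -1 if no -Xmx argument is found.
    static long maxHeapMegabytes(String javaArgsLine) {
        Matcher m = XMX.matcher(javaArgsLine);
        if (!m.find()) {
            return -1;
        }
        long value = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "g": return value * 1024;
            case "m": return value;
            case "k": return value / 1024;
            default:  return value / (1024L * 1024L); // no suffix means bytes
        }
    }

    public static void main(String[] args) {
        // Made-up sample line, assumed to resemble a java.args entry.
        String sample = "java.args=-server -Xmx1024m";
        System.out.println(maxHeapMegabytes(sample) + " MB");
    }
}
```

Comparing that number against the physical RAM on the box before restarting the service is one way to avoid the "CF fails to start" trap described above.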
As I understand it, FusionReactor also has low memory crash protection, so if the amount of free memory drops below a specified threshold, you can be notified.
Even when your instance completely halts, FR can still be useful if you install a second one in a separate instance and have it monitor the first. That way you’ll still be notified.
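To make the "free memory below a threshold" idea concrete, here is a rough sketch of such a check in Java. This is not how FusionReactor implements it; `LowMemoryCheck` and its methods are invented names, and a real monitor would poll on a schedule and send a notification instead of printing.

```java
// Hypothetical threshold check; not FusionReactor's implementation.
final class LowMemoryCheck {

    // True when the free share of the heap, (max - used) / max,
    // is below thresholdFraction. Returns false if max is unknown.
    static boolean isLow(long usedBytes, long maxBytes, double thresholdFraction) {
        if (maxBytes <= 0) {
            return false;
        }
        double freeFraction = (double) (maxBytes - usedBytes) / maxBytes;
        return freeFraction < thresholdFraction;
    }

    // Applies the same check to the JVM this code is running in.
    static boolean currentJvmIsLow(double thresholdFraction) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return isLow(used, rt.maxMemory(), thresholdFraction);
    }

    public static void main(String[] args) {
        // Warn when less than 10% of the maximum heap is still free.
        System.out.println("low on memory: " + currentJvmIsLow(0.10));
    }
}
```

A check running inside the instance it watches stops being useful once that instance hangs, which is the same reason for putting the monitor in a separate instance.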