I received an email from a client this morning: “The Server is Down!” This is never what you want to see first thing in the morning.
So, I hopped on the server to check it out. Sure enough, the JRun instance the client’s site was running on had kicked the bucket. Users were seeing an error saying that the ColdFusion instance was unavailable (or something to that effect).
I tried to restart the ColdFusion instance in question, but it didn’t work. I simply received a message that the shutdown didn’t occur in a timely manner. Wonderful!
I have six instances of CF running on the server in question. The instance the client’s site runs on has a bad habit of sucking up hundreds of megabytes of RAM, so it wasn’t hard to find in the Task Manager, and I was able to kill the process. Once the process was dead, I could restart the service.
The thing is, recently, whenever I restart processes on this server, the applications in this instance start timing out. For example, this morning a Mach-II application timed out in a cfloop tag. Last week it was a Model-Glue application. Next week it’ll probably be something without a framework. It seems impossible to predict.
I have a few “suspicious” sites in the instance: one Model-Glue + Reactor site, one Mach-II site, a few instances of BlogCFC, and some other sites built on a pseudo-framework I wrote years ago.
I would speculate that the Reactor site might be part of the problem. Reactor caches a lot of stuff in memory. Maybe it just sucked up too much RAM, choked, and died. So I moved the site to another instance to see what would happen. So far, nothing. This leads me to my gripe:
I really hate that there’s no easy way to look at a single ColdFusion application and see how much RAM it’s taking up, its relative load, and other related information.
Yes, I know there’s perfmon, but let’s face it, it’s useless. There’s also metric data logging, which I’m not too familiar with. Maybe there’s a way to indicate which application is causing which load, but I don’t think so. Either way, neither of these tools makes this process easy.
I’d love to see a panel in the ColdFusion Administrator that reported the instance’s load, memory usage, database hits, etc. (Preferably in understandable language!) Then, I’d love to be able to drill down into that report to see which applications are causing the load. And, while I’m dreaming, I’d love to be able to watch running requests to see how long they’re taking and what load they’re causing. Perhaps the app would even be able to point out specific files, URLs, or fragments of code that had the most impact on the server.
Ah, but I live in the real world where the best debugging tools we get are cfdump and cftimer.
I don’t have much of a strategy for fixing this problem. It looks like I’ll have to move my sites between CF instances and note both their apparent stability and memory usage. Maybe I can identify which application is sucking up all that RAM. From there, maybe I can track down which portion of the application is responsible and fix it.
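Since each CF instance is its own JVM, one rough way to “note memory usage” while shuffling sites around might be to log the JVM’s heap numbers through the standard `java.lang.management` API. This is only a sketch under assumptions: the class name `HeapLogger` and the instance label are made up for illustration, and the code reports on whatever JVM it runs in, so it would have to be executed inside the instance being measured. It also only gives a per-instance total, not a per-application breakdown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: logs heap usage for the JVM it is running in.
public class HeapLogger {

    // Convert bytes to whole megabytes (rounded down).
    public static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Build one log line. instanceName is just a label you pick yourself.
    // MemoryUsage.getMax() returns -1 when no maximum is defined.
    public static String formatLine(String instanceName, long usedBytes,
                                    long committedBytes, long maxBytes) {
        String max = maxBytes < 0 ? "undefined" : toMegabytes(maxBytes) + " MB";
        return instanceName
                + " heap used=" + toMegabytes(usedBytes) + " MB"
                + " committed=" + toMegabytes(committedBytes) + " MB"
                + " max=" + max;
    }

    // Take a snapshot of the current JVM's heap and format it.
    public static String snapshot(String instanceName) {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        return formatLine(instanceName, heap.getUsed(),
                heap.getCommitted(), heap.getMax());
    }

    public static void main(String[] args) {
        String label = args.length > 0 ? args[0] : "instance";
        System.out.println(snapshot(label));
    }
}
```

Logging a line like this on a schedule, before and after moving a site, would at least show whether an instance’s heap grows when a particular application lands on it.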
Of course, that’s probably only one of many problems. And heck, every change has an impact. If I reduce memory usage, I’ll probably slow the site down, which might cause queued requests and timeouts, which would bring the server down all the same.
Anyhow, I just wanted to get that out of my system. Any tips you care to share on how to beat this?