I presented at CF.Objective on the subject of clustering and distributing ColdFusion applications. During the presentation I pointed out a “gotcha” I have encountered many times when asked to review existing High Availability (HA) environments. It has been mentioned in previous blog posts, but I wanted to amplify it, as I believe this pitfall is very important to avoid. Hardware clustering devices can, and often do, perform two distinct functions:
- Failover – The device watches all members in the cluster it manages to make sure they are available. If one member fails, the hardware clustering device will stop sending requests to it.
- Load Balancing – The device distributes requests among the cluster members based on whatever algorithm is chosen, such as RoundRobin or LeastConnections. The idea is to spread the load evenly across all members.
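To make the two functions concrete, here is a minimal sketch in Python of how they interact. It is a toy model rather than how any particular hardware device is implemented; the `Cluster` class and the member names `cf1`, `cf2` and `cf3` are hypothetical and used only for illustration.

```python
from itertools import cycle


class Cluster:
    """Toy model of a clustering device: round-robin over healthy members."""

    def __init__(self, members):
        self.members = list(members)
        self.healthy = set(self.members)   # failover state
        self._ring = cycle(self.members)   # round-robin order

    def mark_down(self, member):
        """Failover: stop sending requests to a member that failed its check."""
        self.healthy.discard(member)

    def mark_up(self, member):
        """Return a recovered member to the rotation."""
        if member in self.members:
            self.healthy.add(member)

    def next_member(self):
        """Load balancing: return the next healthy member in round-robin order."""
        for _ in range(len(self.members)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no cluster members available")


cluster = Cluster(["cf1", "cf2", "cf3"])   # hypothetical member names
first_pass = [cluster.next_member() for _ in range(3)]
cluster.mark_down("cf2")                   # cf2 fails its availability check
after_failure = [cluster.next_member() for _ in range(4)]
print(first_pass)      # ['cf1', 'cf2', 'cf3']
print(after_failure)   # ['cf1', 'cf3', 'cf1', 'cf3']
```

Once `cf2` is marked down, the rotation simply skips it, so the remaining members share the load until it is marked up again.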
For the first of these functions (failover), the clustering device needs a way to know whether the members of the cluster are available. Typically this is achieved by the clustering device “pinging” cluster members. How this is done is critical, as the pinging itself can cause performance issues if it is not done carefully. Here are two setups I have found, with comments.
- The hardware clustering device simply looks for an HTTP “200” response from the cluster members. This sounds innocuous. However, if the default website in the web server is a “heavy” ColdFusion site with lots going on in the root directory, for instance in Application.cfm/Application.cfc and Index.cfm, all of that will run with every single “ping” from the clustering device. I have seen this cause performance problems, particularly if the “pings” are very frequent, at 3-5 second intervals.
- The hardware clustering device pings a ColdFusion page located within a “heavy” ColdFusion application. The same problem applies: whatever that application does in Application.cfm/Application.cfc and Index.cfm will run with every single “ping”, and frequent “pings” cause the same performance problems.
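A quick back-of-the-envelope calculation shows why the ping interval matters. The sketch below, in Python, only does the arithmetic for the 3-5 second intervals mentioned above; the function name `pings_per_day` is my own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pings_per_day(interval_seconds):
    """Health-check requests one cluster member receives per day."""
    return SECONDS_PER_DAY // interval_seconds


fast = pings_per_day(3)
slow = pings_per_day(5)
print(fast, slow)   # 28800 17280
```

So each member runs its full application start-up code somewhere between roughly 17,000 and 29,000 extra times a day, purely to answer the monitor.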
Neither of these situations is a good one to have, and both can be avoided. The best way to do this, in my experience, is to create a “lightweight” ColdFusion page which returns a simple text string retrieved by a query from a database. When this page responds successfully, we know that the Web Server, ColdFusion and the Database Server are all responding and available. This page should be located outside of any full-blown CF applications, unless it can be assured that no “heavy lifting” takes place when the main application runs.
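The logic of such a lightweight page can be sketched as follows. This is an illustration in Python rather than CFML, with an in-memory SQLite database standing in for the real database server; the `health_check` function, the `ALIVE`/`DOWN` strings and the choice of a 503 status on failure are all assumptions of mine. In ColdFusion the equivalent would be a small .cfm page that runs one trivial query and outputs the result.

```python
import sqlite3


def health_check(connection):
    """Return (status_code, body) for a lightweight monitoring page.

    The only work done is one trivial query, so a success shows that the
    handler and the database both responded.
    """
    try:
        row = connection.execute("SELECT 'ALIVE'").fetchone()
        return 200, row[0]
    except sqlite3.Error:
        # Database unreachable or query failed: report the member as down.
        return 503, "DOWN"


db = sqlite3.connect(":memory:")            # stand-in for the real database
ok_status, ok_body = health_check(db)
db.close()                                  # closed connection simulates an outage
down_status, down_body = health_check(db)
print(ok_status, ok_body)       # 200 ALIVE
print(down_status, down_body)   # 503 DOWN
```

The clustering device can then be configured to look for the expected text string as well as the status code, so that a member with a dead database connection is taken out of rotation.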
Comments on: "A Possible Problem When Using Hardware Clustering" (12)
Mike – enjoyed talking with you at the conference and after thinking more about what you told me, I’ll write up a few posts about my experiences. Thanks for the encouragement and your posts on the subject.
@Scott it was a pleasure and I really do think you should do that Scott because you have actually done what a lot in the CF Community will have to do, eventually. Please keep me updated with what you post.
This is exactly what we do. Once set up, it’s also helpful to filter your access log from all those pings, or you have a lot to wade through 🙂
P.S. are you going to offer your cfObjective preso materials online soon?
@Mo thank you for your comments and yes I intend to make the presentation materials available probably in the form of Captivate executables.
We do that exact thing. We have an alive page inside an alive directory (with its own Application.cfm page) that does a query. We found that without the Application.cfm, the alive page was hogging loads of resources on our new site. Now all it does is the query and quits. Absolutely no impact on the server or applications (of which there are 6 application/CF instances on each node in the cluster).
Glad to see we’re doing something right!
Mike, I enjoyed your session. I had a question that I didn’t get a chance to ask you so I thought I’d ask it now if you don’t mind.
If you have a cluster of physical CF web servers, is there any advantage to creating multiple CF instances on each server? And if so, what’s the advantage?
@Lincoln thanks for your comment and another real-world example of why this is a good way to maintain hardware clustering device monitoring. There are very often different ways to do the same thing and this was just my proposal, from experience, as to how to do it.
@Richard, thanks for your kind comments and your question here. Multiple instances on the same physical server can serve the two prime purposes of Clustering: Load Balancing and Fail-Over. There are a finite number of resources that can be allocated to a single ColdFusion instance. This mainly falls into the area of Threads and Heap Memory. Providing your physical server is well provisioned (CPU-Memory etc) you will typically get better performance with Clustered CF Instances on that server. The reason this covers fail-over too is that you are less likely to run out of CF resources. I would always advise RoundRobin with Sticky Sessions as the algorithm in the CF Cluster.
I hope this helps and please feel free to ask more if necessary.
Thanks for the response, Mike. It was very helpful. A couple of follow up questions:
Are there any rules of thumb as to what qualifies as a “well provisioned” server? Are two instances per server generally enough, or if you have a “very well provisioned” server should you use more?
@Richard thanks for the follow up. In clustering two is a magic number and I can’t emphasize that enough. So I would say that two instances is good, and if you are currently running with only one and things run fine, then two instances should be good for you. As far as server resources go, if you are on a Windows 32-bit operating system then you can’t take much more than 1.4GB for the Java Heap at maximum. So I would say you need a minimum of 4GB of RAM for two CF instances and the operating system. If you are on 64-bit, the sky is the limit as far as allocating memory to the Java Heap; well, there is virtually no limit as far as the OS goes.