The amazing adventures of Doug Hughes

Archive for December, 2004

Learning UML

Earlier this month I bought UML Weekend Crash Course by Thomas A. Pender. Although it’s hard for me to find an entire weekend to work through the book, I’ve been reading chapters as I can. As part of this process, and to help me learn UML a bit better, I’ve decided to write a number of blog entries on my experiences as I work through the book.

As I read the book I plan on blogging about various management and design processes and issues, including requirements gathering and the various UML diagrams. To help with this process I’ve decided to work on a project which readers of my blog can follow along with, comment on, and provide feedback about. I’m hoping that this will turn into a community learning experience and that we can all learn something new together. When the project is complete I will release the resulting code as a free and open source Alagad product.

The project I plan to work on is an update to the search system on DougHughes.net and Alagad.com. The current system uses Lucene instead of Verity for indexing and searching. I created an extremely simple system to spider pages and pass their content to Lucene for indexing. However, the system is inefficient, inflexible, and tightly tied to Mach-II. The only part of the system I’ve been happy with is Lucene. (I dislike Verity, and its spider only works on Windows.)

There are a number of things I would like to see in the new search system. These include improved content spidering and the ability to index document types such as Acrobat, Word, and PowerPoint. I would also like to have more flexible configuration settings and better, faster content summarizing.

Before I get started on the search project I intend to go over some fairly basic issues such as what UML is, various development methodologies, and the requirements gathering process. I may touch on other related issues, but it’s hard to say at this point. This is as much a journey for me as it is for the readers of my blog.

Interested in coming along for the ride?


The book I’m working with is UML Weekend Crash Course by Thomas A. Pender (ISBN: 0-7645-4910-3). The book is a bit dated in that the content covers UML 1.4 and indicates that UML version 2.0 is not expected "until spring of 2003". There may be a more recent edition of the book, but I’m not aware of one. In any case, the book I have should cover my needs in learning and understanding UML.

Comments Captcha'd

I took a few moments today and updated my blog comments page to use the Alagad Captcha Component. The process was quite simple. Because I use a highly customized version of Ray Camden’s Blog.cfc, I won’t get into too many details.

In a nutshell, I placed the Captcha Component into the application scope at startup. When a user accesses my comments page, I call CreateCaptcha() to generate the Captcha image and a related hash. This method returns a structure of information about the generated Captcha. I’ve elected to store the file name of the generated image and the hash in the user’s session.
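Roughly, that looks like the sketch below. Keep in mind the component path and the structure key names (imageFile, hash) here are placeholders for illustration, not necessarily the component’s documented API:

    <!--- At application startup, e.g. in Application.cfm: --->
    <cfif NOT StructKeyExists(application, "captcha")>
        <!--- "alagad.captcha.Captcha" is a placeholder component path --->
        <cfset application.captcha = CreateObject("component", "alagad.captcha.Captcha")>
    </cfif>

    <!--- On the comments page: generate a Captcha and stash its details in the session --->
    <cfset captchaInfo = application.captcha.CreateCaptcha()>
    <cfset session.captchaImageFile = captchaInfo.imageFile>
    <cfset session.captchaHash = captchaInfo.hash>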

The Captcha image is displayed on the comments form using an image tag pointing to a file named captchaImage.cfm. This file uses cfcontent to display, and then delete, the image based on the file name stored in the session. A form field is displayed at this point for the user to type in the text shown in the image.
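captchaImage.cfm boils down to something like this (a sketch, assuming the generated image is a JPEG; cfcontent’s deletefile attribute removes the file once it has been served):

    <!--- captchaImage.cfm: stream the generated image to the browser, then delete it --->
    <cfcontent type="image/jpeg"
               file="#session.captchaImageFile#"
               deletefile="yes">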

When the user submits the form I again use the Captcha component stored in the application scope. This time I call its validate() method and pass in the hash stored in the session and the text provided by the user. If these match, I know a human has filled out the form. If they do not match, I send the user back to the form and ask them to try again.
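On submission, the check is along these lines (again a sketch; the exact validate() signature, form field name, and page name are my placeholders):

    <!--- On form submission: does the user's text match the stored hash? --->
    <cfif application.captcha.validate(session.captchaHash, form.captchaText)>
        <!--- A human (most likely) filled out the form; go ahead and save the comment --->
    <cfelse>
        <!--- Validation failed; send the user back to the form to try again --->
        <cflocation url="comments.cfm?error=captcha">
    </cfif>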

Visit Alagad for more information on the Alagad Captcha Component.

PHP Output Buffer Conundrum

As I’ve mentioned in previous entries, I’ve been migrating from an old Linux box to a new Windows server. I have a friend whom I’ve been hosting on my server for some time. I finally kicked him in the butt and made him move his site to the new server, which is where the weirdness began.

First off, his site was being migrated from Linux/Apache/PostgreSQL/PHP to Windows/IIS/MySQL/PHP. The old Linux server paled in comparison to the new Windows server. We expected to see execution times drop on his application. However, in many cases they got worse!

Here were the symptoms:

The site “felt” fine when you were using it; there was no perceptible difference between the new server and the old one. However, the site has an execution timer which outputs the time it took to process the page. The times on the new server were not terrible, but in many cases they were worse than what we saw on the old server. The strangest thing about the execution times was that they seemed to vary by connection.

In other words, users further away from the server or on slower connections were seeing longer processing times! As an example, I hopped on the server console and pulled up the site. I believe the execution time was something on the order of 0.018 seconds. However, someone 500 miles away from the server was seeing 0.5 seconds, and I was seeing 0.1 seconds. All of these times were for the same page, with the only difference being the computer the site was being requested from.

We spent quite a bit of time trying to track this one down. It turned out that the problem was due to PHP having output buffering shut off. As a result, the time it took the server to transmit output to the client was being tacked onto the reported execution time.

Think about it this way. When you make an HTTP request it is sent to a web server, in this case IIS. IIS then hands the request off to PHP for processing. PHP proceeds to process some logic, then outputs some data. After that it processes some more logic and outputs more data. This is repeated until all processing is completed, at which point IIS drops the connection to the client.

From ColdFusion I’m used to the wonders of the output buffer. However, in this particular circumstance PHP was not buffering the output. The lack of a buffer meant that each chunk of output was passed on to IIS, which handed it straight to the client. Because this output was happening while the page was being processed, it added to the total time it took to process the page. Faster connections received the data faster and therefore showed lower execution times; slower connections showed higher ones.

It turns out that there’s a php.ini setting that turns output buffering on. Simply edit your php.ini file and make sure that output_buffering is turned on; in my case it was off. With this enabled, PHP writes all output into a buffer in memory while the page is processing. Once all processing is completed, the buffer is flushed to IIS, which outputs it to the client. Because PHP doesn’t need to wait for the output to be transmitted inline, the PHP code can execute much faster.
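For reference, the relevant php.ini line looks something like this (whether you use On or a specific buffer size such as 4096 bytes depends on your needs):

    ; php.ini: hold all script output in memory until processing finishes
    output_buffering = On

    ; or cap the buffer at 4KB, flushing whenever it fills:
    ; output_buffering = 4096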

After enabling output buffering, all page execution times dropped down to approximately 0.018 seconds. The site now screams and is much faster than it was on the old hardware.

Firefox Firestorm

It’s been an interesting week here. You may have seen Joe’s blog entry yesterday about being DDOS’d (poorly). I share my server with Joe and with another guy, who wrote the controversial Firefox extension that was being DDOS’d. This means that I was DDOS’d too! First, some background:

A coworker of mine at my day job used to be a paramedic in New Jersey. I learned recently that he was among the first responders to the World Trade Center on 9/11. This experience prompted him to change careers, and he took up web development (of all things!).

Doug’s Note: My coworker tends to refer to himself as “Boyzoid” online, though that’s not his real name. I’ll respect that and refer to him as Boyzoid from here on out.

To give Boyzoid some props, I want to say that he’s one of the smartest people I know. Not many people could move as easily as he did from saving people’s lives on a nightly basis to being a code samurai in such a short period of time.

Anyhow, because he comes from a medical background, a lot of the work he does is related to hospitals and emergency response systems. To that end, he wrote a very simple web service which returns *gasp* the current US Homeland Security Threat Level. (Cue the dramatic ’40s horror movie organ music.)

Much to my surprise (perhaps I’m short-sighted in this regard), his web service became quite popular. Many sites make use of it to supply the threat level to their users, among them hospitals, government websites, and schools. In fact, the DHS recently released their own web service to do the exact same thing. I like to think that Boyzoid was the catalyst for this.

Aside from being an ex-paramedic, Boyzoid is also a rabid Firefox fanboy. He has been working to convert his clients in the medical industry from IE to Firefox. To that end, he wrote an extension which shows the current DHS threat level in the Firefox status bar.

I suspect that most of us (myself, Boyzoid, and Joe included) believe that the DHS Threat Level is a bunch of bullshit. However, whether or not you believe that, hospitals and first responders cannot ignore the threat level. A change in the Threat Level forces certain agencies and companies to take certain preparatory measures.

So, Boyzoid’s plan was simple. By providing a Firefox extension that is useful to this select set of people, he might persuade a number of organizations, which otherwise might never have heard of Firefox, to switch. Just for this one extension.

On top of all that, Boyzoid was nice enough to provide the source code (which I haven’t looked at), which demonstrates how to create a Firefox extension, call a website from the extension, and then change the display based on the returned data.

When Boyzoid posted this to Mozilla.org he unintentionally sparked a huge debate which is still raging. Apparently he pissed someone off enough that they discovered the URL he was pulling data from and tried to DDOS it. Here are a few selected comments people posted to Mozilla.org:

This is the most stupid extension ever. People who wrote/use this extension should be castrated! or otherwise dealt with appropriately for advocating useless anxiety among the populous.


RUN RUN, hide under your tables, the enemy (who’s the enemy?) is attacking us….


Maybe Mozilla.org should establish a new category “war propaganda extensions” so people who don’t believe in lies won’t be bothered.

Many, many more comments followed along those same lines. Others were more positive and supportive. Anyhow, think what you want. I think it’s an interesting idea.

Anyhow, it seems that for now the DDOS attack has come to an end, which is good. I’m proud to say that even at its worst the attack never took the server over 50% CPU or over 10% of the network. Way to go me!

8193 is the Magic Number

I’ve been semiconsciously following a thread of emails at my office about a problem some clients have been having with cfsearch returning no results when, in fact, there should be matching data. Today the reason for the problem was revealed.

As this Macromedia TechNote indicates, Verity has an upper limit on the amount of data it can handle. Specifically, cfsearch will not return more than 8193 results. I’m not sure whether this means that the result set can’t contain more than that number of rows or that no more data than that can be indexed, but I assume the former.

The TechNote indicates that the Verity engine has a limitation of 64,000 elements (rows times columns). If this number is exceeded, Verity throws an error and ColdFusion simply returns an empty result set.

The resolution to the problem is to set the maxrows attribute on the cfsearch tag to 8192.
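For example (the collection name and search criteria below are just placeholders):

    <!--- Cap the results at 8192 rows so Verity never trips its element limit --->
    <cfsearch collection="siteContent"
              criteria="#form.searchTerm#"
              name="searchResults"
              maxrows="8192">

    <cfoutput query="searchResults">
        #title#<br>
    </cfoutput>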

I honestly don’t use Verity often. I think Lucene is superior. However, perhaps this will be useful to you.

Two New Alagad Products Released

On Monday Alagad released two new products: a Captcha component based on Alagad Image Component technology, and an EXIF Reader Web Service which is intended to complement the Alagad Image Component.

About the Alagad Captcha:

The Alagad Captcha is a ColdFusion Component (CFC), written in 100% native ColdFusion, which generates images of obfuscated text. This is the sort of image you might see when signing up for free mail accounts or in various other forms across the web. The text is human readable but not machine readable.

The word “Captcha” is an acronym for “completely automated public Turing test to tell computers and humans apart”. More information on Captchas can be found at http://en.wikipedia.org/wiki/Captcha.

Interestingly enough, the Wikipedia article linked to above points out that computers can be tweaked to read Captcha images. As a test of this, I found a few Captcha images from various sources and ran them through some OCR software. Though the software never recognized every character in an image, it recognized at least some characters in most of them. I found that only VERY obscured, next-to-illegible text and light text on dark backgrounds were never matched by my (cheap) OCR software. The Alagad Captcha, by the way, had no characters matched, yet its text is still very human readable.

This Component is based on portions of the Alagad Image Component. In essence, it is a specialized version of the AIC which can be used to create and verify Captcha images.

Check it out here!

About the Alagad EXIF Reader Web Service:

A while ago I posted a web service to doughughes.net which extracts EXIF data from images and returns an XML package of the data. After leaving it on doughughes.net as a beta for a while and not getting any negative feedback, I decided to go ahead and publish it.

The Web Service itself is free and very simple to use. The source code can be purchased and run locally or modified for some other purpose if desired.
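For a rough idea of what consuming a web service like this looks like from ColdFusion (the WSDL URL, method name, and argument below are hypothetical, not the service’s documented API):

    <!--- Hypothetical example: invoke an EXIF web service and dump the returned XML --->
    <cfinvoke webservice="http://www.example.com/exifReader.cfc?wsdl"
              method="getExifData"
              returnvariable="exifXml">
        <cfinvokeargument name="imageUrl" value="http://www.example.com/photo.jpg">
    </cfinvoke>

    <cfdump var="#XmlParse(exifXml)#">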

Check it out here!
