The amazing adventures of Doug Hughes

Two weeks ago I started noticing a strange problem with IIS. I use IIS 5.1 for development. Not that I’m a huge fan of IIS 5.1, but I do deploy sites to IIS 6, so 5.1 is as close as I can get to it for development work.

Anyhow, on almost every page request at least one asset would fail to download. For example, and most obviously, an image (or a few) would fail to download. This was really apparent when looking at a page with lots of images. Additionally, I’m quite sure, but I can’t prove it, that JavaScript and CSS files would occasionally fail to load. The problem occurred when I was using either IE or Firefox. Furthermore, when the problem began to occur it wouldn’t take long before pages stopped loading at all, which made it appear as if IIS itself were hanging.

The problem rapidly escalated over a few days. On day one I noticed this happen a few times. By day seven it was happening so frequently I was unable to get work done.

Of course, I tried all the normal culprits. I restarted IIS, ColdFusion, SQL Server, the whole server and so on. I also tried shutting off virus scanning and more. Nothing made a difference. I spent hours Googling for solutions and got, quite literally, nowhere.

Seeing as this was seriously impacting my ability to get work done I decided that I had two options: Reinstall Windows on my laptop (hey, it used to work just fine!) or call Microsoft support and make them fix the problem.

I figured a reinstall would take two days. Seriously! Making backups of important files takes several hours. Installing Windows takes a few hours. Then, you’ve got to make sure all the drivers are installed correctly, restore your old files and download and reinstall all your software. It always takes me a few days to really get rolling with a new install.

Alternatively, I figured that if I called Microsoft Support, even if I had to pay a couple hundred bucks for it, they’d be able to find the problem and provide a fix in a matter of minutes, maybe a couple of hours at most. The time savings (and therefore the money savings) seemed obvious!

So, the first thing I did was call Windows support. Obviously this makes sense: I’m on a Windows operating system using a component that comes with Windows. Well, after about 10 minutes of trying to explain my problem to the support tech I finally had to break down and ask him if he’d ever even heard of IIS. His answer: No.

It turns out that despite the fact that Windows is shipped with IIS, IIS is a server product and you need to contact their Server support. When I found out the cost for this was $250 I nearly just reinstalled my laptop. However, I figured that I’d be guaranteed to run into the problem again within the next 6 months. I might as well pay the cash and learn the real solution to the problem.

It’s easy to bash Microsoft for products which have problems or support that may (or may not, as we will see) be miserable. However, after shelling out the cash things began to get interesting.

Over the span of a week and a half I spent many, many hours on the phone with Rohan Makhija, a Microsoft IIS/ASP.NET Support Engineer. One day I spent nearly ten hours on the phone with him. It turned out that Microsoft had not seen anything quite like this before. (Why do I always find the obscure bugs?! Am I secretly in QA or something?)

I use IIS Admin to allow multiple sites to run in IIS 5.1. Predictably, the first thing Rohan did was to remove all the extra sites from IIS. After that, we set the default site to point to the project I was working on and reproduced the problem and looked at the IIS logs. The logs showed that we were running into HTTP 403 errors when files were not loading.

The theory on these 403s is that IIS was reaching the maximum number of connections allowed and was rejecting all other connections. By default IIS 5.1 only accepts 10 connections. You can use an obscure command line tool to increase this to 40, but even that didn’t improve things.

Rohan used perfmon to watch the number of open connections when we ran into problems. When the problem didn’t occur there was one connection for each asset plus one for the HTML document. When the problem did occur there were fewer connections.
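To put rough numbers on the connection-limit theory, here is a small Python sketch. It assumes the simplest possible model, where every request gets its own connection and all of them are open at the same moment, which real browsers don’t quite do. The function names and the 12-image page are made up for illustration; only the limits of 10 and 40 come from the story above.

```python
IIS_51_DEFAULT_LIMIT = 10  # default connection limit mentioned above
IIS_51_RAISED_LIMIT = 40   # the raised limit mentioned above


def connections_for_page(asset_count: int) -> int:
    """One connection per asset plus one for the HTML document (no Keep-Alives)."""
    return asset_count + 1


def rejected_connections(asset_count: int, limit: int) -> int:
    """How many connections would be turned away if all were open at once."""
    return max(0, connections_for_page(asset_count) - limit)


if __name__ == "__main__":
    # Hypothetical page with 12 images, checked against both limits.
    for limit in (IIS_51_DEFAULT_LIMIT, IIS_51_RAISED_LIMIT):
        print(limit, rejected_connections(12, limit))
```

Under this toy model a page with a dozen images overruns the default limit but fits comfortably under 40, so raising the limit should have helped if the limit were the whole story.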

To resolve this problem we turned on HTTP Keep-Alives. Traditionally, an HTTP request is a single, atomic request for one document. Once the file is downloaded the HTTP connection is closed. HTTP Keep-Alives, however, leave the connection open and can send multiple requests over one connection (though still only one document can be delivered at a time through one connection). This did the trick for the 403 errors.
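For anyone who wants to see Keep-Alives in action, here is a minimal Python sketch using only the standard library. It is not IIS and not what Rohan ran; it just starts a throwaway local server (the handler and function names are my own) and fetches several paths over a single connection, recording the client’s local port each time to show the same socket is being reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 is what lets http.server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where this response ends,
        # so the same connection can carry the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(paths):
    """Fetch every path over a single socket; return (status, body, local_port)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = []
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # Only one document at a time: finish reading this response
            # before sending the next request on the same connection.
            body = response.read().decode()
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, body, local_port))
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for row in fetch_over_one_connection(["/index.html", "/a.gif", "/b.css"]):
        print(row)
```

If the local port is the same on every row, all the requests travelled over one connection instead of one connection per file.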

Unfortunately, we were still able to reproduce the problem, but this time IIS was logging 500 errors on static content.

Pop quiz: what about that last sentence is weird? That’s right: static content. One would expect a web server to be able to serve static content without server errors. However, GIFs, JPEGs and other static files were producing server errors even though they weren’t doing anything dynamic at all!

Next, Rohan made me remove the JRun ISAPI filter. This was simple enough using the Web Server Configuration Tool. But, it turns out that removing all connectors doesn’t actually remove the JRun ISAPI filter. So Rohan forced me, quite against my will, to uninstall ColdFusion.

At this point, I’ve got to say, I would have been absolutely pissed if the problem went away. Seriously. Like most techs, I have plenty of righteous indignation towards Microsoft and (my perception of) their crap software. If uninstalling ColdFusion fixed the problem I would have been forced to eat my words. Much to my relief, uninstalling ColdFusion didn’t fix the problem. Amusingly, in Rohan’s troubleshooting summary it simply says “Problem persists”.

Rohan did a bit more work to make sure nothing was misconfigured. He made sure JRun was completely removed, checked that there weren’t somehow ISAPI extensions mapped for GIF or JPEG images, and went over other related settings. As expected, this did nothing.

Microsoft apparently has an internal ISAPI filter called IISMon. IISMon logs all the data going through IIS and is useful for debugging the sort of problem I was running into. This is actually not a public tool and has a built in “time bomb” to automatically disable the program. Why? Well, apparently it can even view the contents of pages delivered over SSL.

When IISMon was installed the problem never once occurred. The reigning theory was that IISMon somehow slowed the request down enough that the problem didn’t occur. It would have been a good enough solution, in my opinion, to just leave it installed. But, unfortunately, the built in time bomb made that unrealistic.

Next, Rohan removed all the standard ISAPI filters from IIS. Same problem.

After that, we uninstalled IIS completely and reinstalled it (which I had already done before calling Microsoft). We stopped all third party services and processes. Same problem.

And, to avoid boredom, I’m going to avoid describing each of the steps he took to debug the problem (you can see his report below). Let it suffice to say that Rohan left no leaf unturned.

Where it started to get interesting was when we enabled extended logging and noticed that whenever we received a 500 error it corresponded to a Win32 status of 6, “ERROR_INVALID_HANDLE”.
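If you want to hunt for the same pattern in your own logs, something like the following Python sketch would do it. It assumes the W3C extended log format with the `sc-status` and `sc-win32-status` fields enabled; the sample log lines are invented for illustration (they are not from my actual logs), and the function name is my own.

```python
# Invented sample in W3C extended log format; not taken from the real logs.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 5.1
#Fields: date time c-ip cs-method cs-uri-stem sc-status sc-win32-status
2006-09-19 23:35:50 127.0.0.1 GET /index.html 200 0
2006-09-19 23:35:50 127.0.0.1 GET /example_files/thumbnail1.gif 500 6
2006-09-19 23:35:51 127.0.0.1 GET /example_files/home8_100.jpg 200 0
"""

ERROR_INVALID_HANDLE = 6  # the Win32 status that showed up next to the 500s


def failed_requests(log_text, status="500", win32_status=ERROR_INVALID_HANDLE):
    """Return the URI stems of requests matching both status codes."""
    fields = []
    hits = []
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns for the lines that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directives
        row = dict(zip(fields, line.split()))
        if (row.get("sc-status") == status
                and row.get("sc-win32-status") == str(win32_status)):
            hits.append(row["cs-uri-stem"])
    return hits


if __name__ == "__main__":
    print(failed_requests(SAMPLE_LOG))
```

Reading the column names from the `#Fields:` line rather than hard-coding positions means the sketch keeps working if a different set of fields is enabled.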

I’m not quite sure how, but at this point Rohan noticed something really weird, which I had completely failed to bring up before this point. Namely, I had several thousand small files with strange names in the root of my C drive. Here’s a (slightly forged) screenshot of these junk files:

Long list of files

Now, I’m sure that all my technical readers out there are screaming “It’s a virus, you moron!” And that’s what I thought too at one point. But I’d scanned my machine, checked everything I could think of and searched the web for “thousands of files starting with ‘s’ in my c drive” (go ahead, you try to find something about this symptom) without success. More likely than not, there was some program I installed that was configured to write temp files to the root, for whatever reason.

This is where Rohan earned his money from Microsoft, and why I’m just this dude who programs. He (you’re not going to believe this) opened one of these files in a text editor! Really! And what to our wondering eyes did appear? Well, something like this:

HTTP/1.1 200 OK
Server: Microsoft-IIS/5.1
Date: Tue, 19 Sep 2006 23:35:50 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Tue, 19 Sep 2006 23:35:41 GMT
ETag: "308875244dcc61:953"
Content-Length: 237

<style>
img {
border: 1px solid black;
}
</style>
<IMG alt=home! src="example_files/home8_100.jpg">
<IMG src="example_files/thumbnail1.gif">
<IMG alt="Antarctic Glacier " src="example_files/antaricwaves-thumb3_100.jpg">

Bonus points if you can tell me what this is. That’s right, it’s an HTTP response!

We made a backup of these files and then deleted them. Subsequent HTTP requests rewarded us by writing one file to the root for each file delivered back to us. Rohan asserted that this was not normal behavior for IIS.
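Opening files one at a time in a text editor works, but with several thousand of them a quick script is kinder. Here is a rough Python sketch that flags files whose contents begin like an HTTP response. The function names and the sample file names are hypothetical, and it only peeks at the first few bytes, so treat a match as a hint rather than proof.

```python
import tempfile
from pathlib import Path


def looks_like_http_response(path: Path) -> bool:
    """True if the file begins with an HTTP status line such as 'HTTP/1.1 200 OK'."""
    try:
        with path.open("rb") as handle:
            return handle.read(5) == b"HTTP/"
    except OSError:
        return False  # unreadable or locked file: skip it


def find_http_dumps(directory):
    """Return the sorted names of regular files in `directory` that start with 'HTTP/'."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and looks_like_http_response(entry)
    )


if __name__ == "__main__":
    # Demo against a temporary directory instead of a real drive root.
    with tempfile.TemporaryDirectory() as root:
        Path(root, "s1a2").write_bytes(
            b"HTTP/1.1 200 OK\r\nServer: Microsoft-IIS/5.1\r\n\r\n<html></html>"
        )
        Path(root, "notes.txt").write_bytes(b"just some text")
        print(find_http_dumps(root))
```

Pointing `find_http_dumps` at the folder where the junk files are piling up would list only the ones that look like captured responses.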

At this point Rohan started getting a bit Medieval on IIS. He used some tools from sysinternals.com to gather “hang dumps”, whatever those are, from the inetinfo process. This showed a third party component from ByteMobile installed, bmnet.dll. In other words, this dll was somehow attached to the inetinfo process.

A little googling showed that this was associated with the Sprint PCS Connection Manager application, which I had installed.

This is kind of amusing actually. About 9 months ago Sprint asked me to participate in their Ambassador program. The Ambassador program was really a marketing campaign where Sprint sent thousands of phones to thousands of bloggers along with free, unlimited service for 6 months. They did this under the guise of a beta test. But, it was rather apparent that they hoped for positive reviews.

30 second review: The phone was OK, but too bulky. They wanted me to review their media services. I thought they were extremely mediocre. Their streaming video and audio services were lame, honestly, and severely overpriced. However, I could plug the phone in to my laptop via USB and get wireless high speed internet access from just about anywhere. That was extremely cool but, ultimately, the genesis of my problem.

My time on the trial had just recently run out anyhow so Rohan had me uninstall the Sprint PCS Connection Manager. And, I’ll be damned, but, that solved the problem.

So, many days and $250 later my problem was solved. It wasn’t as efficient as simply reinstalling my OS, but I’ll bet that this blog entry will help at least one other person out there find and fix this same problem.

The next day, after the problem was solved, I received a call from someone at Microsoft who was not Rohan. They wanted to know how I felt about the service and I answered honestly. I was extremely happy. Rohan pulled out all the stops to fix the problem. He did exactly what I wanted, namely using tools I didn’t know how to use to identify and fix the cause of the problem without requiring me to do a full reinstall.

And, you know what? Microsoft gave me my money back. That’s right, my money back, because it took so long to actually find the cause of the problem, which was not even a Microsoft product.

So, all in all, a very positive experience. One I’m not sure I’ll ever repeat, but at least I know they’re competent and helpful. (I wonder how much a related call to RedHat would have cost?)

Anyhow, as one last piece of supporting information I’ve attached the case summary below.

So, the moral of the story: don’t take candy from strangers.

Rohan’s Case Notes

Comments on: "IIS – or – Don't Take Candy from Strangers" (27)

  1. Tom Chiverton said:

    I would love to know why Sprint are mucking with your web server 🙂
    You should ask…


  2. Doug;

    Amazing story! – I’m no major MS fan (I’m writing this on OSX) but that sure is great customer service!

    Due to the ability of 3rd party apps to ‘plug into’ MS products many times I think MS apps get the blame due to errors in other apps (as in your example).

    Cheers

    David


  3. Doug Hughes said:

    If only I knew!


  4. Raymond Camden said:

    +1 – You should contact Sprint.


  5. Scott F Stroz said:

    I’d be on the phone with Sprint screaming like a mad-man.


  6. Raymond Camden said:

    Oh – and you could also ask Sprint if they would pay for the MS Tech support cost (even though you got it back). Don’t demand it – just ask if they would have paid you back.


  7. Doug Hughes said:

    First off, I have no interest in calling Sprint. I don’t care to waste any more of my time. But, let’s say I did. What would I expect to get from Sprint? $250? Money for my lost time? Do you really think I’d get anything at all?

    I can hear the conversation now:

    Me: Yea, I got this free phone and service from you… and it broke IIS.

    Sprint: (Interrupting) What’s IIS?

    Me: It’s a webserver

    Sprint: (Interrupting) A What?

    Me: Never mind about that. I called Microsoft and spent 10 days on the phone and paid them $250 bucks (but they did give me my money back). It turns out it was your software that caused the problem:

    Sprint: Too bad?

    Me: I want you to pay for my time!

    Sprint: Tell you what, I’ll give you your money back on the phone.

    I just don’t see how this would be productive at all. On the other hand, I am curious to see how Sprint handles something like this. But, all in all, I doubt that I’d get anywhere at all.


  8. I think it’s time you start using VMWare. In my experience, the majority of all server problems I’ve had in development are related to non-server software. Keep your dev environment isolated from your laptop.

    Kudos to M$. What a legit business practice.


  9. Andy Jarrett said:

    Quick, remove the bit from your blog about the money back and get Sprint to compensate you! :o)

    I gotta say, even though I’ve jumped ship from MS now, you’ve highlighted one of their biggest problems: third party tools. I mean, even Sony were installing root kits! But seriously, like the other comments said, you gotta find out in what way the Sprint software was using your IIS installation. At the end of the day, because of their product, how much of your time (money) was really wasted?


  10. Tom Chiverton said:

    Hey, I don’t expect Sprint to care either.
    It would be interesting as you say though.


  11. Paul Carney said:

    Doug – I also had a similar (great) experience with the MS SQL Server support team. When our database got corrupted due to a software RAID issue, we were sunk.

    I called the SQL Server support line, paid the $250, and they, too, were on the phone with me for 9 hours! They even stayed on hold while I was dealing with my hosting provider, who was building a new machine (this time, with hardware RAID).

    They then followed up over the next 3 days to make sure everything was working fine. I found out who their manager was and sent a thank-you email.

    Wouldn’t you know that I got a personal “thank you” from that manager – and not just a quick note, but one in which he knew who had worked with us (there were 2 folks because the crisis spanned their working shifts) and what they had done.

    I, too, was totally pleased and even amazed, at their level of service. Now if only my hosting provider could provide that….. 🙂


  12. There used to be an old joke:

    A Cessna was flying into Seattle in heavy fog with little fuel left when its instrumentation went out. The pilot happened to see a tall building through the fog, with a window washer doing his job. The pilot flew closely around the building and, opening the small window of his door, screamed out to the man “Where am I?” Upon the next pass by the man he heard the response, “You’re in a plane!” The pilot immediately banked left 10 degrees, flew two miles forward, and came upon the runway lights of the airport. The passengers were astonished, and asked the pilot just how it was that he had gotten them safely to the ground. He responded, “Well, I asked a perfectly straightforward question, and received a perfectly valid but useless answer. I immediately knew that we were passing the Microsoft Technical Support building, and knew how to get us in from there.”

    I guess that’ll be the last time I tell that one.


  13. Aaron West said:

    Doug, I’ve had almost the exact experience you are talking about, only with IIS 6 and CFMX 6.1. This was at work and to date we’ve spent the better part of 3-4 weeks troubleshooting the issue. The sites we build use frames (ugh) and on occasion one of the frames would simply not load content. Sometimes it would be more than one frame that wouldn’t load. We went down similar debugging paths, even to the point of not loading the root site but typing the direct path to frames that weren’t loading. The problem subsided some when doing this but did not go away entirely. We have yet to come up with a 100% resolution on this problem but we think we may have issues with our load balancer. You see, when changing DNS records of problem sites to a specific production server, instead of the load balancer, the problem went away.

    We are working to replace our load balancer, hoping it’s the culprit, but I’m not 100% convinced. We’ll see what happens. Thanks for posting this entry. I’m going to tuck it away and reference it when we do more testing.


  14. Great Blog Entry. Plenty of people take the time to bash bad service, but it’s nice to read a positive experience for a change.


  15. David L Dietz said:

    For those who are wondering how the Sprint software was causing a problem:

    BMNET.dll is a Layered Service Provider: something that hooks into the TCP stack to provide a service of some sort, in this case data compression for transmission over a cellular connection. The problem is that the BMNet LSP is faulty and causes the TCP stack to behave improperly. It also appears in this case that the LSP was logging files to the C: drive (beats me why this would happen…).

    This same DLL has been known to cause Visual Studio and/or ASP.NET applications to hang for no other apparent reason. BMNet.dll has been shipped by both Cingular and Sprint; I’m not sure if there are any other companies out there distributing it. There is a program called LSPFix (a web search will turn it up easily) that will remove faulty LSPs from the TCP stack in a near automatic fashion and resolve problems related to this component.

    (I’ve run into this issue as well…..)


  16. I ran into the exact same issue. Any request to asp.net content would result in a ding.wav (the messagebox sound) and the browser would hang for a response. There were no entries in the w3 log or event logs. My MS guy ran IISDiags and it found that bmnet.dll was attempting to send a messagebox to inetinfo. I removed the Cingular connection manager and all was fine. Shows how vulnerable the stack really is. No wonder there is so much spyware out there.

    PS- The tech I worked with, Chris Haun, found it in 2 hours.


  17. Doug’s call would have created a new ticket resolution item in Microsoft’s Support guidebook for their reps.


  18. Interesting. I’ve had the exact same problem recently. Running XP Pro and IIS 5.1 as well as IISAdmin.NET, I noticed that I was getting a few dropped requests and I also had those same HTTP request files in the root of C. I’ve not had the time to investigate what’s going on, and after deleting those odd files a few times the problem seemed to go away by itself. I don’t recall uninstalling anything else.
    I do not, and never have, however, had Sprint installed, or even CF on the machine in question.
    Great blog!


  19. Awesome info.

    Sprint (and damn near every telecom app – Broadcom, etc.) attaches to the IIS process to use the server to send usage info back to Sprint and to be a host for anything Sprint wants to push down the pipe without the hassle of letting you know and reminding you to relax your security settings. Read the fine print for some scary generalizations. The DLL attaches to the inet process for traversing firewalls and virusware silently. The inet process(es) are usually allowed free rein, and related security alerts are usually ignored. Shudder.


  20. I’ve been searching for a solution to this problem for 6 months. I’m a developer and I have IIS 5.1 installed on tablet PCs, and I’m using IIS on those tablets for a local web app written in ASP. I use Cingular and Sprint cards to upload the data to our corporate servers. I was noticing these files, and knew it was linked to IIS somehow. After reading this, I uninstalled all of the Sprint and Cingular connection software, and no more strange files (I was getting hundreds of them). So now I hope this LSPFix thing works.

    Is there any way to delete the bmnet.dll file and have everything still work OK?


  21. I usually have to call MS for one or two problems a year. I can’t remember the last time they didn’t refund the charges.

    Bear in mind that if it is a problem with the OS or MS product you can ask for a refund and it will be granted, if they haven’t already offered it.

    MS support has improved greatly over the past 5 years. They made an effort and it paid off.


  22. Kevin Myers said:

    Similar problems as described above on my laptop with a recently installed Cingular wireless card. Web pages routinely fail to load, but will generally load after multiple attempts. Getting hundreds of files starting with “s” in my “c:\Documents and Settings\Local Settings\Temp” folder. Located “C:\WINDOWS\System32\bmnet.dll”. File version info says Bytemobile Optimization Client version 2.3.1.3031. Unfortunately, having the Cingular wireless card installed is critical in my present situation. Any way to solve this problem and still have the Cingular card work properly?

    Kevin M.



  25. Great blog – it’s exactly the same symptoms I’m seeing under XP Pro / IIS 5.1 but I’m in the UK where Cingular / Sprint basically doesn’t exist (there’s no BMNET.dll anywhere either) so if anyone’s found any other solutions / causes that’d be great 🙂

    Meantime I may well switch over and start using a Win 2003 Server VirtualPC for local dev work to get around the problem.

    I too have had some great experiences with Microsoft support working for an MS Gold Partner. I have memories of getting 4Gb dump files across to some guys in Paris and Seattle – they were actually handing over the support issue around the world to work on it 24×7. Granted, it was on an Early Adopter for new MS app so they pulled out the stops – still impressive tho.

    Back to my non serving images then…! 🙂

    S


  26. Hi,
    My application is configured on IIS 6.0. In it, I have a textarea field used to enter large amounts of data. I build an XML document from it and send it to another ASP page.
    If I set the maxlength of that textarea to more than 135900, the next page gives me the error message “Page cannot be displayed”.
    On the next page I am using Request.Form(“strXML”) to retrieve the value.
    My XML contains 432000 characters.

    I have copied both pages to another system that has IIS 5.1. There I am able to pass that large XML data from one page to another; that is, I can retrieve the XML data on the second page.

    It seems it is not a coding issue, because the code works fine on the IIS 5.1 server.
    I think this problem is related to IIS 6.0.
    Could you please help me resolve this issue? I would also appreciate an appropriate solution.

    Thanks & Regards



Comments are closed.
