A week or so ago I was asked by a reader of my blog to explain how I handle search engine safe URLs on my websites. The process I use is quite simple and has remained more or less unchanged over the last few years.
As you can see on this site my URLs tend to look like this:
http://www.doughughes.net/index.cfm/page-blogLink/entryId-44
The first part of the URL looks just like any other URL. We all know what this does. If not, don’t bother reading any further. However, after the index.cfm things begin to look a little different:
/page-blogLink/entryId-44
What I’ve done is replace the “?” and “&” characters with forward slashes and the equal signs with hyphens. This creates a list of name/value pairs which we can easily parse. Everything before the first hyphen in each pair is the variable name. Everything after the first hyphen is the value of the variable.
I’ve seen other solutions which have a format where every other slash indicates a variable and value. This style of URL might look like this:
http://www.somedomain.net/index.cfm/page/blogLink/entryId/44
I’m not a fan of this because it’s a little hard to read and seems like it could cause errors if, for whatever reason, a variable doesn’t have a value. In my example, if the page variable didn’t have a value it would simply look like this:
http://www.somedomain.net/index.cfm/page-/entryId-44
What would the other style look like? How would ColdFusion know what to do? It’s my opinion that my way is a little nicer and more reliable. It’s up to you how you choose to do it.
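To make the empty-value problem concrete: CFML's list functions ignore empty elements by default, so in the alternating style a missing value simply disappears and everything after it shifts by one position. Here is a small sketch of that behaviour (the variable names alternating and hyphenated are made up for illustration):

```cfml
<!--- Alternating style with an empty "page" value: /page//entryId/44 --->
<cfset alternating = "page//entryId/44">
<!--- Hyphenated style with the same empty value: /page-/entryId-44 --->
<cfset hyphenated = "page-/entryId-44">

<cfoutput>
    <!--- 3 elements (page, entryId, 44): the empty value is skipped --->
    #ListLen(alternating, "/")#<br>
    <!--- The second element is "entryId", which would be misread as the value of page --->
    #ListGetAt(alternating, 2, "/")#<br>
    <!--- 2 elements (page- and entryId-44): each pair stays intact --->
    #ListLen(hyphenated, "/")#
</cfoutput>
```

With the hyphenated style, the variable name still travels with its (empty) value, so the pairs never get out of step.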
Another cool thing about these URLs is that, if need be, you can tack on additional URL variables in the traditional format like this:
http://www.somedomain.net/index.cfm/page-/entryId-44?anotherVar=anotherVal
I’ve needed this capability in the past and it’s come in quite handy.
If you create any ColdFusion page, request it with a URL formatted the way I described above, and dump the CGI.PATH_INFO variable, you will see something similar to this:
/index.cfm/page-blogLink/entryId-44
As a note, I’m not sure if using CGI variables is the only option here. It’s the only one I know of. The biggest problem with them is that they’re not consistent between platforms and web servers. For instance, on Apache on Linux CGI.PATH_INFO would have looked like this:
/page-blogLink/entryId-44
On Windows and IIS it looks like this:
/index.cfm/page-blogLink/entryId-44
You may want to use other CGI variables to determine which parts of PATH_INFO contain the variables you want and which parts don’t. For instance, on Windows and IIS there’s a variable CGI.SCRIPT_NAME which holds only the path to the file:
/index.cfm
When I moved this site from Apache to IIS I was a bit confused because on Windows I had an extra variable named “URL.index.cfm” being set to nothing. How odd. A little debugging solved the problem.
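One way to sidestep the Apache/IIS difference is to strip the script name from the front of PATH_INFO only when it is actually there. The following is a minimal sketch under that assumption, not necessarily what the downloadable CFC does; pathInfo is a hypothetical variable name.

```cfml
<!--- pathInfo is a hypothetical working variable used only in this sketch --->
<cfset pathInfo = CGI.PATH_INFO>

<!--- On IIS, PATH_INFO starts with the script name (e.g. "/index.cfm"), so remove it.
      The Len() check guards against calling Left() with a count of zero. --->
<cfif Len(CGI.SCRIPT_NAME) AND Left(pathInfo, Len(CGI.SCRIPT_NAME)) EQ CGI.SCRIPT_NAME>
    <cfset pathInfo = RemoveChars(pathInfo, 1, Len(CGI.SCRIPT_NAME))>
</cfif>

<!--- Dump the result to check what the server actually provides --->
<cfdump var="#pathInfo#" label="pathInfo">
```

The leading slash can stay where it is: looping over pathInfo as a "/"-delimited list skips the empty first element, so no "URL.index.cfm" style stray variable is created.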
Once you’ve isolated the portion of the URL which contains the variable names and values, parsing them out is quite simple. I loop over the list of name/value pairs, split each pair into its name and its value, and set a URL variable of that name to the value provided.
Here’s a complete example:
<!---
    Make sure that the CGI.PATH_INFO var is longer than CGI.SCRIPT_NAME + 1.
    If not, then we don't have any URL variables.
    I add one to the length of CGI.SCRIPT_NAME because of the / after the
    file path, e.g. "/index.cfm/".
    This assumes PATH_INFO begins with the script name, as it does on IIS.
--->
<cfif Len(CGI.PATH_INFO) GT Len(CGI.SCRIPT_NAME) + 1>
    <!--- we have SES URL vars: keep everything after "/index.cfm/" --->
    <cfset urlString=Right(CGI.PATH_INFO, Len(CGI.PATH_INFO) - Len(CGI.SCRIPT_NAME) - 1)/>
    <!---
        urlString is now a list of name/value pairs (separated by arguments.seperator).
        Loop over the list and extract them.
    --->
    <cfloop delimiters="#arguments.seperator#" index="varAndVal" list="#urlString#">
        <!--- the name is everything before the first arguments.equal character --->
        <cfset varName=ListFirst(varAndVal, arguments.equal)/>
        <!--- the value is everything after it (it may itself contain hyphens) --->
        <cfset varValue=ListDeleteAt(varAndVal, 1, arguments.equal)/>
        <!--- set the url variable --->
        <cfset "URL.#varName#"=varValue/>
    </cfloop>
</cfif>
I’ve grouped all of the code above into a CFC which can be downloaded from the attachments section below.
The CFC provides a method parseURL which accepts two arguments, the variable separator and the equals sign. These default to my preferences of “/” and “-” respectively. This allows you to change them to be whatever you want. This could easily be turned into a Mach-II filter too.
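For readers who don't have the download handy, here is a rough sketch of how such a method could be laid out. It is a reconstruction from the description above, not the actual attached file: the component file name SesUrlParser.cfc is made up, and the argument names seperator and equal are taken from the example code.

```cfml
<!--- SesUrlParser.cfc (hypothetical file name) --->
<cfcomponent output="false">
    <cffunction name="parseURL" access="public" returntype="void" output="false">
        <!--- Defaults follow the "/" and "-" preferences described in the post --->
        <cfargument name="seperator" type="string" required="false" default="/">
        <cfargument name="equal" type="string" required="false" default="-">
        <cfset var urlString = "">
        <cfset var varAndVal = "">
        <cfset var varName = "">
        <cfset var varValue = "">

        <!--- Assumes PATH_INFO starts with the script name, as on IIS --->
        <cfif Len(CGI.PATH_INFO) GT Len(CGI.SCRIPT_NAME) + 1>
            <cfset urlString = Right(CGI.PATH_INFO, Len(CGI.PATH_INFO) - Len(CGI.SCRIPT_NAME) - 1)>
            <cfloop list="#urlString#" delimiters="#arguments.seperator#" index="varAndVal">
                <cfset varName = ListFirst(varAndVal, arguments.equal)>
                <cfset varValue = ListDeleteAt(varAndVal, 1, arguments.equal)>
                <!--- Bracket notation sets URL.<varName> without a quoted variable name --->
                <cfset URL[varName] = varValue>
            </cfloop>
        </cfif>
    </cffunction>
</cfcomponent>
```

Calling it early in the request, for example with `<cfset CreateObject("component", "SesUrlParser").parseURL()>`, should leave URL.page and URL.entryId available to the rest of the page just like ordinary query string variables. Passing `seperator` and `equal` explicitly switches to a different URL style.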
(Note: This CFC isn’t as encapsulated as it could be, but it’s simple enough for this example.)
Have fun! Good luck! Don’t come up higher than me on Google!
Search Engine Safe URL CFC