www. and Redirects
I mentioned on Twitter that I think most websites ought to provide service
with both www. and www.-less hostnames, and also
that one of these ought to be canonical, and the other redirect to it.
I’d like to take a little more space to unpack my reasoning, and what I personally do about it.
The Really Obvious
On one hand:
- A respectable number of non-technical users (at least, the ones
who don’t simply use web search to find everything — raise your hand if
you’ve seen relatives laboriously type
f-a-c-e-b-o-o-k into Google…) habitually type www. in URLs, even if they haven’t been explicitly told to do so.
- It’s often more difficult to serve a site from your bare
(sans-www.) domain; you’ll typically need to use an A record rather than CNAME records at that level, which is kind of crappy from an architectural perspective, and may not be ideal for e.g. load balancing.
On the other hand:
- Specifying www. is a bit redundant. That information (that you’re talking to the “web server”) is already implicit in the URL scheme (and port).
- Partly for this reason, the Web 2.0 aesthetic favors URLs of the form
http://mydomain.tld/. But it’s also because http://cleverna.me (not a live website as of this writing) simply looks a lot cooler than http://www.cleverna.me.
- It’s a bit less for people to type.
- It affords you more room in printed collateral like business cards (provided you’re not gauche enough to use a “URL shortener” in those contexts).
Most sites these days provide both when they can.
The Only Slightly Less Obvious
If you offer your site at both www.mydomain.tld and
mydomain.tld, however, that means you end up with at least
two major URL prefixes for every page. In a lot of situations this won’t
matter too much (and services like Google are typically smart enough to
collapse duplicate results in this sort of situation), but sometimes it
actually matters a bit.

One of the places it matters has to do with browser history. If you care about visually distinguishing visited and unvisited links, for example, this will be easily defeated by serving your site from multiple URLs.
This can be particularly bad for sites like MS Paint Adventures, where link coloration is the main mechanism available to discover where you left off reading.
The obvious solution to this is to pick one or the other domain as the canonical one to use in URLs, and redirect users who visit non-canonical URLs to it.
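As a rough sketch of what that redirect decision amounts to (the host names and the canonical_url helper are made up for illustration, not taken from any real setup):

```ruby
require 'uri'

# Hypothetical mapping of non-canonical hosts to the canonical one.
CANONICAL_HOSTS = {
  'www.mydomain.tld' => 'mydomain.tld'
}.freeze

# Returns the canonical form of +url+ as a string, or nil if the URL
# is already canonical (or its host is not one we know about).
def canonical_url(url)
  uri = URI.parse(url)
  real_host = CANONICAL_HOSTS[uri.host]
  return nil unless real_host

  uri.host = real_host
  uri.to_s
end

redirect_target = canonical_url('http://www.mydomain.tld/comic?page=12')
already_fine    = canonical_url('http://mydomain.tld/comic?page=12')

puts redirect_target.inspect
puts already_fine.inspect
```

Only the host changes; the path and query string come through untouched, which is what keeps a single entry per page in the browser history.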
How to Make Things Worse
I’ve seen a lot of people try to address this at the application level.
You can make sure to generate all your links with an explicit
www. (or none) in them, so that whatever users click, they
are getting the canonical version. Maybe you even do a bit of dynamic stuff so that your
PHP emits a 301 if the domain is wrong.
The trouble with this approach is that it’s very easy to miss something
and end up with an inconsistent result. If you hadn’t done anything
fancy, a user who had bookmarked one version or the other of your URL
and relied on that bookmark would still get consistent results. But if
you end up redirecting in some cases and not others, or generating links
with www. in them sometimes and not other times, there’s
not really much the user can do about it.
Fixing It at the Source
Rather than trying to cover all the cases in your application (refactoring to DRY helps, but it is not a panacea for corner cases), I think it’s better to have an entirely separate web application whose only job is to redirect requests for non-canonical URLs to the real application. At its very simplest, this could look like the following minimal Rack application:
run proc { |env|
  [301, {"Location" => "http://mydomain.tld/",
         "Content-Type" => "text/plain"},
   ["http://mydomain.tld/"]]
}
Deploy as a config.ru on Heroku, set
up www.mydomain.tld on Heroku and in DNS as a virtual host for
it, and you’re all set.
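A Rack application is just an object that responds to call, taking the request environment and returning a status, headers and body, so the minimal redirector can be poked at without a server at all. A small sketch, with the proc bound to a local variable (app is only a name chosen here; in config.ru it would be passed to run instead):

```ruby
# Same shape as the minimal redirector above, bound to a local
# variable instead of being handed to Rack's `run`.
app = proc { |env|
  [301, {"Location" => "http://mydomain.tld/",
         "Content-Type" => "text/plain"},
   ["http://mydomain.tld/"]]
}

# An empty environment is enough here, because this version ignores
# the request entirely.
status, headers, body = app.call({})

puts status
puts headers["Location"]
puts body.join
```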
Of course, it would be pretty obnoxious if you were to actually do only this.
There’s no need to penalize anyone who happens to add a www.
to the URL of a page on your site by redirecting them to the top-level
page. It’s pretty easy (if a little tedious) to redirect them to the same
URL, just without the leading www.:
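require 'uri'
# Maps each non-canonical host to the canonical host it should redirect to.
CORRECTIONS = {
  'www.mydomain.tld' => 'mydomain.tld'
}
# URI.escape was removed in Ruby 3.0; the RFC 2396 parser provides the same escaping.
ESCAPER = URI::RFC2396_Parser.new
# A lambda rather than a bare proc, so that `return` exits only the handler.
run lambda { |env|
  scheme = env['rack.url_scheme']
  host = env['HTTP_HOST'] || env['SERVER_NAME']
  real_host = CORRECTIONS[host]
  unless real_host
    return [404, {"Content-Type" => "text/plain"},
            ["No correction for host #{host}\n"]]
  end
  # Rebuild the original path and query string on the canonical host.
  path = ESCAPER.escape(env['SCRIPT_NAME']) +
         ESCAPER.escape(env['PATH_INFO'])
  query = env['QUERY_STRING']
  query = "?#{query}" if query and not query.empty?
  real_url = "#{scheme}://#{real_host}#{path}#{query}"
  headers = {"Location" => real_url,
             "Content-Type" => "text/plain"}
  # A response to HEAD must not carry a body.
  if env['REQUEST_METHOD'] == 'HEAD'
    body = []
  else
    body = [real_url]
  end
  [301, headers, body]
}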
(If you were extravagant enough to register a bunch of typo domains for your site, this will redirect users from them too, once they are added to CORRECTIONS.)
This is approximately the app I use for my own sites (excluding, for now,
moonbase.rydia.net). Note that I’ve omitted some things like
cache headers (and the actual class definition — the real thing isn’t just
a bare proc) for brevity. If there’s interest (let me know on Twitter or
by using the feedback form), I can package the real thing in a gem as a more
proper Rack application.
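For a sense of the general shape, here is a sketch of a class-based variant with a cache header. It is a guess at the form, not the real app: the class name HostRedirector, the max_age parameter and its one-day default, the typo domain and the fake_env helper are all assumptions made for illustration, and path escaping is left out to keep it short.

```ruby
# Hypothetical class-based redirector. Any object with a `call(env)` method
# returning [status, headers, body] can be handed to Rack's `run`.
class HostRedirector
  # corrections: hash of non-canonical host => canonical host
  # max_age: seconds a client may cache the redirect (assumed default: one day)
  def initialize(corrections, max_age = 86_400)
    @corrections = corrections
    @max_age = max_age
  end

  def call(env)
    host = env['HTTP_HOST'] || env['SERVER_NAME']
    real_host = @corrections[host]
    unless real_host
      return [404, {"Content-Type" => "text/plain"},
              ["No correction for host #{host}\n"]]
    end

    query = env['QUERY_STRING'].to_s
    query = "?#{query}" unless query.empty?
    real_url = "#{env['rack.url_scheme']}://#{real_host}" \
               "#{env['SCRIPT_NAME']}#{env['PATH_INFO']}#{query}"
    headers = {"Location" => real_url,
               "Content-Type" => "text/plain",
               "Cache-Control" => "public, max-age=#{@max_age}"}
    body = env['REQUEST_METHOD'] == 'HEAD' ? [] : [real_url]
    [301, headers, body]
  end
end

# Builds just enough of a Rack environment to exercise the app by hand.
def fake_env(host, path, query = '', method = 'GET')
  { 'rack.url_scheme' => 'http', 'HTTP_HOST' => host,
    'SCRIPT_NAME' => '', 'PATH_INFO' => path,
    'QUERY_STRING' => query, 'REQUEST_METHOD' => method }
end

corrections = {
  'www.mydomain.tld' => 'mydomain.tld',
  'mydomian.tld'     => 'mydomain.tld' # made-up typo domain
}
app = HostRedirector.new(corrections)

www  = app.call(fake_env('www.mydomain.tld', '/about', 'ref=card'))
typo = app.call(fake_env('mydomian.tld', '/'))
head = app.call(fake_env('www.mydomain.tld', '/', '', 'HEAD'))
miss = app.call(fake_env('unrelated.tld', '/'))

puts www[1]['Location']
puts www[1]['Cache-Control']
puts typo[1]['Location']
puts miss[0]
```

In a config.ru, the last step would be run HostRedirector.new(corrections) rather than calling the app by hand.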
Edit: There’s also this