This blog has historically had a relatively high PageRank value, a benefit I attributed to the fact that the software that drives it was created to meet what everybody understood to be best SEO practices in 2000-2001. For example, each article has a short and eternal URL (mouse over any article title to see its permalink), a unique and descriptive page title, and so on. I’m not sure whether SEO was an industry in 2001, but building indexable websites was certainly a skill of webmasters and web engineers, and SEO guidelines were foremost in my mind as I architected this website.
My site’s PageRank changed in early 2006. I was reading about SEO, ironically, and I learned that sites answering to multiple URLs risk getting penalized in Google’s index due to Google’s new (early-2006) duplicate content filtering algorithm. Although I’ve only ever published my own blog URL without the ‘www’ hostname, the server hosting this site responded to both forms of the URL: both debris.com and www.debris.com. Every page on the website was available at both addresses.
I followed Matt Cutts’ advice in early 2006 to use a 301 redirect to automagically forward everyone who was trying to get to www.debris.com (or any page there) to the equivalent page at debris.com. Shortly after that, my PageRank disappeared: the Firefox PageRank plug-in reported it as “n/a,” and multiple-datacenter survey tools mostly reported it as blank, with a couple of datacenters inexplicably still showing 4-5.
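For readers who want to see the shape of that redirect, here is a minimal Python sketch of the decision a server makes. It is not the configuration this site actually uses. The function name `canonical_redirect` and the example path are hypothetical, and a real deployment would more likely express the same rule in the web server's own configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for illustration: the bare hostname is the canonical one.
CANONICAL_HOST = "debris.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target_url) if url uses the 'www.' alias, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "www." + canonical_host:
        # Already canonical, or some unrelated host: leave it alone.
        return None
    # Keep an explicit port if one was given; drop only the 'www.' prefix.
    netloc = canonical_host if parts.port is None else f"{canonical_host}:{parts.port}"
    target = urlunsplit(
        (parts.scheme, netloc, parts.path or "/", parts.query, parts.fragment)
    )
    # 301 marks the move as permanent, which is what tells crawlers
    # to treat the target as the one true address.
    return 301, target


if __name__ == "__main__":
    # Hypothetical article path, used only to show the mapping.
    print(canonical_redirect("http://www.debris.com/some/article?x=1"))
    print(canonical_redirect("http://debris.com/some/article"))
```

The point of preserving the path and query string is that every page on the alias maps to its equivalent page on the canonical host, rather than dumping everyone on the home page.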
At first I thought this was a temporary condition resulting from the 301 redirection. I waited a month or two to see if a subsequent PageRank push would reveal the effect I’d intended, that the previous PageRank of 5 for www.debris.com and 4 for debris.com would consolidate to a solid 5 or maybe even a 6 for the canonical domain. Alas, this never happened.
In May 2006 I created an XML sitemap to seed Google’s crawlers with the newest content from this site, thinking that perhaps this would cement in the crawlers’ collective mind that, despite inbound links from third-party sites to formerly duplicate-content URLs, everything was happy and canonical and uniquely addressed on the server, using the technique advocated by Google’s own webspam master, Matt Cutts. The sitemap reduced traffic from the crawlers (reflecting my on-again, off-again publishing style) but unfortunately didn’t correct the site’s missing PageRank.
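As a rough illustration of what such a sitemap contains, the Python sketch below builds one from a list of permalinks. The helper `build_sitemap`, the example URL, and the date are all hypothetical. The namespace shown is the sitemaps.org 0.9 one, which is an assumption on my part and may differ from the schema in use in mid-2006.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace, not necessarily the 2006 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap XML string from (permalink, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> should carry the canonical address only (no 'www.' alias).
        ET.SubElement(url, "loc").text = loc
        # <lastmod> lets a crawler skip pages that have not changed.
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical permalink and date, for illustration only.
    print(build_sitemap([("http://debris.com/some/article", "2006-05-01")]))
```

Listing only canonical permalinks is what makes the sitemap reinforce the redirect: the crawler is handed one address per article and nothing else.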
By early 2007, I had waited more than six months. I wondered if my site had been inadvertently penalized, for its PageRank never came back.
So, I did the thing I should have done last March, as soon as my PageRank disappeared: I filed a reinclusion request. Honestly, I didn’t really need to be “reincluded,” as my site was still in the Google index, and did turn up in searches for which my site is authoritative. But it was the only trigger I had left to pull.
A week or two later I took an additional step, inspired by a blog post by Rogers Cadenhead. He described how his blog software showed his entries at multiple addresses: the home page, the category pages, the tag pages, and each item’s permalink page. This is how debris.com works as well. Rogers describes this as a “huge mistake.” I personally disagree; this seems to me to be a service to the user. And I have to point out that Matt Cutts’ blog shows full posts at multiple URLs, all of which turn up in Google’s index.
But as Rogers points out, the best-case outcome of this seems to be that Google shows an italicized message at the bottom of results pages indicating that duplicate results have been omitted. I followed his lead and sequestered all but the permalink URL from Google’s crawlers, via “nofollow” attributes on category-page HREFs and robots.txt entries for those pages. I also dropped the category pages from the XML sitemap.
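To make the robots.txt half of that concrete, here is a small Python sketch that checks which URLs a crawler may fetch under a rule of this kind. The `/category/` path prefix, the example URLs, and the `crawlable` helper are assumptions for illustration, not this site's real layout or rules. The parsing itself uses the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: keep all crawlers out of category listing pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
"""


def crawlable(url, agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Category listings are blocked; permalink pages stay open to crawlers.
    print(crawlable("http://debris.com/category/seo/"))
    print(crawlable("http://debris.com/some/article"))
```

The link-level half is just markup: a category link carries `rel="nofollow"` on its anchor tag, which asks crawlers not to follow it, while permalink links are left untouched.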
I’m not sure these category-page changes made any difference, but the reinclusion request certainly did; this site is now showing an all-time high PageRank of 6.
And it only took 30 days. Or 13 months, depending on how you count it.