Is cloaking evil? It’s one of the most heavily debated topics in the SEO industry, and people often can’t even agree on what defines cloaking. In this column, I want to look at an example of what even the search engines might consider “good” cloaking, touch on the middle-ground territory that page testing introduces, and revisit how to detect when “evil” old-school page cloaking is happening.
Back in December 2005, the four major engines went on record at Search Engine Strategies Chicago to define the line between cloaking for good and for evil. From the audience, I asked the panelists whether it was acceptable to replace search engine unfriendly links (such as those with session IDs and superfluous parameters) with search engine friendly versions, selectively for spiders. All four panelists responded, “No problem.” Charles Martin from Google even jumped in again with an enthusiastic, “Please do that!”
URL Rewriting? Not Cloaking!
My understanding is that their positions haven’t changed on this. Cloaking — by its standard definition of serving up different content to your users than to the search engines — is naughty and should be avoided. Cloaking where all you’re doing is cleaning up spider-unfriendly URLs, well that’s A-OK. In fact, Google engineers have told me in individual conversations that they don’t even consider it to be cloaking.
Because search engines are happy to have you simplify your URLs for their spiders — eliminating session IDs, user IDs, superfluous flags, stop characters and so on — it may make sense to do that only for spiders and not for humans. That could be because rewriting the URLs for everyone is too difficult, costly or time intensive to implement. Or more likely, it could be that certain functionality requires these parameters, but that functionality is not of any use to a search engine spider — such as putting stuff in your shopping cart or wish list or keeping track of your click path in order to customize the breadcrumb navigation.
Many web marketers like to track which link was clicked when a page contains multiple links to the same location. They add tracking tags to the URL, like “source=topnav” or “source=sidebar.” The problem is that this creates duplicate pages for the search engine spiders to explore and index. That dilutes link gain, or PageRank, because the votes you are passing on to that page get split across the different URLs you are using. Ouch.
How about instead you employ “good cloaking” and strip out those tracking codes solely for spiders? Sounds like a good plan to me. Keep your analytics-obsessed web marketers happy, and the search engines too.
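As a rough illustration of the idea, here is a minimal Python sketch that strips tracking and session parameters from a link only when the requester looks like a spider. The parameter names, the list of spider tokens, and the function names are all hypothetical placeholders chosen for this example; a real site would substitute its own.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that matter to humans (tracking, sessions)
# but only create duplicate URLs for spiders.
SPIDER_UNFRIENDLY_PARAMS = {"source", "sessionid", "userid"}

# Hypothetical, deliberately short list of spider user-agent substrings.
SPIDER_TOKENS = ("googlebot", "bingbot", "slurp")


def is_spider(user_agent: str) -> bool:
    """Return True if the user-agent string contains a known spider token."""
    ua = user_agent.lower()
    return any(token in ua for token in SPIDER_TOKENS)


def clean_url(url: str) -> str:
    """Drop the spider-unfriendly query parameters and keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SPIDER_UNFRIENDLY_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def link_for(url: str, user_agent: str) -> str:
    """Serve the simplified URL to spiders and the original URL to humans."""
    return clean_url(url) if is_spider(user_agent) else url
```

With this sketch, a link such as http://example.com/shoes?id=7&source=topnav would be written out as http://example.com/shoes?id=7 for a spider, while human visitors keep the tracking tag. Only the URL changes; the page content stays the same for everyone.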
Is Testing Bad Cloaking?
Uncovering User Agent Based Cloaking
From a search engine’s point of view, “bad” cloaking is deliberately showing a spider content that might be entirely different from what humans see. Those doing this often try to cover their tracks by making it difficult to examine the version meant only for spiders. They do this with a “noarchive” directive embedded within the meta tags. Googlebot and other major spiders will obey that directive and not archive the page, which then causes the “Cached” link in that page’s search listing to disappear.
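If you want to check a page for that directive yourself, something like the following Python sketch would do it, using only the standard library. The class and function names are made up for this example, and it only inspects meta tags named “robots” or “googlebot”; it does not look at HTTP headers. Keep in mind that “noarchive” on its own proves nothing, since plenty of legitimate sites use it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize to "".
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noarchive(html: str) -> bool:
    """Return True if the page asks spiders not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```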
So getting a view behind the curtain to see what is being served to the spider can be a bit tricky. If the type of cloaking is solely user agent based, you can use the User Agent Switcher extension for Firefox. Just create a user-agent of:
under Tools > User Agent Switcher > Options > Options > User Agents in the menu. Then switch to that user agent and have fun surfing as Googlebot in disguise.
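The same check can be scripted. The Python sketch below requests a page twice, once announcing itself as Googlebot and once as an ordinary browser, and reports whether the two responses differ. Both user-agent strings are assumptions picked for illustration (the Googlebot one is a commonly published form, not necessarily the string this column originally showed), and the function names are hypothetical.

```python
import urllib.request

# Assumption: a commonly published Googlebot user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Assumption: an arbitrary desktop browser user-agent string.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent):
    """Build a GET request that announces the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Download the page body as bytes using the given user agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def responses_differ(url):
    """Return True if the spider and browser versions are not byte-identical."""
    return fetch(url, GOOGLEBOT_UA) != fetch(url, BROWSER_UA)
```

A difference is only a hint, not proof: pages with rotating ads, timestamps or session tokens will differ between any two requests, so it is worth eyeballing both versions before crying foul.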
Uncovering IP Based Cloaking
But hard-core cloakers are too clever for this trick. They’ll feed content to a spider based on known IP addresses. Unless you’re inside a search engine and using one of these known IP addresses, you can’t see the cloaked page if it has also been kept out of the search engine’s cache.
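To see why switching your user agent doesn’t help here, consider what IP delivery boils down to. In the minimal Python sketch below, the decision depends only on the address the request comes from, which a visitor can’t change from the browser. The network ranges are placeholders taken from the reserved documentation blocks, not real spider addresses, and the names are hypothetical.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737).
# These are NOT real search engine addresses.
HYPOTHETICAL_SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_spider_ip(remote_addr: str) -> bool:
    """Return True if the requesting address falls inside a listed spider range."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in HYPOTHETICAL_SPIDER_NETWORKS)
```

Note that the user-agent header never enters into it, so a disguised browser coming from an ordinary address still gets the human version.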
Actually, there’s still a chance. Sometimes Google Translate can be used to view the cloaked content, because many cloakers don’t bother to differentiate between a request coming in for the purpose of translating and one coming in for the purpose of crawling. Either way, the request comes from the same range of Google IP addresses. Thus, when cloakers are doing IP delivery, they tend to serve up the Googlebot-only version of the page to the Translate tool. This loophole can be plugged, but many cloakers miss it.
And I bet you didn’t know that you can actually set the Translation language to English even if the source document is in English! You simply set it in the URL, like so:
In the code above, replace the bolded URLGOESHERE part with the actual URL of the page you want to view. That way, when you are reviewing someone’s cloaked page, you can see the page in English instead of having to see the page in a foreign language. You can also sometimes use this trick to view paid content, if you’re too cheap to pay for a subscription.
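If you do this often, the URL substitution can be scripted. The Python sketch below assumes one commonly seen form of the Translate URL, with “sl” for the source language, “tl” for the target language and “u” for the page to fetch. That pattern is an assumption on my part rather than a reproduction of the URL referred to above, and Google may change or retire it at any time. The function name is hypothetical.

```python
from urllib.parse import urlencode


def translate_url(page_url, source_lang="en", target_lang="en"):
    """Build a Translate link for page_url (assumed sl/tl/u URL pattern)."""
    query = urlencode({"sl": source_lang, "tl": target_lang, "u": page_url})
    return "https://translate.google.com/translate?" + query
```

Passing the page address through urlencode matters when the address has its own query string; otherwise its parameters would be read as parameters of the Translate URL.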
Many SEOs dismiss cloaking out of hand as an evil tactic, but in my mind there is a time and a place for it (the URL-simplifying variety, not the content-differing variety), even if you are a pearly white hat SEO.
Stephan Spencer is founder and president of Netconcepts, a 12-year-old web agency specializing in search engine optimized ecommerce. He writes for several publications plus blogs at StephanSpencer.com and Natural Search Blog. The 100% Organic column appears Thursdays at Search Engine Land.