Page Cloaking - To Cloak or Not to Cloak
By Sumantra Roy
Page cloaking can broadly be defined as a technique used to deliver
different versions of a web page depending on who is requesting it.
There are two primary reasons that people use page cloaking:
i) It allows them to
create a separate optimized page for each search engine and another
page which is aesthetically pleasing and designed for their human visitors.
When a search engine spider visits a site, the page which has been optimized
for that search engine is delivered to it. When a human visits a site,
the page which was designed for the human visitors is shown. The primary
benefit of doing this is that the human visitors don't need to be shown
the pages which have been optimized for the search engines, because
the pages which are meant for the search engines may not be aesthetically
pleasing, and may contain an over-repetition of keywords.
ii) It allows them
to hide the source code of the optimized pages that they have created,
and hence prevents their competitors from being able to copy the source
code.
Page cloaking is implemented
by using some specialized cloaking scripts. A cloaking script is installed
on the server, which detects whether it is a search engine or a human
being that is requesting a page. If a search engine is requesting a
page, the cloaking script delivers the page which has been optimized
for that search engine. If a human being is requesting the page, the
cloaking script delivers the page which has been designed for humans.
There are two primary
ways by which the cloaking script can detect whether a search engine
or a human being is visiting a site:
i) The first and simplest
way is by checking the User-Agent variable. Each time anyone (be it
a search engine spider or a browser being operated by a human) requests
a page from a site, it reports a User-Agent name to the site. Generally,
if a search engine spider requests a page, the User-Agent variable contains
the name of the search engine. Hence, if the cloaking script detects
that the User-Agent variable contains a name of a search engine, it
delivers the page which has been optimized for that search engine. If
the cloaking script does not detect the name of a search engine in the
User-Agent variable, it assumes that the request has been made by a
human being and delivers the page which was designed for human beings.
However, while this
is the simplest way to implement a cloaking script, it is also the least
safe. It is pretty easy to fake the User-Agent variable, and hence,
someone who wants to see the optimized pages that are being delivered
to different search engines can easily do so.
ii) The second and
more complicated way is to use I.P. (Internet Protocol) based cloaking.
This involves the use of an I.P. database which contains a list of the
I.P. addresses of all known search engine spiders. When a visitor (a
search engine or a human) requests a page, the cloaking script checks
the I.P. address of the visitor. If the I.P. address is present in the
I.P. database, the cloaking script knows that the visitor is a search
engine and delivers the page optimized for that search engine. If the
I.P. address is not present in the I.P. database, the cloaking script
assumes that a human has requested the page, and delivers the page which
is meant for human visitors.
Although more complicated
than User-Agent based cloaking, I.P. based cloaking is more reliable
and safer, because it is very difficult to fake the I.P. address that
a page request comes from.
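To make the two detection methods concrete, here is a minimal sketch in
Python of just the checking step. It is an illustration, not a real
cloaking script: the spider names ("examplebot", "samplespider") and the
I.P. ranges are made-up placeholders, and the function names are
assumptions chosen for this example.

```python
import ipaddress

# Hypothetical spider names to look for in the User-Agent variable.
SPIDER_NAME_TOKENS = ("examplebot", "samplespider")

# Hypothetical I.P. database; these ranges are reserved for documentation
# and do not belong to any real search engine.
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_spider_by_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a listed spider name."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_NAME_TOKENS)


def is_spider_by_ip(remote_addr: str) -> bool:
    """Return True if the visitor's address is inside a listed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid address, so it cannot match the database.
        return False
    return any(addr in network for network in SPIDER_NETWORKS)
```

Note that is_spider_by_user_agent() trusts whatever name the visitor
reports, so anyone who sends "ExampleBot" as the User-Agent is treated as
a spider. The I.P. check depends only on the address the request came
from, which the visitor cannot simply type in.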
Now that you have an
idea of what cloaking is all about and how it is implemented, the question
arises as to whether you should use page cloaking. The one-word answer
is "NO". The reason is simple: the search engines don't like
it, and will probably ban your site from their index if they find out
that your site uses cloaking. The reason that the search engines don't
like page cloaking is that it prevents them from being able to spider
the same page that their visitors are going to see. And if the search
engines are prevented from doing so, they cannot be confident of delivering
relevant results to their users. In the past, many people have created
optimized pages for some highly popular keywords and then used page
cloaking to take people to their real sites which had nothing to do
with those keywords. If the search engines allowed this to happen, they
would suffer because their users would abandon them and go to another
search engine which produced more relevant results.
Of course, a question
arises as to how a search engine can detect whether or not a site uses
page cloaking. There are three ways by which it can do so:
i) If the site uses
User-Agent cloaking, the search engine can simply send the site a spider
which does not report the name of the search engine in its User-Agent
variable. If the search engine sees that the page delivered to this
spider is different from the page which is delivered to a spider which
reports the name of the search engine in the User-Agent variable, it
knows that the site has used page cloaking.
ii) If the site uses
I.P. based cloaking, the search engine can send a spider from an
I.P. address which it has never used previously. Since
this is a new I.P. address, the I.P. database that is used for cloaking
will not contain this address. If the search engine detects that the
page delivered to the spider with the new I.P. address is different
from the page that is delivered to a spider with a known I.P. address,
it knows that the site has used page cloaking.
iii) A human representative
from a search engine may visit a site to see whether it uses cloaking.
If she sees that the page which is delivered to her is different from
the one being delivered to the search engine spider, she knows that
the site uses cloaking.
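All three checks come down to the same idea: request the same page twice
under different identities and compare what comes back. The sketch below
shows that comparison in Python. It is only an illustration under stated
assumptions: the names looks_cloaked, page_fingerprint and demo_fetch are
invented for this example, and demo_fetch is a stand-in that simulates a
cloaking site rather than making a real request.

```python
import hashlib
from typing import Callable


def page_fingerprint(html: str) -> str:
    """Hash a page after collapsing whitespace, so spacing changes are ignored."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def looks_cloaked(fetch: Callable[[str, str], str], url: str,
                  spider_agent: str, plain_agent: str) -> bool:
    """Fetch one URL under two User-Agent names and report whether the pages differ."""
    as_spider = fetch(url, spider_agent)
    as_visitor = fetch(url, plain_agent)
    return page_fingerprint(as_spider) != page_fingerprint(as_visitor)


def demo_fetch(url: str, user_agent: str) -> str:
    """Hypothetical stand-in for an HTTP request to a site that cloaks by User-Agent."""
    if "examplebot" in user_agent.lower():
        return "<html><body>keyword keyword keyword</body></html>"
    return "<html><body>Welcome to our site</body></html>"


if __name__ == "__main__":
    print(looks_cloaked(demo_fetch, "http://example.com/",
                        "ExampleBot/1.0", "Mozilla/5.0"))
```

For the I.P. based check, the fetch function would make the two requests
from a known and a previously unused I.P. address instead of changing the
User-Agent name. A real comparison would also need to be more forgiving
than an exact hash, since many honest pages change slightly from one
request to the next.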
Hence, when it comes
to page cloaking, my advice is simple: don't even think about using
it.