Maybe you’ve already heard about this, since it’s not exactly recent news, but I thought it was worth mentioning because it hasn’t received the exposure it deserves, and it appears to still be going on.
Lately I’ve been spotting dozens of strange organic search referers in a web site’s stats: extremely generic keywords that do occur in the site’s pages, but for which the site has never ranked on any search engine I know of.
Today I finally decided to check out the logs, and ran into this:
65.55.165.122 - - [02/Nov/2007:05:07:14 +0100] "GET /requested/url.html HTTP/1.0" 200 51862 "http://search.live.com/results.aspx?q=keyword&mrt=en-us&FORM=LIVSOP" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
It would look like a genuine Live Search referer, except that:
- the requested URL is nowhere to be found on the referring SERP (note the unusual FORM=LIVSOP URL parameter);
- and the client IP is from Redmond.
Hmm… :/
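To check the first tell programmatically, here is a minimal Python sketch. It assumes the server writes Apache’s “combined” log format (which the line above appears to be), parses one log line, and pulls the q and FORM parameters out of the referer. The function names are just illustrative.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one combined-format log line as a dict, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def referer_params(referer):
    """Return the query-string parameters of a referer URL (each value is a list)."""
    return parse_qs(urlsplit(referer).query)

line = ('65.55.165.122 - - [02/Nov/2007:05:07:14 +0100] "GET /requested/url.html HTTP/1.0" 200 51862 '
        '"http://search.live.com/results.aspx?q=keyword&mrt=en-us&FORM=LIVSOP" '
        '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"')

entry = parse_line(line)
params = referer_params(entry["referer"])
print(entry["host"], params.get("q"), params.get("FORM"))
# prints: 65.55.165.122 ['keyword'] ['LIVSOP']
```

Whether the referring SERP actually lists your URL still has to be checked by hand; the sketch only makes the odd FORM value easy to spot.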
A simple Google search pointed me to the right answer: it turns out that this is a “quality check” [sic!] that the Live Search team have been doing for a while, as officially confirmed by msndude (Live Search’s rep) in this WebmasterWorld thread (msg #3442263).
Now, why the Live Search folks decided to hit web sites with fake referers is beyond me: maybe a stupid attempt to check for referer-based cloaking? I dunno. I had seen Slurp (Yahoo!’s crawler) issuing a spoofed user-agent before, but nothing nearly as sneaky and spammy as this. Many small webmasters are understandably furious about Microsoft deliberately choosing to fill their logs with junk.
The only advice I can give them is to filter out all referers containing the string “FORM=LVSP” or “FORM=LIVSOP”.
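A minimal Python sketch of that filter, assuming Apache’s combined log format, where the referer is the second-to-last quoted field. It does a plain substring match on that field; a user-agent containing escaped quotes could confuse the regex, so treat it as a starting point rather than a finished tool. The function names are hypothetical.

```python
import re

# Referer markers mentioned above, matched as plain substrings of the referer field.
FAKE_MARKERS = ("FORM=LVSP", "FORM=LIVSOP")

# In the combined format the referer is the second-to-last quoted field,
# followed by the quoted user-agent at the end of the line.
REFERER = re.compile(r'"([^"]*)" "[^"]*"\s*$')

def is_fake_live_referral(line):
    """True if the line's referer contains one of the Live Search test markers."""
    m = REFERER.search(line)
    return bool(m) and any(marker in m.group(1) for marker in FAKE_MARKERS)

def filter_log(lines):
    """Yield only the lines whose referer is not a fake Live Search referral."""
    for line in lines:
        if not is_fake_live_referral(line):
            yield line
```

For example, a small script doing `sys.stdout.writelines(filter_log(sys.stdin))` would let you pipe a raw log through it before handing it to your stats package.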
A few blog references:
- Microsoft Live Search’s Strange Spam-Like Referrals Are Official Tests [Search Engine Roundtable, September 6, 2007]
- Microsoft is lying and intentionally screwing up your log files (FORM=LVSP|LIVSOP) [Exposure Online, October 9, 2007]
- Stupid msnbot FORM=LVSP and FORM=LIVSOP [Blogboing, October 10, 2007]



Well, I think you got the point: referer-based cloaking is probably the hardest kind of cloaking to detect.
So adding a fake referer can help search engines find it out…
Am I wrong?
Hi Stefano!
I’m still not sure this is an automated check for referer-based cloaking. The same IP has also been requesting linked CSS and JS files… Take a look:
65.55.165.15 - - [02/Nov/2007:02:02:58 +0100] "GET /requested/url.html HTTP/1.0" 200 65429 "http://search.live.com/results.aspx?q=keyword&mrt=en-us&FORM=LIVSOP" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.15 - - [02/Nov/2007:02:02:59 +0100] "GET /css/main.css HTTP/1.0" 200 2832 "http://www.example.com/requested/url.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.15 - - [02/Nov/2007:02:02:59 +0100] "GET /js/prototype.js HTTP/1.0" 200 96046 "http://www.example.com/requested/url.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.15 - - [02/Nov/2007:02:03:00 +0100] "GET /js/scriptaculous.js?load=effects HTTP/1.0" 200 2152 "http://www.example.com/requested/url.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.15 - - [02/Nov/2007:02:03:01 +0100] "GET /js/effects.js HTTP/1.0" 200 31969 "http://www.example.com/requested/url.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.15 - - [02/Nov/2007:02:03:01 +0100] "GET /js/lightbox.js HTTP/1.0" 200 23825 "http://www.example.com/requested/url.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.15 - - [02/Nov/2007:02:03:02 +0100] "GET /css/lightbox.css HTTP/1.0" 200 1637 "http://www.example.com/requested/url.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
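If you want to see whether other “visitors” with the fake referer behave the same way, one rough approach is to group requests by client IP. The Python sketch below (again assuming the combined log format; the function name is hypothetical) collects every IP that sent a FORM=LVSP or FORM=LIVSOP referer and lists the other paths it requested, such as the CSS and JS files above.

```python
import re
from collections import defaultdict

# Minimal combined-format parser: only client IP, request line and referer are captured.
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" \d{3} \S+ "(?P<referer>[^"]*)"'
)
FAKE_MARKERS = ("FORM=LVSP", "FORM=LIVSOP")

def assets_fetched_by_fake_referrers(lines):
    """Map each IP that sent a fake Live Search referer to the other paths it requested, in log order."""
    requests_by_host = defaultdict(list)
    suspicious_hosts = set()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip lines that aren't in combined format
        host, request, referer = m.group("host", "request", "referer")
        parts = request.split()  # e.g. ["GET", "/css/main.css", "HTTP/1.0"]
        path = parts[1] if len(parts) >= 2 else request
        if any(marker in referer for marker in FAKE_MARKERS):
            suspicious_hosts.add(host)
        else:
            requests_by_host[host].append(path)
    return {host: requests_by_host[host] for host in suspicious_hosts}
```

Run over the excerpt above, it would map 65.55.165.15 to the CSS and JS paths; with a real log you could pass an open file object (e.g. a hypothetical `access.log`) straight in, since the function only iterates over lines.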
If the guys at MS really wanted to check for cloaking, why issue an easily recognizable (and, thus, cloakable!) referer in the first place?
I don’t think this was intentional, after all.
Instead, I think the referer with the “FORM=LIVSOP” parameter might have leaked from an internal interface used by Live Search’s quality raters (remember, msndude said this was a “quality check”)… Maybe (I’m guessing) “LIVSOP” = “LIVe Search OPerator [Console]”?
Just a thought…