I post at SearchCommander.com now, and this post was published 14 years 5 months 17 days ago. This industry changes FAST, so blindly following the advice here *may not* be a good idea! If you're at all unsure, feel free to hit me up on Twitter and ask.
Inside Google Webmaster Tools, there is an option called “Fetch as Googlebot” that is supposed to crawl a page you specify and return what Googlebot sees.
Until now, I’d never had much use for it, but that has changed!
I discovered a problem when a website that had been hacked and then cleaned up was still showing the polluted snippet in its description on the search results page.
A client reported seeing pharmaceuticals mentioned in their listing on a search results page (never good). I repeated the search, saw the same thing, then visited the page to view the source. I verified that yes, the page is clean, and there are no drugs mentioned there…
Normally there’s no way to get Google to update its cache faster other than linking to the page, but I updated the XML sitemap anyway and decided to take a shot at “Fetch as Googlebot,” figuring it couldn’t hurt.
What I found surprised me…
Was Google not “fetching” this live? Was it pulling some outdated version of the page? I thought for sure that the site was clean, so either Google is pulling from cache, or the site isn’t really fixed, right?
I ran a quick test by changing some words in a blog post on my own site, and then I did a fetch.
I went to Google Webmaster Tools, chose Fetch as Googlebot, and yes, Google DID fetch it live!
So, this is the dilemma… Is the site still hacked? Is there something really insidious that makes it look okay to us, but not to others?
Obviously something is wrong, but what is it?
I’m posting this now without a solution, because I DON’T KNOW THE ANSWER – but as I look into it further I’ll update this post.
In the meantime, got any ideas?
For the record –
- There are no warnings about Malware in Google WMT
- “View source” shows the correct title tag, etc., and no WordPress info
- Visiting the site through a proxy shows it’s fine too
- The URL returns a 200 OK server response code
***Update – July 9th, 9 am***
Ok, so this site is NOT WordPress, but thanks to @blafrance, who sent me a link to the WordPress Pharma hack, we think we’re on the right track.
While we exhibit none of the same symptoms, the end result is the same: a hack visible only to the search engines.
Last night their programmer found some malicious code in a .php file and removed it, but Google is still fetching the bad info. Here’s the code in its entirety, although it’s not much help…
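Because this kind of hack shows its payload only to search engines, a regular browser visit (or a proxy) looks clean. One way to test for that is to request the same URL twice, once with an ordinary browser User-Agent and once with Googlebot’s, and list the lines that appear only in the Googlebot response. This is a minimal sketch under a big assumption: that the injected code decides whom to fool by looking at the User-Agent header. Some variants reportedly check the visitor’s IP address against Google’s ranges instead, and a spoofed header can’t reproduce that, so an empty result is not proof the site is clean. The function names are made up for illustration.

```python
import difflib
import urllib.request

# An ordinary desktop browser string vs. Googlebot's published User-Agent string.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch(url, user_agent):
    """Fetch `url` while identifying as `user_agent`; return the body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=15) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def added_lines(baseline, suspect):
    """Return lines that appear in `suspect` but not in `baseline`."""
    diff = difflib.ndiff(baseline.splitlines(), suspect.splitlines())
    # ndiff prefixes lines found only in the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


def check_cloaking(url):
    """List content served to the Googlebot User-Agent but not to a browser."""
    as_browser = fetch(url, BROWSER_UA)
    as_googlebot = fetch(url, GOOGLEBOT_UA)
    return added_lines(as_browser, as_googlebot)
```

Calling something like `check_cloaking("http://www.example.com/")` returns the suspect lines. On dynamic pages, expect some noise from timestamps, session tokens, or rotating ads.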
***Update July 10, 2010***
The answer, of course, was that the site was still hacked: there was another piece of malicious code, like what is shown above, in the header file.
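Since the second piece of code was sitting in a file the first cleanup missed, it can help to sweep every PHP file on the server rather than eyeballing a few. This sketch flags lines matching a couple of patterns that commonly show up in obfuscated injections, namely `eval()` wrapped around `base64_decode()`, `gzinflate()`, `gzuncompress()`, or `str_rot13()`, plus a few drug names. These patterns are illustrative assumptions, not the code that was actually found on this site, and legitimate plugins can trip them too, so treat each hit as a lead to inspect by hand.

```python
import os
import re

# Illustrative heuristics only: constructs often seen in obfuscated injected PHP.
# Legitimate code can match as well, so every hit needs a manual look.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\beval\s*\(\s*(?:base64_decode|gzinflate|gzuncompress|str_rot13)\s*\(", re.I),
    re.compile(r"\b(?:viagra|cialis|levitra)\b", re.I),
]


def scan_text(text):
    """Return (line_number, stripped_line) pairs that match any suspicious pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root, extensions=(".php",)):
    """Walk `root` and map each matching file's path to its suspicious lines."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    hits = scan_text(handle.read())
                if hits:
                    report[path] = hits
    return report
```

Pointing `scan_tree()` at a copy of the site’s document root returns a dictionary mapping each flagged file to its line numbers and the offending text.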
I think the most interesting and scary thing about this pharma hack was that it did not affect every page. It’s going to become harder and harder to protect yourself…
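Since the hack didn’t hit every page, spot-checking only the URL a client reported isn’t enough. A minimal sketch, assuming the site has a standard XML sitemap at a URL you supply: it pulls every `<loc>` entry so each page can be fetched and checked in turn. It doesn’t follow sitemap index files, which list other sitemaps rather than pages, and the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_data):
    """Return every <url><loc> entry from a sitemap document (str or bytes)."""
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def fetch_sitemap_urls(sitemap_url):
    """Download a sitemap and list the page URLs it contains."""
    with urllib.request.urlopen(sitemap_url, timeout=15) as response:
        return sitemap_urls(response.read())
```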
Try scanning for malware at http://sucuri.net/
Cool site, Roseli, thanks! Wow, pretty spendy, isn’t it?
I suppose it could be well worth it, especially with monitoring…
This sounds complicated to me, but it’s a useful case for me to refer to. I hope you find the solution soon.
Awesome article. You have some great insights. I love to read your blog while I’m at work to help pass time.
I also use Google Webmaster Tools but never really knew what the Googlebot fetch was for.
My WP blog was recently hacked and had a similar change added. Monitoring from companies like Sucuri is really useful, but there are still many things we can do to prevent hacks. WP Secure & Invisible Defender are my two favourites; hope they work for you.
Cheers
After reading this article, should I assume that Googlebot isn’t working properly?
Actually, I think it is, because based on the page names, I can tell that someone scraped my content…
Same exact issue. Really frustrating, because Sucuri, Google, and the GoDaddy malware scan are all saying that it’s clean…