I post at SearchCommander.com now, and this post was published 13 years 10 months 8 days ago. This industry changes FAST, so blindly following the advice here *may not* be a good idea! If you're at all unsure, feel free to hit me up on Twitter and ask.
This morning I got an e-mail regarding a domain from Google Webmaster Tools, pointing out that “…some of your pages were using techniques that are outside our quality guidelines”.
Wow! What the…?
It went on to say that “pages from yourdomain.com are scheduled to be removed temporarily from our search results for at least 30 days”.
So in looking at the pages, I admit that they were actually some crappy pages from 2004 that were never meant to go live.
Furthermore, they were in development subdomains that should never have even been crawled.
There has never been a link to either of these subdomains anywhere before, and I can’t imagine how they got crawled, but that’s clearly MY fault, since I did not exclude them with my robots.txt file.
(Never mind the fact that looking at your robots file gives your competitors a window into your dev domains, thereby forcing you to hassle with passwords. Googlebot is free to, and will, just root around wherever it wants unless you ask it not to.)
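For what it's worth, the fix is easy enough in hindsight. Since robots.txt works per-hostname, a dev subdomain needs its own file; a minimal sketch (assuming a hypothetical dev.yourdomain.com subdomain) would be a robots.txt served at dev.yourdomain.com/robots.txt containing just:

    # Ask all well-behaved crawlers to stay out of the entire dev subdomain
    User-agent: *
    Disallow: /

Of course, that only asks crawlers to stay out; password protection at the server level is what actually keeps a dev site private.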
Thanks for the Notice
First let me commend Google for the communication. They’ve apparently tracked down nearly every e-mail address on the domain and sent a copy of this to everyone, so it couldn’t possibly be missed. (I wish the Malware team could show that same level of communication.)
However, SHOULD the pages really be removed from Google’s index?!? Should they be censored?
The link to Google’s guidelines (of course) doesn’t actually say what’s wrong, but if they have a problem with content, perhaps they could be more specific? It seems pretty arbitrary…
I just looked at each page again, and while I DO admit they ARE worthless, I still don’t see how they actually violate Google’s guidelines.
Is it Just Those Pages?
Another question is, “Which pages exactly would they remove with this censorship?” It doesn’t say “…THESE pages will be removed,” it just threatens “…pages from yourdomain.com will be…,” so it appears as if they COULD be talking about more than just these pages?
So is Google going to become the arbiter of quality? Might this be the beginning of what Matt Cutts was talking about at Pubcon, when he was asked about what he called “content farms” and mentioned the internal debate at Google over their value?
According to WebProNews:
Another thing that turns Google users off, however, is bad content. According to Cutts, there is a debate going on internally at Google over whether they should consider content farms web spam. Rich Skrenta’s Blekko was mentioned, as it has innovated with its slash tags that allow human recommendation. Cutts says they’re wrestling about this at Google, so it will be interesting to keep an eye on that.
Users are pretty angry with content farms, he says, adding that there may be a time for web spam at Google to take action against them. It’s worth noting that when we talked to Cutts at SMX Advanced back in the summer, he mentioned that Google’s infamous “Mayday” update tends to affect auto-generated and content farms the most, but the update was not part of the web spam team’s efforts. It was part of general search quality, with no human intervention involved. It’s strictly algorithmic.
Matt made it pretty clear at Pubcon, when asked what HE thought should be done, that he was leaning towards the side of censorship (okay, he didn’t actually use THAT word), but it concerns me because I think that makes for a pretty slippery slope.
I’d like to know who has the right to make the decision to remove content, and exactly how that decision is made. How many people are given this power?
What happens to the employee who pulls something out of the index because they simply disagree, or think that someone is offering up a stupid opinion? Can they pull pages from the index without a second opinion? Is there a committee?
I suppose the question of whether or not this is censorship could only be judged if I posted and shared the content here for opinions, but I’m not going to. I admit, it was crap.
The timing of this email was a bit ironic, because last night at an SEMpdx talk I stated (quite confidently) that Google didn’t have any real motivation to filter out the crap unless it ranked well or made them look bad, because their mission is really to make money for stockholders and sell ads.
Well in this case, the “crap” neither ranked well NOR was it ever intended to even be public, so it looks like I might be a fool, huh? Besides, the folks at Google can do whatever they want. Nowhere in their mission statement does it say they’ll be fair…
Here’s the whole email…
The way that email reads, it sounds like they are more concerned about sites that look like they have been hacked.
Hmm, maybe so, but to remove pages from the index just because they don’t like ’em? Seems like they’re overreaching to me…
Yeah, I agree, but Google can be kinda strange sometimes.
I guess I don’t really understand what was wrong with your pages. It would really help to know what the deal was with them. Were they using outdated search engine ranking techniques like keyword stuffing? Were they just gobbledygook? It would be nice to know what you mean by worthless.
They were basically keyword-stuffed, repetitive text – not nonsensical gobbledygook, but close…
If Google would remove all the crappy content from their database, the internet would be a better place… if…
I have a web site, http://www.tekaseo.com.
It originally contained 23 pages, but since last week, maybe due to some virus, the number of my pages in Google’s index has been increasing automatically. I am using online tools to check the indexed pages and have requested removal from Google’s index many times, but it doesn’t work. What should I do…?
Well, it sounds like you must have a virus. Have you found the extra pages and deleted them from the server? If they are all in a certain directory, you could exclude that directory with your robots.txt file, but that still doesn’t get to the root of the problem, which is that the site may be hacked.
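For example (a sketch only, assuming the junk pages all sit under a hypothetical /extra-pages/ directory), the robots.txt entry would look like:

    # Ask crawlers not to crawl the directory where the junk pages live
    User-agent: *
    Disallow: /extra-pages/

Keep in mind that Google’s URL removal requests generally only stick for URLs that are blocked by robots.txt, return a 404, or carry a noindex tag, so deleting or blocking the pages first is what makes those requests work.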