You may have been affected by the April 2018 update to Yoast, which has seen all of your WordPress attachment pages indexed. This will hurt those of you with OCD. You know the types: they like things straight, ordered, neat and tidy.
This oversight by Yoast isn’t just driving your OCD crazy, it is highly likely having an impact on your ranking and here is why!
If you already know the seriousness of this issue and would like the fix then click here.
Google Panda Is Why This Is A Problem
You have probably heard of Google Panda. Launched in February 2011, its job is to assess the quality of your site, and the lower Panda deems that quality to be, the harder it will be to achieve higher rankings.
Google doesn’t judge quality by the design of your site. Whatever you think of as quality, Google more than likely disagrees.
So whilst you have what you believe to be the world’s best looking website, if it doesn’t rank for anything other than your brand or business name, there is a good reason!!!
Panda is trying to find out if your site is low quality or thin. The definition of thin content is a page with not much content on it and that is exactly what these attachment pages that WordPress creates for you are.
Your site may have hundreds of images because, like a good SEO, you are trying to create media-rich pages. By being media rich you are giving Google exactly what they want.
The catch is that WordPress will create an attachment page for each image that you upload to your WordPress site.
If you have four images on each of your pages or posts, that means these attachment pages will outnumber your content pages or posts by a ratio of four to one.
That’s four low-quality thin content pages that WordPress has created by default for you!!!!!
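To put a number on that ratio, here is a minimal Python sketch. The function name and the figure of 100 posts are made up for illustration. Only the four-images-per-post ratio comes from the example above.

```python
def thin_page_share(content_pages: int, images_per_page: int) -> float:
    """Return the fraction of all URLs on the site that are attachment pages.

    Assumes WordPress creates exactly one attachment page per uploaded image
    and that every image is used on exactly one page or post.
    """
    if content_pages <= 0:
        raise ValueError("content_pages must be positive")
    attachment_pages = content_pages * images_per_page
    total_pages = content_pages + attachment_pages
    return attachment_pages / total_pages


# Hypothetical site: 100 posts with four images each.
share = thin_page_share(content_pages=100, images_per_page=4)
print(f"{share:.0%} of the site's URLs are attachment pages")
```

With four images per post, four out of every five URLs on the site are thin attachment pages.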
So are you starting to see why this isn’t great for your SEO efforts?
Google Is Bad At Cleaning Up
As with a teenager, trying to get Google to clean up after themselves is no mean feat. They want to index as much of the web as they can, and whilst they do a great job at getting stuff into the index, they do a terrible job at getting stuff out of it.
Anyone who has done a site audit or is up to speed with Technical SEO knows all too well the daily battle with Google in trying to clean a site up.
The issue is further compounded by the fact that Google won’t tell you which pages they have in the index for your website or your client’s.
Remember the good old site: search operator? For a long time now it hasn’t returned all of the pages Google has indexed for your site.
So according to this, we currently have 856 pages in the index.
A quick trip to Google Search Console and you can see it’s more like 1251.
Want to download all 1251 pages so you can make sure that what is in the index is what you want? Good luck with that one!!!!
A quick look at our website sitemap will reveal the true number.
I know this sitemap is working as it should, so that means I have nearly double the number of pages in the index that I should have and no way to identify what they are.
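If you would rather count the sitemap entries with a script than by eye, here is a rough Python sketch. It assumes a single standard urlset sitemap file (a sitemap index that points to several files would need each file fetched and counted in turn), and the sample XML, URLs and indexed figure are all made up.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, standing in for your real sitemap file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services/</loc></url>
  <url><loc>https://example.com/blog/my-post/</loc></url>
</urlset>"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def surplus_indexed(indexed_count: int, xml_text: str) -> int:
    """Return how many more pages are indexed than the sitemap lists."""
    return indexed_count - len(sitemap_urls(xml_text))


# Hypothetical: Search Console reports 5 indexed pages for this sample site.
print(len(sitemap_urls(SAMPLE_SITEMAP)), "URLs in the sitemap")
print(surplus_indexed(5, SAMPLE_SITEMAP), "indexed pages unaccounted for")
```

This only tells you how big the gap is, not which pages make it up, which is exactly the frustration described above.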
So how can we get these extra pages out of the index? Well, we’ve created a handy little WordPress plugin that will do the trick for you. If you want to download a copy click here.
How This Has Come About
The default setting in WordPress when you Add Media to your Posts and Pages is to link to these media Attachment Pages. So when your site is crawled, the spiders will discover the existence of these pages.
So it’s best practice to make sure when adding media you select the Link To None option.
The second issue is that WordPress have done a good job of interlinking all of the attachment pages. So if the option is set to the default Link To Attachment Page option, you are in a whole world of pain!!!!
So you can bet your bottom dollar, that when you link to one of these attachment pages, they are all going to end up in the index, if your site is in its default settings.
Please note, Google does a good enough job of finding these without you having to ever link to them in the first place.
Yoast has historically done a great job of dealing with this particular issue, and it is one of the plugin’s many great features. It’s been our go-to WordPress SEO plugin for years.
It has a setting that lets you either noindex your media attachment pages or redirect them to the image itself.
If you’ve never switched this on, it’s a good idea and it’s something we like to do at Outline Creative for all our new WordPress builds.
SEO > Search Appearance > Media
You should see the following settings:
The first one will redirect all your attachment pages to the image URL via a 301 redirect.
This is useful for both stopping and fixing the issue: when the page is crawled or recrawled, the search engines pick up on the fact that the page has moved and therefore remove the Attachment Page from the index.
In our testing, on a site where over 1800 attachment URLs had been indexed and had quadrupled the number of indexed pages, the redirect option seemed to take forever to work its way through.
The second option will set the Attachment Pages to noindex and also remove the attachment pages from your sitemap.
Either of these two options will prevent the indexation of media pages, but neither is too great at recovery.
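If you want to confirm that the noindex option has taken effect on a given attachment page, you can look for a robots meta tag in the page source. The Python sketch below shows one way to do that check. The class and function names are our own, and the sample HTML is a made-up stand-in for a real page, not the exact markup any plugin produces.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # The content attribute is a comma-separated list, e.g. "noindex,follow".
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def is_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source for an attachment page.
sample_page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
print(is_noindex(sample_page))
```

A noindex directive can also be sent as an X-Robots-Tag HTTP header, which this sketch does not check.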
When Yoast updated their plugin to version 7, they reset these options which led to hundreds of thousands if not millions of website pages being impacted.
The worst part is this came at a time when Google has released updates targeting sites with thin content and poor quality pages.
robots.txt is No Use Either
Usually, in order to prevent the indexation of unnecessary pages, it’s a good idea to have a well-structured robots.txt file.
This can help with your crawl budget and make sure that when your site is being crawled, the search engines only visit the pages you want them to.
However, once something is in the index, modifying your robots.txt file can make the issue worse. You are likely to get all kinds of warnings in Google Search Console.
Another reason that robots.txt is no use with this particular issue is that there is no common pattern in the URLs of these media pages.
You need a pattern match like the ones below to effectively block something:
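As a rough illustration of why a shared URL prefix matters, the Python sketch below feeds two example rules to the standard library's robots.txt parser. The rules and URLs are hypothetical, and we are assuming a setup where attachment pages sit under the slug of their parent post, so they have no prefix of their own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: tag and author archives share a path prefix,
# so a single Disallow line covers each of them.
EXAMPLE_RULES = """User-agent: *
Disallow: /tag/
Disallow: /author/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_RULES.splitlines())


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the example rules forbid crawling this URL."""
    return not parser.can_fetch(user_agent, url)


# Archive pages match a prefix, so they are blocked.
print(is_blocked("https://example.com/tag/seo/"))
print(is_blocked("https://example.com/author/jane/"))

# A hypothetical attachment page under its parent post: no rule matches.
print(is_blocked("https://example.com/my-post/holiday-photo/"))
```

The only prefix that would catch the attachment page here is the post's own slug, and blocking that would block the post as well.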
So whilst robots.txt is great at stopping the problem, it’s not too great when it comes to fixing the problem.
In fact, it can make it worse as we need these pages to be recrawled in order for the noindex flag to be picked up. By blocking them you are telling the search engines to not bother crawling these pages.
Manually Remove Them!!!
That sounds like hard work and it is!!! Google has a removal tool so that you can notify them of outdated content. Outdated falls into a couple of categories: the page returns a 4xx error, like a 404 or a 410, or the page has a noindex flag.
One of the things we tried in order to fix this issue was to set the page to noindex using the Yoast settings like below:
Then using this removal tool you can submit the removal request and within a few hours the page is removed from the index.
This works quite well for small sites, but if you have a site with thousands of indexed Attachment Pages you are going to be at it for some time.
Also at this point, it’s worth noting that there is a limit of 1000 pages that can be removed using this tool. So as well as being slow it’s not going to help you on a larger site.
410 Error To The Rescue
So what is a 410 error? Unlike a 404 error which can be a temporary error, a 410 is saying the page has permanently gone and will never come back again.
Therefore you are telling the search engines to give up trying to come back and see if the page still exists.
In our experience, a 410 error results in almost immediate deindexation of a page or post. Seeing as Google likes a clean index (and in fact seems to reward it), we would advocate that if you remove a page from your website intentionally, you should either 301 redirect that page or 410 it.
What if it was possible to harness the power of a 410 error on some of these Attachment Pages to get rid of them from the index as quickly as possible?
We couldn’t find an easy way to identify the Attachment Pages via the permalink structure, the way you can with author pages, tag pages or category pages. So we had to reach for the PHP editor and code ourselves a solution.
After testing the plugin we’ve seen over 700 attachment pages dropped from the index on one site in just a few days, which is astonishing really and much faster than trying to remove them using the removal tool which Google supplies.
Although we are pretty chuffed with our latest creation, our client was even happier as they started to see their site recover and their revenue restored.
Attachment Page to 410 Plugin
We have developed a WordPress plugin that will return a 410 error whenever a client tries to access one of your Attachment Pages.
The client could be a website visitor or Googlebot or any search engine for that matter.
Making the error a 410 rather than the usual 404 is important for a long-term fix for this issue.
A 410 error is telling the search engines this page has intentionally been removed so take it from the index, whereas a 404 error can be deemed as a temporary error.
A recent audit on a site found a page that had been deleted six months previously still being flagged as a 404 error in Google Search Console and still in the index.
What we like about a 410 is that when Google sees one, it takes immediate decisive action and using it on an Attachment Page means that when it next crawls, it will see the 410 error and delete the page from the index.
A 404 could take weeks or months for Google to eventually give up and remove the page from the index.
Please feel free to download the plugin and upload and install it in the usual way.
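Once the plugin is active, you may want to confirm that your attachment URLs really do answer with a 410. The Python sketch below is one way to do that and is not part of the plugin. The function names and URLs are our own inventions, and the demo at the bottom uses canned status codes so it runs without touching a live site.

```python
import urllib.error
import urllib.request


def fetch_status(url: str) -> int:
    """Return the HTTP status code a URL answers with.

    urlopen follows redirects, so a redirected URL reports the status of
    its final destination. 4xx and 5xx answers arrive as HTTPError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def not_yet_gone(urls, fetch=fetch_status):
    """Return the URLs that do not answer with 410 Gone."""
    return [url for url in urls if fetch(url) != 410]


# Offline demo with hypothetical URLs and canned answers.
canned_statuses = {
    "https://example.com/my-post/holiday-photo/": 410,
    "https://example.com/my-post/team-photo/": 200,
    "https://example.com/old-post/banner/": 404,
}
leftovers = not_yet_gone(canned_statuses, fetch=canned_statuses.get)
print(leftovers)
```

To run it against your own site, pass your list of attachment URLs to not_yet_gone and leave the fetch argument at its default.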
Disclaimer – Every effort has been taken to test this plugin but it is supplied as is. We cannot be held responsible if the plugin does not behave as expected.
Outline Creative are an SEO and Design Agency. Our main focus is on Search Engine Optimisation or SEO for short. Simply put, SEO is where we use our expert knowledge to get your website to the top of Google, and other major search engines.