Google keeps tweaking its search and algorithms, and lately it has been causing a lot of grief. I've had difficulty finding what I'm searching for. I've seen pages not indexed for myriad reasons. I've been frustrated with Google Search Console notifying me about issues that don't really exist.
Today, I received a notification from Google Search Console about a new coverage issue: "Submitted URL seems to be a Soft 404"
I had no clue what Soft 404 is.
Of course I know what 404 is. It is a common HTTP status code. It's received when you try to access content that doesn't exist anymore or didn't exist in the first place. Often also known as (Page) Not Found error, it usually happens because you have a typo in a web page address or you clicked a broken link.
Websites often have a custom 404 error page configured. It may just plainly tell you that the content you are looking for is not there and ask you to check the address again. Or it may include links to popular pages or articles on the site, a search form or some other way to direct you towards relevant content on the site. In some cases the 404 page attempts to be funny or cutesy.
But what the hell is Soft 404?
Turns out Soft 404 is Google's own little thing. It was introduced back in 2010, but it appears to have become more common in recent years.
Simply put, HTTP status codes are how your website's server tells a browser if your site is working, if the requested page is available and if the address is redirected elsewhere, and much more.
When the server is functioning and the page accessed is okay, the server usually returns code 200, which is also known as OK. It's the standard response for successful HTTP requests, and it doesn't tell you much more than that the server is working and the page exists.
Google apparently introduced the Soft 404 as a response to the many sites with misconfigured 404 pages. Instead of giving out the standard Not Found status, these sites have 404 pages that give out 200 OK. This leads to broken links and 404 pages being indexed and returned in search results. So Google invented Soft 404 as an attempt to keep useless content from being indexed.
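If you want to check whether your own site has this kind of misconfigured 404 page, here is a rough sketch in Python. It asks for a made-up address and reports whether the server answers with 200 OK. The function name and the random path are my own inventions for illustration, and this only tests the server's behaviour. It says nothing about how Google actually decides.

```python
import urllib.error
import urllib.request
import uuid


def returns_200_for_missing_page(base_url: str, timeout: float = 10.0) -> bool:
    """Request a made-up path and report whether the server answers 200 OK."""
    # A random path that almost certainly does not exist on the site.
    probe_url = base_url.rstrip("/") + "/" + uuid.uuid4().hex
    try:
        # urlopen follows redirects, so a site that redirects missing pages
        # to its front page will also end up here with status 200.
        with urllib.request.urlopen(probe_url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx and 5xx responses,
        # so a properly configured 404 page lands here.
        return False
    # Connection problems (urllib.error.URLError) are deliberately not
    # caught: they mean the check itself failed, not that the site is fine.
```

Calling returns_200_for_missing_page("https://example.com") with your own address in place of example.com should give False on a correctly configured site. True means missing pages are served as if they existed.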
However, misconfigured 404 pages appear not to be the only content that gets punished as Soft 404. Google also attempts to exclude empty pages, which sometimes get published accidentally by your CMS or blogging system, such as WordPress. Some have theorised that short content that includes the phrase "not found" could lead to Soft 404.
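To make those theories concrete, here is a toy version in Python. To be clear, this is not Google's algorithm, which isn't public. The rules, the function name and the 50-word threshold are all assumptions I made up to illustrate the idea: a page served with 200 OK that is either empty, or short and containing "not found".

```python
import re


def might_be_flagged_as_soft_404(status: int, text: str, min_words: int = 50) -> bool:
    """Guess whether a page could look like a soft 404.

    Illustration only: the rules and the min_words threshold are assumptions,
    not anything Google has documented.
    """
    if status != 200:
        # A real error status is a regular ("hard") error, not a soft one.
        return False

    words = re.findall(r"\w+", text)
    if not words:
        # An empty page served with 200 OK.
        return True

    is_short = len(words) < min_words
    mentions_not_found = "not found" in text.lower()
    return is_short and mentions_not_found
```

With these made-up rules, a short post that never says "not found" passes, which is not what happened on my site. So whatever Google really does, it is clearly more than this.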
The pages that Google had marked Soft 404 on my site were short blog posts. Brief notes about the first website I created, a post that mainly included a video, an invitation to vote on a past poll and three words about a blog series I had ended.
Granted, the video I had embedded on that one post had been removed from YouTube, and I had to embed a new one. But other content didn't include anything broken. Just short, extremely short.
It looks like Google is attempting to sort out so-called thin content with these Soft 404 errors. If you have very short, blurb-like blog posts on your site, they may end up being mistaken for broken or empty pages by Google's algorithms.
I understand this need for sorting out what appears to be thin content. Especially in the light of recent events where misleading or completely false content has been used to influence politics and elections, this seems like a good idea. Yet I'm unimpressed.
Too much can go wrong with this.
While most of the pages that were marked as Soft 404 on my site don't really matter to me that much, it's still rather unnerving to see that short posts are automatically considered to be errors.
As I have often tried to underline, short content doesn't mean thin content. It just means short. Short content can be rich in information, just as long content can be full of fluff.
As an example, the well-known author and entrepreneur Seth Godin famously writes short blog posts. While some of them are more superficial, there are also some pretty deep thoughts in these short blurbs. I wonder if Google sees them as Soft 404 as well, or if fame and popularity make a person exempt.
It's unclear how much these Soft 404 errors affect your site's overall search ranking. Obviously it means that those pages aren't indexed anymore, but whether they can give the whole site trouble... That's what I don't know. I would assume they shouldn't affect your website in general, since regular 404 errors don't usually do that either.
That said, a whole lotta Soft 404 errors could be a problem and make Google see your site as less relevant. If this concerns you, I recommend trying to fix the issues.
Start by checking which pages Google is marking with this made-up error. You can see the pages listed in the Google Search Console, under Coverage.
Are those pages empty or just not working as they should? Empty and otherwise useless pages can be removed completely. Try to fix broken content. If you can configure your website's sitemap, you can remove pages that don't belong there. Set redirects, if needed.
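If your sitemap is a plain XML file you maintain yourself, rather than something a plugin generates for you, removing pages from it could look roughly like this in Python. The function name and the example addresses are hypothetical. The namespace is the standard one from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def remove_urls_from_sitemap(sitemap_xml: str, urls_to_remove: set) -> str:
    """Return the sitemap XML without the <url> entries for the given addresses."""
    # Keep the output free of "ns0:" prefixes.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)

    # Iterate over a copy of the list, since entries are removed on the way.
    for url_element in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_element.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and (loc.text or "").strip() in urls_to_remove:
            root.remove(url_element)

    return ET.tostring(root, encoding="unicode")
```

You would pass in the contents of your sitemap file and a set such as {"https://example.com/short-post"}, then write the result back. If a CMS or SEO plugin builds the sitemap, use its own settings instead, because it will just overwrite a hand-edited file.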
If the content marked as Soft 404 is just short, see if you can expand it. You might also consider marking these pages as not to be indexed with the noindex tag, so that Google doesn't even attempt to include them in the searches. This can be accomplished with different SEO tools or by editing the meta tags directly in the page's code. Just don't accidentally mark your whole site as not to be indexed.
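In the page's code, the noindex tag is a robots meta tag in the head section: <meta name="robots" content="noindex">. To double-check which pages actually carry it, and that your front page doesn't, something like this Python sketch could help. The class and function names are my own, and it only looks at the meta tag, not at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # The content is a comma-separated list, e.g. "noindex, follow".
            for part in attributes.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Report whether the page's robots meta tag asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Feed it the HTML source of a page as a string. It should return True for the short posts you chose to hide and False for everything you still want found.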
I had no interest in editing these posts, other than fixing that missing video and some links. I removed the three sentence post from my sitemap, but left it published and available for indexing for now. Just to see what happens. I'm funny like that.
Anyway, that's the basics about Soft 404 and what you can do if your site gets hit by it. It's not the end of the world, and probably it doesn't affect the overall ranking of your site. Unless all or most of your content gets marked with it. It feels unfair and really annoying, but other than fixing, removing and editing the content, setting redirects and noindex tags, there's not much to do. Google's algorithms do what they do, sometimes messing up things gloriously.