How to prevent indexing of filtered pages and paid content

No beating around the bush, I’ll start with,

How to prevent indexing of filtered pages

What are filtered or faceted pages?

You have an eCommerce site.

This site has a lot of pages.

Like,

- homepage

- category pages

- listing pages

- collection pages

- product pages

And others.

Every page will have a canonical URL tag.

But what about filtered pages or faceted pages?

Example:

On the eCommerce portal, Flipkart,

[Screenshot: Flipkart homepage]

Check for any section listing page.

Say, go to Sports Shoes under Men.

[Screenshot: Men > Sports Shoes in the Flipkart menu]

You get the below page.

[Screenshot: men’s sports shoe listing page]

The link for the above page is

[Screenshot: the men’s sports shoes listing page URL]

And the canonical URL is the same.

Now, say I want running shoes with over a 20% discount, in size 7.

I apply the filter.

I get the below page.

[Screenshot: the filtered sports shoes page]

But the page link is as below.

[Screenshot: the faceted URL of the filtered page]

Now, logically speaking, both pages are the same.

As the filtered page is just a subset of the main listing page.

But I’m sure you have noticed that,

Every time you apply a filter, the page reloads with different query parameters.

And hence each filtered URL will be considered a different page.

And therefore, each will be indexed.

So what is the issue here?

Answers:

1. Duplication

The filtered page will be considered a duplicate.

Now imagine this across all the listing pages and all the filters each of them has.

As a result, the number of filtered pages increases exponentially.

And hence the number of duplicate-indexed pages increases at the same rate.

Which is bad for organic results.

2. Link authority/equity

The link equity will be split between the original and the duplicate pages.

3. Search engine crawl budget

Again, due to duplicate pages, the total number of pages on the website will increase.

Growth is fine, but here it comes from duplicate pages, not original ones.

Search engines allot each site a limited crawl budget, so bots end up spending it on duplicates instead of your original pages.

So how do you solve this?

Note:

There is no single solution to this. It depends on your site structure.

You need to add tags that tell search engines not to index these pages.

Let us look at the options here.

1. Avoid generating new links

The first and simplest solution would be to avoid generating links through filters.

Altogether.

With the help of JavaScript, you can achieve this functionality.

Where any number of applied filters would still generate only one link.

You need to work with your coder on this one.
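Here is a minimal sketch of the idea, assuming a hypothetical /api/products endpoint (your coder will have their own setup):

<select id="discount-filter">
  <option value="">Any discount</option>
  <option value="20">20% or more</option>
</select>
<ul id="product-list"></ul>
<script>
// Re-render the list in place; location.href never changes,
// so the filter never creates a new crawlable URL.
document.getElementById('discount-filter').addEventListener('change', async (e) => {
  const res = await fetch('/api/products?discount=' + e.target.value);
  const items = await res.json();
  document.getElementById('product-list').innerHTML =
    items.map(p => '<li>' + p.name + '</li>').join('');
});
</script>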

Downside?

Yes, there is.

You need to manually make sure this applies only to the filters you want hidden.

And that key filter-combination pages still get their own crawlable URLs and remain available for indexing.

2. Canonicalization

Many companies would use this as a solution.

It is, yes. But partly.

Canonical URL would ensure that the main page is still the original page.

And not the filtered page.

The link equity issue would be solved here.

But not the crawl budget one.

As even with the canonical tag, Google will still crawl the page.
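For reference, the canonical tag on a filtered page points back at the main listing page, something like this (example.com paths are placeholders):

<link rel="canonical" href="https://www.example.com/mens-sports-shoes" />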

3. Noindex, follow

What about this tag?

Google will follow but won’t index.

This should be ideal, right?

Unfortunately, no.

Crawl budget will still be wasted here.

And these pages would continue to receive link equity.

Say, you want to index “black shoes” but not “black shoes under Rs.999”.

If you add noindex to the latter, it will be excluded from the index.

But it won’t prevent bots from crawling it.
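For reference, the tag in question goes in the head of the filtered page:

<meta name="robots" content="noindex, follow" />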

4. “Disallow” in robots.txt

What if the instructions were given in the root itself?

Providing Disallow instructions in the robots.txt file would also pose an issue.

As link equity would be negatively affected.

A blocked page can still receive equity, but it cannot pass it anywhere.

Another downside here is that even if you tell Google not to visit a certain page (or section), Google can still index it if other pages link to it.
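For illustration, the rules would look something like this in robots.txt (the parameter names are placeholders for your own filters):

User-agent: *
Disallow: /*?*discount=
Disallow: /*?*size=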

5. “Nofollow”

Adding this attribute to your filter links will also not solve the issue completely.

Duplicate pages can still get indexed, for example through external links.
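For reference, the attribute sits on the filter links themselves (illustrative markup):

<a href="/mens-sports-shoes?discount=20" rel="nofollow">20% off or more</a>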

So how DO you prevent filtered pages from indexing?

Well, as a general best practice, the following combination is recommended.

Canonical + Nofollow

Canonicalization will solve Duplicate content and link equity issues.

Nofollow will take care of the crawl budget factor.

Together they should be effective enough to prevent filtered-page indexing.
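Putting the two together on a filtered page (paths again illustrative):

<!-- In the head: canonical pointing to the main listing page -->
<link rel="canonical" href="https://www.example.com/mens-sports-shoes" />

<!-- On the filter links that generate the faceted URLs -->
<a href="/mens-sports-shoes?discount=20&size=7" rel="nofollow">Size 7, 20% off</a>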

What you need to remember is to apply this to ALL the desired filtered pages.

So that Google can identify the pattern.

An example would be Kleinfeld Bridal.

This site has used this method to block its filtered pages.

Now, coming to the second section of this post on

How to prevent indexing of paid content

By this, I mean pages that are available to read only when you’re subscribed.

Like the below one:

[Screenshot: ET paid content]

You will be able to read it only once you start your subscription.

Such pages are also called paywalled content.

I won’t be going through the payment process and connecting it here.

You need to check with your coder or service provider for that.

From an SEO point of view, you need to add a class to the content (in the body tag) which is paid or paywalled, and then flag that class in the page source.

Like below

<body>

<h1>Xerox in a bid to buy HP.</h1>

<div class="paywall">In a surprise move, Xerox today made a bold attempt to buy out HP for…</div>

</body>

In the above code, the H1 header should be indexed and visible.

But the div content should not be.

The idea is that the heading captures the attention.

And users sign up to read the entire content.

The “paywall” added is an HTML class.

An HTML class is an attribute that assigns a style to elements.

All elements with the same class will display or function with the same style.

Simple Example:

Say, you have the below text.

[Screenshot: sample text with three sections]

And the backend HTML code would be

[Screenshot: the backend HTML code]

Now, I want to add background colors to all three sections.

Instead of creating a style for each of the sections, I create a separate class.

And then “call” that class on each of these sections.

As below.

[Screenshot: the class called in the body]

You add the class in the head section.

I’ve created a class called “dm” (Digital Marketing).

[Screenshot: the “dm” class definition]

And then I call it on each of the sections.
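To make it concrete, here is a minimal sketch of the whole page (the section text and the color are illustrative):

<!DOCTYPE html>
<html>
<head>
  <style>
    /* one "dm" rule, reused by every section */
    .dm { background-color: lightblue; }
  </style>
</head>
<body>
  <div class="dm">What is Digital Marketing?</div>
  <div class="dm">Why learn Digital Marketing?</div>
  <div class="dm">Careers in Digital Marketing</div>
</body>
</html>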

And the output will be:

[Screenshot: the three sections with the background color applied]

This is a small example of the usage of the HTML class.

So, coming to the main topic, you add a class to your text.

And then reference that class in the structured data (JSON-LD) snippet below.

In the page source, inside a <script type="application/ld+json"> tag:

{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "isAccessibleForFree": "False",
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": "False",
    "cssSelector": ".paywall"
  }
}

“paywall” here is the name of the class.

This code is pretty much standard.

You can use it not only for paywalls but for any content which you don’t want Google to index because it sits behind an access condition.


Conclusion

Filtered pages are common on eCommerce sites, and on any site with faceted navigation.

Handling them correctly is often a lesser-known part of a technical SEO set-up.
