Duplicate pages are very easy to create, not always easy to remove, and their harmful effect on your project is hard to overstate. Why is duplicate content so bad, and how does it end up on a site even when you never set out to create it and only wanted the best? We will deal with this in detail in this article.
Duplicate content is a chronic problem in e-commerce. It seems that every platform, no matter how well optimised for SEO, creates some form of repetitive material that prevents the site from achieving maximum performance.
Types of duplicates
Full duplicates are pages that are completely identical: the same content is available at two URLs.
Partial duplicates are pages created for the same user need: they solve the same problem, share the same semantics and, as a result, compete with each other, which leads to keyword cannibalisation. For example:
- sorting parameters such as site.com/phone/?price=min: the parameter only changes how products are displayed on the page (sorted by price), which leads to partial duplication of content
- print versions – essentially a copy of a page without a design
- duplicated content blocks – for example, an identical comment block displayed on a group of pages
Why are duplicate pages bad?
There are five reasons why duplicate pages are harmful to your site.
It is important to understand that a search engine is a business. Like any business, a search engine does not want to waste resources, so a crawl budget is determined for each site – the amount of resources the search engine is willing to spend on crawling and indexing it.
1. This leads to the first reason to avoid duplicates: the search engine will spend the crawl budget on them instead of crawling genuinely important pages, such as landing pages.
2. Problems with crawling lead to problems with indexing – this is the second reason to get rid of duplicates. If a page that is important to your business is not crawled, it will not be indexed. And if your site is small and young, chances are you will have to wait a long time for a recrawl.
3. The third reason is possible keyword cannibalisation – a situation where different pages compete for the same search terms. By analogy: you walk into a new supermarket and see a “bread” sign on the far right, and then, quite unexpectedly, exactly the same sign on the far left. You have a logical question: where is the product actually located, and why is the store confusing you? With duplicate pages, the search engine similarly struggles to figure out which one it should rank. So don’t make the search bot guess.
It is worth noting that duplicate pages are not the only cause of cannibalisation. The problem can also arise from duplicated titles or H1s, the use of the same keywords in the content, or external links whose keyword anchors point to a non-target page.
4. The fourth reason is inbound links: they may end up pointing to duplicate pages to the detriment of the main ones. This can amplify the cannibalisation effect, although not necessarily.
5. The fifth reason is the Google Panda algorithm, which demotes sites for, among other things, duplicate content.
Causes of duplicates on the site
Content manager error
The most common situation is when content was added to the site twice, creating identical pages. Fortunately, such situations are easy to avoid.
If your site consists mostly of text content, maintain a content plan so you can keep track of your publications. In any case, periodically review the content and monitor landing pages to avoid problems with cannibalisation and duplicates.
If duplicate content has already been added and indexed, decide which page is the primary one and keep only that.
URLs with parameters
More often than not, it is URL parameters that cause duplicate content and waste the crawl budget on pages of no value.
Parameters and duplicate pages can appear when:
- using filters for displaying content on the page
- filtering goods
- using UTM tags
- using sorting options (by price, from cheap to expensive, etc.)
- incorrectly working pagination
- passing other technical information via URL parameters (see the sketch below)
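Below is a minimal sketch of the clean-up idea in Python: strip parameters that only track visits or re-sort the listing, and keep the ones that genuinely change the content. The parameter names in the ignore list are illustrative examples, not a universal set.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that only track visits or re-sort the listing (illustrative list)
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "price"}

def canonicalize(url: str) -> str:
    parts = urlsplit(url)
    # Keep only the parameters that actually change the page content
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonicalize("https://site.com/phone/?price=min&utm_source=newsletter"))
# -> https://site.com/phone/
```

Running a crawl export through a function like this shows how many parameterised URLs collapse into the same canonical address – a quick measure of how much crawl budget the parameters are eating.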
Same-type products with different options
Be practical when it comes to almost identical products. For T-shirts in different colours, for example, use a single product card and let the customer select the option they need when ordering. This minimises the number of duplicates – product cards with the same product – while the user still gets exactly the product they are looking for. This solution also saves the crawl budget and avoids cannibalisation.
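As an illustration of the idea (not any specific platform’s API), here is a hypothetical data model in which one product card with selectable options replaces a separate page per colour:

```python
from dataclasses import dataclass, field

@dataclass
class ProductCard:
    slug: str                         # one indexable URL, e.g. /t-shirt-basic/
    title: str
    options: dict[str, list[str]] = field(default_factory=dict)

tshirt = ProductCard(
    slug="/t-shirt-basic/",
    title="Basic T-shirt",
    options={"colour": ["white", "black", "red"], "size": ["S", "M", "L"]},
)
# Three colours and three sizes, but still only one page to crawl and rank.
```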
Regional versions of sites
Sometimes regional versions of an online store use folders instead of subdomains. As a result, each folder contains the same content. It is better to use subdomains in such situations, but you can also optimise the pages to avoid duplication.
If you still use folders for an online store, you must at least make the Title and H1 unique, and also vary the product display so that it differs from the other regional pages. Unfortunately, even then there is no guarantee that search robots will crawl the pages correctly – you may have to do additional work on making them unique.
For service sites, the problem is easier to solve. If you create pages for different cities, write unique, location-specific content for each one.
Product availability in different categories
Often, online stores add a product to several categories at once. This can cause duplication if the URL contains the full path to the product.
This problem can be solved by correcting the logic of the CMS so that the same URL is always used for products in different categories.
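A minimal sketch of that logic, using hypothetical function and route names rather than a real CMS API: build the product URL from the product slug alone, so every category listing links to the same address.

```python
# Hypothetical URL builder: the product slug alone defines the address,
# so categories never leak into the product URL.
def product_url(product_slug: str) -> str:
    return f"/product/{product_slug}/"

# Both the "phones" and the "sale" listings link to the same page:
print(product_url("iphone-15-128gb"))   # -> /product/iphone-15-128gb/
# instead of /phones/iphone-15-128gb/ and /sale/iphone-15-128gb/
```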
Technical problems
One of the most common duplication problems is technical. It is especially frequent in custom-built or obscure CMSs, but better-known systems are guilty of it too. An SEO specialist should therefore always stay alert and keep an eye on the settings that lead to duplication: whether the main mirror (preferred domain version) is configured, whether trailing slashes are handled consistently, and so on.
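A rough way to spot such issues is to request both variants of a URL and check which one redirects. The sketch below assumes the third-party requests library and uses a placeholder domain:

```python
import requests

def check_variants(url_a: str, url_b: str) -> None:
    # Request both variants without following redirects and print the result.
    for url in (url_a, url_b):
        resp = requests.head(url, allow_redirects=False, timeout=10)
        print(url, "->", resp.status_code, resp.headers.get("Location", "-"))

# Trailing-slash and www/non-www (main mirror) variants should 301 to a
# single preferred version, not both return 200.
check_variants("https://site.com/phone", "https://site.com/phone/")
check_variants("https://www.site.com/", "https://site.com/")
```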
How to prevent duplicate page problems
At the site creation stage, you can use a robots.txt file to prevent unwanted URLs from being crawled. Just remember to be careful and always check that each rule works as intended with the robots.txt testing tool in Google Search Console. This way you can avoid a situation where robots.txt blocks crawlers from the pages you actually want crawled and indexed.
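For a quick local sanity check before relying on the Search Console tester, you can run candidate rules through Python’s standard library parser. Note that urllib.robotparser only does simple prefix matching and does not support Googlebot wildcard syntax, so treat it as a rough check only; the rules and URLs below are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Candidate rules to test locally (illustrative prefix rules, no wildcards)
rules = """
User-agent: *
Disallow: /print/
Disallow: /phone/?price=
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

for url in ("https://site.com/print/blue-t-shirt/",
            "https://site.com/phone/?price=min",
            "https://site.com/phone/"):
    verdict = "blocked" if not rp.can_fetch("*", url) else "allowed"
    print(url, "->", verdict)
```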
If you find duplicate content on the site, do not rush to delete it. SEO specialists from Sydney advise you to perform a series of checks:
1. Determine which page ranks better in the search engine for the keyword.
The quickest way to check which of the duplicate pages ranks better is to use the SE Ranking modules. In the Positions module, you can specify a target URL for each keyword. If the URL that actually ranks for the keyword does not match the target URL, the icon next to the keyword will be red.
2. Next, determine the number of external links to each of the duplicate pages. It is advisable to keep the page with more backlinks.
3. Determine the number of keywords for which each page was shown. To do this, you can use Google Search Console, but filter the data by page rather than by query.
4. Determine how much traffic each page gets, along with its bounce rate and conversion rate. For this, you can use GA or SE Ranking; the data is available in the Analytics and Traffic section.
5. Based on all the metrics, decide which page you want to leave.
After deleting the duplicate page, set up a 301 redirect from it to the main page. After that, it is recommended to crawl the site again to find internal links to the deleted page – they need to be replaced with the URL you decided to keep.
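A minimal sketch of that re-crawl step, assuming the third-party requests and beautifulsoup4 libraries and placeholder URLs: fetch each page you want to check and flag any anchor that still points to the deleted address.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

DELETED_URL = "https://site.com/old-duplicate/"
PAGES_TO_CHECK = ["https://site.com/", "https://site.com/phone/"]

for page in PAGES_TO_CHECK:
    html = requests.get(page, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        # Resolve relative links and compare against the deleted URL
        if urljoin(page, a["href"]).rstrip("/") == DELETED_URL.rstrip("/"):
            print(f"{page} still links to the deleted page via '{a['href']}'")
```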
Conclusion
It is quite clear that duplicate pages are a threat to your site – don’t underestimate them. Once you understand the nature of the problem and its possible sources, you can control the appearance of duplicates at every stage of the site’s life. The key is to spot the problem in time and fix it quickly, and regular site audits, setting target URLs for keywords and ongoing rank monitoring will help you do exactly that.