All search engines, including Google, have trouble with duplicate content. When the same text appears in multiple places on the Internet, a search engine cannot determine which URL should be displayed in the search engine results pages (SERPs). This can negatively affect a webpage's ranking. The problem is only compounded when people link to different versions of the content. In this article, we will help you understand some of the causes of duplicate content and how to solve the problem.
What is the definition of duplicate content?
If you're at a crossroads and multiple road signs point in different directions to the same destination, you won't know where to go. If, in addition, the final destinations are slightly different, the problem is even greater. As a web user, you won't mind too much because you'll find the content you need, but a web search engine needs to choose which page should show up in its results because it doesn't want to show the same content more than once.
Suppose an article about "keyword a" appears on http://www.website.com/keyword-a/, but the same content also appears on http://www.website.com/category/keyword-a/. This scenario happens a lot in a CMS. If the article is picked up by multiple bloggers, and some of them link to the first URL while the rest link to the second, then the search engine's problem becomes your problem, as each link now promotes a different URL. Because the links are split this way, you are less likely to rank for "keyword a"; it would be much better if all the links pointed to the same URL.
How to use the duplicate content finder?
Google and other search engines treat unique content as a major ranking factor. Using the website duplicate content checker to identify internal duplicates across an entire website is easy, fast and free. No credit card is needed.
Step 1: Enter your URL and start your free trial
Just add your domain and start the test by clicking on the button. You can use Google or Facebook to sign up for a free trial with no problem.
Step 2: Get the result
We will crawl your site and find duplicate content issues: duplicate pages without a canonical link, and duplicate titles, H1 headings and descriptions. You can see all of these issue types in the "Duplicate Content" category of the on-site audit. In addition, here is a small tip that will help you use our service most effectively.
Click "Check Passed" for the "Duplicate Content" category. You'll find the list of issues that our crawler checked for and did not find on your site.
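To make the idea of duplicate titles concrete, here is a minimal Python sketch that groups URLs by their title text. The `pages` mapping and the function name `find_duplicate_titles` are hypothetical, and this is only an illustration, not a description of how any particular crawler works.

```python
from collections import defaultdict


def find_duplicate_titles(pages):
    """Group URLs by normalized title and return only titles used more than once.

    `pages` is a hypothetical mapping of URL -> title text, e.g. from a crawl.
    """
    groups = defaultdict(list)
    for url, title in pages.items():
        # Collapse whitespace and ignore case so trivial differences
        # do not hide a duplicate.
        key = " ".join(title.split()).lower()
        groups[key].append(url)
    return {title: sorted(urls) for title, urls in groups.items() if len(urls) > 1}


pages = {
    "http://www.website.com/keyword-a/": "Keyword A - Website",
    "http://www.website.com/category/keyword-a/": "Keyword A  - website",
    "http://www.website.com/keyword-b/": "Keyword B - Website",
}
duplicates = find_duplicate_titles(pages)
print(duplicates)
```

The same grouping could be applied to H1 headings or meta descriptions by passing a different mapping.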
Duplicate Checker Features
In addition, you will receive a complete website audit report after domain verification that will help you identify various types of issues on your website. The report will also include instructions on how to fix the identified issues.
Most importantly, you can filter issues by type, such as Critical, Warnings, Opportunities or Zero Issues. Or you can categorize them by links, indexability, content relevance, etc. This allows you to quickly reach and resolve the most urgent site issues.
How does Google penalize sites for duplicate content?
When duplicate content is found on a site, there is a high chance that Google will apply penalties. What can happen? In most cases, website owners experience a loss of traffic. This happens because Google stops indexing the page where plagiarized text is detected. When deciding which page has the most value for the user, Google chooses which of the pages is shown in the SERP. As a result, some pages are no longer visible to users. In severe cases, Google may impose a duplicate content penalty. You may then receive a DMCA notice, which means you are suspected of search result manipulation and copyright infringement.
There are countless reasons why you need unique content on your website. But duplicates do exist, and the reasons are mostly technical. Humans rarely store the same content in more than one place without making it clear which one is the original. Technical causes mainly arise because developers don't think like browsers or even users, much less search robots. In the example mentioned above, a developer will see the article as if it only exists once.
Misunderstood URL
Developers aren't crazy, but they see things from a different perspective. A CMS that powers a website will only have one article in the database, but the website software will allow the same article to be retrieved via more than one URL. From the developer's point of view, the article's unique identifier is not the URL, but the article's database ID. However, a search engine sees a URL as a unique identifier for any text. If this is explained to the developers, they will understand the problem. This article will also provide solutions to this issue.
Session IDs
E-commerce sites track visitors and allow them to add the products they want to a shopping cart. This is done by giving each user a "session". This is a brief history of the visitor's actions on the site and can include things like items in a shopping cart. To preserve a session when a visitor moves between pages, session IDs must be stored somewhere. This is most commonly done with cookies. However, search engines do not store cookies.
Some systems fall back to adding the session ID to the URL, so that every internal HTML link on the site gets a session ID appended to its URL. Since session IDs are unique to each session, new URLs are created and this results in duplicate content.
Parameters passed via URL
Duplicate content is also created when URL parameters are used, for example in link tracking, but the page content does not change. Search engines treat http://www.website.com/keyword-a/ and http://www.website.com/keyword-a/?source=facebook as different URLs. While the latter helps you track where users are coming from, it can make it harder for your page to rank high, and that's not something you want!
The same applies to any other type of parameter added to URLs where the content does not change. Other examples of parameters would be changing the sort order or showing a different sidebar.
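As a rough illustration of why these URLs are really one page, the Python sketch below strips parameters that do not change the content, so that both versions collapse to a single URL. The `IGNORED_PARAMS` set and the function name are assumptions made for this example; which parameters are safe to ignore depends on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change the page content.
IGNORED_PARAMS = {"source", "utm_source", "utm_medium", "utm_campaign"}


def strip_ignored_params(url):
    """Return the URL without parameters that do not affect the content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_ignored_params("http://www.website.com/keyword-a/?source=facebook"))
```

A helper like this can be useful when grouping crawl results or analytics data by page rather than by raw URL.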
Content distribution and scraping
Duplicate content is mostly caused by something on your own site. Sometimes, however, other sites pull content from your site without linking to the original article. In these cases, search engines don't know which version is the original and treat the copy as simply another version of the article. The more popular a site becomes, the more scrapers use its content, only adding to the problem.
Order of Parameters
CMSs often don't use clean URLs, but URLs that look like /?id=4&cat=6, where id is the article number and cat is the category number. The URL /?cat=6&id=4 will show the same result on most websites, but it's not the same URL for search engines. You can easily find out what a site is built on with Sitechecker.
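One way to see that two such URLs are equivalent is to put their parameters in a fixed order before comparing them. The following Python sketch sorts the query parameters alphabetically; the function name `sort_query_params` and the host `www.website.com` are used purely for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_query_params(url):
    """Return the URL with its query parameters in alphabetical order."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


a = sort_query_params("http://www.website.com/?id=4&cat=6")
b = sort_query_params("http://www.website.com/?cat=6&id=4")
print(a == b, a)
```

If the site generates its own links through a helper like this, every internal link to the same article will use the same parameter order.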
Comment pagination
In WordPress and other systems, it is possible to paginate comments. The result is that the content is duplicated across the article URL and the article URL plus /comment-page-x, and so on.
Pages designed to be printed
If printable pages are created and linked from article pages, search engines will generally pick them up unless they are specifically blocked. Google then has to decide which version to show: the one that only shows the article, or the one with peripheral content and ads.
With or without WWW
Although this issue has been around for a long time, search engines still get it wrong. If both the WWW and non-WWW versions of the site are accessible, this creates duplicate content issues. A similar problem that occurs, though not as often, is https versus http URLs that serve the same text. Therefore, when planning your SEO strategy, you should always keep this in mind.
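The sketch below shows, in Python, how a site might decide where to redirect a request so that only one host and scheme combination is served. The preferred scheme `https`, the preferred host `www.website.com` and the function name are assumptions for this example; pick whichever version your site actually uses.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferences for this example only.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.website.com"
HOST_ALIASES = {"website.com", "www.website.com"}


def preferred_url(url):
    """Return the preferred version of a URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in HOST_ALIASES:
        return None  # Not one of our hosts, so leave it alone.
    target = urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )
    return None if target == url else target


print(preferred_url("http://website.com/keyword-a/"))
```

In practice the returned URL would be used as the target of a 301 redirect, usually configured in the web server rather than in application code.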
Canonical URLs: A Possible Solution
Although multiple URLs can point to the same piece of text, this problem is easy to resolve. To do this, someone in the organization must determine without a doubt what the "correct" URL for a piece of content should be. Search engines refer to the "correct" URL for a piece of content as the canonical URL.
How to check for duplicate content on a website?
If you're not sure whether you have duplicate content issues on your site, there are several ways to find out. The easiest way is to use our tool.
Please keep track of any content changes on your site, as they may harm the on-page optimization process.
Google Search Console
Pages with duplicate titles or descriptions are not good. Clicking on them in the tool will display the relevant URLs and help you identify the issue. If, for example, you wrote an article about keyword A, but it appears in more than one category, the titles might be different. They could be 'Keyword A - Category Y - Website' and 'Keyword A - Category Z - Website'. Google won't see them as duplicate titles, but you can spot them by doing a search.
Search for fragments or titles
You can use some useful search operators to help you in these cases. If you need to identify all URLs on the website that contain the keyword A article, use the following query on Google:
site:website.com intitle:"Keyword A"
Google will show all pages on website.com that have the keyword A in the title. If you're very specific with the title, it's easy to spot duplicates. The same method can be used to find plagiarized content on the Internet. If the full title of the article reads "Keyword A is great", you can search as follows:
intitle:"Keyword A is great"
For this query, Google will return all pages that match the title. It's also worth looking for a few full sentences of an article, as scrapers can make the title look different. Google sometimes displays a warning below results that similar results have been ignored. This shows that Google is "deduplicating" the results, but since that's still not good, click the link and see the full results to see if any of them can be fixed.
But there is always a quicker way to tell whether someone is duplicating your content. You can use a duplicate content checker and get quick answers to the most pressing questions. Such a tool can check your site's pages for duplicate content and give each a score. Use it to find internal and external sources that duplicate your site's content. Since search engines prefer texts that are unique and valuable to users, it is important for SEOs to avoid copying entire articles or parts of web pages. A duplicate checker finds duplicate text on other pages. In most cases, it works as an SEO plagiarism checker, comparing your page content with other websites by matching individual phrases and words. It can do everything described above, only faster.
How to resolve duplicate content issues?
Once you know which URL should be used as the canonical URL for specific content, start canonicalizing your site. This means letting search engines know what the canonical version of a page is and allowing them to find it as quickly as possible. There are several methods to solve the problem:
- Don't create duplicate content.
- Redirect duplicate content to the canonical URL.
- Add canonical links to all duplicate pages.
- Add HTML links from all duplicate pages to the canonical page.
Don't create duplicate content
Several causes of duplicate content mentioned above can be easily fixed:
- Disable session IDs in URLs in your system settings.
- Printer-friendly pages are unnecessary; use print style sheets instead.
- Disable comment pagination.
- Always order parameters in the same sequence.
- To avoid problems with link tracking, use hash-based tracking rather than parameter-based tracking.
- Use WWW or not, but stick with one and redirect the other to it.
Even if a problem isn't easy to fix, it might still be worth the effort. The ultimate goal should be to avoid duplicate content altogether.
Redirect duplicate pages to the canonical URL
It might be impossible to completely stop your system from generating bad URLs, but you can still redirect them. If you do manage to fix some duplicate content issues, make sure the old duplicate URLs redirect to the correct canonical URLs.
Add a canonical link to all duplicate pages
Sometimes it's impossible to remove a duplicate version of an article, even when you know it uses the wrong URL. Search engines introduced the canonical link element to solve this problem. The element is placed in the <head> section of a page like this:
<link rel="canonical" href="http://sitioweb.com/articulo_correcto/" />
Put the canonical URL of the article in the href attribute. Search engines that support the canonical element will treat it as a soft 301 redirect, transferring most of the page's link value to the canonical page.
If possible, a normal 301 redirect is even better because it is faster.
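To check whether a page actually declares a canonical URL, you can parse its HTML and look for the link element. The following is a minimal Python sketch using only the standard library; the class and function names, as well as the sample markup, are hypothetical and only meant to illustrate the idea.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


sample = (
    '<html><head><link rel="canonical" '
    'href="http://website.com/keyword-a/" /></head><body></body></html>'
)
print(find_canonicals(sample))
```

An empty list means the page has no canonical link, and more than one entry suggests conflicting declarations that are worth cleaning up.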
Add an HTML link from all duplicate pages to the canonical page
If none of the solutions mentioned above are feasible, you can add links to the original article below or above the duplicate article. You can also implement this in the RSS feed by inserting a link to your original article. While some scrapers may filter the link, others may leave it as is. If Google finds multiple links pointing to the original article, it will assume this is the canonical version.
Duplicate content can also cause serious problems on paginated sections. Depending on the structure of your pagination pages, it is very likely that some pages contain similar or identical content. You will also often find the same title and meta description tags repeated across them. In this case, duplicate content can make it difficult for search engines to determine the most relevant page for a given search query.
You can remove pagination pages from the index using the "noindex" tag. In most cases, this method is the preferred one because it can be implemented quickly. Its essence is to exclude all pagination pages from the index, except for the first one.
It is implemented as follows: the meta tag
<meta name="robots" content="noindex,follow" />
is added to the <head> section of all pages except the first. This excludes all pagination pages from the index except the main page of the catalog. At the same time, because of the "follow" value, links on those pages can still be followed, so the pages belonging to the catalog can still be indexed.
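As a small illustration of this rule, the Python sketch below returns the robots meta tag for a given page of a paginated listing. The function name `robots_meta_for_page` is hypothetical, and a real template system would typically make this decision in its own syntax.

```python
def robots_meta_for_page(page_number):
    """Return the robots meta tag for a paginated listing page, or '' for page 1."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    if page_number == 1:
        return ""  # The first page stays indexable.
    return '<meta name="robots" content="noindex,follow" />'


for number in (1, 2, 3):
    print(number, robots_meta_for_page(number))
```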
Also, Neil Patel provides an insightful SEO tip on duplicate content strategy.