๐Ÿ•ต๏ธTech SEO

Technical SEO helps you control how a search engine (Google) crawls and indexes your site. It covers the following areas:

  1. Crawl/index/serving pipeline

  2. Duplicate content & Canonicals

  3. Blocked resources

  4. Robots

  5. Sitemaps

  6. Multi-language websites: hreflang, etc

  7. Migrating a site: redirects

  8. Structured data

  9. User Experience: Core Web Vitals report / PageSpeed Insights

  10. Search appearance: article date / title links / search result snippets

The crawling process involves both the efficient allocation of Google's crawl resources and the ability to reach every page on the site. Both issues are addressed when the SEO strategy starts with an efficient website architecture.

Website architecture

Historically, good website architecture has meant a clear category and page structure (often called siloing), as well as the ability for the search engine to reach every page within one or two links from its parent page. This is not a trivial task.

It includes distributing "link juice" so that all important pages receive an adequate number of internal links, since the number of internal links signals a page's importance within the site. By contrast, the presence of so-called "orphaned" pages requires more resources to crawl the website and drains PageRank (InRank) from other pages.

What we check on the site:

1. Using Screaming Frog SEO Spider or similar programmes, check the level of nesting of important pages. The most important pages should be located as close to the root of the site as possible.

2. Check the number of significant internal links to important pages and the variety of anchor text (using Screaming Frog SEO Spider or similar programmes), as illustrated in the sketch below.
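A small illustration of the kind of internal linking being checked, assuming a category page one click from the root that links down to important child pages with varied, descriptive anchor text (URLs and anchors are illustrative):

```html
<!-- Category page passing internal links to important child pages -->
<nav aria-label="Category navigation">
  <ul>
    <li><a href="/laptops/gaming/">Gaming laptops</a></li>
    <li><a href="/laptops/ultrabooks/">Lightweight ultrabooks</a></li>
    <li><a href="/laptops/business/">Laptops for business</a></li>
  </ul>
</nav>
```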

Key notions: UX, significant inbound links, saving crawl budget.

Related case studies:

Crawling

Crawling problems are also addressed by creating effective XML sitemap(s) and an HTML sitemap (where relevant), as well as by removing directives that block resources or pages from being crawled.
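Where an HTML sitemap is relevant, it can be as simple as a crawlable page of grouped links; a minimal sketch (section names and URLs are illustrative):

```html
<!-- Illustrative HTML sitemap page: plain crawlable links grouped by section -->
<main>
  <h1>Sitemap</h1>
  <section>
    <h2>News</h2>
    <ul>
      <li><a href="/news/world/">World news</a></li>
      <li><a href="/news/local/">Local news</a></li>
    </ul>
  </section>
  <section>
    <h2>Guides</h2>
    <ul>
      <li><a href="/guides/seo-basics/">SEO basics</a></li>
    </ul>
  </section>
</main>
```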

It is not uncommon for pages to be reported as "Crawled - currently not indexed" by Google. Apart from the usual procedural reasons, this can be caused by the pages' low quality (as perceived by Google) or by the site's overall trust.

A connection has been experimentally established 1) between the number (and quality) of incoming links and the pace of indexing, and 2) between the number of internal links and the speed of indexing.

Low-quality content can be considered the main cause of the "crawled but not indexed" problem.

Low-quality content falls into two groups. The first is pages that should never be indexed: various service pages, internal search results, filter pages in different sections that are not optimised for indexing, and so on. These pages are standard for each CMS and/or site and are found by analysing the site structure, robots.txt, and the robots meta tags on the pages themselves. Getting rid of this low-quality content is easy enough: to begin with, add a robots meta tag with (noindex,nofollow) or (noindex,follow) to such pages, depending on the situation.

Once the pages have dropped out of the index (roughly a month later), add a disallow rule for them in robots.txt.
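A minimal sketch of the first step, assuming an internal search results page of the kind described above (the page itself and the noindex,follow choice are illustrative):

```html
<!-- Illustrative service page (e.g. internal search results) that should not be indexed -->
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Search results</title>
    <!-- noindex,follow: keep the page out of the index while letting crawlers follow its links -->
    <meta name="robots" content="noindex,follow">
  </head>
  <body>
    <!-- search results markup -->
  </body>
</html>
```

The order matters: only after such pages have dropped out of the index should the robots.txt disallow be added, because a URL blocked in robots.txt is no longer re-crawled, so Google would never see the noindex directive.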

The second type of low-quality content is content that Google itself rejects. This is especially important for YMYL sites. It was about this type of content that John Mueller, Google's official spokesperson, was asked: "Does the presence of a low-quality section on a site affect the quality of the entire site?" He replied that it does.

Indexing

Overall, the following reasons for non-indexing or problematic indexing can be identified, as per the Google Help Center:

  1. Not indexed

  2. Server error (5xx)

  3. Redirect error

  4. URL blocked by robots.txt

  5. URL marked 'noindex'

  6. Soft 404

  7. Blocked due to unauthorised request (401)

  8. Not found (404)

  9. Blocked due to access forbidden (403)

  10. Blocked by page removal tool

  11. Crawled - currently not indexed

  12. Discovered - currently not indexed

  13. Alternate page with proper canonical tag

  14. Duplicate without user-selected canonical

  15. Duplicate, Google chose different canonical than user

  16. Page with redirect

Duplicate content

Duplicate content occurs when the same page is available at more than one URL path, whether because it is generated under different parameters or as a result of mistakes in the implementation of canonicals.

Parameters in URL

Example: the use of URL parameters (such as ?page=) is often a cause of duplicate content. Compare www.mydomain.com/news/all and www.mydomain.com/news/all?page=1.

Personal Anecdote:

I believe this type of duplicate content poses little problem for SEO. It is easily dealt with via the robots.txt file, and Google can normally handle it on its own, unless the number of such URLs gets large.

Internal Duplicate Content due to website templates

The term duplicate content also covers cases where the pages are different but the content served on one page is [almost] identical to that of another. One reason for this is a heavily templated website structure with little unique content per page, sometimes called "thin" content pages.

To check the percentage of duplicate content you may, for example, use https://www.siteliner.com, which checks up to 250 URLs for free.

Wrong implementation of canonicals

Canonical tags are often used incorrectly on websites. By adding a rel="canonical" element to a page, you tell search engines which version of the page should appear in search results.

When using canonical tags, it is important to make sure that the URL you specify in the rel="canonical" element leads to an existing page. Canonical links to non-existent pages make it harder to crawl the site and index content, which lowers crawling efficiency and wastes crawl budget.

The most common problem: a programmatic instruction that simply mirrors the current URL in the rel="canonical" tag, so that every URL variant (including parameterised duplicates) declares itself canonical.
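A minimal sketch of the intended behaviour, assuming the paginated news listing from the earlier example (domain and paths are illustrative): the parameterised duplicate should point at the preferred URL rather than mirroring itself.

```html
<!-- Served at https://www.mydomain.com/news/all?page=1 (illustrative URL) -->
<head>
  <!-- Correct: point the duplicate at the preferred version -->
  <link rel="canonical" href="https://www.mydomain.com/news/all">

  <!-- Common mistake: the template simply echoes the current URL,
       so the duplicate declares itself canonical -->
  <!-- <link rel="canonical" href="https://www.mydomain.com/news/all?page=1"> -->
</head>
```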

Semantic markup

According to the Quality Rater Guidelines (QRG), when assessing the quality of a website, all content is divided into three components: Main Content (MC), Supplemental Content (SC) and adverts. Google determines MC and SC algorithmically when indexing a site, yet doing so requires a lot of resources, which Google dislikes. It is in order to guide Google as to the type of content on a page that we apply semantic page design.

The HTML5 standard has provided new elements for structuring, grouping content and markup of textual content. The new semantic elements have improved the web page structure by adding meaning to the content they enclose.

The use of semantic elements is quite straightforward and simple. For example:

<main> - for the main content of the page,
<nav> - for the navigation part,
<footer> - for the footer of the site,
<article> - for an article,
<section> - for each thematic section with its own heading,
<img>, <table>, <cite> - for images, tables and citations in the content,
<aside> - for additional (supplemental) content.

So, semantic markup consists of using some 15 different tags that define the individual components in the page hierarchy. With this code structure you can explicitly tell Google the purpose and content of each individual page and separate MC from SC and adverts.
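A minimal sketch of such a layout (the page content is illustrative): the semantic elements make it explicit which parts are MC, which are SC, and which are navigation.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example article</title>
  </head>
  <body>
    <nav><!-- site navigation (SC) --></nav>
    <main>
      <article>
        <h1>Article headline</h1>
        <section>
          <h2>First subtopic</h2>
          <p>Main content (MC) of the page.</p>
          <figure>
            <img src="chart.png" alt="Illustrative chart">
            <figcaption>Figure caption</figcaption>
          </figure>
        </section>
      </article>
      <aside><!-- related links, supplemental content (SC) --></aside>
    </main>
    <footer><!-- site footer --></footer>
  </body>
</html>
```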

It has been empirically established that the introduction of semantic markup has a positive impact on the site's crawl budget, which is quite important, especially for large projects. In some cases, the implementation of a semantic layout has also influenced traffic. A similar study was conducted by Jason Barnard.

In any case, in order for Google to better understand the purpose, function, content and usefulness of a web page, implement and use semantic HTML.

When working on restoring traffic and recovering from quality updates, check the correctness of your semantic markup, if any, and the correctness of its use.

Hreflang tags and international targeting

On multi-language sites, you can often see problems with incorrect language versions. The hreflang attribute (rel="alternate" hreflang="x") helps the search engine understand which page should be shown to visitors based on their location. This attribute should be used if you have a multilingual site and you want users from other countries to easily find content in their own language.

You should make sure that all links in the hreflang attributes are absolute URLs that return a 200 status code; otherwise search engines will not be able to interpret them correctly and, as a result, the wrong language version of the page may be shown to the relevant audience.
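A minimal sketch of a correct hreflang set, assuming English and German versions of the same page (the domain and paths are illustrative). Each language version should carry the full reciprocal set, including a reference to itself, using absolute URLs that return 200:

```html
<!-- Placed in the <head> of https://www.mydomain.com/en/page/ (illustrative URLs) -->
<link rel="alternate" hreflang="en" href="https://www.mydomain.com/en/page/">
<link rel="alternate" hreflang="de" href="https://www.mydomain.com/de/page/">
<!-- Fallback for visitors whose language is not covered -->
<link rel="alternate" hreflang="x-default" href="https://www.mydomain.com/">
```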

Errors with language versions can take different forms, but the main point is that they are extremely common on multi-language sites, which makes checking them mandatory.

Checklist for usage of hreflangs

Structured data

As with semantic HTML, structured data can be used to communicate the function of web pages to Googlebot. In addition, structured data is mandatory for so-called rich results in search.

The use of structured data on YMYL sites is a must, as it solves many problems: it describes the site's brand, indicates the type of content, and improves visibility in the search results, including via the "zero position". I believe the minimum required structured data types are:

  1. FAQ (FAQPage, Question, Answer) - See Google Documentation

  2. Breadcrumb (BreadcrumbList) - See Google Documentation thereto

  3. Article (Article, NewsArticle, BlogPosting) - Documentation

  4. Local business (LocalBusiness) - Documentation

  5. Product (Product, Review, Offer) - Documentation

Naturally, the sets of structured data types depend on the type of site, and other data sets can be used.

Using this markup, you can stand out in the organic results; at the moment I especially recommend applying FAQ markup both to informational pages and to product cards. It allows you to increase CTR in the search results, which in turn supports growth in organic positions and traffic.

Here is an example of using FAQ micro markup:
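A minimal JSON-LD sketch of FAQ markup, embedded in the page's HTML (the questions and answers are illustrative):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Do you ship internationally?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes, we ship to most countries; delivery takes 5 to 10 business days."
      }
    },
    {
      "@type": "Question",
      "name": "What is your return policy?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Items can be returned within 30 days of delivery."
      }
    }
  ]
}
</script>
```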

Check the correct and complete use of structured data: https://search.google.com/structured-data/testing-tool/u/0/
