Google publishes the best documentation on SEO practices here: https://developers.google.com/search/docs. If you're only interested in performance, please refer to the Web Vitals document. This document covers the following topics:
- Concepts
- Keywords
- Crawl budget or how to use `robots.txt`, `sitemap.xml` and `noindex`
  - What is the crawl budget
  - Factors that improve the crawl budget
  - Understanding how the crawl budget is spent
  - Non-marketing pages
  - Dealing with duplicate content
  - `robots.txt` or how to block pages with no marketing value from wasting your crawl budget
  - `sitemap.xml`
  - `X-Robots-Tag` with `noindex`
  - `robots.txt` vs `noindex` or the difference between crawling and indexing
  - Website to-do list
- Progressive Web Apps aka PWA & SEO
- Videos
- Google Search Console
- UX and SEO
- Tools
- Tips and tricks
- How to
- Annex
- References
SERP stands for Search Engine Result Page.
Once you've found the list of keywords you wish a web page to rank for (ideally only a few, because each technique only supports one keyword at a time, so too many different keywords will dilute your results), place them in the following HTML tags (sorted by order of importance):
- `title` tag in the HTML head.
- `meta description` tag in the HTML head.
- `canonical` link tag in the HTML head (e.g., `<link rel="canonical" href="https://example.com"/>`). The Google bot hates duplicate content. This tag tells the GBot which page is the one and only page that should receive SEO love from it. Use it even if you think you don't have any duplicate pages, because in reality, you do. Indeed, as far as the GBot is concerned, https://example.com and https://example.com?refer=facebook are duplicated pages.
- `h1` (1) tag in the HTML body.
- `h2` (1) tag in the HTML body.
- `h3` (1) tag in the HTML body.
- image `alt` attributes in the HTML body. There are so many missing `alt` attributes in web pages. This is a shame, as each one is a missed opportunity to rank.
- anchor text. Make sure that the text you use in your `a` tag describes the link as clearly as possible. If that link points to an external website, that website's domain authority will benefit from your good description. The same applies to an internal link. The Google bot loves organized content.
(1) A typical mistake is to use an H2 with no H1 because the H1 looks too big. The issue is that H1 tags are worth more than H2s when it comes to SEO. If the content of your header contains keywords you wish to rank for, try to use an H1 and use CSS to change its style so it matches your design.
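As a minimal sketch of the placements above (the shoe-related keyword, image path and URLs are made up for the example), the body of a page could look like this:

```html
<body>
  <!-- Hypothetical target keyword: "shoe manufacturing" -->
  <!-- A single h1 carrying the main keyword -->
  <h1>Shoe Manufacturing Done Right</h1>
  <!-- An h2 for a secondary keyword -->
  <h2>Our handmade leather shoes</h2>
  <!-- A descriptive alt attribute instead of leaving it empty -->
  <img src="/images/workshop.jpg" alt="Cobbler stitching a leather sole in our shoe manufacturing workshop">
  <!-- Anchor text that describes the destination page -->
  <a href="https://www.example.com/shoe-manufacturing/">Learn about our shoe manufacturing process</a>
</body>
```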
To figure out which keywords a specific domain ranks for, use the `Organic Keywords Trend` chart. To see that chart:
- Log in to Semrush.
- Select `Domain Analytics/Overview`, enter the domain in the top input and click the `Search` button.
- Select `Organic Research` in the left pane to see the `Organic Keywords Trend` chart.
The top horizontal bar can be read as follows:

- `Keywords`: Current number of keywords that rank within the first 100 Google pages.
- `Traffic`: Current number of users that those keywords have redirected to your website this month.
- `Traffic cost`: How much it would cost to rank the way your keywords do.
The Organic Keywords Trend chart can be read as follows:

- The legend shows the colors that represent the keyword categories based on how they rank in the SERP (e.g., `Top 3` are keywords for which your domain ranks in Google's top 3).
- Each vertical bar is a snapshot of the keyword rankings. For example, in the image above, hovering over the March 20 bar shows that 33 keywords in total ranked your website inside the first 100 SERPs. Amongst those 33 keywords:
  - 0 ranked in the top 3 SERPs.
  - 5 ranked between the 4th and 10th SERP.
  - 4 ranked between the 11th and 20th SERP.
  - 11 ranked between the 21st and 50th SERP.
  - 13 ranked between the 51st and 100th SERP.
When you click on that bar, you can see the details of those keywords.
This is achieved by following the same steps as in the Finding the keywords ranking for a domain section. However, you'll need at least the Semrush Guru tier (almost USD 200/month). In the `Organic Keywords Trend` chart, click on any bar to see the keyword ranking details for that point in time.
Tl;dr: These three techniques aim to optimize your crawl budget:

- Use a `robots.txt` to prevent non-marketing pages from consuming your crawl budget.
- Use one or many `sitemap.xml` files to make sure that marketing pages are crawled and make the best out of your crawl budget.
- Use the `noindex` value in the HTML head to keep certain pages that can't be listed in the `robots.txt` out of the index. This technique is used to:
  - Prevent duplicate content.
  - Deal with faceted navigation.
  - Deal with soft 404 error pages (i.e., pages that return a 200 status code while saying the page is not found, instead of an explicit 404 status HTML page).
  - Deal with infinite spaces (e.g., calendar pages where the URL contains the date).
WARNING: Optimizing the crawl budget is only worth it if your website contains at least a few thousand web pages. Otherwise, it is a waste of time. That being said, nurturing good SEO habits doesn't hurt and will make it easier to grow.
Crawling your website is not effortless. This means that search engine companies don't allocate an infinite amount of resources to crawl your precious website. Instead, they allocate it a specific budget called the crawl budget. This budget is usually denominated in the number of pages that the search engine will crawl. It depends on many factors that are left to the discretion of each search engine company, though some factors have become public. Without knowing exactly what your budget is, you should do your best to configure your website to prioritize the pages you want to be indexed and de-prioritize the pages that should not consume any amount of your precious crawl budget. The pages you should block from consuming your crawl budget are:
- Duplicate content.
- Pages that are important to users but present no marketing value (e.g., admin panel, settings page).
- Soft 404 pages (i.e., pages that return a 200 status code while saying the page is not found, instead of an explicit 404 status HTML page).
- Fast web pages, even under pressure: if the GoogleBot notices that your pages load very quickly even with a lot of traffic, it can decide to increase the number of pages it schedules to crawl.
- No errors.
- JS and CSS files: every resource that Googlebot needs to fetch to render your page counts toward your crawl budget. To mitigate this, ensure these resources can be cached by Google (see the header sketch after this list). Avoid using cache-busting URLs (those that change frequently).
- Avoid long redirect chains. Each redirect counts as an additional page to crawl in your budget.
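To illustrate the caching point above, here is a hedged sketch of the kind of response header that lets Google cache static JS/CSS between crawls, assuming the file names are fingerprinted at build time (e.g., a hypothetical `app.3f2a91.js`) instead of relying on cache-busting query strings:

```
HTTP/1.1 200 OK
Content-Type: application/javascript
Cache-Control: public, max-age=31536000, immutable
```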
For a detailed explanation of the Google Search Console's `Coverage` report, please refer to https://support.google.com/webmasters/answer/7440203?hl=en.
- Use the `Coverage` section of the Google Search Console.
- Review the URLs in the `Valid` category to confirm they are listed as expected. Unexpected pages are:
  - Duplicated content (often due to faceted URLs).
  - Soft 404s.
  - Non-marketing pages.
  - Thank-you pages. Those pages could rank for long-funnel keywords.
  - User settings.
- Determine whether duplicate pages have already been indexed:
  - Log in to the Google Search Console.
  - Select the correct property.
  - Click on the `Coverage` section in the menu.
  - Review all `Valid` URLs and look for duplicate URLs.
- For all duplicate URLs:
  - Do not block them in the `robots.txt` yet. Otherwise, this won't give Google a chance to deindex them first.
  - Make sure they have a canonical URL set in the head.
  - Add the `noindex`.
  - Wait until the effect of the previous steps shows the duplicate page in the `Excluded` URLs category (this could take a couple of days).
  - Block that page in the `robots.txt`.
  - Optionally, if that page was useless and can be deleted in favor of the canonical version, delete it. Then make sure to create a 301 from that duplicate link to the canonical.
<link rel="canonical" href="https://example.com/dresses/green-dresses" />
- A canonical URL impacts both indexing and crawlability:
  - Indexing: when all duplicate pages use the same canonical URL, only the canonical URL is indexed.
  - Crawlability: once the page has been crawled and indexed once, Google will know which page is a duplicate. This means that subsequent crawls will only crawl the canonical URL and skip the duplicated content, which avoids wasting the crawl budget. Also, by making sure that only the canonical URLs are added to the sitemap.xml, we implicitly improve the crawl budget (as opposed to listing the duplicated links in the sitemap).
- Both `rel="canonical"` and `content="noindex"` will prevent the page from being indexed by Google.
- Do not mix a canonical URL with `noindex`. This confuses the GoogleBot. If it sees both, it will choose to follow the canonical URL signal (ref: Google: Don't Mix Noindex & Rel=Canonical).
- A canonical URL has the same effect as a 301 permanent redirect (see the sketch after this list). In fact, the canonical URL was originally made for situations where a 301 redirect was not possible.
- Use only canonical URLs in your sitemap.
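To make the "canonical ≈ 301" comparison above concrete, this is roughly what the equivalent permanent redirect looks like at the HTTP level, reusing the green-dresses URL from the example above (a sketch for illustration, not a recommendation to replace your canonical tags):

```
HTTP/1.1 301 Moved Permanently
Location: https://example.com/dresses/green-dresses
```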
Shopify Robots.txt Guide: How To Create & Edit The Robots.txt.liquid
- To create a robots.txt online, please refer to https://www.seoptimer.com/robots-txt-generator
- To test a robots.txt file, use the Google robots.txt tester tool.
- The `X-Robots-Tag` with `noindex` will not block crawling. It just prevents the GoogleBot from indexing the page. Pages with `noindex` will still consume crawl budget.
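For reference, the `X-Robots-Tag` is an HTTP response header, so a response carrying it would look roughly like this (a sketch; how you actually set the header depends on your server or framework):

```
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
X-Robots-Tag: noindex
```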
User-agent: Googlebot
Disallow: /nogooglebot/
User-agent: *
Allow: /
Sitemap: http://www.example.com/sitemap.xml
Where:
- The user agent named Googlebot is not allowed to crawl any URL that starts with http://example.com/nogooglebot/.
- All other user agents are allowed to crawl the entire site. This could have been omitted and the result would be the same; the default behavior is that user agents are allowed to crawl the entire site.
- The site's sitemap file is located at http://www.example.com/sitemap.xml.
Manually submit it to the Google Search Console.
To create a sitemap.xml online, please refer to https://www.xml-sitemaps.com/
Pages that are neither listed in the sitemap.xml nor blocked in the robots.txt will still eventually be crawled, but they won't receive as much attention from Google.
There are two ways to make Google aware of your sitemap.xml:
- Include it in the robots.txt. To see an example, please refer to the robots.txt section.
- Manually submit it to the Google Search Console.
#1 is considered a best practice.
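As a reference, here is a minimal sitemap.xml sketch (the URLs and dates are placeholders). It only lists canonical URLs, as recommended above, and includes `lastmod`, which is also useful when requesting a batch re-crawl:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/shoe-manufacturing/</loc>
    <lastmod>2023-01-10</lastmod>
  </url>
</urlset>
```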
<meta name="robots" content="noindex" />
To double-check that the list below is correctly implemented, refer to successful websites that tick all the SEO boxes:
At a minimum, the page's `head` tag must contain:
<title>Example Title</title> <!-- Keep it between 50 and 60 characters. Use your targeted keywords as well as long-tail keywords. -->
<link rel="canonical" href="https://your-website.com/"> <!-- Don't forget the trailing slash -->
<link rel="alternate" href="https://your-website.com/" hreflang="en"> <!-- Don't forget the trailing slash -->
<!-- SEO, Meta and Opengraph -->
<meta name="title" content="Example Title">
<meta name="description" content="This is meta description Sample."> <!-- Keep it between 50 and 160 characters. -->
<meta name="robots" content="index,follow"> <!-- Very important, otherwise, Google might not be able to index your page -->
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta itemprop="description" content="Clear page description">
<meta itemprop="name" content="Page title">
<meta name="description" content="Clear page description">
<meta name="keywords" content="SEO keywords">
<meta name="og:keywords" content="SEO keywords">
<meta property="og:description" content="Clear page description.">
<meta property="og:image" content="Path to image">
<meta property="og:title" content="Page title">
<meta property="og:type" content="website">
<meta property="og:url" content="https://your-website.com/"> <!-- Don't forget the trailing slash -->
<meta name="twitter:card" content="summary">
<meta name="twitter:description" content="Clear page description">
<meta name="twitter:image" content="Path to image">
<meta name="twitter:title" content="Page title">
- USE ABSOLUTE PATHS IN ALL LINKS AND MAKE THE ROOT DOMAIN REDIRECT TO WWW. USE WWW. IN ALL YOUR ABSOLUTE PATHS (see the redirect sketch after this list). The reason behind this is to create unique content. Otherwise, if your content is accessible from both the root domain and the www subdomain, over both http and https, then it is technically duplicated 4 times, which confuses the Google bot and dilutes your SEO efforts.
- Use keywords in your URL paths. For example, if you're a shoe manufacturer, you may want to use a path similar to `/shoe-manufacturing` rather than `/shoe`.
- Use hyphens in your pathnames (no underscores). Google treats a hyphen as a word separator, but does not treat an underscore that way.
- Use an absolute path in the `canonical` URL on all pages and make sure there is a trailing slash. Also, make sure that the search params are included if they matter.
- Add a trailing `/` on all internal links and make sure that all web pages are using `/`, otherwise Google may think the 2 versions are duplicated content.
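As a sketch of the root-domain-to-www redirect mentioned in the first item of this list (assuming an nginx server and the placeholder domain example.com; adapt it to whatever server you actually use):

```nginx
# Permanently redirect the bare root domain to the www host,
# so the content is only ever served from one hostname.
server {
    listen 80;
    server_name example.com;
    return 301 https://www.example.com$request_uri;
}
```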
- Use JSON-LD on all pages (please refer to the JSON-LD examples section in the Annex).
- Add the `lang` attribute on the `html` tag: `<html lang="en">`.
- If you know the language of a link, try to add `hreflang` on it.
- Explicitly set up the `hreflang`, even if you only use a single language. Use an absolute path for that URL.
- Use the `alt` attribute on all images. Favor a proper description rather than SEO keywords. Use 50-55 characters (up to 16 words) in the alt text.
- Find a way to organize your text content so that important keywords are in `h1` tags and less important keywords are in `h2` tags.
The `robots.txt` does not have more authority than the `robots` meta tag, and vice versa. A `noindex` in the `robots` meta tag (or in the `X-Robots-Tag` header) will stop the bot from indexing; note, however, that Google no longer honors `noindex` directives placed inside the `robots.txt` itself.
There are 3 types of sitemap files:
WARNING: The info in the sitemap.xml must be the same as in the actual page, otherwise it will confuse the crawler bot, which might result in worse results than having no sitemap at all (e.g., an outdated hreflang or canonical URL).
- If your page is duplicated in another language, add an `alternate` link in the header:
<link rel="alternate" href="https://your-website.com/" hreflang="en"> <!-- Don't forget the trailing slash -->
As of 2019, PWAs are all the rage and Google has made a lot of progress to index them properly. To test how Google sees your PWA, please refer to the How to test how your page is seen by Google? section.
That being said, there are a series of caveats to avoid in order to not be penalized by Google:
- Avoid any URL with a `#`. Anything following the `#` is ignored by the Googlebot.
- Reduce the number of embedded resources in the page (especially the number of JavaScript files required to render the page), since these might not be fully loaded.
- Make sure required resources aren’t blocked by robots.txt.
- Use an accurate sitemap file to signal any changes to your website when using Accelerated Mobile Pages (AMP).
The first two points are the most important; the last two are good practices for any website in general.
https://developers.google.com/search/docs/data-types/video#clip
This online tool from Google lets you gain insights into how your website is being crawled by the GoogleBot. It also lets you submit pages for crawling.
- Submit new sitemap.xml or explicit new URLs.
- Get alerted on issues.
- Understand how Google sees your pages.
- Test:
- Mobile usability
- Rich results
This is mainly detailed under the `Performance` section:

- `Queries`: Details which keywords drive the most traffic.
- `Pages`: Shows which pages receive the most traffic.
How to use this section:
- Improve conversion: Use the `Pages` section to identify the pages that receive a lot of traffic but do not convert in terms of clicks.
- Optimize your website for the best keywords: Use the `Queries` section to understand which keywords are driving the most traffic and create dedicated pages just for those keywords.
- Compare your page performance from one period to another:
  - Select a filter at the top (e.g., Query with keywords).
  - Click on the `Date` filter at the top and select `Compare` rather than `Filter`.
  - You may see an increase of traffic due to:
    - Seasonality.
    - Better content optimization for specific keywords.
    - Improvements in Web Vitals and fixed issues.
  - You may see a decrease of traffic due to:
    - Seasonality.
    - Page errors (jump to the Analysing a specific URL section to diagnose issues).
    - Content being less popular.
    - You've cannibalized that page with a new optimized landing page.
Simply paste the URL in the search bar at the top.
- Link your property in Google Analytics with the one in Google Search Console:
  - Open your property in Google Analytics.
  - Click `Admin`.
  - Under the `Property` section, under `PRODUCT LINKING`, click `All Products`.
  - Link Google Search Console to feed new valuable data into your Google Analytics.
- List of all Google SEO tools: https://support.google.com/webmasters/topic/9456557
- JS minification techniques: Optimizing JavaScript bundle size
- Article: Small Bundles, Fast Pages: What To Do With Too Much JavaScript
| Topic | Description | Link |
|---|---|---|
| `robots.txt` | Create a robots.txt online | https://www.seoptimer.com/robots-txt-generator |
| `robots.txt` | Test the validity of a robots.txt | https://www.google.com/webmasters/tools/robots-testing-tool |
| `robots.txt` | Test URLs against an inline robots.txt | https://technicalseo.com/tools/robots-txt/ |
| `sitemap.xml` | Create a sitemap.xml online | https://www.xml-sitemaps.com/ |
| `sitemap.xml` | Validate a sitemap.xml online | https://www.xml-sitemaps.com/validate-xml-sitemap.html |
- Sudden spike in valid URLs in the `Coverage` section. This is usually due to misconfigured faceted pages.
- Go to the Google Search Console.
- Select the URL inspection.
- Click on the View crawled page.
This renders all the HTML, but unfortunately, it won't render a full image of that HTML, just the beginning. To see the full render, you have no choice but to copy-paste the HTML into a local file and render it yourself 😫.
Please refer to the Finding the historical keywords ranking for a domain section.
- Log in to the Google Search Console (https://search.google.com/search-console).
- Choose one of the two options:
  - Upload a new sitemap.xml with a new `lastmod` date for the URLs you wish to refresh. That's the fastest way to perform a batch re-crawl.
  - Paste a URL in the `Inspect` search bar at the top, then click the `REQUEST INDEXING` button.
<script type="application/ld+json">
{
"@context" : "http://schema.org",
"@type" : "Organization",
"legalName" : "Australian Barnardos Recruitment Services",
"alternateName" : "ABRS",
"url" : "https://www.abrs.net.au/",
"contactPoint" : [{
"@type" : "ContactPoint",
"telephone" : "(02) 9218 2334",
"Email" : "[email protected]",
"contactType" : "Sydney Office"
}],
"logo" : "https://www.abrs.net.au/images/abrs-logo-hd.png",
"sameAs" : "https://www.linkedin.com/company/abrs---australian-barnardos-recruitment-service/"
}
</script>
<script type="application/ld+json">
{
"@context":"http://schema.org",
"@type":"ItemList",
"itemListElement":[
{
"@type":"SiteNavigationElement",
"position":1,
"name": "Home",
"description": "{{ page.homeSiteNavDescription }}",
"url":"https://www.abrs.net.au/"
},
{
"@type":"SiteNavigationElement",
"position":2,
"name": "About Us",
"description": "{{ page.aboutUsSiteNavDescription }}",
"url":"https://www.abrs.net.au/about-us/"
},
{
"@type":"SiteNavigationElement",
"position":3,
"name": "Job Types",
"description": "{{ page.aboutUsSiteNavDescription }}",
"url":"https://www.abrs.net.au/job-types/"
},
{
"@type":"SiteNavigationElement",
"position":4,
"name": "Industry Sectors",
"description": "{{ page.aboutUsSiteNavDescription }}",
"url":"https://www.abrs.net.au/industry-sectors/"
},
{
"@type":"SiteNavigationElement",
"position":5,
"name": "Clients",
"description": "{{ page.clientsSiteNavDescription }}",
"url":"https://www.abrs.net.au/clients/"
},
{
"@type":"SiteNavigationElement",
"position":6,
"name": "Jobs",
"description": "{{ page.candidatesSiteNavDescription }}",
"url":"https://www.abrs.net.au/jobs/"
},
{
"@type":"SiteNavigationElement",
"position":7,
"name": "Blog",
"description": "{{ page.aboutUsSiteNavDescription }}",
"url":"https://www.abrs.net.au/blog/"
},
{
"@type":"SiteNavigationElement",
"position":8,
"name": "Contact",
"description": "{{ page.contactSiteNavDescription }}",
"url":"https://www.abrs.net.au/contact/"
}]
}
</script>
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [{
"@type": "ListItem",
"position": 1,
"name": "Home",
"item": "https://www.abrs.net.au/"
}, {
"@type": "ListItem",
"position": 2,
"name": "About us",
"item": "https://www.abrs.net.au/about-us"
},{
"@type": "ListItem",
"position": 3,
"name": "Our values",
"item": "https://www.abrs.net.au/about-us/values"
}]
}
</script>
- How to Get on the First Page of Google
- A Simple (But Effective) 31-Point SEO Checklist
- How to Improve SEO: 8 Tactics That Don’t Require New Content
- SEO For Beginners: A Basic Search Engine Optimization Tutorial for Higher Google Rankings
- How To Do Keyword Research for SEO — Ahrefs’ Guide
- Keyword Difficulty: How to Determine Your Chances of Ranking in Google
- How many keywords can you rank for with one page? (Ahrefs’ study of 3M searches)