What is a robots.txt file? How to create a perfect robots.txt file?

A robots.txt file is a small text file that resides in your site’s root folder. It tells search engine bots which parts of the site to crawl and index and which parts not to.

If you make a mistake while editing or customizing it, search engine bots may stop crawling and indexing your site, and your site may disappear from the search results.

In this article, I will explain what a robots.txt file is and how to create a perfect robots.txt file for SEO.

Why does a website require a robots.txt file?

When search engine bots visit a website or blog, they follow the robots.txt file and crawl the content accordingly. But if your site does not have a robots.txt file, the bots will crawl and index all of your site’s content, including the content you do not want indexed.

Search engine bots look for a robots.txt file before indexing any website. If they find no instructions in a robots.txt file, they index all of the site’s content. If they do find instructions, they follow them while indexing the site.

A robots.txt file is required for exactly this reason: if we do not give instructions to search engine bots through this file, they index our entire site.

Advantages of robots.txt file

  • The robots.txt file tells search engine bots which parts of the site to crawl and index and which parts not to.
  • A particular file, folder, image, PDF, etc. can be prevented from being indexed by search engines with a robots.txt file.
  • Sometimes search engine crawlers crawl your site like a hungry lion, which hurts your site’s performance. You can ease this problem by adding a Crawl-delay directive to your robots.txt file. However, Googlebot does not obey this directive; for Google, you can set the crawl rate in Google Search Console instead. This protects your server from being overloaded.
  • You can make a whole section of a website private.
  • You can prevent internal search results pages from appearing in SERPs.
  • You can improve your website’s SEO by blocking low-quality pages.
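For example, a Crawl-delay rule for a bot that honors it might look like this (Bingbot is used here only as an illustration; the value is the number of seconds to wait between requests):

```
User-agent: Bingbot
Crawl-delay: 10
```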

Where is the robots.txt file located on a website?

If you are a WordPress user, the robots.txt file resides in your site’s root folder. If the file is not found at that location, search engine bots start indexing your entire website, because they look for the robots.txt file only in the root folder and do not search the rest of your site for it.

If you do not know whether your site has a robots.txt file, type this into your browser’s address bar: example.com/robots.txt

A text page will open showing the file’s contents, as in the screenshot (Robots.txt) above. That is what our robots.txt file looks like. If you do not see such a text page, you need to create a robots.txt file for your site.

Apart from this, you can also check it in Google Search Console.

Basic Format of Robots.txt file

The basic format of the robots.txt file is very simple and looks like this,

User-agent: [user-agent name]
Disallow: [URL or page you don’t want to crawl]

These two lines together form a complete robots.txt file. However, a robots.txt file can contain multiple user-agent blocks and multiple directives (Disallow, Allow, Crawl-delay, etc.).

  • User-agent: Specifies which search engine crawler/bot the rules apply to. If you want to give the same instructions to all search engine bots, use the * wildcard, like this: User-agent: *
  • Disallow: Prevents files and directories from being crawled and indexed.
  • Allow: Permits search engine bots to crawl and index your content.
  • Crawl-delay: The number of seconds a bot should wait before loading and crawling the page content.

 
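Putting these directives together, a typical robots.txt file might look like the sketch below (the paths and domain are hypothetical, chosen for a WordPress-style site):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
```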

Preventing All Web Crawlers from Indexing the Website

User-agent: *
Disallow: /

This rule in the robots.txt file prevents all web crawlers/bots from crawling the website.

Allowing All Web Crawlers to Index All Content

User-agent: *
Disallow:

This command in the robots.txt file allows all search engine bots to crawl all the pages of your site.

Blocking a Specific Folder for Specific Web Crawlers

User-agent: Googlebot
Disallow: /webpage-subfolder/

This rule prevents only Google’s crawler from crawling /webpage-subfolder/. But if you want to block all crawlers from it, your robots.txt file should look like this:

User-agent: *
Disallow: /webpage-subfolder/

Preventing a Specific Page (Private Page) from being indexed

User-agent: *
Disallow: /private-page-URL

This will prevent all crawlers from crawling that page URL. But if you want to block only a specific crawler, you have to write it like this:

User-agent: Googlebot
Disallow: /private-page-URL

This command will only prevent Googlebot from crawling your page URL.

Add a Sitemap to a robots.txt file

Sitemap: https://www.yourwebsite.com/sitemap.xml

You can add the Sitemap line anywhere in your robots.txt file, at the top or the bottom.
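If you want to check your rules before publishing them, Python’s standard urllib.robotparser module can parse a robots.txt file and answer “may this bot fetch this URL?” queries. The sketch below uses a hypothetical robots.txt for an example.com domain:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, as it might appear at example.com/robots.txt
rules = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether a generic crawler ("*") may fetch specific URLs
print(parser.can_fetch("*", "https://www.example.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://www.example.com/blog/post.html"))     # True
print(parser.crawl_delay("*"))                                             # 10
```

This is also a quick way to spot typos: a misspelled directive is simply ignored by the parser, just as real crawlers ignore it.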

If you have any question or suggestion related to this article, leave a comment. And if this article was helpful for you, do not forget to share it!
