The robots.txt file is small but important. It's how your website communicates with search engines: when a search engine's crawl robot visits your site, the first thing it does is look for this file on the server.
What is robots.txt?
It’s just a simple text file. But it has an important job – it tells the crawl robot which pages or folders of your website should be crawled and indexed, and which ones should be ignored.
The way search engines crawl websites has changed quite a bit over the years, so it was time to update the robots.txt file we use on our WordPress websites.
What does our new WordPress robots.txt file look like?
Honestly, it's very simple:

User-agent: *
Disallow: /ga/
Sitemap: http://domain.nz/sitemap_index.xml
The first line names the user agent. This is the name of the search crawl bot we are trying to communicate with – for example Googlebot or Bingbot. To instruct all search engine bots to crawl your website, we simply use the * wildcard.
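If you ever need rules for one bot only, you can name it explicitly. As a sketch (the /private/ folder here is just a made-up example, not part of our setup):

User-agent: Googlebot
Disallow: /private/

A bot follows the most specific group that matches its name, so these rules apply only to Googlebot; every other bot still falls back to the * group.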
In the next line we disallow the bots from crawling our GA folder. This folder houses the file that sets a Google Analytics filter cookie, which stops your own visits to your website from being counted in your statistics. Depending on the project, we may block other directories too.
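If you want to sanity-check rules like these before deploying them, Python's standard library ships a robots.txt parser. A minimal sketch (the domain and file name are placeholders):

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt file above, as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /ga/",
]

parser = RobotFileParser()
parser.parse(rules)

# Anything under /ga/ is blocked for every bot; everything else is allowed.
print(parser.can_fetch("Googlebot", "http://domain.nz/ga/filter.php"))  # False
print(parser.can_fetch("Googlebot", "http://domain.nz/about/"))         # True
```

Handy for catching an overly broad Disallow rule before it quietly deindexes half your site.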
Finally, we give the bots the link to the website's XML sitemap.
Note: We also add the sitemap link manually to our Google Webmaster Tools account for each website we launch. This allows us to monitor feedback from the search engines.
Google's recommendation for robots.txt
There is no need to block the wp-admin folder any more. Current WordPress versions send a so-called "X-Robots-Tag" noindex HTTP header on the admin pages, which prevents search engines from showing these pages in the search results.
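For the curious, a noindex header in an HTTP response looks roughly like this (a sketch, not output captured from a real site):

HTTP/1.1 200 OK
X-Robots-Tag: noindex

A page served with that header is kept out of the search results. This is also why blocking wp-admin in robots.txt could backfire: a bot that never fetches the page never sees the header.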
The absence of a robots.txt file will not stop search engines from crawling and indexing a website. However, it's good practice to create one – just keep up to date with the latest recommendations from the search engines.