13 April 2015 | Web Development

WordPress robots.txt update

The robots.txt file is a small but important file: it is how your website communicates with search engines. When a search engine’s crawl robot comes to your website, the first thing it does is look for this file on the server.

What is robots.txt?

It’s just a simple text file, but it has an important job: it tells the crawl robot which pages or folders of your website it may crawl and which ones it should ignore.

The way search engines crawl websites has changed quite a bit, so it was time to update the robots.txt file for our WordPress websites.

What does our new WordPress robots.txt file look like?

Honestly, it looks very simple.

User-agent: *
Disallow: /ga/
Sitemap: http://domain.nz/sitemap_index.xml

The first line names the user agent. This is the name of the search crawl bot we are addressing, for example Googlebot or Bingbot. To address all search engine bots at once, we simply use the * wildcard.

In the next line we disallow the bots from crawling our /ga/ folder. This folder houses the file that sets a Google Analytics filter cookie, which prevents your own visits to your website from being added to your website statistics. We may also block other specific directories, depending on the project.

Last, we give the bots the link to the website’s XML sitemap.
Note: We also add the sitemap link manually to our Google Webmaster Tools account for each website we launch. This allows us to monitor feedback from the search engines.
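
To check that the three lines behave as described, the rules can be fed to the robots.txt parser in Python’s standard library. This is a minimal sketch, not part of our launch process: the two page URLs are made-up examples, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The three lines of the robots.txt file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ga/
Sitemap: http://domain.nz/sitemap_index.xml
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs, used only to exercise the rules.
ga_page = "http://domain.nz/ga/set-cookie.html"
normal_page = "http://domain.nz/about/"

for bot in ("Googlebot", "Bingbot"):
    print(bot, "may crawl", ga_page, "->", parser.can_fetch(bot, ga_page))
    print(bot, "may crawl", normal_page, "->", parser.can_fetch(bot, normal_page))

print("Sitemaps:", parser.site_maps())
```

Both bots are refused the page under /ga/ and allowed the other page, because the * group applies to any crawler that has no group of its own.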

Google’s recommendation for robots.txt

The way Google crawls websites has changed quite a bit over the last few years. Google used to fetch website pages without any styling or JavaScript, but now it fetches everything, including CSS styles, images and scripts.

Gone are the days when best practice was to block access to the wp-includes and plugins folders. The wp-includes folder has all the default JavaScript often used in WordPress themes. The same goes for the plugins folder, which holds the styles and scripts each plugin needs for its output. If these folders are blocked, Google cannot fetch the files it needs to see a page the way a visitor does.
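
The effect of the old-style rules can be illustrated with the robots.txt parser in Python’s standard library. In this sketch the OLD_RULES block is our assumption of what a typical older WordPress robots.txt looked like, not a copy of any particular file. The jQuery path is where WordPress core keeps its bundled copy; the plugin stylesheet path is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Assumed example of the older practice: block core and plugin folders.
OLD_RULES = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
"""

# The rules from the current file in this post.
NEW_RULES = """\
User-agent: *
Disallow: /ga/
"""

ASSETS = [
    "http://domain.nz/wp-includes/js/jquery/jquery.js",
    # Hypothetical plugin stylesheet, for illustration only.
    "http://domain.nz/wp-content/plugins/example-plugin/style.css",
]

old, new = build_parser(OLD_RULES), build_parser(NEW_RULES)
for url in ASSETS:
    print(url)
    print("  old rules allow Googlebot:", old.can_fetch("Googlebot", url))
    print("  new rules allow Googlebot:", new.can_fetch("Googlebot", url))
```

Under the assumed old rules both assets are off limits to the crawler; under the new rules both can be fetched.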

There is also no need to block the wp-admin folder any more. WordPress currently sends a robots HTTP header (X-Robots-Tag) on the admin pages, which prevents search engines from showing these pages in the search results.
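
As a rough illustration of how such a header is read, here is a small Python sketch. It assumes the header meant above is the standard X-Robots-Tag response header. The function name blocks_indexing is our own, and the sample header values are invented rather than captured from a WordPress site. It only handles plain directives, not values prefixed with a specific bot name.

```python
def blocks_indexing(headers):
    """Return True if the response headers ask search engines not to index the page.

    `headers` is a mapping of header name to value. Header names are
    compared case-insensitively, as HTTP requires.
    """
    normalised = {name.lower(): value for name, value in headers.items()}
    value = normalised.get("x-robots-tag", "")
    directives = {part.strip().lower() for part in value.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


# Invented sample headers, for illustration only.
admin_like = {"Content-Type": "text/html", "X-Robots-Tag": "noindex, nofollow"}
public_like = {"Content-Type": "text/html"}

print(blocks_indexing(admin_like))
print(blocks_indexing(public_like))
```

Unlike a Disallow rule, a header like this still lets the crawler fetch the page; it only asks that the page be kept out of the search results.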

Final thoughts

The absence of a robots.txt file will not stop search engines from crawling and indexing a website. However, it’s good practice to create one. Just keep up to date with the latest recommendations from the search engines.