Part 2: The Importance Of A Robots.txt File
Click here to read Part 1 of this post before reading Part 2.
Allow everything apart from certain web pages
Some pages on your website might not be appropriate to show in search engine results pages, and you can block individual pages using the robots.txt file as well. Pages you might want to block include your terms and conditions page, a page you need to take down quickly for a specific reason, or a page containing sensitive information that you do not want to be searchable:
User-agent: *
Disallow: /blog/how-to-blow-up-the-moon/
Disallow: /secret-list-of-contacts.php
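If you would like to sanity-check rules like these before uploading them, one option is Python's standard urllib.robotparser module. The snippet below is only a rough sketch: it assumes the two paths are written with hyphens throughout and sit under a "User-agent: *" line, and "ExampleBot" is a made-up crawler name used purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: the two example pages, blocked for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/how-to-blow-up-the-moon/
Disallow: /secret-list-of-contacts.php
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the (hypothetical) crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/blog/how-to-blow-up-the-moon/",
                 "/secret-list-of-contacts.php",
                 "/blog/"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Only the two listed pages should come back as blocked; everything else on the site stays crawlable.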
Allow everything apart from certain patterns of URLs
Finally, you might have an awkward pattern of URLs that you want to disallow, ones that can be neatly grouped under a particular subdirectory. Examples of URL patterns you might want to block include internal search results pages, leftover test pages from development, or the second, third, fourth and later pages of a news category listing.
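To get a feel for how pattern rules behave, here is a small hand-written matcher in Python. It is a simplified sketch, not how any particular search engine implements matching: it only handles the star (*) and a trailing dollar sign ($), it ignores Allow rules and rule precedence, and the patterns in it are hypothetical stand-ins for search, test and paginated news URLs.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = [
    "/*?q=",          # internal search results
    "/*-test.html$",  # leftover test pages
    "/news/*/page/",  # page 2, 3, 4... of a news category
]


def pattern_to_regex(pattern):
    """Translate a robots.txt-style path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    Without '$', the pattern only has to match the start of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, patterns=PATTERNS):
    """Return True if any pattern matches the path from its first character."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    for path in ("/shop?q=shoes", "/old/home-test.html",
                 "/news/sport/page/3", "/news/sport/"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```

The same helper also shows why the dollar sign matters for file extensions: a pattern such as "/*.pdf" would match "/files/report.pdf-archive/index.html" as well, whereas "/*.pdf$" would not.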
Putting all of it together
Overall, you might want to use a mixture of these methods to block off different areas of your website. There are a few key things to remember. If you disallow a subdirectory, then any file, subdirectory or web page under that URL path will also be disallowed. The star symbol (*) matches any sequence of characters. The dollar symbol ($) marks the end of the URL; if you leave it off when blocking file extensions, you may block a large number of URLs by accident. URL matching is case sensitive, so you may need to include both uppercase and lowercase versions to catch them all. It can take search engines anywhere from several days to a few weeks to notice a blocked URL and remove it from their index. The User-agent setting lets you block specific search engine bots or treat them differently if necessary.
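Two of these points, case sensitivity and per-bot rules, are easy to see in a quick experiment with Python's standard urllib.robotparser module. This is just a sketch under assumed rules: "ExampleBot" and "OtherBot" are invented crawler names and "/Private/" is a hypothetical directory.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: one made-up bot is shut out completely,
# every other bot is only kept out of /Private/ (capital P).
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /Private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the named crawler may fetch the path under the rules above."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    checks = [
        ("ExampleBot", "/index.html"),          # matches its own group
        ("OtherBot", "/Private/report.html"),   # falls under the * group
        ("OtherBot", "/private/report.html"),   # lowercase path, different URL
    ]
    for agent, path in checks:
        print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```

Because the lowercase "/private/" path is not covered by the "/Private/" rule, you would need a second Disallow line to block both spellings.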
All in all, a robots.txt file plays an important part in helping search engines crawl your site correctly and efficiently, and in keeping unwanted files out of search engine results pages.