Incorrect use of this file can harm the site's ability to hold rank.
This root file instructs bots not to CRAWL urls - which is not the same as instructing them not to INDEX. The difference is huge. Robots.txt will prevent crawling of a url, but if another site links to that same url, it can still get indexed. You can't prevent indexing using robots.txt. To prevent indexing, use a robots meta tag with a noindex instruction.
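As a minimal sketch (the path is hypothetical), here is the difference side by side - the robots.txt rule only stops crawling, while the meta tag on the page itself is what stops indexing:

    # robots.txt - stops compliant bots from crawling these urls,
    # but the urls can still be indexed if other sites link to them
    User-agent: *
    Disallow: /private/

    <!-- robots meta tag in the page's <head> - this is what prevents indexing -->
    <meta name="robots" content="noindex">

Note that a bot can only see the noindex tag if it is allowed to crawl the page, so don't block a url in robots.txt and then expect its noindex tag to work.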
My observation is that very few sites get this right - we see many errors involving improper declarations of User-agent, wildcards, sitemaps, syntax, etc. And because the file is not well understood, it is often ignored after changes have been made that should have been reflected in it. I view a correct robots.txt as an indicator of a well-managed site.
Incorrect use of this file can not only harm the site's ability to hold rank, but can also take the website down.
This is a very powerful file used for everything from redirects and rewrites (how you get SEO-friendly url names) to blocking IPs, preventing image hotlinking, password protection, etc.
Common errors involve redirects, https, and regular expressions.
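For illustration only, a few of the more common .htaccess uses in Apache syntax (the domain, paths and IP are placeholders):

    RewriteEngine On

    # Rewrite an SEO-friendly url name to the real script
    RewriteRule ^products/([0-9]+)/?$ product.php?id=$1 [L]

    # Redirect everything to https on the www host
    RewriteCond %{HTTPS} off
    RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

    # Block a single IP (Apache 2.2-style syntax)
    Deny from 203.0.113.5

    # Prevent image hotlinking from other sites
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
    RewriteRule \.(jpe?g|png|gif)$ - [F]

A single bad regular expression or a redirect loop in this file can return 500 errors for the entire site, which is exactly how it "takes the website down".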
Incorrectly setting up the domain name servers can completely disable the website, or harm the ability of the site to hold rank.
Pointing the DNS for more than one domain at a single host will let all of those domains be indexed with the same content. Most devs/hosts have caught on and no longer do this.
DNS issues are often the root of penalties where subdomains conflict with the top-level domain. It's very common to find indexed subdomain clones of a site triggering rank failure.
Both the www and the non-www versions of the site require a separate DNS entry. We strongly advise allowing both, but choosing ONE way to display and index the site to prevent any conflicts/redundancies from being indexed.
Our government at work: I've noticed that for years, the NSA website - nsa.gov - did not have both DNS entries set up as best practices suggest. Note that you cannot get to the site by typing "nsa.gov" - you have to use "www.nsa.gov". For a commerce site, this would be unforgivable, not just for the obvious reason that people might not be able to access the site, but because many natural links are given to a domain without the www, so any rank push from those links would be lost.
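As a hypothetical zone-file sketch (the names and IP are placeholders), both hosts get an entry so that neither fails to resolve, and then ONE of them is chosen as the version to display and index:

    ; both the bare domain and the www host resolve
    example.com.      IN  A      198.51.100.10
    www.example.com.  IN  CNAME  example.com.

The non-preferred host is then 301-redirected to the preferred one at the web server level (see the redirect section below), so only one version ever gets indexed.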
Incorrectly using redirects can harm the ability of a site to hold rank.
There are a number of ways to achieve a redirect. The two most common are an instruction in the .htaccess file and a header instruction sent with the url itself. Redirects can be used to point visitors to a replacement url, to move to another domain, to repair a broken link, etc.
Common errors include using a 302 temporary redirect instead of a 301 permanent for any url involving content (302 does not pass PR!). Also common is the chaining of redirects. What most people don't realize is that some PR is lost across every redirect, which means chained redirects should be avoided. If file A changes to file B, redirect A -> B. If B then changes to C, do not chain A -> B -> C. Instead, to conserve PR, use two redirects: A -> C and B -> C.
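A sketch of the same idea in .htaccess terms (the file names and domain are hypothetical):

    # Wrong: a chain that loses a little PR at every hop
    #   Redirect 301 /file-a.html /file-b.html
    #   Redirect 301 /file-b.html /file-c.html

    # Right: flatten the chain - every old url 301s directly to the current url
    Redirect 301 /file-a.html https://www.example.com/file-c.html
    Redirect 301 /file-b.html https://www.example.com/file-c.html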
Incorrectly using the rel canonical tag can prevent a url from being indexed.
Used to tell Google which url to index. If the canonical url is different from the url being viewed, the url being viewed will not be indexed. Canonicals can point to urls on other domains.
The most common error is in syntax - the homepage url needs a slash at the end (http://www.domain.com/).
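For example, every variation of the homepage would carry the same tag in its <head>, with the trailing slash kept in the href:

    <link rel="canonical" href="http://www.domain.com/">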
Errors in automation code can harm the ability of a site to hold rank and can take the system down.
Used in every aspect of a modern site, automation is definitely our friend - until a human-introduced error is magnified by it. Some of the search considerations are discussed in my post on re1y, "Automation and SEO".
Simple tools once successfully used to automate content - mostly for geo-targeting - are much more likely to trigger a penalty now that Panda is able to detect templated content. Common errors include the creation of large numbers of redundant files, pagination routines, date metrics, inventory, etc. Anywhere data is involved, automation is needed to manage that data, and that process, if it involves a high-ranking website, demands informed management.
Incorrect implementation of multiple sites can trigger domain level penalties.
Businesses with multiple sites must abide by some rules meant to enforce Google's (overly naive) philosophy that money should not influence search results. For example, you may not use owned sites to push the rank of other owned sites. So if you wish to interconnect owned sites via links, be aware of the ways that ownership is revealed.
The most common error is rampant interlinking of owned sites combined with revealing the common ownership of those sites.
To preserve the search compliance of multiple sites with common ownership, keep them completely independent. Do not share images, scripts, media, etc. with another owned site.
Incorrect use, or errors in implementation, can harm the ability of a site to hold rank or trigger penalties.
Scripts are abused in two ways - in what they do, and in how they're used.
A script intended to do something is basically automation - read the section on this above.
An example of a usage error is including scripts meant for one domain on another. If done with Google products, you are revealing common ownership among domains that should be implemented completely independently of one another, as well as contaminating your data.
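As an illustration (the modern gtag form is shown and the tracking id is a placeholder), pasting the same Google Analytics tag onto a second owned domain ties both domains to one account and mixes their traffic data:

    <!-- the same property id appearing on two "independent" domains reveals common ownership -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=UA-XXXXXXX-1"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'UA-XXXXXXX-1');
    </script>

Each owned site should carry its own accounts and its own copies of any shared assets.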
Inappropriate implementation of a site's internal search can harm the ability to hold rank, and trigger penalties.