Analyzing a robots.txt file

Earlier this week, we told you about a feature we made available through the Sitemaps program that analyzes the robots.txt file for a site. Here are more details about that feature.

What the analysis means

The Sitemaps robots.txt tool reads the robots.txt file in the same way Googlebot does. If the tool interprets a line as a syntax error, Googlebot doesn't understand that line. If the tool shows that a URL is allowed, Googlebot interprets that URL as allowed.

This tool provides results only for Google user-agents (such as Googlebot). Other bots may not interpret the robots.txt file in the same way. For instance, Googlebot supports an extended definition of the standard: it understands Allow: lines, as well as * and $. So while the tool shows lines that include these extensions as understood, remember that this applies only to Googlebot and not necessarily to other bots that may crawl your site.

Subdirectory sites

A robots.txt file is valid only when it's located in the root of a site. So if the site in your account is located in a subdirectory (such as http://www.example.com/mysite/), we show you information about the robots.txt file at the root (http://www.example.com/robots.txt). You may not have access to this file, but we show it to you because the root robots.txt file can affect crawling of your subdirectory site, and you may want to make sure it allows URLs as you expect.

Testing access to directories

If you test a URL that resolves to a file (such as http://www.example.com/myfile.html), this tool can determine whether the robots.txt file allows or blocks that file. If you test a URL that resolves to a directory (such as http://www.example.com/folder1/), the tool can determine whether the robots.txt file allows or blocks access to that URL, but it can't tell you about access to the files inside that folder. The robots.txt file may set restrictions on URLs inside the folder that are different from those on the URL of the folder itself.

Consider this robots.txt file:

User-Agent: *
Allow: /folder1/myfile.html
Disallow: /folder1/

If you test http://www.example.com/folder1/, the tool will say that it's blocked. But if you test http://www.example.com/folder1/myfile.html, you'll see that it's not blocked, even though it's located inside of folder1. (A short script illustrating this distinction appears below.)

Syntax not understood

You might see a "syntax not understood" error for a few different reasons. The most common one is that Googlebot couldn't parse the line, but there are some other potential reasons as well.
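If you'd like to experiment with the folder-versus-file distinction on your own machine, here is a minimal sketch using Python's standard urllib.robotparser module against the example robots.txt file above. This is only a rough approximation, not the Sitemaps tool itself: the standard-library parser applies rules in file order and doesn't support the * and $ extensions described earlier, so its results can differ from what Googlebot (and this tool) would report.

# An illustrative local sketch, not the Sitemaps tool itself, using Python's
# standard urllib.robotparser module. The standard-library parser applies
# rules in file order and does not implement the * and $ extensions that
# Googlebot understands, so the more specific Allow line is listed first here
# so the parser reaches it before the broader Disallow.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-Agent: *
Allow: /folder1/myfile.html
Disallow: /folder1/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The directory URL itself is blocked...
print(parser.can_fetch("*", "http://www.example.com/folder1/"))            # False
# ...but the file inside that directory is allowed.
print(parser.can_fetch("*", "http://www.example.com/folder1/myfile.html")) # True

Running the script prints False for the folder URL and True for the file, matching the behavior described above.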
We are working on a few known issues with the tool, including how it handles capitalization and how it analyzes Google user-agents other than Googlebot. We'll keep you posted as we get these issues resolved.