On Thu, Oct 14, 2010 at 10:28 AM, Talin <span dir="ltr"><<a href="mailto:viridia@gmail.com">viridia@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im">On Wed, Oct 13, 2010 at 11:10 PM, Anton Korobeynikov <span dir="ltr"><<a href="mailto:anton@korobeynikov.info" target="_blank">anton@korobeynikov.info</a>></span> wrote:<br></div><div class="gmail_quote">

<div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>> indexing the <a href="http://llvm.org" target="_blank">llvm.org</a> svn archive. This means that when you search for an<br>
> LLVM-related symbol in code search, you get one of the many (possibly<br>
> out-of-date) mirrors, rather than the up-to-date <a href="http://llvm.org" target="_blank">llvm.org</a> version. This is<br>
> sad.<br>
</div>This is intentional. The workload of the server was pretty huge w/o this.<br></blockquote><div><br></div></div><div>Could we at least add a rule that allows the Code Search crawler, rather than opening the archive up to all crawlers? Its user-agent string is SVN/1.5.4/GoogleCodeSearch.</div>

</div></blockquote><div><br></div><div>Concretely, I am proposing that we replace the contents of robots.txt with the following:</div><div>----------------------------------------------------------<br><span class="Apple-style-span" style="font-family: monospace; font-size: medium; white-space: pre-wrap; ">User-agent: GoogleCodeSearch
</span><span class="Apple-style-span" style="font-family: monospace; font-size: medium; white-space: pre-wrap; ">Allow: /svn
</span><span class="Apple-style-span" style="font-family: monospace; font-size: medium; white-space: pre-wrap; ">Disallow: /</span></div><div><span class="Apple-style-span" style="font-family: Times; font-size: medium; "><pre style="word-wrap: break-word; white-space: pre-wrap; ">

<span class="Apple-style-span" style="font-family: Times; white-space: normal; "><pre style="word-wrap: break-word; white-space: pre-wrap; ">User-agent: *
Disallow: /bugs
Disallow: /doxygen
Disallow: /cvsweb
Disallow: /stats
Disallow: /testresults/X86
Disallow: /nightlytest
Disallow: /viewvc
Disallow: /nightlytest2
Disallow: /devmtg/2008-08/*.m4v$
Disallow: /devmtg/2008-08/*.3gp$
Disallow: /svn
<span class="Apple-style-span" style="font-family: arial; white-space: normal; font-size: small; ">----------------------------------------------------------</span></pre></span></pre></span></div><div><span class="Apple-style-span" style="font-family: Times; font-size: medium; "><pre style="word-wrap: break-word; white-space: pre-wrap; ">

<span class="Apple-style-span" style="font-family: Times; white-space: normal; "></span></pre></span></div><div>(See also <a href="http://www.robotstxt.org/norobots-rfc.txt">http://www.robotstxt.org/norobots-rfc.txt</a>)</div>

</div><br>-- <br>-- Talin<br>