[llvm-dev] 404s within LLVM documentation

Neil Nelson via llvm-dev llvm-dev at lists.llvm.org
Tue Sep 3 11:44:28 PDT 2019


A practical way to proceed may be to have LLVM provide an html file list 
from their server by going to the top level https://llvm.org directory 
and executing the following command

find . \( -name '*.html' -o -name '*.htm' \) > llvm.org_html_file_list

giving all file names with parent directories for extensions with html 
or htm. It may be that there are multiple top directories of interest, 
such as one for https://clang.llvm.org/, that could also be put into 
their own file lists, though this is secondary at the moment. Having the 
name of that top level directory in each case may help or the top level 
web-page name could work. We just need to be sure the changes get back 
to the proper directory. Tar or zip the list(s) for easy download.
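
For anyone without shell access, a minimal Python sketch of the same 
listing follows. It assumes a local directory tree that mirrors the 
server layout; the function name list_html_files is hypothetical and 
not anything LLVM provides.

```python
from pathlib import Path


def list_html_files(top):
    """Return sorted paths, relative to top, of all .html and .htm files."""
    top = Path(top)
    return sorted(
        p.relative_to(top).as_posix()
        for p in top.rglob("*")
        # Compare the suffix case-insensitively so .HTML is also listed.
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    )
```

Writing one returned path per line to a file would give the same kind 
of list as the find command.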

The LLVM html files could then be downloaded to a local user's computer 
from the list using wget, the analysis done and the changes made. The 
changes could then be uploaded to https://bugs.llvm.org using diff files 
as patches or as LLVM directs.
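
The analysis step could look roughly like the sketch below, which 
follows the procedure described earlier in this thread: match each 
dead link to the file list by file name (less path), then separate the 
links into those with exactly one candidate, those with none, and 
those with several. The function names and the three-way return value 
are assumptions made for illustration.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def index_by_name(file_list):
    """Map each file name (less path) to every listed path with that name."""
    index = defaultdict(list)
    for path in file_list:
        index[posixpath.basename(path)].append(path)
    return index


def classify_dead_links(dead_links, file_list):
    """Split dead links into fixable, missing, and ambiguous groups."""
    index = index_by_name(file_list)
    fixable, missing, ambiguous = {}, [], []
    for link in dead_links:
        # urlsplit drops the #fragment, leaving only the path component.
        name = posixpath.basename(urlsplit(link).path)
        candidates = index.get(name, [])
        if len(candidates) == 1:
            fixable[link] = candidates[0]   # exactly one file has this name
        elif not candidates:
            missing.append(link)            # no file with this name exists
        else:
            ambiguous.append(link)          # several files share this name
    return fixable, missing, ambiguous
```

Links without a file extension, such as 
https://llvm.org/docs/TestSuiteMakefileGuide, would land in the 
missing group in this sketch and would still need manual inspection.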

Without the file lists from LLVM for this local procedure, the only 
option would be to remove the html link tags for the dead-links, which 
removes an easy ability to correct those links where a correction is 
possible. This procedure could be done by downloading the LLVM site's 
html pages through page links with wget. Since possibly useful 
information is lost with this procedure, it is not likely a preferred 
option.

The first option, without parent pages for the dead-links below, would 
tend to require the download of possibly all or most of the html files 
in the list in order to find those few of concern. Whether or not there 
are copyright or other issues with downloading large chunks of the LLVM 
site may be considered.

There is an option in wget (--convert-links) that, when downloading a 
site, changes all the links to point to local files, which may achieve 
what Patrick suggests. Considering the scale of that change, it would 
best be done on the LLVM server as a copy with the changes made using 
wget, and then directing a browser to the copy to check the result 
before going live. It may be that wget would not work, or that further 
link changes done with a program would be required. It would be easy to 
redirect back to the prior LLVM site if critical problems were found 
later. But the scale of this change suggests it would need more 
detailed consideration at LLVM, as against the relatively few dead-link 
changes identified to this point, which could be addressed with diff 
uploads.
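
If further link changes by program were needed, one possible shape is 
sketched below: it rewrites href values that point at llvm.org/docs 
(with or without a scheme or www prefix) into site-relative /docs/... 
links, as Patrick describes. The function name is hypothetical, and a 
regular expression over raw html is only an approximation; a real run 
would probably want an html parser and a review of the resulting diff.

```python
import re

# href values that point at llvm.org/docs, with or without scheme or "www.".
# Other hosts, including subdomains such as clang.llvm.org, do not match.
_ABSOLUTE_DOCS = re.compile(
    r'href="(?:https?://)?(?:www\.)?llvm\.org(/docs/[^"]*)"'
)


def make_links_site_relative(html):
    """Rewrite absolute llvm.org/docs hrefs in html to site-relative ones."""
    return _ABSOLUTE_DOCS.sub(r'href="\1"', html)
```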

The option for writing a program for the dead-link analysis and changes 
seems less likely in that the programmer would need to write for an 
environment not immediately available to him and a program would not 
allow the more incremental and clear visibility of diff uploads.

Regards, Neil Nelson

On 9/1/19 4:33 AM, Patrick Nappa wrote:
>
>     It would be a fairly direct procedure to associate links by their
>     file name (less path) with file locations. The process would then
>     update the links for the correct paths, list links without an
>     existing file, and list dead links having more than one existing
>     file with the same name.
>
> and
>
>     Patrick, how long does the crawl take? I suspect if we fixed
>     internal documentation links so that they point to local copies of
>     documentation when building locally it would be quite quick (no
>     actual idea though).
>
> That crawl was actually done on the live site, using the linkchecker 
> tool.
>
> Doing it locally would indeed be much better, and it turns out Sphinx 
> has a builtin tool for doing such a check (`cd llvm/docs && make -f 
> Makefile.sphinx linkcheck`), though it also checks that external 
> hyperlinks are reachable. Now, the runtime for this can be seriously 
> reduced if we change all internal document links to actually be 
> internal links (i.e. link to /docs/foo/bar, rather than 
> https://llvm.org/docs/foo/bar or llvm.org/docs/foo/bar, which is 
> easily fixable), so as to avoid an internet check. I do believe we 
> should still check external links, as having documentation link to 
> nowhere can be jarring; however, I don't think such crawls need to be 
> as frequent.
>
> Cheers,
> Patrick
>
> On Thu, Aug 29, 2019 at 7:20 PM James Henderson via llvm-dev 
> <llvm-dev at lists.llvm.org> wrote:
>
>     Patrick, how long does the crawl take? I suspect if we fixed
>     internal documentation links so that they point to local copies of
>     documentation when building locally it would be quite quick (no
>     actual idea though). That in turn would probably make it feasible
>     to add to the existing documentation build bots, I think.
>
>     James
>
>     On Thu, 29 Aug 2019 at 03:47, Neil Nelson via llvm-dev
>     <llvm-dev at lists.llvm.org> wrote:
>
>         Patrick, You have identified a good way to do this. Given it
>         is likely that the links are to files in a directory structure
>         on a single server with that file structure/path given by the
>         link text, as we see in your dead link list, and that in a
>         good number, perhaps likely a large majority of the cases,
>         that the file names (less the directory path) are unique, it
>         would be a fairly direct procedure to associate links by
>         their file name (less path) with file locations. The process
>         would then update the links for the correct paths, list links
>         without an existing file, and list dead links having more than
>         one existing file with the same name.
>
>         The frequency of that run would depend on the frequency of
>         dead-link discovery that the run could provide.
>
>         Regards, Neil Nelson
>
>         On 8/28/19 7:52 PM, Patrick Nappa via llvm-dev wrote:
>>         Hi all,
>>
>>         I'm currently in the process of updating the Kaleidoscope
>>         tutorials (first and foremost, the ORC/BuildingAJIT ones),
>>         and I've noticed a fair few 404s which are lingering within
>>         the current visible documentation. Some of these don't seem
>>         to have linked to existing pages for a while.
>>
>>         I was wondering if there was a way to set up a check in the
>>         buildbot to ensure that documentation doesn't break between
>>         builds? I'm happy to fix the current dead links I've found
>>         (see below) but thought it might be wise to set up a more
>>         automated approach in the future. Does anyone have any tips
>>         on how I'd go about doing this/if this should be set up at all?
>>
>>         I ran a web crawler to find each of the dead links (this may
>>         not be exhaustive), and they are as follows:
>>         https://llvm.org/docs/TestSuiteMakefileGuide
>>         https://llvm.org/docs/doxygen/structLICM.html
>>         https://llvm.org/docs/tutorial/LangImpl5.html#for-loop-expression
>>         https://llvm.org/docs/tutorial/LangImpl7.html#user-defined-local-variables
>>         http://llvm.org/docs/lnt/modindex.html
>>         https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl6.html#user-defined-unary-operators
>>         https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl5.html#for-loop-expression
>>         https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl7.html#user-defined-local-variables
>>         https://llvm.org/docs/tutorial/LangRef.html#instruction-reference
>>         https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl4.html#adding-a-jit-compiler
>>         https://llvm.org/docs/tutorial/WritingAnLLVMPass.html
>>         https://llvm.org/docs/tutorial/Passes.html
>>         https://llvm.org/docs/tutorial/ProgrammersManual.html#viewing-graphs-while-debugging-code
>>         https://llvm.org/docs/tutorial/SourceLevelDebugging.html
>>         https://llvm.org/docs/tutorial/Frontend/PerformanceTips.html
>>         https://llvm.org/docs/tutorial/GetElementPtr.html
>>         https://llvm.org/docs/tutorial/GarbageCollection.html
>>         https://llvm.org/docs/tutorial/ExceptionHandling.html
>>         https://www.llvm.org/docs/doxygen/structLICM.html
>>         http://llvm.org/docs/TestSuiteMakefileGuide
>>         http://llvm.org/docs/doxygen/structLICM.html
>>         https://www.llvm.org/docs/TestSuiteMakefileGuide
>>         http://llvm.org/docs/tutorial/LangImpl5.html#for-loop-expression
>>         http://llvm.org/docs/tutorial/LangImpl7.html#user-defined-local-variables
>>
>>
>>         Some of these are trivial mistakes (i.e.
>>         https://llvm.org/docs/tutorial/LangRef.html#instruction-reference
>>         -> https://llvm.org/docs/LangRef.html#instruction-reference),
>>         and some require a bit more inspection.
>>
>>         Regards,
>>         Patrick
>>
>>         _______________________________________________
>>         LLVM Developers mailing list
>>         llvm-dev at lists.llvm.org  <mailto:llvm-dev at lists.llvm.org>
>>         https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev