[llvm-dev] 404s within LLVM documentation
Neil Nelson via llvm-dev
llvm-dev at lists.llvm.org
Tue Sep 3 11:44:28 PDT 2019
A practical way to proceed may be for LLVM to provide a list of HTML
files from their server by going to the top-level https://llvm.org
directory and running the following command:
find . \( -name '*.htm' -o -name '*.html' \) > llvm.org_html_file_list
This lists every file with an .htm or .html extension, including its
parent directories. There may be multiple top-level directories of
interest, such as the one for https://clang.llvm.org/, which could be put
into their own file lists, though this is secondary at the moment.
Recording the name of the top-level directory (or the top-level web page)
with each list would help, since we need to be sure the changes get back
to the proper directory. Tar or zip the list(s) for easy download.
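
As a rough, portable equivalent of the find command above, the same list
could be produced with a short Python script. This is only a sketch: the
function name list_html_files is made up for illustration, and unlike
find's -name it matches the extension case-insensitively.

```python
import os
import tempfile


def list_html_files(root):
    """Return sorted paths (relative to root) of files ending in .htm or .html."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: upper-case extensions (.HTML) should be listed too.
            if name.lower().endswith((".htm", ".html")):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)


if __name__ == "__main__":
    # Demo on a throwaway tree; on the server, root would be the site's
    # top-level directory. The file names here are placeholders.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "docs"))
        for rel in ("index.html", "docs/LangRef.html", "docs/old.htm", "docs/notes.txt"):
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("")
        for path in list_html_files(root):
            print(path)
```

Redirecting the output to a file gives the same kind of list as
llvm.org_html_file_list.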
The LLVM HTML files could then be downloaded to a user's local computer
from the list using wget, the analysis done, and the changes made. The
changes could then be uploaded to https://bugs.llvm.org as diff files
(patches), or however LLVM directs.
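
To illustrate the diff step, here is a minimal sketch that builds a
unified diff for one corrected page using Python's standard difflib
module. The helper name make_patch and the page path
docs/tutorial/index.html are hypothetical; the corrected LangRef link is
the one Patrick gives in his original message below.

```python
import difflib


def make_patch(old_text, new_text, path):
    """Return a unified diff that turns old_text into new_text for one file."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    )
    return "".join(diff)


if __name__ == "__main__":
    old = '<a href="https://llvm.org/docs/tutorial/LangRef.html#instruction-reference">LangRef</a>\n'
    new = '<a href="https://llvm.org/docs/LangRef.html#instruction-reference">LangRef</a>\n'
    # Hypothetical page path, used only to label the diff headers.
    print(make_patch(old, new, "docs/tutorial/index.html"), end="")
```

A patch built this way stays small and reviewable, which is the point of
sending diffs rather than whole files.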
Without the file lists from LLVM for this local procedure, the only
option would be to remove the HTML link tags for the dead links, which
removes the easy ability to correct those links later, where a
correction is possible. This could be done by downloading the LLVM
site's HTML pages by following page links with wget. Since possibly
useful information is lost with this procedure, it is not likely to be
the preferred option.
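
For concreteness, removing the link tags while keeping the link text
might look like the sketch below. It is regex-based and therefore only
handles straightforward anchors; the name unlink_dead is made up for
illustration, and a real run would want a proper HTML parser and review
of each change.

```python
import re

# Matches a simple <a ... href="URL" ...>text</a>. Nested anchors are not
# handled (they are invalid HTML anyway).
ANCHOR = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(?P<url>.*?)\1[^>]*>(?P<text>.*?)</a\s*>',
    re.IGNORECASE | re.DOTALL,
)


def unlink_dead(html, dead_urls):
    """Replace anchors whose href is in dead_urls with their inner text."""
    dead = set(dead_urls)

    def repl(match):
        # Keep live anchors untouched; unwrap dead ones.
        return match.group("text") if match.group("url") in dead else match.group(0)

    return ANCHOR.sub(repl, html)


if __name__ == "__main__":
    page = (
        '<p><a href="https://llvm.org/docs/tutorial/LangRef.html#instruction-reference">LangRef</a>'
        ' and <a href="https://llvm.org/docs/LangRef.html#instruction-reference">LangRef</a></p>'
    )
    dead = ["https://llvm.org/docs/tutorial/LangRef.html#instruction-reference"]
    print(unlink_dead(page, dead))
```

Note that the href is compared as an exact string, so relative links
would first need to be resolved against the page URL.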
Because the dead-link list below does not give the parent pages, the
first option would tend to require downloading most or all of the HTML
files in the list in order to find the few of concern. Whether there are
copyright or other issues with downloading large chunks of the LLVM site
should also be considered.
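
Once the pages are downloaded, finding the few files of concern is a
matter of scanning each page's links against the dead-link list. A
minimal sketch, assuming the pages are held in a dict keyed by page URL
(the names HrefCollector and pages_with_dead_links are hypothetical):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pages_with_dead_links(pages, dead_urls):
    """pages maps page URL -> HTML text; returns page URL -> dead hrefs as written."""
    dead = set(dead_urls)
    report = {}
    for page_url, html in pages.items():
        collector = HrefCollector()
        collector.feed(html)
        # Resolve relative hrefs against the page URL before comparing.
        hits = [h for h in collector.hrefs if urljoin(page_url, h) in dead]
        if hits:
            report[page_url] = hits
    return report


if __name__ == "__main__":
    # The page URL below is a hypothetical parent page.
    pages = {
        "https://llvm.org/docs/tutorial/index.html":
            '<a href="LangRef.html#instruction-reference">LangRef</a>',
    }
    dead = ["https://llvm.org/docs/tutorial/LangRef.html#instruction-reference"]
    print(pages_with_dead_links(pages, dead))
```

Resolving relative hrefs matters here, since a relative link written in
the wrong directory is one plausible way a dead link like the LangRef one
comes about.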
wget has an option (--convert-links) that, when downloading a site,
rewrites the links to point to the local files, which may achieve the
change Patrick suggests. Considering the scale of that change, it would
best be done on the LLVM server by making a converted copy with wget and
then pointing a browser at the copy to check the result before going
live. It may be that wget would not work, or that further link changes
made by a program would be required. It would be easy to switch back to
the prior LLVM site if critical problems were found later. But the scale
of this change suggests it should be handled with more detailed
consideration at LLVM, in contrast to the relatively few dead-link
changes identified so far, which could be addressed with diff uploads.
Writing a program to do the dead-link analysis and changes seems the
less likely option, since the programmer would need to write for an
environment not immediately available to them, and a program would not
offer the incremental, clearly visible changes that diff uploads do.
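
Even so, the file-name matching step quoted further down (associate
links by file name, less path, with file locations) is small enough to
sketch. The function name classify_dead_links and the file list in the
demo are hypothetical; only the dead URLs come from Patrick's list.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def classify_dead_links(dead_urls, site_files):
    """Match each dead URL to existing files by file name (less path).

    Returns (fixable, missing, ambiguous):
      fixable   maps a dead URL to the single file with the same name,
      missing   lists dead URLs with no file of that name,
      ambiguous maps a dead URL to all files sharing that name.
    """
    by_name = defaultdict(list)
    for path in site_files:
        by_name[posixpath.basename(path)].append(path)

    fixable, missing, ambiguous = {}, [], {}
    for url in dead_urls:
        # urlsplit drops the #fragment, so only the file name is compared.
        name = posixpath.basename(urlsplit(url).path)
        candidates = by_name.get(name, [])
        if len(candidates) == 1:
            fixable[url] = candidates[0]
        elif not candidates:
            missing.append(url)
        else:
            ambiguous[url] = sorted(candidates)
    return fixable, missing, ambiguous


if __name__ == "__main__":
    # Hypothetical file list; a real one would come from the server.
    site_files = ["docs/LangRef.html", "docs/Passes.html", "old/docs/Passes.html"]
    dead_urls = [
        "https://llvm.org/docs/tutorial/LangRef.html#instruction-reference",
        "https://llvm.org/docs/tutorial/Passes.html",
        "https://llvm.org/docs/doxygen/structLICM.html",
    ]
    fixable, missing, ambiguous = classify_dead_links(dead_urls, site_files)
    print("fixable:", fixable)
    print("missing:", missing)
    print("ambiguous:", ambiguous)
```

The three outputs correspond to the three cases in the quoted procedure:
links that can be updated, links without an existing file, and links
with more than one candidate file.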
Regards, Neil Nelson
On 9/1/19 4:33 AM, Patrick Nappa wrote:
>
> It would be a fairly direct procedure to associate links by their
> file name (less path) with file locations. The process would then
> update the links for the correct paths, list links without an
> existing file, and list dead links having more than one existing
> file with the same name.
>
> and
>
> Patrick, how long does the crawl take? I suspect if we fixed
> internal documentation links so that they point to local copies of
> documentation when building locally it would be quite quick (no
> actual idea though).
>
> That crawl was actually done on the live site, using the linkchecker
> tool.
>
> Doing it locally would indeed be much better, and it turns out Sphinx
> has a builtin tool for doing such a check (`cd llvm/docs && make -f
> Makefile.sphinx linkcheck`), though it also checks that external
> hyperlinks are reachable. Now, the runtime for this can be seriously
> reduced if we change all internal document links to actually be
> internal (i.e. link to /docs/foo/bar, rather than
> https://llvm.org/docs/foo/bar or llvm.org/docs/foo/bar; easily
> fixable), so as to avoid an internet check. I do believe we should
> still check external links, as having documentation link to nowhere
> can be jarring; however, I don't think such crawls need to be as
> frequent.
>
> Cheers,
> Patrick
>
> On Thu, Aug 29, 2019 at 7:20 PM James Henderson via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Patrick, how long does the crawl take? I suspect if we fixed
> internal documentation links so that they point to local copies of
> documentation when building locally it would be quite quick (no
> actual idea though). That in turn would probably make it feasible
> to add to the existing documentation build bots, I think.
>
> James
>
> On Thu, 29 Aug 2019 at 03:47, Neil Nelson via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Patrick, you have identified a good way to do this. The links are
> likely to files in a directory structure on a single server, with
> the file path given by the link text, as we see in your dead-link
> list. In a good number of cases, perhaps a large majority, the
> file names (less the directory path) are likely unique.
>
> Given that, it would be a fairly direct procedure to associate
> links by their file name (less path) with file locations. The
> process would then update the links for the correct paths, list
> links without an existing file, and list dead links having more
> than one existing file with the same name.
>
> The frequency of that run would depend on the frequency of
> dead-link discovery that the run could provide.
>
> Regards, Neil Nelson
>
> On 8/28/19 7:52 PM, Patrick Nappa via llvm-dev wrote:
>> Hi all,
>>
>> I'm currently in the process of updating the Kaleidoscope
>> tutorials (first and foremost, the ORC/BuildingAJIT ones),
>> and I've noticed a fair few 404s which are lingering within
>> the current visible documentation. Some of these don't seem
>> to have linked to existing pages for a while.
>>
>> I was wondering if there was a way to set up a check in the
>> buildbot to ensure that documentation doesn't break between
>> builds? I'm happy to fix the current dead links I've found
>> (see below) but thought it might be wise to set up a more
>> automated approach in the future. Does anyone have any tips
>> on how I'd go about doing this/if this should be set up at all?
>>
>> I ran a web crawler to find each of the dead links (this may
>> not be exhaustive), and they are as follows:
>> https://llvm.org/docs/TestSuiteMakefileGuide
>> https://llvm.org/docs/doxygen/structLICM.html
>> https://llvm.org/docs/tutorial/LangImpl5.html#for-loop-expression
>> https://llvm.org/docs/tutorial/LangImpl7.html#user-defined-local-variables
>> http://llvm.org/docs/lnt/modindex.html
>> https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl6.html#user-defined-unary-operators
>> https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl5.html#for-loop-expression
>> https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl7.html#user-defined-local-variables
>> https://llvm.org/docs/tutorial/LangRef.html#instruction-reference
>> https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl4.html#adding-a-jit-compiler
>> https://llvm.org/docs/tutorial/WritingAnLLVMPass.html
>> https://llvm.org/docs/tutorial/Passes.html
>> https://llvm.org/docs/tutorial/ProgrammersManual.html#viewing-graphs-while-debugging-code
>> https://llvm.org/docs/tutorial/SourceLevelDebugging.html
>> https://llvm.org/docs/tutorial/Frontend/PerformanceTips.html
>> https://llvm.org/docs/tutorial/GetElementPtr.html
>> https://llvm.org/docs/tutorial/GarbageCollection.html
>> https://llvm.org/docs/tutorial/ExceptionHandling.html
>> https://www.llvm.org/docs/doxygen/structLICM.html
>> http://llvm.org/docs/TestSuiteMakefileGuide
>> http://llvm.org/docs/doxygen/structLICM.html
>> https://www.llvm.org/docs/TestSuiteMakefileGuide
>> http://llvm.org/docs/tutorial/LangImpl5.html#for-loop-expression
>> http://llvm.org/docs/tutorial/LangImpl7.html#user-defined-local-variables
>>
>>
>> Some of these are trivial mistakes (i.e.
>> https://llvm.org/docs/tutorial/LangRef.html#instruction-reference
>> -> https://llvm.org/docs/LangRef.html#instruction-reference),
>> and some require a bit more inspection.
>>
>> Regards,
>> Patrick
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev