[PATCH] D41081: Fix clang Lexer Windows line-ending bug

Benoit Belley via cfe-commits cfe-commits at lists.llvm.org
Mon Dec 11 18:56:17 PST 2017


Thanks Zachary.

Maybe, we can approach the issue through another, hopefully less controversial angle. Clang currently has a limited number of tests verifying that it correctly handles all types of line endings. IMHO,  Zhen's solution (dos2unix & unix2dos conversions) provides an handy mechanism to expand these tests without having to duplicate test cases.

For reference, here's the current list of clang test files which seems to explicitly test various types of line endings (CRLF instead of LF only):

clang/test/CXX/lex/lex.literal/lex.string/p4.cpp
clang/test/FixIt/fixit-newline-style.c
clang/test/Frontend/system-header-line-directive-ms-lineendings.c

And here's the list of remaining clang test files with CRLF line endings, although AFAICT, that seems to be unintentional:

clang/test/CXX/expr/expr.prim/expr.prim.lambda/p15-star-this-capture.cpp
clang/test/CodeGenCXX/attr-x86-no_caller_saved_registers.cpp
clang/test/CodeGenOpenCL/no-signed-zeros.cl
clang/test/Driver/ps4-analyzer-defaults.cpp
clang/test/Frontend/Inputs/rewrite-includes-bom.h
clang/test/Index/preamble-reparse-with-BOM.m
clang/test/Misc/ast-dump-c-attr.c
clang/test/OpenMP/simd_codegen.cpp
clang/test/Preprocessor/macro_vaopt_check.cpp
clang/test/Preprocessor/macro_vaopt_expand.cpp
clang/test/SemaCXX/attr-non-x86-no_caller_saved_registers.cpp
clang/test/SemaCXX/attr-x86-no_caller_saved_registers.cpp
clang/test/SemaOpenCL/array-init.cl

I am of course opened to other solutions to improve the current test coverage.

Cheers,
Benoit


From: Zhen Cao <zhen.cao at autodesk.com<mailto:zhen.cao at autodesk.com>>
Date: lundi 11 décembre 2017 à 16:23
To: Zachary Turner <zturner at google.com<mailto:zturner at google.com>>
Cc: Benoit Belley <benoit.belley at autodesk.com<mailto:benoit.belley at autodesk.com>>, "reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org<mailto:reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org>" <reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org<mailto:reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org>>, "rnk at google.com<mailto:rnk at google.com>" <rnk at google.com<mailto:rnk at google.com>>, "cfe-commits at lists.llvm.org<mailto:cfe-commits at lists.llvm.org>" <cfe-commits at lists.llvm.org<mailto:cfe-commits at lists.llvm.org>>
Subject: RE: [PATCH] D41081: Fix clang Lexer Windows line-ending bug

Thank you very much, Zachary :)

From: Zachary Turner [mailto:zturner at google.com]
Sent: Monday, December 11, 2017 4:21 PM
To: Zhen Cao <zhen.cao at autodesk.com<mailto:zhen.cao at autodesk.com>>
Cc: Benoit Belley <Benoit.Belley at autodesk.com<mailto:Benoit.Belley at autodesk.com>>; reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org<mailto:reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org>; rnk at google.com<mailto:rnk at google.com>; cfe-commits at lists.llvm.org<mailto:cfe-commits at lists.llvm.org>
Subject: Re: [PATCH] D41081: Fix clang Lexer Windows line-ending bug

I'll let rnk@ weigh in (he's on vacation today though).

I don't feel comfortable lgtm'ing any change where "don't use core.autocrlf=true" is an alternative solution, but if other people want to disagree with me and lgtm this, then that suggests I'm in the minority, which is fine.  :)

On Mon, Dec 11, 2017 at 1:13 PM Zhen Cao <zhen.cao at autodesk.com<mailto:zhen.cao at autodesk.com>> wrote:
I think adding two lit substitutions is not a significant issue since dos2unix and unix2dos are common Unix utilities. This solution also has the advantage of testing both styles of line-endings with only one test case, which I think actually reduces complexity and maintenance burden.

From: Zachary Turner [mailto:zturner at google.com<mailto:zturner at google.com>]
Sent: Monday, December 11, 2017 3:42 PM
To: Benoit Belley <Benoit.Belley at autodesk.com<mailto:Benoit.Belley at autodesk.com>>
Cc: reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org<mailto:reviews%2BD41081%2Bpublic%2B5a71b504a12c16ac at reviews.llvm.org>; Zhen Cao <zhen.cao at autodesk.com<mailto:zhen.cao at autodesk.com>>; rnk at google.com<mailto:rnk at google.com>

Subject: Re: [PATCH] D41081: Fix clang Lexer Windows line-ending bug

To be honest, I'm not a fan of testing code in a way that tries to work around issues related to source control.  I think we had a similar discussion before on a previous patch, and my suggested fix was to simply set core.autocrlf=false in your git config.  This solves all of these problems and does not come with any real downsides that I'm aware of.

It also yields better tests IMO, because the fewer steps of translation that an input has to go through before being fed to the test program the better.  If you want to test the behavior of a file with \r\n, just check in an input file with \r\n and you're ready to go.  Furthermore, by not doing this we are adding complexity (and hence, technical debt) *to the code* to deal with issues that could easily be solved using an appropriate developer configuration.  I'm not saying that we should not be testing the case of \r\n newlines, because we can't control what perforce outputs and the tool itself needs to work with either style of newline.  But we can control how we configure our git clones, and if doing so leads to better tests with less testing infrastructure hacks and workarounds, then I think we should do that.

On Mon, Dec 11, 2017 at 12:32 PM Benoit Belley <Benoit.Belley at autodesk.com<mailto:Benoit.Belley at autodesk.com>> wrote:
Hi Zachary,

We are trying to be agnostic to how a particular SCM (SVN, Git) is handling line termination. For example, any CR, CRLF line-ending would be translated automatically by Git (CR only on unix, and CRLF on windows) unless these files are marked explicitly as binary files using a special .gitattribute file.

In summary, the proposed solution is:

  *   SCM agnostic
  *   Tests the handling both types of line endings on any platforms where the tests are run.
Cheers,
Benoit

From: Zachary Turner <zturner at google.com<mailto:zturner at google.com>>
Date: lundi 11 décembre 2017 à 15:22
To: "reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org<mailto:reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org>" <reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org<mailto:reviews+D41081+public+5a71b504a12c16ac at reviews.llvm.org>>
Cc: Zhen Cao <zhen.cao at autodesk.com<mailto:zhen.cao at autodesk.com>>, "rnk at google.com<mailto:rnk at google.com>" <rnk at google.com<mailto:rnk at google.com>>, Benoit Belley <benoit.belley at autodesk.com<mailto:benoit.belley at autodesk.com>>
Subject: Re: [PATCH] D41081: Fix clang Lexer Windows line-ending bug

Would it be possible to do this without any substitutions at all? It seems a little heavy handed to add a new lit substitution for something that is only going to be used in one test.

Can we just check in two different input files, one with CR and another with CRLF, and use those as inputs to the test?
On Mon, Dec 11, 2017 at 12:18 PM Zhen Cao via Phabricator <reviews at reviews.llvm.org<mailto:reviews at reviews.llvm.org>> wrote:
caoz added a subscriber: cfe-commits.

https://reviews.llvm.org/D41081<https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D41081&d=DwMFaQ&c=76Q6Tcqc-t2x0ciWn7KFdCiqt6IQ7a_IF9uzNzd_2pA&r=wR2gM5Rr7Ie8nJT0AKKx0nretMcnu3YZMyPRVEnnIr0&m=uejO8PxG5kINlnu9hNgIioUZcZun7px3wczMnmy_ocU&s=chrRP4fl0LJR04El0VzGUg6oiCuC7dRdQOhbn7m7Jt0&e=>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20171212/2f0b2c71/attachment-0001.html>


More information about the cfe-commits mailing list