[PATCH] D97869: [OpenCL][Draft] Add OpenCL builtin test generator

Anastasia Stulova via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 15 12:28:31 PDT 2021


Anastasia added a comment.

In D97869#2617063 <https://reviews.llvm.org/D97869#2617063>, @svenvh wrote:

> In D97869#2616772 <https://reviews.llvm.org/D97869#2616772>, @Anastasia wrote:
>
>> Regarding 2 and 4, I think we should drive towards deprecating `opencl-c.h`, as it is a maintenance overhead, but could we convert it into a test instead?
>
> Perhaps you misunderstood option 4, as that does not require maintaining `opencl-c.h`.  Only for option 2 would we have to maintain `opencl-c.h`.  For option 4, the checked-in files could look something like this:
>
> name=clang/test/SemaOpenCL/builtins-opencl2.0.check
>   ; RUN: clang-tblgen -emit-opencl-builtin-count -cl-std=cl2.0 OpenCLBuiltins.td | FileCheck '%s'
>   
>   ; This file lists the number of builtin function declarations per builtin function name
>   ; available in the OpenCL 2.0 language mode.
>   
>   CHECK: absdiff 48
>   CHECK: abs 48
>   CHECK: acos 18
>   CHECK: acosh 18
>   CHECK: acospi 18
>   ...
>
> To illustrate the use, suppose a developer writes a patch that accidentally makes the `double` variants of `acos` unavailable in OpenCL 2.0.  The reported number of `acos` builtins would be different, so FileCheck would fail on the line containing `acos 18`.  If we use option 4 in combination with option 1, a Clang developer can regenerate the test and observe the differences for the `acos` tests: from there it becomes obvious that the double variants of `acos` disappeared.

Ok, I see, it is clearer now. I like that it is easy to understand. It still has the issue of non-standard testing, but we have already covered that.
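
Just to restate your example with a concrete sketch (the numbers are my own assumption, taking the 18 `acos` declarations to be 6 `float`, 6 `double` and 6 `half` variants): after such a regression the checked-in count file would no longer match, and regenerating it would show a diff roughly like

   CHECK: abs 48
  -CHECK: acos 18
  +CHECK: acos 12
   CHECK: acosh 18

which points straight at the affected builtin.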

>> Aside from option 3, all other options will require adding non-standard testing formats to Clang. This means that if any core refactoring change, even outside of OpenCL, affects the functionality in the headers, its author would have to learn how we do the testing in order to address any breakages. Perhaps this can be mitigated by documentation, but overall it is an extra burden on the community. Speaking from personal experience, I have been in situations where I broke functionality outside of OpenCL, and having a unified testing format is a big advantage in understanding and addressing issues in unfamiliar code.
>
> Do you expect this to be a problem in practice for the other proposals, and if so, could you elaborate on any particular concerns?

I am not sure what you mean by "other proposals", though. I think this is only an issue for the proposals that don't use a regular clang invocation for testing. It is probably not a widespread problem, but it could bother someone someday.

>> What problems do you see with checking in autogenerated files? There are some existing tests that check similar functionality, and they seem to contain repetitive patterns. Honestly, I wouldn't be able to say whether they were written with copy/paste or auto-generated, but I am not sure that matters much. Of course, it would have been better if we had built up the functionality incrementally, and that is still an option. But it hasn't been possible in the past, and as a matter of fact the original header was committed as one large file (around 15K lines).
>>
>>>   Updates to OpenCLBuiltins.td will give large diffs when regenerating the tests (regardless of whether the file is split or not).
>>
>> Could you provide more details? I don't follow this.
>
> There are two issues here that compound each other: checking in autogenerated files, and the fact that those files are large.  The problem I anticipate with regenerating the test files (at least with the current generator) is that inserting a new builtin in the .td file renumbers the test functions, which produces a large diff.
>
> Other problems I see with checking in large autogenerated files:
>
> - Merge conflicts in the generated files when reordering/rebasing/porting patches.  A single conflict in the .td description could expand to many conflicts in the autogenerated file(s).
> - Diffs in the autogenerated files clutter the output of, for example, `git log -p`.
> - Code reviews of changes to those files are harder due to the size.
> - It bloats the repository.
>
> Disclaimer: I may be biased by my previous experience on projects that had large autogenerated files checked in; that's why I feel we should avoid doing so if we reasonably can.  I have probably revealed by now that option 3 is my least favorite option. :-)

Ok, fair point. :D However, your option 3 and my option 3 are apparently not the same. I was thinking we would generate the test only once and then manually modify and extend it every time new functionality is added. We could treat the file as if it had been written by hand, just with copy-paste or perfect formatting. So whenever someone adds anything to the headers, they would also need to update the test to reflect the added functionality; the generator would then merely be a tool to help us get started with the testing. The advantage of extending the test manually is that it makes clearer what functionality is being implemented, because you can see the OpenCL source lines, and it also keeps the testing separate from the implementation, which minimizes the probability of mistakes. The disadvantage is that for every few lines of TableGen definitions we might need hundreds or thousands of lines in the test.
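
To illustrate, a manually maintained fragment could look roughly like this (just a sketch, not part of this patch; the RUN line and function name are made up, though the flags follow the pattern of the existing `-fdeclare-opencl-builtins` tests, if I remember correctly):

  // RUN: %clang_cc1 %s -triple spir-unknown-unknown -cl-std=CL2.0 -fdeclare-opencl-builtins -finclude-default-header -verify
  // expected-no-diagnostics

  #pragma OPENCL EXTENSION cl_khr_fp64 : enable

  // Both the float and the double variants of acos should be available in CL2.0.
  void test_acos_variants(float f, double d) {
    float r1 = acos(f);
    double r2 = acos(d);
  }

Anyone adding a builtin to the .td file would then extend such a file with the corresponding calls.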

If we instead regenerate the test every time, then I agree with you that we should just generate it when the tests are built and never check it into the repo.
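
As for the renumbering you describe, I imagine it would show up in a regenerated file roughly like this (purely illustrative; the actual naming scheme of the generator may differ):

   // Inserting one new builtin into OpenCLBuiltins.td shifts every
   // subsequently generated test function:
  -void test_123(float arg) { acos(arg); }
  +void test_123(float arg) { new_builtin(arg); }  // newly inserted builtin
  +void test_124(float arg) { acos(arg); }         // the acos test, renumbered
   ...

Every function after the insertion point changes in the diff even though its content stays the same.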


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97869/new/

https://reviews.llvm.org/D97869


