[llvm-dev] RFC: Add bitcode tests to test-suite

Alina Sbirlea via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 1 10:54:36 PST 2016


Comments noted! Thank you for the feedback, this is great!
I will integrate as many of your suggestions as possible in the updated
patch and continue the discussion from there.


On Tue, Mar 1, 2016 at 12:45 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Feb 29, 2016, at 11:55 PM, Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> On Feb 29, 2016, at 11:28 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
> I really don't understand this. Why is a C++ reference useful? Instead,
> we have an IR reference: the unoptimized IR.
>
>
> Mmmmm, thinking more about it: yes, we could compile the IR twice and
> use the unoptimized one as a reference.
> However, right now the tests are only running llc, so the optimization
> level is not that impactful, and it is not clear to me that a miscompile
> would be detected this way.
>
> Which leads me to a separate question for Alina: is this IR "unoptimized"?
> What kind of optimization pipeline is used by Halide? Is the plan to only
> exercise the CodeGen?
>
>
> Answer to myself:
> https://github.com/halide/Halide/blob/master/src/CodeGen_LLVM.cpp#L836
> This is the "regular" O3 pipeline, with the addition of the
> "AlwaysInlinerPass" at the beginning, and the "Function Passes" re-run
> in a "custom way" at the end of the pipe (??).
>
> I feel it would increase our coverage if the committed tests were the
> unoptimized IR coming out of the Halide CodeGen, with the test-suite
> running the optimization pipeline over it.
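>
> A minimal sketch of that flow, assuming the unoptimized .ll is what gets
> committed (standard opt/llc invocations, nothing test-suite specific):
>
>   # Run the regular optimization pipeline over the committed IR...
>   opt -O3 filter_unopt.ll -o filter_opt.bc
>   # ...then exercise CodeGen on the result.
>   llc filter_opt.bc -o filter.s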
>
> --
> Mehdi
>
>
>
>
> I think it's pretty important not to phrase everything in terms of C or
> C++ or Clang... =/
>
> What if there are IR constructs that simply cannot be produced by C++?
> We're adding lots of those for managed languages.
>
>
> Having a first-class test-suite for IR with a managed runtime is probably
> a whole new discussion :)
>
> If there is a practical concern with having IR-based test cases, I'd like
> to actually discuss it head-on rather than fishing for phrasing even
> more of our testing in terms of a single language and front-end.
>
>
> I think we're not talking about the same thing, because I feel that
> "phrasing the test in terms of a single language" misrepresents my
> point about having a reference in C++. I mentioned in a previous email that
> I'm all for having the Halide source file alongside the IR to "phrase the
> test". The point about the reference is: how do we validate the output of
> running the IR test?
> I'm fine with an alternate design, like having the reference computed in
> Python or Julia (or ...), but I assumed C++ would seem more natural to
> everyone and would integrate with the test infrastructure (written in C++).
> I'm also fine with having a textual reference output file instead of C++
> code that computes it (I believe this is what we are doing for the C/C++
> tests in the test-suite), but then we lose the current scheme of having
> random input for these tests.
> (By the way, Alina: the seed needs to be kept and printed on errors for
> reproducibility; a command-line option to pass it would also be nice; see
> the sketch below.)
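>
> A minimal sketch of what I mean, with hypothetical names (plain argv
> parsing rather than any particular option library):
>
>   #include <cstdio>
>   #include <cstdlib>
>   #include <ctime>
>
>   int main(int argc, char **argv) {
>     // Take the seed from the command line if given, otherwise pick one.
>     unsigned seed = (argc > 1) ? (unsigned)atoi(argv[1])
>                                : (unsigned)time(nullptr);
>     // Always print it, so a failing run can be reproduced exactly.
>     printf("random seed: %u\n", seed);
>     srand(seed);
>     // ... generate the random test inputs with rand() and run the
>     // filters as before ...
>     return 0;
>   }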
>
> --
> Mehdi
>
>
>
>
> On Tue, Mar 1, 2016 at 2:16 AM Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Feb 29, 2016, at 10:50 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>
>>
>> ------------------------------
>>
>> *From: *"Mehdi Amini via llvm-dev" <llvm-dev at lists.llvm.org>
>> *To: *"Alina Sbirlea" <alina.sbirlea at gmail.com>
>> *Cc: *"llvm-dev" <llvm-dev at lists.llvm.org>
>> *Sent: *Monday, February 29, 2016 7:06:51 PM
>> *Subject: *Re: [llvm-dev] RFC: Add bitcode tests to test-suite
>>
>>
>> On Feb 29, 2016, at 3:39 PM, Alina Sbirlea <alina.sbirlea at gmail.com>
>> wrote:
>>
>>
>>
>> On Mon, Feb 29, 2016 at 2:06 PM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>> On Feb 29, 2016, at 1:50 PM, Alina Sbirlea <alina.sbirlea at gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> On Mon, Feb 29, 2016 at 12:18 PM, Mehdi Amini <mehdi.amini at apple.com>
>>> wrote:
>>>
>>>>
>>>> On Feb 29, 2016, at 11:40 AM, Mehdi Amini via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>
>>>> On Feb 29, 2016, at 11:16 AM, Alina Sbirlea via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> All,
>>>>
>>>> To get the discussion going in a focused manner, here is an initial
>>>> patch with a running test. The test is from the Halide suite and
>>>> checks the correctness of several SIMD operations.
>>>> (Notes: the patch is large due to the number of operations being
>>>> tested;
>>>> I expect a lot of changes before actually landing it, this is simply to
>>>> continue the discussion using a concrete example.)
>>>> http://reviews.llvm.org/D17726
>>>>
>>>>
>>>> I can't figure out how to download the patch *with the bitcode files*
>>>> from Phabricator. Can you push this to GitHub (or somewhere else)? (Or
>>>> let me know if I missed how to proceed...)
>>>>
>>>>
>>>> I was able to figure out how to get them "one by one"; it would still
>>>> be more convenient to have an archive or a repo to clone somewhere.
>>>>
>>>> A few questions/todos to start the discussion:
>>>> 1. What is a good location for these tests? They are in a separate
>>>> Bitcode directory at the moment, but using llvm_multisource. This may
>>>> change to more closely model the approach for external tests (see next
>>>> item).
>>>>
>>>>
>>>> A good location would be their own external repository IMO :)
>>>>
>>>> 2. There is a single .cpp file testing all operations provided by
>>>> individual bitcode files. I expect this to change: instead, llvm_multisource
>>>> would have the same test run with specific arguments, each run testing
>>>> a single operation.
>>>> 3. The building approach I took is to first link all bitcode files into
>>>> a single one, then obtain the assembly for it, which cmake knows to take as
>>>> an input source.
>>>>
>>>>
>>>> Yeah, so I'd rather have a split-build model, with a split execution
>>>> model. Having a gigantic bitcode file to debug an issue is not friendly.
>>>> I'd expect a .cpp file that contains the main and the logic to run the
>>>> tests, with every test that is linked in being executed, a bit like
>>>> gtest does (there are multiple registration mechanisms that would avoid
>>>> declaring each test explicitly in a header).
>>>> -> filters.h and filter_headers.h should just go away.
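>>>>
>>>> A sketch of the kind of self-registration I have in mind (hypothetical
>>>> names; gtest does something similar internally):
>>>>
>>>>   #include <utility>
>>>>   #include <vector>
>>>>
>>>>   // Global list of (name, function) pairs, filled in before main().
>>>>   static std::vector<std::pair<const char *, bool (*)()>> &tests() {
>>>>     static std::vector<std::pair<const char *, bool (*)()>> list;
>>>>     return list;
>>>>   }
>>>>
>>>>   // A static instance of this in each filter's stub registers the
>>>>   // test at load time; no central header enumerates the tests.
>>>>   struct RegisterTest {
>>>>     RegisterTest(const char *name, bool (*fn)()) {
>>>>       tests().emplace_back(name, fn);
>>>>     }
>>>>   };
>>>>
>>>>   // In a filter's stub: static RegisterTest reg("op_pcmpeqq", run_op);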
>>>>
>>>
>>> I agree, this is related to point 2. The plan here is to update the
>>> current test .cpp file to test each operation individually. In this model
>>> it will be enough to link with a single bitcode file per test.
>>>
>>>
>>>>
>>>> Also, on the tests in general: we should have an idea, for each test,
>>>> of what it is doing and how.
>>>> I was expecting your tests to follow the pattern of having an
>>>> implementation of a filter (or whatever) in C++ and one in Halide
>>>> bitcode, running both on random data, and verifying that the results
>>>> match.
>>>> Unfortunately, from what I can see, you are feeding the tests random
>>>> data, and the tests are "blackboxes" that set an error flag if they
>>>> detect an error.
>>>> This is not super robust: the compiler can mess with the error checking
>>>> and eliminate it, for instance, making any error go undetected.
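>>>>
>>>> To illustrate the fragile pattern (a hypothetical reduction, not the
>>>> actual test code):
>>>>
>>>>   extern int filter_vectorized(int);  // code under test
>>>>   extern int filter_scalar(int);      // "reference", same compiler
>>>>
>>>>   bool run_test(int input) {
>>>>     // Both sides of the check live inside the compiled test, so a
>>>>     // miscompile can transform the comparison itself and hide the bug.
>>>>     return filter_vectorized(input) == filter_scalar(input);
>>>>   }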
>>>>
>>>
>>> The Halide bitcode filters compare the result of vectorized operations
>>> vs scalar runs of the same code. The error threshold against which we
>>> compare the output will be set to a loose tolerance - it is currently 0.
>>> We're interested in codegen bugs that return the wrong value entirely,
>>> not accuracy differences (especially for floating point tests).
>>> With the new error threshold, the data fed in may be random or read
>>> from provided input files; I can do either.
>>> The filters will still look somewhat like blackboxes, though the name of
>>> the filter says which operation is being tested, and the disassembled
>>> bitcode files are reasonably readable.
>>> Using your suggestion, the driver .cpp file will test one operation at a
>>> time (argv set accordingly) and return right away once an error is found.
>>> Sound about right?
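>>>
>>> As a sketch of that driver shape (hypothetical names; the loose
>>> tolerance is the one mentioned above):
>>>
>>>   #include <cmath>
>>>   #include <cstdio>
>>>
>>>   extern double run_filter(const char *op);     // hypothetical stubs
>>>   extern double run_reference(const char *op);
>>>
>>>   int main(int argc, char **argv) {
>>>     if (argc < 2) return 2;        // argv[1] selects a single operation
>>>     const char *op = argv[1];
>>>     double got = run_filter(op);
>>>     double want = run_reference(op);
>>>     const double tolerance = 1e-6; // loose, not bit-exact
>>>     if (std::fabs(got - want) > tolerance) {
>>>       fprintf(stderr, "%s: got %f, want %f\n", op, got, want);
>>>       return 1;                    // return right away on first error
>>>     }
>>>     return 0;
>>>   }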
>>>
>>>
>>> All of this is great.
>>> The part that is not clear to me is what it buys us over (or why it is
>>> better for us than) having a C/C++ reference implementation of the
>>> filter, and hoisting (and refactoring) all the logic to feed the tests
>>> and validate the output *out* of the filters. A filter would be just the
>>> mathematical function performed and nothing more (separation of
>>> concerns, a more robust framework, easier debugging when things go
>>> wrong, etc.).
>>>
>>
>> I believe the answer is that Halide generates vectorized code in a way
>> that is not generated by llvm when starting from C/C++.
>>
>>
>> I don't really see how *this* addresses my point. This is justifying why
>> your bitcode is interesting and why we are having this conversation at all
>> :)
>> It does not say why we can't have a naive scalar C/C++ implementation
>> along with the bitcode for each filter.
>>
>>
>> Having a C/C++ scalar reference would involve quite a bit of effort for
>> all tests. The primary reason Halide is being used is that you don't need
>> to write a lot of C/C++ code to get different optimizations for the same
>> code (e.g. vectorized vs scalar is a one-line difference; see the sketch
>> below).
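>>
>> A minimal, hypothetical Halide sketch of that one-line difference
>> (Func, Var, and vectorize are the real Halide API; the filter itself is
>> made up):
>>
>>   #include "Halide.h"
>>
>>   int main() {
>>     Halide::Func f;
>>     Halide::Var x;
>>     f(x) = x * 2;       // the algorithm, shared by both variants
>>     f.vectorize(x, 8);  // the one-line schedule change; drop this
>>                         // line to get the scalar variant
>>     f.realize(64);      // JIT-compile and run over 64 elements
>>                         // (argument forms vary by Halide version)
>>     return 0;
>>   }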
>>
>>
>> Yes, this is what is nice with Halide: "write once, codegen multiple
>> variants". But it does not mean you can't write a C++ reference for every
>> Halide filter (not for every codegen variant of a filter!).
>>
>> It's been 2 years since my last experiments with Halide, but my memory
>> is that there was a C backend?
>> I had in mind for each test to have (possibly in a separate directory for
>> each test):
>>  - the halide source for the filter.
>>  - the c/c++ (maybe generated?) for the filter.
>>  - the bitcode generated for the filter (potentially multiple variant
>> depending on the CPU support and/or the schedule).
>>
>> Then some common code/infrastructure to interact with the individual
>> filters: loading them, executing the variants for a filter, and checking
>> the results, roughly as sketched below.
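>>
>> Roughly the shape of checking code I picture (hypothetical interface;
>> the per-filter pieces come from the directory layout above):
>>
>>   #include <cmath>
>>   #include <cstdio>
>>
>>   typedef double (*FilterFn)(const double *in, int n);
>>
>>   // Run every codegen variant of one filter against the single C/C++
>>   // reference and report the first mismatch.
>>   bool check_filter(FilterFn reference, const FilterFn *variants,
>>                     int num_variants, const double *in, int n) {
>>     double want = reference(in, n);
>>     for (int i = 0; i < num_variants; ++i) {
>>       double got = variants[i](in, n);
>>       if (std::fabs(got - want) > 1e-6) {  // loose tolerance
>>         fprintf(stderr, "variant %d: got %f, want %f\n", i, got, want);
>>         return false;
>>       }
>>     }
>>     return true;
>>   }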
>>
>> If the reference C/C++ can't be generated by Halide (or obtained
>> somehow), and we can't do better than the current test infrastructure you
>> have, then I'm worried about the cost/benefit for this test-suite.
>>
>>
>> I think that a C/C++ version would be nice to have, but not necessary.
>>
>> IR generated by non-Clang frontends and/or IR going through alternate
>> optimization pipelines tends to hit bugs that are much harder to hit with
>> Clang alone.
>>
>>
>> To make it clear: the point of the C++ reference I was asking for is *not
>> to stress clang* at all. It is intended to compute the *golden result* to
>> be compared against the runs of each variant of a filter. Having a
>> reference is important when a diff is detected in the output and you need
>> to figure out what is going on.
>>
>>
>> It would help to have a description of what each test does, but
>> including, for example, the Halide source code for each test will hopefully
>> be enough of a guide.
>>
>>
>> So we can get really fast test coverage for possible codegen bugs by
>> checking that different layout optimizations in Halide give the same
>> result.
>> I think having each filter tested separately should give a good
>> separation of concerns and easy debugging for each particular test.
>>
>>
>> This is great for Halide validation; we all agree on this, I think. The
>> question is where the tradeoff is for the LLVM project. I'm trying to
>> make sure that the extra coverage doesn't come with a burden to debug and
>> triage issues when something breaks: i.e., the tests need to be very
>> friendly to interact with.
>>
>>
>> I don't think that any non-trivial tests are truly "friendly" to interact
>> with.
>>
>>
>> Yeah, I just think we shouldn't make it arbitrarily worse by not having a
>> good infrastructure to begin with :)
>>
>> In the end, if we expect this suite to be accepted and maintained as a
>> "first-class citizen" by LLVM developers (i.e. accepting things like
>> reverting a commit if it breaks something in this suite), we'd better make
>> sure the burden to interact with it is minimal.
>>
>>
>> Tracking down self-hosting bugs is not friendly, and those aren't
>> anywhere near the worst ;)
>>
>> These tests, with their simple driver, seem like good input that bugpoint
>> can reduce (assuming the tests' runtimes are not too long), and that's
>> friendlier than most of the other multisource tests.
>>
>>
>> If we're talking exclusively about crashes, I agree. However, if we're
>> considering correctness issues as well (miscompiles), I believe the
>> structural changes I proposed are very important, for instance to be able
>> to run bugpoint on them effectively (otherwise bugpoint would just turn
>> the test into "return false;").
>> I also believe these changes are necessary to perform timing measurements
>> for these tests, if we are interested in the quality of the
>> optimization/codegen (to be hooked into something like
>> https://github.com/google/benchmark ?).
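>>
>> Hooking a filter into google/benchmark would be small; a sketch
>> (BENCHMARK/BENCHMARK_MAIN are the library's real macros, run_filter is
>> a hypothetical stand-in for a filter entry point):
>>
>>   #include <benchmark/benchmark.h>
>>
>>   extern double run_filter(const char *op);  // hypothetical
>>
>>   static void BM_op_pcmpeqq(benchmark::State &state) {
>>     while (state.KeepRunning())  // timed loop driven by the library
>>       benchmark::DoNotOptimize(run_filter("op_pcmpeqq"));
>>   }
>>   BENCHMARK(BM_op_pcmpeqq);
>>   BENCHMARK_MAIN();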
>>
>> Best,
>>
>> --
>> Mehdi
>>
>>
>>
>>
>>  -Hal
>>
>> This is the motivation for my comments so far.
>> Other people in the community may have a different opinion/appreciation
>> of the situation; this just represents my current thoughts.
>>
>> Hope it makes sense.
>>
>> --
>> Mehdi
>>
>>
>>
>>
>>
>>>
>>> Also, just looking quickly at one IR file, I'm surprised by things like:
>>>>
>>>> "assert succeeded165":                            ; preds = %"assert
>>>> succeeded146"
>>>>   %buf_host181 = getelementptr inbounds %struct.buffer_t,
>>>> %struct.buffer_t* %error_op_pcmpeqq_272.buffer, i64 0, i32 1
>>>>   %23 = bitcast i8** %buf_host181 to double**
>>>>   %error_op_pcmpeqq_272.host226227232 = load double*, double** %23,
>>>> align 8
>>>>   %24 = icmp eq %struct.buffer_t* %error_op_pcmpeqq_272.buffer, null
>>>>   br i1 %24, label %"assert failed183", label %"assert succeeded184",
>>>> !prof !4
>>>>
>>>> Here you have a check for nullptr at %24, but you already
>>>> loaded %error_op_pcmpeqq_272.host226227232 from this pointer just before!
>>>>
>>>
>>> It's checking that the host value loaded from buffer_t is not null. I
>>> don't see what's wrong with this. What am I missing?
>>>
>>>
>>> I may be misreading it; my impression when skimming through the code was
>>> that it is equivalent to:
>>>
>>> void foo(buffer_t *out) {
>>>   auto value = out->host;  // dereferences out...
>>>   if (!out) {              // ...before checking it for null
>>>     error("nullptr");
>>>   }
>>> }
>>>
>>>
>>> In case I haven't been clear: I think this work is valuable for the
>>> project, and thank you for putting some effort into it :)
>>>
>>> --
>>> Mehdi
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> A separate discussion is on reading metadata (mcpu and mattr) in llc. I
>>>> added a script to work around that for now.
>>>>
>>>>
>>>> The generic way of doing it in llvm is (I think) to use function
>>>> attributes attached to each function definition, e.g.:
>>>>
>>>> define void @your_filter() #0 { ... }
>>>> attributes #0 = { "target-cpu"="x86-64" "target-features"="+avx2" }
>>>>
>>>> You shouldn't need them on the command line, I think?
>>>>
>>>
>>> Yes, I believe so too. Currently these are set in mcpu and mattr by
>>> Halide and not read in by llc, hence the need for feeding them as
>>> parameters. It's a separate issue that we'll need to go into in depth, but
>>> I don't want it to interfere with getting feedback on how to best publish
>>> these tests.
>>>
>>>
>>>>
>>>> --
>>>> Mehdi
>>>>
>>>>
>>>>
>>>>
>>>> Looking forward to your feedback!
>>>>
>>>> Thanks,
>>>> Alina
>>>>
>>>>
>>>>
>>>> On Fri, Feb 19, 2016 at 6:50 AM, Kristof Beyls <kristof.beyls at arm.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 18/02/2016 19:12, Alina Sbirlea via llvm-dev wrote:
>>>>>
>>>>>
>>>>>
>>>>>> I have more questions for Alina. What kind of tests do you have:
>>>>>>
>>>>>> - "the compiler takes the bitcode and generates code without crashing"
>>>>>> - "the compiled test runs without crashing"
>>>>>> - "the compiled test will produce an output that be checked against a
>>>>>> reference"
>>>>>> - "the compiled test is meaningful as a benchmarks"
>>>>>>
>>>>>
>>>>> We have all 4 kinds of tests in Halide. The bitcode files for the
>>>>> first category are already available, and I'm working on building the
>>>>> ones for the next 3. We'd like to include all of them incrementally.
>>>>>
>>>>>
>>>>> It seems to me that the first category ("the compiler takes the
>>>>> bitcode and generates code without crashing") consists of tests that
>>>>> should be added to the "make check-all" tests in the LLVM subproject,
>>>>> rather than the test-suite subproject?
>>>>> Or, if these tests currently don't crash the compiler anymore, the bugs
>>>>> must have been fixed, and there should already be equivalent tests?
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>
>
>