[llvm-dev] Possible AVX512 codegen bug in LLVM 10.0.1?

Fri Sep 4 21:35:57 PDT 2020

Hey Craig,

Thanks for the clarification.  Your tip about removing
min-legal-vector-width does seem to work on this test case.

For additional context, I developed this test case based on a problem I
encountered from a custom LLVM pass that outlines loops into separate
functions.  Looking again at that original problem, it looks like the
caller, from which the vectorized loop was outlined, still has
min-legal-vector-width=0 after outlining.  It seems like the right answer
is for that custom pass to remove the min-legal-vector-width attribute from
the caller after outlining.  Does that sound right to you?
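
To make the rule concrete, here is a rough sketch of how I understand it from
Craig's description below. It is a toy model only, not LLVM's actual
implementation: the FnAttrs struct and the may_split_512 and abi_agrees helpers
are hypothetical names, and the way "prefer-vector-width" is combined with
"min-legal-vector-width" is my assumption.

```cpp
#include <algorithm>

// Toy stand-in for the two string attributes discussed in this thread.
// This struct is hypothetical; it is not an LLVM type.
struct FnAttrs {
    bool has_min_legal_width;  // is "min-legal-vector-width" present at all?
    unsigned min_legal_width;  // its value in bits (ignored when absent)
    bool has_prefer_width;     // is "prefer-vector-width" present?
    unsigned prefer_width;     // its value in bits (ignored when absent)
};

// Returns true if, under the rule as described in this thread, the backend
// would consider itself free to split a 512-bit vector in this function
// into narrower pieces instead of using a zmm register.
bool may_split_512(const FnAttrs& a) {
    // No attribute: the widths were never checked, so stay conservative.
    if (!a.has_min_legal_width) return false;
    unsigned required = a.min_legal_width;
    // Assumption: an explicit preference can only widen the requirement.
    if (a.has_prefer_width) required = std::max(required, a.prefer_width);
    return required < 512;
}

// A caller and callee that disagree here would pass a 512-bit argument
// in incompatible ways, which matches the symptom in the test case.
bool abi_agrees(const FnAttrs& caller, const FnAttrs& callee) {
    return may_split_512(caller) == may_split_512(callee);
}
```

In this model, main (attribute present, value 0) and _Z7loopdacllPjl (no
attribute) disagree, while dropping the attribute from main, or adding a
512-bit preference to it, makes them agree again. In the custom pass, the
corresponding step would presumably be dropping the string attribute from the
caller after outlining.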

Thanks again for your help.

Cheers,
TB

On Sat, Sep 5, 2020 at 12:17 AM Craig Topper <craig.topper at gmail.com> wrote:

> I forgot, another option is to compile your main with
> -mprefer-vector-width=512, which will add another attribute,
> "prefer-vector-width", to main that tells the backend not to split 512-bit
> vectors either.
>
> ~Craig
>
>
> On Fri, Sep 4, 2020 at 9:11 PM Craig Topper <craig.topper at gmail.com>
> wrote:
>
>> I believe this is an interaction with our method for avoiding zmm
>> registers on skylake-avx512 by default. The clang frontend adds a function
>> attribute "min-legal-vector-width" to tell the backend about any explicit
>> vectors used in function arguments, returns, inline assembly, or
>> x86intrin.h intrinsics in the C code. The backend uses this to know whether
>> any 512-bit vectors it sees came from the user code or from the auto
>> vectorizers. If they came from user code we need to use zmm, but if they
>> came from the auto vectorizers we're allowed to split them into smaller
>> vectors.
>>
>> In your case your main function has the "min-legal-vector-width"
>> attribute set to 0 which means the original C code was all scalar. None of
>> the other functions have the attribute. So the backend thinks any vectors
>> it sees in main came from the auto vectorizers and are allowed to be split.
>> Lack of attribute is treated conservatively. We assume that the vector
>> widths weren't checked. So any 512-bit vectors will use zmm in the other
>> functions.
>>
>> I notice in the ll file that the call in main has been modified to use a
>> vector when it didn't originally. So clang didn't see the vector when it
>> generated the code. I think you can remove the min-legal-vector-width
>> attribute to fix your issue.
>>
>> Hope that helps. Let me know if you have any questions.
>>
>> ~Craig
>>
>>
>> On Fri, Sep 4, 2020 at 8:49 PM TB Schardl via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hey LLVMDev,
>>>
>>> Perhaps I'm missing something, but I think I've stumbled across a
>>> codegen bug in LLVM 10.0.1 related to AVX512.  I've attached a small LLVM
>>> IR testcase and generated x86_64 assembly file that shows the bug.
>>>
>>> The test case is small, but not quite minimal, mostly because of driver
>>> code included in the test case so one can compile and run the program.  The
>>> program does a simple vectorizable computation two ways — once with a
>>> vectorized loop, and then with a recursive function that contains a
>>> vectorized loop at its base case — and then compares the results of those
>>> two computations.  If it behaves correctly, both computations should
>>> produce the same result, and the program should produce no output.  But
>>> right now it seems that roughly half of the results from the
>>> recursive-function version are incorrect, in a repeating pattern of 4
>>> correct results followed by 4 incorrect results.  (There are also some
>>> commented-out lines
>>> in the LLVM file, from my own testing of alternative implementations to
>>> confirm that the recursive-function code is otherwise correct.)
>>>
>>> The crux seems to be that the recursive function, _Z7loopdacllPjl,
>>> takes a vector of 8 64-bit integers as one of its arguments.  There's no
>>> issue with such an argument in LLVM IR, but the generated assembly seems to
>>> be incorrect.  Examining the assembly file, it seems that
>>> _Z7loopdacllPjl loads this vector argument off the stack with a 64-byte
>>> reload (notably on line 78).  But before the call to _Z7loopdacllPjl
>>> from main (line 595), I only see a single 32-byte spill corresponding to
>>> this vector argument.  Hence, it seems that the vectorized loop in
>>> _Z7loopdacllPjl gets a vector half-filled with garbage values, leading
>>> to the observed misbehavior.
>>>
>>> I'm not familiar enough with LLVM's x86_64 backend to understand why it
>>> generates this particular assembly.  But the generated assembly seems
>>> incorrect to me.  Am I missing something?
>>>
>>> Please let me know if there's any other information you need from me.
>>>
>>> Cheers,
>>> TB
>>