[llvm-dev] Vectorizer has trouble with vpmovmskb and store
Johan Engelen via llvm-dev
llvm-dev at lists.llvm.org
Sat Dec 1 04:38:34 PST 2018
Hello Craig,
Thank you for the quick response and fix.
However, the improvement turns out to be quite fragile. If I run `opt` on
the original testcase and then pipe the output through `llc`, the previous
very long assembly output comes back. (Things work for a bitcast from
<16 x i1> to i16, but not for a store through a <16 x i1>*.)
Godbolt link: https://llvm.godbolt.org/z/j1ob9w
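For concreteness, a minimal sketch of the still-failing shape, reconstructed
from the description above (the function name and exact operands are
hypothetical, not the testcase behind the Godbolt link):
```
define void @store_mask(<16 x i1>* %dst, <16 x i8> %a0) {
  ; Same compare as in foo() below, but the mask is stored through a
  ; <16 x i1>* instead of being bitcast to i16 first.
  %m = icmp slt <16 x i8> %a0, zeroinitializer
  store <16 x i1> %m, <16 x i1>* %dst
  ret void
}
```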
regards,
Johan
On Tue, Nov 27, 2018 at 4:00 AM Craig Topper <craig.topper at gmail.com> wrote:
> We should handle this a lot better after r34763
>
> ~Craig
>
>
> On Mon, Nov 26, 2018 at 3:13 PM Craig Topper <craig.topper at gmail.com>
> wrote:
>
>> Here's a quick patch that fixes this. I don't know how to avoid it in IR. I
>> haven't checked any other tests, but it does fix your case. I'll try to put
>> up a real Phabricator review tonight or tomorrow.
>>
>> diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
>> index e31f2a6..d79c0be 100644
>> --- a/lib/Target/X86/X86ISelLowering.cpp
>> +++ b/lib/Target/X86/X86ISelLowering.cpp
>> @@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz() const {
>>
>>  bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT,
>>                                                  EVT BitcastVT) const {
>> +  if (!LoadVT.isVector() && BitcastVT.isVector() &&
>> +      BitcastVT.getVectorElementType() == MVT::i1 &&
>> +      !Subtarget.hasAVX512())
>> +    return false;
>> +
>>    if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1)
>>      return false;
>>
>>
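>> (For context, and as a reading of the hook rather than something the patch
>> itself states: isLoadBitCastBeneficial is queried when the DAG combiner
>> considers folding a bitcast into a neighboring load, roughly the shape
>> sketched below, so returning false here keeps pre-AVX512 targets from
>> turning the access into a <16 x i1> memory operation that then has to be
>> scalarized.)
>> ```
>> ; Hypothetical example (names invented) of the load+bitcast shape the hook
>> ; decides about: whether to rewrite the scalar load as a <16 x i1> load.
>> define <16 x i1> @load_then_bitcast(i16* %p) {
>>   %v = load i16, i16* %p
>>   %m = bitcast i16 %v to <16 x i1>
>>   ret <16 x i1> %m
>> }
>> ```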
>> ~Craig
>>
>>
>> On Mon, Nov 26, 2018 at 2:51 PM Johan Engelen via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi all,
>>> I've run into a case where the optimizer seems to be having trouble
>>> doing the "obvious" thing.
>>>
>>> Consider this code:
>>> ```
>>> define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
>>>   %a1 = icmp slt <16 x i8> %a0, zeroinitializer
>>>   %a2 = bitcast <16 x i1> %a1 to i16
>>>   %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
>>>   ;store i16 %a2, i16* %astore
>>>   ret i16 %a2
>>> }
>>> ```
>>> The optimizer recognizes this and llc nicely outputs a vpmovmskb
>>> instruction:
>>> ```
>>> foo: # @foo
>>> vpmovmskb eax, xmm0
>>> ret
>>> ```
>>>
>>> Writing to the output vector also works well:
>>> ```
>>> define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
>>>   %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
>>>   store i16 123, i16* %astore
>>>   ret void
>>> }
>>> ```
>>> outputs:
>>> ```
>>> writing: # @writing
>>> mov word ptr [rdi + 14], 123
>>> ret
>>> ```
>>>
>>> Now, combining these two by uncommenting the store in `foo()` suddenly
>>> results in a very large function, instead of just:
>>> ```
>>> vpmovmskb eax, xmm0
>>> mov word ptr [rdi + 14], ax
>>> ret
>>> ```
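>>> (For reference, the combined IR is just `foo()` from above with the store
>>> line uncommented:)
>>> ```
>>> define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
>>>   %a1 = icmp slt <16 x i8> %a0, zeroinitializer
>>>   %a2 = bitcast <16 x i1> %a1 to i16
>>>   %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
>>>   store i16 %a2, i16* %astore
>>>   ret i16 %a2
>>> }
>>> ```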
>>>
>>> Is there something wrong with my IR code, or is the optimizer somehow
>>> confused? Can I rewrite the code so that the optimizer understands it?
>>>
>>> Godbolt link: https://llvm.godbolt.org/z/OgExDk
>>>
>>> Thanks a lot for the help.
>>> Cheers,
>>> Johan
>>>
>>