[llvm-dev] Vectorizer has trouble with vpmovmskb and store
Craig Topper via llvm-dev
llvm-dev at lists.llvm.org
Mon Nov 26 19:00:39 PST 2018
We should handle this a lot better after r34763
~Craig
On Mon, Nov 26, 2018 at 3:13 PM Craig Topper <craig.topper at gmail.com> wrote:
> Here's a quick patch that fixes this. I don't know how to avoid it in IR. I
> haven't checked any other tests, but it does fix your case. I'll try to put
> up a real phabricator tonight or tomorrow.
>
> diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
> index e31f2a6..d79c0be 100644
> --- a/lib/Target/X86/X86ISelLowering.cpp
> +++ b/lib/Target/X86/X86ISelLowering.cpp
> @@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz() const {
>
>  bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT,
>                                                  EVT BitcastVT) const {
> +  if (!LoadVT.isVector() && BitcastVT.isVector() &&
> +      BitcastVT.getVectorElementType() == MVT::i1 &&
> +      !Subtarget.hasAVX512())
> +    return false;
> +
>    if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1)
>      return false;
>
>
> ~Craig
>
>
> On Mon, Nov 26, 2018 at 2:51 PM Johan Engelen via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>> I've run into a case where the optimizer seems to be having trouble
>> doing the "obvious" thing.
>>
>> Consider this code:
>> ```
>> define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
>>   %a1 = icmp slt <16 x i8> %a0, zeroinitializer
>>   %a2 = bitcast <16 x i1> %a1 to i16
>>   %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
>>   ;store i16 %a2, i16* %astore
>>   ret i16 %a2
>> }
>> ```
>> The optimizer recognizes this and llc nicely outputs a vpmovmskb
>> instruction:
>> ```
>> foo: # @foo
>> vpmovmskb eax, xmm0
>> ret
>> ```
>>
>> Writing to the output vector also works well:
>> ```
>> define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
>>   %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
>>   store i16 123, i16* %astore
>>   ret void
>> }
>> ```
>> outputs:
>> ```
>> writing: # @writing
>> mov word ptr [rdi + 14], 123
>> ret
>> ```
>>
>> Now, combining these two by uncommenting the store in `foo()` suddenly
>> results in a very large function, instead of just:
>> vpmovmskb eax, xmm0
>> mov word ptr [rdi + 14], ax
>> ret
>>
>> Is there something wrong with my IR code, or is the optimizer somehow
>> confused? Can I rewrite the code such that the optimizer does understand?
>>
>> Godbolt link: https://llvm.godbolt.org/z/OgExDk
>>
>> Thanks a lot for the help.
>> Cheers,
>> Johan
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
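
For reference, the combined operation Johan describes (sign-bit mask of 16
bytes, stored into 16-bit lane 7 of the output vector and also returned) can be
sketched in C++. This is only an illustration of the intended semantics, not
code from the thread: the names sign_mask_scalar and foo_model are hypothetical,
and the intrinsic path assumes an x86 target with SSE2, where
_mm_movemask_epi8 is the intrinsic that maps to (v)pmovmskb. The bit order
assumes a little-endian target, where element 0 of the <16 x i1> value becomes
bit 0 of the i16.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: scalar model of
//   icmp slt <16 x i8> %a0, zeroinitializer  +  bitcast <16 x i1> ... to i16
// Bit i of the result is set when byte i is negative (its sign bit is set).
inline std::uint16_t sign_mask_scalar(const std::int8_t (&bytes)[16]) {
    std::uint16_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        if (bytes[i] < 0) {
            mask = static_cast<std::uint16_t>(mask | (1u << i));
        }
    }
    return mask;
}

// Hypothetical model of foo() with the store uncommented: compute the mask,
// write it to element 7 of egress (byte offset 14), and return it.
inline std::uint16_t foo_model(std::uint16_t (&egress)[8],
                               const std::int8_t (&bytes)[16]) {
#if defined(__SSE2__)
    // Unaligned load, then collect the top bit of each byte into an int.
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    std::uint16_t mask = static_cast<std::uint16_t>(_mm_movemask_epi8(v));
#else
    // Fallback for targets without SSE2 (assumption: same bit ordering).
    std::uint16_t mask = sign_mask_scalar(bytes);
#endif
    egress[7] = mask;
    return mask;
}
```

With optimizations enabled, a compiler would be expected to reduce foo_model to
roughly the two-instruction sequence quoted above, though the exact output
depends on the compiler version and flags.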