[llvm-dev] Vectorizer has trouble with vpmovmskb and store

Craig Topper via llvm-dev llvm-dev at lists.llvm.org
Mon Nov 26 15:13:32 PST 2018


Here's a quick patch that fixes this. I don't know to avoid it in IR. I
haven't checked any other tests, but it does fix your case. I'll try to put
up a real phabricator tonight or tomorrow.

diff --git a/lib/Target/X86/X86ISelLowering.cpp
b/lib/Target/X86/X86ISelLowering.cpp
index e31f2a6..d79c0be 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz()
const {

 bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT,
                                                 EVT BitcastVT) const {
+  if (!LoadVT.isVector() && BitcastVT.isVector() &&
+      BitcastVT.getVectorElementType() == MVT::i1 &&
+      !Subtarget.hasAVX512())
+    return false;
+
   if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1)
     return false;


~Craig


On Mon, Nov 26, 2018 at 2:51 PM Johan Engelen via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>   I've run into a case where the optimizer seems to be having trouble
> doing the "obvious" thing.
>
> Consider this code:
> ```
> define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
>     %a1 = icmp slt <16 x i8> %a0, zeroinitializer
>     %a2 = bitcast <16 x i1> %a1 to i16
>     %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0,
> i64 7
>     ;store i16 %a2, i16* %astore
>     ret i16 %a2
> }
> ```
> The optimizer recognizes this and llc nicely outputs a vpmovmskb
> instruction:
> ```
> foo: # @foo
>     vpmovmskb eax, xmm0
>     ret
> ```
>
> Writing to the output vector also works well:
> ```
> define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0)
> {
>     %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0,
> i64 7
>     store i16 123, i16* %astore
>     ret void
> }
> ```
> outputs:
> ```
> writing: # @writing
>     mov word ptr [rdi + 14], 123
>     ret
> ```
>
> Now, combining these two by uncommenting the store in `foo()` suddenly
> results in a very large function, instead of just:
>     vpmovmskb eax, xmm0
>     mov word ptr [rdi + 14], ax
>     ret
>
> Is there something wrong with my IR code, or is the optimizer somehow
> confused? Can I rewrite the code such that the optimizer does understand?
>
> Godbolt link: https://llvm.godbolt.org/z/OgExDk
>
> Thanks a lot for the help.
> Cheers,
>   Johan
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181126/c51293a2/attachment.html>


More information about the llvm-dev mailing list