[llvm-dev] Help me modify llvm optimizer to eliminate valgrind false positive?

Craig Topper via llvm-dev llvm-dev at lists.llvm.org
Sat Feb 27 13:31:35 PST 2021


Hi Eyal,

LLVM knows that those bits are garbage and did this consciously. But it
does create false execution dependency on xmm3. Potentially delaying the
execution depending on what instruction is producing xmm3. This is
definitely not the only case that this can happen.

This is probably the easiest fix which avoids ever creating the false
dependency by recognizing that there is only one real input and assigning
it to both sources.

*--- a/llvm/lib/Target/X86/X86InstrSSE.td*

*+++ b/llvm/lib/Target/X86/X86InstrSSE.td*

@@ -3917,6 +3917,26 @@ let Constraints = "$src1 = $dst" in {

 }

 } // ExeDomain = SSEPackedInt


+let Predicates = [UseSSE2] in {
+def : Pat <(v16i8 (X86Unpckl undef, VR128:$src)),
+           (PUNPCKLBWrr VR128:$src, VR128:$src)>;
+def : Pat <(v8i16 (X86Unpckl undef, VR128:$src)),
+           (PUNPCKLWDrr VR128:$src, VR128:$src)>;
+def : Pat <(v4i32 (X86Unpckl undef, VR128:$src)),
+           (PUNPCKLDQrr VR128:$src, VR128:$src)>;
+def : Pat <(v2i64 (X86Unpckl undef, VR128:$src)),
+           (PUNPCKLQDQrr VR128:$src, VR128:$src)>;
+
+def : Pat <(v16i8 (X86Unpckh undef, VR128:$src)),
+           (PUNPCKHBWrr VR128:$src, VR128:$src)>;
+def : Pat <(v8i16 (X86Unpckh undef, VR128:$src)),
+           (PUNPCKHWDrr VR128:$src, VR128:$src)>;
+def : Pat <(v4i32 (X86Unpckh undef, VR128:$src)),
+           (PUNPCKHDQrr VR128:$src, VR128:$src)>;
+def : Pat <(v2i64 (X86Unpckh undef, VR128:$src)),
+           (PUNPCKHQDQrr VR128:$src, VR128:$src)>;
+}
+

Unfortunately, this can introduce extra mov instructions to copy the input
register if it is used again by a later instruction. For example

 define <16 x i8> @test_remconstant_16i8(<16 x i8> %a) nounwind {

 ; SSE2-LABEL: test_remconstant_16i8:

 ; SSE2:       # %bb.0:

+; SSE2-NEXT:    movdqa %xmm0, %xmm1

 ; SSE2-NEXT:    punpckhbw {{.*#+}} xmm1 =
xmm1[8],xmm0[8],xmm1[9],xmm0[9],xmm1[10],xmm0[10],xmm1[11],xmm0[11],xmm1[12],xmm0[12],xmm1[13],xmm0[13],xmm1[14],xmm0[14],xmm1[15],xmm0[15]

Note, there's a weird quirk that only the tied source uses the result of
the movdqa. The other source is still using xmm0.

On CPUs without move elimination, pre-Ivybridge for Intel) this potentially
adds an extra delay to the execution of the punpck to wait for the movdqa
to complete. But depending on the rest of the code in the function that
might be better than the false dependency.

My other thought would be to do something different for undef sources
in TwoAddressInstructionPass::collectTiedOperands. Rather than setting it
to the destination virtual register, if there is another source with the
same register class with a kill flag set, assign the undef vreg to that and
keep the tied pair. This would do the same thing as the change in
X86InstrSSE.td above, but applied much later and would be limited to cases
where the source register isn't used later. But valgrind could still have
an issue for code that doesn't get fixed due to an additional use of the
register.

~Craig


On Sat, Feb 27, 2021 at 9:53 AM Eyal Soha via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> I spent a few days debugging a valgrind memcheck false positive that was
> introduced by a new optimization in clang version 10.
>
> Valgrind is mistakenly reporting an error due to the way that clang 10 is
> emitting asm.  But it's a trivial matter to fix the code.  I just needed to
> change this:
>
>   401697:       66 0f 6e d2             movd   %edx,%xmm2
>   40169b:       66 0f 60 d2             punpcklbw %xmm2,%xmm2
>   40169f:       66 0f 61 da             punpcklwd %xmm2,%xmm3
>
> to this:
>
>   4016a3:       66 0f 6e da             movd   %edx,%xmm3
>   4016a7:       66 0f 60 db             punpcklbw %xmm3,%xmm3
>   4016ab:       66 0f 61 db             punpcklwd %xmm3,%xmm3
>
> So instead of using $xmm2 and then $xmm3 just use $xmm3.  This was a tiny
> change and I was able to modify the executable to do it.
>
> Is it possible to modify clang/llvm to do this where possible?  Valgrind
> is a popular tool for finding bugs in code and if it's not too much trouble
> for llvm and there's no performance penalty, maybe LLVM could accommodate?
>
> I'm willing to do the work but I'm not sure where to start.help but I'm
> not sure where to start.
>
> There are more details about the issue in the Valgrind bug report:
> https://bugs.kde.org/show_bug.cgi?id=432801#c12
>
> Thanks for your help!
>
> Eyal
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210227/41dbc3ff/attachment.html>


More information about the llvm-dev mailing list