[MachineCopyPropagation] Handle undef flags conservatively so that we do not remove copies that are useful after breaking some hardware dependencies

Quentin Colombet qcolombet at apple.com
Thu May 28 14:50:34 PDT 2015

Hi Pierre-Andre,

Thanks for the testcase.

I have to investigate a bit more, but I believe the copy propagation is not doing the right thing.
Indeed, the copy that is killed is used as undef, but only for the first few values, i.e., the value must be preserved at some point, and the pass does not check for that.

Let me check this is the issue and I’ll see how it can be fixed.
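
To make the suspected problem concrete, here is a minimal toy model in Python. It is not LLVM code and not the actual MachineCopyPropagation logic; the names Operand, Instr, copy_is_removable and undef_reads_count are all hypothetical. It assumes a single basic block and that nothing is live out of it, and it only contrasts two rules: one that ignores reads flagged <undef> when deciding whether a copy is dead, and a conservative one that treats any read as a use.

```python
from dataclasses import dataclass, field


@dataclass
class Operand:
    reg: str
    undef: bool = False  # models the <undef> flag on a register use


@dataclass
class Instr:
    opcode: str
    defs: list = field(default_factory=list)  # register names written
    uses: list = field(default_factory=list)  # Operand objects read


def copy_is_removable(block, copy_index, undef_reads_count):
    """Toy rule: the copy is removable if no read of its destination
    "counts" before the destination is redefined or the block ends.

    undef_reads_count=False ignores reads flagged undef (naive rule).
    undef_reads_count=True treats every read as a use (conservative rule).
    Assumption: nothing is live out of the block.
    """
    dest = block[copy_index].defs[0]
    for instr in block[copy_index + 1:]:
        # Reads happen before writes within one instruction.
        for op in instr.uses:
            if op.reg == dest and (undef_reads_count or not op.undef):
                return False
        if dest in instr.defs:
            return True
    return True


# Shaped after the dump in the quoted mail, heavily simplified.
block = [
    Instr("COPY", defs=["XMM0"], uses=[Operand("XMM2")]),
    Instr("COPY", defs=["XMM1"], uses=[Operand("XMM2")]),
    Instr("PUNPCKLBWrr", defs=["XMM2"],
          uses=[Operand("XMM2"), Operand("XMM0", undef=True)]),
    Instr("PUNPCKLWDrr", defs=["XMM2"],
          uses=[Operand("XMM2"), Operand("XMM0", undef=True)]),
]

naive = copy_is_removable(block, 0, undef_reads_count=False)
conservative = copy_is_removable(block, 0, undef_reads_count=True)
```

Under the naive rule the XMM0 copy is considered removable because its only readers are flagged undef; under the conservative rule it is kept. Whether this matches what the pass really does is exactly what still needs checking.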


> On May 28, 2015, at 7:06 AM, Pierre-Andre Saulais <pierre-andre at codeplay.com> wrote:
> Hi Quentin,
> I think I have found a possible regression in LLVM that was revealed by one of your commits (r235647) which changes the handling of undefs by MachineCopyPropagation. This occurs on X86-64 with the following code:
> %vreg92<def> = IMPLICIT_DEF; VR128:%vreg92
> %vreg91<def,tied1> = PUNPCKLBWrr %vreg78<tied0>, %vreg92; VR128:%vreg91,%vreg78,%vreg92
> %vreg94<def> = IMPLICIT_DEF; VR128:%vreg94
> %vreg93<def,tied1> = PUNPCKLWDrr %vreg91<tied0>, %vreg94; VR128:%vreg93,%vreg91,%vreg94
> %vreg95<def,tied1> = PSLLDri %vreg93<tied0>, 31; VR128:%vreg95,%vreg93
> %vreg96<def,tied1> = PSRADri %vreg95<tied0>, 31; VR128:%vreg96,%vreg95 
> Later on the IMPLICIT_DEFs are turned into <undef> which, after your changes, causes MachineCopyPropagation to remove one copy:
> MOVAPSmr %RSP, 1, %noreg, 16, %noreg, %XMM0<kill>; mem:ST16[FixedStack11]
> %XMM2<def> = MOVAPSrm %RSP, 1, %noreg, 160, %noreg; mem:LD16[FixedStack2]
> %XMM0<def> = KILL %XMM2 ; This was COPY before MachineCopyPropagation
> %XMM1<def> = COPY %XMM2
> %XMM2<def,tied1> = PUNPCKLBWrr %XMM2<kill,tied0>, %XMM0<undef>
> %XMM2<def,tied1> = PUNPCKLWDrr %XMM2<kill,tied0>, %XMM0<undef>
> %XMM2<def,tied1> = PSLLDri %XMM2<kill,tied0>, 31
> %XMM2<def,tied1> = PSRADri %XMM2<kill,tied0>, 31
> One of our tests that was previously passing now fails, and the removed COPY is the only difference in the generated code I can see. Looking at only the code above it seems that the copy is not needed, which is strange.
> I have reduced the IR that exhibits this issue to a manageable size and created a .ll file for testing. When I run this file with lli using the interpreter it passes, same with the JIT on ARM. It fails however on X86-64 using the JIT. Reverting your changes, it passes on X86-64 using the JIT.
> Do you think that it's a bug in the X86 target that was revealed by your changes or that MachineCopyPropagation is doing something wrong?
> Thanks,
> Pierre-Andre
> Pierre-Andre Saulais
> Principal Software Engineer (Compilers)
> <test_sext_v16i1_punpcklbw_machine_cp_arm.ll><test_sext_v16i1_punpcklbw_machine_cp_x86-64.ll>

