[llvm-dev] MachineCSE of copy instructions

Matt Arsenault via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 23 15:13:16 PDT 2015


Hi,

I noticed that MachineCSE::isCSECandidate does not consider COPY instructions as CSE candidates and I’m wondering why. I would expect COPY to be the best way to enable target independent optimizations to work.

We have to process instructions after instruction selection to make sure their operands satisfy a few restrictions based on the register classes of the operands. If multiple instructions need the same operand legalized, the same copy to the required register class is sometimes inserted more than once, but these duplicates aren’t getting eliminated as expected.

In this example, we need to insert a copy for the src1/%b operand of each FMA.

define void @test_s0_s1_k(float addrspace(1)* %out, float %a, float %b) #0 {
  %fma0 = call float @llvm.fma.f32(float %a, float %b, float 1024.0) #1
  %fma1 = call float @llvm.fma.f32(float %a, float %b, float 4096.0) #1
  store volatile float %fma0, float addrspace(1)* %out
  store volatile float %fma1, float addrspace(1)* %out
  ret void
}

A COPY is inserted when processing each instruction’s operands:

%vreg12<def> = COPY %vreg4; VGPR_32:%vreg12 SGPR_32:%vreg4
%vreg11<def> = V_FMA_F32 0, %vreg3, 0, %vreg12, 0, %vreg10, 0, 0, %EXEC<imp-use>; VGPR_32:%vreg11,%vreg12,%vreg10 SGPR_32:%vreg3

%vreg15<def> = COPY %vreg4; VGPR_32:%vreg15 SGPR_32:%vreg4
%vreg14<def> = V_FMA_F32 0, %vreg3, 0, %vreg15, 0, %vreg13, 0, 0, %EXEC<imp-use>; VGPR_32:%vreg14,%vreg15,%vreg13 SGPR_32:%vreg3

Which ends up getting emitted as:

v_mov_b32_e32 v1, s0
v_mov_b32_e32 v2, s0  // redundant copy of s0
v_fma_f32 v0, s2, v2, v0
v_fma_f32 v1, s2, v1, v2

I would expect the redundant copy to be eliminated, but it is not. If I remove the MI->isCopyLike() restriction, it is CSE’d as expected in this case and others (although a variety of tests break, mostly with assertions).
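To make the effect concrete, here is a rough toy model of local CSE in Python. It is not LLVM code: the names Instr, is_cse_candidate, local_cse and the allow_copies flag are all hypothetical, and the candidate check only stands in for the copy restriction discussed above. It is a sketch of the idea under those assumptions, nothing more.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    """Toy machine instruction: one destination register and its sources."""
    dest: str
    dest_class: str
    opcode: str
    srcs: Tuple[str, ...]


def is_cse_candidate(instr: Instr, allow_copies: bool) -> bool:
    # Hypothetical stand-in for the restriction: copies are rejected
    # unless the caller explicitly allows them.
    if instr.opcode == "COPY" and not allow_copies:
        return False
    return True


def local_cse(block: List[Instr], allow_copies: bool = False) -> List[Instr]:
    """Drop instructions that recompute a value already available in the block."""
    available: Dict[tuple, str] = {}  # (opcode, class, srcs) -> defining register
    renamed: Dict[str, str] = {}      # eliminated register -> surviving register
    out: List[Instr] = []
    for instr in block:
        # Rewrite uses of registers whose definitions were eliminated earlier.
        srcs = tuple(renamed.get(s, s) for s in instr.srcs)
        key = (instr.opcode, instr.dest_class, srcs)
        candidate = is_cse_candidate(instr, allow_copies)
        if candidate and key in available:
            renamed[instr.dest] = available[key]
            continue
        if candidate:
            available[key] = instr.dest
        out.append(Instr(instr.dest, instr.dest_class, instr.opcode, srcs))
    return out


# The two COPY/V_FMA_F32 pairs from the example, with modifiers omitted.
block = [
    Instr("%vreg12", "VGPR_32", "COPY", ("%vreg4",)),
    Instr("%vreg11", "VGPR_32", "V_FMA_F32", ("%vreg3", "%vreg12", "%vreg10")),
    Instr("%vreg15", "VGPR_32", "COPY", ("%vreg4",)),
    Instr("%vreg14", "VGPR_32", "V_FMA_F32", ("%vreg3", "%vreg15", "%vreg13")),
]

for instr in local_cse(block, allow_copies=True):
    print(instr.dest, "=", instr.opcode, ", ".join(instr.srcs))
```

With allow_copies=False the block comes back unchanged. With allow_copies=True the second COPY is dropped and the second V_FMA_F32 reads %vreg12 instead of %vreg15.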

Also, if I modify the operand legalization to insert the v_mov_b32_e32 instruction directly, it is also correctly CSE’d. However, I would expect inserting COPY to be preferable, since it allows the PeepholeOptimizer and other passes to optimize the copies. Why is this restriction there? Would it be possible to fix MachineCSE to support copies and add a target option for them? There might not be a reason to avoid emitting the v_mov_b32 right away, but a 64-bit copy requires emitting 2 instructions, so it’s more convenient to emit the COPY and have it split later.
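The 64-bit point can be sketched with another small Python toy. The helper name lower_copy and the .sub0/.sub1 suffixes on virtual registers are illustrative assumptions, not the actual lowering code; the sketch only shows why one generic COPY is more convenient to emit than the moves it eventually becomes.

```python
from typing import List


def lower_copy(dest: str, src: str, width_bits: int) -> List[str]:
    """Expand one generic copy into 32-bit moves (hypothetical helper)."""
    if width_bits <= 0 or width_bits % 32 != 0:
        raise ValueError("width must be a positive multiple of 32 bits")
    pieces = width_bits // 32
    if pieces == 1:
        # A 32-bit copy maps onto a single move.
        return [f"v_mov_b32 {dest}, {src}"]
    # Wider copies need one move per 32-bit piece; the .subN suffix is an
    # illustrative way to name each piece.
    return [f"v_mov_b32 {dest}.sub{i}, {src}.sub{i}" for i in range(pieces)]


for line in lower_copy("%vreg20", "%vreg5", 64):
    print(line)
```

If the copy is kept as a single COPY until late, only one instruction has to be matched and eliminated. Once it is expanded, each 32-bit move has to be handled separately.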

-Matt

