[PATCH] D107677: Prevent machine licm if remattable with a vreg use

Thu Aug 12 05:00:44 PDT 2021

dmgreen added a comment.

I wouldn't worry too much about VCTP's. There are generated in two ways - one from the vectorizer in which case they are pretty glued in place, not able to be  moved much. They are also created from intrinsics, which are the cases I was seeing them behave differently. They will be rarer and still only used in certain situations, but will be more free to move. We have many more tests there than we would in the llvm tests, so it's not surprising that there are not any test that change.

>From those results, I would say the on average this change is flat - some improvements, some decreases, mostly balances out - except for one case. That's a matrix multiply kernel that used to do this:

  outerloop:
    VCTP r0
    ...
  innerloop:
    LDR      r1,[sp,#0x74]       
    VLDRW.U32 q3,[r0],#16        
    VLDRW.U32 q2,[r1,q1,UXTW #2] 
    VRMLALVHA.S32 r6,r7,q3,q2    
    VLDRW.U32 q3,[r12],#16
    VRMLALVHA.S32 r4,r5,q3,q2
    VLDRW.U32 q3,[r11],#16
    VRMLALVHA.S32 r2,r9,q3,q2
    VLDRW.U32 q3,[r8],#16
    VRMLALVHA.S32 r10,r3,q3,q2
    LDR      r1,[sp,#0x70]
    VADD.I32 q1,q1,r1
    LE       lr,#innerloop
  ...

And now has a bad day:

  outerloop:
  ...
  innerloop
    LDR      r9,[sp,#0x74]
    VLDRW.U32 q3,[r0],#16
    VLDRW.U32 q2,[r9,q1,UXTW #2]
    VRMLALVHA.S32 r12,r7,q3,q2
    VLDRW.U32 q3,[r6],#16
    VRMLALVHA.S32 r10,r5,q3,q2
    VLDRW.U32 q3,[r11],#16
    VRMLALVHA.S32 r4,r1,q3,q2
    VLDRW.U32 q3,[r8],#16
    VRMLALVHA.S32 r2,r3,q3,q2
    MOV      r9,r7
    MOV      r7,r5
    MOV      r5,r4
    MOV      r4,r2
    MOV      r2,r1
    MOV      r1,r3
    LDR      r3,[sp,#0x70]
    VADD.I32 q1,q1,r3
    MOV      r3,r1
    MOV      r1,r2
    MOV      r2,r4
    MOV      r4,r5
    MOV      r5,r7
    MOV      r7,r9
    LE       lr,#innerloop
  ...
    VCTP lr

I guess this patch can't be blamed for the register allocator going haywire :-)

Taking a step back, my understanding of "Trivial Rematerialization" comes from the definition above isTriviallyReMaterializable:

  /// Return true if the instruction is trivially rematerializable, meaning it
  /// has no side effects and requires no operands that aren't always available.
  /// This means the only allowed uses are constants and unallocatable physical
  /// registers so that the instructions result is independent of the place
  /// in the function.
  bool isTriviallyReMaterializable(const MachineInstr &MI,
                                   AAResults *AA = nullptr) const {

So they are expected to only have operands that are available everywhere in the program. That is what makes them trivial to rematerialize. It sounds like D105742 <https://reviews.llvm.org/D105742> (and D87280 <https://reviews.llvm.org/D87280> although I'm not sure it should have) has changed that definition to now include instruction that include virtual uses. Non-trivial rematerialization, if you will. And it is now the responsibility of the caller to ensure that the virtual uses are valid at the point it is being moved. That is what D106396 <https://reviews.llvm.org/D106396> was fixing, and what D106408 <https://reviews.llvm.org/D106408> is extending. Does that sound about right so far?

If so can we update the docs to match the new behaviour? I'm not sure I would really count it as "trivially rematerializable" anymore, but I don't have a better name for it. From there moving the profitability check into Machine LICM sounds like a fine idea.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107677/new/

https://reviews.llvm.org/D107677