[PATCH] D111088: [AArch64][GlobalISel] Fold 64-bit cmps with 64-bit adds

Tue Oct 5 15:08:40 PDT 2021

paquette added a comment.

So I've been trying to use the "widen G_ICMP to 64 bits" approach instead. Like I mentioned before, there were some code size regressions on CTMark with this approach. I've managed to get them down to just these 3 at -Os:

  test-suite :: CTMark/SPASS/SPASS.test          412964       413024       0.0%
  test-suite...TMark/7zip/7zip-benchmark.test    574664       574692       0.0%
  test-suite...-typeset/consumer-typeset.test    418756       418764       0.0%

Overall, the code size change for CTMark -Os looks like this:

  Program                                        outputPUlsd6 outputN4L9tx diff 
   test-suite :: CTMark/SPASS/SPASS.test          412964       413024       0.0%
   test-suite...TMark/7zip/7zip-benchmark.test    574664       574692       0.0%
   test-suite...-typeset/consumer-typeset.test    418756       418764       0.0%
   test-suite :: CTMark/Bullet/bullet.test        475356       475360       0.0%
   test-suite :: CTMark/kimwitu++/kc.test         435104       435104       0.0%
   test-suite...ark/tramp3d-v4/tramp3d-v4.test    368124       368124       0.0%
   test-suite :: CTMark/lencod/lencod.test        429860       429856      -0.0%
   test-suite...:: CTMark/ClamAV/clamscan.test    382552       382548      -0.0%
   test-suite...Mark/mafft/pairlocalalign.test    249104       249100      -0.0%
   test-suite...:: CTMark/sqlite3/sqlite3.test    286196       286176      -0.0%

A few things I've observed here:

1. Originally, I assumed the G_ANYEXT->G_ZEXT combine would only would be useful for legalization-related stuff. Turns out we end up with situations like this too:

  %s32_cmp = G_ICMP ...
  %s64_ext = G_ANYEXT %s32_cmp
  %s64_and = G_AND %s64_ext

The combine helps with this situation, which is unrelated to legalization. Probably worth keeping.

2. It seems like we need to do a similar transformation to the one in this patch regardless, because we //also// want to handle the following situation:

  %s64_cmp = G_ICMP ...
  %s32_cmp = G_TRUNC %s64_cmp
  %s32_add = G_ADD %s32_cmp ...

In this situation we want a 64-bit cmp instruction, but a 32-bit csinc instruction.

This is fine because the result of the cmp is dead; csinc just looks at the flags and increments.

So, similar to the transformation in this patch, we still have to walk through something. In this case, the `MatchCmp` function above ends up looking something like this:

  auto MatchCmp = [&](Register Reg) -> MachineInstr * {
    if (!MRI.hasOneNonDBGUse(Reg))
      return nullptr;
    MachineInstr *Def = getDefIgnoringCopies(Reg, MRI);
    unsigned Opc = Def->getOpcode();
    if (Opc == TargetOpcode::G_TRUNC)
      return getOpcodeDef(TargetOpcode::G_ICMP, Def->getOperand(1).getReg(),
                          MRI);
    if (Opc != TargetOpcode::G_ICMP)
      return nullptr;
    return Def;
  };

3. The code size regressions don't really make much sense to me. I'm seeing things like

  9	        b.lt    0x10006b3c8
  30	        add     x16, x0, x15, lsl #5
  31	        ldr     x17, [x16, #8]
  32	        cmp     x17, #0
  33	        cset    x16, eq
  34	        cbnz    x17, 0x10006b39c

appearing rather than

  b.ge    0x10006b39c

in the disassembled output. I'm assuming that this is a later pass' work; I can't find anywhere relating to branches in the selector that assumes we have a 32-bit compare.

Anyway, personally, I'm starting to think this patch is probably the simplest approach for this problem and maybe not as hacky as I initially thought. It turns out the apparently-cleaner approach involves a lot of the same work, but involves changing a lot more of the selector and potentially breaking assumptions later down the line.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111088/new/

https://reviews.llvm.org/D111088