[PATCH] D77804: [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits (WIP)

Mon Jul 18 07:28:41 PDT 2022

uweigand added a comment.

IMO the `fun2` regression probably shouldn't block the patch from being merged.  I've looked into the sequences, and actually neither of them is even close to optimal.

Looking at the semantics, we have 8 x i32 inputs, which need to be truncated to i31, concatenated, and then stored, occupying 31 bytes of memory (8 x 31 = 248 bits).  Memory is written via three 8-byte stores, followed by a 4-byte, a 2-byte, and a 1-byte store, which does look optimal to me.  However, the computation of the 64-bit values to be stored is not.

The first of these should be the value

  (A << 33) | ((B << 2) & 0x1fffffffc) | ((C >> 29) & 3)

where A, B, and C are the first three i32 inputs.

However, the computation being performed is more like

  ((A << 25) | ((B >> 6) & 0x01ffffff)) << 8
  | ((B << 58) | ((C & 0x7fffffff) << 27)) >> 56

which gets the correct result, but takes about twice as many instructions or cycles as should be required.
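
As a quick sanity check that the two forms really do agree, here is a minimal Python sketch.  It assumes all operations are 64-bit with logical (unsigned) shifts and that A, B, and C are arbitrary 32-bit values; the names `expected` and `generated` are made up for illustration, and the code models only the two expressions above, not the actual instruction sequences.

```python
import random

MASK64 = (1 << 64) - 1  # emulate 64-bit registers: keep only the low 64 bits


def expected(a, b, c):
    """The direct form: A in bits 33..63, B in bits 2..32, top 2 bits of C in bits 0..1."""
    return ((a << 33) | ((b << 2) & 0x1FFFFFFFC) | ((c >> 29) & 3)) & MASK64


def generated(a, b, c):
    """The two-part form: an upper piece shifted left by 8, a lower piece shifted right by 56."""
    upper = (((a << 25) | ((b >> 6) & 0x01FFFFFF)) << 8) & MASK64
    lower = (((b << 58) & MASK64) | ((c & 0x7FFFFFFF) << 27)) >> 56
    return upper | lower


# Compare both forms on edge cases and on random 32-bit inputs.
random.seed(0)
samples = [(0, 0, 0), (0xFFFFFFFF,) * 3, (1, 1, 1 << 29)]
samples += [tuple(random.getrandbits(32) for _ in range(3)) for _ in range(10000)]
for a, b, c in samples:
    assert expected(a, b, c) == generated(a, b, c)
```

The upper piece carries A and bits 6..30 of B, while the lower piece carries bits 0..5 of B and bits 29..30 of C, so the two pieces never overlap and their OR reproduces the direct form.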

While the variant with this PR is slightly worse than the variant before, that's probably not really relevant given that both sequences are rather inefficient.  Ideally, we could fix this to get (close to) an optimal sequence, but that would be a different issue.  (I'm not even sure yet whether the current inefficiency is due to the middle end or the back end.)

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D77804