[PATCH] D45173: [InstCombine] Recognize idioms for ctpop and ctlz

Wed Apr 4 07:33:22 PDT 2018

spatel added a comment.

In https://reviews.llvm.org/D45173#1056850, @kparzysz wrote:

> I don't know what the long-term strategy is for dealing with the interactions between pattern-recognition code and instcombine.  This is a recurring issue for the polynomial multiplication code in HexagonLoopIdiomRecognition and it is likely to affect any code of that nature.

I don't know if there's an actual strategy. There's no formal definition of 'canonical IR' AFAIK, so we continue to simplify code via peepholes in instcombine. Anything downstream of that has to adjust to those changes. I've dealt with that many times as an interaction between instcombine and DAG combine.

Let's look at pop32 as an example. We have 3 non-loop variations to consider so far IIUC: (a) all add ops (Hacker's Delight), (b) replace first add+mask with sub, and (c) replace ending mask+shift+add with multiply.
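
For reference, here is a rough C-level sketch of those three variations. This is my reconstruction from the textbook formulations, not code taken from the patch or its tests, and the helper pop32_checked is hypothetical, added only to show that the forms agree. The constants are written in hex: 1431655765 = 0x55555555, 858993459 = 0x33333333, 252645135 = 0x0F0F0F0F, 16843009 = 0x01010101. The narrower masks in the all-adds IR (for example 117901063 = 0x07070707) are presumably the result of constant shrinking and would not appear in the source.

```c
#include <assert.h>
#include <stdint.h>

/* (a) All adds: sum adjacent fields of width 1, 2, 4, 8, 16. */
uint32_t pop32_all_adds(uint32_t x) {
  x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
  x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
  x = (x & 0x0000FFFFu) + (x >> 16);
  return x;
}

/* (b) First add+mask replaced with a sub; later masks folded together. */
uint32_t pop32_sub(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);
  x = ((x >> 2) & 0x33333333u) + (x & 0x33333333u);
  x = ((x >> 4) + x) & 0x0F0F0F0Fu;
  x = (x >> 16) + x;
  x = (x >> 8) + x;
  return x & 0x3Fu;
}

/* (c) Ending shift+add+mask sequence replaced with a multiply:
   the multiply sums the four byte counts into the top byte. */
uint32_t pop32_mul(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = ((x >> 4) + x) & 0x0F0F0F0Fu;
  return (x * 0x01010101u) >> 24;
}

/* Hypothetical helper: returns the bit count of x after asserting
   that all three forms produce the same value. */
uint32_t pop32_checked(uint32_t x) {
  uint32_t r = pop32_all_adds(x);
  assert(r == pop32_sub(x));
  assert(r == pop32_mul(x));
  return r;
}
```

Calling pop32_checked on a handful of inputs is a cheap sanity check that the three forms are interchangeable, which is the property any canonicalization between them would rely on.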

As IR, these are:

  define i32 @pop32_all_adds(i32 %x) {
    %v0 = and i32 %x, 1431655765
    %v1 = lshr i32 %x, 1
    %v2 = and i32 %v1, 1431655765
    %v3 = add nuw i32 %v0, %v2
    %v4 = and i32 %v3, 858993459
    %v5 = lshr i32 %v3, 2
    %v6 = and i32 %v5, 858993459
    %v7 = add nuw nsw i32 %v4, %v6
    %v8 = and i32 %v7, 117901063
    %v9 = lshr i32 %v7, 4
    %v10 = and i32 %v9, 117901063
    %v11 = add nuw nsw i32 %v8, %v10
    %v12 = and i32 %v11, 983055
    %v13 = lshr i32 %v11, 8
    %v14 = and i32 %v13, 983055
    %v15 = add nuw nsw i32 %v12, %v14
    %v16 = and i32 %v15, 31
    %v17 = lshr i32 %v15, 16
    %v18 = add nuw nsw i32 %v16, %v17
    ret i32 %v18
  }

  define i32 @pop32_sub(i32 %x) {
    %shr = lshr i32 %x, 1
    %and = and i32 %shr, 1431655765
    %sub = sub i32 %x, %and
    %shr1 = lshr i32 %sub, 2
    %and2 = and i32 %shr1, 858993459
    %and3 = and i32 %sub, 858993459
    %add = add nuw nsw i32 %and2, %and3
    %shr4 = lshr i32 %add, 4
    %add5 = add nuw nsw i32 %shr4, %add
    %and6 = and i32 %add5, 252645135
    %shr7 = lshr i32 %and6, 16
    %add8 = add nuw nsw i32 %shr7, %and6
    %shr9 = lshr i32 %add8, 8
    %add10 = add nuw nsw i32 %shr9, %add8
    %and11 = and i32 %add10, 63
    ret i32 %and11
  }

  define i32 @pop32_mul(i32 %x) {
    %shr = lshr i32 %x, 1
    %and = and i32 %shr, 1431655765
    %sub = sub i32 %x, %and
    %and1 = and i32 %sub, 858993459
    %shr2 = lshr i32 %sub, 2
    %and3 = and i32 %shr2, 858993459
    %add = add nuw nsw i32 %and3, %and1
    %shr4 = lshr i32 %add, 4
    %add5 = add nuw nsw i32 %shr4, %add
    %and6 = and i32 %add5, 252645135
    %mul = mul i32 %and6, 16843009
    %shr7 = lshr i32 %mul, 24
    ret i32 %shr7
  }

First, do we have consensus on which of these is canonical? Generally, we prefer the form with fewer instructions (pop32_mul), but is the instruction count reduction justified by using a mul?

Second, can we add instcombines that would reduce one or more of these to another form? If so, let's add those. If not, then this pass needs to match all of these forms to be effective (but that doesn't have to happen in one patch, of course).

Repository:
  rL LLVM

https://reviews.llvm.org/D45173