[PATCH] D48278: [SelectionDAG] Fold redundant masking operations of shifted value

Fri Jul 13 01:00:06 PDT 2018

dnsampaio added a comment.

In https://reviews.llvm.org/D48278#1155793, @spatel wrote:

> In https://reviews.llvm.org/D48278#1155779, @spatel wrote:
>
> > In https://reviews.llvm.org/D48278#1155560, @dnsampaio wrote:
> >
> > > It reduces the number of computation operations from 3 to 2, and the number of constants kept for performing the masking from 2 to 1.
> > >  I don't see how it increases the latency if you are going to perform the masking and the shift anyway.
> >
> >
> > Ah, I see that now. But I'm not convinced this is the right approach. Why are we waiting to optimize this in the backend? This is a universally good optimization, so it should be in IR:
> >  https://rise4fun.com/Alive/O04
> >
> > I'm not sure exactly where that optimization belongs. I.e., is it EarlyCSE, GVN, somewhere else, or is it its own pass? But I don't see any benefit in waiting to do this in the DAG.
>
>
> This also raises a question that has come up in another review recently - https://reviews.llvm.org/D41233. If we reverse the canonicalization of shl+and, we would solve the most basic case that I showed above:
>
>   define i32 @shl_first(i32 %a) {
>     %t2 = shl i32 %a, 8
>     %t3 = and i32 %t2, 44032
>     ret i32 %t3
>   }
>  
>   define i32 @mask_first(i32 %a) {
>     %a2 = and i32 %a, 172
>     %a3 = shl i32 %a2, 8
>     ret i32 %a3
>   }
>  
>

This "canonicalization" won't prevent even basic duplicated masked values when an lshr is used:

  %0 = sext i16 %a to i32
  %1 = lshr i32 %0, 8
  %2 = and i32 %1, 172
  %3 = and i32 %0, 44032
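
To make the redundancy concrete, here is a minimal C++ sketch (not part of the patch) that checks the arithmetic behind the snippet above: since 44032 == 172 << 8, the value %2 equals %3 shifted right by 8, so only one mask is really needed. The function names are hypothetical and chosen only for this illustration.

```cpp
#include <cstdint>

// %2 in the IR above: shift right first, then mask with 172.
constexpr uint32_t shift_then_mask(uint32_t x) { return (x >> 8) & 172u; }

// Alternative: reuse %3 (x & 44032) and shift it right by 8.
constexpr uint32_t mask_then_shift(uint32_t x) { return (x & 44032u) >> 8; }

// The two masks differ only by the shift amount.
static_assert(44032u == (172u << 8), "44032 is 172 shifted left by 8");

// Exhaustively compare both forms over every i16 input, sign-extended to
// 32 bits as in `%0 = sext i16 %a to i32`.
inline bool lshr_masks_agree() {
  for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
    const uint32_t x = static_cast<uint32_t>(v);  // two's-complement sext
    if (shift_then_mask(x) != mask_then_shift(x)) return false;
  }
  return true;
}
```

Calling lshr_masks_agree() returns true, which is why one of the two `and` operations can be expressed in terms of the other.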

And a simple case, already among the patch's test cases, that won't be handled at the IR level:

  define i32 @ror(i32 %a) {
  entry:
    %m2 = and i32 %a, 3855
    %shl = shl i32 %a, 24
    %shr = lshr i32 %a, 8
    %or = or i32 %shl, %shr
    %m1 = and i32 %or, 251658255
    %or2 = or i32 %m1, %m2
    ret i32 %or2
  }

The shl and lshr instructions become a ror, and the mask applied to its result selects the same bits of %a that %m2 already masks.
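
As a sanity check of that claim, the following C++ sketch (again not taken from the patch; names are hypothetical) shows that rotating the mask 3855 right by 8 gives 251658255, so masking the rotated value is equivalent to rotating the already-masked %m2.

```cpp
#include <cstdint>

// Rotate right by 8 on 32 bits, matching `or (shl %a, 24), (lshr %a, 8)`.
constexpr uint32_t rotr8(uint32_t x) { return (x << 24) | (x >> 8); }

// %m1 as written in the IR: rotate first, then mask with 251658255.
constexpr uint32_t rotate_then_mask(uint32_t a) { return rotr8(a) & 251658255u; }

// Equivalent form that reuses %m2 = a & 3855 and rotates it.
constexpr uint32_t mask_then_rotate(uint32_t a) { return rotr8(a & 3855u); }

// The second mask is just the first mask rotated by the same amount.
static_assert(rotr8(3855u) == 251658255u, "masks are related by the rotate");
static_assert(rotate_then_mask(0x12345678u) == mask_then_rotate(0x12345678u),
              "both forms agree on a sample input");
```

Because a rotate only permutes bits, the equality holds for every 32-bit input, not just the sample checked here.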

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:5986
+  const auto &MaskedValue = dyn_cast<SDNode>(MASKED);
+  for (SDNode *OtherUser : MaskedValue->uses()) {
+    if ((&(*OtherUser) == ShiftAmount) || (OtherUser->getOpcode() != ISD::AND))
----------------
efriedma wrote:
> Walking the uses of an SDNode can get expensive... is this O(N^2) in the number of uses of MaskedValue?  If not, please add a comment explaining why it isn't.
OK, I will change it to handle only the case where MaskedValue is used by a shift and an AND operation.
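
One way such a restriction could look is sketched below in plain C++. This is only an illustration under stated assumptions: `Node`, `Opcode`, and `siblingMask` are hypothetical stand-ins, not LLVM's SDNode or ISD APIs, and the updated patch may do this differently. The point is that requiring exactly two users makes the check constant-time instead of a walk over an arbitrarily long use list.

```cpp
#include <vector>

// Hypothetical stand-ins for illustration only; not LLVM types.
enum class Opcode { And, Shl, Srl, Other };

struct Node {
  Opcode opcode;
  std::vector<const Node*> users;  // nodes that use this node's value
};

// Returns the AND user of `masked` only when `masked` has exactly two users:
// the given `shift` node and one AND node. Any other shape yields nullptr,
// so the work done here does not grow with the number of users.
inline const Node* siblingMask(const Node& masked, const Node* shift) {
  if (masked.users.size() != 2) return nullptr;
  const Node* a = masked.users[0];
  const Node* b = masked.users[1];
  const Node* other = (a == shift) ? b : (b == shift) ? a : nullptr;
  if (other == nullptr || other == shift) return nullptr;
  return other->opcode == Opcode::And ? other : nullptr;
}
```

The trade-off in this sketch is that values with additional users are simply skipped, giving up some folds in exchange for a bounded cost per node.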

https://reviews.llvm.org/D48278