[PATCH] D48223: Allow binop C1, (select cc, CF, CT) -> select folding

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 18 14:58:39 PDT 2018


rampitec added inline comments.


================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1883-1889
+  unsigned SelOpNo = 0;
   SDValue Sel = BO->getOperand(0);
+  if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse()) {
+    SelOpNo = 1;
+    Sel = BO->getOperand(1);
+  }
+
----------------
rampitec wrote:
> spatel wrote:
> > rampitec wrote:
> > > rampitec wrote:
> > > > spatel wrote:
> > > > > rampitec wrote:
> > > > > > spatel wrote:
> > > > > > > This change is independent of the and/or enhancement. We're now allowing folding when the binop has a constant operand 0. That seems like a good enhancement for non-commutative binops, but it should have its own tests using opcodes besides 'and'/'or' and be split into its own patch.
> > > > > > > 
> > > > > > > Example based on one of the original tests from rL296699:
> > > > > > > 
> > > > > > > ```
> > > > > > > define <2 x double> @sel_constants_fmul_constant_vec(i1 %cond) {
> > > > > > >   %sel = select i1 %cond, <2 x double> <double -4.0, double 12.0>, <2 x double> <double 23.3, double 11.0>
> > > > > > >   %bo = fsub <2 x double> <double 5.1, double 3.14>, %sel
> > > > > > >   ret <2 x double> %bo
> > > > > > > }
> > > > > > > 
> > > > > > > ```
> > > > > > I am not sure why this would be beneficial. For example:
> > > > > > 
> > > > > > ```
> > > > > > sub (select 0, -1), x -> select (sub 0, x), (sub -1, x)
> > > > > > add (select 0, -1), x -> select x, (add -1, x)
> > > > > > 
> > > > > > ```
> > > > > > In this case select would remain and two subs instead of one, seems worse.
> > > > > > For add both select and add would remain, so the benefit is not obvious.
> > > > > > 
> > > > > > In turn and/or have clear benefit in this change because binop completely goes away, and 
> > > > > > In fact the same can be done to xor, but I have decided not to since xor would be needed anyway with a non-zero operand (and one would never have a select with both sizes zero).
> > > > > It's beneficial for exactly the same reason as the existing fold - we eliminate the binop. The fact that the case with constant operand 0 isn't already folded was just an oversight in the original patch. 
> > > > > 
> > > > > Currently, this patch miscompiles on these cases, so we need more tests and a code change whether we treat that difference as part of this patch or not.
> > > > > 
> > > > > Here's a scalar example for Aarch64 in case it makes it easier to see:
> > > > > 
> > > > > ```
> > > > > define i32 @sel_constants_sub_constant_op0(i1 %cond) {
> > > > >   %sel = select i1 %cond, i32 12, i32 42
> > > > >   %bo = sub i32 500, %sel
> > > > >   ret i32 %bo
> > > > > }
> > > > > 
> > > > > ```
> > > > > Current codegen:
> > > > > 
> > > > > ```
> > > > > 	tst	w0, #0x1
> > > > > 	mov	w8, #42
> > > > > 	orr	w9, wzr, #0xc
> > > > > 	csel	w8, w9, w8, ne
> > > > > 	mov	w9, #500
> > > > > 	sub	w0, w9, w8
> > > > > 	ret
> > > > > ```
> > > > > 
> > > > > And with this patch applied (miscompile):
> > > > > 
> > > > > ```
> > > > > 	tst	w0, #0x1
> > > > > 	mov	w8, #-458  ; 500 - 42 != -458
> > > > > 	mov	w9, #-488  ; 500 - 12 != -488
> > > > > 	csel	w0, w9, w8, ne
> > > > > 	ret
> > > > > 
> > > > > ```
> > > > Yes, there is a bug. I will fix it.
> > > Thanks for catching this! The confusion on my side was about "constant operand 0". I have misread it as "constant zero" instead of "operand zero". I will update diff shortly.
> > Ah...I can see how that was confusing. :)
> > 
> > I think this patch is correct now, but let me make a couple of requests:
> > 1. Commit the new PPC and x86 tests with assertions based on trunk (ie, without this patch applied). It's easier to review if we can see the before/after changes directly in this review. I'd do the same for the AMDGPU tests too, but I'm not qualified to review those diffs anyway.
> > 2. Split the and/or enhancement into a follow-up patch. IIUC, that's independent of allowing the more flexible constant matching.
> Will do. I will not do so with amdgpu tests as these are not autogenerated, but will do it with x86 and PPC.
Baseline tests committed https://reviews.llvm.org/rL334987


https://reviews.llvm.org/D48223





More information about the llvm-commits mailing list