[PATCH] Distribute binary operators through SELECT_CC
Richard Sandiford
rsandifo at linux.vnet.ibm.com
Mon Jul 15 04:18:41 PDT 2013
Hi Duncan,
Thanks for the review.
Duncan Sands <duncan.sands at gmail.com> writes:
> On 28/06/13 12:49, Richard Sandiford wrote:
>> This patch transforms something like:
>>
>> (add (select_cc X, Y, 10, 20, CC), 10)
>>
>> into:
>>
>> (select_cc X, Y, 20, 30, CC)
>>
>> but generalised for a tree of select_ccs. It allows at most one of
>> the selected values to be non-constant.
>>
>> If this goes in, I'd like to follow it with patches to handle unary
>> operations and floating-point binary operations in the same way.
>>
>> The main motivation was to optimise operations on scalarised vector
>> selects. It also changed the output of the R600/alu-shift.ll test that
>> Tom removed earlier. Tom reckoned the new code was an improvement
>> (but defeated the original purpose of that test).
>
> while this looks OK (I have a few small nits to pick, see below), it's not good
> that the IR level optimizers don't get this. For general transformations of
> this type, the usual is (1) teach the IR level optimizers how to do it; (2) if,
> due to type legalization or whatever, this nonetheless still pops up a lot at
> the codegen level then do it in codegen too. In short, I'd like to see this be
> implemented at the IR level first (in which case it should probably look through
> phi nodes too).
Argh, it looks like I overreduced the testcase to the point where it already
passes. :-( These cases are already handled at the IR level by InstCombine.
Sorry for being careless.
The original idea was to tweak the code generated by llvmpipe.
I wanted to get to the point where:
define void @f1(<4 x i32> *%ptra, <4 x i32> *%ptrb) {
%a = load <4 x i32> *%ptra
%b = load <4 x i32> *%ptrb
%cmp = icmp eq <4 x i32> %a, %b
%ext = sext <4 x i1> %cmp to <4 x i32>
%and = and <4 x i32> %ext, <i32 2, i32 2, i32 2, i32 2>
store <4 x i32> %and, <4 x i32> *%ptra
ret void
}
produces the same code as:
define void @f2(<4 x i32> *%ptra, <4 x i32> *%ptrb) {
%a = load <4 x i32> *%ptra
%b = load <4 x i32> *%ptrb
%cmp = icmp eq <4 x i32> %a, %b
%res = select <4 x i1> %cmp, <4 x i32> <i32 2, i32 2, i32 2, i32 2>, <4 x i32> zeroinitializer
store <4 x i32> %res, <4 x i32> *%ptra
ret void
}
on targets where <4 x i32> isn't a legal type. At the moment, the first
version selects between 0 and -1 and then has separate ANDs for each
element. (Or, depending on the target, selects between 0 and 1,
sign extends it, and then ANDs it with 2.) The second version selects
directly between 0 and 2.
So like you say, the case I was originally looking at was one where the
select is introduced in codegen during legalisation.
This patch was supposed to be a step towards that goal, but given the
above is probably useless standalone. The idea was to include the SETCC
conversion as a separate patch, but I think I'll need to do it in one.
>> +/// Return null if N is a tree containing just constants and single-use selects.
>> +/// Otherwise return a node L such that replacing L with a constant would
>> +/// give such a tree.
>> +static SDNode *findNonConstLeaf(SDNode *N, unsigned Depth = 0) {
>> + if (N->getOpcode() == ISD::Constant || N->getOpcode() == ISD::ConstantFP)
>> + return 0;
>> +
>> + if (N->getOpcode() == ISD::SELECT_CC && N->hasOneUse() && Depth < 6) {
>
> Please turn the recursion limit into a named constant.
OK, I agree that would be better. The reason I'd chosen a hard-coded 6
is because that's what other DAG routines seem to do. E.g.:
// Don't recurse exponentially.
if (Depth > 6) return 0;
in DAGCombiner.cpp:isNegatibleForFree,
if (Depth > 6 || Aliases.size() == 2) {
in DAGCombiner::GatherAllAliases(), and:
if (Depth == 6)
return; // Limit search depth.
in SelectionDAG::ComputeMaskedBits(). Would it be better for these 6s
to be a single named constant, or should they be separate named constants
that happen to have the same value at the moment?
>> + SDNode *A = findNonConstLeaf(N->getOperand(2).getNode(), Depth + 1);
>> + if (!A)
>> + return findNonConstLeaf(N->getOperand(3).getNode(), Depth + 1);
>> + SDNode *B = findNonConstLeaf(N->getOperand(3).getNode(), Depth + 1);
>> + if (!B)
>> + return A;
>
> Since you always call findNonConstLeaf on both sides, this seems neater:
>
> SDNode *A = findNonConstLeaf(N->getOperand(2).getNode(), Depth + 1);
> SDNode *B = findNonConstLeaf(N->getOperand(3).getNode(), Depth + 1);
> if (!A)
> return B;
> if (!B)
> return A;
Yeah, true. I was trying to emphasise the tail call for !A, but that's the
compiler's job...
>> +SDValue DAGCombiner::distributeBinary(SDNode *N) {
>> + for (unsigned I = 0; I < 2; I++) {
>> + SDValue Other = N->getOperand(I ^ 1);
>> + if (Other.getOpcode() == ISD::Constant ||
>> + Other.getOpcode() == ISD::ConstantFP) {
>
> Since the DAG combiner should be normalizing constants onto the RHS, this loop
> shouldn't be necessary.
The idea was to cope with constant first operands in noncommutative operations,
e.g. (shl (select ..., 1, 0), 2)
Thanks,
Richard
More information about the llvm-commits
mailing list