[PATCH][x86] Teach how to fold target specific dags for packed shifts by immedate count into build_vector.

Fri Dec 27 12:32:16 PST 2013

Thanks!
Committed revision 198084.

On Fri, Dec 27, 2013 at 6:50 PM, Nadav Rotem <nrotem at apple.com> wrote:
> The patch looks really good. Please commit. Thank you Andrea!
>
> Sent from my iPhone
>
>> On Dec 27, 2013, at 9:52, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>
>> Hi Nadav,
>>
>>> On Thu, Dec 26, 2013 at 6:57 AM, Nadav Rotem <nrotem at apple.com> wrote:
>>> Overall the patch LGTM.  Do we have something like that for the non-target-specific code?  If not then we should move it there and figure out a way to handle the target-specific x86 nodes.
>>
>> Thanks for the feedback!
>> I created a new patch that fixes the original poor code-gen bug in the
>> target independent part.
>> I managed to add a new rule that teaches the DAGCombiner how to fold a
>> sext_inreg of a build_vector of ConstantSDNodes (or UNDEFs) into a
>> build_vector.
>>
>> The differences with respect to my previous patch are:
>> 1) this new patch does not handle the target specific packed vector
>> shift nodes;
>> 2) I had to touch test test/CodeGen/X86/blend-msb.ll (see below).
>>
>> About point 1.
>> For simplicity I plan to send a separate patch that adds target
>> specific combine rules to fold packed vector shifts by immediate count
>> (VSHLI/VSRLI/VSRAI) into a build_vector in the case where the vector
>> to shift is a vector of constants (or undefs).
>>
>> About point 2.
>> With this fix, we now generate a slightly different (but still valid)
>> mask for the 'blendvps' instruction expected by test
>> test/CodeGen/X86/blend-msb.ll.
>>
>> -;CHECK: movl $-2147483648
>> +;CHECK: movl $-1
>>
>> The test originally expected a Mask of $-2147483648 in input to the
>> 'blendvps'; with my patch however however we produce $-1.
>> The two masks are both ok; the difference is that before we were able
>> to simplify the constant value for the mask knowing that 'blendvps'
>> only cares about specific bits in the sequence.
>>
>> The target specific VSELECT combine uses method SimplifyDemandedBits
>> to 'simplify' the value of the mask in input to the blend.
>> It looks like method 'TargetLowering::SimplifyDemandedBits' knows how
>> to simplify the Constant Mask in input to a vector SELECT node when it
>> is generated by a SIGN_EXTEND_INREG; it doesn't seem to know how to
>> handle the case where the Mask in input is a BUILD_VECTOR.
>> I decided for now to fix the test modifying the expected number in
>> input to the movl.
>>
>>> +// Return true if this is a BUILD_VECTOR of ConstantSDNodes.
>>> +// This is a helper function used by `getTargetVShiftByConstNode`.
>>> +static bool isBuildVectorOfConstantSDNodes(SDValue N) {
>>> +  if (N->getOpcode() != ISD::BUILD_VECTOR)
>>> +    return false;
>>> +
>>> +  for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i)
>>> +    if (!isa<ConstantSDNode>(N->getOperand(i)))
>>> +      return false;
>>> +  return true;
>>> +}
>>> +
>>>
>>> You can also handle UNDEFs here.
>>
>> I moved this function in the ISD namespace into
>> lib/CodeGen/SelectionDAG/SelectionDAG.cpp and now it also accepts
>> UNDEFs.
>>
>> Please let me know what you think about this new patch and if it is ok
>> to submit.
>>
>> Thanks!
>> Andrea
>> <patchv2.diff>