[llvm-commits] [PATCH] ARM/NEON: Custom lower ctpop for appropriate vector types

Tue Dec 4 06:44:27 PST 2012

Hi Evan,

Thanks for the review.  Currently, this only enables HW support for vector
types, not scalars.  It would be easy enough to add scalar support in.
Should I roll that into this patch?  Or put it in a following patch?
If you'd like a following patch, would you mind committing this one for me?
Thanks!

Pete

On Mon, Dec 3, 2012 at 6:29 PM, Evan Cheng <evan.cheng at apple.com> wrote:

> I don't think use Pat<> buys anything here. The C++ lowering code is
> actually much more readable.
>
> The patch looks fine to me. Please
> add ScalarTargetTransformInfo::PopcntHwSupport() to take advantage of the
> popcount loop idiom recognition support (See r168931). ARM crafty score
> should improve!
>
> Evan
>
> On Nov 30, 2012, at 11:37 PM, Pete Couperus <pjcoup at gmail.com> wrote:
>
> Hi Anton,
>
> I don't think this will work cleanly using Pat<>.  The custom lowering
> uses a bitcast, which gives me an unrecoverable error for using bitconvert
> in an output pattern, or a non-fatal (?) Type inference contradiction error.
>
> Assuming there was some way around this, the unoptimized pattern for the
> v4i16 case looks something like:
> def : Pat<(v4i16 (ctpop DPR:$Dm)),
>           (v4i16 (EXTRACT_SUBREG
>             (v8i16 (VMOVLuv8i16
>               (v8i8  (VUZPd8
>                 (v8i8 (IMPLICIT_DEF)),
>                 (v8i8  (VADDv8i8
>                   (v8i8  (VREV16d8
>                     (v8i8 (VCNTd (v8i8 DPR:$Dm)))
>                   )),
>                   (v8i8 (VCNTd (v8i8 DPR:$Dm)))
>                 )),
>                 (v8i8  (VADDv8i8
>                   (v8i8  (VREV16d8
>                     (v8i8 (VCNTd (v8i8 DPR:$Dm)))
>                   )),
>                   (v8i8 (VCNTd (v8i8 DPR:$Dm)))
>                 ))
>               ))
>             )), dsub_0
>           ))>;
>
> This actually builds with the Type inference errors, and gives the right
> assembly.
> As is, I think this is more clearly expressed in the code.
> So, unless there are ways to deal with these issues in tablegen that I
> haven't thought of, it needs to be lowered in the code.
> Was fun to try though :).  Any other comments?
>
> Pete
>
>
> On Fri, Nov 30, 2012 at 6:05 AM, Pete Couperus <pjcoup at gmail.com> wrote:
>
>> Hi Anton,
>>
>> Thanks for taking a look.  I had trouble with complex patterns for other
>> similar items, but I'll take another look at this case if that actually
>> sounds feasible.
>> Looking at LowerCTTZ, using Pat<> would be the same for cttz, although
>> its lowering is much simpler, correct?  Is there guidelines for when to use
>> Pat<> and when to lay it out in the code?
>> Thanks again for taking a look!
>>
>> Pete
>>
>>
>>
>>
>> On Thu, Nov 29, 2012 at 9:53 PM, Anton Korobeynikov <
>> anton at korobeynikov.info> wrote:
>>
>>> Pete,
>>>
>>> > The population count intrinsic (ctpop) is supported on ARM/NEON for
>>> v8i8 and
>>> > v16i8 types via NEON's vcnt instruction.
>>> > This patch leverages vcnt and other NEON instructions to custom lower
>>> ctpop
>>> > for v2i32/v4i32 and v4i16/v8i16 types.
>>> > As you'll see, a fair chunk of this patch is comments describing the
>>> > lowering, which I am happy to adjust to people's liking.
>>> > Please review!
>>> > Thanks!
>>> Can't you use Pat<> magic in .td file to expand ctpop into series of
>>> nodes? It seems exactly what's your lowering does...
>>>
>>> --
>>> With best regards, Anton Korobeynikov
>>> Faculty of Mathematics and Mechanics, Saint Petersburg State University
>>>
>>
>>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121204/fa23d66b/attachment.html>