[llvm-commits] [PATCH] ARM/NEON: Custom lower ctpop for appropriate vector types

Tue Dec 4 14:56:43 PST 2012

Hi, Pete:

   Do you mind to implement 
ScalarTargetTransformInfo::getPopcntHwSupport() for ARM?
This function will enable scalar opt to recognize popcount idioms.

Thanks
Shuxin

On 12/4/12 2:43 PM, Evan Cheng wrote:
>
> On Dec 4, 2012, at 6:44 AM, Pete Couperus <pjcoup at gmail.com 
> <mailto:pjcoup at gmail.com>> wrote:
>
>> Hi Evan,
>>
>> Thanks for the review.  Currently, this only enables HW support for 
>> vector types, not scalars.  It would be easy enough to add scalar 
>> support in.
>> Should I roll that into this patch?  Or put it in a following patch?
>> If you'd like a following patch, would you mind committing this one 
>> for me?
>
> I've committed it as r169325 after some minor changes (including 
> naming the test case popcnt.ll). Please add the scalar support. Thanks!
>
> Evan
>
>> Thanks!
>>
>> Pete
>>
>>
>> On Mon, Dec 3, 2012 at 6:29 PM, Evan Cheng <evan.cheng at apple.com 
>> <mailto:evan.cheng at apple.com>> wrote:
>>
>>     I don't think use Pat<> buys anything here. The C++ lowering code
>>     is actually much more readable.
>>
>>     The patch looks fine to me. Please
>>     add ScalarTargetTransformInfo::PopcntHwSupport() to take
>>     advantage of the popcount loop idiom recognition support (See
>>     r168931). ARM crafty score should improve!
>>
>>     Evan
>>
>>     On Nov 30, 2012, at 11:37 PM, Pete Couperus <pjcoup at gmail.com
>>     <mailto:pjcoup at gmail.com>> wrote:
>>
>>>     Hi Anton,
>>>
>>>     I don't think this will work cleanly using Pat<>.  The custom
>>>     lowering uses a bitcast, which gives me an unrecoverable error
>>>     for using bitconvert in an output pattern, or a non-fatal (?)
>>>     Type inference contradiction error.
>>>
>>>     Assuming there was some way around this, the unoptimized pattern
>>>     for the v4i16 case looks something like:
>>>     def : Pat<(v4i16 (ctpop DPR:$Dm)),
>>>               (v4i16 (EXTRACT_SUBREG
>>>                 (v8i16 (VMOVLuv8i16
>>>                   (v8i8  (VUZPd8
>>>                     (v8i8 (IMPLICIT_DEF)),
>>>                     (v8i8  (VADDv8i8
>>>                       (v8i8  (VREV16d8
>>>                         (v8i8 (VCNTd (v8i8 DPR:$Dm)))
>>>                       )),
>>>                       (v8i8 (VCNTd (v8i8 DPR:$Dm)))
>>>                     )),
>>>                     (v8i8  (VADDv8i8
>>>                       (v8i8  (VREV16d8
>>>                         (v8i8 (VCNTd (v8i8 DPR:$Dm)))
>>>                       )),
>>>                       (v8i8 (VCNTd (v8i8 DPR:$Dm)))
>>>                     ))
>>>                   ))
>>>                 )), dsub_0
>>>               ))>;
>>>
>>>     This actually builds with the Type inference errors, and gives
>>>     the right assembly.
>>>     As is, I think this is more clearly expressed in the code.
>>>     So, unless there are ways to deal with these issues in tablegen
>>>     that I haven't thought of, it needs to be lowered in the code.
>>>     Was fun to try though :).  Any other comments?
>>>
>>>     Pete
>>>
>>>
>>>     On Fri, Nov 30, 2012 at 6:05 AM, Pete Couperus <pjcoup at gmail.com
>>>     <mailto:pjcoup at gmail.com>> wrote:
>>>
>>>         Hi Anton,
>>>
>>>         Thanks for taking a look.  I had trouble with complex
>>>         patterns for other similar items, but I'll take another look
>>>         at this case if that actually sounds feasible.
>>>         Looking at LowerCTTZ, using Pat<> would be the same for
>>>         cttz, although its lowering is much simpler, correct?  Is
>>>         there guidelines for when to use Pat<> and when to lay it
>>>         out in the code?
>>>         Thanks again for taking a look!
>>>
>>>         Pete
>>>
>>>
>>>
>>>
>>>         On Thu, Nov 29, 2012 at 9:53 PM, Anton Korobeynikov
>>>         <anton at korobeynikov.info <mailto:anton at korobeynikov.info>>
>>>         wrote:
>>>
>>>             Pete,
>>>
>>>             > The population count intrinsic (ctpop) is supported on
>>>             ARM/NEON for v8i8 and
>>>             > v16i8 types via NEON's vcnt instruction.
>>>             > This patch leverages vcnt and other NEON instructions
>>>             to custom lower ctpop
>>>             > for v2i32/v4i32 and v4i16/v8i16 types.
>>>             > As you'll see, a fair chunk of this patch is comments
>>>             describing the
>>>             > lowering, which I am happy to adjust to people's liking.
>>>             > Please review!
>>>             > Thanks!
>>>             Can't you use Pat<> magic in .td file to expand ctpop
>>>             into series of
>>>             nodes? It seems exactly what's your lowering does...
>>>
>>>             --
>>>             With best regards, Anton Korobeynikov
>>>             Faculty of Mathematics and Mechanics, Saint Petersburg
>>>             State University
>>>
>>>
>>>
>>>     _______________________________________________
>>>     llvm-commits mailing list
>>>     llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>     http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121204/18674936/attachment.html>