[llvm-commits] [PATCH] ARM/NEON: Custom lower ctpop for appropriate vector types

Pete Couperus pjcoup at gmail.com
Fri Nov 30 23:37:36 PST 2012


Hi Anton,

I don't think this will work cleanly using Pat<>.  The custom lowering uses
a bitcast, which gives me an unrecoverable error for using bitconvert in an
output pattern, or a non-fatal (?) Type inference contradiction error.

Assuming there was some way around this, the unoptimized pattern for the
v4i16 case looks something like:
def : Pat<(v4i16 (ctpop DPR:$Dm)),
          (v4i16 (EXTRACT_SUBREG
            (v8i16 (VMOVLuv8i16
              (v8i8  (VUZPd8
                (v8i8 (IMPLICIT_DEF)),
                (v8i8  (VADDv8i8
                  (v8i8  (VREV16d8
                    (v8i8 (VCNTd (v8i8 DPR:$Dm)))
                  )),
                  (v8i8 (VCNTd (v8i8 DPR:$Dm)))
                )),
                (v8i8  (VADDv8i8
                  (v8i8  (VREV16d8
                    (v8i8 (VCNTd (v8i8 DPR:$Dm)))
                  )),
                  (v8i8 (VCNTd (v8i8 DPR:$Dm)))
                ))
              ))
            )), dsub_0
          ))>;

This actually builds with the Type inference errors, and gives the right
assembly.
As is, I think this is more clearly expressed in the code.
So, unless there are ways to deal with these issues in tablegen that I
haven't thought of, it needs to be lowered in the code.
Was fun to try though :).  Any other comments?

Pete


On Fri, Nov 30, 2012 at 6:05 AM, Pete Couperus <pjcoup at gmail.com> wrote:

> Hi Anton,
>
> Thanks for taking a look.  I had trouble with complex patterns for other
> similar items, but I'll take another look at this case if that actually
> sounds feasible.
> Looking at LowerCTTZ, using Pat<> would be the same for cttz, although its
> lowering is much simpler, correct?  Is there guidelines for when to use
> Pat<> and when to lay it out in the code?
> Thanks again for taking a look!
>
> Pete
>
>
>
>
> On Thu, Nov 29, 2012 at 9:53 PM, Anton Korobeynikov <
> anton at korobeynikov.info> wrote:
>
>> Pete,
>>
>> > The population count intrinsic (ctpop) is supported on ARM/NEON for
>> v8i8 and
>> > v16i8 types via NEON's vcnt instruction.
>> > This patch leverages vcnt and other NEON instructions to custom lower
>> ctpop
>> > for v2i32/v4i32 and v4i16/v8i16 types.
>> > As you'll see, a fair chunk of this patch is comments describing the
>> > lowering, which I am happy to adjust to people's liking.
>> > Please review!
>> > Thanks!
>> Can't you use Pat<> magic in .td file to expand ctpop into series of
>> nodes? It seems exactly what's your lowering does...
>>
>> --
>> With best regards, Anton Korobeynikov
>> Faculty of Mathematics and Mechanics, Saint Petersburg State University
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121130/6c693c1a/attachment.html>


More information about the llvm-commits mailing list