[PATCH] Rewrite arm_neon.h generation from scratch

Thu Jun 5 03:40:53 PDT 2014

Hi James,

Generally I think I like this change, and I see two other benefits at least,

1) The function body of intrinsics defined in arm_neon.h is amazingly
consistent everywhere.
2) The solution of solving name conflict brought by nested macros turns to
be so simple, and we don't need to care this rat hole any longer.
Previously, we came across this issue several times, and it can only be
fixed case by case.

For remove_high,

+// remove_high - Return the name of the current intrinsic with the string

+//               "_high" removed. Raises an error if "_high" does not exist
+//               in the current intrinsic name.
+// example: (call (remove_high), $p0) (to call the non-high version of this
+//           intrinsic).
+def remove_high;

It's a little bit strange at my first look. Do we have to introduce this?
Or can we have an even general solution like NameMangling<Remove, "_high">
to generate whatever names we want? But I would still be OK, if you insist
on keeping this operation. At least we don't need over-design.

For shuffle stuff,

+// rotl - Rotate set left by a number of elements.
+// example: (rotl mask0, 3) -> [3, 4, 5, 6, 0, 1, 2]
+def rotl;
+// rotl - Rotate set right by a number of elements.
+// example: (rotr mask0, 3) -> [4, 5, 6, 0, 1, 2, 3]
+def rotr;
+// highhalf - Take only the high half of the input.
+// example: (highhalf mask0) -> [4, 5, 6, 7] (assuming mask0 had 8
elements)
+def highhalf;
+// highhalf - Take only the low half of the input.
+// example: (lowhalf mask0) -> [0, 1, 2, 3] (assuming mask0 had 8 elements)
+def lowhalf;
+// rev - Perform a variable-width reversal of the elements. The zero'th
argument
+//       is a width in bits to reverse. The lanes this maps to is
determined
+//       based on the element width of the underlying type.
+// example: (rev 32, mask0) -> [3, 2, 1, 0, 7, 6, 5, 4] (if 8-bit elements)
+// example: (rev 32, mask0) -> [1, 0, 3, 2]             (if 16-bit
elements)
+def rev;
+// mask0 - The initial sequence of lanes for shuffle ARG0
+def mask0 : MaskExpand;
+// mask0 - The initial sequence of lanes for shuffle ARG1
+def mask1 : MaskExpand;

can they be all dropped into another independent class or a single
operation with parameter, so we can make an even clean Operation
classification? Or they can be essentially an extension to SetTheory
instead?

Thanks,
-Jiangning

2014-06-05 16:23 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:

> Hi James,
>
> For first patch of moving SetTheory location is lack of the change of
> CMakeLists.txt for TableGen folder, so it would trigger cmake configure
> failure otherwise.
>
> --- a/lib/TableGen/CMakeLists.txt
> +++ b/lib/TableGen/CMakeLists.txt
> @@ -6,4 +6,5 @@ add_llvm_library(LLVMTableGen
>    TableGenBackend.cpp
>    TGLexer.cpp
>    TGParser.cpp
> +  SetTheory.cpp
>    )
>
> Thanks,
> -Jiangning
>
>
>
> 2014-06-05 8:20 GMT+08:00 Jim Grosbach <grosbach at apple.com>:
>
> Woah. This sounds fantastic. I'm at WWDC this week so won't be able to
>> give this an in depth review for a bit. Please don't mistake my silence as
>> lack of interest! If I haven't responded in more depth by early next week,
>> please ping me.
>>
>> Jim.
>>
>> > On Jun 4, 2014, at 3:39 AM, James Molloy <james.molloy at arm.com> wrote:
>> >
>> > Hi all,
>> >
>> > [Obvious stakeholders put on CC, feel free to add more]
>> >
>> > NeonEmitter.cpp, the arm_neon.td tablegen backend, really needs
>> improving. I
>> > wanted
>> > to change the behaviour slightly for big endian, but quickly realised
>> that
>> > this may
>> > well be the hack that broke the camel's back. I tried to incrementally
>> > refactor it
>> > but to be honest it's beyond repair. The attached patch reimplements it
>> from
>> > scratch.
>> > Luckily the testing coverage on this stuff is absolutely massive, both
>> with
>> > regression tests and the "emperor" random test case generator.
>> >
>> > The main change is that previously, in arm_neon.td a bunch of
>> "Operation"s
>> > were
>> > defined with special names. NeonEmitter.cpp knew about these Operations
>> and
>> > would emit code based on a huge switch. Actually this doesn't make much
>> > sense -
>> > the type information was held as strings, so type checking was
>> impossible.
>> > Also
>> > TableGen's DAG type actually suits this sort of code generation very
>> well
>> > (surprising that...)
>> >
>> > So now every operation is defined in terms of TableGen DAGs. There are a
>> > bunch
>> > of operators to use, including "op" (a generic unary or binary
>> operator),
>> > "call"
>> > (to call other intrinsics) and "shuffle" (take a guess...). One of the
>> main
>> > advantages of this apart from making it more obvious what is going on,
>> is
>> > that
>> > we have proper type inference. This has two obvious advantages:
>> >
>> >  1) TableGen can error on bad intrinsic definitions easier, instead of
>> just
>> >     generating wrong code.
>> >  2) Calls to other intrinsics are typechecked too. So
>> >     we no longer need to work out whether the thing we call needs to be
>> the
>> > Q-lane
>> >     version or the D-lane version - TableGen knows that itself!
>> >
>> > Here's an example: before:
>> >
>> >  case OpAbdl: {
>> >    std::string abd = MangleName("vabd", typestr, ClassS) + "(__a, __b)";
>> >    if (typestr[0] != 'U') {
>> >      // vabd results are always unsigned and must be zero-extended.
>> >      std::string utype = "U" + typestr.str();
>> >      s += "(" + TypeString(proto[0], typestr) + ")";
>> >      abd = "(" + TypeString('d', utype) + ")" + abd;
>> >      s += Extend(utype, abd) + ";";
>> >    } else {
>> >      s += Extend(typestr, abd) + ";";
>> >    }
>> >    break;
>> >  }
>> >
>> > after:
>> >
>> >  def OP_ABDL     : Op<(cast "R", (call "vmovl", (cast $p0, "U",
>> >                                                       (call "vabd", $p0,
>> > $p1))))>;
>> >
>> > As an example of what happens if you do something wrong now, here's what
>> > happens
>> > if you make $p0 unsigned before the call to "vabd" - that is, $p0 ->
>> (cast
>> > "U", $p0):
>> >
>> > arm_neon.td:574:1: error: No compatible intrinsic found - looking up
>> > intrinsic 'vabd(uint8x8_t, int8x8_t)'
>> > Available overloads:
>> >  - float64x2_t vabdq_v(float64x2_t, float64x2_t)
>> >  - float64x1_t vabd_v(float64x1_t, float64x1_t)
>> >  - float64_t vabdd_f64(float64_t, float64_t)
>> >  - float32_t vabds_f32(float32_t, float32_t)
>> > ... snip ...
>> >
>> > This makes it seriously easy to work out what you've done wrong in
>> fairly
>> > nasty
>> > intrinsics.
>> >
>> > As part of this I've massively beefed up the documentation in
>> arm_neon.td
>> > too.
>> >
>> > Things still to do / on the radar:
>> >  - Testcase generation. This was implemented in the previous version and
>> > not in
>> >    the new one, because
>> >    - Autogenerated tests are not being run. The testcase in test/
>> differs
>> > from
>> >      the autogenerated version.
>> >    - There were a whole slew of special cases in the testcase generation
>> > that just
>> >      felt (and looked) like hacks.
>> >    If someone really feels strongly about this, I can try and
>> reimplement
>> > it too.
>> >  - Big endian. That's coming soon and should be a very small diff on
>> top of
>> > this one.
>> >
>> > Cheers,
>> >
>> > James
>> >
>> > ---
>> > include/clang/Basic/arm_neon.td   |  653 +++--
>> > test/CodeGen/arm64_vcvtfp.c       |    2 +-
>> > test/Sema/arm-neon-types.c        |    2 +-
>> > test/Sema/arm64-neon-args.c       |    2 +-
>> > utils/TableGen/NeonEmitter.cpp    | 4833
>> > ++++++++++++++-----------------------
>> > utils/TableGen/TableGenBackends.h |    3 +
>> > 6 files changed, 2303 insertions(+), 3192 deletions(-)
>> > <0001-Move-SetTheory-from-utils-TableGen-into-lib-TableGen.patch>
>> > <0001-Rewrite-ARM-NEON-intrinsic-emission-completely.patch>
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140605/b18a6558/attachment.html>