[PATCH] D75470: [SVE] Auto-generate builtins and header for svld1.

Fri Mar 6 10:29:03 PST 2020

sdesmalen marked an inline comment as done.
sdesmalen added a comment.
Herald added a subscriber: danielkiss.

In D75470#1907562 <https://reviews.llvm.org/D75470#1907562>, @SjoerdMeijer wrote:

> Adding @simon_tatham in case he feels wants to have a look too.

Thanks Sjoerd! @simon_tatham and I had a chat about this offline today.

The SVE implementation now does more or less the same thing the MVE implementation; `arm_sve.h` also uses `__attribute__((overloadable))` and `__attribute__((arm_sve_alias("__builtin_...")))`, the latter only to declare the overloaded intrinsics. That means we get the same benefits as Simon described.

There are a few details that are different:

- The MVE implementation *does not* use the type string in the actual Builtins.def. For example, the type string below is not `"V16ScV16ScV16Sci"`, but rather `""`: 

  TARGET_HEADER_BUILTIN(__builtin_arm_mve_vcaddq_rot270_f16, "", "n", "arm_mve.h", ALL_LANGUAGES, "")

- This means that the implementation relies solely on the declaration in arm_mve.h to define the intrinsic prototype.
- If I understood this correctly, this is currently needed to represent the tuple types (which cannot yet be expressed in the bulitin type string format) returned from structured loads like ld2/ld3/ld4.

The SVE implementation *does* use the type string in Builtins.def.

- For SVE, the structured loads will just return a wider vector with 2, 3 or 4 times the number of elements, so the lack of support in the builtin type-string format is not an issue.
- An advantage of that is that we can use a `#define svadd_u8(...) __builtin_svadd_u8(...)` for the non-overloaded builtins which helps compile-time performance.

If all the intrinsics can be described *with* a builtin type string, we can actually work to define all builtins internally in Clang (as @efriedma suggested in D75298 <https://reviews.llvm.org/D75298>) rather than having to rely on the header file to define the prototypes, hopefully get rid of the expensive header file.
When we do this for SVE, MVE can probably follow the same approach.

================
Comment at: clang/include/clang/Basic/arm_sve.td:121
+// Load one vector (scalar base)
+def SVLD1   : MInst<"svld1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad]>;
----------------
simon_tatham wrote:
> SjoerdMeijer wrote:
> > sdesmalen wrote:
> > > SjoerdMeijer wrote:
> > > > This encoding, e.g, this is  "csilUcUsUiUlhfd", is such a monstrosity. It's a very efficient encoding, but of course completely unreadable.  I know there is prior art, and know that this is how it's been done, but just curious if you have given it thoughts how to do this in a normal way, a bit more c++y.  I don't want to de-rail this work, but if we are adding a new emitter, perhaps now is the time to give it a thought, so was just curious.  
> > > Haha, its a bit of a monstrosity indeed. The only thing I can think of here would be having something like:
> > > ```class TypeSpecs<list<string> val> {
> > >   list<string> v = val;
> > > }
> > > def All_Int_Float_Ty : TypeSpecs<["c", "s", "i", "l", "Uc", "Us", "Ul", "h", "f", "d">;
> > > 
> > > def SVLD1 : Minst<"svld1[_{2}]", "dPc", All_Int_Float_Ty, [IsLoad]>;```
> > > 
> > > But I suspect this gets a bit awkward because of the many permutations, I count more than 40. Not sure if that would really improve the readability.
> > I would personally welcome any improvement here, even the smallest. But if you think it's tricky, then fair enough!
> > 
> > I've managed to completely ignore the MVE intrinsics work so far, but understood there were some innovations here and there (e.g. in tablegen). Probably because it is dealing with similar problems: a lot of intrinsics, some of them overloaded with different types. I'm going to have a little look now to see if there's anything we can borrow from that, or if that is unrelated....
> In the MVE intrinsics implementation I completely avoided that entire system of string-based type specifications. There's another completely different way you can set up the types of builtins, and I used that instead.
> 
> You can declare the function for `Builtins.def` purposes with no type specification at all, and then you fill in its type signature using a declaration in the //header file//, with the unusual combination of `__inline__` and no function body:
> ```static __inline__ int32x4_t __builtin_arm_foo_bar(int16x8_t, float23x7t); // or whatever```
> 
> In fact I went one step further: the user-facing names for the MVE intrinsics are declared in `arm_mve.h` with a special attribute indicating that they're aliases for clang builtins. And the MVE polymorphic intrinsics are done by adding `__attribute__((overloadable))` to the declaration, which allows C++-style overloading based on parameter types even when compiling in C. So when the user invokes an MVE intrinsic by its polymorphic name, the compiler first does overload resolution to decide which declaration in the header file to select; then it looks at the builtin-alias attribute and discovers which internal builtin id it corresponds to; and then it can do codegen for that builtin directly, without a wrapper function in the header.
> 
> Pros of doing it this way:
>  - if the builtin requires some of its arguments to be compile-time constants, then you don't run into the problem that a wrapper function in the header fails to pass through the constantness. (In NEON this is worked around by making some wrapper functions be wrapper macros instead – but NEON doesn't have to deal with polymorphism.)
>  - declaring a builtin's type signature in the header file means that it can include definitions that the header file has created beforehand. For example, one of the arguments to the MVE `vld2q` family involves a small `struct` containing 2 or 4 vectors, and it would be tricky to get that struct type into the `Builtins.def` type specification before the header file can tell clang it exists.
>  - doing polymorphism like this, rather than making the polymorphic function be a macro expanding to something involving C11 `_Generic`, means the error messages are orders of magnitude more friendly when the user messes up a call. (Also it's remarkably fiddly to use `_Generic` in complicated cases, because of the requirement that even its untaken branches not provoke any type-checking or semantic errors.)
>  - I don't know of any way that the preprocessor + `_Generic` approach can avoid expanding its macro arguments at least twice. It can avoid //evaluating// twice, so that's safe in the side-effect sense, but you still have the problem that you get exponential inflation of the size of preprocessed output if calls to these macros are lexically nested too deeply.
> 
> Cons:
>  - you have to do all of your codegen inside the clang binary, starting from the function operands you're given, and ending up with LLVM IR. You don't get to do the tedious parts (like unpacking structs, or dereferencing pointer arguments for passed-by-reference parameters) in the wrapper function in the header, because there isn't one. I had to invent a whole system in MveEmitter to allow the IR generation to be specified in a not-too-verbose way.
>  - if the builtins don't have type declarations until the header is included, then users can't call them //without// the header file. Probably this is fine for SVE intrinsics the same way it is for MVE, where the builtins are a detail of that particular compiler's implementation and users are intended to use the compiler-independent public API. But in cases where the builtin itself was intended to be called directly by the end user (in the way that `__builtin_clz` is, for example), you'd probably want it to work everywhere.
>  - if you do polymorphism using `__attribute__((overloadable))` then all the things you're overloading between have to be real functions. You can't make some of them be macros, with the extra flexibility a macro gives you. (But then, making them //builtins// rather than genuine functions restores some of that flexibility.)
> 
> Off the top of my head I don't know whether all these ideas can be separated from each other. It feels to me as if all the choices I made are leaning on each other and making a mutually supporting whole, and it's quite possible that if you tried to cherry-pick just one of these design decisions into an otherwise more conventional approach, it might all come crashing down. But I haven't tried it :-)
Thanks for sharing some background here @simon_tatham!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75470/new/

https://reviews.llvm.org/D75470