[cfe-dev] [RFC][RISCV] Add intrinsic and/or builtin functions by #pragma

Kito Cheng via cfe-dev cfe-dev at lists.llvm.org
Tue Jun 22 23:21:05 PDT 2021


Hi Anastasia:

Thanks for your explanation! My first impression is that the implementation
is fairly OpenCL-specific, since it uses a lot of OpenCL terminology,
including the option names used in TableGen, but the mechanism looks like
it could be generalized.
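
For readers new to the thread, the lookup-fallback idea described in the
quoted messages further down (create the declaration only when ordinary lookup
fails) can be sketched roughly as follows. This is a toy model in plain C, not
clang's actual code: the BuiltinEntry type, the kGenerated table, and the
lookup() and declared_count() functions are all hypothetical names, and the
prototypes are illustrative strings only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a TableGen-generated table of builtins. */
typedef struct {
    const char *name;      /* identifier the user may write */
    const char *prototype; /* textual prototype, for illustration only */
} BuiltinEntry;

static const BuiltinEntry kGenerated[] = {
    {"vadd", "vint32m1_t vadd(vint32m1_t, vint32m1_t, size_t)"},
    {"vsub", "vint32m1_t vsub(vint32m1_t, vint32m1_t, size_t)"},
};
enum { kNumGenerated = sizeof(kGenerated) / sizeof(kGenerated[0]) };

/* Toy symbol table: entries that have actually been declared so far. */
static const BuiltinEntry *g_declared[kNumGenerated];
static size_t g_num_declared = 0;

size_t declared_count(void) { return g_num_declared; }

/* Ordinary lookup, with the generated table as a last-ditch fallback. */
const BuiltinEntry *lookup(const char *name) {
    for (size_t i = 0; i < g_num_declared; ++i)
        if (strcmp(g_declared[i]->name, name) == 0)
            return g_declared[i]; /* already declared: normal hit */

    for (size_t i = 0; i < (size_t)kNumGenerated; ++i) {
        if (strcmp(kGenerated[i].name, name) == 0) {
            assert(g_num_declared < (size_t)kNumGenerated);
            g_declared[g_num_declared++] = &kGenerated[i]; /* declare on demand */
            return &kGenerated[i];
        }
    }
    return NULL; /* genuinely unknown identifier */
}
```

In this model a translation unit that only uses vadd pays for one declaration,
not for the whole table, and nothing in it is tied to OpenCL.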

> We have removed the need for the pragmas in the last commits but it is mainly
> because it wasn't useful in OpenCL in a way it was defined in the spec as it
> was not similar to a header include. The TableGen based header include is very
> fast compared to parsing the large header files so I can certainly recommend
> this route.

Could you explain what the TableGen-based header is? Does it mean
OpenCLBuiltins.inc, or some other header?

I guess we still need the pragma for RISC-V, since we don't want to import
those symbols until riscv_vector.h is included, but that should not conflict
with the OpenCL built-in approach :)
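
The combination discussed in this thread, a generated table whose entries
carry an enable condition and only become visible once the pragma in
riscv_vector.h is seen, can be sketched as below. This is a minimal sketch
under stated assumptions, not the code from D103228: the FEAT_* bits, the
IntrinsicEntry type, the vfadd_f16 entry, and the handle_intrinsic_pragma()
and is_visible() functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits standing in for -march extensions. */
enum { FEAT_V = 1 << 0, FEAT_ZFH = 1 << 1 };

typedef struct {
    const char *name;
    unsigned required; /* the extra "enable condition" field */
} IntrinsicEntry;

/* Stand-in for a generated table; vfadd_f16 is a made-up name. */
static const IntrinsicEntry kTable[] = {
    {"vadd", FEAT_V},
    {"vfadd_f16", FEAT_V | FEAT_ZFH},
};

#define MAX_SYMBOLS 16
static const char *g_symbols[MAX_SYMBOLS]; /* toy symbol table */
static size_t g_num_symbols = 0;

int is_visible(const char *name) {
    for (size_t i = 0; i < g_num_symbols; ++i)
        if (strcmp(g_symbols[i], name) == 0)
            return 1;
    return 0;
}

/* Models the parser reaching the pragma: import every table entry whose
 * enable condition is met. Returns the number of newly imported names. */
size_t handle_intrinsic_pragma(unsigned enabled) {
    size_t added = 0;
    for (size_t i = 0; i < sizeof(kTable) / sizeof(kTable[0]); ++i) {
        if ((kTable[i].required & enabled) != kTable[i].required)
            continue; /* a required extension is not enabled */
        if (is_visible(kTable[i].name))
            continue; /* pragma seen twice: do not import again */
        assert(g_num_symbols < MAX_SYMBOLS);
        g_symbols[g_num_symbols++] = kTable[i].name;
        ++added;
    }
    return added;
}
```

Until handle_intrinsic_pragma() runs, no name from the table is visible, which
models not importing the symbols before riscv_vector.h is included.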

Thanks!

On Wed, Jun 23, 2021 at 1:50 AM Anastasia Stulova via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>
> FYI, in case it helps we have started documentation about the internals of the
> approach https://clang.llvm.org/docs/OpenCLSupport.html#opencl-builtins.
> Although it is still a bit concise. There is not much OpenCL specific in the
> approach we have implemented so it should be easily generalizable with some
> renaming and minor refactoring (CC to Sven who might be able to provide more
> info if needed). You might need to add a few special types if you use any that
> we don't have in OpenCL yet. Although we have covered a good variety from C99
> already.
>
> We have removed the need for the pragmas in the last commits but it is mainly
> because it wasn't useful in OpenCL in a way it was defined in the spec as it
> was not similar to a header include. The TableGen based header include is very
> fast compared to parsing the large header files so I can certainly recommend
> this route.
>
> Cheers,
> Anastasia
> ________________________________
> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> on behalf of Kito Cheng via cfe-dev <cfe-dev at lists.llvm.org>
> Sent: 22 June 2021 03:41
> To: David Rector <davrecthreads at gmail.com>
> Cc: Clang Dev <cfe-dev at lists.llvm.org>
> Subject: Re: [cfe-dev] [RFC][RISCV] Add intrinsic and/or builtin functions by #pragma
>
> Hi David:
>
> Thanks for the info. I have been investigating the OpenCL intrinsics for
> the last few days, and I saw that OpenCL already uses some #pragma
> directives to turn extensions on and off.
> So I think the mechanisms are pretty similar; the difference is that the
> OpenCL approach needs a new .td file to generate those helper
> functions.
>
> Our approach extends the existing builtin declaration mechanism: it adds
> one field to record the enable condition.
>
> We considered pre-compiled headers before, but they do not seem to fit
> the RISC-V scenario: different -march combinations affect the content
> of the header, so a pre-compiled header does not seem to work for the
> RISC-V intrinsic headers.
>
>
> Thanks :)
>
> On Tue, Jun 15, 2021 at 11:11 PM David Rector via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
> >
> > IIUC OpenCL faced the same issue, and their solution was pretty clever and generalizable; a similar approach could conceivably improve compile speeds still further, while also minimizing memory usage and making pragmas unnecessary.  https://lists.llvm.org/pipermail/cfe-dev/2021-February/067610.html
> >
> > The basic idea, if I recall (Anastasia cc’d might correct me), is to create the necessary declarations whenever lookup fails.  I.e., if lookup of `vint32m1_t` fails, before giving up clang checks if that is the name of one of your intrinsics; if so it adds the necessary declaration/overloaded declarations (the particulars handled via Tablegen) and returns that.
> >
> > The effect is to "instantiate" these declarations as needed, as if from a template.
> >
> > What also seems nice about this approach is that heavy-duty users can alternatively choose to just #include the large header, or use a pre-compiled header, and thereby automatically avoid any costs associated with this last-ditch-lookup solution.
> >
> > On Jun 15, 2021, at 2:59 AM, Kito Cheng via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> >
> > Hi :
> >
> >
> > # TL;DR:
> >
> > This is the intrinsic and/or builtin function issue again. In this RFC
> > we try to use a pragma to import the intrinsics and declare the
> > intrinsic wrapper functions, in order to reduce compilation time.
> >
> > And here is the PoC for this RFC:
> > https://reviews.llvm.org/D103228
> >
> > # Background:
> >
> > The RISC-V vector extension defines 25,386 intrinsic functions and 2,102
> > overloaded intrinsic functions in riscv_vector.h, which increases
> > compilation time considerably; the header file contains ~60k lines for
> > those overloaded functions and intrinsic wrapper functions.
> >
> > Compiling an empty file that includes riscv_vector.h takes 0.395s with a
> > release build and 8.067s with a debug build, and this also increases the
> > clang test time.
> >
> > # Proposal:
> >
> > Use TableGen to generate the table of intrinsic wrapper functions,
> > and then use a pragma to declare those wrapper functions.
> >
> > Syntax:
> > ```c
> > #pragma riscv intrinsic vector
> > ```
> >
> > The pragma then imports all builtin functions and intrinsic wrappers into
> > the symbol table, which saves a lot of time otherwise spent parsing the
> > prototypes of the intrinsic wrapper functions.
> >
> > This trick is borrowed from the AArch64/SVE implementation in GCC:
> > https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/arm_sve.h#L40
> >
> >
> > # Experimental Results:
> > ## Size of riscv_vector.h:
> >        | Size (bytes) |    LoC |
> > -------|--------------|--------|
> > Before |    4,434,725 | 69,749 |
> > After  |        5,463 |    159 |
> >
> > ## Compilation Speed for Simple File
> >
> > testcase:
> > ```c
> > #include <riscv_vector.h>
> >
> > vint32m1_t test_vadd_vv_vfloat32m1_t(vint32m1_t op1, vint32m1_t op2,
> >                                      size_t vl) {
> >   return vadd(op1, op2, vl);
> > }
> > ```
> >
> > Release build:
> >  Before: 0m0.417s
> >  After:  0m0.090s
> >
> > Debug build:
> >  Before: 0m8.016s
> >  After:  0m2.295s
> >
> >
> > ## Regression Time
> > LLVM regression on our 48 core server:
> > Release build:
> >  Before : Testing Time: 203.81s
> >  After : Testing Time: 181.13s
> >
> > Debug build:
> >  Before : Testing Time: 675.18s
> >  After : Testing Time: 647.20s
> >
> >
> >
> > Any comments or feedback are appreciated!
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

