[cfe-dev] [RFC][OpenCL] Implement OpenCL builtin functions in Clang

Wed Oct 10 04:08:54 PDT 2018

Hello all,

I posted my prototype here for review/discussion: https://reviews.llvm.org/D53023

I included some explanation / TODOs there.

Thanks,

Joey

________________________________
From: Anastasia Stulova
Sent: 01 October 2018 14:21
To: Andrew Savonichev
Cc: nd; Sven Van Haastregt; Joey Gouly; clang-dev developer list
Subject: Re: [cfe-dev] [RFC][OpenCL] Implement OpenCL builtin functions in Clang

Hi Andrew,

Thanks for your early feedback. Yes, your concerns are valid and we would have to look into both alternatives in more details, especially in terms of performance measurements with and without OpenCL mode.

We will be working on the prototype now and try to put something upstream ASAP for evaluation. If we then prefer this alternative we can continue improving it and go for the complete solution. If not, we can go ahead with your solution then. But the main message is that we should implement a solution to speed up parsing/loading of OpenCL builtin function declarations.

Cheers,

Anastasia

________________________________
From: Andrew Savonichev <andrew.savonichev at intel.com>
Sent: 28 September 2018 15:14
To: Anastasia Stulova
Cc: nd; Sven Van Haastregt; Joey Gouly; clang-dev developer list
Subject: Re: [cfe-dev] [RFC][OpenCL] Implement OpenCL builtin functions in Clang

Hi Anastasia,

Thank you for taking time to review.

> Potential problems with this approach are:
>
> - The maintenance costs as it's harder to modify Clang than just
> change the header in case we want to extend the list of builtins or
> modify them. Meaning that you have to add logic to clang to handle
> what is essentially just a list of declarations. Clang builtins
> weren't really intended to solve this problem.

It is definitely harder to modify Clang vs a change in a header file,
although it is expected that all difficult code is shared between all
builtins, making it easy to add a new builtin.

> - It is more risky in terms of bugs because it might not be easy to
> cover all use cases.

This might be true, considering a custom typecheck code that we have to
write to handle implicit conversions (which comes for free if we use
headers).

On the other hand, you'll need to write this typecheck code once, and
all other OpenCL builtins can re-use it. Surely some bugs may happen,
but the problem is localized and it is easy to test.

> - Affects parsing time to check all the extra switch cases.

Switch is generally faster than any other lookup, so I assume that there
will be no performance implications after adding more switch-cases.

It also makes sense to put these switch-cases under `if (Lang.OpenCL)',
so it will have a minimal impact on other languages.

> - We will need to replace the builtins from the header with macros
> aliasing Clang builtins. I find macros really bad in terms of error
> reporting.

Agree. Any diagnostic related to builtins will be accompanied by "note:
expanded from macro 'convert_char'". This makes error messages a bit
harder to read.

I wonder if there is a way to have a user-facing alias for a builtin
function without using a preprocessor.

> But also they [macros] will still need to be parsed.

Right, but the number of macros should not be high: we need only one
macro to handle all overloadings. I expect ~600 macros for different
`convert_xxx' builtins, ~200 macros for math builtins, ~50 macros for
vload/vstore, etc.

In essence, there should be ~1000 macro declarations vs existing ~15000
function declarations. This has to be measured of course, but I expect a
significant improvement in parsing in comparison with the existing
(header) design.

> Also something important to consider is the impact on parsing for
> other languages, i.e. C/C++. I think we should evaluate this
> carefully. Do you have any numbers for this already?

>From my understanding, there shouldn't be any impact on parsing other
languages: __builtin_opencl_* are LANGBUILTIN, and they should only be
parsed for OpenCL.

> My team has a slightly different solution to this problem. It is based
> on generation of tries. Something similar to attribute parsing, see
> generated AttrParsedAttrKinds.inc. Unfortunately it's not in a shape
> that we can currently share. But it should not take longer than 2-3
> weeks to upload a prototype if there is interest.

Having an alternative to consider is always better. This can help us to
get a solution that fits for everyone.

> To give a rough idea, it uses a TableGen mechanism to describe the
> OpenCL builtin functions (similar to intrinsics in LLVM). During the
> Clang build phase, a trie file is generated. There is a hook in Clang
> that uses the generated trie to lookup the builtin function name
> during call resolution in Sema. After a successful lookup it generates
> the AST declaration of a function with overloads and the rest just
> follows the unmodified Clang flow (i.e. checking overloading,
> mangling, etc). This will cover all builtins, not just conversions and
> maths. It is easy to extend. It doesn't need to change much of Clang
> code, as it allows to follow normal compilation flow.

So when Sema reaches a call to, say, `acos', and cannot resolve it to
any existing declaration, the hook creates all `acos' function
declarations for all combination of types:

  float acos(float);
  float2 acos(float2);
  ...
  // same for other vector types, double and half

Then Clang's standard machinery chooses which overloading to use and
performs all necessary conversions.

Do I understand this correctly?

> Do you think it would make sense to evaluate this solution as well?

Absolutely. Although, there are several drawbacks/things to consider:

  1) This approach introduces another type of 'builtin function' to
     Clang: it is built-in into Clang, but it looks and behaves exactly
     as a user-defined function (except that it doesn't have a
     SourceLocation).

     This is something new for Clang (unless I miss something), so we
     probably need more people from Clang community to review this.

  2) If OpenCL builtin looks and behaves as a user-defined function, it
     will always be mangled the same way as it is mangled now (with
     headers).

     We cannot emit LLVM intrinsics instead of mangled functions, and,
     as I mentioned, LLVM intrinsics for builtins can make SPIR-V
     translation easier. Again, see [1].

  3) If my assumption is correct, you'll be generating dozens of
     function declarations (overloadings for all types) for every
     builtin that is used in a program. This might not be optimal in
     terms of performance, but it'll probably be good enough for general
     use-case.

Anyway, this approach looks pretty interesting.

  [1]: https://lists.llvm.org/pipermail/llvm-dev/2018-September/125977.html

--
Andrew

--------------------------------------------------------------------
Joint Stock Company Intel A/O
Registered legal address: Krylatsky Hills Business Park,
17 Krylatskaya Str., Bldg 4, Moscow 121614,
Russian Federation

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181010/7330c96d/attachment.html>