[llvm-dev] RFC: Full 'restrict' support in LLVM

Jeroen Dobbelaere via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 19 08:33:35 PDT 2019


Hi all,



during the last US LLVM conference (October last year), I had two 'BoF restrict' sessions.

During these sessions, I discussed with a number of persons our view and mechanism on fixing the issues

with llvm's  restrict implementation and Hal Finkel's local restrict patches. One outcome of these discussions

was a request to put all of this together in an RFC and present it to the larger llvm community so that a

broader discussion can be started.



With thanks to Wouter, Bruno, Dirk, Gert, Troy, David, Johannes for helping with the initial review of this rfc.



Greetings,



Jeroen Dobbelaere

---



                                      2019-03-19  jeroen.dobbelaere at synopsys.com



RFC: Full 'restrict' support in LLVM



==  Introduction ==



LLVM only has basic support for restrict on function arguments. All other

restrict usages are ignored.  This document proposes a mechanism to get full

support for restrict in LLVM.



== 'restrict' ==



C99 introduces the 'restrict' keyword for pointer operands [1, 2]. It helps the

programmer to specify that memory accesses will not interfere with each

other. Based on such information, the compiler can decide to do better

optimizations and transformations, as it can assume that certain memory accesses

will never alias.  This can result in dramatic speed improvements.

Restrict pointers can occur as function arguments, local variables (local restrict),

struct members (restrict member pointers), global variables, ...



== Current status (LLVM-8) ==



LLVM mostly ignores the 'restrict' keyword. It only has support for restrict

pointer arguments of a function. Even there, the C restrict keyword is mapped

onto the 'noalias' argument attribute, which is in some way stronger than

'restrict' [5,9].



While inlining, LLVM tries to track the 'noalias' argument information through

the '!noalias' and '!alias.scope' metadata.  This easily results in

transformations that are not correct according to the C restrict rules, as this

metadata representation is not sufficient to retain all information that is

needed to represent all cases [6].



Another example that demonstrates this problem is shown in [7]:



  #define SIZE 1024



  // Testcase showing that llvm-ir is not able to

  // differentiate between following two cases.

  // (restrict outside ('0') or inside ('1') the loop)



  // Toggle '0' to '1' and see that the metadata

  // (!noalias, !alias.scope) remains similar.

  #if 0

  static __attribute__((always_inline))

  void add_element(int* restrict pDest, int* restrict pA, int* restrict pB) {

      *pDest = *pA+*pB;

  }



  void test_add_array01(int* pDest, int* pA, int* pB) {

    for (int i=0; i<SIZE; ++i) {

       // restrict only holds per iteration, not across iterations

      add_element(pDest++, pA++, pB++);

    }

  }

  #else

  static __attribute__((always_inline))

  void add_array(int* restrict pDest, int* restrict pA, int* restrict pB) {

    for (int i=0; i<SIZE; ++i) {

      *pDest++=*pA++ + *pB++; // restrict holds across iterations

    }

  }



  void test_add_array02(int* pDest, int* pA, int* pB) {

    // restrict info is similar to 'test_add_array01', no differentiation possible

    add_array(pDest, pA, pB);

  }

  #endif





== 'local restrict' patches ==



Around 2014/2015, Hal Finkel proposed a number of patches, fixing some of the

observed issues with restrict, while also adding support for local restrict [4].



Hal had the following goals:

* Fix the issue with inlined functions, where restrictness inside vs outside

  the loop body was not decidable

* Enable support for local restrict



Some of the initial patches have been added to LLVM in the meantime. But, a

number of main changes are still pending, due to slow review progress, but also

due to a number of issues that popped up in the meantime.



How it works:

* Whenever a value is assigned to a restrict pointer, a 'llvm.noalias %p,

  metadata !x' intrinsic is put between the assignment and the value.

  '!x' is a list of a single scope id that represents the specific restrict pointer.

* Whenever a memory instruction is produced, the '!noalias !y' metadata scope is

  added.

  '!y' is a list of all scoped id's that are active at the moment of the memory

  instruction.

* The combination of the intrinsic and the !noalias scope allows to check if two

  memory accesses can/cannot alias according to the restrict rules.



Observed issues:

* Performance:

** The intrinsic blocks certain optimizations.

** In some combinations of local restrict and inlining, the restrictness is

   lost, resulting in slow code.

* Correctness:

** Some optimizations (like LSR) eliminate the intrinsics, possibly resulting

   in wrong code.

** Loop unrolling can result in wrong code. [6]



== A way out ==



I had the following goals when trying to fix the observed issues with the local

restrict implementation:

* Fix the known issues wrt to the local restrict implementation. Both

  correctness (must) and performance (as much as possible)

* Make it safe(r) against the unexpected effect of (new) optimization passes:

  prefer slow code over incorrect code

* Make it easy to add support for restrict member pointers.

* Make it feasible to implement the 'alias_set' proposal (n4150) [3]



High level strategy:

* Keep most of the restrictness annotations out the way of the normal program

  flow, except when it is needed for correctness.

* Follow the restrict specification [2] to the letter. More precisely:

** Track the declaration of a 'object P'.

** Each distinct restrict pointer 'object P' points to its own set of objects

   (within the valid scope).

   As such, restrictness is introduced when using a restrict pointer

   (not when assigning it).



   Example:

     void foo(int* restrict rP, int* restrict * pRP) {

        int* p2=rP;   // 'using rP' introduces restrictness

        *p2=42;       // dereferencing a pointer based on a restrict pointer

     rP=&someInt;  // assignment does not introduce (additional) restrictness

     int* p3=*pRP; // 'using *pRP' introduces restrictness

     *p3=43;       // dereferencing a pointer based on a restrict pointer

     }

** Track the 'based on' expressions in an accurate way.



NOTE: Part of those ideas have been discussed in small group during 2 BOF

      sessions during the 2018 US LLVM Conference.



The following sections explain how to come to a full set of intrinsics that should

support restrict:



(A) The Basics



In principle, two intrinsics should be sufficient:

* 'llvm.noalias.decl %p.alloc, metadata !Scope' intrinsic:

** indicates at what place in the program a restrict pointer has been declared.

** %p.alloc: represents the alloca associated with the declaration.

** !Scope: metadata that represents a unique scope, associated with this specific

   declaration.

** NOTE: the llvm.noalias.decl intrinsic can normally not be moved outside

   loops. Its purpose is to identify the freedom that a restrict pointer has

   wrt to the loop bodies and other blocks.  As such it helps to resolve

   problems as seen in [6] and [7].





* 'llvm.noalias %p, %p.decl, %p.addr' intrinsic:

** created when a restrict pointer object P is _read_.

   It is used to track the 'based-on' dependency.

** %p: is the pointer value that was read. This is also the value that the

   intrinsic returns.

** %p.decl: refers to the 'llvm.noalias.decl' that is associated with the

   restrict pointer.

** %p.addr represents the address of 'object P'

** NOTE: sometimes the declaration is not known upfront, in that case '%p.decl'

   is 'null'. After inlining and/or optimizations, it can be possible to infer

   the llvm.noalias.decl (based on %p.addr).



Example A:

  int foo(int* pA, int* pB) {

    int * restrict rpA=pA;

    *rpA=42;

    *pB=99;

    return *rpA;

  }



  (pseudo LLVM-IR)

  define @foo(i32* pA, i32* pB) {

    %rpA.address = alloca i32*

    %rpA.decl = llvm.noalias.decl %rpA.address, !metadata !10 ; declaration of a restrict pointer

    store i32* %pA, %rpA.address, !noalias !10

    %rpA = load i32* %rpA.address, !noalias !10

    %rpA.1 = i32* llvm.noalias %rpA, %rpA.decl, %rpA.address ; reading of a restrict pointer

    store i32 42, %rpA.1, !noalias !10

    store i32 99, %pB, !noalias !10

    %1 = load i32 %rpA.1, !noalias !10

    ret i32 %1

  }



NOTES:

* this is slightly different from Hal Finkel's approach, where the

  'llvm.noalias' intrinsics is used when storing a value in the pointer.

* this representation suffers from possible blocking of optimizations. But, when

  'llvm.noalias' is made too transparent, it can result in wrong code.



Summary:

* llvm.noalias.decl %p.alloc, metadata !Scope

* llvm.noalias %p, %p.decl, %p.addr



(B) The side channel



We introduce a 'side channel' for load/store instructions. The idea is that we

use the 'pointer operand' for normal pointer computations, and the 'side channel

operand' for restrict related dependencies. Optimizations (like LSR) can modify the

'pointer operand' as they see fit. As long as the 'side channel operand' is not

touched, we are still able to deduce the restrict related information. When an

optimization introduces a load/store without a side channel operand, we fall

back to the fail-safe 'worst case'.



NOTE: the name 'side channel' comes from the need to track the restrict information

  for load/store instructions 'on the side'. In a way that is hidden for normal

  optimizations.



Although the actual pointer computations can be removed in the side channel, it

can still contain PHI nodes and 'select' instructions.



As it is hard to track the usage of a pointer in clang, we will work on llvm-IR

level. Because of that the annotations exist in two states:



* Before 'noalias propagation'

  This state is produced by clang. It is similar to (A). The main difference is

  that the 'llvm.noalias' intrinsic becomes completely opaque for optimizations.



* After 'noalias propagation'

  A 'noalias propagation' pass is introduced that converts the llvm-IR code to

  the second state:

** llvm.noalias intrinsics are converted into 'llvm.side_noalias' intrinsics

** their usage is removed from the main pointer computations, and moved to the

   side channel of load/store instructions

** When a pointer depending on a llvm.noalias intrinsic is passed as an

   argument, returned from a function or stored into memory, we introduce a

   'llvm.noalias.arg.guard'.  This combines the original pointer computation

   with the side channel information. After inlining, it is also used to

   propagate the noalias information to the load/store instructions.



So, we now have two extra intrinsics:

* 'llvm.side.noalias %side.p, %p.decl, %p.addr' intrinsic:

** provides restrict information to a side channel.

** %side.p: tracks the side channel associated with the pointer value that was read.

** %p.decl: refers to the 'llvm.noalias.decl' that is associated with the

   restrict pointer.

** %p.addr: represents the address of 'object P'.



* 'llvm.noalias.arg.guard %p, %side.p' intrinsic:

** combines pointer and restrict information when a restrict pointer value

   escapes. It is normally used for function arguments, returns, or stores to

   memory.

** %p tracks the pointer computation

** %side.p tracks the side channel associated with the pointer value



Example B: After noalias propagation, example A becomes:

  (pseudo LLVM-IR)

  define @foo(i32* pA, i32* pB) {

    %rpA.address = alloca i32*

    %rpA.decl = llvm.noalias.decl %rpA.address, !metadata !10 ; declaration of a restrict pointer

    store i32* %pA, %rpA.address, noalias_sidechannel undef, !noalias !10

    %rpA = load i32* %rpA.address, noalias_sidechannel undef, !noalias !10

    %side.rpA.1 = i32* llvm.side.noalias %rpA, %rpA.decl, %rpA.address ; reading of a restrict pointer

    store i32 42, %rpA, noalias_sidechannel %side.rpA.1, !noalias !10

    store i32 99, %pB, noalias_sidechannel undef, !noalias !10

    %1 = load i32 %rpA, noalias_sidechannel %side.rpA.1, !noalias !10

    ret i32 %1

  }



Summary:

* llvm.noalias.decl %p.alloc, metadata !Scope

* llvm.noalias %p, %p.decl, %p.addr

* llvm.side.noalias %side.p, %p.decl, %p.addr

* llvm.noalias.arg.guard %p, %side.p



(C) Optimizations that eliminate the location of 'object P' from the stack



When SROA eliminates a local variable, we do not have an address for 'object P'

anymore (the alloca is removed and %p.addr becomes 'null'). At that moment we

must depend on the '!Scope' metadata to differentiate restrict objects.

For convenience, we also add this information to the llvm.noalias and

llvm.side.noalias intrinsics.



It is also possible that a single variable declaration contains multiple

restrict pointers (think of a struct containing multiple restrict pointers, or

an array of restrict pointers).  Instead of introducing new scopes when this

happens, we introduce an extra object ID (objId) parameter for

llvm.noalias.decl, llvm.noalias and llvm.side.noalias. This can be thought of as

the 'offset in the variable'.



Summary:

* llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope

* llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope

* llvm.side.noalias %side.p, %p.decl, %p.addr, i32 objId, metadata !Scope

* llvm.noalias.arg.guard %p, %side.p



For alias analysis, this means that two llvm.side.noalias intrinsics represent a

different 'object P0', 'object P1', if:

* %p0.addr and $p1.addr are different

* or, objId0 and objId1 are different

* or, !Scope0 and !Scope1 are different



(D) Optimizing a restrict pointer pointing to a restrict pointer



Example:

  int * restrict * restrict ppA = ...;

  int * restrict * restrict ppB = ...;



  **ppA=42;

  **ppB=99;

  return **ppA; // according to C99, 6.7.3.1 paragraph 4, **ppA and **ppB are not aliasing



In order to allow this optimization, we also need to track the !noalias scope

where the 'llvm.noalias' intrinsic was introduced.  The %p.addr parameter in the

'llvm.side.noalias' will also get a noalias side channel, through the

%side.p.addr argument.



In short, we treat the 'llvm.noalias' and 'llvm.side.noalias' intrinsics as if

they are a memory operation.



Summary:

* llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope

* llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope, !noalias !visible_scopes

* llvm.side.noalias %side.p, %p.decl, %p.addr, %side.p.addr, i32 objId, metadata !Scope, !noalias !visible_scopes

* llvm.noalias.arg.guard %p, %side.p



For alias analysis, this means that two llvm.side.noalias intrinsics represent a

different 'object P0', 'object P1', if:

* %p0.addr and $p1.addr are different

* or, objId0 and objId1 are different

* or, !Scope0 and !Scope1 are different

* or we can prove that (%p0.addr, %side.p0.addr, !visible_scopes0) and

  (%p1.addr, %side.p1.addr, !visible_scopes1) do not alias for both intrinsics.

  (as if we treat each of the two 'llvm.side.noalias' as a 'store to %p.addr'

  and we must prove that the two stores do not alias; also see [8], question 2)



(E) 'unknown function' scope



When the declaration of a restrict pointer is not visible, C99, 6.7.3.1

paragraph 2, says that the pointer is assumed to start living from 'main'.



This case can be handled by the 'unknown function' scope, which is annotated to

the function itself. This can be treated as saying: the scope of this restrict

pointer starts somewhere outside this function. In such case, the 'llvm.noalias'

and 'llvm.side.noalias' will not be associated with a 'llvm.noalias.decl'. It is

possible that after inlining, the scopes can be refined to a declaration which

became visible.



For convenience, each function can have its own unknown scope specified by a

'noalias !UnknownScope' metadata attribute on the function itself.



(F) Aggregate copies



Restrictness is introduced by 'reading a restrict pointer'. It is not always

possible to add the necessary 'llvm.noalias' annotation when this is done. An

aggregate containing one or more restrict pointers can be copied with a single

load/store pair or a llvm.memcpy. This makes it hard to track when a restrict

pointer is copied over.



For this, a final intrinsic is introduced: 'llvm.noalias.copy.guard':



* 'llvm.noalias.copy.guard %p.addr, %p.decl, metadata !indices, metadata !Scope' intrinsic

** Guards a %p.addr object that is copied as a single aggregate or llvm.memcpy

** %p.addr: the object to guard

** %p.decl: (when available), the llvm.noalias.decl associated with the object

** !indices: The !indices refers to a metadata list. Each element of the list

   refers to a set of indices where a restrict pointer is located.

** !Scope: the declaration scope of %p.decl



This information allows SROA to introduce the needed 'llvm.noalias' intrinsics

when a struct is split up.



Summary:

* potential '!noalias !UnknownScope' annotation at function level

* llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope

* llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope, !noalias !visible_scopes

* llvm.side.noalias %side.p, %p.decl, %p.addr, %side.p.addr, i32 objId, metadata !Scope, !noalias !visible_scopes

* llvm.noalias.arg.guard %p, %side.p

* llvm.noalias.copy.guard %p.addr, %p.decl, metadata !indices, metadata !Scope



(G) Optimization passes



For correctness, some optimization passes must be aware of the noalias intrinsics:

inlining [7], unrolling [6], loop rotation, ...  Whenever a body is duplicated that

contains a llvm.noalias.decl, it must be decided how that duplication must be done.

Sometimes new unique scopes must be introduced, sometimes not.



Other optimization passes can perform better by knowing about the side channels: when

new load/store instructions are introduced, adding side channel information can result

in better alias analysis for those load/stores.



== c++ alias_set ==



With this framework in place, it should be easy to extend it to support the

'alias_set' proposal [3]. This can be done by tracking a separate 'universe

object', instead of 'object P'.





== References ==

* [1] https://en.wikipedia.org/wiki/Restrict

* [2] WG14 N1256: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf (Chapter 6.7.3.1 Formal definition of restrict)

* [3] WG21 N4150: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4150.pdf

* [4] https://reviews.llvm.org/D9375   Hal Finkel's local restrict patches

* [5] https://bugs.llvm.org/show_bug.cgi?id=39240 "clang/llvm looses restrictness, resulting in wrong code"

* [6] https://bugs.llvm.org/show_bug.cgi?id=39282 "Loop unrolling incorrectly duplicates noalias metadata"

* [7] https://www.godbolt.org/z/cUk6To "testcase showing that llvm-ir is not able to differentiate if restrict is done inside or outside the loop"

* [8] DR294: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_294.htm

* [9] WG14 N2250: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2260.pdf  Clarifying the restrict Keyword v2




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190319/097e97ce/attachment-0001.html>


More information about the llvm-dev mailing list