<div><div dir="auto">I’ve written a number of benchmarks related to multidimensional arrays (d=2 to d=6) in multiple languages (C99 and Fortran 77 in one case and 5+ languages in others). Would any of that be useful here, or is the value obvious enough without benchmarking?</div></div><div dir="auto"><br></div><div dir="auto">Jeff</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 24, 2019 at 7:34 PM David Greene via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It's also very common in Fortran.<br>
<br>
-David<br>
<br>
Michael Ferguson via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> writes:<br>
<br>
>> It seems that the main advantage of your proposal is that it would<br>
>> allow for non-constant strides (i.e. variable length arrays) in<br>
>> dimensions other than the first one. Do these appear frequently<br>
>> enough in the programs that you're interested in to be worth<br>
>> optimizing for?<br>
><br>
> Yes - at least in Chapel (which is one of the motivating languages)<br>
> these are very common.<br>
> In other words, typical Chapel programs use arrays with a size that is<br>
> not fixed at compile time.<br>
><br>
> Thanks,<br>
><br>
> -michael<br>
><br>
> On Mon, Jul 22, 2019 at 1:02 PM Peter Collingbourne via llvm-dev<br>
> <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
>><br>
>> The restrictions of this new intrinsic seem very similar to those of<br>
>> the inrange attribute on GEPs. It seems that the main advantage of<br>
>> your proposal is that it would allow for non-constant strides<br>
>> (i.e. variable length arrays) in dimensions other than the first<br>
>> one. Do these appear frequently enough in the programs that you're<br>
>> interested in to be worth optimizing for?<br>
>><br>
>> Peter<br>
>><br>
>> On Sun, Jul 21, 2019 at 3:54 PM Siddharth Bhat via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
>>><br>
>>> Hello,<br>
>>><br>
>>> We would like to begin discussions around a new set of intrinsics, to better express<br>
>>> multi-dimensional array indexing within LLVM. The motivations and a possible design<br>
>>> are sketched out below.<br>
>>><br>
>>> Rendered RFC link here<br>
>>><br>
>>> Raw markdown:<br>
>>><br>
>>> # Introducing a new multidimensional array indexing intrinsic<br>
>>><br>
>>> ## The request for change: A new intrinsic, `llvm.multidim.array.index.*`.<br>
>>><br>
>>> We propose the addition of a new family of intrinsics, `llvm.multidim.array.index.*`.<br>
>>> This will allow us to represent array indexing into an array `A[d1][d2][..][dk]`<br>
>>> as `llvm.multidim.array.index.k A d1, d2, d3, ... dk` instead of flattening<br>
>>> the information into `gep A, d1 * n1 + d2 * n2 + ... + dk * nk`. The former<br>
>>> unflattened representation is advantageous for analysis and optimisation. It also<br>
>>> allows us to represent array indexing semantics of languages such as Fortran<br>
>>> and Chapel, which differs from that of C based array indexing.<br>
>>><br>
>>> ## Motivating the need for a multi-dimensional array indexing intrinsic<br>
>>><br>
>>> There are primarily one of two views of array indexing involving<br>
>>> multiple dimensions most languages take, which we discuss to motivate<br>
>>> the need for a multi-dimensional array index. This consideration impacts<br>
>>> the kinds of analysis we can perform on the program. In Polly, we care about<br>
>>> dependence analysis, so the examples here will focus on that particular problem.<br>
>>><br>
>>> Let us consider an array indexing operation of the form:<br>
>>> ```cpp<br>
>>> int ex1(int n, int m, B[n][m], int x1, int x2, int y1, int y2) {<br>
>>> __builtin_assume(x1 != x2);<br>
>>> __builtin_assume(y1 != y2);<br>
>>> B[x1][y1] = 1;<br>
>>> printf("%d", B[x2][y2]);<br>
>>> exit(0);<br>
>>> }<br>
>>> ```<br>
>>><br>
>>> One would like to infer that the array indices _interpreted as tuples_<br>
>>> `(x1, y1)` and `(x2, y2)` do not have the same value, due to the guarding asserts<br>
>>> that `x1 != x2` and `y1 != y2`. As a result, the write `B[x1][y1] = 1` can<br>
>>> in no way interfere with the value of `B[x2][y2]`. Consquently,<br>
>>> we can optimise the program into:<br>
>>><br>
>>><br>
>>> ```cpp<br>
>>> int ex1_opt(int n, int m, B[n][m], int x1, int x2, int y1, int y2) {<br>
>>> // B[x1][y1] = 1 is omitted because the result<br>
>>> // cannot be used:<br>
>>> // It is not used in the print and then the program exits<br>
>>> printf("%d", B[x2][y2]);<br>
>>> exit(0);<br>
>>> }<br>
>>> ```<br>
>>><br>
>>> However, alas, this is illegal, for the C language does not provide<br>
>>> semantics that allow the final inference above. It is conceivable that<br>
>>> `x1 != x2, y1 != y2`, but the indices do actually alias, since<br>
>>> according to C semantics, the two indices alias if the _flattened<br>
>>> representation of the indices alias_. Consider the parameter<br>
>>> values:<br>
>>><br>
>>> ```<br>
>>> n = m = 3<br>
>>> x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3<br>
>>> x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3<br>
>>> ```<br>
>>><br>
>>> Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and<br>
>>> so the transformation proposed in `ex1_opt` is unsound in general.<br>
>>><br>
>>><br>
>>> In contrast, many langagues other than C require that index<br>
>>> expressions for multidimensional arrays have each component within the<br>
>>> array dimension for that component. As a result, in the example above,<br>
>>> the index pair `(0,3)` would be out-of-bounds. In languages with these<br>
>>> semantics, one can infer that the indexing:<br>
>>><br>
>>> `[x1][y1] != [x2][y2]` iff `x1 != x2 || y1 != y2`.<br>
>>><br>
>>> While this particular example is not very interesting, it shows the<br>
>>> spirit of the lack of expressiveness in LLVM we are trying to<br>
>>> improve.<br>
>>><br>
>>> Julia, Fortran, and Chapel are examples of such languages which target<br>
>>> LLVM.<br>
>>><br>
>>> Currently, the LLVM semantics of `getelementptr` talk purely of<br>
>>> the C style flattened views of arrays. This inhibits the ability of the optimiser<br>
>>> to understand the multidimensional examples as given above, and we<br>
>>> are forced to make conservative assumptions, inhibiting optimisation.<br>
>>> This information must ideally be expressible in LLVM, such that LLVM's optimisers and<br>
>>> alias analysis can use this information to model multidimensional-array<br>
>>> semantics.<br>
>>><br>
>>> There is a more realistic (and involved) example in [Appendix A](#Appendix-A)<br>
>>> in the same spirit as the above simple example, but one a compiler might<br>
>>> realistically wish to perform.<br>
>>><br>
>>> ## Evaluation of the impact of the intrinsic on accuracy of dependence analysis<br>
>>><br>
>>> - This has been implemented in an exprimental branch of Polly, and was used<br>
>>> on the COSMO climate weather model. This greatly helped increase the accuracy<br>
>>> of Polly's analysis, since we eliminated the guessing game from the array analysis.<br>
>>><br>
>>> - This has also been implemented as part of a GSoC effort to unify<br>
>>> Chapel and Polly.<br>
>>><br>
>>> - Molly, the distributed version of Polly written by Michael Kruse for his PhD<br>
>>> also implemented a similar scheme. In his use-case, optimistic run-time checks<br>
>>> with delinearization was not possible, so this kind of intrinsic was _necessary_<br>
>>> for the application, not just _good to have_. More details are available<br>
>>> in his PhD thesis: [Lattice QCD Optimization and Polytopic Representations of Distributed Memory](<a href="https://tel.archives-ouvertes.fr/tel-01078440)" rel="noreferrer" target="_blank">https://tel.archives-ouvertes.fr/tel-01078440)</a>.>> In particular, Chapter 9 contains a detailed discussion.<br>
>>><br>
>>> # Representations<br>
>>><br>
>>> ## Intrinsic<br>
>>><br>
>>> ### Syntax<br>
>>> ```<br>
>>> <result> = llvm.multidim.array.index.* <ty> <ty>* <ptrval> {<stride>, <idx>}*<br>
>>> ```<br>
>>><br>
>>> ### Overview:<br>
>>><br>
>>> The `llvm.multidim.array.index.*` intrinsic is used to get the address of<br>
>>> an element from an array. It performs address calcuation only and<br>
>>> does not access memory. It is similar to `getelementptr`. However, it imposes<br>
>>> additional semantics which allows the optimiser to provide better optimisations<br>
>>> than `getlementptr`.<br>
>>><br>
>>><br>
>>><br>
>>> ### Arguments:<br>
>>><br>
>>> The first argument is always a type used as the basis for the calculations. The<br>
>>> second argument is always a pointer, and is the base address to start the<br>
>>> calculation from. The remaining arguments are a list of pairs. Each pair<br>
>>> contains a dimension stride, and an offset with respect to that stride.<br>
>>><br>
>>><br>
>>> ### Semantics:<br>
>>><br>
>>> `llvm.multidim.array.index.*` represents a multi-dimensional array index, In particular, this will<br>
>>> mean that we will assume that all indices `<idx_i>` are non-negative.<br>
>>><br>
>>> Additionally, we assume that, for each `<str_i>`, `<idx_i>` pair, that<br>
>>> `0 <= idx_i < str_i`.<br>
>>><br>
>>> Optimizations can assume that, given two llvm.multidim.array.index.* instructions with matching types:<br>
>>><br>
>>> ```<br>
>>> llvm.multidim.array.index.* <ty> <ty>* <ptrvalA> <strA_1>, <idxA_1>, ..., <strA_N>, <idxA_N><br>
>>> llvm.multidim.array.index.* <ty> <ty>* <ptrvalB> <strB_1>, <idxB_1>, ..., <strB_N>, <idxb_N><br>
>>> ```<br>
>>><br>
>>> If `ptrvalA == ptrvalB` and the strides are equal `(strA_1 == strB_1 && ... && strA_N == strB_N)` then:<br>
>>><br>
>>> - If all the indices are equal (that is, `idxA_1 == idxB_1 && ... && idxA_N == idxB_N`),<br>
>>> then the resulting pointer values are equal.<br>
>>><br>
>>> - If any index value is not equal (that is, there exists an `i` such that `idxA_i != idxB_i`),<br>
>>> then the resulting pointer values are not equal.<br>
>>><br>
>>><br>
>>> ##### Address computation:<br>
>>> Consider an invocation of `llvm.multidim.array.index.*`:<br>
>>><br>
>>> ```<br>
>>> <result> = call @llvm.multidim.array.index.* <ty> <ty>* <ptrval> <str_0>, <idx_0>, <str_1> <idx_1>, ..., <str_n> <idx_n><br>
>>> ```<br>
>>><br>
>>> If the pairs are denoted by `(str_i, idx_i)`, where `str_i` denotes the stride<br>
>>> and `idx_i` denotes the index of the ith pair, then the final address (in bytes)<br>
>>> is computed as:<br>
>>><br>
>>> ```<br>
>>> ptrval + len(ty) * [(str_0 * idx_0) + (str_1 * idx_1) + ... (str_n * idx_n)]<br>
>>> ```<br>
>>><br>
>>> ## Transitioning to `llvm.multidim.array.index.*`: Allow `multidim_array_index` to refer to a GEP instruction:<br>
>>><br>
>>> This is a sketch of how we might gradually introduce the `llvm.multidim.array.index.*`<br>
>>> intrinsic into LLVM without immediately losing the analyses<br>
>>> that are performed on `getelememtptr` instructions. This section<br>
>>> lists out some possible choices that we have, since the authors<br>
>>> do not have a "best" solution.<br>
>>><br>
>>> ##### Choice 1: Write a `llvm.multidim.array.index.*` to `GEP` pass, with the `GEP` annotated with metadata<br>
>>><br>
>>> This pass will flatten all `llvm.multidim.array.index.*` expressions into a `GEP` annotated with metadata. This metadata will indicate that the index expression computed by the lowered GEP is guaranteed to be in a canonical form which allows the analysis<br>
>>> to infer stride and index sizes.<br>
>>><br>
>>> A multidim index of the form:<br>
>>> ```<br>
>>> %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2<br>
>>> ```<br>
>>><br>
>>> is lowered to:<br>
>>><br>
>>> ```<br>
>>> %mul1 = mul nsw i64 %str_1, %idx_1<br>
>>> %mul2 = mul1 nsw i64 %str_2, %idx_2<br>
>>> %total = add nsw i64 %mul2, %mul1<br>
>>> %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1<br>
>>> ```<br>
>>> with guarantees that the first term in each multiplication is the stride<br>
>>> and the second term in each multiplication is the index. (What happens<br>
>>> if intermediate transformation passes decide to change the order? This seems<br>
>>> complicated).<br>
>>><br>
>>> **TODO:** Lookup how to attach metadata such that the metadata can communicate<br>
>>> which of the values are strides and which are indeces<br>
>>><br>
>>><br>
>>><br>
>>> # Caveats<br>
>>><br>
>>> Currently, we assume that the array shape is immutable. However, we will need to deal with<br>
>>> being able to express `reshape` like primitives where the array shape can be mutated. However,<br>
>>> this appears to make expressing this information quite difficult: We now need to attach the shape<br>
>>> information to an array per "shape-live-range".<br>
>>><br>
>>><br>
>>> ## Appendix-A<br>
>>><br>
>>> ##### A realistic, more involved example of dependence analysis going wrong<br>
>>><br>
>>> ```cpp<br>
>>> // In an array A of size (n0 x n1),<br>
>>> // fill a subarray of size (s0 x s1)<br>
>>> // which starts at an offset (o0 x o1) in the larger array A<br>
>>> void set_subarray(unsigned n0, unsigned n1,<br>
>>> unsigned o0, unsigned o1,<br>
>>> unsigned s0, unsigned s1,<br>
>>> float A[n0][n1]) {<br>
>>> for (unsigned i = 0; i < s0; i++)<br>
>>> for (unsigned j = 0; j < s1; j++)<br>
>>> S: A[i + o0][j + o1] = 1;<br>
>>> }<br>
>>> ```<br>
>>> We first reduce this index expression to a sum of products:<br>
>>><br>
>>> ```<br>
>>> (i + o0) * n1 + (j + o1) = n1i + n1o0 + j + o1<br>
>>> ix(i, j, n0, n1, o0, o1) = n1i + n1o0 + j + o1<br>
>>> ```<br>
>>><br>
>>> `ix` is the index expression which `LLVM` will see, since it is fully<br>
>>> flattened, in comparison with the multi-dimensional index expression<br>
>>> `index:[i + o0][j + o1] | sizes:[][n1]`.<br>
>>><br>
>>> We will now show _why_ this multi-dimensional index is not always correct,<br>
>>> and why guessing for one language will not work for another:<br>
>>><br>
>>> Consider a call `set_subarray(n0=8, n1=9, o0=4, o1=6, s0=3, s1=6)`. At face<br>
>>> value, this is incorrect if we view the array as 2D grid, since the size<br>
>>> of the array is `(n0, n1) = (8, 9)`, but we are writing an array of size<br>
>>> `(s0, s1) = (3, 6)` starting from `(o0, o1) = (4, 6)`. Clearly, we will<br>
>>> exceed the width of the array, since `(s1 + o1 = 6 + 6 = 12) > (n1 = 9)`.<br>
>>> However, now think of the array as a flattened 1D representation. In this<br>
>>> case, the total size of the array is `n1xn2 = 8x9 = 72`, while the largest<br>
>>> element we will access is at the largest value of `(i, j)`. That is,<br>
>>> `i = s0 - 1 = 2`, and `j = s1 - 1 = 5`.<br>
>>><br>
>>> The largest index will be `ix(i=2, j=5, n0=8, n1=9, o0=4, o1=6) = 8*2+8*4+5+6=59`.<br>
>>> Since `59 < 72`, we are clearly at _legal_ array indices, by C semantics!<br>
>>><br>
>>> The definition of the semantics of the language **changed the illegal<br>
>>> multidimensional access** (which is illegal since it exceeds the `n1`<br>
>>> dimension), into a **legal flattened 1D access** (which is legal since the<br>
>>> flattened array indices are inbounds).<br>
>>><br>
>>> LLVM has no way of expressing these two different semantics. Hence, we are<br>
>>> forced to:<br>
>>> 1. Consider flattened 1D accesses, which makes analysis of index expressions<br>
>>> equivalent to analysis of polynomials over the integers, which is hard.<br>
>>> 2. Guess multidimensional representations, and use them at the expense of<br>
>>> soundness bugs as shown above.<br>
>>> 3. Guess multidimensional representations, use them, and check their validity<br>
>>> at runtime, causing a runtime performance hit. This implementation follows<br>
>>> the description from the paper [Optimistic Delinearization of Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays).<br>
>>><br>
>>><br>
>>> Currently, Polly opts for option (3), which is to emit runtime checks. If<br>
>>> the run-time checks fail, then Polly will not run its optimised code. Instead,<br>
>>> It keeps a copy of the unoptimised code around, which is run in this case.<br>
>>> Note that this effectively doubles the amount of performance-sensitive code<br>
>>> which is finally emitted after running Polly.<br>
>>><br>
>>> Ideally, we would like a mechanism to directly express the multidimensional<br>
>>> semantics, which would eliminate this kind of guesswork from Polly/LLVM,<br>
>>> which would both make code faster, and easier to analyze.<br>
>>><br>
>>> ## References<br>
>>> - [The chapel language specification](<a href="https://chapel-lang.org/docs/1.13/_downloads/chapelLanguageSpec.pdf" rel="noreferrer" target="_blank">https://chapel-lang.org/docs/1.13/_downloads/chapelLanguageSpec.pdf</a>)>> - [Fortran 2003 standard](<a href="http://www.j3-fortran.org/doc/year/04/04-007.pdf%7D%7BFortran" rel="noreferrer" target="_blank">http://www.j3-fortran.org/doc/year/04/04-007.pdf}{Fortran</a> 2003 standard)<br>
>>> - [C++ subscripting](<a href="http://eel.is/c++draft/expr.sub" rel="noreferrer" target="_blank">http://eel.is/c++draft/expr.sub</a>)>> - [Michael Kruse's PhD thesis: Lattice QCD Optimization and Polytopic Representations of Distributed Memory](<a href="https://tel.archives-ouvertes.fr/tel-01078440)" rel="noreferrer" target="_blank">https://tel.archives-ouvertes.fr/tel-01078440)</a>.>> - [Molly source code link for the intrinsic](<a href="https://github.com/Meinersbur/llvm/blob/molly/include/llvm/IR/IntrinsicsMolly.td#L3" rel="noreferrer" target="_blank">https://github.com/Meinersbur/llvm/blob/molly/include/llvm/IR/IntrinsicsMolly.td#L3</a>)>> -[Optimistic Delinearization of Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays)<br>
>>><br>
>>> [optim-delin-parametrically-sized-arrays]: <a href="http://delivery.acm.org/10.1145/2760000/2751248/p351-grosser.pdf?ip=93.3.109.183&id=2751248&acc=CHORUS&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1557948443_2f56675c6d04796f27b84593535c9f70" rel="noreferrer" target="_blank">http://delivery.acm.org/10.1145/2760000/2751248/p351-grosser.pdf?ip=93.3.109.183&id=2751248&acc=CHORUS&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1557948443_2f56675c6d04796f27b84593535c9f70</a>>><br>
>>><br>
>>> _______________________________________________<br>
>>> LLVM Developers mailing list<br>
>>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>><br>
>><br>
>><br>
>> --<br>
>> --<br>
>> Peter<br>
>> _______________________________________________<br>
>> LLVM Developers mailing list<br>
>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev_______________________________________________" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev_______________________________________________</a><br>
> LLVM Developers mailing list<br>
> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>