[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

Michael Kruse via llvm-dev llvm-dev at lists.llvm.org
Mon Jul 22 08:59:13 PDT 2019


On Mon, 22 Jul 2019 at 10:50, Doerfert, Johannes
<jdoerfert at anl.gov> wrote:
> Why introduce a new intrinsic (family)? It seems that would require us
> to support GEPs and GEP + "multi-dim" semantics in various places. What is
> the benefit over a GEP extension?

Adding an intrinsic is easier than adding a new instruction or
extending an existing one, as suggested by
https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning

Extending GEP would require all passes to understand the added
semantics on day 1, while a new intrinsic allows us to gradually add
support.
We can still make the intrinsic into an instruction when existing
passes understand the new functionality.


> > However, alas, this is illegal, for the C language does not provide
> > semantics that allow the final inference above. It is conceivable that
> > `x1 != x2, y1 != y2`, but the indices do actually alias, since
> > according to C semantics, the two indices alias if the _flattened
> > representation of the indices alias_. Consider the parameter
> > values:
> >
> > ```
> > n = m = 3
> > x1 = 1, y1 = 0; B[x1][y1] = n*x1+y1 = 3*1+0 = 3
> > x2 = 0, y2 = 3; B[x2][y2] = n*x2+y2 = 3*0+3 = 3
> > ```
> >
> > Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and
> > so the transformation proposed in `ex1_opt` is unsound in general.
>
> I'm unsure your example actually showcases the problem:
>
> C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior:
>
> An array subscript is out of range, even if an object is apparently
> accessible with the given subscript (as in the lvalue expression a[1][7]
> given the declaration int a[4][5]) (6.5.6).


GEP requires an array type, which in LLVM has a static size. For
dynamically sized arrays (in the C/C++ language family, these are only
supported by C99 VLAs), clang emits the linearized index expression
followed by a single-dimensional GEP.

The multi-dimensional GEP instruction has these semantics when the
`inrange` modifier is used, which clang currently does not generate
for subscript expressions. Given that there might be code around
relying on this behavior, and given that the C++ standard does not
contain this paragraph, I am not sure we can change it to undefined
behavior, even if the C standard would permit it. I think this is a
separate discussion, since the main motivation of the RFC is languages
that make broader use of runtime-length arrays.

But for this example, you are right that, according to the C
standard, this aliasing might be undefined behavior.

Michael

