[llvm-dev] [RFC] A new multidimensional array indexing intrinsic

Philip Reames via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 23 09:31:50 PDT 2019


On 7/22/19 8:59 AM, Michael Kruse via llvm-dev wrote:
> On Mon, Jul 22, 2019 at 10:50, Doerfert, Johannes
> <jdoerfert at anl.gov> wrote:
>> Why introduce a new intrinsic (family)? It seems that would require us
>> to support GEPs and GEP + "multi-dim" semantics in various places. What is
>> the benefit over a GEP extension?
> Adding an intrinsic is easier than adding a new instruction or
> extending an existing one, as suggested by
> https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning
In this case, that's probably bad advice.
>
> Extending GEP would require all passes to understand the added
> semantics on day 1, while a new intrinsic allows us to gradually add
> support.
> We can still make the intrinsic into an instruction when existing
> passes understand the new functionality.
>
>
>>> However, alas, this is illegal, for the C language does not provide
>>> semantics that allow the final inference above. It is conceivable that
>>> `x1 != x2, y1 != y2`, but the indices do actually alias, since
>>> according to C semantics, the two indices alias if the _flattened
>>> representation of the indices alias_. Consider the parameter
>>> values:
>>>
>>> ```
>>> n = m = 3
>>> x1 = 1, y1 = 0; B[x1][y1] = n*x1+y1 = 3*1+0 = 3
>>> x2 = 0, y2 = 3; B[x2][y2] = n*x2+y2 = 3*0+3 = 3
>>> ```
>>>
>>> Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and
>>> so the transformation proposed in `ex1_opt` is unsound in general.
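The arithmetic in the quoted example can be checked with a minimal C sketch. The names `flat_index` and `indices_collide` are hypothetical helpers for illustration, not anything from the RFC or from clang. The code only computes row-major offsets and never dereferences memory, so it does not depend on whether the out-of-range subscript `y2 = 3` is itself undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: row-major offset of element [row][col]
 * in an array whose rows each hold ncols elements. */
size_t flat_index(size_t row, size_t col, size_t ncols) {
    assert(ncols > 0);
    return row * ncols + col;
}

/* Returns 1 when two index pairs map to the same flattened offset,
 * 0 otherwise. Distinct pairs can only collide if a column index
 * is outside [0, ncols). */
int indices_collide(size_t x1, size_t y1,
                    size_t x2, size_t y2, size_t ncols) {
    return flat_index(x1, y1, ncols) == flat_index(x2, y2, ncols);
}
```

With `ncols = 3`, the pairs `(1, 0)` and `(0, 3)` both flatten to offset 3, which is the collision the example describes; `(1, 0)` and `(0, 2)` do not collide.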
>> I'm unsure your example actually showcases the problem:
>>
>> C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior:
>>
>> An array subscript is out of range, even if an object is apparently
>> accessible with the given subscript (as in the lvalue expression a[1][7]
>> given the declaration int a[4][5]) (6.5.6).
>
> GEP requires an array type, which in LLVM has static size.
This is false.  See the last paragraph in 
http://llvm.org/docs/LangRef.html#array-type
> For
> dynamically-sized arrays (in the C/C++ language family, these are only
> supported by C99 VLAs), clang emits the linearized index
> expression followed by a single-dimensional GEP.
This could be changed if desired.
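
For readers less familiar with the lowering being discussed, here is a rough C-level sketch of the two forms: a subscript through a pointer to a VLA row type, and the hand-linearized equivalent, which is approximately the shape of the computation Michael describes clang emitting. The function names `load_vla` and `load_linear` are hypothetical, and this is only an illustration at the source level, not a statement about the exact IR clang produces.

```c
#include <assert.h>
#include <stddef.h>

/* Subscript form: B points to rows of m ints, where m is only known
 * at run time (a C99 variably modified parameter type). */
int load_vla(size_t n, size_t m, int (*B)[m], size_t i, size_t j) {
    assert(i < n && j < m);   /* stay within both dimensions */
    return B[i][j];
}

/* Linearized form: one flat buffer and a single computed offset.
 * The dimension information is folded into the index arithmetic. */
int load_linear(size_t m, const int *base, size_t i, size_t j) {
    return base[i * m + j];
}
```

For in-range indices the two functions read the same element. The difference is that the second form no longer carries the per-dimension bounds, which is the information the RFC wants to preserve.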
>
> The multi-dimensional GEP instruction has these semantics with the
> `inrange` modifier, which clang currently does not generate for
> subscript expressions. Given that there might be code around relying
> on this behavior, and that the C++ standard does not contain this
> paragraph, I am not sure we can change it to undefined behavior, even
> if the C standard would permit it. I think this is a separate
> discussion, since the main motivation of the RFC is languages that
> make broader use of runtime-length arrays.
>
> But for the showcase, you are right in that according to the C
> standard, this aliasing might be undefined behavior.
>
> Michael