[llvm-dev] RFC: Generate plain !tbaa tags in place of !tbaa.struct ones
Ivan Kosarev via llvm-dev
llvm-dev at lists.llvm.org
Thu Dec 7 11:27:31 PST 2017
Thanks Hal.
Here's the LangRef patch for the new format:
https://reviews.llvm.org/D40975
On 07/12/17 04:36, Hal Finkel wrote:
>
> On 11/02/2017 01:17 PM, Ivan Kosarev wrote:
>> On 02/11/17 05:54, Hal Finkel wrote:
>>>
>>> On 10/31/2017 05:02 AM, Ivan Kosarev wrote:
>>>> To clarify further, what this paper proposes is to use !tbaa for
>>>> all kinds of accesses, including aggregate ones, so we don't need
>>>> to bother trying to convert them when an aggregate access becomes a
>>>> series of scalar accesses or vice versa. As I said, in most cases
>>>> such conversions are not possible anyway, because !tbaa.struct tags
>>>> do not refer to the type of the aggregate itself and only enumerate
>>>> its members.
>>>
>>> That makes sense, but I don't believe you actually said *that*. I
>>> agree, the fact that the tbaa.struct metadata refers to the scalar
>>> types, and not the type itself, will prohibit the conversion in
>>> general.
>>>
>>> I agree that we should fix this. Given that we're planning to make
>>> changes, perhaps major ones, to the TBAA representation, we should
>>> account for this requirement as part of that larger discussion. I
>>> see no need for a separate thread on this particular topic in the
>>> context of those larger changes.
>>
>> Sure, we can discuss all the related things here. I don't mind if we
>> should change the subject line to something more appropriate.
>>
>>>
>>> That having been said, if we desire to fix this before those larger
>>> changes to the TBAA representation (to handle aggregates, unions,
>>> etc.), I think we should do that. In this case, however, we
>>> can/should make a small enhancement that addresses this
>>> representational deficiency. We can, for example, generate
>>> tbaa.struct metadata where the tbaa struct type is the first in the
>>> list. This is easy to distinguish from the existing format because
>>> in the existing format the first operand is an integer. This will
>>> provide the necessary information to allow SROA to create struct
>>> access tags for the scalarized accesses.
>>
>> Generating a !tbaa tag in addition to the !tbaa.struct would probably
>> be even simpler. Unfortunately, that alone wouldn't be enough as
>> !tbaa tags in place of/as part of/next to !tbaa.struct tags only make
>> sense if we can represent accesses to aggregates, so we need this to
>> be done first. Adding support for aggregate accesses, in turn,
>> requires us to be able to determine common types for aggregates plus
>> some other things that imply changes to the metadata format.
>>
>> If we are ready to discuss specific changes to the format, then I
>> would propose the following:
>>
>> * All types, including aggregate ones, to refer to their parent
>> types. Otherwise, we couldn't determine the common type for a couple
>> of aggregate final access types. Since all type nodes shall have this
>> field, it would be natural to make it first in the list while also
>> making it possible to detect the new format.
>>
>> * Encode access sizes. This is essential for a better support for
>> accesses to union members. This patch makes the first step in this
>> direction:
>>
>> https://reviews.llvm.org/D39455
>>
>> but suffers from being imprecise WRT what specific part of the union
>> object is accessed. Given we know the offset range (the base offset
>> together with the access size), we can determine if two accesses to
>> members of the same union type actually overlap.
>>
>> * Encode type sizes. This, provided in addition to access sizes, is
>> what we could use to support member arrays. As of now, we describe
>> all accesses to array elements, including elements of member arrays,
>> as if they were standalone objects, because there is no way to
>> represent an access to a range of array elements or a specific array
>> element. This means all TBAA information about the base lvalue gets
>> lost.
>>
>> Once we can encode the information about the size of the element type
>> and the size of the access, we can determine the range of potentially
>> accessed elements. This is how it is supposed work in terms of TBAA
>> metadata:
>>
>> struct S {
>> int i[7];
>> };
>>
>> ; The good old int. 4 bytes in size. Mutable.
>> !3 = !{!1, i64 4, i64 0, !"int"}
>>
>> ; A structure containing an array of seven ints.
>> ; We know the element type is int because we explicitly refer to the
>> node of that type.
>> ; We know there are 7 of them because the size of the field is 28 and
>> the size of the type is 4.
>> !5 = !{!1, i64 28, i64 0, !"_ZTS1S", !3, i64 0, i64 28}
>>
>> ; An access to all or any of the array elements in S.
>> !7 = !{!5, !3, i64 0, i64 28}
>>
>> ; An access to the element with index 2.
>> !9 = !{!5, !3, i64 8, i64 4}
>>
>> The same principle can be applied to describing accesses to slices of
>> scalarized aggregates.
>>
>> * We would need to generalize the concept of type identifiers.
>> Strings containing mangled names are not enough because tagged types
>> in C do not have mangled names and there are local types in C++ with
>> no linkage and there are tagged types that have no names. As a
>> result, in some cases like this one:
>>
>> https://bugs.llvm.org/show_bug.cgi?id=8572
>>
>> we count different types to be the same type.
>>
>> If we can allow type identifiers to be metadata nodes of unspecified
>> content, then we could generate as detailed identifiers as we need to
>> support the rules of the input language. For example, C's structures
>> and unions are subject to the type compatibility rules that respect
>> things like widths of bit fields and the order of members. And for
>> local types in C++ we could produce 'distinct' metadata nodes.
>
> Ivan, I think that all of this sounds reasonably good. I know that
> you've posted some patches:
>
> https://reviews.llvm.org/D39956
> https://reviews.llvm.org/D40438
> https://reviews.llvm.org/D39557
>
> Please, however, first post a patch with updated to the LangRef
> spelling out all of the details of the format and associated
> semantics. Once we converge on that, we can take care of the
> implementation.
>
> Thanks again,
> Hal
>
>>
>> Thanks,
>>
>>
>>>
>>> Thanks again,
>>> Hal
>>>
>>>>
>>>> As of today, there is no adequate representation that could be used
>>>> to describe accesses to individual slices of scalarized aggregates.
>>>> But at very least we could decorate accesses to slices with tags of
>>>> their aggregates. That requires aggregate accesses to be described
>>>> the same way we describe scalar accesses.
>>>>
>>>> To provide some background on importance of this matter, SROA and
>>>> passing flattened parameters/returning values in clang together
>>>> produce 75% of all undecorated loads and 90% if all undecorated
>>>> stores on the LLVM code base under -O1.
>>>>
>>>>
>>>> On 31/10/17 10:44, Ivan Kosarev wrote:
>>>>> In short, the problem with !tbaa.struct is that in most cases it
>>>>> cannot be converted to !tbaa. For transformations like SROA this
>>>>> means they cannot propagate !tbaa.struct tags for
>>>>> aggregate-accessing instructions like memcpy() calls to the
>>>>> resulting loads and stores.
>>>>>
>>>>>
>>>>> On 31/10/17 05:11, Hal Finkel wrote:
>>>>>>
>>>>>> On 10/18/2017 05:49 AM, Ivan Kosarev via llvm-dev wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> The motivation behind this proposal is to make it possible to
>>>>>>> propagate TBAA information through scalarizing transformations,
>>>>>>> such as SROA, that rewrite accesses to aggregates, e.g., memcpy()
>>>>>>> calls, into accesses to scalars, that is, load and store
>>>>>>> instructions.
>>>>>>>
>>>>>>> Currently, we decorate instructions that initialize and copy
>>>>>>> aggregates with !tbaa.struct tags that generally cannot be
>>>>>>> transformed to !tbaa tags. For this reason every time we
>>>>>>> scalarize an aggregate, we leave resulting loads and stores
>>>>>>> undecorated, meaning optimization of such instructions cannot
>>>>>>> benefit from TBAA.
>>>>>>
>>>>>> I understand that we'd like to improve TBAA in a number of
>>>>>> different ways, and that probably involves replacing !tbaa.struct
>>>>>> with an enhanced !tbaa representation, but I don't understand the
>>>>>> point of this proposal. As you state, there are lots of places
>>>>>> that could use !tbaa.struct information, such as SROA, but don't.
>>>>>> Is the current !tbaa.struct missing some information that is
>>>>>> essential for that purpose?
>>>>>>
>>>>>> -Hal
>>>>>>
>>>>>>>
>>>>>>> Furthermore, our analysis indicates that the only place where
>>>>>>> !tbaa.struct tags may potentially impact code generation is
>>>>>>> simplification of memcpy() and memmove() calls, see
>>>>>>> SimplifyMemTransfer() in InstCombineCalls.cpp. Ironically, what
>>>>>>> the code that makes that sole use of such tags is trying to do is
>>>>>>> to construct a !tbaa tag from the information encoded in the
>>>>>>> given !tbaa.struct tag. Note that it can only do that if the
>>>>>>> !tbaa.struct tag describes a structure with a single member of a
>>>>>>> scalar type.
>>>>>>>
>>>>>>> Here's how we propose to resolve the issue in terms of specific
>>>>>>> steps:
>>>>>>>
>>>>>>> 1. Extend the TBAA facilities in clang to support aggregate types
>>>>>>> as final access types. This patch:
>>>>>>>
>>>>>>> [CodeGen] Propagate may-alias'ness of lvalues with TBAA info
>>>>>>> https://reviews.llvm.org/D39008
>>>>>>>
>>>>>>> implements this for the needs of fixing issues with
>>>>>>> propagation of TBAA information, which in turn is necessary to
>>>>>>> support TBAA for unions. So once this patch is committed, this
>>>>>>> step is done.
>>>>>>>
>>>>>>> 2. Generate !tbaa tags in addition to !tbaa.struct tags for
>>>>>>> aggregate accesses.
>>>>>>>
>>>>>>> 3. Fix the TBAA analysis to support aggregate accesses as
>>>>>>> explained in this proposal:
>>>>>>>
>>>>>>> RFC: Resolving TBAA issues
>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2017-August/116652.html
>>>>>>>
>>>>>>> 4. Switch the code that simplifies memcpy() and memmove() calls
>>>>>>> to !tbaa tags.
>>>>>>>
>>>>>>> 5. Remove the support for !tbaa.struct tags.
>>>>>>>
>>>>>>> I guess we would want to remove all code which only purpose is
>>>>>>> to generate such tags, but I'm not sure what we should do with
>>>>>>> the MD_tbaa_struct enumerator. Some possible options:
>>>>>>> a) Leave it as is, just don't use it.
>>>>>>> b) Rename to something like MD_unused.
>>>>>>> c) Remove, but do not change the values of other MD_*
>>>>>>> enumerators.
>>>>>>> d) Remove and adjust values of other MD_* enumerators
>>>>>>> respectively.
>>>>>>> Or, maybe we want some multi-stage plan here?
>>>>>>>
>>>>>>> Further steps are supposed to include things like fixing SROA to
>>>>>>> propagate TBAA information.
>>>>>>>
>>>>>>> Any feedback is highly appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the llvm-dev
mailing list