[llvm-dev] [RFC] Aggregate load/store, proposed plan
Krzysztof Parzyszek via llvm-dev
llvm-dev at lists.llvm.org
Thu Aug 20 14:57:50 PDT 2015
We used to have something that looked like this. I remember aggregates
becoming integers with some crazy numbers of bits. I think SROA was
doing it; what happened to that?
-Krzysztof
On 8/20/2015 4:09 PM, deadal nix via llvm-dev wrote:
> Problem:
>
> Many languages define aggregates and some way to manipulate them. LLVM
> defines aggregate types (arrays and structs) to represent them. However,
> when aggregates are loaded or stored, LLVM simply ignores these
> operations until legalization in the backend. This leads to many missed
> optimizations. Most frontends use a set of tricks to work around this
> limitation, but that is an undesirable situation, as it increases the
> work required to write a frontend. Ideally that work should be done once
> by LLVM instead of every time by each frontend.
>
> In previous discussions on the subject, many LLVM users have expressed
> interest in being able to use aggregate memory accesses. In addition, it
> is likely that this would reduce the workload of some existing
> frontends.
>
> The proposed solution is to transform aggregate loads and stores into
> something that the LLVM toolchain already understands and is able to
> work with. The proposed solution uses InstCombine to do the
> transformation, as it runs early and allows subsequent passes to work
> with something familiar (basically, canonicalization).
>
> Proposed solution:
>
> An aggregate load or store is turned into a load or store of a scalar of
> the same size and alignment. Binary manipulations, such as masks and
> shifts, are used to build the aggregate from the scalar after loading,
> and to build the scalar from the aggregate before storing.
>
> For instance, the following IR (extracted from a D frontend):
>
> %B__vtbl = type { i8*, i32 (%B*)* }
> @B__vtblZ = constant %B__vtbl { i8* null, i32 (%B*)* @B.foo }
>
> %0 = tail call i8* @allocmemory(i64 32)
> %1 = bitcast i8* %0 to %B*
> store %B { %B__vtbl* @B__vtblZ, i32 42 }, %B* %1, align 8
>
> would be canonicalized into:
> %0 = tail call i8* @allocmemory(i64 32)
> %1 = bitcast i8* %0 to i128*
> store i128 or (i128 zext (i64 ptrtoint (%B__vtbl* @B__vtblZ to i64) to
> i128), i128 774763251095801167872), i128* %1, align 8
>
> Which the rest of the LLVM pipeline can work with.
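As an illustration of the mask-and-shift encoding, the magic constant in the canonicalized store above is just the i32 value 42 shifted into the high half of the i128 (assuming a little-endian target with 64-bit pointers; the helper names below are hypothetical, not LLVM API):

```python
# Sketch: how the i128 store constant packs the two fields of %B,
# assuming a little-endian target with 64-bit pointers.
# Field 0 (the vtable pointer) occupies bits 0..63; field 1
# (i32 42) is zero-extended and shifted into bits 64..95.

def pack_B(vtbl_addr: int, value: int) -> int:
    """Build the i128 scalar: or(zext(ptrtoint(vtbl)), zext(value) << 64)."""
    return (vtbl_addr & (2**64 - 1)) | ((value & (2**32 - 1)) << 64)

def unpack_B(scalar: int) -> tuple[int, int]:
    """Recover the fields with the mask/shift a canonicalized load would use."""
    return scalar & (2**64 - 1), (scalar >> 64) & (2**32 - 1)

# The literal from the canonicalized IR is exactly 42 << 64; the
# runtime pointer value is OR'd into the low 64 bits.
assert pack_B(0, 42) == 774763251095801167872
assert unpack_B(pack_B(0xDEADBEEF, 42)) == (0xDEADBEEF, 42)
print("ok")
```

Round-tripping through pack/unpack is exactly the mask-and-shift work the proposal would have InstCombine emit around the scalar load/store.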
>
> Limitations:
>
> 1/ This solution will not handle large (tens of kilobytes) aggregates
> properly. This is an accepted limitation, both for this proposal and for
> other parts of the pipeline that handle aggregates. Optionally, checks
> can be added to both this canonicalization and SROA to disable them on
> very large aggregates, so as to avoid wasting work that won't yield good
> codegen in the end anyway. This limitation should not be a blocker, as
> most aggregates are fairly small. For instance, some languages make
> heavy use of fat pointers and would greatly benefit from this
> canonicalization.
>
> 2/ This solution will generate loads and stores of values that may not
> be natively supported by the hardware. The hardware does not natively
> support aggregates to begin with, so both the original IR and the
> canonicalized IR require optimization. This is not ideal, but the
> canonicalization is a plus for two reasons:
>   - A subset of these memory accesses won't need canonicalization
>     anymore.
>   - Other passes in LLVM will be able to work with these loads and
>     perform adequate transformations.
>
> Possible alternatives:
>
> In order to mitigate 1/, it is possible to gate the canonicalization on
> aggregates under a certain size, essentially avoiding work that will
> lead to bad codegen no matter what.
> In order to mitigate 2/, it is possible to slice aggregate loads and
> stores according to the target's data layout. This CANNOT be implemented
> for atomic/volatile accesses, as it would change their semantics, but it
> can be done for regular ones, which are the most common.
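As a sketch of what slicing by data layout would involve, each field's offset within the aggregate follows from the fields' sizes and ABI alignments. The helper below is illustrative only and does not use LLVM's actual DataLayout API:

```python
# Sketch: computing per-field offsets for slicing an aggregate
# store into individual field stores. The layout numbers are
# illustrative (a typical 64-bit target), not queried from LLVM.

def field_offsets(fields):
    """fields: list of (size_bytes, align_bytes) pairs, in order.
    Returns the natural (non-packed) byte offset of each field."""
    offsets, off = [], 0
    for size, align in fields:
        off = (off + align - 1) & ~(align - 1)  # round up to alignment
        offsets.append(off)
        off += size
    return offsets

# %B = { %B__vtbl*, i32 }: an 8-byte/8-aligned pointer, then a
# 4-byte/4-aligned i32, so the slices land at offsets 0 and 8.
print(field_offsets([(8, 8), (4, 4)]))  # prints [0, 8]
```

A slicing pass would then emit one load or store per field at these offsets, which is precisely why it is only legal for regular accesses: an atomic or volatile access must remain a single operation.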
>
> Does that look better as an RFC?
>
> 2015-08-19 22:11 GMT-07:00 Hal Finkel <hfinkel at anl.gov
> <mailto:hfinkel at anl.gov>>:
>
> ----- Original Message -----
> > From: "Mehdi Amini via llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> > To: "deadal nix" <deadalnix at gmail.com <mailto:deadalnix at gmail.com>>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> > Sent: Wednesday, August 19, 2015 7:24:28 PM
> > Subject: Re: [llvm-dev] [RFC] Aggregate load/store, proposed plan
> >
> > Hi,
> >
> > To be sure, because the RFC below is not detailed and assumes everyone
> > knows about all the emails from 10 months ago,
>
> I agree. The RFC needs to summarize the problems and the potential
> solutions.
>
> > is there more to do
> > than what is proposed in http://reviews.llvm.org/D9766?
> >
> > So basically the proposal is that *InstCombine*
>
> I think that fixing this early in the optimizer makes sense
> (InstCombine, etc.). This seems little different from any other
> canonicalization problem. These direct aggregate IR values are valid
> IR, but not our preferred canonical form, so we should transform the
> IR, when possible, into our preferred canonical form.
>
> -Hal
>
> > turns aggregate
> > load/store into a load/store using an integer of equivalent size and
> > inserts the correct bitcast before/after, right?
> >
> > Example is:
> >
> > %0 = tail call i8* @allocmemory(i64 32)
> > %1 = bitcast i8* %0 to %B*
> > store %B { %B__vtbl* @B__vtblZ, i32 42 }, %B* %1, align 8
> >
> > into:
> >
> > store i128 or (i128 zext (i64 ptrtoint (%B__vtbl* @B__vtblZ to i64)
> > to i128), i128 774763251095801167872), i128* %1, align 8
> >
> > Where the aggregate is:
> >
> > %B__vtbl = type { i8*, i32 (%B*)* }
> > @B__vtblZ = constant %B__vtbl { i8* null, i32 (%B*)* @B.foo }
> >
> >
> > Thanks,
> >
> > —
> > Mehdi
> >
> >
> > > On Aug 19, 2015, at 5:02 PM, deadal nix via llvm-dev
> > > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> > >
> > > It is pretty clear people need this. Let's get this moving.
> > >
> > > I'll try to sum up the points that have been made and address them
> > > carefully.
> > >
> > > 1/ There is no good solution for large aggregates.
> > > That is true. However, I don't think this is a reason not to
> > > address smaller aggregates, as they appear to be needed.
> > > Realistically, the proportion of aggregates that are very large is
> > > small, and there is no expectation that such a thing would map
> > > nicely to the hardware anyway (the hardware won't have enough
> > > registers to load it all). I think it is reasonable to expect good
> > > handling of relatively small aggregates like fat pointers while
> > > accepting that large ones will be inefficient.
> > >
> > > This limitation is not unique to the current discussion, as SROA
> > > suffers from the same limitation.
> > > It is possible to disable the transformation for aggregates that
> > > are too large if this is too big a concern. It should maybe also be
> > > done for SROA.
> > >
> > > 2/ Slicing the aggregate breaks the semantics of atomic/volatile.
> > > That is true. It means slicing the aggregate should not be done for
> > > atomic/volatile accesses. It doesn't mean it should not be done for
> > > regular ones, as it is reasonable to handle atomic/volatile
> > > differently. After all, they have different semantics.
> > >
> > > 3/ Not slicing can create scalars that aren't supported by the
> > > target. This is undesirable.
> > > Indeed. But as always, the important question is: compared to what?
> > >
> > > The hardware has no notion of aggregates, so an aggregate and a
> > > large scalar both end up requiring legalization. Doing the
> > > transformation is still beneficial:
> > > - Some aggregates will generate valid scalars. For such aggregates,
> > > this is a 100% win.
> > > - For aggregates that won't, the situation is still better, as
> > > various optimization passes will be able to handle the load in a
> > > sensible manner.
> > > - The transformation never makes the situation worse than it is to
> > > begin with.
> > >
> > > In previous discussions, Hal Finkel seemed to think that the
> > > scalar solution is preferable to the slicing one.
> > >
> > > Is that a fair assessment of the situation? Considering all of
> > > this, I think the right path forward is:
> > > - Go for the scalar solution in the general case.
> > > - If that is a problem, the slicing approach can be used for
> > > non-atomic/volatile accesses.
> > > - If necessary, disable the transformation for very large
> > > aggregates (and consider doing so for SROA as well).
> > >
> > > Do we have a plan?
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> > >
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation