[llvm-dev] [RFC] Aggregate load/store, proposed plan
Krzysztof Parzyszek via llvm-dev
llvm-dev at lists.llvm.org
Thu Aug 20 14:57:50 PDT 2015
We used to have something that looked like this. I remember aggregates
becoming integers with some crazy numbers of bits. I think SROA was
doing it; what happened to that?
-Krzysztof
On 8/20/2015 4:09 PM, deadal nix via llvm-dev wrote:
> Problem:
>
> Many languages define aggregates and some way to manipulate them. LLVM
> defines aggregate types (arrays and structs) to represent them. However,
> when aggregates are loaded or stored, LLVM simply ignores these
> operations until legalization in the backend. This leads to many missed
> optimizations. Most frontends use a set of tricks to work around this
> limitation, but that is an undesirable situation, as it increases the
> work required to write a frontend. Ideally that work should be done once
> by LLVM instead of every time by each frontend.
>
> In previous discussions on the subject, many LLVM users have expressed
> interest in being able to use aggregate memory accesses. In addition, it
> is likely that this would reduce the workload of some existing
> frontends.
>
> The proposed solution is to transform aggregate loads and stores into
> something that the LLVM toolchain already understands and is able to
> work with. The proposed solution uses InstCombine to do the
> transformation, as it runs early and allows subsequent passes to work
> with something familiar (basically, canonicalization).
>
> Proposed solution:
>
> An aggregate load or store is turned into a load or store of a scalar of
> the same size and alignment. Binary manipulations, such as masks and
> shifts, are used to build the aggregate from the scalar after loading,
> and to build the scalar from the aggregate before storing.
>
> For instance, the following IR (extracted from a D frontend):
>
> %B__vtbl = type { i8*, i32 (%B*)* }
> @B__vtblZ = constant %B__vtbl { i8* null, i32 (%B*)* @B.foo }
>
> %0 = tail call i8* @allocmemory(i64 32)
> %1 = bitcast i8* %0 to %B*
> store %B { %B__vtbl* @B__vtblZ, i32 42 }, %B* %1, align 8
>
> would be canonicalized into:
> %0 = tail call i8* @allocmemory(i64 32)
> %1 = bitcast i8* %0 to i128*
> store i128 or (i128 zext (i64 ptrtoint (%B__vtbl* @B__vtblZ to i64) to
> i128), i128 774763251095801167872), i128* %1, align 8
>
> Which the rest of the LLVM pipeline can work with.
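As an illustration of the mask-and-shift encoding, the magic constant in the canonicalized store above is just the i32 value 42 shifted into the high half of the i128 (assuming a little-endian target with 64-bit pointers; the helper names below are hypothetical, not LLVM API):

```python
# Sketch: how the i128 store constant packs the two fields of %B,
# assuming a little-endian target with 64-bit pointers.
# Field 0 (the vtable pointer) occupies bits 0..63; field 1
# (i32 42) is zero-extended and shifted into bits 64..95.

def pack_B(vtbl_addr: int, value: int) -> int:
    """Build the i128 scalar: or(zext(ptrtoint(vtbl)), zext(value) << 64)."""
    return (vtbl_addr & (2**64 - 1)) | ((value & (2**32 - 1)) << 64)

def unpack_B(scalar: int) -> tuple[int, int]:
    """Recover the fields with the mask/shift a canonicalized load would use."""
    return scalar & (2**64 - 1), (scalar >> 64) & (2**32 - 1)

# The literal from the canonicalized IR is exactly 42 << 64; the
# runtime pointer value is OR'd into the low 64 bits.
assert pack_B(0, 42) == 774763251095801167872
assert unpack_B(pack_B(0xDEADBEEF, 42)) == (0xDEADBEEF, 42)
print("ok")
```

Round-tripping through pack/unpack is exactly the mask-and-shift work the proposal would have InstCombine emit around the scalar load/store.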
>
> Limitations:
>
> 1/ This solution will not handle large (tens of kilobytes) aggregates
> properly. This is an accepted limitation, both for this proposal and for
> other parts of the pipeline that handle aggregates. Optionally, checks
> can be added to both this canonicalization and SROA to disable them on
> very large aggregates, so as to avoid wasting work that won't yield good
> codegen in the end anyway. This limitation should not be a blocker, as
> most aggregates are fairly small. For instance, some languages make
> heavy use of fat pointers and would greatly benefit from this
> canonicalization.
>
> 2/ This solution will generate loads and stores of values that may not
> be natively supported by the hardware. The hardware does not natively
> support aggregates to begin with, so both the original IR and the
> canonicalized IR require optimization. This is not ideal, but the
> canonicalization is a plus for two reasons:
>   - A subset of these memory accesses won't need canonicalization
>     anymore.
>   - Other passes in LLVM will be able to work with these loads and
>     perform adequate transformations.
>
> Possible alternatives:
>
> In order to mitigate 1/, it is possible to gate the canonicalization on
> aggregates under a certain size, essentially avoiding work that will
> lead to bad codegen no matter what.
> In order to mitigate 2/, it is possible to slice aggregate loads and
> stores according to the target's data layout. This CANNOT be implemented
> for atomic/volatile accesses, as it would change their semantics, but it
> can be done for regular ones, which are the most common.
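As a sketch of what slicing by data layout would involve, each field's offset within the aggregate follows from the fields' sizes and ABI alignments. The helper below is illustrative only and does not use LLVM's actual DataLayout API:

```python
# Sketch: computing per-field offsets for slicing an aggregate
# store into individual field stores. The layout numbers are
# illustrative (a typical 64-bit target), not queried from LLVM.

def field_offsets(fields):
    """fields: list of (size_bytes, align_bytes) pairs, in order.
    Returns the natural (non-packed) byte offset of each field."""
    offsets, off = [], 0
    for size, align in fields:
        off = (off + align - 1) & ~(align - 1)  # round up to alignment
        offsets.append(off)
        off += size
    return offsets

# %B = { %B__vtbl*, i32 }: an 8-byte/8-aligned pointer, then a
# 4-byte/4-aligned i32, so the slices land at offsets 0 and 8.
print(field_offsets([(8, 8), (4, 4)]))  # prints [0, 8]
```

A slicing pass would then emit one load or store per field at these offsets, which is precisely why it is only legal for regular accesses: an atomic or volatile access must remain a single operation.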
>
> Does that look better as an RFC?
>
> 2015-08-19 22:11 GMT-07:00 Hal Finkel <hfinkel at anl.gov
> <mailto:hfinkel at anl.gov>>:
>
> ----- Original Message -----
> > From: "Mehdi Amini via llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> > To: "deadal nix" <deadalnix at gmail.com <mailto:deadalnix at gmail.com>>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> > Sent: Wednesday, August 19, 2015 7:24:28 PM
> > Subject: Re: [llvm-dev] [RFC] Aggregate load/store, proposed plan
> >
> > Hi,
> >
> > To be sure, because the RFC below is not detailed and assumes everyone
> > knows about all the emails from 10 months ago,
>
> I agree. The RFC needs to summarize the problems and the potential
> solutions.
>
> > is there more to do
> > than what is proposed in http://reviews.llvm.org/D9766?
> >
> > So basically the proposal is that *InstCombine*
>
> I think that fixing this early in the optimizer makes sense
> (InstCombine, etc.). This seems little different from any other
> canonicalization problem. These direct aggregate IR values are valid
> IR, but not our preferred canonical form, so we should transform the
> IR, when possible, into our preferred canonical form.
>
> -Hal
>
> > turns aggregate
> > load/store into a load/store using an integer of equivalent size and
> > inserts the correct bitcast before/after, right?
> >
> > Example is:
> >
> > %0 = tail call i8* @allocmemory(i64 32)
> > %1 = bitcast i8* %0 to %B*
> > store %B { %B__vtbl* @B__vtblZ, i32 42 }, %B* %1, align 8
> >
> > into:
> >
> > store i128 or (i128 zext (i64 ptrtoint (%B__vtbl* @B__vtblZ to i64)
> > to i128), i128 774763251095801167872), i128* %1, align 8
> >
> > Where the aggregate is:
> >
> > %B__vtbl = type { i8*, i32 (%B*)* }
> > @B__vtblZ = constant %B__vtbl { i8* null, i32 (%B*)* @B.foo }
> >
> >
> > Thanks,
> >
> > —
> > Mehdi
> >
> >
> > > On Aug 19, 2015, at 5:02 PM, deadal nix via llvm-dev
> > > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> > >
> > > It is pretty clear people need this. Let's get this moving.
> > >
> > > I'll try to sum up the points that have been made and address them
> > > carefully.
> > >
> > > 1/ There is no good solution for large aggregates.
> > > That is true. However, I don't think this is a reason not to
> > > address smaller aggregates, as they appear to be needed.
> > > Realistically, the proportion of aggregates that are very large is
> > > small, and there is no expectation that such a thing would map
> > > nicely to the hardware anyway (the hardware won't have enough
> > > registers to load it all). I think it is reasonable to expect good
> > > handling of relatively small aggregates like fat pointers while
> > > accepting that large ones will be inefficient.
> > >
> > > This limitation is not unique to the current discussion, as SROA
> > > suffers from the same limitation.
> > > It is possible to disable the transformation for aggregates that
> > > are too large if this is too big a concern. It should maybe also be
> > > done for SROA.
> > >
> > > 2/ Slicing the aggregate breaks the semantics of atomic/volatile.
> > > That is true. It means slicing the aggregate should not be done for
> > > atomic/volatile accesses. It doesn't mean it should not be done for
> > > regular ones, as it is reasonable to handle atomic/volatile
> > > differently. After all, they have different semantics.
> > >
> > > 3/ Not slicing can create scalars that aren't supported by the
> > > target. This is undesirable.
> > > Indeed. But as always, the important question is: compared to what?
> > >
> > > The hardware has no notion of aggregates, so an aggregate and a
> > > large scalar both end up requiring legalization. Doing the
> > > transformation is still beneficial:
> > > - Some aggregates will generate valid scalars. For such aggregates,
> > > this is a 100% win.
> > > - For aggregates that won't, the situation is still better, as
> > > various optimization passes will be able to handle the load in a
> > > sensible manner.
> > > - The transformation never makes the situation worse than it is to
> > > begin with.
> > >
> > > In previous discussions, Hal Finkel seemed to think that the
> > > scalar solution is preferable to the slicing one.
> > >
> > > Is that a fair assessment of the situation? Considering all of
> > > this, I think the right path forward is:
> > > - Go for the scalar solution in the general case.
> > > - If that is a problem, the slicing approach can be used for
> > > non-atomic/volatile accesses.
> > > - If necessary, disable the transformation for very large
> > > aggregates (and consider doing so for SROA as well).
> > >
> > > Do we have a plan?
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> > >
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation