[llvm-dev] New intrinsic property IntrOnlyWrite
Nicolai Hähnle via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 1 14:25:14 PDT 2016
following up on this discussion, I cleaned things up a bit and explored
the direction of an IR attribute (in addition to the LLVM intrinsic
property which is for TableGen only).
There are two proposed changes:
1) http://reviews.llvm.org/D18291 -- this only affects TableGen and the
AMDGPU backend. It's a small change which I'd like to commit as soon as
2) http://reviews.llvm.org/D18714 -- builds on top of the previous one,
adding the LLVM IR attribute ``writeonly`` to go with the LLVM intrinsic
property. This change touches more things, but should still be mostly
straightforward. The change starts adding the ``writeonly`` attribute --
there are a few obvious next things one could do with it (basically, the
TODOs in BasicAliasAnalysis), but I think it makes sense to already
commit this as is.
On 21.03.2016 21:29, Philip Reames wrote:
> On 03/21/2016 08:54 AM, Nicolai Hähnle wrote:
>> On 19.03.2016 14:47, Philip Reames wrote:
>>> I'm generally in support of this change. I haven't looked at the patch
>>> yet, but the direction seems worthwhile.
>>> Note that we already have a writeonly predicate in a few places in the
>>> code (BasicAA being one). If we do introduce the new intrinsic
>>> property, we should refactor all of these places to use the new
>>> mechanism. This could be separate changes, but should be considered
>>> required as part of adding the new property.
>> That makes a lot of sense.
>>> Another way you might consider slicing the problem is to introduce a
>>> WriteOnlyArg property. When combined with ArgMemOnly, this would more
>>> precisely model the pseudo store (right?) and is a better fit for the
>>> memset/memcpy use case already handled in BasicAA.
>> This is unsufficient for our use case. Let me try to explain with an
>> example (also for your later comments). We have an intrinsic
>> declare void @llvm.amdgcn.buffer.store.format.f32(float, <4 x i32>,
>> i32, i32, i1, i1) #1
>> that stores a 32-bit value in a memory location that is described by a
>> "buffer descriptor" (the <4 x i32>), an i32 index and an i32 offset.
>> You can think of the buffer descriptor as a "heavy pointer" that
>> contains a base address and can also contain additional information
>> like a buffer size, a stride, and a data format. The given index is
>> multiplied by the stride before being added to the base address; the
>> offset is an additional byte offset. The resulting address is checked
>> against the buffer size (and an out of bounds store is silently
>> discarded, which fits the requirements of APIs like OpenGL).
>> Furthermore, the data being stored may be subject to a format
>> conversion (float to int, etc.). The additional i1 arguments enable
>> provide some cache control.
> So, I think you have a fundamental problem here. By not using pointers,
> you are breaking one of the deepest assumptions in LLVM: that we can
> reason about memory aliasing without having to consider the numeric
> values of non-pointer values. If a buffer can never *ever* be addressed
> via a normal pointer, then *maybe* we can paper over this, but if you're
> mixing pointer and non-pointer accesses to the same address, you have a
> potentially serious problem.
> As a concrete example, let's take the following:
> %mem = malloc() <-- returns a noalias pointer
> %buf = construct_buf_ref(%mem) <-- assume this is transparent and
> doesn't capture
> load %mem
> load %mem
> In this (psuedo code) example, basicaa will use capture tracking to
> *prove* that buffer_store can't have written to %mem. buffer_store is
> allowed to write to any memory it can address, but because %mem has not
> been captured, buffer_store can not obtain a pointer to it. This could
> lead to use value forwarding the load and producing an incorrect result.
> Note: If you treat "construct_buf_ref" as a capturing operation, you can
> paper over this example (and many others). I'm using it mostly as an
> example of how deep the assumptions about memory, pointers, and
> capturing go.
>> If you have some advice on how to fit such a memory addressing model
>> into the ArgMemOnly logic that would be appreciated, but from my
>> reading of the code it appears that trying to make ArgMemOnly work
>> with this would be a rather large project which we don't really have
>> the resources to tackle right now.
>> Why am I pushing for IntrWriteOnly?
>> I suspect the merely IntrWriteOnly without a functional ArgMemOnly
>> does not provide a lot of opportunity for optimization at the IR level.
> I think this is not entirely true. In particular, it gives us a way to
> model things like memset and memset_16 by combining InstrWriteOnly and
> ArgMemOnly. It's not quite as powerful as having a writeonly argument
> attribute, but it's better than where we are today.
>> However, the codegen backend does understand the resulting hardware
>> instructions, which are marked mayLoad = 0, mayStore = 1,
>> hasSideEffects = 0. Having mayLoad = 0 and hasSideEffects = 0 makes a
>> difference for instruction scheduling.
>> If you try to map the intrinsic as-is, without IntrWriteOnly, onto
>> such hardware instructions, TableGen will (correctly) complain about a
>> mismatch of mayLoad and hasSideEffects.
>> So I'd like to do a minimal change that will make TableGen happy, but
>> I do not want that change to be an ugly kludge.
> I'm not opposed to introducing an InstrWriteOnly property for
> intrinsics. My concern is that you're expecting it to be something it
> isn't (per capture discussion above.)
>> I understand your concern about having this interact nicely with
>> BasicAA. Perhaps both an IntrWriteOnlyMem and an IntrWriteOnlyArgMem
>> should be added for orthogonality?
> A InstrWriteOnly + ArgMemOnly should equal an InstWriteOnlyArgMem. Thus,
> we shouldn't need separate attributes. My original comment was about
> introducing a writeonly attribute on the argument of a function. (i.e.
> a *particular* attribute) This would allow us to precisely describe
> memcpy for instance. (src argument is readonly, dest argument is
>>> p.s. Inline comments below specific to your use case.
>>> On 03/18/2016 08:16 PM, Nicolai Hähnle via llvm-dev wrote:
>>>> I'd like to draw your attention to http://reviews.llvm.org/D18291, in
>>>> which I propose a new intrinsic property for intrinsics that are
>>>> lowered to instructions that mayStore, but are neither mayLoad nor
>>>> This is relevant for AMDGPU, where we have store instructions that
>>>> don't operate on pointers. The codegen backend understands these
>>>> perfectly well as stores, and so we can enable better scheduling
>>>> decisions than if we forced these instruction to hasSideEffects.
>>> Can you give a bit more detail here? Example, etc..?
>>>> In a perfect world, we'd be able to model the behavior of these load
>>>> and store intrinsics via ReadWriteArgMem, but that would require
>>>> massive changes in how LLVM thinks about memory locations and how to
>>>> describe them.
>>> This comments makes me think you might have a much deeper problem. Let's
>>> see an actual example first though; I might just be misreading your
>>>> This comparatively minor addition allows us to move forward with
>>>> decent scheduling in codegen for the time being.
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
More information about the llvm-dev