[llvm-dev] New intrinsic property IntrOnlyWrite

Fri Apr 1 14:25:14 PDT 2016

Hi,

following up on this discussion, I cleaned things up a bit and explored 
the direction of an IR attribute (in addition to the LLVM intrinsic 
property which is for TableGen only).

There are two proposed changes:

1) http://reviews.llvm.org/D18291 -- this only affects TableGen and the 
AMDGPU backend. It's a small change which I'd like to commit as soon as 
possible.

2) http://reviews.llvm.org/D18714 -- builds on top of the previous one, 
adding the LLVM IR attribute ``writeonly`` to go with the LLVM intrinsic 
property. This change touches more things, but should still be mostly 
straightforward. The change starts adding the ``writeonly`` attribute -- 
there are a few obvious next things one could do with it (basically, the 
TODOs in BasicAliasAnalysis), but I think it makes sense to already 
commit this as is.

Please review.

Thanks,
Nicolai

On 21.03.2016 21:29, Philip Reames wrote:
>
>
> On 03/21/2016 08:54 AM, Nicolai Hähnle wrote:
>> On 19.03.2016 14:47, Philip Reames wrote:
>>> I'm generally in support of this change. I haven't looked at the patch
>>> yet, but the direction seems worthwhile.
>>>
>>> Note that we already have a writeonly predicate in a few places in the
>>> code (BasicAA being one).  If we do introduce the new intrinsic
>>> property, we should refactor all of these places to use the new
>>> mechanism.  This could be separate changes, but should be considered
>>> required as part of adding the new property.
>>
>> That makes a lot of sense.
>>
>>> Another way you might consider slicing the problem is to introduce a
>>> WriteOnlyArg property.  When combined with ArgMemOnly, this would more
>>> precisely model the pseudo store (right?) and is a better fit for the
>>> memset/memcpy use case already handled in BasicAA.
>>
>> This is unsufficient for our use case. Let me try to explain with an
>> example (also for your later comments). We have an intrinsic
>>
>> declare void @llvm.amdgcn.buffer.store.format.f32(float, <4 x i32>,
>> i32, i32, i1, i1) #1
>>
>> that stores a 32-bit value in a memory location that is described by a
>> "buffer descriptor" (the <4 x i32>), an i32 index and an i32 offset.
>> You can think of the buffer descriptor as a "heavy pointer" that
>> contains a base address and can also contain additional information
>> like a buffer size, a stride, and a data format. The given index is
>> multiplied by the stride before being added to the base address; the
>> offset is an additional byte offset. The resulting address is checked
>> against the buffer size (and an out of bounds store is silently
>> discarded, which fits the requirements of APIs like OpenGL).
>> Furthermore, the data being stored may be subject to a format
>> conversion (float to int, etc.). The additional i1 arguments enable
>> provide some cache control.
> So, I think you have a fundamental problem here.  By not using pointers,
> you are breaking one of the deepest assumptions in LLVM: that we can
> reason about memory aliasing without having to consider the numeric
> values of non-pointer values.  If a buffer can never *ever* be addressed
> via a normal pointer, then *maybe* we can paper over this, but if you're
> mixing pointer and non-pointer accesses to the same address, you have a
> potentially serious problem.
>
> As a concrete example, let's take the following:
> %mem = malloc() <-- returns a noalias pointer
> %buf = construct_buf_ref(%mem) <-- assume this is transparent and
> doesn't capture
> load %mem
> buffer_store(%buf)
> load %mem
>
> In this (psuedo code) example, basicaa will use capture tracking to
> *prove* that buffer_store can't have written to %mem.  buffer_store is
> allowed to write to any memory it can address, but because %mem has not
> been captured, buffer_store can not obtain a pointer to it. This could
> lead to use value forwarding the load and producing an incorrect result.
>
> Note: If you treat "construct_buf_ref" as a capturing operation, you can
> paper over this example (and many others).  I'm using it mostly as an
> example of how deep the assumptions about memory, pointers, and
> capturing go.
>
>>
>> If you have some advice on how to fit such a memory addressing model
>> into the ArgMemOnly logic that would be appreciated, but from my
>> reading of the code it appears that trying to make ArgMemOnly work
>> with this would be a rather large project which we don't really have
>> the resources to tackle right now.
>>
>> Why am I pushing for IntrWriteOnly?
>>
>> I suspect the merely IntrWriteOnly without a functional ArgMemOnly
>> does not provide a lot of opportunity for optimization at the IR level.
> I think this is not entirely true.  In particular, it gives us a way to
> model things like memset and memset_16 by combining InstrWriteOnly and
> ArgMemOnly.  It's not quite as powerful as having a writeonly argument
> attribute, but it's better than where we are today.
>>
>> However, the codegen backend does understand the resulting hardware
>> instructions, which are marked mayLoad = 0, mayStore = 1,
>> hasSideEffects = 0. Having mayLoad = 0 and hasSideEffects = 0 makes a
>> difference for instruction scheduling.
>>
>> If you try to map the intrinsic as-is, without IntrWriteOnly, onto
>> such hardware instructions, TableGen will (correctly) complain about a
>> mismatch of mayLoad and hasSideEffects.
>>
>> So I'd like to do a minimal change that will make TableGen happy, but
>> I do not want that change to be an ugly kludge.
> I'm not opposed to introducing an InstrWriteOnly property for
> intrinsics.  My concern is that you're expecting it to be something it
> isn't (per capture discussion above.)
>>
>> I understand your concern about having this interact nicely with
>> BasicAA. Perhaps both an IntrWriteOnlyMem and an IntrWriteOnlyArgMem
>> should be added for orthogonality?
> A InstrWriteOnly + ArgMemOnly should equal an InstWriteOnlyArgMem. Thus,
> we shouldn't need separate attributes.  My original comment was about
> introducing a writeonly attribute on the argument of a function.  (i.e.
> a *particular* attribute)  This would allow us to precisely describe
> memcpy for instance.  (src argument is readonly, dest argument is
> writeonly)
>>
>> Cheers,
>> Nicolai
>>
>>>
>>> Philip
>>>
>>> p.s. Inline comments below specific to your use case.
>>>
>>> On 03/18/2016 08:16 PM, Nicolai Hähnle via llvm-dev wrote:
>>>> Hi,
>>>>
>>>> I'd like to draw your attention to http://reviews.llvm.org/D18291, in
>>>> which I propose a new intrinsic property for intrinsics that are
>>>> lowered to instructions that mayStore, but are neither mayLoad nor
>>>> hasSideEffects.
>>>>
>>>> This is relevant for AMDGPU, where we have store instructions that
>>>> don't operate on pointers. The codegen backend understands these
>>>> perfectly well as stores, and so we can enable better scheduling
>>>> decisions than if we forced these instruction to hasSideEffects.
>>> Can you give a bit more detail here?  Example, etc..?
>>>>
>>>> In a perfect world, we'd be able to model the behavior of these load
>>>> and store intrinsics via ReadWriteArgMem, but that would require
>>>> massive changes in how LLVM thinks about memory locations and how to
>>>> describe them.
>>> This comments makes me think you might have a much deeper problem. Let's
>>> see an actual example first though; I might just be misreading your
>>> intent.
>>>>
>>>> This comparatively minor addition allows us to move forward with
>>>> decent scheduling in codegen for the time being.
>>>>
>>>> Cheers,
>>>> Nicolai
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>