[LLVMdev] Unaligned vector memory access for ARM/NEON.
Bob Wilson
bob.wilson at apple.com
Fri Sep 7 10:56:57 PDT 2012
On Sep 6, 2012, at 4:40 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Thursday, September 06, 2012 3:39 PM
> To: David Peixotto
> Cc: 'Peter Couperus'; 'Jim Grosbach'; 'Jakob Olesen'; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
>
>
> On Sep 6, 2012, at 2:48 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
>
>> Hi Pete,
>>
>> We ran into the same issue with generating vector loads/stores for
>> vectors with less than word alignment. It seems we took a similar
>> approach to solving the problem by modifying the logic in
> allowsUnalignedMemoryAccesses.
>>
>> As you and Jim mentioned, it looks like the vld1/vst1 instructions
>> should support element aligned access for any armv7 implementation
>> (I'm looking at Table A3-1 ARM Architecture Reference Manual - ARM DDI
> 0406C).
>>
>> Right now I do not think we have the correct code setup in
>> ARMSubtarget to accurately represent this table. I would propose that
>> we keep the existing field for unaligned access and add a new field for
> element-aligned access.
>>
>> The AllowsUnAlignedMem field remains as is and it could be used to
>> represent the SCTLR.A column in Table A3-1. The
>> AllowsElementAlignedNEON field would be used allow targets to generate
>> vld1/vst1 instructions for element-aligned accesses. By default it would
> be set to true for armv7 targets with NEON.
>
> That doesn't make sense to me. Element-aligned Neon load/stores are always
> valid. The AllowsUnalignedMem setting is supposed to model the SCTLR.A
> setting, which also applies to Neon load/stores with less-than-element
> alignment. Why would we need a new field?
>
> I was proposing it as a way to keep the current behavior of the
> AllowsUnAlignedMem field. Originally I just guarded the use of vld1/vst1 by
> testing whether the target has NEON, but then I was getting a failure in
> test/CodeGen/ARM/unaligned_load_store.ll because it did not want to see a
> vld1 when using the -arm-strict-align even if the target has NEON.
I think we could just change the test case. The point of -arm-strict-align is just to avoid relying on hardware or OS support for unaligned accesses. Using vld1.8/vst1.8 instructions would not require either of those.
>
> The difference in the flags is between allowing unaligned access for
> half-word and word load/stores (e.g. ldr/str) and getting around an
> unaligned access for D registers by using vld1.8/vst1.8 to load/store.
>
> It looks to me that a target that supports NEON could support unaligned
> access for D registers by using vld1.8/vst1.8 regardless of whether it
> supports unaligned access for other types. Currently these two are tied
> together into a single flag. The new flag was proposed as a way to make a
> distinction between these.
If it works on all targets with Neon, regardless of the SCTLR.A setting, then I still don't see the need for a separate flag.
Note that this won't work on big-endian targets where changing the vld1/vst1 element size actually changes the behavior of the instructions.
>
> I guess one way to keep the current behavior would be to look back through
> the store instruction to find the actual vector type (everything gets
> legalized to f64 or v2f64) and allow it for real vector types but disallow
> it for a real f64 type. It seems that it would be good to support the
> unaligned f64 access if we have NEON though.
>
>>
>> The -arm-strict-align would set both of the fields to false. This
>> would retain the behavior that seems to be desired from the
>> test/CodeGen/ARM/unaligned_load_store.ll test case.
>>
>> A bit of a grey area is if we have an unaligned f64 store and
>> AllowsElementAlignedNEON is true. We can actually generate a vstr1.8
>> to support this store directly instead of using the target-independent
>> method and I think it would be good to do so.
>
> Don't we already do this as of svn r161962?
>
> No, the vstr1 will only be generated if the target supports unaligned
> access. Setting AllowsUnAlignedMem to true will then also allow unaligned
> access using smaller types (e.g. word) which may not be desirable.
>
> -Dave
If AllowsUnAlignedMem is set, we should take advantage of it; otherwise, for little-endian targets, we can fall back to your suggestion of using vld1.8/vst1.8.
>
>>
>> I have some code to do this that I will likely be able to upstream.
>>
>> -Dave
>>
>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> hosted by The Linux Foundation
>>
>>
>> -----Original Message-----
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Peter Couperus
>> Sent: Thursday, September 06, 2012 8:14 AM
>> To: Jim Grosbach
>> Cc: Jakob Olesen; llvmdev at cs.uiuc.edu (LLVMdev at cs.uiuc.edu)
>> Subject: Re: [LLVMdev] Unaligned vector memory access for ARM/NEON.
>>
>> Hello,
>>
>> Thanks again. We did try overestimating the alignment, and saw the
>> vldr you reference here.
>> It looks like a recent change (r161962?) did enable vld1 generation
>> for this case (great!) on darwin, but not linux.
>> I'm not sure if the effect of lowering load <4 x i16>* align 2 to
>> vld1.16 this was intentional in this change or not.
>> If so, my question is what is the preferable way to inform the
>> Subtarget that it is allowed to use unaligned vector loads/stores when
>> NEON is available, but can't use unaligned accesses generally speaking?
>> A new field in ARMSubtarget?
>> Should the -arm-strict-align flag force expansion even on unaligned
>> vector loads/stores?
>> We got this working by adding a field to ARMSubtarget and changing
>> logic in ARMTargetLowering::allowsUnalignedMemoryAccesses, but I am
>> admittedly not entirely sure of the downstream consequences of this,
>> as we don't allow unaligned access generally.
>>
>> Pete
>>
>>
>> On 09/05/2012 04:58 PM, Jim Grosbach wrote:
>>> Hmmm. Well, it's entirely possible that it's LLVM that's confused
>>> about the alignment requirements here. :)
>>>
>>> I think I see, in general, where. I twiddled the IR to give it higher
>> alignment (16 bytes) and get:
>>> extend: @ @extend
>>> @ BB#0:
>>> vldr d16, [r0]
>>> vmovl.s16 q8, d16
>>> vstmia r1, {d16, d17}
>>> vldr d16, [r0, #8]
>>> add r0, r1, #16
>>> vmovl.s16 q8, d16
>>> vstmia r0, {d16, d17}
>>> bx lr
>>>
>>> Note that we're using a plain vldr instruction here to load the d
>> register, not a vld1 instruction. Similarly for the stores. According
>> to the ARM ARM (DDI 0406C), you're correct about the element size
>> alignment requirement for VLD1, but our isel isn't attempting to use
>> that instruction, but rather VLDR, which has word alignment required, so
> it falls over.
>>>
>>> Given that, it seems that the answer to your original question is
>>> that to
>> improve codegen for this case, the proper place to look is in
>> instruction selection for loads and stores to the VFP/NEON registers.
>> That code can be made smarter to better use the NEON instructions. I
>> know Jakob has done some work related to better utilization of those for
> other constructs.
>>>
>>> -Jim
>>>
>>> On Sep 5, 2012, at 4:25 PM, Peter Couperus<peter.couperus at st.com> wrote:
>>>
>>>> Hello Jim,
>>>>
>>>> Thank you for the response. I may be confused about the alignment
>>>> rules
>> here.
>>>> I had been looking at the ARM RVCT Assembler Guide, which seems to
>>>> indicate vld1.16 operates on 16-bit aligned data, unless I am
>> misinterpreting their table (Table 5-11 in ARM DUI 0204H, pg 5-70,5-71).
>>>> Prior to the table, It does mention the accesses need to be "element"
>> aligned, where I took element in this case to mean i16.
>>>>
>>>> Anyhow, to make this a little more concrete:
>>>>
>>>> void extend(short* a, int* b) {
>>>> for(int i = 0; i< 8; i++)
>>>> b[i] = (int)a[i];
>>>> }
>>>>
>>>> When I compile this program with clang -O3 -ccc-host-triple
>>>> armv7-none-linux-gnueabi -mfpu=neon -mllvm -vectorize, the
>>>> intermediate
>> LLVM assembly looks OK (and it has an align 2 vector load), but the
>> generated ARM assembly has the scalar loads.
>>>> When I compile with (4.6) gcc -std=c99 -ftree-vectorize -marm
>>>> -mfpu=neon
>> -O3, it uses vld1.16 and vst1.32 regardless of the parameter alignment.
>> This is on armv7a.
>>>>
>>>> The gcc version (and the clang version with our modified backend)
>>>> runs
>> fine, even on 2-byte aligned data. Is this not a guarantee across
>> armv7/armv7a generally?
>>>>
>>>> Pete
>>>>
>>>>
>>>>
>>>>
>>>> On 09/05/2012 03:15 PM, Jim Grosbach wrote:
>>>>> VLD1 expects a 64-bit aligned address unless the target explicitly
>>>>> days
>> that unaligned loads are OK.
>>>>>
>>>>> For your situation, either the subtarget should set
>>>>> AllowsUnalignedMem
>> to true (if that's accurate), or the load address should be made
>> 64-bit aligned.
>>>>>
>>>>> -Jim
>>>>>
>>>>> On Sep 5, 2012, at 2:42 PM, Peter Couperus<peter.couperus at st.com>
>> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I am a first time writer here, but am a happy LLVM tinkerer. It
>>>>>> is a
>> pleasure to use :).
>>>>>> We have come across some sub-optimal behavior when LLVM lowers
>>>>>> loads for vectors with small integers, i.e. load<4 x i16>* %a,
>>>>>> align 2,
>> using a sequence of scalar loads rather than a single vld1 on armv7
>> linux with NEON.
>>>>>> Looking at the code in svn, it appears the ARM backend is capable
>>>>>> of
>> lowering these loads as desired, and will if we use an appropriate
>> darwin triple.
>>>>>> It appears this was actually enabled relatively recently.
>>>>>> Seemingly, the case where the Subtarget has NEON available should
>>>>>> be
>> handled the same on Darwin and Linux.
>>>>>> Is this true, or am I missing something?
>>>>>> Do the regulars have an opinion on the best way to handle this?
>>>>>> Thanks!
>>>>>>
>>>>>> Pete
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>> <extend.c>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
More information about the llvm-dev
mailing list