[LLVMdev] Limit loop vectorizer to SSE
Arnold Schwaighofer
aschwaighofer at apple.com
Fri Nov 15 17:34:06 PST 2013
The vectorizer will now emit
= load <8 x i32>, align #TargetAlignmentOfScalari32
where before it would emit
= load <8 x i32>
(which has the semantics of “= load <8 xi32>, align 0” which means the address is aligned with target abi alignment, see http://llvm.org/docs/LangRef.html#load-instruction).
When the backend generates code for the former it will emit an unaligned move:
= vmovups ...
wheres for the later it will use an aligned move:
= vmovaps …
vmovups can load from unaligned addresses while vmovaps can not.
No, we currently don’t peel loops for alignment.
Best,
Arnold
On Nov 15, 2013, at 7:23 PM, Frank Winter <fwinter at jlab.org> wrote:
> I confirm that r194876 fixes the issue, i.e. segfault not caused.
>
> My program still passed 16 byte aligned pointers to the function
> which the loop vectorizer processes successfully:
>
> LV: Vector loop of width 8 costs: 1.
> LV: Selecting VF = : 8.
> LV: Found a vectorizable loop (8) in func_orig.ll
> LV: Unroll Factor is 1
>
> Since the program runs fine, it seems to be allowed for the CPU
> to issue a vector load (8 floats) to a 16 byte aligned address (as
> opposed to 32 byte aligned). Or does in fact the loop vectorizer
> handle this case in the preamble and the vector.body issues only
> 32 byte aligned accesses. In which case I would align the payload
> to 32 byte in order to save a little in the preamble.
>
> Frank
>
>
>
> On 15/11/13 18:15, Arnold Schwaighofer wrote:
>> A fix for this is in r194876.
>>
>> Thanks for reporting this!
>>
>>
>> On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote:
>>
>>> Nadav,
>>>
>>> I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere.
>>>
>>> v/r,
>>> Josh
>>>
>>> [1] http://pastebin.com/kc95WtGG
>>> [2] http://pastebin.com/VY3ZLVJK
>>>
>>>
>>> On Fri, Nov 15, 2013 at 3:58 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>>
>>> On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote:
>>>
>>>> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote:
>>>> Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks!
>>>>
>>>> I think that's a fair question, and it's about safety. If you're getting this on the JIT, means we may be generating unsafe transformations on the vectorizer.
>>>>
>>>> Arnold, Nadav, I don't remember seeing code to generate any run-time alignment checks on the incoming pointer, is there such a thing? If not, shouldn't we add one?
>>>
>>> If the the vectorizer generates aligned memory accesses to unaligned addresses then this is a serious bug. But I don’t think that Josh said that the vectorizer generated aligned accesses to unaligned pointers.
>>>
>>> There is no point in LLVM checking for alignment because if the memory is unaligned then the program will crash. Users who want to crash with a readable error message can simply write code that checks the pointer (by masking the high bits and comparing to zero).
>>>
>>>
>
>
More information about the llvm-dev
mailing list