[LLVMdev] Limit loop vectorizer to SSE

Fri Nov 15 17:34:06 PST 2013

The vectorizer will now emit

= load <8 x i32>, align #TargetAlignmentOfScalari32

where before it would emit

= load <8 x i32>

(which has the semantics of “= load <8 xi32>, align 0” which means the address is aligned with target abi alignment, see http://llvm.org/docs/LangRef.html#load-instruction).

When the backend generates code for the former it will emit an unaligned move:
 = vmovups ...
wheres for the later it will use an aligned move:
 = vmovaps …

vmovups can load from unaligned addresses while vmovaps can not.

No, we currently don’t peel loops for alignment.

Best,
Arnold

On Nov 15, 2013, at 7:23 PM, Frank Winter <fwinter at jlab.org> wrote:

> I confirm that r194876 fixes the issue, i.e. segfault not caused.
> 
> My program still passed 16 byte aligned pointers to the function
> which the loop vectorizer processes successfully:
> 
> LV: Vector loop of width 8 costs: 1.
> LV: Selecting VF = : 8.
> LV: Found a vectorizable loop (8) in func_orig.ll
> LV: Unroll Factor is 1
> 
> Since the program runs fine, it seems to be allowed for the CPU
> to issue a vector load (8 floats) to a 16 byte aligned address (as
> opposed to 32 byte aligned). Or does in fact the loop vectorizer
> handle this case in the preamble and the vector.body issues only
> 32 byte aligned accesses. In which case I would align the payload
> to 32 byte in order to save a little in the preamble.
> 
> Frank
> 
> 
> 
> On 15/11/13 18:15, Arnold Schwaighofer wrote:
>> A fix for this is in r194876.
>> 
>> Thanks for reporting this!
>> 
>> 
>> On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote:
>> 
>>> Nadav,
>>> 
>>> I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere.
>>> 
>>> v/r,
>>> Josh
>>> 
>>> [1] http://pastebin.com/kc95WtGG
>>> [2] http://pastebin.com/VY3ZLVJK
>>> 
>>> 
>>> On Fri, Nov 15, 2013 at 3:58 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>> 
>>> On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote:
>>> 
>>>> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote:
>>>> Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks!
>>>> 
>>>> I think that's a fair question, and it's about safety. If you're getting this on the JIT, means we may be generating unsafe transformations on the vectorizer.
>>>> 
>>>> Arnold, Nadav, I don't remember seeing code to generate any run-time alignment checks on the incoming pointer, is there such a thing? If not, shouldn't we add one?
>>> 
>>> If the the vectorizer generates aligned memory accesses to unaligned addresses then this is a serious bug.  But I don’t think that Josh said that the vectorizer generated aligned accesses to unaligned pointers.
>>> 
>>> There is no point in LLVM checking for alignment because if the memory is unaligned then the program will crash.  Users who want to crash with a readable error message can simply write code that checks the pointer (by masking the high bits and comparing to zero).
>>> 
>>> 
> 
>