[PATCH] Implement AVX1 vbroadcast intrinsics with vector initializers

Thu May 29 13:57:41 PDT 2014

Thanks, Nadav!  It’s r209846.

On May 29, 2014, at 12:40 PM, Nadav Rotem <nrotem at apple.com> wrote:

> This change makes sense. Thank you, Adam!
> 
> On May 29, 2014, at 10:25 AM, Adam Nemet <anemet at apple.com> wrote:
> 
>> These intrinsics are special because they take a memory operand directly (AVX2
>> adds the register counterparts).  Other intrinsics typically take registers,
>> and it is left to isel to fold memory operands into the instruction.
>> 
>> To hoist an intrinsic that reads memory directly, LICM requires that the loop
>> contain no stores, and MachineLICM requires that the folded load access
>> constant memory.  When neither condition holds, we fail to hoist a
>> loop-invariant broadcast.
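
A minimal C sketch of the situation, assuming GCC/Clang vector extensions instead of the real `__m256` intrinsics so it builds without `-mavx`. The names `v8sf`, `broadcast_from_mem`, and `scale_all` are hypothetical. The loop stores to `out` on every iteration, yet `*scale` is never written, so the broadcast is loop-invariant. Here it is written as an ordinary load plus a vector initializer, which an optimizer can in principle hoist once alias analysis shows the stores do not clobber `*scale`. With a memory-reading intrinsic, the stores in the loop would block LICM as described above. Whether a particular compiler actually hoists it is not guaranteed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 8 x float vector type (GCC/Clang vector extension). */
typedef float v8sf __attribute__((vector_size(32)));

/* Hypothetical broadcast helper: one scalar load, replicated to 8 lanes. */
static inline v8sf broadcast_from_mem(const float *p)
{
  float f = *p;
  return (v8sf){ f, f, f, f, f, f, f, f };
}

/* Scale n floats (n a multiple of 8) by *scale.  The loop stores to out on
   every iteration, but *scale never changes, so the broadcast is
   loop-invariant and a candidate for hoisting. */
static void scale_all(float *restrict out, const float *restrict in,
                      const float *restrict scale, int n)
{
  for (int i = 0; i < n; i += 8) {
    v8sf k = broadcast_from_mem(scale); /* same value on every iteration */
    v8sf x;
    memcpy(&x, in + i, sizeof x);       /* alignment-safe vector load */
    x *= k;
    memcpy(out + i, &x, sizeof x);      /* the store inside the loop */
  }
}
```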
>> 
>> We can work around this limitation by emitting the load as a regular load and
>> implementing the broadcast with the vector initializer syntax.  This exposes
>> the load to LICM and other optimizations.
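
A sketch of the "regular load plus vector initializer" shape for the scalar broadcasts, loosely modeled on how a header such as clang's avxintrin.h could express `_mm256_broadcast_ss` and `_mm256_broadcast_sd`. The `sketch_` names and typedefs are hypothetical stand-ins for `__v8sf`/`__v4df`, and the real header returns its own `__m256`/`__m256d` types.

```c
#include <assert.h>

/* Hypothetical stand-ins for __v8sf / __v4df (GCC/Clang vector extensions). */
typedef float  sketch_v8sf __attribute__((vector_size(32)));
typedef double sketch_v4df __attribute__((vector_size(32)));

/* Broadcast one float to all 8 lanes: an ordinary load followed by a
   vector initializer, instead of a builtin that reads memory itself. */
static inline sketch_v8sf sketch_broadcast_ss(const float *a)
{
  float f = *a;  /* plain load, visible to LICM and alias analysis */
  return (sketch_v8sf){ f, f, f, f, f, f, f, f };
}

/* Same idea for one double replicated to 4 lanes. */
static inline sketch_v4df sketch_broadcast_sd(const double *a)
{
  double d = *a;
  return (sketch_v4df){ d, d, d, d };
}
```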
>> 
>> At the IR level, this is translated into a series of insertelement
>> instructions.  The sequence is already recognized as a broadcast, so codegen
>> quality is unaffected.
>> 
>> _mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
>> because the DAG combiner currently cannot recover the broadcast instructions
>> from the resulting pattern.  This will be tackled in a follow-on patch.
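
For contrast, `_mm256_broadcast_ps` copies a whole 128-bit vector (four floats) into both halves of the 256-bit result (vbroadcastf128) rather than splatting a single scalar. The sketch shows what an initializer-based version would have to express; it illustrates the semantics only and is not the follow-on patch. The `sk_` typedefs and `sketch_broadcast_ps` are hypothetical, and GCC/Clang vector extensions are assumed.

```c
#include <assert.h>

/* Hypothetical 128-bit and 256-bit float vector types. */
typedef float sk_v4sf __attribute__((vector_size(16)));
typedef float sk_v8sf __attribute__((vector_size(32)));

/* The same 4-float (128-bit) value in both halves of the result.  As a
   vector initializer this is a repeated half, not a single-scalar splat. */
static inline sk_v8sf sketch_broadcast_ps(const sk_v4sf *p)
{
  sk_v4sf a = *p;  /* plain 128-bit load */
  return (sk_v8sf){ a[0], a[1], a[2], a[3], a[0], a[1], a[2], a[3] };
}
```

Presumably it is this eight-element pattern that the DAG combiner would need to map back to vbroadcastf128.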
>> 
>> Complementary changes on the LLVM side will remove the LLVM intrinsics and
>> auto-upgrade bitcode files.
>> 
>> Fixes <rdar://problem/16494520>
>> 
>> Adam
>> 
>> <Implement-AVX1-vbroadcast-intrinsics-with-vector-ini.patch>
>