[PATCH] Implement AVX1 vbroadcast intrinsics with vector initializers

Thu May 29 12:40:28 PDT 2014

This change makes sense. Thank you, Adam!

On May 29, 2014, at 10:25 AM, Adam Nemet <anemet at apple.com> wrote:

> These intrinsics are special because they directly take a memory operand (AVX2
> adds the register counterparts).  Typically, other non-memop intrinsics take
> registers and then it's left to isel to fold memory operands.
> 
> To hoist intrinsics that read memory directly, we require either that the loop
> contains no stores (LICM) or that the folded load accesses constant memory
> (MachineLICM).  When neither holds, we fail to hoist a loop-invariant
> broadcast.
> 
> We can work around this limitation if we expose the load as a regular load and
> then just implement the broadcast using the vector initializer syntax.  This
> exposes the load to LICM and other optimizations.
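
The workaround can be sketched in C roughly as follows. This is not the patch itself. It uses GCC/Clang vector extensions instead of <immintrin.h> so that it builds without AVX, and `v4sf`, `broadcast_ss` and `scale` are hypothetical stand-ins for the header's own vector types and intrinsics. The `restrict` qualifiers are an assumption added so that alias analysis can rule out the in-loop stores clobbering `*factor`. Whether the load and splat are actually hoisted is up to the optimizer.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector-extension type; hypothetical stand-in for the header's types. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical analogue of _mm_broadcast_ss: an ordinary scalar load
   followed by a vector initializer that splats the value into all lanes. */
static inline v4sf broadcast_ss(const float *a) {
  float f = *a;                 /* plain load, visible to LICM and alias analysis */
  return (v4sf){f, f, f, f};
}

/* Multiply in[] by *factor into out[]. The loop body stores to out[], but
   restrict says those stores cannot modify *factor, so the load and splat
   are loop-invariant and may be hoisted out of the loop. */
void scale(float *restrict out, const float *restrict in,
           const float *restrict factor, size_t n) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    v4sf f = broadcast_ss(factor);  /* loop-invariant broadcast */
    v4sf x;
    memcpy(&x, in + i, sizeof x);   /* unaligned vector load */
    v4sf y = x * f;
    memcpy(out + i, &y, sizeof y);  /* store inside the loop */
  }
  for (; i < n; ++i)                /* scalar tail for leftover elements */
    out[i] = in[i] * *factor;
}
```

The 256-bit variants would follow the same pattern with eight float lanes or four double lanes.
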
> 
> At the IR level this is translated into a series of insertelements.  The
> sequence is already recognized as a broadcast so there is no impact on the
> quality of codegen.
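
At the C level, the initializer is equivalent to writing the scalar into each lane in turn, which loosely mirrors the shape of the insertelement chain described above. The sketch below uses hypothetical names and GCC/Clang vector extensions to build the same splat both ways.

```c
typedef float v4sf __attribute__((vector_size(16)));

/* Splat written with a vector initializer, as in the patch description. */
static v4sf splat_init(float f) {
  return (v4sf){f, f, f, f};
}

/* The same value built one lane at a time, mirroring a chain of
   insertelements (IR would start from undef; C needs an initial value). */
static v4sf splat_lanes(float f) {
  v4sf v = {0};
  v[0] = f;
  v[1] = f;
  v[2] = f;
  v[3] = f;
  return v;
}
```
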
> 
> _mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
> because right now we lack the DAG-combiner smartness to recover the broadcast
> instructions.  This will be tackled in a follow-on.
> 
> There will be companion changes on the LLVM side to remove the LLVM
> intrinsics and to auto-upgrade bitcode files.
> 
> Fixes <rdar://problem/16494520>
> 
> Adam
> 
> <Implement-AVX1-vbroadcast-intrinsics-with-vector-ini.patch>