[PATCH] Implement AVX1 vbroadcast intrinsics with vector initializers
Nadav Rotem
nrotem at apple.com
Thu May 29 12:40:28 PDT 2014
This change makes sense. Thank you Adam!
On May 29, 2014, at 10:25 AM, Adam Nemet <anemet at apple.com> wrote:
> These intrinsics are special because they directly take a memory operand (AVX2
> adds the register counterparts). Typically, other non-memop intrinsics take
> registers and then it's left to isel to fold memory operands.
>
> In order to LICM intrinsics directly reading memory, we require that no stores
> are in the loop (LICM) or that the folded load accesses constant memory
> (MachineLICM). When neither is the case we fail to hoist a loop-invariant
> broadcast.
>
> We can work around this limitation if we expose the load as a regular load and
> then just implement the broadcast using the vector initializer syntax. This
> exposes the load to LICM and other optimizations.
>
> At the IR level this is translated into a series of insertelements. The
> sequence is already recognized as a broadcast so there is no impact on the
> quality of codegen.
>
> _mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
> because right now we lack the DAG-combiner smartness to recover the broadcast
> instructions. This will be tackled in a follow-on.
>
> There will be complementary changes on the LLVM side to remove the LLVM
> intrinsics and to auto-upgrade bitcode files.
>
> Fixes <rdar://problem/16494520>
>
> Adam
>
> <Implement-AVX1-vbroadcast-intrinsics-with-vector-ini.patch>
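For readers without the attachment, here is a rough sketch of the idea in
the quoted description: load the scalar with an ordinary load, then build
the broadcast with vector initializer syntax. This is not the code from the
patch. The type name v8sf and the function add_broadcast are hypothetical,
and the sketch uses the generic GCC/Clang vector extension instead of the
AVX header types so that it builds without -mavx.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 8 x float vector type (GCC/Clang vector extension),
   standing in for the vector type used by the real intrinsics header. */
typedef float v8sf __attribute__((vector_size(32)));

/* Adds *scalar to every element of dst.
   Assumption: n is a multiple of 8. */
void add_broadcast(float *dst, const float *scalar, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        /* Ordinary scalar load, visible to the optimizer like any other
           load, unlike a memory operand hidden inside an intrinsic. */
        float f = *scalar;

        /* The broadcast, written with vector initializer syntax. */
        v8sf b = { f, f, f, f, f, f, f, f };

        v8sf v;
        memcpy(&v, dst + i, sizeof v);  /* unaligned load of 8 floats */
        v = v + b;
        memcpy(dst + i, &v, sizeof v);  /* the store inside the loop */
    }
}
```

The loop contains a store, which is the situation described above where an
intrinsic that reads memory directly is not hoisted. With a plain load,
whether the broadcast is actually moved out of the loop still depends on
what the compiler can prove about aliasing between dst and scalar.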