[LLVMdev] AVX broadcast Vs. vector constant pool load
Nadav Rotem
nrotem at apple.com
Tue Nov 6 21:05:54 PST 2012
I don't remember exactly why I did this. I vaguely remember looking at this with one of the Sandy Bridge architects and following his suggestion.
Looking at it now, I would expect broadcasting the scalar to be faster, because the 256-bit load on Sandy Bridge is double-pumped.
I am CC-ing Elena, who should be able to tell.
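
For reference, the two alternatives being compared can be written out with AVX intrinsics. This is only a minimal sketch of the two access patterns, not the code LLVM emits. The names kScalar, kVector, splat_via_vector_load and splat_via_broadcast are made up for illustration, and the target("avx") attribute assumes GCC or Clang on x86. An optimizing compiler is free to turn either form into the other, so check the generated assembly before measuring anything.

```c
#include <immintrin.h>

/* Hypothetical constant-pool entries; the names are illustrative only. */
static const double kScalar = 2.5;                       /* 8 bytes  */
static const double kVector[4] __attribute__((aligned(32))) =
    {2.5, 2.5, 2.5, 2.5};                                /* 32 bytes */

/* Option A: one aligned 256-bit load of the full constant vector
 * (the "y, m256" load form quoted in the original question). */
__attribute__((target("avx")))
void splat_via_vector_load(double out[4]) {
    __m256d v = _mm256_load_pd(kVector);
    _mm256_storeu_pd(out, v);
}

/* Option B: load one 64-bit scalar and broadcast it to all four lanes
 * (vbroadcastsd y, m64). */
__attribute__((target("avx")))
void splat_via_broadcast(double out[4]) {
    __m256d v = _mm256_broadcast_sd(&kScalar);
    _mm256_storeu_pd(out, v);
}
```

Both functions write the same four values to out. The difference is the size of the constant in memory (32 bytes versus 8) and which load form the hardware executes.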
On Nov 6, 2012, at 8:38 PM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:
> Hey guys,
>
> I'm currently investigating broadcasts from the constant pool on Sandy Bridge. I see this comment in llvm/lib/Target/X86/X86ISelLowering.cpp:
>
> // Handle the broadcasting a single constant scalar from the constant pool
> // into a vector. On Sandybridge it is still better to load a constant vector
> // from the constant pool and not to broadcast it from a scalar.
>
> Would anyone be able to explain why it is better to load a vector from the constant pool rather than broadcast a scalar?
>
> I checked out Agner Fog's tables, but it wasn't so obvious to me...
>
> vmovaps y, m256:
> Uops: 1
> Lat: 4
> Throughput: 1
>
> vbroadcastsd y, m64:
> Uops: 2
> Lat: [Not or cannot be measured]
> Throughput: 1
>
> Thanks in advance,
> Cameron