[LLVMdev] AVX broadcast Vs. vector constant pool load

Tue Nov 6 21:05:54 PST 2012

I don't remember exactly why I did this. I vaguely remember looking at this with one of the Sandy Bridge architects and following his suggestion.

When I look at it now, it looks like broadcasting the scalar would be faster, because the 256-bit load on Sandy Bridge is double-pumped.
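
For reference, the two ways of materializing the same splat constant can be sketched in C. This is only an illustration, not the code in X86ISelLowering.cpp: it assumes GCC/Clang vector extensions, and the names kScalar, kVector, splat_from_scalar, load_full_vector and check_same are made up for the example. With AVX enabled, a compiler may lower the first form to a vbroadcastsd from an 8-byte pool entry and the second to a single 256-bit load such as vmovaps/vmovups, but the actual instruction choice is up to the backend.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: four doubles, 256 bits wide. */
typedef double v4df __attribute__((vector_size(32)));

/* Hypothetical constant-pool entries for the same splat value. */
const double kScalar = 1.5;                      /* 8-byte entry  */
const double kVector[4] = {1.5, 1.5, 1.5, 1.5};  /* 32-byte entry */

/* Option A: read one 64-bit scalar and replicate it into all four lanes.
 * With AVX this is a candidate for: vbroadcastsd ymm, m64 */
void splat_from_scalar(const double *s, double out[4]) {
    v4df v = {*s, *s, *s, *s};
    memcpy(out, &v, sizeof v);
}

/* Option B: read the whole pre-splatted 256-bit vector.
 * With AVX this is a candidate for a single 256-bit load: vmov*ps ymm, m256 */
void load_full_vector(const double src[4], double out[4]) {
    v4df v;
    memcpy(&v, src, sizeof v);
    memcpy(out, &v, sizeof v);
}

/* Both options must leave identical lane contents. */
void check_same(const double a[4], const double b[4]) {
    for (int i = 0; i < 4; i++)
        assert(a[i] == b[i]);
}
```

The two functions produce the same vector; the difference is the constant-pool footprint (8 vs. 32 bytes) and the width of the memory access, which is the trade-off in question here.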

I am CC-ing Elena, who should be able to tell.

On Nov 6, 2012, at 8:38 PM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:

> Hey guys,
> 
> I'm currently investigating broadcasts from the constant pool on Sandy Bridge. I see this comment in llvm/lib/Target/X86/X86ISelLowering.cpp:
> 
>    // Handle the broadcasting a single constant scalar from the constant pool
>    // into a vector. On Sandybridge it is still better to load a constant vector
>    // from the constant pool and not to broadcast it from a scalar.
> 
> Would anyone be able to explain why it is better to load a vector from the constant pool rather than broadcast a scalar? 
> 
> I checked out Agner Fog's tables, but it wasn't so obvious to me...
> 
> vmovaps y, m256:
>   Uops: 1
>   Lat: 4
>   Throughput: 1
> 
> vbroadcastsd y, m64:
>   Uops: 2
>   Lat: [Not or cannot be measured]
>   Throughput: 1
> 
> Thanks in advance,
> Cameron
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
