[llvm-dev] Vectorizing structure reads, writes, etc on X86-64 AVX

Tue Nov 3 09:23:52 PST 2015

Hi Jay -

I'm surprised by the codegen for your examples too, but LLVM expects the
front-end and the IR optimizer to use llvm.memcpy liberally. See these
comments in SelectionDAGBuilder.cpp:
http://llvm.org/docs/doxygen/html/SelectionDAGBuilder_8cpp_source.html#l00094
http://llvm.org/docs/doxygen/html/SelectionDAGBuilder_8cpp_source.html#l03156

"Any ld-ld-st-st sequence over this should have been converted to
llvm.memcpy by the frontend."
"The optimizer should really avoid this case by converting large
object/array copies to llvm.memcpy"

So for example with clang:

$ cat copy.c
struct bagobytes {
    int i0;
    int i1;
};

void foo(struct bagobytes* a, struct bagobytes* b) {
    *b = *a;
}

$ clang -O2 copy.c -S -emit-llvm -Xclang -disable-llvm-optzns -o -
define void @foo(%struct.bagobytes* %a, %struct.bagobytes* %b) #0 {
...
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* %3, i64 8, i32 4, i1 false), !tbaa.struct !6
  ret void
}
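
As a rough sketch of the same idea for the 32-byte struct in the quoted
message below: the C struct here is only my assumed translation of %athing
(the field names are made up), and copy_athing is a hypothetical helper. It
shows the kind of explicit memcpy a front-end could emit for the copy. I have
not checked what llc produces for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed C equivalent of
 *   %athing = type { float x 6, i16 x 2, i8 x 4 }
 * Field names are hypothetical; only the layout matters here. */
struct athing {
    float    f0, f1, f2, f3, f4, f5;  /* 6 * 4 = 24 bytes */
    uint16_t h0, h1;                  /* 2 * 2 =  4 bytes */
    uint8_t  b0, b1, b2, b3;          /* 4 * 1 =  4 bytes */
};

/* The struct should be exactly 32 bytes with no padding. */
_Static_assert(sizeof(struct athing) == 32, "athing should be 32 bytes");

/* Copy the whole object in one memcpy rather than field by field.
 * With clang, a plain `*dst = *src` is emitted as llvm.memcpy as well,
 * as the foo() example above shows. */
void copy_athing(struct athing *dst, const struct athing *src) {
    memcpy(dst, src, sizeof *dst);
}
```

Whether the backend then turns the 32-byte memcpy into a single YMM load and
store depends on the target options and the LLVM version.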

It may still be worth filing a bug (or seeing if one is already open) for
one of your simple examples.
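
In the meantime, the manual vector cast from the quoted message can be
approximated in C. This is only a sketch under some assumptions: it relies on
the GCC/Clang vector_size extension, the struct is my guess at a C equivalent
of %athing with made-up field names, and copy_athing_as_vector is a
hypothetical helper. It moves the object through one 8 x float value, using
memcpy instead of a pointer cast so that alignment and aliasing are not an
issue.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: 8 floats, 32 bytes, similar to <8 x float>. */
typedef float v8f __attribute__((vector_size(32)));

/* Assumed C equivalent of %athing; field names are hypothetical. */
struct athing {
    float    f0, f1, f2, f3, f4, f5;
    uint16_t h0, h1;
    uint8_t  b0, b1, b2, b3;
};

_Static_assert(sizeof(struct athing) == sizeof(v8f),
               "athing must be the same size as the vector type");

/* Read the struct's bytes into one vector value, then write them back out.
 * No float arithmetic is performed, so the bits are copied unchanged. */
void copy_athing_as_vector(struct athing *dst, const struct athing *src) {
    v8f tmp;
    memcpy(&tmp, src, sizeof tmp);
    memcpy(dst, &tmp, sizeof tmp);
}
```

This gives the compiler a single vector-typed value to work with; I would
still compare the generated assembly against the plain struct copy before
relying on it.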

On Thu, Oct 29, 2015 at 6:08 PM, Jay McCarthy via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> I am a first time poster, so I apologize if this is an obvious
> question or out of scope for LLVM. I am an LLVM user. I don't really
> know anything about hacking on LLVM, but I do know a bit about
> compilation generally.
>
> I am on x86-64 and I am interested in structure reads, writes, and
> constants being optimized to use vector registers when the alignment
> and sizes are right. I have created a gist of a small example:
>
> https://gist.github.com/jeapostrophe/d54d3a6a871e5127a6ed
>
> The assembly is produced with
>
> llc -O3 -march=x86-64 -mcpu=corei7-avx
>
> The key idea is that we have a structure like this:
>
> %athing = type { float, float, float, float, float, float, i16, i16,
> i8, i8, i8, i8 }
>
> That works out to be 32 bytes (6 x 4 + 2 x 2 + 4 x 1), so it can fit in
> a single YMM register.
>
> If I have two pointers to these things:
>
> @one = external global %athing
> @two = external global %athing
>
> and then I do a copy from one to the other
>
>   %a = load %athing* @two
>   store %athing %a, %athing* @one
>
> Then the code that is generated uses the XMM registers for the floats,
> but does 12 loads and then 12 stores.
>
> In contrast, if I manually cast to a properly sized float vector I get
> the desired single load and single store:
>
>   %two_vector = bitcast %athing* @two to <8 x float>*
>   %b = load <8 x float>* %two_vector
>   %one_vector = bitcast %athing* @one to <8 x float>*
>   store <8 x float> %b, <8 x float>* %one_vector
>
> The rest of the file demonstrates that the code for modifying these
> vectors is pretty good, but has examples of bad ways to initialize the
> structure and a good way to initialize it. If I try to store a
> constant struct, I get 13 stores. If I try to assemble a vector by
> casting <2 x i16> to float then <4 x i8> to float and installing them
> into a single <8 x float>, I do get the desired single store, but I
> get very complicated constants that are loaded from memory. In
> contrast, if I bitcast the <8 x float> to <16 x i16> and <32 x i8> as
> I go, then I get the desired initialization with no loads and just
> modifications of the single YMM register. (Even this last one,
> however, doesn't have the best assembly because the words and bytes
> are not inserted into the vector simultaneously, but instead
> individually.)
>
> I am kind of surprised that the obvious code didn't get optimized the
> way I expected and even the tedious version of the initialization
> isn't optimal either. I would like to know if a transformation of one
> to the other is feasible in LLVM (I know anything is possible, but
> what is feasible in this situation?) or if I should implement a
> transformation like this in my front-end and settle for the
> initialization that comes out.
>
> Thank you for your time,
>
> Jay
>
> --
> Jay McCarthy
> Associate Professor
> PLT @ CS @ UMass Lowell
> http://jeapostrophe.github.io
>
>            "Wherefore, be not weary in well-doing,
>       for ye are laying the foundation of a great work.
> And out of small things proceedeth that which is great."
>                           - D&C 64:33