[PATCH] D18566: [x86] use SSE/AVX ops for non-zero memsets (PR27100)

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 29 16:11:52 PDT 2016


spatel added a comment.

In http://reviews.llvm.org/D18566#386084, @zansari wrote:

> .. in addition to all the gackiness, also notice that we're only doing 8B stores after all of that.
>
> I like the change, but any chance we could fix this issue before committing this change? We should really only be generating a couple of shifts/ors and a shuffle, followed by full 16B stores.


Clearly, the one-line patch was too ambitious. :)

What we're seeing in some of these changes is that we're hitting what I hope is a weird corner case: a slow unaligned SSE store implementation (i.e., before SSE4.2) with a 32-bit OS. On second thought, maybe that's not so weird.

In any case, I will fix the patch to preserve that existing behavior. By just loosening the restriction on the non-zero memset for fast CPUs, we'll avoid the strange codegen and still get the benefits shown in PR27100.
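
For reference, here is a rough portable C model of the sequence @zansari describes: a couple of shifts/ors to splat the byte, something standing in for the shuffle, then full 16B stores. This is only a sketch to illustrate the shape of the expansion, not what the backend actually emits; the names splat_byte32 and memset_nonzero_16b are made up for this example, and the 16-byte memcpy calls merely stand in for unaligned vector stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Splat one byte across a 32-bit value with two shift/or steps. */
uint32_t splat_byte32(uint8_t b) {
    uint32_t v = b;
    v |= v << 8;   /* 0x000000bb -> 0x0000bbbb */
    v |= v << 16;  /* 0x0000bbbb -> 0xbbbbbbbb */
    return v;
}

/* Hypothetical helper: fill n bytes at dst with byte b in 16-byte chunks. */
void memset_nonzero_16b(void *dst, uint8_t b, size_t n) {
    uint32_t lane = splat_byte32(b);
    unsigned char pattern[16];

    /* Stand-in for the shuffle: copy the 32-bit lane into all four lanes. */
    for (int i = 0; i < 4; ++i)
        memcpy(pattern + 4 * i, &lane, sizeof lane);

    unsigned char *p = dst;
    while (n >= 16) {
        memcpy(p, pattern, 16);  /* models one unaligned 16-byte store */
        p += 16;
        n -= 16;
    }
    memcpy(p, pattern, n);       /* remaining tail of fewer than 16 bytes */
}
```

A real lowering would handle the tail with overlapping or narrower stores rather than a byte copy; the tail memcpy here is just the simplest thing that keeps the sketch correct.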


http://reviews.llvm.org/D18566




