[PATCH] D18566: [x86] use SSE/AVX ops for non-zero memsets (PR27100)

Tue Mar 29 11:50:42 PDT 2016

spatel created this revision.
spatel added reviewers: zansari, kbsmith1, hjl.tools, DavidKreitzer, RKSimon, qcolombet, MatzeB.
spatel added a subscriber: llvm-commits.
Herald added a subscriber: mcrosier.

This is a one-line-of-code patch for:
https://llvm.org/bugs/show_bug.cgi?id=27100

The reasoning that I'm hoping will hold is that we shouldn't discriminate a memset operation from a memcpy at this level because they have exactly the same load/store instruction type requirements.

But as the test cases show, there's some ugliness here:
1. The i386 (Windows) test expands to use 32 stores instead of a 'rep stosl'. Is that better or worse? (I'm not sure why this change even happens yet.)
2. The memset-2.ll tests look quite awkward in the way they splat the byte value into an XMM reg; imul isn't generally cheap.
3. Why do the memset-nonzero.ll tests for an AVX1 target not use vbroadcast like the AVX2 target?
4. Why does the machine scheduler reorder the DAG nodes? In all cases, we create those store nodes in low-to-high mem address order, but that's not how the machine instructions come out.

I don't think any of the above are big enough problems to prevent this patch from going in first, but we should improve those.

http://reviews.llvm.org/D18566

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/mem-intrin-base-reg.ll
  test/CodeGen/X86/memset-2.ll
  test/CodeGen/X86/memset-nonzero.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D18566.51958.patch
Type: text/x-patch
Size: 14116 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160329/a5298f08/attachment.bin>