[llvm-testresults] buildbot failure in lab.llvm.org on phase1 - sanity

Sat Aug 2 04:04:32 PDT 2014

The Buildbot has detected a new failure on builder phase1 - sanity while building llvm.
Full details are available at:
 http://lab.llvm.org:8013/builders/phase1%20-%20sanity/builds/3116

Buildbot URL: http://lab.llvm.org:8013/

Buildslave for this Build: macpro1

Build Reason: scheduler
Build Source Stamp: 214628
Blamelist: chandlerc

BUILD FAILED: failed

sincerely,
 -The Buildbot

================================================================================

CHANGES:
Files:
 lib/Target/X86/X86ISelLowering.cpp
 test/CodeGen/X86/avx-basic.ll
 test/CodeGen/X86/avx-splat.ll
 test/CodeGen/X86/exedepsfix-broadcast.ll
 test/CodeGen/X86/vec_splat-3.ll
On: http://10.1.1.2/svn/llvm-project
For: llvm
At: Sat 02 Aug 2014 03:41:39
Changed By: chandlerc
Comments: [x86] Teach the target shuffle mask extraction to recognize unary forms
of normally binary shuffle instructions like PUNPCKL and MOVLHPS.

This detects cases where a single register is used for both operands
making the shuffle behave in a unary way. We detect this and adjust the
mask to use the unary form which allows the existing DAG combine for
shuffle instructions to actually work at all.
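
As an illustration of the unary rewrite described above, here is a minimal
sketch. It assumes a mask convention where indices in [0, N) pick from the
first operand, [N, 2N) from the second, and negative values are undefined.
The helper names are hypothetical and are not taken from
X86ISelLowering.cpp.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not LLVM's API): rewrite a two-input shuffle mask as
// a one-input mask, valid only when both inputs are the same value.
// Negative entries (undefined lanes) are left untouched.
std::vector<int> toUnaryMask(std::vector<int> mask) {
  const int n = static_cast<int>(mask.size());
  for (int &m : mask) {
    assert(m < 2 * n && "mask index out of range");
    if (m >= n)
      m -= n; // same register, so element n+i is just element i
  }
  return mask;
}

// Binary mask of a 4 x i32 low unpack (PUNPCKLDQ): a0, b0, a1, b1.
std::vector<int> unpackLowMask4() { return {0, 4, 1, 5}; }

// Binary mask of MOVLHPS on 4 x float: a0, a1, b0, b1.
std::vector<int> movlhpsMask4() { return {0, 1, 4, 5}; }
```

With a single register feeding both operands, the unpack mask becomes
{0, 0, 1, 1} and the MOVLHPS mask becomes {0, 1, 0, 1}, which is the form a
unary shuffle combine can reason about.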

As a consequence, this uncovered a number of obvious bugs in the
existing DAG combine which are fixed. It also now canonicalizes several
shuffles even with the existing lowering. These typically are trying to
match the shuffle to the domain of the input where before we only really
modeled them with the floating point variants. All of the cases which
change to an integer shuffle here have something in the integer domain, so
there are no more or fewer domain crosses here AFAICT. Technically, it
might be better to go from a GPR directly to the floating point domain,
but detecting floating point *outputs* despite integer inputs is a lot
more code and seems unlikely to be worthwhile in practice. If folks are
seeing domain-crossing regressions here though, let me know and I can
hack something up to fix it.

Also as a consequence, a bunch of previously missed opportunities to
form pshufb are now caught. Notably, splats of i8s now form pshufb.
Interestingly, this improves the existing splat lowering too. We go from
3 instructions to 1. Yes, we may tie up a register, but it seems very
likely to be worth it, especially if splatting the 0th byte (the
common case) as then we can use a zeroed register as the mask.
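
To see why a zeroed register works as the mask for splatting byte 0, a small
software model of 128-bit PSHUFB may help. This is only a sketch of the
instruction's documented per-byte behavior; the type and function names are
made up for the example.

```cpp
#include <array>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Software model of 128-bit PSHUFB: each control byte either zeroes the
// result byte (high bit set) or selects a source byte by its low 4 bits.
Bytes16 pshufb(const Bytes16 &src, const Bytes16 &control) {
  Bytes16 out{};
  for (int i = 0; i < 16; ++i)
    out[i] = (control[i] & 0x80) ? 0 : src[control[i] & 0x0F];
  return out;
}

// Splat byte 0: an all-zero control vector selects src[0] in every lane.
Bytes16 splatByte0(const Bytes16 &src) { return pshufb(src, Bytes16{}); }
```

Every control byte is 0, so every result lane reads source byte 0, and the
control register itself can be produced by zeroing a register.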

File: lib/Target/X86/X86ISelLowering.cpp
On: http://10.1.1.2/svn/llvm-project
For: llvm
At: Sat 02 Aug 2014 03:41:39
Changed By: chandlerc
Comments: [x86] Fix a few typos in my comments spotted in passing.

File: lib/Target/X86/X86MCInstLower.cpp
On: http://10.1.1.2/svn/llvm-project
For: llvm
At: Sat 02 Aug 2014 03:41:39
Changed By: chandlerc
Comments: [x86] Switch to using the variable we extracted this operand into.

Spotted this missed refactoring by inspection when reading code, and it
doesn't change the functionality at all.

Files:
 lib/Target/X86/Utils/X86ShuffleDecode.cpp
 lib/Target/X86/Utils/X86ShuffleDecode.h
 lib/Target/X86/X86ISelLowering.cpp
 test/CodeGen/X86/vector-shuffle-128-v16.ll
 test/CodeGen/X86/vector-shuffle-128-v8.ll
On: http://10.1.1.2/svn/llvm-project
For: llvm
At: Sat 02 Aug 2014 03:51:38
Changed By: chandlerc
Comments: [x86] Largely complete the use of PSHUFB in the new vector shuffle
lowering with a small addition to it and adding PSHUFB combining.

There is one obvious place in the new vector shuffle lowering where we
should form PSHUFBs directly: when without them we will unpack a vector
of i8s across two different registers and do a potentially 4-way blend
as i16s only to re-pack them into i8s afterward. This is the crazy
expensive fallback path for i8 shuffles and we can just directly use
pshufb here as it will always be cheaper (the unpack and pack are
two instructions so even a single shuffle between them hits our
three instruction limit for forming PSHUFB).

However, this doesn't generate very good code in many cases, and it
leaves a bunch of common patterns not using PSHUFB. So this patch also
adds support for extracting a shuffle mask from PSHUFB in the X86
lowering code, and uses it to handle PSHUFBs in the recursive shuffle
combining. This allows us to combine through them, combine multiple ones
together, and generally produce sufficiently high quality code.
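
Combining two single-input shuffles amounts to composing their masks. The
following is a rough sketch of that idea only, with a hypothetical sentinel
for zeroed lanes; it is not the recursive combiner from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel (an assumption for this sketch): lane is zeroed.
constexpr int kZero = -1;

// Compose two single-input shuffles: `inner` is applied first, then `outer`.
// Result lane i reads lane outer[i] of the intermediate vector, which in
// turn is lane inner[outer[i]] of the original input.
std::vector<int> composeMasks(const std::vector<int> &inner,
                              const std::vector<int> &outer) {
  std::vector<int> combined(outer.size());
  for (std::size_t i = 0; i < outer.size(); ++i) {
    if (outer[i] == kZero) {
      combined[i] = kZero; // outer shuffle zeroes this lane outright
      continue;
    }
    assert(outer[i] >= 0 &&
           static_cast<std::size_t>(outer[i]) < inner.size());
    // A zeroed intermediate lane propagates, since inner[...] may be kZero.
    combined[i] = inner[static_cast<std::size_t>(outer[i])];
  }
  return combined;
}
```

For example, composing inner {1, 0, 3, 2} with outer {2, 2, kZero, 0} gives
{3, 3, kZero, 1}, so the two shuffles collapse into one.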

Extracting the PSHUFB mask is annoyingly complex because it could be
either pre-legalization or post-legalization. At least this doesn't have
to deal with re-materialized constants. =] I've added decode routines to
handle the different patterns that show up at this level and we dispatch
through them as appropriate.
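
Once the 16 constant control bytes have been recovered, turning them into a
shuffle mask is the easy part. The sketch below assumes the bytes are already
in hand as plain integers and uses a made-up sentinel for zeroed lanes; the
actual routines in X86ShuffleDecode.cpp may differ in interface and sentinel
value.

```cpp
#include <array>
#include <cstdint>

// Hypothetical sentinel for "lane is forced to zero" (an assumption here).
constexpr int kZeroLane = -2;

// Decode 128-bit PSHUFB control bytes into per-lane source indices.
// High bit set means the lane is zeroed; otherwise the low 4 bits index
// into the source vector.
std::array<int, 16> decodePshufbControl(
    const std::array<std::uint8_t, 16> &control) {
  std::array<int, 16> mask{};
  for (int i = 0; i < 16; ++i)
    mask[i] = (control[i] & 0x80) ? kZeroLane
                                  : static_cast<int>(control[i] & 0x0F);
  return mask;
}
```

The hard part the commit message refers to, locating those constant bytes
before or after legalization, is deliberately left out of this sketch.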

The two primary test cases are updated. For the v16 test case there is
still a lot of room for improvement. Since I was going through it
systematically I left behind a bunch of FIXME lines that I'm hoping to
turn into ALL lines by the end of this.

LOGS: