[llvm] r259796 - [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads

Sat Feb 6 08:51:15 PST 2016

Michel, thanks for the report. This should be fixed by rL259991.

> On 5 Feb 2016, at 10:10, Michel Dänzer <michel at daenzer.net> wrote:
> 
> 
> Hi Simon,
> 
> 
> On 05.02.2016 01:12, Simon Pilgrim via llvm-commits wrote:
>> Author: rksimon
>> Date: Thu Feb  4 10:12:56 2016
>> New Revision: 259796
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=259796&view=rev
>> Log:
>> [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
>> 
>> This patch adds support for combining consecutive 32-bit loads (load/undef elements), followed by trailing undef/zero elements, into a single MOVD load.
>> 
>> Differential Revision: http://reviews.llvm.org/D16729
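
As a rough illustration of the pattern the log message describes, here is a simplified Python model. It is not the code in EltsFromConsecutiveLoads; the function name, the element encoding (("load", byte_offset), "undef", "zero") and the rule that element 0 must be a real load are all assumptions made for the sketch.

```python
# Simplified model, not LLVM code. Each vector element is described as
# ("load", byte_offset), "undef", or "zero" (hypothetical encoding).
def can_combine_to_movd(elts, elt_bits):
    """Return True if elts look like one 32-bit load followed by zero/undef."""
    if elt_bits not in (8, 16, 32):
        return False
    elt_bytes = elt_bits // 8
    lead = 32 // elt_bits  # number of elements covered by the 32-bit load
    # Assumption for this sketch: element 0 must be a real load (the base).
    if len(elts) < lead or not isinstance(elts[0], tuple):
        return False
    base = elts[0][1]
    for i, e in enumerate(elts):
        if i < lead:
            # Leading elements: undef, or a load from the consecutive address.
            if e == "undef":
                continue
            if not (isinstance(e, tuple) and e[0] == "load"
                    and e[1] == base + i * elt_bytes):
                return False
        elif e not in ("undef", "zero"):
            # Trailing elements must be undef or zero.
            return False
    return True
```

With this model, [("load", 0), "zero", "zero", "zero"] at 32 bits per element qualifies, while [("load", 0), ("load", 4), "zero", "zero"] does not, because the second load falls outside the low 32 bits.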
> 
> This change introduced an assertion failure with the Mesa llvmpipe
> driver unit test lp_test_format. See below for information about the
> CPU, the IR, the assertion failure and the backtrace.
> 
> 
> processor	: 0
> vendor_id	: AuthenticAMD
> cpu family	: 21
> model		: 48
> model name	: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
> stepping	: 1
> microcode	: 0x6003106
> cpu MHz		: 4100.000
> cache size	: 2048 KB
> physical id	: 0
> siblings	: 4
> core id		: 0
> cpu cores	: 2
> apicid		: 16
> initial apicid	: 0
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 13
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold vmmcall fsgsbase bmi1 xsaveopt
> bugs		: fxsave_leak sysret_ss_attrs
> bogomips	: 8200.55
> TLB size	: 1536 4K pages
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
> 
> 
> define void @fetch_r32_unorm_float(<4 x float>*, i8*, i32, i32, { [2048 x i32], [128 x i64] }*) {
> entry:
>  %5 = getelementptr i8, i8* %1, i32 0
>  %6 = bitcast i8* %5 to i32*
>  %7 = load i32, i32* %6
>  %8 = insertelement <4 x i32> undef, i32 %7, i32 0
>  %9 = shufflevector <4 x i32> %8, <4 x i32> undef, <4 x i32> zeroinitializer
>  %10 = lshr <4 x i32> %9, <i32 0, i32 undef, i32 undef, i32 undef>
>  %11 = and <4 x i32> %10, <i32 -1, i32 0, i32 0, i32 0>
>  %12 = uitofp <4 x i32> %11 to <4 x float>
>  %13 = fmul <4 x float> %12, <float 0x3DF0000000000000, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>
>  %14 = shufflevector <4 x float> %13, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
>  store <4 x float> %14, <4 x float>* %0
>  ret void
> }
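
For readers who do not want to trace the IR by hand, the following Python sketch shows what the function above appears to compute, based only on a reading of the quoted IR. It assumes a little-endian load (as on x86) and emulates single-precision rounding with struct; the helper name f32 is made up for the sketch.

```python
import struct

def f32(x):
    # Round a Python float (double) to single precision and back.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fetch_r32_unorm_float(src: bytes):
    # %7: load one 32-bit unsigned value (assumed little-endian).
    (r,) = struct.unpack_from("<I", src, 0)
    # The fmul constant 0x3DF0000000000000 is the double bit pattern of 2**-32.
    scale = struct.unpack(">d", bytes.fromhex("3DF0000000000000"))[0]
    # %12/%13: uitofp to float, then multiply lane 0 by the scale.
    x = f32(f32(float(r)) * scale)
    # %14: shuffle mask <0, 4, 4, 5> picks lane 0 of the product, then
    # 0.0, 0.0, 1.0 from the constant vector.
    return (x, 0.0, 0.0, 1.0)
```

Only lane 0 of the loaded vector survives to the store, which is why a single scalar 32-bit load with the upper lanes zeroed or undefined is enough here.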
> 
> 
> lp_test_format: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5776: llvm::SDNode* llvm::SelectionDAG::UpdateNodeOperands(llvm::SDNode*, llvm::SDValue, llvm::SDValue): Assertion `N->getNumOperands() == 2 && "Update with wrong number of operands"' failed.
> 
> Program received signal SIGABRT, Aborted.
> 0x00007ffff45a7507 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
> 55	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x00007ffff45a7507 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
> #1  0x00007ffff45a88da in __GI_abort () at abort.c:89
> #2  0x00007ffff45a059d in __assert_fail_base (fmt=0x7ffff46dd6b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7ffff6c99f70 "N->getNumOperands() == 2 && \"Update with wrong number of operands\"", 
>    file=file@entry=0x7ffff6c97f40 "../lib/CodeGen/SelectionDAG/SelectionDAG.cpp", line=line@entry=5776, 
>    function=function@entry=0x7ffff6ca3bc0 <llvm::SelectionDAG::UpdateNodeOperands(llvm::SDNode*, llvm::SDValue, llvm::SDValue)::__PRETTY_FUNCTION__> "llvm::SDNode* llvm::SelectionDAG::UpdateNodeOperands(llvm::SDNode*, llvm::SDValue, llvm::SDValue)")
>    at assert.c:92
> #3  0x00007ffff45a0652 in __GI___assert_fail (assertion=assertion@entry=0x7ffff6c99f70 "N->getNumOperands() == 2 && \"Update with wrong number of operands\"", file=file@entry=0x7ffff6c97f40 "../lib/CodeGen/SelectionDAG/SelectionDAG.cpp", line=line@entry=5776, 
>    function=function@entry=0x7ffff6ca3bc0 <llvm::SelectionDAG::UpdateNodeOperands(llvm::SDNode*, llvm::SDValue, llvm::SDValue)::__PRETTY_FUNCTION__> "llvm::SDNode* llvm::SelectionDAG::UpdateNodeOperands(llvm::SDNode*, llvm::SDValue, llvm::SDValue)")
>    at assert.c:101
> #4  0x00007ffff5f87451 in llvm::SelectionDAG::UpdateNodeOperands (this=<optimized out>, N=N@entry=0x8af230, Op1=..., Op2=...) at ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:5776
> #5  0x00007ffff679f276 in <lambda(llvm::EVT, llvm::LoadSDNode*)>::operator()(llvm::EVT, llvm::LoadSDNode *) const (__closure=__closure@entry=0x7fffffffc020, VT=..., LDBase=LDBase@entry=0x8af230) at ../lib/Target/X86/X86ISelLowering.cpp:5616
> #6  0x00007ffff67be902 in EltsFromConsecutiveLoads (VT=..., Elts=..., DL=..., DAG=..., isAfterLegalize=true) at ../lib/Target/X86/X86ISelLowering.cpp:5679
> #7  0x00007ffff67fc454 in PerformShuffleCombine (N=N@entry=0x861370, DAG=..., DCI=..., Subtarget=...) at ../lib/Target/X86/X86ISelLowering.cpp:24321
> #8  0x00007ffff681292d in llvm::X86TargetLowering::PerformDAGCombine (this=<optimized out>, N=0x861370, DCI=...) at ../lib/Target/X86/X86ISelLowering.cpp:28481
> #9  0x00007ffff5e70a48 in (anonymous namespace)::DAGCombiner::combine (this=this@entry=0x7fffffffc950, N=N@entry=0x861370) at ../lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1461
> #10 0x00007ffff5e72a7e in (anonymous namespace)::DAGCombiner::Run (AtLevel=llvm::AfterLegalizeVectorOps, this=0x7fffffffc950) at ../lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1301
> #11 llvm::SelectionDAG::Combine (this=<optimized out>, Level=Level@entry=llvm::AfterLegalizeVectorOps, AA=..., OptLevel=<optimized out>) at ../lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14801
> #12 0x00007ffff5fa9555 in llvm::SelectionDAGISel::CodeGenAndEmitDAG (this=this@entry=0x8b4f90) at ../lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:801
> #13 0x00007ffff5fa9a85 in llvm::SelectionDAGISel::SelectBasicBlock (this=this@entry=0x8b4f90, Begin=..., Begin@entry=..., End=..., End@entry=..., HadTailCall=@0x7fffffffcec8: false) at ../lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:669
> #14 0x00007ffff5fb246b in llvm::SelectionDAGISel::SelectAllBasicBlocks (this=this@entry=0x8b4f90, Fn=...) at ../lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1361
> #15 0x00007ffff5fb3def in llvm::SelectionDAGISel::runOnMachineFunction (this=0x8b4f90, mf=...) at ../lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:503
> #16 0x00007ffff679a674 in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction (this=<optimized out>, MF=...) at ../lib/Target/X86/X86ISelDAGToDAG.cpp:171
> #17 0x00007ffff5b43869 in llvm::FPPassManager::runOnFunction (this=0x88b9a0, F=...) at ../lib/IR/LegacyPassManager.cpp:1550
> #18 0x00007ffff5b43c1b in llvm::FPPassManager::runOnModule (this=0x88b9a0, M=...) at ../lib/IR/LegacyPassManager.cpp:1571
> #19 0x00007ffff5b434a4 in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=<optimized out>) at ../lib/IR/LegacyPassManager.cpp:1627
> #20 llvm::legacy::PassManagerImpl::run (this=0x81d7e0, M=...) at ../lib/IR/LegacyPassManager.cpp:1730
> #21 0x00007ffff5b4367e in llvm::legacy::PassManager::run (this=this at entry=0x7fffffffd230, M=...) at ../lib/IR/LegacyPassManager.cpp:1761
> #22 0x00007ffff66f8958 in llvm::MCJIT::emitObject (this=this@entry=0x86e550, M=M@entry=0x8532d0) at ../lib/ExecutionEngine/MCJIT/MCJIT.cpp:160
> #23 0x00007ffff66f90a7 in llvm::MCJIT::generateCodeForModule (this=0x86e550, M=0x8532d0) at ../lib/ExecutionEngine/MCJIT/MCJIT.cpp:203
> #24 0x00007ffff66f56c0 in llvm::MCJIT::finalizeObject (this=0x86e550) at ../lib/ExecutionEngine/MCJIT/MCJIT.cpp:251
> #25 0x00007ffff66d6a16 in LLVMGetPointerToGlobal (EE=0x86e550, Global=0x837798) at ../lib/ExecutionEngine/ExecutionEngineBindings.cpp:295
> #26 0x00000000004e0089 in gallivm_jit_function (gallivm=gallivm@entry=0x80d730, func=func@entry=0x837798) at ../../../../src/gallium/auxiliary/gallivm/lp_bld_init.c:640
> #27 0x0000000000405ebe in test_format_float (verbose=0, desc=0x79b480 <util_format_r32_unorm_description>, fp=0x0) at ../../../../../src/gallium/drivers/llvmpipe/lp_test_format.c:157
> #28 test_one (verbose=0, format_desc=0x79b480 <util_format_r32_unorm_description>, fp=0x0) at ../../../../../src/gallium/drivers/llvmpipe/lp_test_format.c:336
> #29 test_all (verbose=verbose@entry=0, fp=fp@entry=0x0) at ../../../../../src/gallium/drivers/llvmpipe/lp_test_format.c:397
> #30 0x0000000000406c72 in test_some (verbose=verbose@entry=0, fp=fp@entry=0x0, n=n@entry=1000) at ../../../../../src/gallium/drivers/llvmpipe/lp_test_format.c:413
> #31 0x00000000004059f3 in main (argc=1, argv=0x7fffffffe708) at ../../../../../src/gallium/drivers/llvmpipe/lp_test_main.c:410
> 
> 
> -- 
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer