[llvm] r262397 - DAGCombiner: Turn truncate of a bitcasted vector to an extract

Mon Apr 25 07:49:59 PDT 2016

Hi Matt,

If I run llc

llc bug.ll -o bug.s

on this

target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v128:128:128-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

%rec9 = type { [4 x i8] }

@g_cm_s = global %rec9 { [4 x i8] [i8 111, i8 112, i8 113, i8 114] }

declare void @__fail()

define i8 @foo() {
   %1 = load <4 x i8>, <4 x i8> * bitcast (%rec9 * @g_cm_s to <4 x i8> *)
   %2 = bitcast <4 x i8> %1 to i32
   %3 = trunc i32 %2 to i8
   %4 = icmp eq i8 %3, 114
   br i1 %4, label %bb2, label %bb1

bb1:                                              ; preds = %0
   call void @__fail()
   br label %bb2

bb2:                                              ; preds = %0, %bb1
   ret i8 0

bb3:                                              ; No predecessors!
   ret i8 0
}

with an llc without your change I get

# BB#0:
	addis 3, 2, g_cm_s@toc@ha
	addi 3, 3, g_cm_s@toc@l
	lbz 3, 3(3)
	cmplwi	 3, 114
	beq	 0, .LBB0_2

Notice the

	lbz 3, 3(3)

Then if I compile it with your change I get

# BB#0:
	addis 3, 2, g_cm_s@toc@ha
	addi 3, 3, g_cm_s@toc@l
	lbz 3, 0(3)
	cmplwi	 3, 114
	beq	 0, .LBB0_2

Notice the

	lbz 3, 0(3)

So you get different results on PPC with and without your change: the 
load moves from byte offset 3 to byte offset 0, so the compare sees 111 
instead of 114. This indicates a bug to me.

Then finally if I compile with your change plus my own change I get

# BB#0:
	addis 3, 2, g_cm_s@toc@ha
	addi 3, 3, g_cm_s@toc@l
	lbz 3, 3(3)
	cmplwi	 3, 114
	beq	 0, .LBB0_2

which is the same as the original code.
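
To spell out why I think the highest element is the right one on big-endian, here is a small standalone model of the bitcast + trunc sequence. It is only a sketch and not LLVM code: the helper names (bitcastToInt, truncToI8, truncatedElementIndex) are made up for illustration, and it assumes vector element 0 lives at the lowest address, which is what the lbz offsets above suggest for PPC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of `bitcast <N x i8> %v to iM`.
// Assumption: element 0 is stored at the lowest address. On a big-endian
// target the lowest address holds the most significant byte of the integer,
// on a little-endian target it holds the least significant byte.
uint64_t bitcastToInt(const std::vector<uint8_t> &elts, bool bigEndian) {
  assert(!elts.empty() && elts.size() <= 8);
  uint64_t value = 0;
  for (std::size_t i = 0; i < elts.size(); ++i) {
    // Byte position of element i inside the integer, counted from the LSB.
    std::size_t bytePos = bigEndian ? elts.size() - 1 - i : i;
    value |= static_cast<uint64_t>(elts[i]) << (8 * bytePos);
  }
  return value;
}

// Hypothetical model of `trunc iM %x to i8`: keep the least significant byte.
uint8_t truncToI8(uint64_t v) { return static_cast<uint8_t>(v); }

// Element index that the truncate corresponds to. This mirrors the ElmtIdx
// computation in my local patch, outside of any DAG context.
unsigned truncatedElementIndex(unsigned numElts, bool bigEndian) {
  assert(numElts > 0);
  return bigEndian ? numElts - 1 : 0;
}
```

With the initializer { 111, 112, 113, 114 } from bug.ll, the big-endian model truncates to 114 (element 3, matching "lbz 3, 3(3)"), while the little-endian model truncates to 111 (element 0).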

Regards,
Mikael

On 04/25/2016 05:06 AM, Matt Arsenault wrote:
>
>> On Mar 4, 2016, at 05:38, Mikael Holmén <mikael.holmen at ericsson.com> wrote:
>>
>> Hi,
>>
>> On 03/04/2016 02:33 AM, Matt Arsenault wrote:
>>>
>>>> On Mar 3, 2016, at 00:27, Mikael Holmén via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>>
>>>> Hi Matt,
>>>>
>>>> What about Big Endian targets? Shouldn't we extract the highest vector element instead of element 0 then?
>>>>
>>>> Regards,
>>>> Mikael
>>>
>>> I don’t know how vector types work on big-endian targets
>>
>> Me neither. :D
>>
>> But one case I've seen for my big-endian out-of-tree target is that we have:
>>
>> @g_cm_s = addrspace(21) global %rec802 { [4 x i16] [i16 111, i16 112, i16 113, i16 114] }
>>
>> and then:
>>
>>   %1 = load <4 x i16>, <4 x i16> addrspace(21)* bitcast (%rec802 addrspace(21)* @g_cm_s to <4 x i16> addrspace(21)*)
>>   %2 = bitcast <4 x i16> %1 to i64
>>   %3 = trunc i64 %2 to i16
>>   %_tmp7 = icmp eq i16 %3, 114
>>
>> Without the new optimization in visitTRUNCATE this code works well for me. The result of the trunc is 114 as expected, but with your change we get 111.
>>
>> So I've changed
>>
>> +
>> +      // We need to consider endianness when deciding which vector
>> +      // element to extract.
>> +      unsigned ElmtIdx =
>> +        DAG.getDataLayout().isBigEndian()
>> +        ? SrcVT.getVectorNumElements() - 1
>> +        : 0;
>>        return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, VT,
>> -                         VecSrc, DAG.getConstant(0, SL, IdxVT));
>> +                         VecSrc, DAG.getConstant(ElmtIdx, SL, IdxVT));
>>
>> locally to get my test to pass.
>>
>> I've no idea if there are any big-endian in-tree targets that have vectors where this can be an issue.
>>
>> /Mikael
>>
>>>
>>> -Matt
>>>
>>
>
>
> This might be the case on PPC? Can you try to write a test for that?
>
> -Matt
>