[llvm-commits] PATCH: CONCAT_VECTORS expansion through the stack generated incorrect stack offsets, causing broken code to be generated

Sat May 5 01:28:48 PDT 2012

On 3 May 2012, at 16:41, Duncan Sands wrote:

> Hi,
> 
> On 03/05/12 15:24, Heikki Kultala wrote:
>> The expansion of CONCAT_VECTORS goes into
>> SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node).
>> 
>> The routine calculated the offsets of the stores from the size of the element
>> size of the result vector. This works when the routine is used to expand
>> BUILD_VECTOR of scalars, but it works incorrectly for CONCAT_VECTORS where one
>> operand has bigger size than single element of the result vector.
>> 
>> This patch fixes the store offsets to be calculated by the size of the input
>> operands instead of the size of the elements in the result vectors.
> 
> can you please add a testcase. Also, are you saying that the result element
> type of CONCAT_VECTORS doesn't have to agree with the input element type, eg
> that <4 x i32> concat_vectors <2 x i64>, <2 x i64> is possible?  If so, this
> isn't documented in ISDOpcodes.h so can you please document it there.

No, I'm not saying that.

The problem with the original code was that the size of the store is the size of the (whole) input vector, 
but the size used in the store offset calculation was the size of one _element_ of a vector.

when concatenating two v2i32's into v4i32.

What should happen is that 

1) stack slot with size 16(sizeof result vector) is allocated, addres is SP+n
2) store of first v2i32 into SP+n
3) store of second v2i32 into SP+n+8 where 8 comes from sizeof(v2i32)
4) load of v4i32 from SP+n

BUT the current llvm code does:

1) stack slot with size 16(sizeof result vector) is allocated, addres is SP+n
2) store of first v2i32 into SP+n
3) store of second v2i32 into SP+n+4 where 4 comes from sizeof a single element in a vector(incorrect)
4) load of v4i32 from SP+n

This means that the second and third element of the vector go into same location (sp+n+4),
and nothing goes to address sp+n+12, meaning that the last element of the loaded vector is total garbage,
and in place of third element is last element and in place of second element is third element.