[LLVMdev] Overzealous PromoteCastOfAllocation

Mon Sep 8 08:57:38 PDT 2008

Hi all,

I'm currently running into some problems with instcombine changing the type of
alloca instructions. In particular, the PromoteCastOfAllocation looks at any
allocation instruction that is used by a bitast.

It does a few checks, but basically tries to change the type of the alloca
instruction to the type pointed to by the bitcasted type. The current
heuristic for determining if this is a good idea, is "do it if the aligment of
the new type is larger than before" (Note that this is not just a safety
check, since when the aligments are equal, the alloca is not changed). This
heuristic seems to have been introduced to prevent infinite loops when there
are multiple uses of the alloca.

At first glance, changing the alloca type to remove a bitcast seems a sensible
idea. However, the transformation seems quite ignorant of other uses of the
alloca. It does properly update them by inserting extra bitcasts, but in a lot
of cases, this means you can remove a single (or perhaps a few) bitcasts,
while introducing a bunch of other bitcasts. IMHO, a better policy is needed.

Another problem that occurs here, is that this transformation can replace a
struct alloca with a single integer alloca. This, in turn, can confuse other
passes like scalarrepl, and produce but ugly code. I've assembled a small
example of this.

	declare void @bar(i16, i16)

	define internal void @empty(i32* %X) {
		ret void
	}

	define void @foo(i16 %A, i16 %B) {
	entry:
		;; Alloc a struct
		%V = alloca {i16, i16}
		;; Save our arguments into it
		%V1 = getelementptr {i16, i16}* %V, i32 0, i32 0
		%V2 = getelementptr {i16, i16}* %V, i32 0, i32 1
		store i16 %A, i16* %V1
		store i16 %B, i16* %V2

		;; Load the values again
		%R1 = load i16* %V1
		%R2 = load i16* %V2
		;; Pass them to an external function to make them used
		call void @bar(i16 %R1, i16 %R2)

		;; Bitcast the alloca to i32&
		%V32 = bitcast {i16, i16}* %V to i32*
		;; And make it appear used (at first glance, deadargelim will
		;; can remove this use later on
		call void @empty(i32* %V32)
		ret void
	}

The above program does almost nothing, but contains an alloca of {i16, i16}
which is bitcasted to i32* somewhere. This i32* appears used, but can later be
removed (you'll see why).

Now, when I run instcombine over this code, I get the following (other
functions left out, they are unchanged).
	define void @foo(i16 %A, i16 %B) {
	entry:
					%V = alloca i32         ; <i32*> [#uses=3]
					%tmpcast = bitcast i32* %V to { i16, i16 }*             ; <{ i16, i16 }*> [#uses=1]
					%V1 = bitcast i32* %V to i16*           ; <i16*> [#uses=2]
					%V2 = getelementptr { i16, i16 }* %tmpcast, i32 0, i32 1                ; <i16*> [#uses=2]
					store i16 %A, i16* %V1, align 4
					store i16 %B, i16* %V2
					%R1 = load i16* %V1, align 4            ; <i16> [#uses=1]
					%R2 = load i16* %V2             ; <i16> [#uses=1]
					call void @bar( i16 %R1, i16 %R2 )
					call void @empty( i32* %V )
					ret void
	}

Here, the alloca is replaced by an i32 alloca. This doesn't seem like an
improvement to me, since now the proper uses of the struct (geps) need a
bitcast first. Though this looks like a minor problem at best, it confuses
scalarrepl.

When I run -deadargelim -die -scalarrepl over the instcombined code (the first
two to remove the i32* use and bitcast), I get the following ugly code:
	define void @foo(i16 %A, i16 %B) {
	entry:
					%A1 = zext i16 %A to i32                ; <i32> [#uses=1]
					%A12 = shl i32 %A1, 16          ; <i32> [#uses=1]
					%V.in.mask = and i32 undef, 65535               ; <i32> [#uses=1]
					%A12.ins = or i32 %V.in.mask, %A12              ; <i32> [#uses=1]
					%B9 = zext i16 %B to i32                ; <i32> [#uses=1]
					%V.in8.mask = and i32 %A12.ins, -65536          ; <i32> [#uses=1]
					%B9.ins = or i32 %V.in8.mask, %B9               ; <i32> [#uses=2]
					%R14 = lshr i32 %B9.ins, 16             ; <i32> [#uses=1]
					%R15 = trunc i32 %R14 to i16            ; <i16> [#uses=1]
					%R27 = trunc i32 %B9.ins to i16         ; <i16> [#uses=1]
					call void @bar( i16 %R15, i16 %R27 )
					call void @empty( )
					ret void
	}

However, when I run -deadargelim -die -scalarrepl directly on the original code, I get the result I expect:
	define void @foo(i16 %A, i16 %B) {
	entry:
					call void @bar( i16 %A, i16 %B )
					call void @empty( )
					ret void
	}

Now, I know that the shift/zext/trunc mess above might be supposed to be
cleaned up by instcombine again (it isn't currently), I think that the actual
problem lies with PromoteCastOfAllocation in instcombine. IMHO, the struct
alloca shouldn't have been replaced with an i32 alloca in the first place.

Also, in the code where I originally noticed this problem, the resulting code
is more complex and less likely to be simplified again. In particular, instead
of the @empty function I have a call to memmove, whose destination is passed
as an argument to another function. ArgumentPromotion is able to split up this
struct argument and remove the memmove, but by then the alloca is already
screwed up by instcombine and scalarrepl is no longer able to properly expand
it.

Any comments on this problem? Do you agree that the current replacement policy
of PromoteCastOfAllocation is a bit too wide?

I think that some other policy should be implemented in
PromoteCastOfAllocation, probably one that takes into account the number of
bitcasts that can be removed and the number that would be introduced, or
perhaps just refuse to change a struct alloca to an integer alloca, or...

Gr.

Matthijs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080908/617834b4/attachment.sig>