[LLVMdev] Overzealous PromoteCastOfAllocation

Mon Sep 8 11:53:03 PDT 2008

Hi Matthijs,

Changing PromoteCastOfAllocation to not replace aggregate allocas with
non-aggregate allocas if they have GEP users sounds reasonable to me.

Finding the maximum alignment is sometimes still useful though, so
it would be nice to update the alignment field of the alloca even if
its type is left unchanged.

Dan

On Sep 8, 2008, at 8:57 AM, Matthijs Kooijman wrote:

> Hi all,
>
> I'm currently running into some problems with instcombine changing  
> the type of
> alloca instructions. In particular, the PromoteCastOfAllocation  
> looks at any
> allocation instruction that is used by a bitast.
>
> It does a few checks, but basically tries to change the type of the  
> alloca
> instruction to the type pointed to by the bitcasted type. The current
> heuristic for determining if this is a good idea, is "do it if the  
> aligment of
> the new type is larger than before" (Note that this is not just a  
> safety
> check, since when the aligments are equal, the alloca is not  
> changed). This
> heuristic seems to have been introduced to prevent infinite loops  
> when there
> are multiple uses of the alloca.
>
> At first glance, changing the alloca type to remove a bitcast seems  
> a sensible
> idea. However, the transformation seems quite ignorant of other uses  
> of the
> alloca. It does properly update them by inserting extra bitcasts,  
> but in a lot
> of cases, this means you can remove a single (or perhaps a few)  
> bitcasts,
> while introducing a bunch of other bitcasts. IMHO, a better policy  
> is needed.
>
> Another problem that occurs here, is that this transformation can  
> replace a
> struct alloca with a single integer alloca. This, in turn, can  
> confuse other
> passes like scalarrepl, and produce but ugly code. I've assembled a  
> small
> example of this.
>
> 	declare void @bar(i16, i16)
>
> 	define internal void @empty(i32* %X) {
> 		ret void
> 	}
>
> 	define void @foo(i16 %A, i16 %B) {
> 	entry:
> 		;; Alloc a struct
> 		%V = alloca {i16, i16}
> 		;; Save our arguments into it
> 		%V1 = getelementptr {i16, i16}* %V, i32 0, i32 0
> 		%V2 = getelementptr {i16, i16}* %V, i32 0, i32 1
> 		store i16 %A, i16* %V1
> 		store i16 %B, i16* %V2
>
> 		;; Load the values again
> 		%R1 = load i16* %V1
> 		%R2 = load i16* %V2
> 		;; Pass them to an external function to make them used
> 		call void @bar(i16 %R1, i16 %R2)
>
> 		;; Bitcast the alloca to i32&
> 		%V32 = bitcast {i16, i16}* %V to i32*
> 		;; And make it appear used (at first glance, deadargelim will
> 		;; can remove this use later on
> 		call void @empty(i32* %V32)
> 		ret void
> 	}
>
> The above program does almost nothing, but contains an alloca of  
> {i16, i16}
> which is bitcasted to i32* somewhere. This i32* appears used, but  
> can later be
> removed (you'll see why).
>
> Now, when I run instcombine over this code, I get the following (other
> functions left out, they are unchanged).
> 	define void @foo(i16 %A, i16 %B) {
> 	entry:
> 					%V = alloca i32         ; <i32*> [#uses=3]
> 					%tmpcast = bitcast i32* %V to { i16, i16 }*             ;  
> <{ i16, i16 }*> [#uses=1]
> 					%V1 = bitcast i32* %V to i16*           ; <i16*> [#uses=2]
> 					%V2 = getelementptr { i16, i16 }* %tmpcast, i32 0, i32  
> 1                ; <i16*> [#uses=2]
> 					store i16 %A, i16* %V1, align 4
> 					store i16 %B, i16* %V2
> 					%R1 = load i16* %V1, align 4            ; <i16> [#uses=1]
> 					%R2 = load i16* %V2             ; <i16> [#uses=1]
> 					call void @bar( i16 %R1, i16 %R2 )
> 					call void @empty( i32* %V )
> 					ret void
> 	}
>
> Here, the alloca is replaced by an i32 alloca. This doesn't seem  
> like an
> improvement to me, since now the proper uses of the struct (geps)  
> need a
> bitcast first. Though this looks like a minor problem at best, it  
> confuses
> scalarrepl.
>
> When I run -deadargelim -die -scalarrepl over the instcombined code  
> (the first
> two to remove the i32* use and bitcast), I get the following ugly  
> code:
> 	define void @foo(i16 %A, i16 %B) {
> 	entry:
> 					%A1 = zext i16 %A to i32                ; <i32> [#uses=1]
> 					%A12 = shl i32 %A1, 16          ; <i32> [#uses=1]
> 					%V.in.mask = and i32 undef, 65535               ; <i32> [#uses=1]
> 					%A12.ins = or i32 %V.in.mask, %A12              ; <i32> [#uses=1]
> 					%B9 = zext i16 %B to i32                ; <i32> [#uses=1]
> 					%V.in8.mask = and i32 %A12.ins, -65536          ; <i32> [#uses=1]
> 					%B9.ins = or i32 %V.in8.mask, %B9               ; <i32> [#uses=2]
> 					%R14 = lshr i32 %B9.ins, 16             ; <i32> [#uses=1]
> 					%R15 = trunc i32 %R14 to i16            ; <i16> [#uses=1]
> 					%R27 = trunc i32 %B9.ins to i16         ; <i16> [#uses=1]
> 					call void @bar( i16 %R15, i16 %R27 )
> 					call void @empty( )
> 					ret void
> 	}
>
> However, when I run -deadargelim -die -scalarrepl directly on the  
> original code, I get the result I expect:
> 	define void @foo(i16 %A, i16 %B) {
> 	entry:
> 					call void @bar( i16 %A, i16 %B )
> 					call void @empty( )
> 					ret void
> 	}
>
> Now, I know that the shift/zext/trunc mess above might be supposed  
> to be
> cleaned up by instcombine again (it isn't currently), I think that  
> the actual
> problem lies with PromoteCastOfAllocation in instcombine. IMHO, the  
> struct
> alloca shouldn't have been replaced with an i32 alloca in the first  
> place.
>
> Also, in the code where I originally noticed this problem, the  
> resulting code
> is more complex and less likely to be simplified again. In  
> particular, instead
> of the @empty function I have a call to memmove, whose destination  
> is passed
> as an argument to another function. ArgumentPromotion is able to  
> split up this
> struct argument and remove the memmove, but by then the alloca is  
> already
> screwed up by instcombine and scalarrepl is no longer able to  
> properly expand
> it.
>
>
> Any comments on this problem? Do you agree that the current  
> replacement policy
> of PromoteCastOfAllocation is a bit too wide?
>
> I think that some other policy should be implemented in
> PromoteCastOfAllocation, probably one that takes into account the  
> number of
> bitcasts that can be removed and the number that would be  
> introduced, or
> perhaps just refuse to change a struct alloca to an integer alloca,  
> or...
>
>
> Gr.
>
> Matthijs
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev