[PATCH] [PATCH][SROA]Also slice the STORE when slicing a LOAD in AllocaSliceRewriter

Thu Sep 18 23:43:10 PDT 2014

Hi all,

Thanks for the code review. I’ve learnt a lot.

Now I think I agree with Chandler, sometimes wide store is better than two narrow stores. I’ve attached the test cases.

Currently, opt will do nothing for the narrow store case:

define i64 @narrow(i32 %a) {

  store i32 %a, i32* bitcast (i64* @ep to i32*), align 4

  store i32 %a, i32* bitcast (i8* getelementptr (i8* bitcast (i64* @ep to i8*), i64 4) to i32*), align 4

  %m = load i64* @ep, align 8

  %res = add i64 %m, 32

  ret i64 %res

}

But opt will remove the load IR in the wide store case and use the result of zext/shl/or directly (In the EarlyCSE pass):

define i64 @wide(i32 %a) {

  %ext = zext i32 %a to i64

  %shift = shl nuw i64 %ext, 32

  %insert = or i64 %shift, %ext

  store i64 %insert, i64* @ep, align 8

  %res = add i64 %insert, 32   // use %insert instead of load from @ep

  ret i64 %res

}

Currently, the first case will finally generate 2 stores and 1 load, while the second case will generate math instructions and only 1 store. So I think the second version is better than the first version.

So it seems we should fix this issue in other places (SLP, backend …).

Thanks,

-Hao 

From: Chandler Carruth [mailto:chandlerc at google.com] 
Sent: Monday, September 15, 2014 4:23 PM
To: James Molloy
Cc: Hao Liu; Commit Messages and Patches for LLVM; reviews+D4954+public+39a520765005fb8e at reviews.llvm.org
Subject: Re: [PATCH] [PATCH][SROA]Also slice the STORE when slicing a LOAD in AllocaSliceRewriter

On Fri, Sep 12, 2014 at 5:32 AM, James Molloy <james at jamesmolloy.co.uk> wrote:

Hi Chandler,

I had a discussion with Jiangning about this based upon our chat on IRC, and he asked a question I couldn't answer, so possibly you could?

On IRC you asserted that we need the wide store so that instcombines can see that the store was made up of two constitutent values. Why is this the case? Why can't memdep get you the same information with two narrow stores?

Because analyzing through memory is hard?

Not really sure what you're asking. instcombine doesn't have memdep, and doesn't do the kind of analyses that GVN does. (Nor could it for compile time reasons...)

Surely a query such as "what does this i64 load depend on?" would return "these two i32 stores"? If this is not the case, why not?

It is the case in things which do heavy-weight memory load/store analyses, but those aren't instcombine.

and if it is the case, why can't instcombine use this information to dive across load/store boundaries?

Because instcombine is fundamentally about optimizing based on the SSA graph, not based on value numbers or other load/store information about the contents of memory.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140919/ffe1331f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stores.ll
Type: application/octet-stream
Size: 520 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140919/ffe1331f/attachment.obj>