[LLVMdev] RFC: Missing canonicalization in LLVM

Wed Jan 21 15:02:57 PST 2015

----- Original Message -----
> From: "Pete Cooper" <peter_cooper at apple.com>
> To: "Chandler Carruth" <chandlerc at gmail.com>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, January 21, 2015 4:43:47 PM
> Subject: Re: [LLVMdev] RFC: Missing canonicalization in LLVM
> 
> 
> On Jan 21, 2015, at 2:18 PM, Chandler Carruth < chandlerc at gmail.com >
> wrote:
> 
> 
> 
> 
> On Wed, Jan 21, 2015 at 2:16 PM, Chandler Carruth <
> chandlerc at gmail.com > wrote:
> 
> 
> 
> So, we've run into some test cases which are pretty alarming.
> 
> 
> When inlining code in various different paths we can end up with this
> IR:
> 
> 
> 
> define void @f(float* %value, i8* %b) {
> 
> entry:
> %0 = load float* %value, align 4
> %1 = bitcast i8* %b to float*
> store float %0, float* %1, align 1
> ret void
> }
> 
> 
> define void @g(float* %value, i8* %b) {
> 
> entry:
> %0 = bitcast float* %value to i32*
> %1 = load i32* %0, align 4
> %2 = bitcast i8* %b to i32*
> store i32 %1, i32* %2, align 1
> ret void
> }
> 
> 
> Now, I don't really care one way or the other about these two IR
> inputs, but it's pretty concerning that we get these two equivalent
> bits of code and nothing canonicalizes to one or the other.
> 
> 
> So, the naive first blush approach here would be to canonicalize on
> the first -- it has fewer instructions after all -- but I don't
> think that's the right approach for two reasons:
> 
> 
> 1) It will be a *very* narrow canonicalization that only works with
> overly specific sets of casted pointers.
> 2) It doesn't effectively move us toward the optimizer treating IR
> with different pointee types for pointer types indistinguishably.
> Some day, I continue to think we should get rid of the pointee types
> entirely.
> 
> 
> To see why #1 and #2 are problematic, assume another round of
> inlining took place and we suddenly had the following IR:
> 
> 
> 
> 
> And the missing IR example:
> 
> 
> define void @f(i8* %value, i8* %b) {
> 
> entry:
> %0 = bitcast i8* %value to float*
> %1 = load float* %0, align 4
> %2 = bitcast i8* %b to float*
> store float %1, float* %2, align 1
> ret void
> }
> 
> 
> define void @g(i8* %value, i8* %b) {
> 
> entry:
> %0 = bitcast i8* %value to i32*
> %1 = load i32* %0, align 4
> %2 = bitcast i8* %b to i32*
> store i32 %1, i32* %2, align 1
> ret void
> }
> 
> 
> 
> 
> 
> 
> 
> AFAICT, this is the same and we still don't have a good
> canonicalization story.
> 
> 
> What seems like the obvious important and missing canonicalization is
> that when we have a loaded value that is *only* used by storing it
> back into memory, we don't canonicalize the type of that *value*
> (ignoring the pointer types) to a single value type.
> 
> 
> So, the only really suitable type for this kind of stuff is 'iN'
> where N matches the number of bits loaded or stored.
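
For concreteness, here is a rough sketch of what that canonicalization might look like on a load whose only use is a store. The function names and the choice of double are made up for illustration and do not come from the patch under discussion; the syntax is the typed-pointer IR used elsewhere in this thread.

```llvm
; Before: the copied value is typed as double, although nothing
; ever performs a floating-point operation on it.
define void @copy_before(double* %src, i8* %dst) {
entry:
  %0 = load double* %src, align 8
  %1 = bitcast i8* %dst to double*
  store double %0, double* %1, align 1
  ret void
}

; After (assumed canonical form): the value is typed as iN, with
; N = 64 bits loaded and stored. Only the value type changes; the
; alignments and the memory effects are the same.
define void @copy_after(double* %src, i8* %dst) {
entry:
  %0 = bitcast double* %src to i64*
  %1 = load i64* %0, align 8
  %2 = bitcast i8* %dst to i64*
  store i64 %1, i64* %2, align 1
  ret void
}
```

Under this scheme a float copy and an i32 copy would both end up in the i32 form, which is the shape @g already has.
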
> 
> 
> I have this change implemented. It is trivial and unsurprising.
> However, the effects of this are impossible to predict so I wanted
> to make sure it made sense to others. Essentially, I expect random
> and hard to track down performance fluctuations across the board.
> Some things may get better, others may get worse, and they will
> probably all be bugs elsewhere in the stack.
> 
> 
> So, thoughts?
> 
> 
> The first thing that springs to mind is that I don’t
> trust the backend to get this right. I don’t think it will
> understand when an i32 load/store would have been preferable to a
> float one or vice versa. I have no evidence of this, but given how
> strongly typed tablegen is, I don’t think it can make a good choice
> here.
> 
> 
> So I think we probably need to teach the backend how to undo whatever
> canonical form we choose if it has a reason to. And the best long
> term solution is for tablegen to have sized load/stores, not typed
> ones.

You still need to make sure, in many cases, that you load into the correct register file to avoid expensive cross-domain moves. So *something* needs to look at the eventual users, either the IR-level optimizer or the backend. I'd prefer the IR-level optimizer if possible.
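
As a rough illustration of the kind of case I mean (the function names are hypothetical, and whether a move is actually emitted depends on the target and on what the backend folds away), consider a loaded value that does have a floating-point user:

```llvm
; Integer-typed load feeding an FP operation. On a target with
; separate integer and FP register files, the bitcast may turn
; into a cross-domain register move unless the load is retyped.
define float @int_load_fp_use(i8* %p, float %x) {
entry:
  %0 = bitcast i8* %p to i32*
  %1 = load i32* %0, align 4
  %2 = bitcast i32 %1 to float
  %3 = fadd float %2, %x
  ret float %3
}

; The same computation with a float-typed load, so the value can
; be loaded straight into the register file that fadd reads from.
define float @fp_load_fp_use(i8* %p, float %x) {
entry:
  %0 = bitcast i8* %p to float*
  %1 = load float* %0, align 4
  %2 = fadd float %1, %x
  ret float %2
}
```

The proposed canonicalization only covers loads whose sole use is a store, so it would not produce the first form by itself; the concern is what happens once later transformations expose users like the fadd above.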

 -Hal

> 
> 
> One (potentially expensive) way to choose the canonical form here is
> to look at the users of the load and see what type works best. If we
> load an i32, but bit cast and do an fp operation on it, then a float
> load was best. If we just load it then store, then in theory either
> type works.
> 
> 
> Pete
> 
> -Chandler
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory