[LLVMdev] RFC: Missing canonicalization in LLVM
Pete Cooper
peter_cooper at apple.com
Tue Apr 21 11:00:24 PDT 2015
Hi Daniel
Thanks for the excellent breakdown of whats going on here.
Earlier in the thread on this I made this comment:
"The first thing that springs to mind is that I don’t trust the backend to get this right. I don’t think it will understand when an i32 load/store would have been preferable to a float one or vice versa. I have no evidence of this, but given how strongly typed tablegen is, I don’t think it can make a good choice here.
So I think we probably need to teach the backend how to undo whatever canonical form we choose if it has a reason to”
Without seeing the machine instructions, its hard to be 100% certain, but the case you’ve found may be simple enough that the backend can actually fix this. However, such a fixup would be quite target specific (such a target would need different register classes for integers and doubles in this case), and we’d need such a pass for all targets which isn’t ideal.
So i wouldn’t rule out a backend solution, but i have a preference for your suggestion to improve SROA.
In this particular case, it makes sense for SROA to do effectively the same analysis InstCombine did here and work out when a load is just raw data vs when its data is used as a specific type. The relevant piece of InstCombine is this:
// Try to canonicalize loads which are only ever stored to operate over
// integers instead of any other type. We only do this when the loaded type
// is sized and has a size exactly the same as its store size and the store
// size is a legal integer type.
if (!Ty->isIntegerTy() && Ty->isSized() &&
DL.isLegalInteger(DL.getTypeStoreSizeInBits(Ty)) &&
DL.getTypeStoreSizeInBits(Ty) == DL.getTypeSizeInBits(Ty)) {
if (std::all_of(LI.user_begin(), LI.user_end(), [&LI](User *U) {
auto *SI = dyn_cast<StoreInst>(U);
return SI && SI->getPointerOperand() != &LI;
})) {
...
After ignoring load/stores which satisfy something like the above code, you can always fallback to the current code of choosing an integer type, so in the common case there won’t be any behavior difference.
Cheers
Pete
> On Apr 21, 2015, at 9:18 AM, Daniel Stewart <stewartd at codeaurora.org> wrote:
>
> So this change did indeed have an effect! J
>
> I’m seeing regressions in a number of benchmarks mainly due to a host of extra bitcasts that get introduced. Here’s the problem I’m seeing in a nutshell:
>
> 1) There is a Phi with input type double
> 2) Polly demotes the phi into a load/store of type double
> 3) InstCombine canonicalizes the load/store to use i64 instead of double
> 4) SROA removes the load/store & inserts a phi back in, using i64 as the type. Inserts bitcast to get to double.
> 5) The bitcast sticks around and eventually get translated into FMOVs (for AArch64 at least).
>
> The function findCommonType() in SROA.cpp is used to obtain the type that should be used for the new alloca that SROA wants to create. It’s decision process is essentially – if all loads/stores of alloca are the same, use that type; else use the corresponding integer type. This causes bitcasts to be inserted in a number of places, most all of which stick around.
>
> I’ve copied a reduced version of an instance of the problem below. I’m looking for comments on what others think is the right solution here. Make SROA more intelligent about picking the type?
>
> The code is below with all unnecessary code removed for easy consumption.
>
> Daniel
> Before Polly – Prepare code for polly we have code that looks like:
>
> while.cond473: ; preds = %while.cond473.outer78, %while.body475
> %p_j_x452.0 = phi double [ %105, %while.body475 ], [ %p_j_x452.0.ph82, %while.cond473.outer78 ]
>
> while.body475: ; preds = %while.cond473
> %sub480 = fsub fast double %64, %p_j_x452.0
> %105 = load double* %x485, align 8, !tbaa !25
>
> After Polly – Prepare code for polly we have:
>
> while.cond473: ; preds = %while.cond473.outer78, %while.body475
> %p_j_x452.0.reload = load double* %p_j_x452.0.reg2mem
>
> while.body475: ; preds = %while.cond473
> %sub480 = fsub fast double %64, %p_j_x452.0.reload
> %110 = load double* %x485, align 8, !tbaa !25
> store double %110, double* %p_j_x452.0.reg2mem
>
> After Combine redundant instructions :
>
> while.cond473: ; preds = %while.cond473.outer78, %while.body475
> %p_j_x452.0.reload = load double* %p_j_x452.0.reg2mem, align 8
>
> while.body475: ; preds = %while.cond473
> %sub480 = fsub fast double %74, %p_j_x452.0.reload
> %x485 = getelementptr inbounds %struct.CompAtom* %15, i64 %idxprom482, i32 0, i32 0
> %194 = bitcast double* %x485 to i64*
> %195 = load i64* %194, align 8, !tbaa !25
> %200 = bitcast double* %p_j_x452.0.reg2mem to i64*
> store i64 %195, i64* %200, align 8
>
> After SROA :
>
> while.cond473: ; preds = %while.cond473.outer78, %while.body475
> %p_j_x452.0.reg2mem.sroa.0.0.p_j_x452.0.reload362 = phi i64 [ %p_j_x452.0.ph73.reg2mem.sroa.0.0.load368, %while.cond473.outer78 ], [ %178, %while.body475 ]
> %173 = bitcast i64 %p_j_x452.0.reg2mem.sroa.0.0.p_j_x452.0.reload362 to double
>
> while.body475: ; preds = %while.cond473
> %sub480 = fsub fast double %78, %173
> %x485 = getelementptr inbounds %struct.CompAtom* %15, i64 %idxprom482, i32 0, i32 0
> %177 = bitcast double* %x485 to i64*
> %178 = load i64* %177, align 8, !tbaa !25
>
>
> From: llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu <mailto:llvmdev-bounces at cs.uiuc.edu>] On Behalf Of Chandler Carruth
> Sent: Wednesday, January 21, 2015 8:32 PM
> To: Pete Cooper
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] RFC: Missing canonicalization in LLVM
>
>
> On Wed, Jan 21, 2015 at 3:06 PM, Pete Cooper <peter_cooper at apple.com <mailto:peter_cooper at apple.com>> wrote:
>> Sounds good to me. Integers it is then.
>
>
> FYI, thanks, I'm just going to commit this then. It seems we're all in essential agreement. We can revert it and take a more cautious approach if something terrible happens. =]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150421/327165e4/attachment.html>
More information about the llvm-dev
mailing list