[LLVMdev] RFC: Missing canonicalization in LLVM

Thu Apr 23 10:41:53 PDT 2015

FYI, on vacation and then at a conference but will actually look at this on
Monday next week.

On Thu, Apr 23, 2015, 18:31 Daniel Stewart <stewartd at codeaurora.org> wrote:

> I think I can solve this by both adding BitCast checks to the loads, in
> addition to the Stores, and also checking the Stores to ensure they are fed
> by a Load and that the Load only feeds it. I’ll test this solution some
> more and try to make a patch.
>
>
>
> Daniel
>
>
>
> *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] *On
> Behalf Of *Daniel Stewart
> *Sent:* Thursday, April 23, 2015 9:17 AM
>
>
> *To:* 'Pete Cooper'
> *Cc:* 'LLVM Developers Mailing List'
> *Subject:* Re: [LLVMdev] RFC: Missing canonicalization in LLVM
>
>
>
> Thanks for the reply Pete. Unfortunately, I don’t think it is going to be
> as simple as ignoring those loads which only store. In findCommonType(),
> only one alloca is passed in at a time. So, while you could find those
> cases where that alloca was loaded from and stored elsewhere, you can’t
> find those places that store to that alloca from somewhere else (at least
> not easily that I can see).
>
>
>
> So in my particular example, I could catch the case of only load -> store.
> However, there are other stores that use the alloca address, but it is not
> readily apparent if they come directly & only from a load.
>
>
>
> Still trying to figure out the best way forward.
>
>
>
> Daniel
>
>
>
> *From:* Pete Cooper [mailto:peter_cooper at apple.com
> <peter_cooper at apple.com>]
> *Sent:* Tuesday, April 21, 2015 2:00 PM
> *To:* Daniel Stewart
> *Cc:* LLVM Developers Mailing List; Chandler Carruth
> *Subject:* Re: [LLVMdev] RFC: Missing canonicalization in LLVM
>
>
>
> Hi Daniel
>
>
>
> Thanks for the excellent breakdown of whats going on here.
>
>
>
> Earlier in the thread on this I made this comment:
>
>
>
> "The first thing that springs to mind is that I don’t trust the backend to
> get this right.  I don’t think it will understand when an i32 load/store
> would have been preferable to a float one or vice versa.  I have no
> evidence of this, but given how strongly typed tablegen is, I don’t think
> it can make a good choice here.
>
>
>
> So I think we probably need to teach the backend how to undo whatever
> canonical form we choose if it has a reason to”
>
>
>
> Without seeing the machine instructions, its hard to be 100% certain, but
> the case you’ve found may be simple enough that the backend can actually
> fix this. However, such a fixup would be quite target specific (such a
> target would need different register classes for integers and doubles in
> this case), and we’d need such a pass for all targets which isn’t ideal.
>
>
>
> So i wouldn’t rule out a backend solution, but i have a preference for
> your suggestion to improve SROA.
>
>
>
> In this particular case, it makes sense for SROA to do effectively the
> same analysis InstCombine did here and work out when a load is just raw
> data vs when its data is used as a specific type.  The relevant piece of
> InstCombine is this:
>
>
>
> // Try to canonicalize loads which are only ever stored to operate over
>
>   // integers instead of any other type. We only do this when the loaded
> type
>
>   // is sized and has a size exactly the same as its store size and the
> store
>
>   // size is a legal integer type.
>
>   if (!Ty->isIntegerTy() && Ty->isSized() &&
>
>       DL.isLegalInteger(DL.getTypeStoreSizeInBits(Ty)) &&
>
>       DL.getTypeStoreSizeInBits(Ty) == DL.getTypeSizeInBits(Ty)) {
>
>     if (std::all_of(LI.user_begin(), LI.user_end(), [&LI](User *U) {
>
>           auto *SI = dyn_cast<StoreInst>(U);
>
>           return SI && SI->getPointerOperand() != &LI;
>
>         })) {
>
> ...
>
>
>
> After ignoring load/stores which satisfy something like the above code,
> you can always fallback to the current code of choosing an integer type, so
> in the common case there won’t be any behavior difference.
>
>
>
> Cheers
>
> Pete
>
> On Apr 21, 2015, at 9:18 AM, Daniel Stewart <stewartd at codeaurora.org>
> wrote:
>
>
>
> So this change did indeed have an effect! J
>
>
>
> I’m seeing regressions in a number of benchmarks mainly due to a host of
> extra bitcasts that get introduced. Here’s the problem I’m seeing in a
> nutshell:
>
>
>
> 1)      There is a Phi with input type double
>
> 2)      Polly demotes the phi into a load/store of type double
>
> 3)      InstCombine canonicalizes the load/store to use i64 instead of
> double
>
> 4)      SROA removes the load/store & inserts a phi back in, using i64 as
> the type. Inserts bitcast to get to double.
>
> 5)      The bitcast sticks around and eventually get translated into FMOVs
> (for AArch64 at least).
>
>
>
> The function findCommonType() in SROA.cpp is used to obtain the type that
> should be used for the new alloca that SROA wants to create. It’s decision
> process is essentially – if all loads/stores of alloca are the same, use
> that type; else use the corresponding integer type. This causes bitcasts to
> be inserted in a number of places, most all of which stick around.
>
>
>
> I’ve copied a reduced version of an instance of the problem below. I’m
> looking for comments on what others think is the right solution here. Make
> SROA more intelligent about picking the type?
>
>
>
> The code is below with all unnecessary code removed for easy consumption.
>
>
>
> Daniel
>
> *Before **Polly – Prepare code for polly* *we have code that looks like:*
>
>
>
> while.cond473:                                    ; preds =
> %while.cond473.outer78, %while.body475
>
>   *%p_j_x452.0* = phi double [ *%105*, %while.body475 ], [
> %p_j_x452.0.ph82, %while.cond473.outer78 ]
>
>
>
> while.body475:                                    ; preds = %while.cond473
>
>   %sub480 = fsub fast double %64, *%p_j_x452.0*
>
>   *%105* = load double* %x485, align 8, !tbaa !25
>
>
>
> *After **Polly – Prepare code for polly* *we have:*
>
>
>
> while.cond473:                                    ; preds =
> %while.cond473.outer78, %while.body475
>
>   *%p_j_x452.0.reload* = load double* %p_j_x452.0.reg2mem
>
>
>
> while.body475:                                    ; preds = %while.cond473
>
>   %sub480 = fsub fast double %64, *%p_j_x452.0.reload*
>
>   *%110* = load double* %x485, align 8, !tbaa !25
>
>   store double *%110*, double* %p_j_x452.0.reg2mem
>
>
>
> *After **Combine redundant instructions* *:*
>
>
>
> while.cond473:                                    ; preds =
> %while.cond473.outer78, %while.body475
>
>   *%p_j_x452.0.reload* = load double* %p_j_x452.0.reg2mem, align 8
>
>
>
> while.body475:                                    ; preds = %while.cond473
>
>   %sub480 = fsub fast double %74, *%p_j_x452.0.reload*
>
>   %x485 = getelementptr inbounds %struct.CompAtom* %15, i64 %idxprom482,
> i32 0, i32 0
>
>   %194 = bitcast double* %x485 to i64*
>
>   *%195* = load i64* %194, align 8, !tbaa !25
>
>   %200 = bitcast double* %p_j_x452.0.reg2mem to i64*
>
>   store i64 *%195*, i64* %200, align 8
>
>
>
> *After **SROA* *:*
>
>
>
> while.cond473:                                    ; preds =
> %while.cond473.outer78, %while.body475
>
>   %p_j_x452.0.reg2mem.sroa.0.0.p_j_x452.0.reload362 = phi i64 [
> %p_j_x452.0.ph73.reg2mem.sroa.0.0.load368, %while.cond473.outer78 ], [
> *%178*, %while.body475 ]
>
>   *%173* = bitcast i64 %p_j_x452.0.reg2mem.sroa.0.0.p_j_x452.0.reload362
> to double
>
>
>
> while.body475:                                    ; preds = %while.cond473
>
>   %sub480 = fsub fast double %78, *%173*
>
>   %x485 = getelementptr inbounds %struct.CompAtom* %15, i64 %idxprom482,
> i32 0, i32 0
>
>   %177 = bitcast double* %x485 to i64*
>
>   *%178* = load i64* %177, align 8, !tbaa !25
>
>
>
>
>
> *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu
> <llvmdev-bounces at cs.uiuc.edu>] *On Behalf Of *Chandler Carruth
> *Sent:* Wednesday, January 21, 2015 8:32 PM
> *To:* Pete Cooper
> *Cc:* LLVM Developers Mailing List
> *Subject:* Re: [LLVMdev] RFC: Missing canonicalization in LLVM
>
>
>
>
>
> On Wed, Jan 21, 2015 at 3:06 PM, Pete Cooper <peter_cooper at apple.com>
> wrote:
>
> Sounds good to me.  Integers it is then.
>
>
> FYI, thanks, I'm just going to commit this then. It seems we're all in
> essential agreement. We can revert it and take a more cautious approach if
> something terrible happens. =]
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150423/e2985069/attachment.html>