[PATCH] Add Load Combine Pass
hfinkel at anl.gov
hfinkel at anl.gov
Wed May 7 09:21:53 PDT 2014
----- Original Message -----
> From: "Andrew Trick" <atrick at apple.com>
> To: bigcheesegs at gmail.com, atrick at apple.com, chandlerc at gmail.com
> Cc: hfinkel at anl.gov, nrotem at apple.com, aschwaighofer at apple.com, rsilvera at google.com, llvm-commits at cs.uiuc.edu
> Sent: Wednesday, May 7, 2014 11:07:24 AM
> Subject: Re: [PATCH] Add Load Combine Pass
> It looks like you first sort the loads, then use load as the
> insertion point. How does this work if the loads are striding
> backward through memory with uses in between?
> Since you are effectively hoisting loads, I think you should check
> mayThrow(), not just mayWriteMemory(). In the future, we will want
> to support read-only calls that mayThrow. (This would allow
> redundant load elimination across the calls--important for runtime
> safety checks.)
> Test cases tend to be more effective with CHECK-LABEL on the name of
> each subtest.
> Otherwise LGTM after addressing Hal's comments. I'm not sure what Hal
> meant by checking alignment.
1. In the regression test, the CHECK line should check the alignment of the generated load.
2. I had proposed that we only perform the transformation at all when TLI->allowsUnalignedMemoryAccesses returns true and sets *fast to true. When this function returns false, then SDAG will split the load (so Andy's experiment is possible). Also, I agree with Andy that if benchmark data shows the code quality from combining+SDAG-splitting to be better than that from doing nothing, then by all means do the former. Without benchmark data (from multiple platforms, preferably), showing that to be the case, I'd prefer the conservative approach I detailed.
> The new load inherits the alignment of
> the first aggregated load. We'll end up with an unaligned load as
> far as the compiler can tell, which is often not optimal. I'm not
> sure whether combining, then splitting in SelectionDAG will produce
> worse code or not without trying it. If the mechanism exists to
> split unaligned loads, can we force that on for x86 and see if we
> generate worse code in our load combining test cases?
More information about the llvm-commits