[PATCH] [InstCombine] Combine adjacent i8 loads.

Andrew Trick atrick at apple.com
Thu May 1 22:08:11 PDT 2014


I agree with Chandler that this should only be done once, late in the pipeline (post GVN). I am also concerned that if this runs before the SLP vectorizer, it will interfere with it. I'd like to get Arnold's comments on this.

Load combining should probably be done in a single pass over the block. First collect all the offsets, then sort, then look for pairs. See the LoadClustering stuff in MachineScheduler. Your RangeLimit skirts around this problem, but I don't think the arbitrary threshold is necessary.
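To make the collect/sort/pair idea concrete, here is a minimal standalone sketch. It does not use LLVM's API: LoadInfo and findAdjacentPairs are hypothetical names, and it assumes all loads have already been resolved to constant byte offsets from one shared base pointer. A real pass would also have to check for intervening stores, volatile/atomic accesses, and alignment.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a load off a shared base pointer.
struct LoadInfo {
  int64_t Offset; // byte offset from the shared base
  unsigned Size;  // access size in bytes
  unsigned Id;    // identifies the load (e.g. its position in the block)
};

// Sort the collected loads by offset, then walk the sorted list once and
// report pairs whose byte ranges are exactly adjacent. No distance
// threshold is needed because sorting puts neighbors next to each other.
std::vector<std::pair<unsigned, unsigned>>
findAdjacentPairs(std::vector<LoadInfo> Loads) {
  std::sort(Loads.begin(), Loads.end(),
            [](const LoadInfo &A, const LoadInfo &B) {
              return A.Offset < B.Offset;
            });

  std::vector<std::pair<unsigned, unsigned>> Pairs;
  for (std::size_t I = 0; I + 1 < Loads.size(); ++I) {
    const LoadInfo &A = Loads[I];
    const LoadInfo &B = Loads[I + 1];
    if (A.Offset + static_cast<int64_t>(A.Size) == B.Offset) {
      Pairs.emplace_back(A.Id, B.Id);
      ++I; // each load joins at most one pair in this sketch
    }
  }
  return Pairs;
}
```

The cost is one sort per base pointer plus a linear scan, so the number of instructions between two loads does not matter.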

Doing this per basic block is OK, although there's no reason you can't do it almost as easily on an extended basic block (side exits are OK, merges are not). Chandler said to do it on the domtree, but handling CFG merges would be too complicated and expensive.
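As a rough illustration of what "extended basic block" means here, the sketch below grows a chain of blocks from a head block, only stepping into a successor that has exactly one predecessor. The CFG struct and extendedBlockChain are hypothetical and not LLVM's API. It follows a single path for simplicity, whereas an extended basic block in general is a tree of such blocks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CFG: block I has predecessors Preds[I] and successors Succs[I].
struct CFG {
  std::vector<std::vector<unsigned>> Preds;
  std::vector<std::vector<unsigned>> Succs;
};

// Starting at Head, repeatedly step into the first successor that has exactly
// one predecessor. Side exits (the other successors) are ignored, and a block
// with several predecessors (a merge) ends the chain.
std::vector<unsigned> extendedBlockChain(const CFG &G, unsigned Head) {
  std::vector<unsigned> Chain{Head};
  std::vector<bool> Seen(G.Succs.size(), false);
  Seen[Head] = true;

  unsigned Cur = Head;
  bool Extended = true;
  while (Extended) {
    Extended = false;
    for (unsigned S : G.Succs[Cur]) {
      if (G.Preds[S].size() == 1 && !Seen[S]) {
        Chain.push_back(S);
        Seen[S] = true;
        Cur = S;
        Extended = true;
        break;
      }
    }
  }
  return Chain;
}
```

Loads collected along such a chain can be treated much like loads in one block, since every block after the head is reached only through its predecessor in the chain.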

Did you forget to check for Invokes?

Conceptually this is doing SLP vectorization, but it doesn't fit with our SLP algorithm, which first finds a vectorizable use-def tree. Sticking it in GVN is another option, but again I'm concerned about it running before SLP. Maybe it can run in the SLP pass after the main algorithm.

http://reviews.llvm.org/D3580





