[PATCH] Reassociate GEP operands for loop invariant code motion
jingyue at google.com
Mon Apr 20 21:11:53 PDT 2015
One reason we couldn't simply rely on LSR is that LSR drops the nsw flag, so it would miss cases such as `gep input, sext(a +nsw i)` in `simple_licm`. Code with 32-bit indices and 64-bit pointers is pretty common in GPU programs, because
1. most NVIDIA and AMD GPUs natively support only i32, and
2. the host side that drives GPU programs usually runs on 64-bit CPUs, and communicates (e.g. via unified memory) with GPU via 64-bit pointers.
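To make the nsw point concrete, here is a minimal C++ sketch (not code from the patch; all function names are made up for illustration). It models `sext(a + i)` with a wrapping 32-bit add and compares it against `sext(a) + sext(i)`. The two agree only when the 32-bit add does not overflow, which is exactly what nsw guarantees. When they agree, `gep input, sext(a + i)` can be rewritten as a gep on `input + sext(a)` indexed by `sext(i)`, and the first part is loop invariant.

```cpp
#include <cassert>
#include <cstdint>

// Models a 32-bit `add` WITHOUT nsw: the sum wraps modulo 2^32.
// The add is done in unsigned arithmetic to avoid signed-overflow UB in C++.
// Assumption: the uint32_t -> int32_t conversion wraps (guaranteed since
// C++20, and the behavior of mainstream compilers before that).
int32_t wrapping_add32(int32_t a, int32_t i) {
    return static_cast<int32_t>(static_cast<uint32_t>(a) +
                                static_cast<uint32_t>(i));
}

// sext(a + i): add in 32 bits, then sign-extend to 64 bits.
int64_t sext_of_sum(int32_t a, int32_t i) {
    return static_cast<int64_t>(wrapping_add32(a, i));
}

// sext(a) + sext(i): sign-extend first, then add in 64 bits.
int64_t sum_of_sexts(int32_t a, int32_t i) {
    return static_cast<int64_t>(a) + static_cast<int64_t>(i);
}

// Original shape: the whole index a + i is recomputed every iteration.
float sum_original(const float* input, int32_t a, int32_t n) {
    float total = 0.0f;
    for (int32_t i = 0; i < n; ++i)
        total += input[sext_of_sum(a, i)];
    return total;
}

// Reassociated shape: input + sext(a) is hoisted out of the loop.
// Only valid if a + i never overflows in 32 bits (the nsw assumption).
float sum_reassociated(const float* input, int32_t a, int32_t n) {
    const float* base = input + static_cast<int64_t>(a);  // loop invariant
    float total = 0.0f;
    for (int32_t i = 0; i < n; ++i)
        total += base[static_cast<int64_t>(i)];
    return total;
}

// Small self-check of the claim above.
void check_sext_distributes_only_without_overflow() {
    assert(sext_of_sum(5, 7) == sum_of_sexts(5, 7));                  // no overflow
    assert(sext_of_sum(INT32_MAX, 1) != sum_of_sexts(INT32_MAX, 1));  // overflow
}
```

Without the nsw flag, the overflow case cannot be ruled out, so the split into `sext(a)` and `sext(i)` is not provably safe.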
Indvar widening alleviates this nsw issue on many architectures. However, it's again not a good option for GPU programs, because most GPUs natively support only i32. If LSR fails to simplify the loop, indvar widening can hurt performance (https://llvm.org/bugs/show_bug.cgi?id=21148), because 64-bit arithmetic is much more expensive than 32-bit. P.S. Maybe we could narrow an induction variable back to its original size when LSR fails?
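For reference, a rough source-level picture of what indvar widening does, as a C++ sketch with hypothetical function names (the real transformation happens on LLVM IR, not on source code). The narrow loop keeps a 32-bit induction variable and sign-extends it at each use. The widened loop carries a 64-bit induction variable instead, which removes the per-iteration sext but makes the increment and compare 64-bit operations.

```cpp
#include <cstdint>

// Narrow form: i32 induction variable, sign-extended at each use.
int64_t sum_narrow(const int32_t* data, int32_t n) {
    int64_t total = 0;
    for (int32_t i = 0; i < n; ++i)
        total += data[static_cast<int64_t>(i)];  // sext(i) inside the loop
    return total;
}

// Widened form: i64 induction variable, no sext inside the loop, but the
// increment and compare are now 64-bit arithmetic.
int64_t sum_widened(const int32_t* data, int32_t n) {
    int64_t total = 0;
    const int64_t wide_n = n;  // sext(n), computed once before the loop
    for (int64_t i = 0; i < wide_n; ++i)
        total += data[i];
    return total;
}
```

Both forms compute the same result. The concern above is only about cost on targets where 64-bit integer operations are not native.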
Mark and Wei seem to observe a significant speedup from applying this pass on X86, where I believe indvar widening is on. What's the story there? Why didn't indvar widening + LSR hoist the loop invariants?