joerg added a comment.

Yeah, but even the generic expansion results in ~19 instructions on ARMv4. Compare that to one instruction in the loop and it can hardly be said to be a general win.

Repository:
  rL LLVM

https://reviews.llvm.org/D32605