jlebar added a comment.

OK, this is now working, please have a look.

I'm not sure if it's possible to write a test as-is, but I have a test for my mistake in my WIP CUDA patch. (Also this mistake is much harder to make after http://reviews.llvm.org/D16013.)

http://reviews.llvm.org/D15960