[PATCH] D12269: Add a pass to lift aggregate into allocas so SROA can get rid of them.
Amaury SECHET via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 1 00:13:51 PDT 2015
deadalnix added a comment.
What makes you think it is limited to globals? It applies to any aggregate load and/or store from memory in general. SROA doesn't touch these. Personally, the case I'm trying to solve is aggregate accesses to memory that is freshly allocated. Other people have voiced interest in having this kind of load/store optimized; I can't speak for their reasons, but overall this seems to be of interest. I used globals in the test cases because it was easy, but this should not be limited to globals, or it would be fairly useless.
Right now, this kind of operation is simply ignored by the optimizer. What I'm trying to do is transform these into something the rest of the pipeline understands and will process nicely. Lifting these values into allocas creates something that is not identical to, but similar enough to, what clang generates that the rest of the pipeline picks it up and optimizes it nicely.
This basically transforms:
%1 = load { i8*, i64 }, { i8*, i64 }* %ptr
%2 = extractvalue { i8*, i64 } %1, 0
%3 = extractvalue { i8*, i64 } %1, 1
into something like:
%1 = alloca { i8*, i64 }
call void @llvm.memcpy(%1, %ptr, sizeof({ i8*, i64 })) ; pseudo code, you get the idea.
%2.lifted = getelementptr { i8*, i64 }, { i8*, i64 }* %1, i32 0, i32 0
%2 = load i8*, i8** %2.lifted
%3.lifted = getelementptr { i8*, i64 }, { i8*, i64 }* %1, i32 0, i32 1
%3 = load i64, i64* %3.lifted
As a result, instead of having aggregate loads and stores, you get a memcpy and a set of non-aggregate loads and stores. The rest of the pipeline picks up on this nicely and is able to optimize it away.
This has been done as its own pass in order to have a POC up and see if it works well (it does; that is the model I got the best results with so far), but I am indeed wondering whether making SROA do it would be a better idea.
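For illustration only, here is a rough sketch of what one would expect once SROA has split the alloca and the memcpy has been rewritten into per-field accesses. This is not actual output of the patch; the names %2.gep and %3.gep are made up for the example, and the exact form depends on the rest of the pipeline:
%2.gep = getelementptr { i8*, i64 }, { i8*, i64 }* %ptr, i32 0, i32 0
%2 = load i8*, i8** %2.gep ; field 0 read directly from %ptr
%3.gep = getelementptr { i8*, i64 }, { i8*, i64 }* %ptr, i32 0, i32 1
%3 = load i64, i64* %3.gep ; field 1 read directly from %ptr
The alloca and the memcpy are gone, and only the scalar loads that the extractvalue instructions actually needed remain.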
http://reviews.llvm.org/D12269