[PATCH] D29088: Do not create ctlz/cttz(X, false) when the target do not support zero defined ctlz/cttz.
Andrea Di Biagio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 25 03:10:37 PST 2017
andreadb added a comment.
In https://reviews.llvm.org/D29088#655338, @deadalnix wrote:
> @andreadb it hides the branch many of the subsequent optimization passes, resulting in bad codegen. I stumbled on bad codegen from cttz/ctlz several time recently. Thing that I noticed are : doing the 0 case check several time, failure to constant fold the 0 case when there is one, etc...
To me, bad codegen has a different meaning, which is why I was concerned by your initial post and the lack of specific tests for the problematic cases. I now understand that you meant what I would call poor codegen (as a synonim of suboptimal).
> It doesn't make sense to re-implement all of this in CodeGenprepare nor does it to special case it in the whole pipeline. Canonicalisation should help subsequent passes, not hide information to them, which is what this is doing.
As far as I remember, this combine rule was originally added to specifically target `[cttz|ctlz]+select` pairs introduced by SimplifyCFG as the result of aggressively flattening simple if-then CFGs. Most (if not all) of those simple if-then constructs originated from conditional statements in x86 bmi/lzcnt intrinsic definitions. SimplifyCFG was originally modified to enable a more aggressive flattening of simple if-then branches. However, there was a concern with the performance of cttz/ctlz+select for targets witn no tzcnt/bmi. So Sanjay implemented a "despeculation" logic in CodeGenPrepare (http://llvm.org/viewvc/llvm-project?view=revision&revision=253573) to revert the CFG flattening done by SimplifyCFG (only for targets which don't provide a cheap cttz/ctlz - see also: llvm.org/viewvc/llvm-project?rev=255660&view=rev).
If there is a problem, then it is very likely to be caused by a bad interaction between these three entities:
- SimplifyCFG (which aggressively flattens simple if-then blocks even in the presence of one expensive cttz/ctlz).
- InstCombine (which canonicalizes a cttz/ctlz +select into a single cttz/ctlz [is_zero_undef=0]).
- CodeGenPrepare (which is able to despeculate a cttz/ctlz [is_zero_undef=0] by undoing the flattening of the CFG done by SimplifyCFG).
I am not suggesting that the current design is optimal. However, as Hal pointed out, there is a trade off here. I am not convinced that InstCombine is "hiding information". FWIW, the "branch" is already hidden by the speculation/cfg flattening performed by SimplifyCFG.
I think that we really need to see some code samples to have a better understand of what is going wrong. There may be better places where to fix your issue.
Cheers,
Andrea
https://reviews.llvm.org/D29088
More information about the llvm-commits
mailing list