[PATCH] D29088: Do not create ctlz/cttz(X, false) when the target do not support zero defined ctlz/cttz.
Mehdi AMINI via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 26 08:35:43 PST 2017
mehdi_amini added a comment.
> Before CodeGenPrepare, we would end up with code like this:
>
> define i32 @_Z3fooi(i32 %n) local_unnamed_addr #0 {
> entry:
> %cmp = icmp eq i32 %n, -1
> br i1 %cmp, label %cleanup, label %if.end
>
> if.end: ; preds = %entry
> %neg = xor i32 %n, -1
> %0 = tail call i32 @llvm.ctlz.i32(i32 %neg, i1 false) #2 ;; <--- suboptimal flag!.
> br label %cleanup
>
> cleanup: ; preds = %entry, %if.end
> %retval.0 = phi i32 [ %0, %if.end ], [ 32, %entry ]
> ret i32 %retval.0
> }
Can you clarify why the flag is suboptimal by itself? The intrinsic carries the same semantics as the unfolded sequence, doesn't it?
This looks to me like a missing optimization that should recover the information at this point: can't we just prove that %neg can't be zero and flip the flag to true?
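To make the point concrete, here is a rough sketch in plain C++ that only models the semantics; it is not LLVM code. The helper names `ctlz_zero_defined` and `ctlz_zero_undef` are made up for illustration and stand in for `@llvm.ctlz.i32` with the flag set to false and true respectively. `foo` mirrors `@_Z3fooi` from the quoted IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of @llvm.ctlz.i32(x, i1 false): defined for every
// input, returning the bit width (32) when x == 0.
unsigned ctlz_zero_defined(uint32_t x) {
    unsigned count = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        ++count;
    return count;
}

// Hypothetical model of @llvm.ctlz.i32(x, i1 true): the caller promises
// that x != 0, so the zero case never has to be handled.
unsigned ctlz_zero_undef(uint32_t x) {
    assert(x != 0 && "result is undefined for a zero input");
    return ctlz_zero_defined(x);
}

// Mirrors @_Z3fooi: the n == -1 case is peeled off by the branch, so on the
// fall-through path ~n is known to be non-zero.
unsigned foo(int32_t n) {
    if (n == -1)
        return 32;
    uint32_t neg = ~static_cast<uint32_t>(n); // non-zero because n != -1
    return ctlz_zero_undef(neg);              // the "true" flag is safe here
}
```

On the `if.end` path the zero-undefined form is always legal, because the dominating branch already excludes the only input that would make `%neg` zero.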
================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:198
+ /// \brief Return if the target has a way to compute cttz/ctlz that is
+ /// defnied when the argument is zero.
+ bool hasZeroDefinedCtlz() const;
----------------
`s/defnied/defined`.
================
Comment at: test/Transforms/InstCombine/select-cmp-cttz-ctlz.ll:3
+; RUN: opt -S -instcombine -mattr=+bmi < %s | FileCheck %s --check-prefix=TZDEF
+; RUN: opt -S -instcombine -mattr=+lzcnt < %s | FileCheck %s --check-prefix=LZDEF
+
----------------
A comment before the two new RUN lines would be nice to have.
Also, I suspect this test won't work unless the X86 backend is configured in.
https://reviews.llvm.org/D29088