[all-commits] [llvm/llvm-project] 06f136: [instcombine][x86] Converted pdep/pext with shifte...
Philip Reames via All-commits
all-commits at lists.llvm.org
Fri Sep 18 14:55:28 PDT 2020
Branch: refs/heads/master
Home: https://github.com/llvm/llvm-project
Commit: 06f136f61e6d23fde5c91f7fa0813d0291c17c97
https://github.com/llvm/llvm-project/commit/06f136f61e6d23fde5c91f7fa0813d0291c17c97
Author: Philip Reames <listmail at philipreames.com>
Date: 2020-09-18 (Fri, 18 Sep 2020)
Changed paths:
M llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp
M llvm/test/Transforms/InstCombine/X86/x86-bmi-tbm.ll
Log Message:
-----------
[instcombine][x86] Converted pdep/pext with shifted mask to simple arithmetic
If the mask of a pdep or pext instruction is a shifted mask (i.e. one contiguous block of ones), we need at most one and and one shift to represent the operation without the intrinsic. On all platforms I know of, this is faster than the pdep/pext.
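As an illustration of the equivalence described above, here is a minimal C++ sketch. It is not the code from the patch: the helper names (`pext_ref`, `pdep_ref`, `is_shifted_mask`, `trailing_zeros`, `pext_shifted`, `pdep_shifted`) are hypothetical, and the bit-by-bit reference models are only assumed stand-ins for the hardware instructions, written so the example runs without BMI2 support.

```cpp
#include <cassert>
#include <cstdint>

// Reference model of PEXT (assumed semantics): gather the bits of src
// selected by mask and pack them into the low bits of the result.
inline uint32_t pext_ref(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    unsigned out = 0;
    for (unsigned i = 0; i < 32; ++i) {
        if (mask & (1u << i)) {
            if (src & (1u << i)) result |= (1u << out);
            ++out;
        }
    }
    return result;
}

// Reference model of PDEP (assumed semantics): scatter the low bits of src
// to the positions of the set bits in mask.
inline uint32_t pdep_ref(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    unsigned in = 0;
    for (unsigned i = 0; i < 32; ++i) {
        if (mask & (1u << i)) {
            if (src & (1u << in)) result |= (1u << i);
            ++in;
        }
    }
    return result;
}

// True if mask is exactly one contiguous block of ones.
inline bool is_shifted_mask(uint32_t mask) {
    if (mask == 0) return false;
    uint32_t filled = mask | (mask - 1);  // fill the trailing zeros with ones
    return (filled & (filled + 1)) == 0;  // filled must now look like 0...01...1
}

// Number of zero bits below the lowest set bit. Precondition: mask != 0.
inline unsigned trailing_zeros(uint32_t mask) {
    assert(mask != 0);
    unsigned n = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++n;
    }
    return n;
}

// PEXT with a shifted mask: one and, then one right shift.
inline uint32_t pext_shifted(uint32_t src, uint32_t mask) {
    return (src & mask) >> trailing_zeros(mask);
}

// PDEP with a shifted mask: one left shift, then one and.
inline uint32_t pdep_shifted(uint32_t src, uint32_t mask) {
    return (src << trailing_zeros(mask)) & mask;
}
```

For example, with mask 0x00000FF0 and source 0xDEADBEEF, `pext_shifted` yields 0xEE and `pdep_shifted` yields 0xEF0, matching the reference models. When the mask is a compile-time constant, as in the case this commit targets, the shift amount is a constant as well, so only the and and the shift remain.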
The cost modelling for multiple contiguous blocks might be worth exploring in a follow-up, but it's not relevant for my current use case. It would almost certainly be a win on AMDs, where these are really, really slow, though.
Differential Revision: https://reviews.llvm.org/D87861