[llvm] [X86] Add APX imulzu support. (PR #116806)
Daniel Zabawa via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 20 08:18:08 PST 2024
================
@@ -2184,17 +2184,43 @@ multiclass EFLAGSDefiningPats<string suffix, Predicate p> {
defm : EFLAGSDefiningPats<"", NoNDD>;
defm : EFLAGSDefiningPats<"_ND", HasNDD>;
+let Predicates = [HasZU] in {
+ // zext (mul reg/mem, imm) -> imulzu
+ def : Pat<(i32 (zext (i16 (mul GR16:$src1, imm:$src2)))),
+ (SUBREG_TO_REG (i32 0), (IMULZU16rri GR16:$src1, imm:$src2), sub_16bit)>;
+ def : Pat<(i32 (zext (i16 (mul (loadi16 addr:$src1), imm:$src2)))),
+ (SUBREG_TO_REG (i32 0), (IMULZU16rmi addr:$src1, imm:$src2), sub_16bit)>;
+ def : Pat<(i64 (zext (i16 (mul GR16:$src1, imm:$src2)))),
+ (SUBREG_TO_REG (i64 0), (IMULZU16rri GR16:$src1, imm:$src2), sub_16bit)>;
+ def : Pat<(i64 (zext (i16 (mul (loadi16 addr:$src1), imm:$src2)))),
+ (SUBREG_TO_REG (i64 0), (IMULZU16rmi addr:$src1, imm:$src2), sub_16bit)>;
+
+ // (mul (reg/mem), imm) -> imulzu
+ // Note this pattern doesn't explicitly require the zero-upper behaviour of imulzu,
+ // but instead avoids the zero-extend of the reg/mem operand that would be
+ // required if the multiply were promoted to 32b to avoid partial-write stalls.
+ // The imulzu here simply doesn't incur any partial-write stalls.
+ def : Pat<(mul GR16:$src1, imm:$src2),
+ (IMULZU16rri GR16:$src1, imm:$src2)>;
+ def : Pat<(mul (loadi16 addr:$src1), imm:$src2),
+ (IMULZU16rmi addr:$src1, imm:$src2)>;
+}
+
// mul reg, imm
-def : Pat<(mul GR16:$src1, imm:$src2),
- (IMUL16rri GR16:$src1, imm:$src2)>;
+let Predicates = [NoZU] in {
----------------
daniel-zabawa wrote:
Removing this gives a TableGen error. It may be possible to move things around to avoid it, but that would require breaking up the IMULZU patterns so that the zero-extend patterns precede the generic ones.
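
For reference, a minimal sketch of the value the `zext (mul reg/mem, imm)` patterns in the hunk compute: a 16-bit multiply by an immediate whose result is zero-extended to 32 or 64 bits. The function names and the constant 1234 are made up for illustration, and whether a given frontend and optimizer hand the selector exactly this DAG shape is an assumption, not something verified here.

```cpp
#include <cstdint>

// Hypothetical helpers; the names and the immediate 1234 are illustrative only.
// Each one multiplies a 16-bit value by an immediate, keeps the low 16 bits,
// and zero-extends the result, i.e. the value of zext (mul i16 x, imm).

std::uint32_t mul16_zext32(std::uint16_t x) {
  // The unsigned literal keeps the arithmetic unsigned, so there is no
  // signed-overflow concern; the cast truncates to the low 16 bits.
  std::uint16_t lo = static_cast<std::uint16_t>(x * 1234u);
  return lo;  // implicit zero-extension to 32 bits
}

std::uint64_t mul16_zext64(std::uint16_t x) {
  std::uint16_t lo = static_cast<std::uint16_t>(x * 1234u);
  return lo;  // implicit zero-extension to 64 bits
}
```

With `HasZU`, the first four patterns would cover this shape with a single IMULZU16rri/IMULZU16rmi wrapped in SUBREG_TO_REG, since the instruction already zeroes the upper bits.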
https://github.com/llvm/llvm-project/pull/116806