[llvm-dev] tablegen exponential behavior
Sebastian Pop via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 9 21:07:52 PDT 2019
Hi,
I implemented a pattern matching of the dot product for arm64
and it seemed to work well for the basic case, i.e.,
class mulB<SDPatternOperator ldop> :
PatFrag<(ops node:$Rn, node:$Rm, node:$offset),
(mul (ldop (add node:$Rn, node:$offset)),
(ldop (add node:$Rm, node:$offset)))>;
class mulBz<SDPatternOperator ldop> :
PatFrag<(ops node:$Rn, node:$Rm),
(mul (ldop node:$Rn), (ldop node:$Rm))>;
class DotProductI32<Instruction DOT, SDPatternOperator ldop> :
Pat<(i32 (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 3)),
(add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 2)),
(add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 1)),
(mulBz<ldop> GPR64sp:$Rn, GPR64sp:$Rm))))),
(EXTRACT_SUBREG
(i64 (DOT (DUPv2i32gpr WZR),
(v8i8 (LD1Onev8b GPR64sp:$Rn)),
(v8i8 (LD1Onev8b GPR64sp:$Rm)))),
sub_32)>, Requires<[HasDotProd]>;
def : DotProductI32<SDOTv8i8, sextloadi8>;
def : DotProductI32<UDOTv8i8, zextloadi8>;
Then when I extended it to 8 element vectors, the time spent by tblgen exploded:
from under 7 seconds (on A-72) on the AArch64 td files and the above patch
to more than half an hour when I decided to terminate the processes.
Here are the additional def'pats that produce the exponential behavior:
def VADDV_32 : OutPatFrag<(ops node:$R), (ADDPv2i32 node:$R, node:$R)>;
class DotProduct2I32<Instruction DOT, SDPatternOperator ldop> :
Pat<(i32 (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 7)),
(add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 6)),
(add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 5)),
(add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 4)),
(add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 3)),
(add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 2)),
(add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 1)),
(mulBz<ldop> GPR64sp:$Rn, GPR64sp:$Rm))))))))),
(EXTRACT_SUBREG
(VADDV_32
(i64 (DOT (DUPv2i32gpr WZR),
(v8i8 (LD1Onev8b GPR64sp:$Rn)),
(v8i8 (LD1Onev8b GPR64sp:$Rm))))),
sub_32)>, Requires<[HasDotProd]>;
def : DotProduct2I32<SDOTv8i8, sextloadi8>;
def : DotProduct2I32<UDOTv8i8, zextloadi8>;
linux-perf profile for the first minute executing llvm-tblgen shows
that most of the time is spent in isIsomorphicTo:
28.25% llvm-tblgen llvm-tblgen [.]
llvm::TreePatternNode::isIsomorphicTo
21.62% llvm-tblgen llvm-tblgen [.]
llvm::TypeSetByHwMode::operator==
15.25% llvm-tblgen libc-2.27.so [.] memcmp
14.61% llvm-tblgen llvm-tblgen [.]
std::__shared_ptr<llvm::TreePatternNode,
(__gnu_cxx::_Lock_policy)2>::__shared_ptr
In call-graph mode `perf -g` points to GenerateVariants that generates
most of the calls to isIsomorphicTo:
+ 100.00% 0.00% llvm-tblgen llvm-tblgen [.] main
+ 100.00% 0.00% llvm-tblgen llvm-tblgen [.] llvm::TableGenMain
+ 99.85% 0.00% llvm-tblgen llvm-tblgen [.] (anonymous
namespace)::LLVMTableGenMain
+ 99.85% 0.00% llvm-tblgen llvm-tblgen [.] llvm::EmitDAGISel
+ 99.85% 0.00% llvm-tblgen llvm-tblgen [.]
llvm::CodeGenDAGPatterns::CodeGenDAGPatterns
+ 99.46% 98.01% llvm-tblgen llvm-tblgen [.]
llvm::CodeGenDAGPatterns::GenerateVariants
0.38% 0.00% llvm-tblgen llvm-tblgen [.] GenerateVariantsOf
Sebastian
More information about the llvm-dev
mailing list