[llvm-dev] tablegen exponential behavior

Sebastian Pop via llvm-dev llvm-dev at lists.llvm.org
Mon Sep 9 21:07:52 PDT 2019


Hi,
I implemented pattern matching for the dot product on arm64,
and it seemed to work well for the basic case, i.e.,

class mulB<SDPatternOperator ldop> :
  PatFrag<(ops node:$Rn, node:$Rm, node:$offset),
          (mul (ldop (add node:$Rn, node:$offset)),
               (ldop (add node:$Rm, node:$offset)))>;
class mulBz<SDPatternOperator ldop> :
  PatFrag<(ops node:$Rn, node:$Rm),
          (mul (ldop node:$Rn), (ldop node:$Rm))>;

class DotProductI32<Instruction DOT, SDPatternOperator ldop> :
  Pat<(i32 (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 3)),
           (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 2)),
           (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 1)),
                (mulBz<ldop> GPR64sp:$Rn, GPR64sp:$Rm))))),
      (EXTRACT_SUBREG
       (i64 (DOT (DUPv2i32gpr WZR),
                 (v8i8 (LD1Onev8b GPR64sp:$Rn)),
                 (v8i8 (LD1Onev8b GPR64sp:$Rm)))),
       sub_32)>, Requires<[HasDotProd]>;

  def : DotProductI32<SDOTv8i8, sextloadi8>;
  def : DotProductI32<UDOTv8i8, zextloadi8>;

When I extended it to 8-element vectors, the time spent in llvm-tblgen exploded:
from under 7 seconds (on an A-72) for the AArch64 td files with the above patch
to more than half an hour, at which point I terminated the processes.

Here are the additional patterns that produce the exponential behavior:

def VADDV_32 : OutPatFrag<(ops node:$R), (ADDPv2i32 node:$R, node:$R)>;

class DotProduct2I32<Instruction DOT, SDPatternOperator ldop> :
  Pat<(i32 (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 7)),
           (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 6)),
           (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 5)),
           (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 4)),
           (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 3)),
           (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 2)),
           (add (mulB<ldop> GPR64sp:$Rn, GPR64sp:$Rm, (i64 1)),
                (mulBz<ldop> GPR64sp:$Rn, GPR64sp:$Rm))))))))),
      (EXTRACT_SUBREG
       (VADDV_32
        (i64 (DOT (DUPv2i32gpr WZR),
                  (v8i8 (LD1Onev8b GPR64sp:$Rn)),
                  (v8i8 (LD1Onev8b GPR64sp:$Rm))))),
       sub_32)>, Requires<[HasDotProd]>;

  def : DotProduct2I32<SDOTv8i8, sextloadi8>;
  def : DotProduct2I32<UDOTv8i8, zextloadi8>;

A linux-perf profile of the first minute of the llvm-tblgen run shows
that most of the time is spent in isIsomorphicTo:

  28.25%  llvm-tblgen  llvm-tblgen          [.] llvm::TreePatternNode::isIsomorphicTo
  21.62%  llvm-tblgen  llvm-tblgen          [.] llvm::TypeSetByHwMode::operator==
  15.25%  llvm-tblgen  libc-2.27.so         [.] memcmp
  14.61%  llvm-tblgen  llvm-tblgen          [.] std::__shared_ptr<llvm::TreePatternNode, (__gnu_cxx::_Lock_policy)2>::__shared_ptr

In call-graph mode (`perf -g`), the profile points to GenerateVariants as the
source of most of the calls to isIsomorphicTo:

+  100.00%     0.00%  llvm-tblgen  llvm-tblgen          [.] main
+  100.00%     0.00%  llvm-tblgen  llvm-tblgen          [.] llvm::TableGenMain
+   99.85%     0.00%  llvm-tblgen  llvm-tblgen          [.] (anonymous namespace)::LLVMTableGenMain
+   99.85%     0.00%  llvm-tblgen  llvm-tblgen          [.] llvm::EmitDAGISel
+   99.85%     0.00%  llvm-tblgen  llvm-tblgen          [.] llvm::CodeGenDAGPatterns::CodeGenDAGPatterns
+   99.46%    98.01%  llvm-tblgen  llvm-tblgen          [.] llvm::CodeGenDAGPatterns::GenerateVariants
     0.38%     0.00%  llvm-tblgen  llvm-tblgen          [.] GenerateVariantsOf
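
A rough way to see why the second pattern could be so much worse than the
first: assume GenerateVariants tries both operand orders at every commutative
node (add and mul) and prunes nothing. That assumption is mine and has not been
checked against the tblgen source, so the numbers below are only an upper
bound. The helper names are hypothetical; this is a counting model, not tblgen
code.

```python
# Hypothetical model, not tblgen code: count the commutative nodes in the
# source patterns above and bound the number of commuted variants.

def commutative_nodes(lanes: int) -> int:
    """Number of add/mul nodes in a dot-product pattern over `lanes` bytes."""
    outer_adds = lanes - 1          # chain of adds summing the products
    muls = lanes                    # one mul per lane (mulB or mulBz)
    address_adds = 2 * (lanes - 1)  # each mulB has two (add ptr, offset); mulBz has none
    return outer_adds + muls + address_adds


def variant_upper_bound(lanes: int) -> int:
    """Upper bound, assuming every commutative node independently doubles the count."""
    return 2 ** commutative_nodes(lanes)


if __name__ == "__main__":
    for lanes in (4, 8):
        n = commutative_nodes(lanes)
        print(f"{lanes} lanes: {n} commutative nodes, "
              f"at most 2^{n} = {variant_upper_bound(lanes)} variants")
```

Under that assumption the 4-element pattern has 13 commutative nodes (at most
8192 variants) and the 8-element pattern has 29 (at most 536870912). If each
new variant is also compared against the ones already collected, the cost would
grow faster than the variant count itself, which would be consistent with
isIsomorphicTo dominating the profile.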

Sebastian

