[llvm-dev] Musings on the TableGen -emit-dag-isel backend
Craig Topper via llvm-dev
llvm-dev at lists.llvm.org
Fri Nov 13 11:08:52 PST 2020
Can we skip the relaxation step if comments aren't being emitted? Comment
emission is off by default in the CMake configuration.
~Craig
On Fri, Nov 13, 2020 at 10:59 AM Owen Anderson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> This is the size of the table, not the size of the overall binary, right?
> I would imagine that a 4% growth in the size of the table is a
> substantially smaller growth in the total executable size of, say, clang.
> If the overall growth is minuscule (say, under 1%), then this seems like
> the clear path forward.
>
> I’m also optimistic that we might be able to find other ways to shrink the
> tables to make up the difference if we need to.
>
> —Owen
>
> On Nov 13, 2020, at 2:45 AM, Jay Foad via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> I wouldn't want to be too hasty about simply removing the relaxation
> algorithm. The size and speed of the compiler affects all users, but the
> time to compile the compiler "only" affects us compiler developers. And I
> speak as a developer who is heavily affected by the time to compile the
> AMDGPU backend.
>
> One off-the-cuff idea (I haven't even looked at the code yet): could we
> pass in a null output stream while doing the relaxation phase, and use that
> to skip most of the actual textual formatting, followed by a final pass
> with a real output stream once the offsets have been determined?
>
> Jay.
>
> On Thu, 12 Nov 2020 at 16:43, Krzysztof Parzyszek via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> This is great! Thanks Paul!
>>
>> I think that the 9x reduction in compile-time is well worth the 4% size
>> increase. TableGen's run-time has been a sore point and a source of
>> complaints for quite some time.
>>
>> --
>> Krzysztof Parzyszek kparzysz at quicinc.com AI tools development
>>
>> -----Original Message-----
>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Paul C.
>> Anagnostopoulos via llvm-dev
>> Sent: Thursday, November 12, 2020 10:23 AM
>> To: llvm-dev at lists.llvm.org
>> Subject: [EXT] [llvm-dev] Musings on the TableGen -emit-dag-isel backend
>>
>> A rather notorious aspect of TableGen is the time required to run the
>> -emit-dag-isel backend on some targets, including AMDGPU and X86. I added a
>> timing feature to TableGen and timed the AMDGPU run.
>>
>>
>> ===-------------------------------------------------------------------------===
>> TableGen Phase Timing
>> ===-------------------------------------------------------------------------===
>> Total Execution Time: 733.6103 seconds (733.8740 wall clock)
>>
>> ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
>> 645.0017 ( 87.9%)  0.2340 (100.0%)  645.2357 ( 88.0%)  645.2709 ( 87.9%)  Emit matcher table
>>  70.4501 (  9.6%)  0.0000 (  0.0%)   70.4501 (  9.6%)   70.5510 (  9.6%)  Convert to matchers
>>  14.6329 (  2.0%)  0.0000 (  0.0%)   14.6329 (  2.0%)   14.7638 (  2.0%)  Parse, build records
>>   2.1996 (  0.3%)  0.0000 (  0.0%)    2.1996 (  0.3%)    2.1871 (  0.3%)  Sort patterns
>>   1.0920 (  0.1%)  0.0000 (  0.0%)    1.0920 (  0.1%)    1.0961 (  0.1%)  Optimize matchers
>>   0.0000 (  0.0%)  0.0000 (  0.0%)    0.0000 (  0.0%)    0.0050 (  0.0%)  Write output
>> 733.3763 (100.0%)  0.2340 (100.0%)  733.6103 (100.0%)  733.8740 (100.0%)  Total
>>
>> As you can see, most of the time is spent emitting the C++ code. A simple
>> step toward reducing the time is to use the --omit-comments option.
>> However, I am informed that trying to debug the pattern matching table
>> without comments is a hopeless task.
>>
>>
>> ===-------------------------------------------------------------------------===
>> TableGen Phase Timing
>> ===-------------------------------------------------------------------------===
>> Total Execution Time: 162.9274 seconds (162.9173 wall clock)
>>
>> ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
>>  75.0833 ( 46.1%)  0.0468 ( 42.9%)   75.1301 ( 46.1%)   75.1313 ( 46.1%)  Emit matcher table
>>  69.7948 ( 42.9%)  0.0000 (  0.0%)   69.7948 ( 42.8%)   69.8050 ( 42.8%)  Convert to matchers
>>  14.6173 (  9.0%)  0.0468 ( 42.9%)   14.6641 (  9.0%)   14.6668 (  9.0%)  Parse, build records
>>   2.2308 (  1.4%)  0.0000 (  0.0%)    2.2308 (  1.4%)    2.2191 (  1.4%)  Sort patterns
>>   1.0920 (  0.7%)  0.0000 (  0.0%)    1.0920 (  0.7%)    1.0921 (  0.7%)  Optimize matchers
>>   0.0000 (  0.0%)  0.0156 ( 14.3%)    0.0156 (  0.0%)    0.0030 (  0.0%)  Write output
>> 162.8182 (100.0%)  0.1092 (100.0%)  162.9274 (100.0%)  162.9173 (100.0%)  Total
>>
>> Emitting the C++ code for most pattern operators is straightforward.
>> However, three operators are more time-consuming: Matcher::Scope,
>> SwitchOpcode, and SwitchType. These operators take a list of child
>> patterns, each of which is emitted as a 1- to 3-byte size followed by the
>> child's pattern bytes. The size is coded as a variable-length sequence of
>> bytes, as is every integer in the matcher table. (Just for interest, the
>> average number of children per Scope operator is about 2.7.)
>>
>> In order to minimize the length of the size, the backend performs a sort
>> of relaxation algorithm, where it first tries a 1-byte size. If that fails,
>> it tries the actual required number of bytes. Trying involves calculating
>> the offset in the table for the child and then recursively generating the
>> entire child, passing it a string buffer as its output stream. When the
>> size is finally determined, that string buffer is appended to the actual
>> buffer passed to the generation function.
>>
>> Why does the number of bytes in the size matter to the child? Because the
>> offset of the child is determined by that size, and the offset is passed to
>> the child and then included in many comments emitted by the child. If the
>> size is wrong, then the offset is wrong, and then the comments are wrong.
>>
>> So it is clear that repetitive code generation occurs because of the
>> relaxation algorithm. How bad is it? It turns out that the algorithm is
>> about O(1.7^(n-1)), where n is the depth of the pattern matching tree. The
>> depth of the AMDGPU tree is 13. Here are some interesting statistics:
>>
>> Number of pattern operators at depth 11: 35,000
>> Number of times those operators are regenerated: 20 million
>>
>> I think we have found the problem. So what can be done? I tried a quick
>> and dirty experiment. I forced all the child sizes to occupy a 3-byte
>> length, so that the relaxation algorithm was no longer necessary. The
>> results are shown here.
>>
>>
>> ===-------------------------------------------------------------------------===
>> TableGen Phase Timing
>> ===-------------------------------------------------------------------------===
>> Total Execution Time: 90.7302 seconds (90.8242 wall clock)
>>
>> ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
>>  69.7324 ( 76.9%)  0.0000 (  0.0%)   69.7324 ( 76.9%)   69.7320 ( 76.8%)  Convert to matchers
>>  14.5705 ( 16.1%)  0.0312 ( 33.3%)   14.6017 ( 16.1%)   14.6958 ( 16.2%)  Parse, build records
>>   3.0576 (  3.4%)  0.0624 ( 66.7%)    3.1200 (  3.4%)    3.1192 (  3.4%)  Emit matcher table
>>   2.1840 (  2.4%)  0.0000 (  0.0%)    2.1840 (  2.4%)    2.1891 (  2.4%)  Sort patterns
>>   1.0920 (  1.2%)  0.0000 (  0.0%)    1.0920 (  1.2%)    1.0831 (  1.2%)  Optimize matchers
>>   0.0000 (  0.0%)  0.0000 (  0.0%)    0.0000 (  0.0%)    0.0050 (  0.0%)  Write output
>>  90.6366 (100.0%)  0.0936 (100.0%)   90.7302 (100.0%)   90.8242 (100.0%)  Total
>>
>> Now the matcher emitter phase is insignificant! Unfortunately, the size
>> of the matcher table increases from 451,430 bytes to 469,612 bytes, an
>> increase of 18,182 bytes or 4%.
>>
>> So one solution is not to care about the 4% increase and always use
>> 3-byte child sizes. A second solution is to add an option that specifies
>> the starting number of child size bytes and retains the relaxation
>> algorithm. When building the system, we would specify
>> --child-size-bytes=1; when building for debugging, we could specify
>> --child-size-bytes=3.
>>
>> Comments and suggestions gratefully accepted.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>