[PATCH] D68570: Unify the two CRC implementations

Tue Oct 8 10:17:19 PDT 2019

rupprecht added a comment.

In D68570#1699204 <https://reviews.llvm.org/D68570#1699204>, @hans wrote:

> In D68570#1697872 <https://reviews.llvm.org/D68570#1697872>, @rupprecht wrote:
>
> > Do you have any benchmarks?
>
>
> Of what? That generating the table is slower than just using it directly? I'd say no benchmark is needed to conclude that :-)

There is also the cost of code complexity to consider. Sure, the table generation goes away, but does that even show up on any benchmark/real world usage of this? Nobody's exercising the table generation code. People are either check-summing once (in which case, it doesn't matter as much if it's slow, because it happens once) or check-summing 1000 times (in which case the one-time table generation is probably not the CRC bottleneck). If there's no performance win, is it worth the potential code complexity of an opaque hex array over constructing it with an algorithm that can be inspected?

FWIW I'm still in favor of this patch, and the objcopy part that Herald added me for LGTM. Addressing my comments would help me understand this code but feel free to ignore if I'm the only one that feels this way.

================
Comment at: llvm/lib/Support/CRC.cpp:26

-uint32_t llvm::crc32(uint32_t CRC, StringRef S) {
-  static llvm::once_flag InitFlag;
-  static CRC32Table Tbl;
-  llvm::call_once(InitFlag, initCRC32Table, &Tbl);
+static const uint32_t CRCTable[256] = {
+    0x00000000, 0x77073096, 0xee0e612c, 0x990951ba,
----------------
hans wrote:
> hiraditya wrote:
> > rupprecht wrote:
> > > Can you leave a comment how this table was generated/how it could be regenerated if needed in the future? And/or a unit test to assert the values are correct?
> > +1
> I'm adding a comment with references for how the algorithm and the table works. There is already a unit test in llvm/unittests/Support/CRCTest.cpp
>From what I can tell:
- This algorithm makes use of one byte from this table per byte in the input array
- The table has 256 entries
- The test only includes "The quick brown fox jumps over the lazy dog" and "123456789" (43 total values if I'm counting correctly)
There's no way that test sufficiently tests that all 256 values here are correct. Maybe check summing a very large (>100k) buffer would give confidence that -- probably -- all values are being used.

I think the unit test is sufficient for testing a generated table because it's hard to mess up the generation in a way that only affects some values. However with a hard-coded table, it's much more likely that a single value here could be wrong due to mechanical issues (e.g. accidentally changing a character in the process of formatting it).

================
Comment at: llvm/lib/Support/CRC.cpp:29-37
-  for (size_t I = 0; I < Tbl->size(); ++I) {
-    uint32_t V = Shuffle(I);
-    V = Shuffle(V);
-    V = Shuffle(V);
-    V = Shuffle(V);
-    V = Shuffle(V);
-    V = Shuffle(V);
----------------
I'd personally find keeping this as a comment to be more useful in understanding how the table is generated than digging up papers

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68570/new/

https://reviews.llvm.org/D68570