[llvm-dev] [tablegen] table readability / performance

Tue Jan 14 04:59:33 PST 2020

Hello

I've been looking at the tables generated by
`SequenceToOffsetTable::emit`, and noticed that when the generated data
are strings, the output is basically un-grep-able and very tricky to
read, because it is emitted as an array of comma-separated char literals:

    extern const char HexagonInstrNameData[] = {
      /* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0,
      /* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0,
      /* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0,
      /* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0,
      /* 34 */ 'V', '6', '_', 'l', 'd', '0', 0,
      /* 41 */ 'V', '6', '_', 'z', 'l', 'd', '0', 0,
      [...]
    };
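For reference, the char-literal layout above can be reproduced with a few lines of C++. This is a rough illustration under stated assumptions, not the actual `SequenceToOffsetTable::emit` code: the function name `emitCharLiteralTable` is made up, it does no escaping, and it skips the suffix sharing the real table performs. It shows where the offset comments come from: each string is laid out directly after the previous one and NUL-terminated, so each row starts at the running total of `size + 1`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, NOT the real SequenceToOffsetTable::emit.
// Lays the strings out back to back, NUL-terminates each one, and prints
// one row per string as comma-separated char literals, prefixed by the
// row's starting offset. No suffix sharing is performed.
std::string emitCharLiteralTable(const std::string &Name,
                                 const std::vector<std::string> &Strs) {
  std::ostringstream OS;
  OS << "extern const char " << Name << "[] = {\n";
  std::size_t Offset = 0;
  for (const std::string &S : Strs) {
    // An embedded NUL would be indistinguishable from a terminator.
    assert(S.find('\0') == std::string::npos && "embedded NUL");
    OS << "  /* " << Offset << " */ ";
    // Assumption for this sketch: plain printable characters only, so
    // '\'' and '\\' are not escaped.
    for (char C : S)
      OS << '\'' << C << "', ";
    OS << "0,\n";
    Offset += S.size() + 1; // +1 for the terminating NUL
  }
  OS << "};\n";
  return OS.str();
}
```

Calling `emitCharLiteralTable("HexagonInstrNameData", {"G_FLOG10", "ENDLOOP0", "V6_vdd0"})` should reproduce the first three rows of the table above.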

As far as I can see, this makes the output harder to read than necessary
in at least the following cases:

    Target AsmStrs
    Target InstrNameData
    Target RegStrings
    Target RegClassStrings

I hacked together a fix for the above cases locally, and found that, at
least for clang and gcc, the compile time for the generated tables is
significantly reduced when emitting string literals, and the name tables
can be grepped without much effort. The above table now becomes:

    extern const char HexagonInstrNameData[] = {
      /* 0 */ "G_FLOG10\0"
      /* 9 */ "ENDLOOP0\0"
      /* 18 */ "V6_vdd0\0"
      /* 26 */ "PS_vdd0\0"
      /* 34 */ "V6_ld0\0"
      /* 41 */ "V6_zld0\0"
      [...]
    };
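A string-literal emitter needs little more than an escaping step. The following is a minimal sketch, assuming plain byte strings with no embedded NULs; `emitStringLiteralTable` and `writeEscaped` are hypothetical names and this is not the local fix described above. Non-printable bytes are written as three-digit octal escapes so that a following digit can never be swallowed into the escape, and each row closes its literal right after the `\0` terminator, leaving the compiler to concatenate adjacent literals.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: appends S to OS as the body of a C++ string
// literal, escaping anything that could change its meaning.
static void writeEscaped(std::ostringstream &OS, const std::string &S) {
  for (unsigned char C : S) {
    if (C == '\\' || C == '"') {
      OS << '\\' << C;
    } else if (C == '?') {
      OS << "\\?"; // avoid accidental trigraphs like "??=" in older modes
    } else if (std::isprint(C)) {
      OS << C;
    } else {
      // Always three octal digits, so a following digit in the same
      // literal cannot be absorbed into the escape sequence.
      char Buf[5];
      std::snprintf(Buf, sizeof(Buf), "\\%03o", static_cast<unsigned>(C));
      OS << Buf;
    }
  }
}

// Hypothetical emitter: one string literal per row, each ending in an
// explicit "\0", with the same offset comments as the char-literal form.
std::string emitStringLiteralTable(const std::string &Name,
                                   const std::vector<std::string> &Strs) {
  std::ostringstream OS;
  OS << "extern const char " << Name << "[] = {\n";
  std::size_t Offset = 0;
  for (const std::string &S : Strs) {
    assert(S.find('\0') == std::string::npos && "embedded NUL");
    OS << "  /* " << Offset << " */ \"";
    writeEscaped(OS, S);
    // "\0" is the last thing in this literal, so no digit can follow it.
    OS << "\\0\"\n";
    Offset += S.size() + 1; // +1 for the explicit NUL
  }
  OS << "};\n";
  return OS.str();
}
```

With the same input as before it should produce `/* 0 */ "G_FLOG10\0"`, `/* 9 */ "ENDLOOP0\0"` and so on. Two caveats a real implementation would likely need to consider: the resulting array is one byte longer because of the literal's implicit terminator, and some compilers limit the length of string literals (MSVC, for example, rejects overly long ones), which may matter for large tables.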

My question, then, is: is there a specific technical reason to avoid
emitting concatenated string literals rather than an array of
comma-separated char literals for "string-like" data?
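On the language side, both spellings should describe the same bytes: adjacent string literals are concatenated during translation, and a char array may be initialized from a brace-enclosed string literal. The only difference is the implicit NUL that terminates the concatenated literal. The check below, using made-up table names `CharForm` and `StringForm`, illustrates this for the first two rows of the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical tables holding the same data in the two styles.
static const char CharForm[] = {
  /* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0,
  /* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0,
};

static const char StringForm[] = {
  /* 0 */ "G_FLOG10\0"
  /* 9 */ "ENDLOOP0\0"
};

// The concatenated literal carries one extra implicit NUL at the end.
static_assert(sizeof(StringForm) == sizeof(CharForm) + 1,
              "string form has exactly one extra terminator");

// True if every byte of CharForm appears at the same offset in
// StringForm, i.e. offsets computed for one table are valid for the other.
bool tablesMatch() {
  return std::memcmp(CharForm, StringForm, sizeof(CharForm)) == 0;
}
```

Because every byte of the char-literal form sits at the same offset in the string form, offsets that index into these arrays should not need to change.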

If not, I can probably post a patch, which I feel will make the output
from tablegen much easier to understand and will help the compilation
speed of the generated tables.

Any thoughts appreciated.

All the Best

Luke

-- 
Codeplay Software Ltd.