[llvm-dev] [MC] Tablegen code emitter catch invalid immediate value in assembly instruction

Tue Nov 30 09:46:49 PST 2021

Hello,

I'm working on an issue where LLVM's Hexagon assembler silently
assembles an incorrect instruction without emitting an error, while the
GNU assembler rejects the same input with an error.

llvm-mc:
echo 'memh(r4+#3) = r2;memh(r4+#-5) = r3' | bin/llvm-mc --mcpu=hexagonv71 -filetype=obj | llvm-objdump -d -

Disassembly of section .text:
00000000 <.text>:
       0:       01 c2 44 a1     a144c201 {      memh(r4+#2) = r2 }
       4:       fd e3 44 a7     a744e3fd {      memh(r4+#-6) = r3 }

GNU as (on a test file a.s with similar misaligned offsets):
a.s:17: Error: low 1 bits of immediate -5 must be zero
a.s:17: Error: invalid instruction `memh(r4+#-5) =r2'
a.s:18: Error: low 1 bits of immediate -1 must be zero
a.s:18: Error: invalid instruction `memh(r4+#-1) =r3'

The example above shows that the immediates #3 and #-5 were silently
changed to #2 and #-6. This happens because the encoding class does not
encode the lowest bit (bit 0) of the immediate: the instruction always
accesses 2-byte-aligned addresses, so bit 0 is simply dropped.

The instruction is defined at
lib/Target/Hexagon/HexagonDepInstrInfo.td:9634, where it has an encoding
class of Enc_de0214, which is defined at
lib/Target/Hexagon/HexagonDepInstrFormats.td:3103. Enc_de0214
specifies that the lowest bit of the immediate value is skipped during
encoding.

This issue could be fixed in the Hexagon target-dependent code, but I
wonder if some sort of "fact checking" for this kind of behavior (an
immediate whose low bits are silently discarded by the encoding) could
be added to TableGen, e.g. in TableGen/AsmMatcherEmitter.cpp, so that
other architectures would benefit as well.

Thanks,
Alvin