[LLVMdev] [RFC] LegalizeDAG support for targets without subword load/store instructions
Matt Johnson
johnso87 at crhc.illinois.edu
Fri Jul 15 19:34:56 PDT 2011
Hi All,
Some targets don't provide subword (e.g., i8 and i16 for a 32-bit
machine) load and store instructions, so currently we have to
custom-lower Load- and StoreSDNodes in our backends. For examples, see
LowerLOAD() and LowerSTORE() in {XCore,CellSPU}ISelLowering.cpp. I
believe it's possible to support this lowering in a target-agnostic
fashion in LegalizeDAG.cpp, similar to what is done for
non-naturally-aligned loads and stores using the
allowsUnalignedMemoryAccesses() target hook.
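To make the load half of this lowering concrete, here is a minimal C++ sketch of what an expanded i16 load computes on a machine that only has i32 loads. It assumes a big-endian layout (the halfword at the lower address sits in the upper 16 bits of the word), and extractHalf is a hypothetical helper name for illustration, not an existing LLVM function:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): given the aligned 32-bit word that
// contains a 2-byte-aligned i16 at byte address `addr`, return that i16.
// Assumes a big-endian layout: the halfword at offset 0 is the upper half.
uint16_t extractHalf(uint32_t word, uint32_t addr) {
    assert((addr & 1u) == 0 && "i16 address must be 2-byte aligned");
    unsigned shift = (addr & 2u) ? 0 : 16;  // offset 2 -> low half, offset 0 -> high half
    return static_cast<uint16_t>((word >> shift) & 0xFFFFu);
}
```

The shift amount is a constant whenever the low address bits are known from alignment information, which is the case a target-agnostic expansion could exploit.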
I wanted to see if there was any interest in something like this
for mainline before writing up something more detailed. Here are a few
supporting details for now:
* Several existing machines don't provide loads and stores for every
power-of-2-sized datatype down to i8. For example, Cell's SPUs only
support 16-byte, 16-byte-aligned memory ops (this restriction is found
for other SIMD processors as well), and some GPUs don't support i8 or
i16 loads/stores.
* Even when short memory operations are possible, they are sometimes
implemented very conservatively (e.g., by trapping to a software
routine). In those cases the compiler should expand the operation
statically whenever it can, using whatever alignment information it
has.
* The current expansion of unaligned loads and stores in LegalizeDAG.cpp
doesn't work for machines that lack subword memory operations. The
reason is that ExpandUnaligned*() splits the X-bit load/store into 2
*independent* (parallel in the SelectionDAG, with a TokenFactor "beneath"
them) X/2-bit loads/stores. On machines without subword stores, you have
to be careful about the ordering of the constituent operations, so
that two stores don't clobber one another. Here's an example:
Say I have a 32-bit target that only supports i32 loads and stores, and
I have a word of memory at address 0x1000, initialized to 0x00000000,
holding two adjacent i16's: s1 at 0x1000 (the upper half of the word)
and s2 at 0x1002 (the lower half). I want to write 0x1234 to s1, and
0xABCD to s2. I thus need to do (pseudocode):
    r1 = mem[(0x1000 & ~0x3)]    # Load word containing s1
    r2 = 0x1234 << 16            # Shift s1 value into place
    r3 = r1 & 0x0000FFFF         # Mask out s1 bits
    r4 = r3 | r2                 # OR in s1 value
    mem[(0x1000 & ~0x3)] = r4    # Store back word containing new s1 value *****
    r5 = mem[(0x1002 & ~0x3)]    # Load word containing s2 *****
    r6 = 0xABCD                  # s2 value doesn't need to be shifted
    r7 = r5 & 0xFFFF0000         # Mask out s2 bits
    r8 = r7 | r6                 # OR in s2 value
    mem[(0x1002 & ~0x3)] = r8    # Store back word containing new s2 value
If all goes well, the word at mem[0x1000] should read 0x1234ABCD after
we're done.
NOTE: The two starred instructions (the store for s1 and the load for
s2) *must* be executed in that order. Otherwise, the s2
read-modify-write will see the old value of s1, and will clobber it when
it writes back (yielding an incorrect mem[0x1000] value of 0x0000ABCD).
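The hazard can be reproduced outside the compiler. The following is a minimal C++ simulation, not LLVM code: WordMemory, mergeHalf, storeBothOrdered, and storeBothReordered are hypothetical names introduced for illustration, and a big-endian halfword layout is assumed to match the pseudocode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a target with only aligned i32 loads and stores.
struct WordMemory {
    uint32_t base = 0x1000;
    uint32_t words[4] = {0, 0, 0, 0};
    uint32_t load32(uint32_t addr) const { return words[((addr & ~0x3u) - base) / 4]; }
    void store32(uint32_t addr, uint32_t v) { words[((addr & ~0x3u) - base) / 4] = v; }
};

// Merge an i16 into the word that contains it (big-endian layout assumed:
// byte offset 0 is the upper half, byte offset 2 is the lower half).
uint32_t mergeHalf(uint32_t word, uint32_t addr, uint16_t value) {
    assert((addr & 1u) == 0 && "i16 address must be 2-byte aligned");
    unsigned shift = (addr & 2u) ? 0 : 16;
    uint32_t mask = 0xFFFFu << shift;
    return (word & ~mask) | (static_cast<uint32_t>(value) << shift);
}

// Correct order: the s1 read-modify-write finishes before s2's load.
uint32_t storeBothOrdered() {
    WordMemory mem;
    mem.store32(0x1000, mergeHalf(mem.load32(0x1000), 0x1000, 0x1234));
    mem.store32(0x1002, mergeHalf(mem.load32(0x1002), 0x1002, 0xABCD));
    return mem.load32(0x1000);
}

// Broken order: both loads are hoisted above both stores, which is what
// treating the two expansions as independent would permit.
uint32_t storeBothReordered() {
    WordMemory mem;
    uint32_t r1 = mem.load32(0x1000);
    uint32_t r5 = mem.load32(0x1002);  // sees the old s1 value
    mem.store32(0x1000, mergeHalf(r1, 0x1000, 0x1234));
    mem.store32(0x1002, mergeHalf(r5, 0x1002, 0xABCD));  // clobbers s1
    return mem.load32(0x1000);
}
```

In this model storeBothOrdered() returns 0x1234ABCD and storeBothReordered() returns 0x0000ABCD, matching the two outcomes described above.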
I'm not experienced enough with LLVM to figure out the most precise way
to express this dependence in my lowering function. My current solution
is to mark all loads and stores in these cases as volatile. This is too
heavy-handed for my taste, since it also prevents reordering of loads
and stores to completely unrelated parts of memory, but it works for
now. I think we can do a more precise job here, but I'm not exactly sure
how. Comments, questions, or requests for clarification are welcome.
Basically, I think we could obviate the custom lowering logic in CellSPU,
XCore, and future backends, as well as do a better job of optimizing
based on available alignment information, by moving subword load/store
lowering into LegalizeDAG, and adding another target hook along the
lines of allowsUnalignedMemoryAccesses().
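As a rough illustration of the shape such a hook might take, here is a self-contained C++ sketch. Everything in it is hypothetical: TargetHooksSketch is a stand-in and not LLVM's TargetLowering, and minNativeMemAccessBits, MemAction, and classifyMemOp are made-up names. The actual hook, if added, could well look different.

```cpp
#include <cassert>

// Hypothetical stand-in for a TargetLowering-like interface (not LLVM's class).
struct TargetHooksSketch {
    virtual ~TargetHooksSketch() {}
    // Hypothetical hook: narrowest load/store width, in bits, that the target
    // supports natively. The default assumes byte-addressable memory ops.
    virtual unsigned minNativeMemAccessBits() const { return 8; }
};

// A target like the one in the example: only i32 loads and stores.
struct WordOnlyTarget : TargetHooksSketch {
    unsigned minNativeMemAccessBits() const override { return 32; }
};

// What a target-agnostic legalizer might decide for a memory op of a given width.
enum class MemAction { Legal, ExpandToWordAccess };

MemAction classifyMemOp(const TargetHooksSketch &target, unsigned widthBits) {
    return widthBits >= target.minNativeMemAccessBits()
               ? MemAction::Legal
               : MemAction::ExpandToWordAccess;
}
```

With this sketch, an i16 store is classified as ExpandToWordAccess for WordOnlyTarget and as Legal for the default target, so backends with ordinary byte and halfword memory ops would be unaffected.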
I'm interested in working on this and integrating it into mainline if
people think it's worthwhile and not contrary to project goals.
Otherwise, I can hack what I need into my own backend.
Best,
Matt