[PATCH] D61289: [globalisel] Add G_SEXT_INREG

Thu May 2 14:39:00 PDT 2019

dsanders marked 3 inline comments as done.
dsanders added inline comments.

================
Comment at: llvm/include/llvm/Target/GenericOpcodes.td:40
+// returns true) to allow targets to have some bitwidths legal and others
+// lowered.
+def G_SEXT_INREG : GenericInstruction {
----------------
rovka wrote:
> dsanders wrote:
> > rovka wrote:
> > > This comment really needs to do a better job explaining the difference between G_SEXT and G_SEXT_INREG. It only covers the mechanical differences (i.e. that you have an immediate operand), but it says nothing about why this different opcode exists or where it would come from. Is the fact that the IRTranslator never creates such instructions relevant? Should we mention that it is only a legalization artifact? Targets can already say that G_SEXT for certain bitwidths is legal, why don't we just allow them to say which bitwidths should be lowered (instead of adding a new opcode)?
> > > This comment really needs to do a better job explaining the difference between G_SEXT and G_SEXT_INREG. It only covers the mechanical
> > > differences (i.e. that you have an immediate operand), but it says nothing about why this different opcode exists or where it would come
> > > from.
> > 
> > Ok I can add to that
> > 
> > > Is the fact that the IRTranslator never creates such instructions relevant?
> > 
> > No, who creates it is irrelevant to the operation of the instruction. There's no guarantee that the legalizer won't receive them as input.
> > 
> > In the case of the IRTranslator, the IRTranslator could create it if it wanted but it's a simple 1:1 converter (for the most part) and chooses not to at the moment as there's no LLVM-IR equivalent. Target-specific passes are also free to create them.
> > 
> > > Should we mention that it is only a legalization artifact?
> > 
> > It's (currently only) created by code that deals with legalization artifacts but it's not a legalization artifact itself.
> > 
> > > Targets can already say that G_SEXT for certain bitwidths is legal, why don't we just allow them to say which bitwidths should be lowered (instead of adding a new opcode)?
> > 
> > It becomes important when you start optimizing with GlobalISel. Suppose that ARM's SXTB instruction has a latency of 1 and LSL/ASR have a latency of 2 and that this includes forwarding paths in the hardware (if any). Having the sign-extend as a single atom in the MIR becomes useful for emitting the most efficient code since given code like:
> >   int foo(char a) {
> >     return (int)a << 2;
> >   }
> > it's cheaper to emit:
> >   sxtb r0, r1
> >   lsl r0, r0, #2
> >   // 3 cycles
> > than:
> >   lsl r0, r1, #24
> >   asr r0, r0, #24
> >   lsl r0, r0, #2
> >   // 6 cycles
> > even if you can exploit known-bits to emit:
> >   lsl r0, r1, #24
> >   asr r0, r0, #22
> >   // 4 cycles
> > it would still be better to use the sxtb. The latter example also illustrates that optimization can make it hard to recognise sign-extension. It gets harder if you also reduce the strength of instructions (maybe lsl r0, r0, #1 is faster as add r0, r0, r0) and there's plenty of ways to make things even more difficult. Essentially, the more mangling the optimizer does while ignorant of the desirable code, the harder it is to select the optimal code at the end.
> Sorry, but I still don't get it. I understand why you're trying to avoid the shifts, what I don't understand is why adding this new node is the best solution.
> 
> For one thing, the name is not very descriptive. I guess you just copied it from SelectionDAG, where you can actually constrain the source to match the destination. We don't do that here, so it's just confusing (I mean it sounds as if a legal G_SEXT would be going through memory or something).
> 
> Secondly, it looks like what we need is just a way to tell the artifact combiner "don't turn sext into shifts on this target, for these sizes". Why don't we just use G_SEXT's legality for that? I.e. actually use the regular legality actions on G_SEXT directly instead of G_SEXT_INREG, and tell the combiner to not mess with G_SEXT with legal sizes. With G_SEXT_INREG as proposed in this patch, it looks like you're just moving the type legality problem into a value-of-immediate legality problem for which we need new infrastructure.
> 
> I'm probably missing something, so please bear with me :)
> Sorry, but I still don't get it. I understand why you're trying to avoid the shifts, what I don't understand is why adding this new node is the best solution.
> For one thing, the name is not very descriptive. I guess you just copied it from SelectionDAG, where you can actually constrain the source to
> match the destination. We don't do that here, so it's just confusing (I mean it sounds as if a legal G_SEXT would be going through memory or
> something).

We actually do constrain the source and destination types. The constraint is specified here via type0:$dst and type0:$src, where the use of the same type-index specifies a type matching constraint. It's tested on line 36 of llvm/test/MachineVerifier/test_g_sext_inreg.mir, which checks the error that is emitted when the types are not equal. We don't really need the message on line 38, as it's triggered by the subset of mismatches where they aren't even the same kind of type, but it's somewhat useful to report how the types are different as well as that they're different.

For the naming part of this, I couldn't think of a better name and sticking to SelectionDAG's name had some slight benefits in the sense that someone who knows the distinction in SelectionDAG would also know the distinction here as it's the same. The difference is that G_SEXT makes the container bigger and the newly-created bits are copies of the previous sign bit. With G_SEXT_INREG, the container remains the same size and a specified bit (Size-1) is replicated to all the bits to its left.
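
To make the "container stays the same size" description concrete, here is a minimal C++ sketch of the arithmetic on a 32-bit container. It is only a model of the semantics described above, not code from the patch; the function name sextInReg32 is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of G_SEXT_INREG on a 32-bit container: bit (size - 1)
// of `value` is replicated into every bit to its left, and the container
// width does not change. Bits at or above `size` in the input are ignored.
int32_t sextInReg32(uint32_t value, unsigned size) {
  assert(size >= 1 && size <= 32);
  if (size == 32)
    return static_cast<int32_t>(value);
  const uint32_t signBit = 1u << (size - 1);
  const uint32_t lowMask = (signBit << 1) - 1; // the low `size` bits
  const uint32_t low = value & lowMask;
  // Flipping the sign bit and subtracting it again is the usual branch-free
  // way to sign-extend from an arbitrary bit position.
  return static_cast<int32_t>((low ^ signBit) - signBit);
}
```

With size = 8 this is the operation a sign-extend-byte instruction performs on a 32-bit register; G_SEXT, by contrast, would take an s8 value and produce a wider result.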

As for where you'd use each one, G_SEXT_INREG is useful for cases where you don't want to handle the smaller type. For example, most upstream targets have legal operations for s32 and s64 and widen s1-s31 to s32 as well as s33-s63 to s64. However, they still have to support sign extension from, say, s7 to s32 if the input IR had that. One way to achieve that is to use s32 -> G_TRUNC -> s7 -> G_SEXT -> s32. This costs more memory than G_SEXT_INREG (which can add up if you do it a lot, e.g. for code heavily using short or char) but aside from that, it also means that all the register allocation code has to support s7. Similarly, spill/reload/move has to support s7, and frame lowering has to support it. Instruction selection has to support it too, which is a problem for imported SelectionDAG patterns as they can't describe s7 unless there's a register class with an i7 type, which isn't possible as it isn't one of the MVT types. There's probably more, but the point is that being able to eliminate some types simplifies the backend.

You might think that this sounds like type legalization, and I'd be inclined to agree w.r.t. the effect at least, but I'd still call it operation legalization as the possible removal of types is a side-effect. The key difference from SelectionDAG is that GlobalISel itself doesn't mandate it or treat it separately from operation legalization. If a target works in s8 a lot but doesn't really have s8 operations or registers, it can choose to make s8 operations legal anyway and trade more complexity in the backend for (hopefully) better code quality.

> Secondly, it looks like what we need is just a way to tell the artifact combiner "don't turn sext into shifts on this target, for these sizes".
> Why don't we just use G_SEXT's legality for that? I.e. actually use the regular legality actions on G_SEXT directly instead of G_SEXT_INREG,
> and tell the combiner to not mess with G_SEXT with legal sizes. With G_SEXT_INREG as proposed in this patch, it looks like you're just moving
> the type legality problem into a value-of-immediate legality problem for which we need new infrastructure.

We want to eliminate the smaller types _and_ have a sign-extension operation which are mutually exclusive demands at the moment. It's not just about legalization though, it's more about the handling of optimization and instruction selection in all passes from the legalizer onwards (including target specific passes). In the previous example, I showed that hanging on to the knowledge that we had a sign-extension led to the optimal code. How do we hang on to that knowledge for as long as it's useful and only let go of that knowledge when it's beneficial to do so?

Suppose we lowered our sign-extend to a lsl, ashr pair. There are (or rather, will be) lots of combines that know how to transform various shifts into other forms (not all of them shifts). Some use known-bits analysis to prove they're valid, some are much simpler. There are also lots of lowerings that do likewise and lots of other optimizations with various effects on shifts. Each and every one can potentially permanently remove our knowledge that we have a sign-extend operation and force us to use the slower code because we can't reconstruct the desired operation later. So how do we get our sign-extend past the combiners and other optimizers that only want to do their job?

One answer to this is we teach every single one how to recognize a sign-extending-shift-pair and ask the target if it wants us to leave it alone. This gets impractical really quickly. Even assuming we can teach hundreds of optimizations to recognize dozens of conventional sign-extension patterns and all the target specific patterns in a reasonable way, we'd still be burning large amounts of compile-time checking for all the possible ways a sign-extend can be accomplished just to prevent undesirable optimizations from happening.

A better answer is to form a higher-level 'composite' operation to smuggle it past all the combines and optimizations we don't want to happen. This is what G_SEXT_INREG does. In this approach, it's cheap to determine that the undesirable combines/optimizations shouldn't happen because the opcode isn't the one they want. The downside is that any optimization you do want to happen needs to be taught to recognize the new opcode as well. This is much more manageable than the alternative of teaching everything to reject everything it shouldn't change, as the list of things to do grows much slower than the list of things not to do.
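
As a rough illustration of how a perfectly valid shift combine hides the sign-extend, the C++ sketch below models a 32-bit register and compares "shift pair, then scale by 4" against the merged form a known-bits combine could produce. The helper names are invented for this example, and the conversion from uint32_t back to int32_t assumes the usual two's complement wrap-around (guaranteed only from C++20).

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right written so it does not rely on how the compiler
// treats >> on negative values.
inline int32_t asr32(int32_t v, unsigned n) {
  return v < 0 ? ~(~v >> n) : (v >> n);
}

// Logical shift left, done on the unsigned representation.
inline int32_t lsl32(int32_t v, unsigned n) {
  return static_cast<int32_t>(static_cast<uint32_t>(v) << n);
}

// Sign-extend the low `width` bits with a shift pair, then multiply by 4.
// The two equal shift amounts are what make the sign-extend recognizable.
int32_t shiftPairThenScale(int32_t x, unsigned width) {
  assert(width >= 1 && width <= 30);
  const unsigned k = 32 - width;
  return lsl32(asr32(lsl32(x, k), k), 2);
}

// The same value after merging the last two shifts. The shift amounts no
// longer match, so a later pass cannot simply pattern-match a sign-extend.
int32_t mergedShifts(int32_t x, unsigned width) {
  assert(width >= 1 && width <= 30);
  const unsigned k = 32 - width;
  return asr32(lsl32(x, k), k - 2);
}
```

Both functions return the same value for every input, which is exactly why the combine is legal and why the original intent is hard to recover afterwards.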

To put this in another context that doesn't have the legalizer baggage, consider byte swapping and let's pretend there's no intrinsic so we can only emit a byte swap instruction if we actually recognize a byteswap in the code. It's usually a pretty big win to emit a byte-swap instruction so we want to find as many as possible. Unfortunately, there are lots of ways to write byte swapping code and it's difficult to recognize even without optimizations getting in the way. The chances of still being able to recognize a byteswap after the optimizers have picked at the code are fairly low. Some of the masks, shifts, and ors may have been mangled or disappeared entirely. So when we do find one, we want to make sure it gets to the instruction selector intact. Much like I described above, we form a composite operation from the masks, shifts, and ors the moment we see a byte-swap pattern (ideally before legalization) and smuggle the byte swap operation through to the instruction selector. If we didn't do that, the dozens of patterns to match would become hundreds by the time we reach isel.
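
For a feel of what "lots of ways to write byte swapping code" means, here are two of the many source-level spellings of a 32-bit byte swap, sketched in C++. These are generic illustrations (the function names are invented here), not patterns taken from LLVM's matchers.

```cpp
#include <cassert>
#include <cstdint>

// Spelling 1: move each byte into place with its own shift and mask.
uint32_t bswapShiftsAndMasks(uint32_t x) {
  return (x << 24) | ((x << 8) & 0x00FF0000u) |
         ((x >> 8) & 0x0000FF00u) | (x >> 24);
}

// Spelling 2: swap adjacent bytes first, then swap the two halves.
uint32_t bswapPairsThenHalves(uint32_t x) {
  x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
  return (x << 16) | (x >> 16);
}

// Sanity helper: the two spellings must agree for any input.
void checkBswapFormsAgree(uint32_t x) {
  assert(bswapShiftsAndMasks(x) == bswapPairsThenHalves(x));
}
```

The two functions compute the same result from quite different instruction mixes, and each can be further rewritten by shift and mask combines, which is why recognizing the pattern early and keeping it as one operation is attractive.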

Another context that has the same principles behind it is bit-rotation.

> With G_SEXT_INREG as proposed in this patch, it looks like you're just moving the type legality problem into a value-of-immediate legality
> problem for which we need new infrastructure.

I disagree with this summary as it's too focused on the legalizer. I believe I'm solving an information-preservation problem for the compilation pipeline by preventing the decomposition of a higher-level operation into lower-level components in cases where that information is still useful to optimal codegen.

I also disagree w.r.t. the legalizer but I might be being picky here. I would say I'm removing a requirement that all targets with sign-extension instructions which outperform shift-pairs make G_SEXT legal for all possible source types that benefit from that instruction (for AArch64, this means every type from s1 up to s63). In the context of the legalizer, this means being able to promote the source type without changing the operation, thereby making the operation specified by the opcode and immediate rather than the opcode and types. In terms of implementation, this does turn an operation legality problem from being about types to being about value-of-immediate, which is pretty close to the way you stated it and is what makes me think I might be being picky. I do think there is a small distinction between the two though, as the value-of-immediate bit falls out of excluding the types from the definition of the operation.

> I'm probably missing something, so please bear with me :)

No worries :-)

================
Comment at: llvm/lib/Target/ARM/ARMLegalizerInfo.cpp:87

+  getActionDefinitionsBuilder(G_SEXT_INREG).lower();
+
----------------
rovka wrote:
> Testcase?
This is just to maintain the status quo for ARM. It will be whatever test cases you already had for the lowering of G_SEXT into G_LSL/G_ASHR which appears to be just llvm/test/CodeGen/ARM/GlobalISel/arm-legalize-divmod.mir

================
Comment at: llvm/test/CodeGen/AArch64/GlobalISel/legalize-ext.mir:64
+    ; CHECK-DAG: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
+    ; CHECK-DAG: $w0 = COPY [[DEF]](s32)
     %0:_(s64) = COPY $x0
----------------
rovka wrote:
> dsanders wrote:
> > rovka wrote:
> > > Is the order really irrelevant for all of these? If so, maybe commit just the change from CHECK to CHECK-DAG separately. Personally, I wouldn't mind keeping the CHECK lines so we can see what actually changed with this patch. Ditto for the other tests.
> > The legalizer doesn't provide any guarantees on the order beyond that defs will precede uses. By changing to CHECK-DAG we make the test robust against future changes to the legalizer too. For this patch, the only thing that changed in many cases was the placement of the G_CONSTANT used in the sign-extending shifts.
> I looked in more detail and I agree that the order isn't that important. I still think this is an independent change that you can commit before this patch. Keeping it here makes it a bit difficult to spot the tests that are actually relevant for G_SEXT_INREG.
Sure. I can commit it separately.

> Keeping it here makes it a bit difficult to spot the tests that are actually relevant for G_SEXT_INREG.

Its inclusion in this patch indicates that the test was affected by the addition of G_SEXT_INREG. It takes a different code path to the same end result and slightly perturbs the order in the process. FWIW, I think that makes it relevant to G_SEXT_INREG but I don't mind committing the status-quo tests separately.

The two tests that test something other than the maintenance of the status quo are:
  llvm/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp 
  llvm/unittests/CodeGen/GlobalISel/PatternMatchTest.cpp
D61290 is the patch that makes G_SEXT_INREG legal for a target and changes the code for that target.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61289/new/

https://reviews.llvm.org/D61289