[PATCH] D52286: [Intrinsic] Signed Saturation Intirnsic

Fri Sep 21 02:06:35 PDT 2018

ebevhan added a comment.

> So if I understand this correctly, we can handle this intrinsic in 2 ways:
> 
> 1. The intrinsic gets converted to IR in a pass before touching the DAG.
> 2. The intrinsic gets passed to the DAG and gets legalized and lowered to machine code there.
> 
> It seems that from the thread, the consensus was for going with (2), but I also see now how things would be much easier just having this run through a pass in (1). If I recall correctly, the main reason for offsetting the intrinsic "expansion" to the DAG was to determine if the intrinsic was supported by the hardware during the legalization steps. If so, couldn't this also be done in a pass when converting the intrinsic to IR in (1)? That is, we determine early in the pass if this intrinsic call is supported by this target. Those that aren't legal get expanded into IR whereas those that are legal get passed to DAG instruction selection.
> 
> I think this might avoid the trouble of having to check types and operations in the DAG, and ISel can choose whatever appropriate instructions for whatever supported intrinsic (if that makes sense).

Yes, exactly. Typically legalization occurs on the isel level, but I think it's a lot easier to do some fixed-point operations on a higher level instead.

Though, I think this mostly applies to the operations that must be performed in a wider type, like multiplication and division. For operations like saturation, and saturated addition/subtraction, it's probably not as important since saturation can be done in the operation type just fine, and addition/subtraction hopefully have the overflow operations. It feels a bit odd to do some in DAG and others in IR, though... but if all of the DAG operations are also the ones that work on integer values as well, maybe that works fine?

================
Comment at: include/llvm/CodeGen/ISDOpcodes.h:264
+    /// signed value is returned instead.
+    SSAT,
+
----------------
leonardchan wrote:
> ebevhan wrote:
> > With the way the rest is written, it doesn't seem like this DAG opcode is ever used.
> > 
> > The way you would normally do this in DAG would be to:
> > * Lower the intrinsic to this node in SelectionDAGBuilder, regardless of type or operation legality
> > * Deal with illegally typed nodes in LegalizeTypes
> > * Deal with illegal opcodes in LegalizeDAG
> > 
> > I think that doing the legalization in DAG for some of the fixed-point operations could be hard. Opcode legalization occurs after type legalization, so the legalization for opcodes that have legal types but where the target doesn't actually support the operation must be able to legalize to valid types. I think this could get unnecessarily tricky. For example, an N-bit, M-scale fixsmul must actually be done as N+M-bits or greater, and if that type (or anything greater than it) isn't legal it gets hairy.
> > 
> > There's also the issue of how a target specifies legality of these operations. A target might support ISD::FIXSMUL on MVT::i16, but only with a scale of 15. However, there's currently no way of expressing legality of an operation based on anything than the type (in general). Something would probably have to be added to TargetLowering/TargetLoweringBase that lets targets specify the legality of these operations. (The same issue applies to ssaturate; it might be legal to do 'i32 ssaturate (X, 15)', but not 'i32 ssaturate (X, 19)')
> > 
> > 
> > I think that doing the legalization in DAG for some of the fixed-point operations could be hard. Opcode legalization occurs after type legalization, so the legalization for opcodes that have legal types but where the target doesn't actually support the operation must be able to legalize to valid types. I think this could get unnecessarily tricky. For example, an N-bit, M-scale fixsmul must actually be done as N+M-bits or greater, and if that type (or anything greater than it) isn't legal it gets hairy.
> 
> I'm not sure if I follow still. (Still learning how the instruction selection process works). So for the fixsmul example, is the problem that during the operation legalization step, the target may not be able to support an int that has N+M bits? Wouldn't this legalization occur during the first type legalization step before operation legalization, like we check if the target can support a N+M bit int before operation legalization? Then during the following optimization stage, have a DAG pass that will expand an illegal intrinsic operation into other DAG nodes.
> 
> > There's also the issue of how a target specifies legality of these operations. A target might support ISD::FIXSMUL on MVT::i16, but only with a scale of 15. However, there's currently no way of expressing legality of an operation based on anything than the type (in general). Something would probably have to be added to TargetLowering/TargetLoweringBase that lets targets specify the legality of these operations. (The same issue applies to ssaturate; it might be legal to do 'i32 ssaturate (X, 15)', but not 'i32 ssaturate (X, 19)')
> 
> Also having trouble understanding this one. Is the problem that you don't have access to operand types during the instruction legalization step?
> I'm not sure if I follow still. (Still learning how the instruction selection process works). So for the fixsmul example, is the problem that during the operation legalization step, the target may not be able to support an int that has N+M bits?

Correct. Operation legalization (as I understand it; I could be wrong!) can not introduce illegal types into the DAG since type legalization occurs first. Say that you have a 'i16 fixsmul(X, Y, 15)' on a target where i16 is legal and i32 is not, and the fixsmul is not legal. This passes type legalization just fine, because i16 is legal. Then we get to operation legalization. It determines that the fixsmul node is not legal, so it needs to either be expanded or promoted according to what the target says (generally from TLI.getOperationAction).

However, we know how a fixsmul should be lowered. It needs to be done as just a regular mul in a wider type; but we can't make this wider type, because it's not legal! If the target specified Expand (which it probably did in this case, since i32 is not legal), we need to do it as a bunch of i8 operations*. This just seems really messy to me; it would have been much cleaner if it had just been properly promoted from the start and we let isel legalize the higher level lowering instead.

*I'm unsure about this. It might be the case that if we need to expand an illegal node with a legal type, it is allowed to do the lowered operations in the legal type anyway, even though we said expand.

> Wouldn't this legalization occur during the first type legalization step before operation legalization, like we check if the target can support a N+M bit int before operation legalization? Then during the following optimization stage, have a DAG pass that will expand an illegal intrinsic operation into other DAG nodes.

If the type of the node is legal, type legalization won't do anything. And N+M necessarily being legal for an N-bit fixsmul is not a requirement, since the fixsmul might itself be a legal, actual target operation, in which case the legality of N+M doesn't matter. I suppose you could ask about N+M if the node isn't legal as N, but that mixes type and operation legalization (since type legalization would then depend on whether an operation was legal in a type that we haven't even seen) and I'm not sure if the existing implementation ever does this.

> Also having trouble understanding this one. Is the problem that you don't have access to operand types during the instruction legalization step?

Generally, DAG will call `TLI.getOperationAction` to determine the legality of operations. It passes the opcode and the type, but we need more than that to determine if one of these operations is legal. A new hook would have to be added to TargetLowering that takes both the opcode, the type and the scaling factor/saturating bitwidth of the node. I think I mentioned it in another comment.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1119
+  case ISD::SSAT:
+    // Target legalization checked here?
+    Action = TargetLowering::Expand;
----------------
leonardchan wrote:
> @ebevhan Do you know if this is where checking if this intrinsic is supported by the target should be? 
Yes, this would be it. For nodes like ssat and fixed-point ops, you could ask a custom target hook, maybe `TLI.getScaledOperationAction(<opcode>, <type>, <scale/sat_bw>)` to determine the legality of a given node.

When type-legalizing you also need to handle things for specific nodes as well, since things behave differently when you legalize operands/results on different nodes.

(Of course, this all depends on whether we do the legalization here or in IR instead.)

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:5768
+  case Intrinsic::ssaturate: {
+    Value *Src = I.getArgOperand(0);
+    auto *SatBits = dyn_cast<ConstantInt>(I.getArgOperand(1));
----------------
leonardchan wrote:
> ebevhan wrote:
> > This should probably be building an SSAT node rather than actually converting the intrinsic to normal DAG operations.
> Ah, ok, I think I'm starting to understand the DAG process better. I tried this earlier, but wasn't sure if that was the correct way to approach this. I initially had `setValue(&I, DAG.getNode(ISD::SSAT, sdl, DAG.getVTList(...), Op1, Op2))` (which I believe creates an SSAT node), but always ran into this fatal error:
> 
> ```
> fatal error: error in backend: Cannot select: t4: i32 = ssaturate t2, Constant:i32<4>
>   t2: i32,ch = CopyFromReg t0, Register:i32 %2
>     t1: i32 = Register %2
>   t3: i32 = Constant<4>
> In function: main
> ```
> 
> But I think this is because I didn't make this node type illegal during either legalization step.
> 
> If I understand the purpose of this file correctly: this is where the initial build of the DAG is done, then legalization occurs, then for targets where specific intrinsic operations are illegal, this DAG should be "expanded" into other DAG nodes before instruction selection into machine instructions.
> 
> If this is the case, where should the actual expansion (which I think is what this code also does) occur? I imagine it would be during the optimization step between the operation legalization and actual machine instruction selection from the DAG.
Yes, that line looks correct. The reason you got that error was because the final isel pattern matching step failed (i.e. the node was completely legal all the way through) but as there are no patterns for x86 to select that node, it couldn't finish.

> If I understand the purpose of this file correctly: this is where the initial build of the DAG is done, then legalization occurs, then for targets where specific intrinsic operations are illegal, this DAG should be "expanded" into other DAG nodes before instruction selection into machine instructions.

Correct. The initial DAG is built here, then type-legalized (through expansion/promotion), then operation-legalized (through expansion/promotion) with some optimization steps in between both. Then it's passed to the pattern matcher to finish the selection.

Technically, it is possible to select intrinsics directly if they survive all the way down. They show up as 'int_XXX' nodes in the .td patterns. If we decide to circumvent the typical path for determining the legality of these intrinsics (such as adding a special target hook to TargetLowering or legalizing as IR instead), then we might not need a DAG node at all. Any such intrinsic that makes it to the isel stage would be 'legal'. I don't know what upstream thinks of that design, though.

It also seems that some nodes have special hooks to handle a bit extra even on the DAG level. That might be an option if we still decide to go with doing something in DAG.

> If this is the case, where should the actual expansion (which I think is what this code also does) occur? I imagine it would be during the optimization step between the operation legalization and actual machine instruction selection from the DAG.

The actual expansion occurs during legalization. Typically, nodes can either be expanded (the type is divided in half and the operation is done in the halves) or promoted (the type is rounded up to the next legal type and done in the larger type instead). All of this occurs in LegalizeDAG.cpp, LegalizeTypes.cpp, and a number of other files (LegalizeIntegerTypes, LegalizeFloatTypes, LegalizeTypesGeneric etc). If you run a compilation with `-debug-only=isel` you'll get dumps of the DAG for each step of DAG selection:
```
Initial selection DAG: %bb.0 
...
Optimized lowered selection DAG: %bb.0 
...
Type-legalized selection DAG: %bb.0
...
Legalized selection DAG: %bb.0
...
Optimized legalized selection DAG: %bb.0
...
===== Instruction selection begins:
...
```
'Optimized' is done by DAGCombiner.

================
Comment at: lib/Transforms/FixedPoint/SaturationPass.cpp:13
+namespace {
+struct SignedSaturationPass : public FunctionPass {
+  static char ID;
----------------
leonardchan wrote:
> ebevhan wrote:
> > I think this would be more futureproof if it didn't strictly deal with ssaturate, but also lowered other fixed-point operations as well.
> > 
> > However, signed saturation is not strictly a fixed-point operation, so I'm not really sure how to go about it without it seeming odd.
> > 
> > It should probably also ask the target somehow if the operations should be converted at all. If this were considered a 'fixed-point lowering pass' then it could be inserted into the pass pipeline towards the end to lower operations that are not legal according to the target.
> > I think this would be more futureproof if it didn't strictly deal with ssaturate, but also lowered other fixed-point operations as well.
> >
> > However, signed saturation is not strictly a fixed-point operation, so I'm not really sure how to go about it without it seeming odd.
> 
> I see, in a sense we have 2 different sets of intrinsics: the saturation and add/sub ones that work generically for integers, and the mul/div ones that care about the scale of the fixed point operands. Do you think it would be ok to propose this as 2 separate passes then where one is specifically applied to fixed point types that take a scale? This would just be creating another directory and renaming this one.
It's true that the intrinsics do operate on different domains, but I think I would prefer them to be one and the same, since saturation is something that some fixed-point operations will want to do anyway. The expansion of a fixsmul_sat would contain a ssaturate, so it makes more sense to keep them in the same pass since this means you can iterate the expansion until you hit a fixed point (fixsmul_sat might not be legal but ssaturate might be, for example).

I guess you could have two passes and just run the fixed-point pass before the saturating integer pass...

Repository:
  rL LLVM

https://reviews.llvm.org/D52286