[llvm-dev] Fixed Point Support in LLVM

John McCall via llvm-dev <llvm-dev at lists.llvm.org>
Wed Aug 22 10:46:01 PDT 2018


> On Aug 22, 2018, at 10:48 AM, Bevin Hansson <bevin.hansson at ericsson.com> wrote:
> On 2018-08-22 11:32, John McCall wrote:
>> As I understand things, leaving it undefined would allow non-saturating addition/subtraction to just freely overflow into the bit.  I don't know if other arithmetic would equally benefit, and defining it to be zero will probably simplify most other operations even if it does sometimes require some amount of extra masking.
>> 
>> I don't know what the dynamic balance of fixed-point operations is in a real program.  It's certainly plausible that optimizing non-saturating arithmetic is worth penalizing other operations.
> Yes, if we don't clear the unsigned padding bit after every operation (C operation, not IR operation) we can get garbage in the bit that could affect later operations (multiplication, comparison). Locking it to always be zero would ensure that you always produce a valid, representable value.
> 
> I think we already discussed this in review and settled on keeping it undefined, but from my side it's only because I don't really have a good argument for making it zeroed; technically this is overflow and therefore undefined anyway. I think that ensuring that we don't produce unrepresentable values is good, though.

Oh, sorry, I didn't realize that the semantics were that overflow is UB when it isn't saturating.  Yes, in that case, assuming that the bit is zero, but not doing anything to ensure that it's zero in non-saturating operations, seems like the right policy.

Note that this strengthens the argument for using intrinsics over frontend expansions, because there are stronger preconditions on these operations than the integer optimizer understands.
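To make the precondition concrete, here is a minimal C sketch of the policy being agreed on, using a hypothetical 16-bit unsigned format with one padding bit and 15 fraction bits (the type and function names are illustrative, not anything in LLVM or Embedded C):

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical padded format: value stored in a uint16_t, MSB is the
 * padding bit, low 15 bits are fraction bits.  The policy discussed
 * above: every operation may ASSUME the padding bit is zero, but a
 * non-saturating operation does nothing to re-zero it, because
 * overflowing into it is undefined in the modeled semantics. */
typedef uint16_t ufract15;
#define PAD_MASK 0x8000u

/* Comparison is only correct under the zero-padding precondition;
 * this is exactly the kind of fact a generic integer optimizer does
 * not know, hence the case for dedicated intrinsics. */
int ufract_lt(ufract15 a, ufract15 b) {
    assert((a & PAD_MASK) == 0 && (b & PAD_MASK) == 0); /* precondition */
    return a < b;
}

/* Non-saturating add: emitted as a plain add with no masking.
 * If the mathematical result overflows into the padding bit, the
 * modeled fixed-point operation was undefined anyway. */
ufract15 ufract_add(ufract15 a, ufract15 b) {
    return (ufract15)(a + b);
}
```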

>> I just mean that I want there to be intrinsics called llvm.fixadd or llvm.fixmul or whatever with well-specified, target-independent semantics instead of a million target-specific intrinsics called something like llvm.supermips37.addfixvlq_16.
> Okay, then we're pretty much on the same page. In our implementation we have llvm.sat, llvm.fixsmul, llvm.add.sat, llvm.fixsmul.sat, etc. as described in Leonard's first mail in the thread. We have omitted many intrinsics that we don't need for our particular target/language (like unsigned multiplication, unsigned variations of saturating ops, and plain addition) but the basic design is not hard to extend with this.

Okay.

>>> Either of these goals could be pretty tricky if the semantics of the intrinsics must be well defined ("fixsmul is equivalent to (trunc (lshr (mul (sext a), (sext b))))") rather than "fixsmul does a signed fixed-point multiplication". If a target or language has different semantics for their fixed-point operations, the intrinsics are useless to them.
>>> I would rather see them well defined than not, though. I also agree that they should be portable and generic enough to support any language/target implementation, but unless you add lots of intrinsics and parameterization, this could result in a bit of 'mismatch' between what the intrinsics can do and what the frontend wants to do. At some point you might end up having to emit a bit of extra code in the frontend to cover for the deficiencies of the generic implementation.
>> "Lots of parameterization" sounds about right.  There should just be a pass in the backend that legalizes intrinsics that aren't directly supported by the target.  The analogy is to something like llvm.sadd_with_overflow: a frontend can use that intrinsic on i19 if it wants, and that obviously won't map directly to a single instruction, so LLVM legalizes it to operations that *are* directly supported.  If your target has direct support for saturating signed additions on a specific format, the legalization pass can let those through, but otherwise it should lower them into basic operations.
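As a sketch of what such a legalization would produce for a target with no native fixed-point multiply, the straight-line expansion of a hypothetical llvm.fixsmul.i32 (widen, multiply, shift the scale back out, truncate) looks like this in C:

```c
#include <stdint.h>

/* What the legalization pass would emit for a hypothetical
 * llvm.fixsmul.i32 with a constant scale, on a target with no
 * direct support: sext -> mul -> ashr -> trunc. */
int32_t fixsmul_expand(int32_t a, int32_t b, unsigned scale) {
    int64_t wide = (int64_t)a * (int64_t)b;  /* sext + full-width mul */
    return (int32_t)(wide >> scale);         /* ashr by scale, trunc  */
}
```

For example, with a Q16 scale, 1.5 × 2.0 is fixsmul_expand(98304, 131072, 16), which yields 196608, i.e. 3.0.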
> Sure; if a target does not support an instance of llvm.fixsmul.i32 with a scale parameter of 31, a pass should convert this into straight-line integer IR that implements the defined semantics of the operation.
> 
> I'm just not really a fan of something like 'llvm.fixmul.iN(iN LHS, iN RHS, i32 Scale, i1 Signed, i1 Saturating, i2 RoundingDirection, i1 HasPadding)'. I don't think this kind of intrinsic is nice to look at or implement. Defining the semantics of an operation like this is quite hard since there are so many knobs, and pretty much any target would only be able to handle a handful of value combinations to this anyway.

Well, yes, the readability problem here is part of why I've been encouraging the use of a type + instructions.  Something like:
  %1 = fixumul saturate fix32_16p %0, 7.5
is much, much more approachable than:
  %1 = call @llvm.fixmul.i32(i32 %0, i32 491520, i32 16, i1 0, i1 1, i2 0, i1 1)

But sure, we can do finer-grained intrinsics like this:
  %1 = call @llvm.fixumul.sat.i32(i32 %0, i32 491520, i32 16, i2 0, i1 1)
That still seems like a lot of parameters, though, and it's not hard to imagine it getting worse over time (if, say, people want a non-UB but still non-saturating semantics).

> IMHO it should be a number of separate intrinsics with well-defined semantics for each one, preferably semantics that can be described in a single line. That makes it easy to implement and understand. Like so:
> 
> llvm.ssat.iN(iN A, i32 SatWidth) = saturate anything above SatWidth bits as a signed value
> llvm.usat.iN(iN B, i32 SatWidth) = saturate anything above SatWidth bits as an unsigned value
> llvm.fixsmul.iN(iN A, iN B, i32 Scale) = sext->mul->ashr->trunc
> llvm.fixumul.iN(iN A, iN B, i32 Scale) = zext->mul->lshr->trunc
> llvm.fixsmul.sat.iN(iN A, iN B, i32 Scale) = sext->mul->ashr->ssat->trunc
> llvm.fixumul.sat.iN(iN A, iN B, i32 Scale) = zext->mul->lshr->usat->trunc
> llvm.sadd.sat.iN(iN A, iN B) = saddo->check sign&ovf->selects
> llvm.uadd.sat.iN(iN A, iN B) = uaddo->check ??&ovf->selects
> etc.
> 
> The operations that the intrinsics are expressed in are strictly different anyway (sext vs zext, saddo vs uaddo etc), so there isn't any room for generalization on constant parameter values in the first place. You'll need to disambiguate manually regardless. The semantics for the saturation intrinsics aren't that straightforward... but it's probably possible to come up with a good description with a bit of thinking.

Well, the semantics are the high-level arithmetic semantics, which are stronger than any particular lowering sequence.
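That is, the contract is "the mathematically exact scaled product, saturated to the result width," and any lowering sequence is just one way to get there. A C sketch of those high-level semantics for a hypothetical llvm.fixsmul.sat.i32 (assuming the 64-bit intermediate is wide enough, which holds for i32 operands):

```c
#include <stdint.h>
#include <limits.h>

/* High-level semantics of a hypothetical llvm.fixsmul.sat.i32:
 * the exact product of two scale-S values, shifted back down
 * (rounding toward negative infinity), clamped to the i32 range.
 * Targets may lower this however they like, as long as the
 * observable result matches. */
int32_t fixsmul_sat(int32_t a, int32_t b, unsigned scale) {
    int64_t exact = ((int64_t)a * (int64_t)b) >> scale; /* exact, rescaled */
    if (exact > INT32_MAX) return INT32_MAX;            /* ssat high */
    if (exact < INT32_MIN) return INT32_MIN;            /* ssat low  */
    return (int32_t)exact;                              /* trunc     */
}
```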

>> If the frontend generates a ton of requests for operations that have to be carried out completely in software, that's its problem.
> An example of what I mean by extra code is if you look at the intrinsics I described. If you have an unsigned padding bit, there is no way to perform an unsigned saturating addition with those intrinsics. You need to emit an llvm.sadd.sat, add an extra check for a set sign bit (because that means that we went below 0) and clamp to zero in that case.

I don't see why this wouldn't be part of the intrinsic.
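For reference, the expansion Bevin describes looks roughly like the following C sketch, for a padded unsigned format with 15 value bits in an i16 (names hypothetical). Because the padded unsigned maximum coincides with INT16_MAX, the signed saturation point already clamps the high side; the extra sign-bit check handles results that would fall below zero:

```c
#include <stdint.h>
#include <limits.h>

/* Stand-in for llvm.sadd.sat.i16. */
int16_t sadd_sat16(int16_t a, int16_t b) {
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Frontend expansion of unsigned saturating add for the padded
 * format: signed saturating add, then clamp a set sign bit to zero.
 * The argument above is that this belongs inside the intrinsic
 * rather than in frontend-emitted extra code. */
int16_t uadd_sat_padded(int16_t a, int16_t b) {
    int16_t s = sadd_sat16(a, b);
    return (s < 0) ? 0 : s;   /* sign bit set => went below 0 => clamp */
}
```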

>>>> As for other frontends, I can only speak for Swift.  Fixed-point types are not a high
>>>> priority for Swift, just like they haven't been a high priority for Clang — it's not like
>>>> Embedded C is a brand-new specification.  But if we had a reason to add them to
>>>> Swift, I would be pretty upset as a frontend author to discover that LLVM's support
>>>> was scattered and target-specific and that my best implementation option was to
>>>> copy a ton of code from Clang.
>>> It might not be a ton, but at some level you'd have to copy a bit of code. There's several fixed-point operations that probably don't deserve their own intrinsics, like nonsaturating fixed-fixed and fixed-int conversion.
>> Why not?  I mean, sure, they're easy to define with extends and shifts, but it doesn't really seem *bad* to have intrinsics for them.  I guess you'd lose some small amount of free integer optimization from the middle-end, but the important cases will probably all still get done for free when they get legalized.
> Perhaps. The odd thing is though, if a target just wants the shifts and extends, they would have to say that these operations/intrinsics are illegal in order to get legalization, which is a bit strange to me.

Agreed, it's a little strange, but I think it's not hard to understand.

> If a target has a more efficient way of doing shift+trunc or shift+ext, then it should just have patterns for that in the first place.

True.

>>> There's always the possibility of adding them to IRBuilder if we think they might need to be reused.
>> Please at least do this, yes.
>> 
>>>> The main downsides of not having a type are:
>>>> 
>>>>   - Every operation that would've been overloaded by operand type instead has
>>>>     to be parameterized.  That is, your intrinsics all have to take width and scale in
>>>>     addition to signed-ness and saturating-ness; that's a lot of parameters, which
>>>>     tends to make testing and debugging harder.
>>> The width would be implied by the width of the integer type, and signedness and saturation should simply have their own intrinsics than be a parameter. Scale would have to be a constant parameter for some of the intrinsics, though.
>> Well, width can differ from the width of the integer type for these padded types, right?  I mean, you can represent those differently if you want, but I would hope it's represented explicitly and not just as a target difference.
> Well, only by a bit. There's no reason to have more bits in your representation than you have value bits. The 'store size' of an unsigned fixed type is the same as the signed counterpart, so if the width of the signed one is i16, the unsigned one must also be i16. You could trunc to i15 before every operation and then extend afterwards, but that seems a bit clunky.

Well, there are similar restrictions on the scale, too, right?  In theory the scale could be an arbitrary positive or negative number, but I assume we would actually constrain it to (0,width].

> If you keep them the same width, you can reuse the signed multiplication intrinsic for unsigned, since the representation in all of the lower bits is the same.

For targets that want to use that representation, yes.
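The reuse argument can be checked directly: a padded unsigned value has its top bit clear, so it is non-negative as a signed integer, sign- and zero-extension agree, and the signed widening multiply produces the same bits as the unsigned one. A small C sketch of that invariant (names hypothetical):

```c
#include <stdint.h>

/* Returns 1 if, for these operands, the signed-multiply path and the
 * unsigned-multiply path produce identical bits.  Under the padded
 * format's precondition (top bit clear, i.e. both operands
 * non-negative), this always holds, which is why the signed
 * fixed-point multiply intrinsic can be reused for unsigned. */
int padded_mul_reuse_holds(int16_t a, int16_t b) {
    if (a < 0 || b < 0) return 0;  /* outside the padded format */
    int32_t  via_sext = (int32_t)a * (int32_t)b;               /* signed path   */
    uint32_t via_zext = (uint32_t)(uint16_t)a
                      * (uint32_t)(uint16_t)b;                 /* unsigned path */
    return via_sext == (int32_t)via_zext;
}
```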

>>> It means more intrinsics, but I think it's a better design than having a single intrinsic with flags for different cases.
>> In my experience, using different intrinsics does make it awkward to do a lot of things that would otherwise have parallel structure.  It's particularly annoying when generating code, which also affects middle-end optimizers.  And in the end you're going to be pattern-matching the constant arguments anyway.
> I guess our opinions differ there, then. I think the code savings/generalization/simplification you get from expressing the parameterization as constant parameters compared to having separate intrinsics is not huge.
> 
> if (Ty->isSigned())
>   IID = fixsmul
> else
>   IID = fixumul
> 
> if (Ty->isSigned())
>   SignedParam = ConstantInt(1)
> else
>   SignedParam = ConstantInt(0)

Well, it's more like:
  IID = E->isSaturating()
            ? (Ty->isSigned() ? fixsmul_sat : fixumul_sat)
            : (Ty->isSigned() ? fixsmul : fixumul);
vs.
  ConstantInt(Ty->isSigned()), ConstantInt(E->isSaturating())
and that's assuming just the two dimensions of variation encoded in the intrinsic name, not all the stuff with scale and padding bits and rounding mode.

> The same goes for legalization/expansion. The semantics would be described in specific operations, not generalized ones ('ext') so there isn't much you can do there anyway.
>> 
>> But ultimately, I'm not trying to dictate the design to the people actually doing the work.  I just wanted to ward off the implementation that seemed to be evolving, which was a bunch of hand-lowering in Clang's IR-generation, modified by target hooks to emit target-specific intrinsic calls.
>> 
>>>>   - Constants have to be written as decimal integers, which tends to make testing
>>>>     and debugging harder.
>>> It would be possible to add a decimal fixed-point format to the possible integer constant representations in textual IR, but this doesn't help when you're printing.
>> Right.
>> 
>>>>   - Targets that want to pass fixed-point values differently from integers have to
>>>>     invent some extra way of specifying that a value is fixed-point.
>>> Another function attribute would probably be fine for that.
>> It depends.  It wouldn't work well with compound results, at least, but people often tend not to care about those because they're awkward to work with in C.
> Oh, I didn't think of those.
> 
> Personally I think most of the CC handling in Lowering is a huge pain to do anything with, even for today's types.

Definitely.

John.
