<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div><blockquote type="cite" class=""><div class="">On Aug 24, 2018, at 9:02 AM, Bevin Hansson <<a href="mailto:bevin.hansson@ericsson.com" class="">bevin.hansson@ericsson.com</a>> wrote:</div><div class=""><div class="">On 2018-08-22 19:46, John McCall wrote:<blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class="">Either of these goals could be pretty tricky if the semantics of the intrinsics must be well defined ("fixsmul is equivalent to (trunc (lshr (mul (sext a), (sext b))))") rather than "fixsmul does a signed fixed-point multiplication". If a target or language has different semantics for their fixed-point operations, the intrinsics are useless to them.<br class="">I would rather see them well defined than not, though. I also agree that they should be portable and generic enough to support any language/target implementation, but unless you add lots of intrinsics and parameterization, this could result in a bit of 'mismatch' between what the intrinsics can do and what the frontend wants to do. At some point you might end up having to emit a bit of extra code in the frontend to cover for the deficiencies of the generic implementation.<br class=""></blockquote>"Lots of parameterization" sounds about right. There should just be a pass in the backend that legalizes intrinsics that aren't directly supported by the target. The analogy is to something like llvm.sadd_with_overflow: a frontend can use that intrinsic on i19 if it wants, and that obviously won't map directly to a single instruction, so LLVM legalizes it to operations that *are* directly supported. If your target has direct support for saturating signed additions on a specific format, the legalization pass can let those through, but otherwise it should lower them into basic operations.<br class=""></blockquote>Sure; if a target does not support an instance of llvm.fixsmul.i32 with a scale parameter of 31, a pass should convert this into straight-line integer IR that implements the defined semantics of the operation.<br class=""><br class="">I'm just not really a fan of something like 'llvm.fixmul.iN(iN LHS, iN RHS, i32 Scale, i1 Signed, i1 Saturating, i2 RoundingDirection, i1 HasPadding)'. I don't think this kind of intrinsic is nice to look at or implement. Defining the semantics of an operation like this is quite hard since there are so many knobs, and pretty much any target would only be able to handle a handful of value combinations to this anyway.<br class=""></blockquote>Well, yes, the readability problem here is part of why I've been encouraging the use of a type + instructions. Something like:<br class=""> %1 = fixumul saturate fix32_16p %0, 7.5<br class="">is much, much more approachable than:<br class=""> %1 = call @llvm.fixmul.i32(i32 %0, i32 491520, i32 15, i1 0, i1 1, i2 0, i1 1)<br class=""></blockquote>True, but if we go the intrinsic route then we should strive to keep the intrinsic as simple as possible while still encapsulating the functionality we need for our purpose here and now.<br class=""></div></div></blockquote><div><br class=""></div>Sure.</div><div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><blockquote type="cite" class="">But sure, we can do finer-grained intrinsics like this:<br class=""> %1 = call @llvm.fixumul.sat.i32(i32 %0, i32 491520, i32 16, i2 0, i1 1)<br class="">That still seems like a lot of parameters, though, and it's not hard to imagine it getting worse over time (if, say, people want a non-UB but still non-saturating semantics).<br class=""></blockquote><br class="">The example I gave with the rounding and padding argument might have been more of a 'slippery slope' example. I think that if we settle on a specific implementation semantic for rounding and padding on the (Embedded-)C source language level, we should make intrinsics that reflect that and not too much more.<br class=""></div></div></blockquote><div><br class=""></div><div>I'm not trying to argue for pre-emptive generalization, but I'm not sure we're going to reach universal agreement on at least the padding question. Targets with ISA support probably want to use whatever format lets them best exploit the hardware, but targets without it probably want to use an un-padded format because it generally means they get to use ordinary overflow detection. (There's a nice coincidence where padding is an issue only for unsigned arithmetic, where saturation is generally easier because any given operation can only overflow in one direction. Although maybe you get the same advantages with signed overflow when the value just happens to be known non-negative — I haven't really thought this through.)</div><div><br class=""></div></div><div><blockquote type="cite" class=""><div class=""><div class="">We could add lots of knobs to the intrinsics to make them more 'portable' but I don't think the portability benefit outweighs the complexity cost. If a language decides they want to add fixed-point support but want their rounding to be towards zero instead of down, then I think they should just emit more to handle that on top of the existing intrinsics, or just go with the existing semantics. (The former might not even be reasonably possible depending on the design; I haven't considered the example much.)<br class=""></div></div></blockquote><div><br class=""></div>I just want the semantics to not include an implicit reference to the current target. We don't have to pre-emptively generalize beyond what's necessary to implement the spec.</div><div><br class=""></div><div>The spec says that rounding is basically unspecified:</div><div> <span style="font-size: 11pt;" class="">Whether rounding is up or down is implementation-defined and may differ for different values and
different situations; an implementation may specify that the rounding is indeterminable.</span></div>
<div class="page" title="Page 13">
<div class="layoutArea">
<div class="column"><p class="">I'm fine with using those semantics at a high level even if means that e.g. constant-folding might differ from what would actually be done by the processor. But I think that kind of difference has been a problem for optimizations in the past.</p></div></div></div><div><blockquote type="cite" class=""><div class=""><div class=""><blockquote type="cite" class=""><blockquote type="cite" class="">Well, only by a bit. There's no reason to have more bits in your representation than you have value bits. The 'store size' of an unsigned fixed type is the same as the signed counterpart, so if the width of the signed one is i16, the unsigned one must also be i16. You could trunc to i15 before every operation and then extend afterwards, but that seems a bit clunky.<br class=""></blockquote>Well, there are similar restrictions on the scale, too, right? In theory the scale could be an arbitrary positive or negative number, but I assume we would actually constrain it to (0,width].<br class=""></blockquote>Yes, that would be the restriction on the scale parameter. Technically it can be 0 as well, but then you just have an integer anyway?<br class=""></div></div></blockquote><div><br class=""></div>Right.</div><div><br class=""><blockquote type="cite" class=""><div class=""><div class="">My point is basically, if you have a padded type and you want to perform a fixed-point signed multiplication (for example) on fewer bits than the width, you have to emit code to do that specially, such as trunc first, then operate, then extend back. All other operations operate on all bits; there's no integer 'mul with padding', for example.<br class=""></div></div></blockquote><div><br class=""></div>Okay. So a target that wants to do padded unsigned operations on a 32-bit type should be truncating to i31 before invoking the intrinsic? I guess I'm a little worried about the code-generation consequences of that, but it's definitely cleaner, so if you're okay with it, it's fine with me.</div><div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><blockquote type="cite" class=""><blockquote type="cite" class="">If you keep them the same width, you can reuse the signed multiplication intrinsic for unsigned, since the representation in all of the lower bits is the same.<br class=""></blockquote>For targets that want to use that representation, yes.<br class=""></blockquote>Yes, true.<br class=""><br class="">I guess I mostly meant the general C level representation here with its restriction on a single padding bit. Any language with a different representation (such as more padding bits) would have to figure out some way of using the provided intrinsics to work on the padded types; likely by truncing before performing the operation, as I mentioned.<br class=""></div></div></blockquote><div><br class=""></div>Yeah, I don't think we need to generalize to arbitrary formats here.</div><div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class="">It means more intrinsics, but I think it's a better design than having a single intrinsic with flags for different cases.<br class=""></blockquote>In my experience, using different intrinsics does make it awkward to do a lot of things that would otherwise have parallel structure. It's particularly annoying when generating code, which also affects middle-end optimizers. And in the end you're going to be pattern-matching the constant arguments anyway.<br class=""></blockquote>I guess our opinions differ there, then. I think the code savings/generalization/simplification you get from expressing the parameterization as constant parameters compared to having separate intrinsics is not huge.<br class=""><br class="">if (Ty->isSigned())<br class=""> IID = fixsmul<br class="">else<br class=""> IID = fixumul<br class=""><br class="">if (Ty->isSigned())<br class=""> SignedParam = ConstantInt(1)<br class="">else<br class=""> SignedParam = ConstantInt(0)<br class=""></blockquote>Well, it's more like:<br class=""> IID = E->isSaturating()<br class=""> ? (Ty->isSigned() ? fixsmul_sat : fixumul_sat)<br class=""> : (Ty->isSigned() ? fixsmul : fixumul);<br class="">vs.<br class=""> ConstantInt(Ty->isSigned()), ConstantInt(E->isSaturating())<br class="">and that's assuming just the two dimensions of variation encoded in the intrinsic name, not all the stuff with scale and padding bits and rounding mode.<br class=""></blockquote>True, I omitted the saturating variants...<br class=""><br class="">I still prefer multiple intrinsics but I don't really have a good counterargument, except for the Clean Code boolean function argument recommendation :)<br class=""></div></div></blockquote><div><br class=""></div>I think that one went past me, sorry.</div><div><br class=""></div><div>John.</div><div><br class=""></div></body></html>