[PATCH] D108643: Introduce _BitInt, deprecate _ExtInt

Erich Keane via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 14 12:13:53 PDT 2021


erichkeane added subscribers: craig.topper, andrew.w.kaylor.
erichkeane added a comment.

In D108643#3000193 <https://reviews.llvm.org/D108643#3000193>, @rjmccall wrote:

> In D108643#2999776 <https://reviews.llvm.org/D108643#2999776>, @erichkeane wrote:
>
>>> In D108643#2965852 <https://reviews.llvm.org/D108643#2965852>, @rjmccall wrote:
>>>
>>> The choice that high bits are unspecified rather than extended is an interesting one.  Can you speak to that?  That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.
>>
>> So we chose this for a few reasons:
>>
>> 1- Consistency with struct padding bits.  It seemed strange to specify the padding bits here, when the rest of the standards/ABI don't specify padding bits.
>
> I think it's a mistake to think of these as padding bits.  Implementations on general-purpose hardware will be doing operations in word-sized chunks; these are the high bits of the most significant word.

I guess that's fair?
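To make the trade-off above concrete, here is a hand-written sketch (illustrative only, not clang's actual output) of a _BitInt(17) carried in an i32: a comparison has to discard the unspecified high bits first, while an addition can ignore them entirely.

  define i1 @eq17(i32 %a, i32 %b) {
    %a17 = trunc i32 %a to i17     ; drop the 15 unspecified high bits
    %b17 = trunc i32 %b to i17
    %cmp = icmp eq i17 %a17, %b17  ; compare only the meaningful bits
    ret i1 %cmp
  }

  define i32 @add17(i32 %a, i32 %b) {
    ; the low 17 bits of the sum are correct no matter what the
    ; high bits held, so no masking is needed before the add
    %sum = add i32 %a, %b
    ret i32 %sum
  }

On a general-purpose target the truncs in @eq17 legalize to a mask or extend of each operand before a full-width compare, which is where the per-comparison cost shows up; @add17 is a single 32-bit add.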

>> 2- Flexibility of implementation: Requiring zeroing is a more constraining decision, which would force the implementation to set these bits.  By leaving it unspecified, the implementation is free to zero them out if it feels that is worthwhile.
>
> This is a trade-off.  Extending constrains the implementation of operations that produce noise in the high bits.  Not extending constrains the implementation of operations that are affected by noise in the high bits.  I'm willing to believe that the trade-off favors leaving the bits undefined, but that's why I'm asking: to see if you've actually evaluated this trade-off, because it kind of sounds like you've evaluated one side of it.

Perhaps I'm missing something... the intent in the ABI/standard was to leave this unconstrained so that an implementation is free to zero the bits out if it chooses.  We DID consider zeroing the padding bits internally; on our hardware (see the reasoning below) there was no performance advantage.

>> I'll note that our backends choose NOT to zero them out when not necessary, since (so I'm told) 'masked' compares are trivial in most processors.
>
> They're trivial to implement in custom hardware, of course, but what existing ISAs actually provide masked compare instructions?  Is this a common feature I'm simply totally unaware of?  In practice I think this will be 1-2 extra instructions in every comparison.

Intel's and AMD's x86-64 implementations both do masked compares, at least in microcode, from what I was told.  I'll have to check whether it was @craig.topper or @andrew.w.kaylor (perhaps someone else?) who said we'd observed similar behavior on ARM.

>> 3- Implementability on FPGAs: Since this was our motivating example, forcing an FPGA to zero out these bits when interacting with a byte-aligned processor would have incredible performance overhead.
>
> How on earth does making the store unit zero/sign-extend have "incredible performance overhead"?  This is totally trivial technologically.  It's not like you're otherwise sending 17-bit stores out on the bus.

My understanding is that it IS in fact as if you're putting them out on a bus: extending these bits when pushing values from our FPGA core to our x86 core ends up requiring more gates, longer reads during transitions between processors, and longer 'interconnects'.  The bus between the x86_64 cores and the FPGA cores is of limited width, so having the FPGA pack the values is highly beneficial to throughput.  That said, my understanding here is limited to the ELI5 explanation from our former FPGA architect.

> I'm not sure it's appropriate to think of this as primarily an FPGA feature when in fact it's being added to standard targets.

I don't believe I _AM_ thinking of this primarily as an FPGA feature; it is the motivating example for one of our architectures.  Just as standards shouldn't unfairly punish 32-bit devices, they shouldn't punish FPGAs.

>> 4- Ease of implementation: Forcing LLVM to zero out these bits would mean either doing quite a bit of work in our CodeGen to zero them out, or modifying most of the backends to not zero padding bits in these cases. Since there isn't a particular performance benefit (see #2), we didn't think it would be worthwhile.
>
> The obvious lowering would be for clang to use i17 as the scalar type lowering but i32 as the "in-memory" lowering, then make sure that the backends are reasonably intelligent about folding extend/trunc operations around operations that aren't sensitive / don't produce noise.

I guess we could?  That sounds like more work for opt for what seems to me to be little gain.
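For concreteness, here is my reading of that lowering as a hand-written sketch (illustrative only, not what clang emits today):

  define void @incr17(i32* %p) {
    %word = load i32, i32* %p       ; "in-memory" type: i32
    %val  = trunc i32 %word to i17  ; scalar type: i17
    %inc  = add i17 %val, 1
    %ext  = sext i17 %inc to i32    ; re-extend before storing
    store i32 %ext, i32* %p
    ret void
  }

The work you describe would then be teaching the backends to fold the trunc/sext traffic away wherever an operation neither reads nor produces noise in the high bits.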


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108643/new/

https://reviews.llvm.org/D108643


