[PATCH] D59668: [ValueTracking] Take signedness into account for and/or ranges

Sat Mar 23 06:57:22 PDT 2019

nikic added a comment.

In D59668#1440521 <https://reviews.llvm.org/D59668#1440521>, @lebedev.ri wrote:

> Ok, i have to ask.
>  Is there some alternative way that does not require passing `ForSigned` all the way?
>
> - Operate on a single range, and reinterpret it somehow when we know the sign matters?
> - Always track two ranges, and use the one we need?

Good question! Let me start with what we can't do: It's not possible to have a single range and reinterpret it for signed and unsigned purposes. Taking the example of known bits `0b..00`, we can either interpret this is an unsigned range `[0b0000, 0b1100]` or a signed range `[0b1000, 0b0100]`. Here is a drawing of the unsigned (u) and signed (s) ranges, as well as the values (x) which are actually possible:

  0b0000  u s x
  0b0001  u s
  0b0010  u s
  0b0011  u s
  0b0100  u s x
  0b0101  u
  0b0110  u
  0b0111  u
  0b1000  u s x
  0b1001  u s
  0b1010  u s
  0b1011  u s
  0b1100  u s x
  0b1101    s
  0b1110    s
  0b1111    s

As you can see, both ranges cover all the x's, but in different ways: One without wrapping the unsigned domain, the other without wrapping the signed domain. Once we've arrived at this representation, we can no longer switch between them, because we have lost the information that the sign bit can be arbitrary.

There are a couple of different ways we can deal with this:

1. As you have already suggested, we can store both signed and unsigned ranges and just use whichever will be more convenient in the end. This would work, but it has rather high costs. We'd have to store two ranges everywhere and perform all calculations on two ranges. In many cases both ranges would end up being the same -- while you can interpret a known bits result without fixed sign bit in two ways, there's no two ways to interpret a urem result.

2. Pass a preferred signedness flag, which is what is done here. This is already common practice in some other parts of the codebase that deal with ranges, e.g. SCEV passes a `SignHint` through the range-based functionality. A flag allows us to choose an appropriate range when beneficial, but all calculations are still performed on only one range. (As a sidenote: While SCEV just passes this flag through, for recursive operations you don't necessarily want to do that. If you are computing a umax range for a signed context, you will still want to compute the ranges of the umax operands as unsigned. This is something that has to be decided per-operation, there is no general signedness context for the whole calculation.)

3. Some kind of hybrid representation that is not based on pure ranges. For example, we could store that the first N bits are undetermined and only store the range for the lower BW-N bits. If N>0 this would allow recovering both the signed and unsigned ranges for the case above. I'm not sure how general this would be, as it wouldn't be able to handle cases where the signed and unsigned ranges are not the same after truncation. Of course, this would also require rewriting the entire ConstantRange machinery from scratch :)

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59668/new/

https://reviews.llvm.org/D59668