[llvm] [InstSimplify] Consider vscale_range for get active lane mask (PR #160073)

Tue Sep 23 05:11:31 PDT 2025

================
@@ -6474,10 +6474,27 @@ Value *llvm::simplifyBinaryIntrinsic(Intrinsic::ID IID, Type *ReturnType,
                                      const CallBase *Call) {
   unsigned BitWidth = ReturnType->getScalarSizeInBits();
   switch (IID) {
-  case Intrinsic::get_active_lane_mask:
+  case Intrinsic::get_active_lane_mask: {
     if (match(Op1, m_Zero()))
       return ConstantInt::getFalse(ReturnType);
+
+    const Function *F = Call->getFunction();
+    auto ScalableTy = dyn_cast<ScalableVectorType>(ReturnType);
+    if (ScalableTy && F->hasFnAttribute(Attribute::VScaleRange)) {
+      Attribute Attr = F->getFnAttribute(Attribute::VScaleRange);
+      std::optional<unsigned> VScaleMax = Attr.getVScaleRangeMax();
+      if (!VScaleMax)
+        break;
+      unsigned MaxPossibleMaskElements =
+          ScalableTy->getMinNumElements() * (*VScaleMax);
----------------
david-arm wrote:

I'm not an expert in C semantics, but I don't think this is enough to guarantee it won't overflow. If I compile C code like this:

```
unsigned long foo(unsigned a, unsigned b) {
  return a * b;
}
```

to LLVM IR then it looks like this:

```
define dso_local range(i64 0, 4294967296) i64 @foo(i32 noundef %a, i32 noundef %b) {
entry:
  %mul = mul i32 %b, %a
  %conv = zext i32 %mul to i64
  ret i64 %conv
}
```

i.e. the multiplication is still performed in 32 bits, and only the already-wrapped result is zero-extended to 64 bits. I think you need to do something like:

```
  uint64_t MaxPossibleMaskElements = ScalableTy->getMinNumElements();
  MaxPossibleMaskElements *= *VScaleMax;
```
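
To make the difference concrete, here is a minimal standalone sketch. It is not code from the patch: the helper names `mulThenWiden` and `widenThenMul` are hypothetical, and `uint32_t` stands in for `unsigned` on the assumption that `unsigned` is 32 bits wide.

```cpp
#include <cstdint>

// Hypothetical helper: both operands are 32-bit, so the product is computed
// modulo 2^32 and only afterwards converted to 64 bits.
constexpr uint64_t mulThenWiden(uint32_t A, uint32_t B) {
  return A * B;
}

// Hypothetical helper: widen one operand first, so the compound assignment
// performs the multiplication in 64 bits.
constexpr uint64_t widenThenMul(uint32_t A, uint32_t B) {
  uint64_t Result = A;
  Result *= B;
  return Result;
}
```

With both operands equal to `0x10000`, the first form wraps around to 0, while the second yields `0x100000000`.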


https://github.com/llvm/llvm-project/pull/160073