[llvm] 01e1f32 - [ValueTracking][SimplifyLibCalls] Fix bug in getConstantDataArrayInfo for wchar_t

Bjorn Pettersson via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 7 06:29:53 PDT 2022


Author: Bjorn Pettersson
Date: 2022-10-07T15:29:32+02:00
New Revision: 01e1f3297151231fbd73705a073f42f2c453855d

URL: https://github.com/llvm/llvm-project/commit/01e1f3297151231fbd73705a073f42f2c453855d
DIFF: https://github.com/llvm/llvm-project/commit/01e1f3297151231fbd73705a073f42f2c453855d.diff

LOG: [ValueTracking][SimplifyLibCalls] Fix bug in getConstantDataArrayInfo for wchar_t

When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen)
it uses ValueTracking helpers with a CharSize/ElementSize that isn't
8, but rather 16 or 32 (matching the size in bits of a wchar_t).

The problem I've seen is that llvm::getConstantDataArrayInfo takes
both an "ElementSize" argument (basically the size of a char/element
in bits) and an "Offset" which, as far as I can tell, is measured in
"number of elements". It then also uses
stripAndAccumulateConstantOffsets to get a "StartIdx", which as far
as I can tell is calculated in bytes. The returned Slice.Length is
based on arithmetic that adds/subtracts variables with different
units (bytes vs elements). Most notably, I think "StartIdx" must be
scaled using "ElementSize" to get correct results.

The symptom of the above problem was seen in the wcslen-1.ll test
case, which was miscompiled.

This patch is supposed to resolve the bug by converting between
bytes and elements where needed, bailing out when a byte offset is
not a multiple of the element size.

Differential Revision: https://reviews.llvm.org/D135263

Added: 
    

Modified: 
    llvm/lib/Analysis/ValueTracking.cpp
    llvm/test/Transforms/InstCombine/wcslen-1.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index e99f62c9660e..685910c94b5f 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -4232,10 +4232,14 @@ bool llvm::isGEPBasedOnPointerToString(const GEPOperator *GEP,
 // its initializer if the size of its elements equals ElementSize, or,
 // for ElementSize == 8, to its representation as an array of unsiged
 // char. Return true on success.
+// Offset is in the unit "nr of ElementSize sized elements".
 bool llvm::getConstantDataArrayInfo(const Value *V,
                                     ConstantDataArraySlice &Slice,
                                     unsigned ElementSize, uint64_t Offset) {
-  assert(V);
+  assert(V && "V should not be null.");
+  assert((ElementSize % 8) == 0 &&
+         "ElementSize expected to be a multiple of the size of a byte.");
+  unsigned ElementSizeInBytes = ElementSize / 8;
 
   // Drill down into the pointer expression V, ignoring any intervening
   // casts, and determine the identity of the object it references along
@@ -4259,15 +4263,19 @@ bool llvm::getConstantDataArrayInfo(const Value *V,
     // Fail if the constant offset is excessive.
     return false;
 
-  Offset += StartIdx;
+  // Off/StartIdx is in the unit of bytes. So we need to convert to number of
+  // elements. Simply bail out if that isn't possible.
+  if ((StartIdx % ElementSizeInBytes) != 0)
+    return false;
 
+  Offset += StartIdx / ElementSizeInBytes;
   ConstantDataArray *Array = nullptr;
   ArrayType *ArrayTy = nullptr;
 
   if (GV->getInitializer()->isNullValue()) {
     Type *GVTy = GV->getValueType();
     uint64_t SizeInBytes = DL.getTypeStoreSize(GVTy).getFixedSize();
-    uint64_t Length = SizeInBytes / (ElementSize / 8);
+    uint64_t Length = SizeInBytes / ElementSizeInBytes;
 
     Slice.Array = nullptr;
     Slice.Offset = 0;

diff  --git a/llvm/test/Transforms/InstCombine/wcslen-1.ll b/llvm/test/Transforms/InstCombine/wcslen-1.ll
index a18798970b1f..6f6d494db2d7 100644
--- a/llvm/test/Transforms/InstCombine/wcslen-1.ll
+++ b/llvm/test/Transforms/InstCombine/wcslen-1.ll
@@ -217,10 +217,9 @@ define i64 @test_simplify12() {
 @ws = constant [10 x i32] [i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0]
 
 ; Fold wcslen(ws + 2) => 7.
-; FIXME: This fold is faulty, result should be 7 not 1.
 define i64 @fold_wcslen_1() {
 ; CHECK-LABEL: @fold_wcslen_1(
-; CHECK-NEXT:    ret i64 1
+; CHECK-NEXT:    ret i64 7
 ;
   %p = getelementptr inbounds [10 x i32], ptr @ws, i64 0, i64 2
   %len = tail call i64 @wcslen(ptr %p)
@@ -229,11 +228,11 @@ define i64 @fold_wcslen_1() {
 
 ; Should not crash on this, and no optimization expected (idea is to get into
 ; llvm::getConstantDataArrayInfo looking for an array with 32-bit elements but
-; with an offset that isn't a multiple of the element size).  FIXME: Looks a
-; bit weird. Don't think we should return 6 here.
+; with an offset that isn't a multiple of the element size).
 define i64 @no_fold_wcslen_1() {
 ; CHECK-LABEL: @no_fold_wcslen_1(
-; CHECK-NEXT:    ret i64 6
+; CHECK-NEXT:    %len = tail call i64 @wcslen(ptr nonnull getelementptr inbounds ([15 x i8], ptr @ws, i64 0, i64 3))
+; CHECK-NEXT:    ret i64 %len
 ;
   %p = getelementptr [15 x i8], ptr @ws, i64 0, i64 3
   %len = tail call i64 @wcslen(ptr %p)


        

