[llvm] 01e1f32 - [ValueTracking][SimplifyLibCalls] Fix bug in getConstantDataArrayInfo for wchar_t
Bjorn Pettersson via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 7 06:29:53 PDT 2022
Author: Bjorn Pettersson
Date: 2022-10-07T15:29:32+02:00
New Revision: 01e1f3297151231fbd73705a073f42f2c453855d
URL: https://github.com/llvm/llvm-project/commit/01e1f3297151231fbd73705a073f42f2c453855d
DIFF: https://github.com/llvm/llvm-project/commit/01e1f3297151231fbd73705a073f42f2c453855d.diff
LOG: [ValueTracking][SimplifyLibCalls] Fix bug in getConstantDataArrayInfo for wchar_t
When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen)
it uses ValueTracking helpers with a CharSize/ElementSize that isn't
8, but rather 16 or 32 (to match with the size in bits of a wchar_t).
The problem I've seen is that llvm::getConstantDataArrayInfo takes
both an "ElementSize" argument (the size of a char/element in bits)
and an "Offset" which, as far as I can tell, is expressed in the
unit "number of elements". It then also uses
stripAndAccumulateConstantOffsets to get a "StartIdx" which, as far
as I can tell, is calculated in bytes. The returned Slice.Length is
based on arithmetic that adds and subtracts variables with
different units (bytes vs elements). Most notably, I think the
"StartIdx" must be scaled using the "ElementSize" to get correct
results.
The symptom of the above problem was seen in the wcslen-1.ll test
case, which was miscompiled.
This patch is supposed to resolve the bug by converting between
bytes and elements when needed.
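As a rough illustration of the unit mismatch described above, here is a
minimal standalone sketch. It is not the LLVM implementation: the names
byteOffsetToElementIndex, toyWcslen, foldWithFix and foldWithoutFix are
hypothetical, and a std::vector stands in for the constant initializer.
It only models the arithmetic: a byte offset must be divided by the
element size in bytes before it can be used as an element index, and a
misaligned offset means no fold.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the unit conversion added by the patch.
// Converts a byte offset into an element index, or returns std::nullopt
// when the byte offset does not land on an element boundary.
std::optional<uint64_t> byteOffsetToElementIndex(uint64_t StartIdxInBytes,
                                                 unsigned ElementSizeInBits) {
  assert(ElementSizeInBits != 0 && ElementSizeInBits % 8 == 0 &&
         "element size expected to be a whole number of bytes");
  uint64_t ElementSizeInBytes = ElementSizeInBits / 8;
  if (StartIdxInBytes % ElementSizeInBytes != 0)
    return std::nullopt; // Misaligned offset: bail out, no fold.
  return StartIdxInBytes / ElementSizeInBytes;
}

// Toy wcslen over a constant array: counts elements from StartElement up
// to the first zero. Returns std::nullopt if no terminator is found.
std::optional<uint64_t> toyWcslen(const std::vector<uint32_t> &Array,
                                  uint64_t StartElement) {
  for (uint64_t I = StartElement; I < Array.size(); ++I)
    if (Array[I] == 0)
      return I - StartElement;
  return std::nullopt;
}

// Models the fixed arithmetic: convert bytes to elements before indexing.
std::optional<uint64_t> foldWithFix(const std::vector<uint32_t> &Array,
                                    uint64_t ByteOffset) {
  std::optional<uint64_t> Elem = byteOffsetToElementIndex(ByteOffset, 32);
  if (!Elem)
    return std::nullopt;
  return toyWcslen(Array, *Elem);
}

// Models the old arithmetic: the byte offset is used as an element index.
std::optional<uint64_t> foldWithoutFix(const std::vector<uint32_t> &Array,
                                       uint64_t ByteOffset) {
  return toyWcslen(Array, ByteOffset);
}
```

With the @ws array from wcslen-1.ll (9, 8, ..., 1, 0), a byte offset of
8 (element 2) gives 7 with the conversion and 1 without it, and a byte
offset of 3 gives no fold with the conversion and 6 without it. These
match the old and new CHECK lines in the test diff.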
Differential Revision: https://reviews.llvm.org/D135263
Added:
Modified:
llvm/lib/Analysis/ValueTracking.cpp
llvm/test/Transforms/InstCombine/wcslen-1.ll
Removed:
################################################################################
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index e99f62c9660e..685910c94b5f 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -4232,10 +4232,14 @@ bool llvm::isGEPBasedOnPointerToString(const GEPOperator *GEP,
// its initializer if the size of its elements equals ElementSize, or,
// for ElementSize == 8, to its representation as an array of unsiged
// char. Return true on success.
+// Offset is in the unit "nr of ElementSize sized elements".
bool llvm::getConstantDataArrayInfo(const Value *V,
ConstantDataArraySlice &Slice,
unsigned ElementSize, uint64_t Offset) {
- assert(V);
+ assert(V && "V should not be null.");
+ assert((ElementSize % 8) == 0 &&
+ "ElementSize expected to be a multiple of the size of a byte.");
+ unsigned ElementSizeInBytes = ElementSize / 8;
// Drill down into the pointer expression V, ignoring any intervening
// casts, and determine the identity of the object it references along
@@ -4259,15 +4263,19 @@ bool llvm::getConstantDataArrayInfo(const Value *V,
// Fail if the constant offset is excessive.
return false;
- Offset += StartIdx;
+ // Off/StartIdx is in the unit of bytes. So we need to convert to number of
+ // elements. Simply bail out if that isn't possible.
+ if ((StartIdx % ElementSizeInBytes) != 0)
+ return false;
+ Offset += StartIdx / ElementSizeInBytes;
ConstantDataArray *Array = nullptr;
ArrayType *ArrayTy = nullptr;
if (GV->getInitializer()->isNullValue()) {
Type *GVTy = GV->getValueType();
uint64_t SizeInBytes = DL.getTypeStoreSize(GVTy).getFixedSize();
- uint64_t Length = SizeInBytes / (ElementSize / 8);
+ uint64_t Length = SizeInBytes / ElementSizeInBytes;
Slice.Array = nullptr;
Slice.Offset = 0;
diff --git a/llvm/test/Transforms/InstCombine/wcslen-1.ll b/llvm/test/Transforms/InstCombine/wcslen-1.ll
index a18798970b1f..6f6d494db2d7 100644
--- a/llvm/test/Transforms/InstCombine/wcslen-1.ll
+++ b/llvm/test/Transforms/InstCombine/wcslen-1.ll
@@ -217,10 +217,9 @@ define i64 @test_simplify12() {
@ws = constant [10 x i32] [i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0]
; Fold wcslen(ws + 2) => 7.
-; FIXME: This fold is faulty, result should be 7 not 1.
define i64 @fold_wcslen_1() {
; CHECK-LABEL: @fold_wcslen_1(
-; CHECK-NEXT: ret i64 1
+; CHECK-NEXT: ret i64 7
;
%p = getelementptr inbounds [10 x i32], ptr @ws, i64 0, i64 2
%len = tail call i64 @wcslen(ptr %p)
@@ -229,11 +228,11 @@ define i64 @fold_wcslen_1() {
; Should not crash on this, and no optimization expected (idea is to get into
; llvm::getConstantDataArrayInfo looking for an array with 32-bit elements but
-; with an offset that isn't a multiple of the element size). FIXME: Looks a
-; bit weird. Don't think we should return 6 here.
+; with an offset that isn't a multiple of the element size).
define i64 @no_fold_wcslen_1() {
; CHECK-LABEL: @no_fold_wcslen_1(
-; CHECK-NEXT: ret i64 6
+; CHECK-NEXT: %len = tail call i64 @wcslen(ptr nonnull getelementptr inbounds ([15 x i8], ptr @ws, i64 0, i64 3))
+; CHECK-NEXT: ret i64 %len
;
%p = getelementptr [15 x i8], ptr @ws, i64 0, i64 3
%len = tail call i64 @wcslen(ptr %p)