[llvm] r194220 - IR: Do not canonicalize constant GEPs into an out-of-bounds array access

Fri Aug 21 10:38:55 PDT 2015

Dandy - 245712

Thanks!

On Fri, Aug 21, 2015 at 12:32 AM, David Majnemer <david.majnemer at gmail.com>
wrote:

> I believe that it is impossible to hit the case, we can just return true
> for pointer types.
>
> On Thu, Aug 20, 2015 at 10:39 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
>
>> (adding the new mailing list...
>>
>> ---------- Forwarded message ----------
>> From: David Blaikie <dblaikie at gmail.com>
>> Date: Thu, Aug 20, 2015 at 10:35 PM
>> Subject: Re: [llvm] r194220 - IR: Do not canonicalize constant GEPs into
>> an out-of-bounds array access
>> To: David Majnemer <david.majnemer at gmail.com>
>> Cc: "llvm-commits at cs.uiuc.edu" <llvm-commits at cs.uiuc.edu>
>>
>>
>>
>>
>> On Thu, Nov 7, 2013 at 2:15 PM, David Majnemer <david.majnemer at gmail.com>
>> wrote:
>>
>>> Author: majnemer
>>> Date: Thu Nov  7 16:15:53 2013
>>> New Revision: 194220
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=194220&view=rev
>>> Log:
>>> IR: Do not canonicalize constant GEPs into an out-of-bounds array access
>>>
>>> Summary:
>>> Consider a GEP of:
>>> i8* getelementptr ({ [2 x i8], i32, i8, [3 x i8] }* @main.c, i32 0, i32
>>> 0, i64 0)
>>>
>>> If we proceeded to GEP the aforementioned object by 8, would form a GEP
>>> of:
>>> i8* getelementptr ({ [2 x i8], i32, i8, [3 x i8] }* @main.c, i32 0, i32
>>> 0, i64 8)
>>>
>>> Note that we would go through the first array member, causing an
>>> out-of-bounds accesses.  This is problematic because we might get fooled
>>> if we are trying to evaluate loads using this GEP, for example, based
>>> off of an object with a constant initializer where the array is zero.
>>>
>>> This fixes PR17732.
>>>
>>> Reviewers: nicholas, chandlerc, void
>>>
>>> Reviewed By: void
>>>
>>> CC: llvm-commits, echristo, void, aemerson
>>>
>>> Differential Revision: http://llvm-reviews.chandlerc.com/D2093
>>>
>>> Added:
>>>     llvm/trunk/test/Transforms/GVN/pr17732.ll
>>> Modified:
>>>     llvm/trunk/lib/IR/ConstantFold.cpp
>>>
>>> Modified: llvm/trunk/lib/IR/ConstantFold.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/IR/ConstantFold.cpp?rev=194220&r1=194219&r2=194220&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/IR/ConstantFold.cpp (original)
>>> +++ llvm/trunk/lib/IR/ConstantFold.cpp Thu Nov  7 16:15:53 2013
>>> @@ -1940,7 +1940,43 @@ static Constant *ConstantFoldGetElementP
>>>             I != E; ++I)
>>>          LastTy = *I;
>>>
>>> -      if ((LastTy && isa<SequentialType>(LastTy)) ||
>>> Idx0->isNullValue()) {
>>> +      // We cannot combine indices if doing so would take us outside of
>>> an
>>> +      // array or vector.  Doing otherwise could trick us if we
>>> evaluated such a
>>> +      // GEP as part of a load.
>>> +      //
>>> +      // e.g. Consider if the original GEP was:
>>> +      // i8* getelementptr ({ [2 x i8], i32, i8, [3 x i8] }* @main.c,
>>> +      //                    i32 0, i32 0, i64 0)
>>> +      //
>>> +      // If we then tried to offset it by '8' to get to the third
>>> element,
>>> +      // an i8, we should *not* get:
>>> +      // i8* getelementptr ({ [2 x i8], i32, i8, [3 x i8] }* @main.c,
>>> +      //                    i32 0, i32 0, i64 8)
>>> +      //
>>> +      // This GEP tries to index array element '8  which runs
>>> out-of-bounds.
>>> +      // Subsequent evaluation would get confused and produce erroneous
>>> results.
>>> +      //
>>> +      // The following prohibits such a GEP from being formed by
>>> checking to see
>>> +      // if the index is in-range with respect to an array or vector.
>>> +      bool IsSequentialAccessInRange = false;
>>> +      if (LastTy && isa<SequentialType>(LastTy)) {
>>> +        uint64_t NumElements = 0;
>>> +        if (ArrayType *ATy = dyn_cast<ArrayType>(LastTy))
>>> +          NumElements = ATy->getNumElements();
>>> +        else if (VectorType *VTy = dyn_cast<VectorType>(LastTy))
>>> +          NumElements = VTy->getNumElements();
>>> +
>>> +        if (ConstantInt *CI = dyn_cast<ConstantInt>(Idx0)) {
>>> +          int64_t Idx0Val = CI->getSExtValue();
>>> +          if (NumElements > 0 && Idx0Val >= 0 &&
>>> +              (uint64_t)Idx0Val < NumElements)
>>> +            IsSequentialAccessInRange = true;
>>> +        } else if (PointerType *PTy = dyn_cast<PointerType>(LastTy))
>>> +          // Only handle pointers to sized types, not pointers to
>>> functions.
>>> +          if (PTy->getElementType()->isSized())
>>>
>>
>> While going about my business removing access to pointee types, I came
>> across this code. Removing this conditional ^ doesn't result in any tests
>> failing. Any idea if it's necessary/useful and could/should be tested? (&
>> if it's necessary, how to implement it in a world involving opaque pointer
>> types)
>>
>>
>>> +            IsSequentialAccessInRange = true;
>>> +      }
>>> +      if (IsSequentialAccessInRange || Idx0->isNullValue()) {
>>>          SmallVector<Value*, 16> NewIndices;
>>>          NewIndices.reserve(Idxs.size() + CE->getNumOperands());
>>>          for (unsigned i = 1, e = CE->getNumOperands()-1; i != e; ++i)
>>>
>>> Added: llvm/trunk/test/Transforms/GVN/pr17732.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/GVN/pr17732.ll?rev=194220&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/GVN/pr17732.ll (added)
>>> +++ llvm/trunk/test/Transforms/GVN/pr17732.ll Thu Nov  7 16:15:53 2013
>>> @@ -0,0 +1,30 @@
>>> +; RUN: opt -gvn -S -o - < %s | FileCheck %s
>>> +
>>> +target datalayout =
>>> "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>>> +target triple = "x86_64-unknown-linux-gnu"
>>> +
>>> +%struct.with_array = type { [2 x i8], i32, i8 }
>>> +%struct.with_vector = type { <2 x i8>, i32, i8 }
>>> +
>>> + at main.obj_with_array = private unnamed_addr constant { [2 x i8], i32,
>>> i8, [3 x i8] } { [2 x i8] zeroinitializer, i32 0, i8 1, [3 x i8] undef },
>>> align 4
>>> + at array_with_zeroinit = common global %struct.with_array
>>> zeroinitializer, align 4
>>> +
>>> + at main.obj_with_vector = private unnamed_addr constant { <2 x i8>, i32,
>>> i8, [3 x i8] } { <2 x i8> zeroinitializer, i32 0, i8 1, [3 x i8] undef },
>>> align 4
>>> + at vector_with_zeroinit = common global %struct.with_vector
>>> zeroinitializer, align 4
>>> +
>>> +define i32 @main() {
>>> +entry:
>>> +  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* getelementptr inbounds
>>> (%struct.with_array* @array_with_zeroinit, i64 0, i32 0, i64 0), i8*
>>> getelementptr inbounds ({ [2 x i8], i32, i8, [3 x i8] }*
>>> @main.obj_with_array, i64 0, i32 0, i64 0), i64 12, i32 4, i1 false)
>>> +  %0 = load i8* getelementptr inbounds (%struct.with_array*
>>> @array_with_zeroinit, i64 0, i32 2), align 4
>>> +
>>> +  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* getelementptr inbounds
>>> (%struct.with_vector* @vector_with_zeroinit, i64 0, i32 0, i64 0), i8*
>>> getelementptr inbounds ({ <2 x i8>, i32, i8, [3 x i8] }*
>>> @main.obj_with_vector, i64 0, i32 0, i64 0), i64 12, i32 4, i1 false)
>>> +  %1 = load i8* getelementptr inbounds (%struct.with_vector*
>>> @vector_with_zeroinit, i64 0, i32 2), align 4
>>> +  %conv0 = sext i8 %0 to i32
>>> +  %conv1 = sext i8 %1 to i32
>>> +  %and = and i32 %conv0, %conv1
>>> +  ret i32 %and
>>> +; CHECK-LABEL: define i32 @main(
>>> +; CHECK: ret i32 1
>>> +}
>>> +
>>> +declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture
>>> readonly, i64, i32, i1)
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150821/61ea83bf/attachment.html>