[llvm-commits] [PATCH] Stack Coloring optimization

Tue Sep 4 23:40:38 PDT 2012

+samsonov at google.com

On Wed, Sep 5, 2012 at 1:12 AM, Chandler Carruth <chandlerc at google.com>wrote:

> On Tue, Sep 4, 2012 at 4:41 PM, Chandler Carruth <chandlerc at google.com>wrote:
>
>> Hey Nadav, super excited about this.
>>
>> Recently, we tried to roll out some significant improvements to GCC's
>> stack slot re-use, and discovered a really terrifying number of bugs in
>> user code stemming from this. I would like to make sure that when using
>> tools like ASan, this logic is disabled, and ASan can correctly detect code
>> using an alloca after its lifetime has ended. Without this, it will be very
>> hard to track down mis-compiles introduced here. Adding Kostya as I'm not
>> familiar with the exact current state of ASan and lifetime of alloca
>> instrumentation.
>>
>
> Having read the discussion more carefully when we hit lots of user bugs
> surrounding this, I have more information.
>
> The problem stems from re-use of locals after their lifetime ends. These
> won't show up often until we teach Clang to emit the lifetime intrinsics
> for those locals. Once we do, your pass will cause a very large amount of
> code to blow up. It's a good thing too, as that code is *broken*.
>
> We need to teach ASan to use the lifetime markers to catch
> use-after-lifetime-ends bugs first IMO, and *then* enable this pass and the
> lifetime markers coming from Clang. That way we will already have the tools
> to find these bugs quickly and accurately.
>

This is in our TODO, although, unfortunately, I am not sure when we will
get to it.
Perhaps in October.
http://code.google.com/p/address-sanitizer/issues/detail?id=83&

>
> We probably will also need to disable this particular optimization for
> functions that have the ASan attribute set.
>

Hm. This is happening in the CodGen, i.e. after asan, right?
asan merges all Alloca instructions into a single large Alloca, so under
asan this optimization will be automatically disabled anyway. No?

--kcc

>
>
>>
>>
>> Also, some code comments inline:
>>
>> Index: test/CodeGen/X86/StackColoring.ll
>> ===================================================================
>> --- test/CodeGen/X86/StackColoring.ll (revision 0)
>> +++ test/CodeGen/X86/StackColoring.ll (revision 0)
>> @@ -0,0 +1,132 @@
>> +; RUN: llc < %s                    | FileCheck %s --check-prefix=COLOR
>> +; RUN: llc < %s -no-stack-coloring | FileCheck %s --check-prefix=NOCOLOR
>> +
>> +target datalayout =
>> "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>> +target triple = "x86_64-apple-macosx10.8.0"
>> +
>> +;COLOR: subq  $136, %rsp
>> +;COLOR: subq  $144, %rsp
>> +;COLOR: subq  $208, %rsp
>> +;COLOR: subq  $112, %rsp
>> +
>> +;NOCOLOR: subq  $264, %rsp
>> +;NOCOLOR: subq  $272, %rsp
>> +;NOCOLOR: subq  $400, %rsp
>> +;NOCOLOR: subq  $400, %rsp
>>
>> Can you put the filecheck assertions inside the functions in question,
>> and add anchoring assertions for the function name? That makes the test
>> easier for me to read, debug, and update.
>>
>> Index: include/llvm/CodeGen/ISDOpcodes.h
>> ===================================================================
>> --- include/llvm/CodeGen/ISDOpcodes.h (revision 162761)
>> +++ include/llvm/CodeGen/ISDOpcodes.h (working copy)
>> @@ -637,6 +637,10 @@
>>      ATOMIC_LOAD_UMIN,
>>      ATOMIC_LOAD_UMAX,
>>
>> +    /// This corresponds to the LIFETIME intrinsics. The first operand
>>
>> I would write this ".. to the @llvm.lifetime intrinsics. ..." or
>> otherwise differ from the style of the ISD node enumerators...
>>
>> +    /// is the chain and the second operand is the alloca pointer.
>> +    LIFETIME_START, LIFETIME_END,
>> +
>>
>>
>> Index: lib/CodeGen/StackColoring.cpp
>> ===================================================================
>> --- lib/CodeGen/StackColoring.cpp (revision 0)
>> +++ lib/CodeGen/StackColoring.cpp (revision 0)
>> @@ -0,0 +1,593 @@
>> +//===-- StackColoring.cpp --------------------------------------------
>> ----===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>>
>> +//===----------------------------------------------------------------------===//
>> +//
>> +// This pass implements the stack-coloring optimization that looks for
>> +// lifetime markers machine instructions (LIFESTART_BEGIN and
>> LIFESTART_END),
>> +// which represent the possible lifetime of stack slots. It attempts to
>> +// merge disjoint stack slots and reduce the used stack space.
>> +// NOTE: This pass is not StackSlotColoring, which optimizes spill slots.
>>
>> Can you add some notes here about the future work? Specifically as it
>> relates to the other pass.
>>
>> Also, would it be possible to rename one or the other pass to make the
>> differences between them more obvious to the reader, and not require this
>> note? (clearly as a follow-up or preparatory commit)
>>
>> +namespace {
>> +class StackColoring : public MachineFunctionPass {
>>
>> Doxygen comments here and throughout please.
>>
>> +  MachineFrameInfo *MFI;
>> +  MachineFunction *MF;
>> +
>> +  typedef DenseMap<MachineBasicBlock*, BitVector> SlotsPerBlock;
>> +
>> +  // Holds a list of active slots per basic block.
>> +  SlotsPerBlock BEGIN;
>> +  SlotsPerBlock END;
>> +  SlotsPerBlock LIVE_IN;
>> +  SlotsPerBlock LIVE_OUT;
>>
>> These should definitely have very explanatory comments. Also, as this is
>> new code, please follow the coding convention for naming and make these
>> CamelCase.
>>
>> I'm surprised at having four separate dense maps here. What's the
>> rationale behind that? My (admittedly naive, so your explanation may be all
>> that I need) base line assumption would be that this would either call for
>> a single map from the block pointer to a struct of four BitVectors, or that
>> a better solution would be to put both a flag (indicating begin, end,
>> live_in, and live_out) and the block pointer in the key and have a single
>> map. The former seems most appropriate if the expectation is that
>> essentially every block will have all four of these, and that they are all
>> four accessed equally frequently. The latter would seem best if often not
>> all four of these are present for any particular block. What you have only
>> really makes sense to me if we always have all four, but one of these is
>> accessed *much* more often than the others, and thus we want greater
>> locality for that densemap? Anyways, a comment about this would help a lot
>> I suspect, regardless of the final representation.
>>
>> Also, how large are these bitvectors? I worry a bit about how happily
>> DenseMap will copy the values.
>>
>> +
>> +  // Mapping between basic blocks and a serial number.
>> +  DenseMap<MachineBasicBlock*, int> BasicBlocks;
>> +  SmallVector<MachineBasicBlock*, 8> BasicBlockNumbering;
>> +
>> +  // The intervals to describe the active ranges for each slot.
>> +  SmallVector<LiveInterval*, 16> Intervals;
>> +  VNInfo::Allocator VNInfoAllocator;
>> +  SlotIndexes* Indexes;
>> +
>> +  // The list of lifetime markers found.
>> +  SmallVector<MachineInstr*, 8> Markers;
>> +
>> +  struct SlotSizeSorter {
>>
>> It's not at all clear to me how this sorter is used... doxygen comments
>> will likely help here.
>>
>> +    MachineFrameInfo *MFI;
>> +    SlotSizeSorter(MachineFrameInfo *M) : MFI(M) { }
>>
>> Prevailing style uses the exact same name as the member for the
>> initializing parameter.
>>
>> +    bool operator()(int LHS, int RHS) {
>> +      if (LHS == -1) return 0;
>> +      if (RHS == -1) return 1;
>>
>> 0 and 1? How about false and true?
>>
>> Is this still a SWO? Should these just be asserts?
>>
>> +      size_t LSize = MFI->getObjectSize(LHS);
>> +      size_t RSize = MFI->getObjectSize(RHS);
>> +      return LSize > RSize;
>>
>> Why the variables?
>>
>> +  }
>> +};
>>
>> The indent for these two curlies is wrong.
>>
>> +private:
>> +  /// Debug.
>> +  void dump();
>> +
>> +  /// Removes all of the lifetime marker instructions from the function.
>>
>> How do you feel about putting all of the doxygen comments at the
>> implementation site since this isn't part of the interface, and just a
>> private helper function?
>>
>> +  bool RemoveAllMarkers();
>>
>> Please use the new convention (camelCase) for method names throughout.
>>
>> +
>> +  /// Scan the machine function and find all of the lifetime markers.
>> +  /// Record the findings in the BEGIN and END vectors.
>>
>> We don't have BEGIN and END vectors, we have dense maps of bitvectors?
>> Maybe this just needs more clarification in the datastructure represented
>> by those vectors.
>>
>> +  /// returns the number of markers found.
>>
>> Use doxygen \returns?
>>
>> +  unsigned CollectMarkers(unsigned NumSlot);
>> +
>> +  /// Perform the dataflow calculation and calculate the lifetime for
>> each of
>> +  /// the slots, based on the BEGIN/END vectors. Set the LIVE_IN and
>> LIVE_OUT
>> +  /// vectors that represent which stack slots are live coming in and out
>> +  // blocks.
>> +  void CalculateLocalLiveness();
>> +
>> +  /// Construct the LiveIntervals for the slots.
>> +  void CalculateLiveIntervals(unsigned NumSlots);
>> +
>> +  /// Go over the machine function and change instructions which use
>> stack
>> +  /// slots to use the joint slots.
>> +  void RemapInstructions(DenseMap<int, int> &SlotRemap);
>> +
>> +  /// Map entries which point to other entries to their destination.
>> +  ///   A->B->C becomes A->C.
>> +   void ExpungeSlotMap(DenseMap<int, int> &SlotRemap, unsigned NumSlots);
>> +};
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120905/9efdd82e/attachment.html>