[LLVMdev] asan coverage

Kostya Serebryany kcc at google.com
Fri Feb 21 21:13:19 PST 2014


Our users combine asan and coverage testing, and they do it on thousands of
machines.
(An older blog post about using asan:
http://blog.chromium.org/2012/04/fuzzing-for-security.html)
The binaries need to be shipped to virtual machines, where they will be run.
The VMs are *very* short of disk and the network bandwidth has a cost too.
We may be able to ship stripped binaries to those machines, but this would
complicate the logic immensely.

Besides, zipped binaries are stored for several revisions every day, and the
storage also costs money.
Just to give you a taste
(https://commondatastorage.googleapis.com/chromium-browser-asan/index.html):
asan-symbolized-linux-release-252010.zip 2014-02-19 14:34:24 406.35MB
asan-symbolized-linux-release-252017.zip 2014-02-19 18:22:54 406.41MB
asan-symbolized-linux-release-252025.zip 2014-02-19 21:35:49 406.35MB
asan-symbolized-linux-release-252031.zip 2014-02-20 00:44:25 406.35MB
asan-symbolized-linux-release-252160.zip 2014-02-20 06:30:16 406.34MB
asan-symbolized-linux-release-252185.zip 2014-02-20 09:21:47 408.52MB
asan-symbolized-linux-release-252188.zip 2014-02-20 12:20:05 408.52MB
asan-symbolized-linux-release-252194.zip 2014-02-20 15:01:05 408.52MB
asan-symbolized-linux-release-252218.zip 2014-02-20 18:00:42 408.54MB
asan-symbolized-linux-release-252265.zip 2014-02-20 21:00:03 408.65MB
asan-symbolized-linux-release-252272.zip 2014-02-21 00:00:40 408.66MB

--kcc


On Sat, Feb 22, 2014 at 8:58 AM, Bob Wilson <bob.wilson at apple.com> wrote:

> Why is the binary size a concern for coverage testing?
>
> On Feb 21, 2014, at 8:43 PM, Kostya Serebryany <kcc at google.com> wrote:
>
> I understand why you don't want to rely on debug info and instead produce
> your own section.
> We did this with our early version of llvm-based tsan and it was simpler
> to implement.
> But here is a data point to support my suggestion:
> A Chromium binary built with asan, coverage and -gline-tables-only is 1.6 GB.
> The same binary is 1.1 GB when stripped, so the line tables take 500 MB.
> Separate line info for coverage would essentially double this amount.
> The size of the binary is a serious concern for our users; please take it
> into consideration.
>
> Thanks!
> --kcc
>
>
>
> On Fri, Feb 21, 2014 at 8:28 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>
>> We’re not going to use debug info at all. We’re emitting the counters in
>> the clang front-end. We just need to emit separate info to show how to map
>> those counters to source locations. Mapping to PCs and then using debug
>> info to get from the PCs to the source locations just makes things harder
>> and loses information in the process.
>>
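Bob's proposal keys coverage on the front-end counter index rather than on a PC. Below is a minimal C++ sketch of what a separate counter-to-source mapping could look like, assuming exactly one source region per counter. `SourceRegion`, `FunctionCoverageMap` and `lineCounts` are hypothetical names for illustration, not the format Clang emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one counted source region.
struct SourceRegion {
  std::string File;
  unsigned LineStart, ColStart, LineEnd, ColEnd;
};

// Hypothetical per-function mapping emitted alongside the counters:
// Regions[i] is the code whose execution increments counter i.
struct FunctionCoverageMap {
  std::string MangledName;
  std::vector<SourceRegion> Regions;
};

// Sum counter values per (file, first line of region). Only the counter
// index links a count to its region; no PCs and no debug info are consulted.
std::map<std::pair<std::string, unsigned>, uint64_t>
lineCounts(const FunctionCoverageMap &Map,
           const std::vector<uint64_t> &Counters) {
  std::map<std::pair<std::string, unsigned>, uint64_t> Result;
  for (std::size_t I = 0; I < Map.Regions.size() && I < Counters.size(); ++I) {
    const SourceRegion &R = Map.Regions[I];
    Result[{R.File, R.LineStart}] += Counters[I];
  }
  return Result;
}
```

Because the mapping is produced by the front end, a region can describe a sub-line construct (a branch arm, a loop body) directly, which is the information Bob says is lost when going through PCs and debug info.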
>> On Feb 21, 2014, at 2:57 AM, Kostya Serebryany <kcc at google.com> wrote:
>>
>>
>>>
>>> We may need some additional info.
>>
>> What kind of additional info?
>>
>>
>>> I haven't put a ton of thought into
>>> this, but I'm hoping we can either (a) use debug info as is or add some
>>> extra (valid) debug info to support this, or (b) add an extra
>>> debug-info-like section to instrumented binaries with the information we
>>> need.
>>>
>>
>> I'd try this data format (binary equivalent):
>>
>> /path/to/binary/or/dso1 num_counters1
>> pc1 counter1
>> pc2 counter2
>> pc3 counter3
>> ...
>> /path/to/binary/or/dso2 num_counters2
>> pc1 counter1
>> pc2 counter2
>> pc3 counter3
>> ...
>>
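One possible "binary equivalent" of the text format above, sketched in C++ under assumptions the thread does not state: each module record is a 32-bit path length, the path bytes, a 64-bit counter count, then (pc, counter) pairs as 64-bit values in host byte order, with no header or versioning. `ModuleCoverage`, `serialize` and `deserialize` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one "/path/to/dso num_counters" block.
struct ModuleCoverage {
  std::string Path;
  std::vector<std::pair<uint64_t, uint64_t>> PcCounters; // (pc, counter)
};

// Append a value's raw bytes (host byte order; no portability handling).
template <typename T> static void put(std::vector<uint8_t> &Out, T V) {
  uint8_t Buf[sizeof(T)];
  std::memcpy(Buf, &V, sizeof(T));
  Out.insert(Out.end(), Buf, Buf + sizeof(T));
}

// Read a value at Pos; fails instead of reading past the end.
template <typename T>
static bool get(const std::vector<uint8_t> &In, std::size_t &Pos, T &V) {
  if (In.size() - Pos < sizeof(T))
    return false;
  std::memcpy(&V, In.data() + Pos, sizeof(T));
  Pos += sizeof(T);
  return true;
}

std::vector<uint8_t> serialize(const std::vector<ModuleCoverage> &Mods) {
  std::vector<uint8_t> Out;
  for (const ModuleCoverage &M : Mods) {
    put<uint32_t>(Out, static_cast<uint32_t>(M.Path.size()));
    Out.insert(Out.end(), M.Path.begin(), M.Path.end());
    put<uint64_t>(Out, M.PcCounters.size()); // num_counters
    for (const auto &Entry : M.PcCounters) {
      put<uint64_t>(Out, Entry.first);  // pc
      put<uint64_t>(Out, Entry.second); // counter
    }
  }
  return Out;
}

// Returns false on truncated or malformed input.
bool deserialize(const std::vector<uint8_t> &In,
                 std::vector<ModuleCoverage> &Mods) {
  std::size_t Pos = 0;
  while (Pos < In.size()) {
    ModuleCoverage M;
    uint32_t Len;
    if (!get(In, Pos, Len) || In.size() - Pos < Len)
      return false;
    M.Path.assign(reinterpret_cast<const char *>(In.data() + Pos), Len);
    Pos += Len;
    uint64_t N;
    if (!get(In, Pos, N))
      return false;
    for (uint64_t I = 0; I < N; ++I) {
      uint64_t Pc, Count;
      if (!get(In, Pos, Pc) || !get(In, Pos, Count))
        return false;
      M.PcCounters.emplace_back(Pc, Count);
    }
    Mods.push_back(std::move(M));
  }
  return true;
}
```

In this layout each counter costs 16 bytes plus a small per-module header, and mapping a PC back to a source line is left to whatever symbolization the consumer already has.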
>> I don't see a straightforward way to produce such data today because
>> individual Instructions do not work as labels.
>> But I think this can be supported in LLVM codegen.
>> Here is a *raw* patch with comments, just to give the idea.
>>
>>
>> Index: lib/CodeGen/CodeGenPGO.cpp
>> ===================================================================
>> --- lib/CodeGen/CodeGenPGO.cpp  (revision 201843)
>> +++ lib/CodeGen/CodeGenPGO.cpp  (working copy)
>> @@ -199,7 +199,8 @@
>>     llvm::Type *Args[] = {
>>      Int8PtrTy,                       // const char *MangledName
>>      Int32Ty,                         // uint32_t NumCounters
>> -    Int64PtrTy                       // uint64_t *Counters
>> +    Int64PtrTy,                       // uint64_t *Counters
>> +    Int64PtrTy                       // uint64_t *PCs
>>    };
>>    llvm::FunctionType *FTy =
>>      llvm::FunctionType::get(PGOBuilder.getVoidTy(), Args, false);
>> @@ -209,9 +210,10 @@
>>    llvm::Constant *MangledName =
>>      CGM.GetAddrOfConstantCString(CGM.getMangledName(GD), "__llvm_pgo_name");
>>    MangledName = llvm::ConstantExpr::getBitCast(MangledName, Int8PtrTy);
>> -  PGOBuilder.CreateCall3(EmitFunc, MangledName,
>> +  PGOBuilder.CreateCall4(EmitFunc, MangledName,
>>                           PGOBuilder.getInt32(NumRegionCounters),
>> -                         PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy));
>> +                         PGOBuilder.CreateBitCast(RegionCounters, Int64PtrTy),
>> +                         PGOBuilder.CreateBitCast(RegionPCs, Int64PtrTy));
>>  }
>>
>>  llvm::Function *CodeGenPGO::emitInitialization(CodeGenModule &CGM) {
>> @@ -769,6 +771,13 @@
>>                               llvm::GlobalVariable::PrivateLinkage,
>>                               llvm::Constant::getNullValue(CounterTy),
>>                               "__llvm_pgo_ctr");
>> +
>> +  RegionPCs =
>> +    new llvm::GlobalVariable(CGM.getModule(), CounterTy, false,
>> +                             llvm::GlobalVariable::PrivateLinkage,
>> +                             llvm::Constant::getNullValue(CounterTy),
>> +                             "__llvm_pgo_pcs");
>> +
>>  }
>>
>>  void CodeGenPGO::emitCounterIncrement(CGBuilderTy &Builder, unsigned Counter) {
>> @@ -779,6 +788,21 @@
>>    llvm::Value *Count = Builder.CreateLoad(Addr, "pgocount");
>>    Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>>    Builder.CreateStore(Count, Addr);
>> +  // We should put the PC of the instruction that increments __llvm_pgo_ctr
>> +  // into __llvm_pgo_pcs, which will be passed to llvm_pgo_emit.
>> +  // This patch is wrong in many ways:
>> +  //   * We pass the PC of the Function instead of the PC of the Instruction,
>> +  //   because the latter doesn't work like this. We'll need to support
>> +  //   Instructions as labels in LLVM codegen.
>> +  //   * We actually store the PC on each increment, while we should initialize
>> +  //   this array at link time (need to refactor this code a bit).
>> +  //
>> +  Builder.CreateStore(
>> +      Builder.CreatePointerCast(
>> +          cast<llvm::Instruction>(Count)->getParent()->getParent(),
>> +          Builder.getInt64Ty()  // FIXME: use a better type
>> +          ),
>> +      Builder.CreateConstInBoundsGEP2_64(RegionPCs, 0, Counter));
>>  }
>>
>> Index: lib/CodeGen/CodeGenPGO.h
>> ===================================================================
>> --- lib/CodeGen/CodeGenPGO.h    (revision 201843)
>> +++ lib/CodeGen/CodeGenPGO.h    (working copy)
>> @@ -59,6 +59,7 @@
>>
>>    unsigned NumRegionCounters;
>>    llvm::GlobalVariable *RegionCounters;
>> +  llvm::GlobalVariable *RegionPCs;
>>    llvm::DenseMap<const Stmt*, unsigned> *RegionCounterMap;
>>    llvm::DenseMap<const Stmt*, uint64_t> *StmtCountMap;
>>    std::vector<uint64_t> *RegionCounts;
>> @@ -66,8 +67,9 @@
>>
>>  public:
>>    CodeGenPGO(CodeGenModule &CGM)
>> -    : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionCounterMap(0),
>> -      StmtCountMap(0), RegionCounts(0), CurrentRegionCount(0) {}
>> +      : CGM(CGM), NumRegionCounters(0), RegionCounters(0), RegionPCs(0),
>> +        RegionCounterMap(0), StmtCountMap(0), RegionCounts(0),
>> +        CurrentRegionCount(0) {}
>>    ~CodeGenPGO() {}
>>
>>    /// Whether or not we have PGO region data for the current function. This is
>
>
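The runtime side of the extended four-argument call is not part of the patch. A minimal C++ sketch of a receiver that pairs each counter with the PC stored in the same slot is below; `hypothetical_pgo_emit`, `FunctionRecord`, `registry` and `distinctPcs` are illustrative names, not the compiler-rt entry point. `distinctPcs` shows the limitation the patch comments point out: since every slot holds the enclosing Function's address, all counters of a function collapse onto a single PC until Instructions can be used as labels.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-function record collected by the runtime.
struct FunctionRecord {
  std::string MangledName;
  std::vector<std::pair<uint64_t, uint64_t>> PcCounters; // (pc, count)
};

// Process-wide list of records (illustrative only; no locking).
static std::vector<FunctionRecord> &registry() {
  static std::vector<FunctionRecord> Records;
  return Records;
}

// Hypothetical receiver matching the patched call
// (MangledName, NumCounters, Counters, PCs): Counters[i] and PCs[i]
// describe the same region counter i. Zero counts are kept, because for
// coverage "never executed" is exactly the interesting answer.
extern "C" void hypothetical_pgo_emit(const char *MangledName,
                                      uint32_t NumCounters,
                                      const uint64_t *Counters,
                                      const uint64_t *PCs) {
  FunctionRecord Rec;
  Rec.MangledName = MangledName;
  for (uint32_t I = 0; I < NumCounters; ++I)
    Rec.PcCounters.emplace_back(PCs[I], Counters[I]);
  registry().push_back(std::move(Rec));
}

// Number of distinct PCs in a record. With the raw patch every slot holds
// the function's address, so this is 1 for any function with counters.
std::size_t distinctPcs(const FunctionRecord &Rec) {
  std::set<uint64_t> Pcs;
  for (const auto &Entry : Rec.PcCounters)
    Pcs.insert(Entry.first);
  return Pcs.size();
}
```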


More information about the llvm-dev mailing list