[LLVMdev] [RFC] Embedding command line options in bitcode (PR21471)

Fri Nov 14 15:24:30 PST 2014

> On Nov 14, 2014, at 2:57 PM, Chris Bieneman <cbieneman at apple.com> wrote:
> 
> There are parts of this proposal that I really like, and there are others that I think are actually at opposition to the work we’re trying to do with cl::opt.
> 
>> On Nov 14, 2014, at 2:44 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com <mailto:dexonsmith at apple.com>> wrote:
>> 
>> +chrisb
>> 
>>> On 2014-Nov-13, at 16:33, Akira Hatanaka <ahatanak at gmail.com <mailto:ahatanak at gmail.com>> wrote:
>>> 
>>> I'm working on fixing PR21471, which is about embedding codegen command line options into the bitcode as function or module-level attributes so that they don't get ignored when doing LTO.
>>> 
>>> http://llvm.org/bugs/show_bug.cgi?id=21471 <http://llvm.org/bugs/show_bug.cgi?id=21471>
>>> 
>>> I have an initial patch (attached to this email) which enables clang/llvm to recognize one command line option, write it to the IR, and read it out in a backend pass. I'm looking to get feedback from the community on whether I'm headed in the right direction or whether there are alternate ideas before I go all the way on fixing the PR. Specifically, I'd like to know the answers to the following questions:
>>> 
>>> 1. How do we make sure we continue to be able to use the command line options we've been using for llc and other tools?
> 
> In discussions about the new cl::opt API I believe the general idea was that most of the options expressed using cl::opt are actually only relevant as debug options, so I think one big part of this work is really going to be identifying a subset of the current options which are actually relevant to expose in the IR.
Ideally this will be a small set as it could get expensive to represent otherwise.  I’ll get to why later.

So with LTO we already have issues where modules have different metadata.  I’m not sure, but we might also have issues with weak functions and different attribute sets.

We need to work out whether the option a function was compiled with is interesting enough to be stored on that function, even if its the default.  For example, lets say I tag a function with ‘loop-unroll-threshold=100’, I would expect that to override the one given on the command line, but perhaps others would want the command line to always win.  

Then there’s the issue of whether a default is interesting or not.  For example, the default loop unroll threshold is 150.  We probably want to tag all functions with that threshold as how to we know that the default will stay the same in a later LLVM.  Or you could save the fact that something is a default.  So for example, store ‘unroll-threadhold=default(150)’ as then you can either:
- Always choose 150 for this function, because thats what it was tagged with
- Always choose the default, so if we change ToT to default 200, you choose 200.

Now if you have to store all ‘interesting’ options, the set of things you store could start to get quite large quite quickly.
> 
>>> 2. How to handle cases where two functions in a module have different sets of command line options?
I would store them in the attributes set.  I don’t think there’s anywhere else you can do this right now.  Attributes or metadata.  I would say attributes because then you can make them “option”=“value” (i.e., StringAttribute) and you don’t need to worry about anyone knowing about the names or not.
> 
> Today I don’t believe we have this ability.
> 
>>> 3. Where should the command line options or module/function attributes be stored once they are read out from the IR?
> 
> My suggestion would be the OptionStore that I proposed here: http://reviews.llvm.org/D6207 <http://reviews.llvm.org/D6207>I don’t think the OptionStore will work either, unless you put an OptionStore on the Function which I don’t think will be feasible.

Instead I think you need either the function pass manager (perhaps LPPassManager, BB PM, etc too) to parse the options from the function once before it runs all the passes.  Or, for each pass with an option you care about, move the storage of that option itself to the pass.  This is similar to what Chris is doing with static initializers.  So I can imagine pass initialization looking something like

class LoopUnrollPass {
  unsigned Threshold = ...
doIniit… {
  if (cl opt has value)
    Threshold = ‘cl opt value’
  if (function.getattribute(‘unroll-threshold’)
    Threadhold = function.getAttributeAsInteger(‘unroll-threshold’)
}

For performance reasons, I would actually add a new type of Attribute for a string key and integer value as then you don’t actually need to do any parsing in the new function.getAttributeAsInteger function I introduced here.

Thanks,
Pete
> 
>>> 
>>> The short description of the approach I took in my patch is that command line options that are important to codegen are collected by cl::ParseCommandLineOptions, written to the bitcode as function or module attributes, and read out directly by the optimization passes that need them. cl::opt options are replaced with CodeGenOpt options (subclass of cl::opt), which are needed only to parse the command line and provide the default value when the corresponding options are not in the bitcode.
>> 
>> I like this approach, since it means the frontend doesn't have to understand
>> options in order to pass them on to the backend.
>> 
>> The static variables should be straightforward to migrate to an LLVMContext
>> once ParseCommandLineOptions stores things there instead of in globals.
> 
> I also think that the OptionStore in conjunction with the OptionRegistry (rather than any of the cl APIs) should have all the parsing code. In fact, you shouldn’t have to call ParseCommandLineOptions, we could make encoding and decoding the stored options associated with a module part of loading and storing the module.
> 
>> 
>>> diff --git a/lib/CodeGen/CodeGenOption.cpp b/lib/CodeGen/CodeGenOption.cpp
>>> new file mode 100644
>>> index 0000000..2d74c2f
>>> --- /dev/null
>>> +++ b/lib/CodeGen/CodeGenOption.cpp
>>> @@ -0,0 +1,59 @@
>>> +//===- CodeGen/CodeGenOptions.cpp - Code-gen option.           --*- C++ -*-===//
>>> +//
>>> +//                     The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +#include "llvm/CodeGen/CodeGenOption.h"
>>> +#include "llvm/IR/Attributes.h"
>>> +#include "llvm/IR/Module.h"
>>> +
>>> +using namespace llvm;
>>> +
>>> +static std::map<std::string, bool> FunctionBoolAttrs;
>>> +static std::map<std::string, bool> ModuleBoolAttrs;
>>> +
>> 
>> @Chris, should these be using ManagedStatic?
> 
> I’d much rather they just weren’t static at all. Using globals to store state that inherently isn’t global just feels wrong.
> 
> -Chris
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>         http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141114/6a31bb7f/attachment.html>