[Lldb-commits] [lldb] r281717 - [RenderScript] Support tracking and dumping reduction kernels

Wed Sep 21 03:50:29 PDT 2016

Hi Zachary

Thanks for the review. My comments are inline (I've cut down some of the 
diff context for brevity)

On 16/09/16 15:57, Zachary Turner wrote:
[snip]
>     -#define MAXLINESTR_(x) "%" STRINGIFY(x) "s"
>     -#define MAXLINESTR MAXLINESTR_(MAXLINE)
>     +bool RSModuleDescriptor::ParsePragmaCount(llvm::StringRef *lines,
>     +                                          size_t n_lines) {
>     +  // Skip the pragma prototype line
>     +  ++lines;
>     +  for (; n_lines--; ++lines) {
>     +    const auto kv_pair = lines->split(" - ");
>     +    m_pragmas[kv_pair.first.trim().str()] =
>     kv_pair.second.trim().str();
>     +  }
>     +  return true;
>     +}
>     +
>     +bool RSModuleDescriptor::ParseExportReduceCount(llvm::StringRef *lines,
>     +                                                size_t n_lines) {
>
> This should take an ArrayRef<StringRef>.  In general this is true any
> time you're passing an array to a function via (T* items, size_t length).
>
>
>     +  // The list of reduction kernels in the `.rs.info
>     <http://rs.info>` symbol is of the form
>     +  // "signature - accumulatordatasize - reduction_name -
>     initializer_name -
>     +  // accumulator_name - combiner_name -
>     +  // outconverter_name - halter_name"
>     +  // Where a function is not explicitly named by the user, or is
>     not generated
>     +  // by the compiler, it is named "." so the
>     +  // dash separated list should always be 8 items long
>     +  Log *log = GetLogIfAllCategoriesSet(LIBLLDB_LOG_LANGUAGE);
>     +  // Skip the exportReduceCount line
>     +  ++lines;
>     +  for (; n_lines--; ++lines) {
>
> With ArrayRef this would be written as:
>
> for (StringRef line : lines.drop_front()) {
> }
>
I'm not sure what the technical argument against pointer and length vs 
ArrayRef is, but I agree these range-based for loops are pretty. I'll 
add them in a future commit (I've got some more RenderScript cleanups 
pending, and I'll test this locally first - but this looks like good 
cleanup).

[snip]
>     __attribute__((kernel))
>     +       {"exportForEachCount", eExportForEach},
>     +       // The number of generalreductions: This marked in the
>     script by `#pragma
>     +       // reduce()`
>     +       {"exportReduceCount", eExportReduce},
>     +       // Total count of all RenderScript specific `#pragmas` used
>     in the script
>     +       {"pragmaCount", ePragma},
>     +       {"objectSlotCount", eObjectSlot}}};
>
>        // parse all text lines of .rs.info <http://rs.info>
>        for (auto line = info_lines.begin(); line != info_lines.end();
>     ++line) {
>
> How about (for auto line : info_lines) {

We actually need to manually increment the line pointer to skip over the 
already-parsed lines, so the range based for loop doesn't work in this case.

>
>     -    uint32_t numDefns = 0;
>     -    if (sscanf(line->c_str(), "exportVarCount: %" PRIu32 "",
>     &numDefns) == 1) {
>     -      while (numDefns--)
>     -        m_globals.push_back(RSGlobalDescriptor(this,
>     (++line)->c_str()));
>     -    } else if (sscanf(line->c_str(), "exportForEachCount: %" PRIu32 "",
>     -                      &numDefns) == 1) {
>     -      while (numDefns--) {
>     -        uint32_t slot = 0;
>     -        name[0] = '\0';
>     -        static const char *fmt_s = "%" PRIu32 " - " MAXLINESTR;
>     -        if (sscanf((++line)->c_str(), fmt_s, &slot, name.data()) ==
>     2) {
>     -          if (name[0] != '\0')
>     -            m_kernels.push_back(RSKernelDescriptor(this,
>     name.data(), slot));
>     -        }
>     -      }
>     -    } else if (sscanf(line->c_str(), "pragmaCount: %" PRIu32 "",
>     &numDefns) ==
>     -               1) {
>     -      while (numDefns--) {
>     -        name[0] = value[0] = '\0';
>     -        static const char *fmt_s = MAXLINESTR " - " MAXLINESTR;
>     -        if (sscanf((++line)->c_str(), fmt_s, name.data(),
>     value.data()) != 0) {
>     -          if (name[0] != '\0')
>     -            m_pragmas[std::string(name.data())] = value.data();
>     -        }
>     -      }
>     -    } else {
>     -      Log *log(GetLogIfAllCategoriesSet(LIBLLDB_LOG_LANGUAGE));
>     -      if (log) {
>     +    const auto kv_pair = line->split(": ");
>     +    const auto key = kv_pair.first;
>     +    const auto val = kv_pair.second.trim();
>     +
>     +    const auto handler = rs_info_handlers.find(key);
>     +    if (handler == rs_info_handlers.end())
>     +      continue;
>
> Instead of constructing this map, you could use llvm::StringSwitch, like
> this:
> int handler = llvm::StringSwitch<int>(key)
>                         .Case("exportVarCount", eExportVar)
>                         .Case("exportForEachCount", eExportForEach)
>                         .Case("exportReduceCount", eExportReduce)
>                         .Case("pragmaCount", ePragma)
>                         .Case("objectSlotCount", eObjectSlot)
>                         .Default(-1);
>

I wasn't aware at all of `llvm::StringSwitch`. This seems like it is 
more fit for purpose than what I had. Will add.

[snip]
>     @@ -110,6 +162,7 @@ public:
>        const lldb::ModuleSP m_module;
>        std::vector<RSKernelDescriptor> m_kernels;
>        std::vector<RSGlobalDescriptor> m_globals;
>     +  std::vector<RSReductionDescriptor> m_reductions;
>        std::map<std::string, std::string> m_pragmas;
>
> You should consider changing this to llvm::StringMap, which has better
> performance characteristics than std::map

For the case of a small number of keys (which is certainly true for the 
`m_pragmas` case), I've found `std::map` to have significantly better 
performance characteristics in artificial tests when compared to 
`llvm::StringMap` (x86_64 linux gcc-6.2 with libstdc++) so I'm, not too 
keen to replace all instances of `std::map<string, T>` at this time. In 
many cases though, I think the llvm containers (particularly StringRef) 
have a number of advantages over the standard library, and I'll 
definitely start moving over where it makes sense.

I threw together a very simple testcase for these performance numbers, 
which is attached.

Best

Luke

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mapvswitch.cpp
Type: text/x-c++src
Size: 3048 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20160921/5491d282/attachment-0001.cpp>
-------------- next part --------------
import os

env = Environment(ENV=os.environ)
env['CXX'] = os.environ.get('CXX', 'c++')
env.ParseConfig('llvm-config --cxxflags --ldflags --libs')
env.Append(LIBS=['dl', 'pthread', 'dl', 'z'])
env.Append(CXXFLAGS=['-Wall', '-Wextra', '-Ofast', '-mtune=native'])

env.Program('mapvswitch', 'mapvswitch.cpp')