[llvm] r202551 - New PBQP solver, and updates to the PBQP graph.

Fri Feb 28 16:25:37 PST 2014

I can't believe you managed to review that patch (and find as many
things as you did) that quickly. :)

Thanks for checking it out!

Addressing each of your points:

> Omnomnom. Do you have any (even rough) numbers on the memory footprint
> reduction factor?

Yes - they were in the commit message immediately proceeding the line
you referenced. ;)

A caveat, before I repeat the numbers: These reduction factors are for
the memory required to hold the cost vectors/matrices, not the whole
graph structure. On any machine with a half decent number of registers
though (and eight is more than enough) these costs will dominate the
memory overhead of the graph. With that, on to the numbers...

The reduction factor for the largest graphs in SPEC2006 was 400x on
average, and 1400x in the best case. Small graphs will see smaller
reductions, but it's the large graphs you care about. To put some
absolute figures to this (from memory, since I measured it a while
back and don't have these numbers handy): h264ref contained a function
that required ~150Mb to hold cost vectors/matrices before uniquing,
but only ~110k afterwards. The latter is small enough to fit in cache
these days, and could reasonably be expected to end up there, since
the reduction algorithm spends its whole time inspecting these values.

> <clippy>It looks like you're doing manual reference counting. Have you
> considered using a general-purpose reference counting smart
> pointer?</clippy>

Yes, this is desired future work, I just didn't the bandwidth to grok
C++11 smart pointers for this commit.

>> +    bool decRef() { --refCount; return (refCount == 0); }
>
> () around return expression aren't necessary/are a bit weird.
>
> (is this all run through clang-format? Having multiple statements on a
> single line definition seems a bit uncommon/not idiomatic in the LLVM
> codebase)

This hasn't been run through clang-format yet, but probably should be
(incidentally, RefCount should probably be capitalized :)
There are a few personal style quirks in here. I'm not wedded to them,
and I'm happy to see them brought inline with LLVM conventions. I will
try to find time to do so in the near future, and you're very welcome
to change anything that seems off to you.

>> +    NodeId addConstructedNode(const NodeEntry &N) {
>> +      NodeId NId = 0;
>> +      if (!FreeNodeIds.empty()) {
>> +        NId = FreeNodeIds.back();
>> +        FreeNodeIds.pop_back();
>> +        Nodes[NId] = std::move(N);
>
>
> Moving from a const ref? Did you intend to take the 'N' parameter by value
> instead?

Absolutely. Nice catch. The switch to move semantics was done late in
the piece and it looks like I've missed a few spots.

>> -    EdgeId addConstructedEdge(const EdgeEntry &e) {
<snip>
>> +        Edges[EId] = std::move(E);
>
> More move from const ref parameter.

Yep - missed that one too.

>> +      EdgeEntry &NE = getEdge(EId);
>
> ^ could this be a const ref?

Yes, and it should be. Node/Edge references used to have constness
issues that precluded this, but that has been fixed with the graph
improvements. Thanks for catching this.

>> -    Graph(const Graph &other) {}
>> -    void operator=(const Graph &other) {}
>> +    Graph(const Graph &Other) {}
>> +    void operator=(const Graph &Other) {}
>
> Do you need these explicitly? Or could you take the implicit default? (or
> provide an explicit "= default"?)

The intent was just to make them private. In C++11 I assume the best
thing would be to delete them?

>> +        while (NId < EndNId &&
>> +               std::find(FreeNodeIds.begin(), FreeNodeIds.end(), NId) !=
>> +                 FreeNodeIds.end()) {
>> +          ++NId;
>>          }
>
> We don't usually bother with {} on single line blocks - but I know the
> codebase varies on how stringently that's applied. (& local styles do
> happen)

I generally use parens whenever the loop/conditional will stretch over
more than two lines, even if there's only one statement. This is an
extreme application of that rule though (given how trivial the
statement is). As I mentioned - I have no problem with style changes
to bring the code in line with the LLVM standards.

>> +  /// \brief Copy-assignment operator.
>> +  Vector& operator=(const Vector &V) {
>> +    // llvm::dbgs() << "Assigning to PBQP::Vector " << this
>> +    //              << " from PBQP::Vector " << &V << "\n";
>> +    delete[] Data;
>
> Breaks on self assignment (see comment later - Data should probably be a
> std::unique_ptr<Data[]> which should simplify some of this)
>

Indeed. In practice I can guarantee that the PBQP solver will never
self-assign, but it would be nice to make this robust anyway.

> If you're storing teh length anyway... you could just use a
> std::vector<Data> - you'd pay an extra int for the "capacity" but it'd
> simplify a bunch of this even further.

Out of curiosity, how would it actually simplify anything relative to
a std::unique_ptr<PBQPNum[]> ? I think all the methods would remain
more-or-less the same.

>> +  PBQPNum *Data;
>
> Should this be a std::unique_ptr<PBQPNum[]>?

Yep.

>> +  /// \brief Construct a PBQP Matrix with the given dimensions.
>> +  Matrix(unsigned Rows, unsigned Cols) :
>> +    Rows(Rows), Cols(Cols), Data(new PBQPNum[Rows * Cols]) {
>
> More raw ownership, same solution as above (std::unique_ptr or std::array)
>
Yep.

>> +    const Matrix *YXECosts = FlipEdge1 ?
>> +      new Matrix(G.getEdgeCosts(YXEId).transpose()) :
>
>
> I'd consider having a unique_ptr<Matrix> here to handle the owning case,
> something like:
>
> std::unique_ptr<Matrix> owner;
> const Matrix *YXECosts = FlipEdge1 ? (owner = MakeUnique<Matrix>(...)) :
> &G.get... ;

Indeed. Actually, since transposition is so common in PBQP, I'm
thinking about having the pool lazily create and cache transposes.
That would simplify a lot of these conditional loops, and allow better
vectorization. Until that happens though, proper owning pointers are
an improvement.

>> +      MatrixMetadata(const PBQP::Matrix& m)
>> +        : worstRow(0), worstCol(0),
>> +          unsafeRows(new bool[m.getRows() - 1]()),
>
>
> *twitch* really? a bool array? Do you need them to be individually
> referenced bools, or is a bitfieldOK? (in any case, again -
> std::unique_ptr<bool[]>, std::bitfield, etc, etc... )

I haven't profiled it yet, but one of the most common operations in
the RegAllocSolver is adding/subtracting an array of bools from an
array of ints (the bools tell you which regs are potentially occupied
for a given neighbor). A bitset would save memory, but may not
vectorize as nicely. Making this decision will require profiling that
I haven't done yet. Switching to unique_ptr is a good idea though.

Thanks again for the review. Once this patch goes back in (having been
temporarily reverted to pacify the bots), please feel free to make any
of the changes discussed above, otherwise I'll try to find time when I
can.

Cheers,
Lang.