[PATCH] Add a fence elimination pass

Robin Morisset morisset at google.com
Thu Oct 16 11:20:27 PDT 2014


Thanks for the review !

Enabling it for ARM/x86 is really easy: it is just a cut-&-paste of the code added by this patch for the Power backend, adapting the arguments to the pass constructor to give it the right fences. I did not do it in this patch to avoid having to change it all if the interface to the pass changes during review (which is looking likely).

I will try to add a bigger example, it is just time consuming and I wanted to get feedback as soon as possible.

For the interaction with dot, it would probably be conceptually easy but require a fair bit of plumbing, I will look at how the -view-isel-dags and friends options work.

The main difficulty with profiling this patch is that the pass will basically do nothing on code without atomics (well, break critical edges followed by block frequency info, followed by going through every instruction and testing them for seeing if they are fences). So I would need a codebase that is at the same time large, full of atomics, and with an easy to hack build system so that I could cross-compile it for power.... So I can test a few small testcases, but I don't see how to do large-scale profiling. If the codebase has no atomic, the cost of this pass is completely dominated by its requirements.

Answers to the inline comments below, I will update this patch to apply your suggestions soon.

================
Comment at: include/llvm/Transforms/Scalar.h:44
@@ -30,1 +43,3 @@
+        std::function<bool(const Instruction&)> isWeakerFence
+            = [](const Instruction &I){return false;});
 
----------------
jfb wrote:
> I'm not a fan of default arguments, I think you shouldn't have them.
> 
> Can you add a description of what stronger/weaker mean, as well as FenceArgs?
> 
> `std::function` isn't in common use in the code base, can you see what similar functions use instead?
> 
> Overall you're trying to create a partial ordering for target-specific intrinsics for fences, and then manually adding this pass many times to a backend. Could you instead have the backend specify the ordering, and add the pass only once? The pass would handle each fence type that it's made aware of, in the proper strong->weak ordering.
OK for removing the default argument.

For the ordering, I will try to see if having the backend pass an ArrayRef<> of the different fences, sorted by strength works. The main weakness is that it would make it a total ordering which is fine for most architectures but may fail for some (only Alpha which is not supported by LLVM out of my head, but I don't know every architecture out there). How could the backend cleanly specify a partial ordering ?

================
Comment at: lib/Target/PowerPC/PPCTargetMachine.cpp:20
@@ -18,1 +19,3 @@
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/IntrinsicInst.h"
 #include "llvm/MC/MCStreamer.h"
----------------
jfb wrote:
> Keep these sorted.
OK

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:52
@@ +51,3 @@
+
+STATISTIC(NumFencesDetected, "Number of fences pre-FencesPRE");
+STATISTIC(NumFencesInserted, "Number of fences added by FencesPRE");
----------------
jfb wrote:
> "before FencesPRE" is less odd. Pre-PRE sounds weird :)
OK

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:58
@@ +57,3 @@
+  class FencesPRE : public FunctionPass {
+    public:
+    static char ID;
----------------
jfb wrote:
> `clang-format` the entire file.
OK

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:94
@@ +93,3 @@
+
+    typedef struct FlowGraphNode {
+      Instruction * I;
----------------
jfb wrote:
> You don't need to `typedef struct A {} A;` in C++, just `struct A {};` is enough.
Thanks, I didn't know that.

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:112
@@ +111,3 @@
+      // is not in the FlowGraph map (since the non-duplicate already is).
+      std::unique_ptr<struct FlowGraphNode> DupNode;
+      FlowGraphNode() : I(nullptr), FenceBeforeIt(nullptr), Excess(0),
----------------
jfb wrote:
> You don't need `struct` here (C versus C++).
Same

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:115
@@ +114,3 @@
+                        Height(0), Mark(false), Succs(), Next(Succs.begin()),
+                        DupNode() {}
+      FlowGraphNode(Instruction * Inst) : I(Inst), FenceBeforeIt(nullptr),
----------------
jfb wrote:
> You can use a delegating constructor to ` FlowGraphNode(Instruction * Inst)`, though it doesn't look like this constructor is used, so you could mark it as `= delete` instead.
I will look into it, I think I had to add this constructor to silence a warning, it should indeed be unused.

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:170
@@ +169,3 @@
+    if (!FenceIntrinsicID)
+      return false;
+    BFI = &getAnalysis<BlockFrequencyInfo>();
----------------
jfb wrote:
> Does this ever happen? It seems like it's an error and should be an `assert`.
OK.

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:179
@@ +178,3 @@
+    SmallVector<Edge, 1> Cut;
+    SmallPtrSet<Instruction*, 1> Fences;
+
----------------
jfb wrote:
> Why 1 in all cases? That seems pretty low, especially since this is all stack allocated  it's pretty cheap to start with 4 or 8, though real-world numbers are better.
My reasoning is that most function will have no fences, so 0 element for these, or just be a tiny helper with only one atomic access. But I can easily change it to 4 or 8 as it is so cheap.

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:209
@@ +208,3 @@
+
+    for (auto NPair : Cut) {
+      assert(NPair.first->I->getParent() == NPair.second->I->getParent());
----------------
jfb wrote:
> What is this loop doing? Putting it in its own function could make things easier to understand.
OK, I will try to put it into its own function, I hoped the comments inside would be enough.

================
Comment at: lib/Transforms/Scalar/FencesPRE.cpp:427
@@ +426,3 @@
+void FencesPRE::relabel(Flow &Flow, FlowGraphNode &U) const {
+  auto minHeight = UINT32_MAX;
+  for (auto &VWithCap : U.Succs) {
----------------
jfb wrote:
> I'd avoid the macros here and in other places. You have a `typedef` for the types, so it would be better to use `std::numeric_limits`.
OK (sorry I forgot this, I remember you suggested it already).

http://reviews.llvm.org/D5758






More information about the llvm-commits mailing list