[PATCH] D43093: [FastISel] Sink local value materializations to first use

Wed Feb 14 05:39:50 PST 2018

hans added a comment.

This looks pretty good to me, but I don't have a lot of experience with fastisel.

One potentially naive question though: materializing at first-use sounds reasonable, in fact for instruction selection that's trying to be fast and not optimize, I'd expect materialization to happen at every use basically, but here it seems something is inserting all the materializations early so they can be shared by all uses, which they now trivially dominate. Would it be crazy to just materialize "as needed" even if it creates some redundancies? Would that make fastisel both faster and simpler?

> I have not measured the impact on -O0 compile time yet, and would appreciate any ideas for how to do that.

Since it's a codegen change, I think it doesn't require any fancy input, but a large C program such as the amalgamated sqlite might do well. I tried it, and it shows your version a tiny bit slower, but not much:

  $ perf stat -r50 bin/clang -O0 -c /tmp/sqlite-amalgamation-3220000/sqlite3.c 

   Performance counter stats for 'bin/clang -O0 -c /tmp/sqlite-amalgamation-3220000/sqlite3.c' (50 runs):

         2169.455035      task-clock:u (msec)       #    0.998 CPUs utilized            ( +-  0.35% )
                   0      context-switches:u        #    0.000 K/sec                  
                   0      cpu-migrations:u          #    0.000 K/sec                  
              25,921      page-faults:u             #    0.012 M/sec                    ( +-  0.01% )
       6,340,741,096      cycles:u                  #    2.923 GHz                      ( +-  0.31% )
       8,151,193,597      instructions:u            #    1.29  insn per cycle           ( +-  0.02% )
       1,845,160,382      branches:u                #  850.518 M/sec                    ( +-  0.02% )
          42,297,415      branch-misses:u           #    2.29% of all branches          ( +-  0.23% )

         2.173604157 seconds time elapsed                                          ( +-  0.35% )

  $ perf stat -r50 bin/clang.rnk -O0 -c /tmp/sqlite-amalgamation-3220000/sqlite3.c 

   Performance counter stats for 'bin/clang.rnk -O0 -c /tmp/sqlite-amalgamation-3220000/sqlite3.c' (50 runs):

         2181.332026      task-clock:u (msec)       #    0.998 CPUs utilized            ( +-  0.41% )
                   0      context-switches:u        #    0.000 K/sec                  
                   0      cpu-migrations:u          #    0.000 K/sec                  
              25,912      page-faults:u             #    0.012 M/sec                    ( +-  0.01% )
       6,401,856,357      cycles:u                  #    2.935 GHz                      ( +-  0.39% )
       8,176,731,823      instructions:u            #    1.28  insn per cycle           ( +-  0.03% )
       1,851,112,997      branches:u                #  848.616 M/sec                    ( +-  0.02% )
          42,582,767      branch-misses:u           #    2.30% of all branches          ( +-  0.19% )

         2.185072378 seconds time elapsed                                          ( +-  0.41% )

================
Comment at: llvm/lib/CodeGen/SelectionDAG/FastISel.cpp:243
+
+  // We can DCE this instruction if there are no uses and it wasn't a
+  // materialized for a successor PHI node.
----------------
ultra nit: extra "a" at the end

https://reviews.llvm.org/D43093