[PATCH] D43093: [FastISel] Sink local value materializations to first use
Hans Wennborg via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 14 05:39:50 PST 2018
hans added a comment.
This looks pretty good to me, but I don't have a lot of experience with fastisel.
One potentially naive question though: materializing at first-use sounds reasonable, in fact for instruction selection that's trying to be fast and not optimize, I'd expect materialization to happen at every use basically, but here it seems something is inserting all the materializations early so they can be shared by all uses, which they now trivially dominate. Would it be crazy to just materialize "as needed" even if it creates some redundancies? Would that make fastisel both faster and simpler?
> I have not measured the impact on -O0 compile time yet, and would appreciate any ideas for how to do that.
Since it's a codegen change, I think it doesn't require any fancy input, but a large C program such as the amalgamated sqlite might do well. I tried it, and it shows your version a tiny bit slower, but not much:
$ perf stat -r50 bin/clang -O0 -c /tmp/sqlite-amalgamation-3220000/sqlite3.c
Performance counter stats for 'bin/clang -O0 -c /tmp/sqlite-amalgamation-3220000/sqlite3.c' (50 runs):
2169.455035 task-clock:u (msec) # 0.998 CPUs utilized ( +- 0.35% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
25,921 page-faults:u # 0.012 M/sec ( +- 0.01% )
6,340,741,096 cycles:u # 2.923 GHz ( +- 0.31% )
8,151,193,597 instructions:u # 1.29 insn per cycle ( +- 0.02% )
1,845,160,382 branches:u # 850.518 M/sec ( +- 0.02% )
42,297,415 branch-misses:u # 2.29% of all branches ( +- 0.23% )
2.173604157 seconds time elapsed ( +- 0.35% )
$ perf stat -r50 bin/clang.rnk -O0 -c /tmp/sqlite-amalgamation-3220000/sqlite3.c
Performance counter stats for 'bin/clang.rnk -O0 -c /tmp/sqlite-amalgamation-3220000/sqlite3.c' (50 runs):
2181.332026 task-clock:u (msec) # 0.998 CPUs utilized ( +- 0.41% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
25,912 page-faults:u # 0.012 M/sec ( +- 0.01% )
6,401,856,357 cycles:u # 2.935 GHz ( +- 0.39% )
8,176,731,823 instructions:u # 1.28 insn per cycle ( +- 0.03% )
1,851,112,997 branches:u # 848.616 M/sec ( +- 0.02% )
42,582,767 branch-misses:u # 2.30% of all branches ( +- 0.19% )
2.185072378 seconds time elapsed ( +- 0.41% )
================
Comment at: llvm/lib/CodeGen/SelectionDAG/FastISel.cpp:243
+
+ // We can DCE this instruction if there are no uses and it wasn't a
+ // materialized for a successor PHI node.
----------------
ultra nit: extra "a" at the end
https://reviews.llvm.org/D43093
More information about the llvm-commits
mailing list