<div><br></div><div><br><div class="gmail_quote"><div dir="ltr">On Tue, Jul 31, 2018 at 08:18, Amara Emerson <<a href="mailto:aemerson@apple.com">aemerson@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space">Hi Quentin,<br><div><br><blockquote type="cite"><div>On Jul 31, 2018, at 12:04 AM, Quentin Colombet <<a href="mailto:quentin.colombet@gmail.com" target="_blank">quentin.colombet@gmail.com</a>> wrote:</div><br><div>Hi Amara,<br><br>Thanks for sharing the plan going forward.<br 
><br>Inlined a couple of comments.<br><br>2018-07-30 7:01 GMT-07:00 Amara Emerson via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><span 
>>:</span><br><blockquote type="cite">Hi all,</blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite"><br><br>Over the past few months we’ve been doing work on the foundations for the<br>next stages of GlobalISel development. In terms of changes from this time<br>last year, the IR translator, the legalizer, and the instruction selector have<br>seen moderate to major changes. The most significant of these was the change<br>to the legalizer API, allowing targets to use predicates to express<br>legality, which gives more precise control over which forms of instructions<br>are legal and how to legalize them. This was necessary to implement support<br>for the new extending loads and truncating stores, but it also results in more<br>concise and elegant expressions of legality for each target. 
For example,<br>you can now apply a single definition to multiple opcodes (G_ADD,<br>G_SUB, G_MUL, etc.).<br><br>The IR translator has been modified to split aggregates rather than handling<br>them as one single large scalar. This change fixed some bugs and was<br>necessary in order to handle big-endian code correctly in the future.<br><br>The TableGen instruction selector also saw significant improvements in<br>performance, helping to keep the overall compile-time regression vs. FastISel<br>below 5% (geomean) on CTMark. There are still a few outliers, like sqlite3, which<br>has a significant regression compared to FastISel, but most of the other<br>benchmarks show little difference or even an improvement.<br><br>The TableGen importer has been improved so that we can import<br>more SelectionDAG selection rules. For example, on AArch64 about 40% of<br>the rules are currently imported successfully.<br><br>New additions since last year include the beginnings of a new combiner,<br>although there’s still significant work to be done here in terms of the<br>final design. 
The combiner will become a critical part of the pipeline in<br>order to begin improving runtime performance.<br><br></blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite">High-level goals</blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite"><br><br>Going forward, we plan to improve GlobalISel in a number of key areas to<br>achieve the following goals:<br>* Keeping compile time under control: ideally within 5% of FastISel, and,<br>when optimizations are enabled, maintaining a compile-time advantage over<br>SelectionDAG.<br>* Beginning to improve runtime performance by adding the most important<br>optimizations required to be competitive at -Os. We will be targeting and<br>measuring AArch64 for this goal but will endeavor to implement as many<br>optimizations as possible in generic code to benefit other targets.<br>* Improving overall stability and test coverage. 
Maintaining a high level<br>of code quality and minimizing regressions in correctness and performance<br>will be a significant challenge.<br>* Ensuring that the overall design meets the needs of targets in general,<br>rather than being overly tuned to a specific implementation.<br><br></blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite">Planned design work</blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite"><br><br>These are some design changes coming in the near-to-medium term:<br><br>* The G_MERGE and G_UNMERGE opcodes will be split into separate opcodes to<br>handle different use cases. At the moment the opcode is too powerful,<br>resulting in overly complex handling in places like the legalizer. G_MERGE<br>will be split so that it only handles merging of scalars into one larger<br>scalar. For other cases, like merging scalars into a vector, we will create a<br>new G_BUILD_VECTOR opcode, with a new counterpart opcode for doing the<br>opposite. For the current vector + vector case, a new G_CONCAT_VECTOR will be<br>introduced. These changes should simplify implementations for all<br>targets.<br><br>* Constant representation at the MI level needs some investigation. 
We<br>currently represent constants as generic instructions, with each instance of<br>a constant largely independent of the others, stored in the<br>entry block except for a few places in IR translation where we emit them at the<br>point of use. As a result, we run a localizer pass in an effort to reduce the<br>live ranges of the constants (and the consequent spilling), using some<br>heuristics to decide where to sink the constant definitions.<br><br>Since we don’t do any real caching of MI constants, multiple G_CONSTANT<br>definitions can exist for the same constant. This can also result in a lot<br>of redundant constants being created, especially for things like address<br>computation. Reducing the number of constants can help reduce compile time<br>and memory usage. Given this situation, one possible approach is to encode<br>constants into the operands of their users, rather than having dedicated machine<br>instructions. At instruction selection time, the constant can then be<br>materialized into a register or encoded as an immediate. 
Further<br>investigation is needed to find the right way forward here.<br></blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite"></blockquote><br><span>The initial design was not to have constants in machine operands. 
The</span><br>main rationale is that materializing a constant may be expensive, so we<br>had better not sprinkle it around.<br>Now, in practice, this indeed means that we need to keep a table of<br 
>all the constants created so that we don't duplicate them,<br><span>which we currently fail to do. 
That should be easy to fix with just a map between (constant, type) and</span><br>virtual register that could be kept at the function<br>level (or even module level). Better yet, this could be handled by the<br><span>IRBuilder. 
E.g., when instantiating an IRBuilder, it could scan the</span><br>function to see which constants are already there, build this<br>mapping, and then create only the constants that are missing.<br><br><span 
>Moreover, another advantage of this model is that optimizations</span><br>don't have to deal with two variants of the same instruction (ADDr and<br><span>ADDi), and the same goes for patterns. 
Alternatively, if we don't change the</span><br>opcode but only the MachineOperands, then every optimization has to<br>deal with two different forms of the same opcode, which is bad IMHO.<br><br>Also, this design was meant to absorb what the constant hoisting pass<br>does on the IR, so that we can kill that pass while having a better<br><span 
>cost model.</span><br><br>Finally, regarding the localizer pass, this was meant as a workaround<br><span>for the fast regalloc problem and should be killed ASAP. 
In</span><br>particular, the main motivation was to avoid code size bloat, but,<br>AFAIR, during our last measurements with Ahmed we only saved a few %<br>on a handful of benchmarks, so maybe we can just kill it.<br 
></div></blockquote>All valid points; however, we're now seeing more and more constants created, especially as part of address computation (e.g., GEP offsets).</div></div></blockquote><div dir="auto"><br></div><div dir="auto">I can see that, but if constants are not duplicated (which I believe is easy to do), most of the problem is fixed, isn’t it?</div><div dir="auto">What I am saying is that we shouldn’t weaken the representation by mixing operand kinds if other options work.</div><div dir="auto"><br></div><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div>Without a solution to the regalloc problem, code size and compile time are taking a hit. IMO a few % is significant enough to warrant considerable effort. For example, a quick constant deduplication experiment I did saved around 1.5% in overall compile time alone.</div><div><br></div><div>If the fast regalloc issue can be fixed without a significant compile-time impact, then I agree it sounds like the best approach, combined with early deduplication. It’s definitely something we’ll look into.</div></div></blockquote><div dir="auto"><br></div><div dir="auto">One thing that we should consider is killing the fast register allocator and using the basic one for fast compilation. 
I don’t know if it would fit in the time budget, but if it does, we kill a redundant piece of code (why do we have so many allocators?) while improving the generated code.</div><div dir="auto"><br></div><div dir="auto">Cheers,</div><div dir="auto">Q</div><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><br><blockquote type="cite"><div></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><br><blockquote type="cite"><br>* For optimizations to be supported, the combiner will become a crucial<br>part of the GISel pipeline. We have already done some preliminary work on a<br>generic combiner, which will eventually be used to support combines of<br>extloads/truncstores. We’ve had discussions on and off list about what we<br>need from the new combiner. In summary, we want the combiner to be<br>flexible, letting each target select from a library of combines, and as<br>efficient as possible. The combines are currently written<br>in C++, but one piece of investigation work we might do is to prototype<br>using the same TableGen-driven instruction selector code to match<br>declarative combine patterns written in TableGen. Regardless, we will need<br>to support the custom C++ use case.<br><br>* CSE throughout the pipeline. 
From a theoretical perspective, having a<br>self-contained CSE pass that operates as a single phase in the pipeline is<br>attractive for its simplicity and elegance. However, we know empirically<br>that this is expensive in compile time. Not only does the CSE pass itself<br>take a non-negligible time to run, but having it as a late pass can result<br>in the non-CSE’d code from the IRTranslator onwards surviving for a long<br>time, taking up analysis time at each stage of compilation. We believe<br>running a lightweight CSE early is a win. SelectionDAG currently does CSE<br>by default when building the DAG, and this is something we could explore as<br>part of a custom IRBuilder.<br></blockquote><br></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><span>I have been pushing for the IRBuilder to be smarter. 
Having</span><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">it do CSE was something I wanted us to try.</span><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"></div></blockquote>Yes, the IRBuilder is one of the candidates we’ll try for this.<br><blockquote type="cite"><div></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type="cite" style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br>* Known bits computation. Some optimizations require knowledge of which<br>bits in a value are known to be 1 or 0, and obtain it by using the<br>computeKnownBits() capability for SelectionDAG nodes. We will need some way<br>of getting the same information.
In an ideal scenario, the replacement<br>infrastructure for this will be more efficient, as this part of the codebase<br>seems to be disproportionately responsible for pathological compile-time<br>regressions.<br><br>* Load/store ordering needs some thought, as we currently don’t have a way<br>to easily check at the MI level what the ordering requirements are on a set<br>of memory operations. SelectionDAG uses the chains to ensure that they’re<br>scheduled to respect the orderings. How to achieve the same thing remains an<br>open question for GlobalISel.<br></blockquote><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><span style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">I don't see the problem here.</span><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">GISel has a sequential IR; the order is already modeled there.</span><br
style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">Dominance should give us the relative ordering; then, if we want more,</span><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">we should do exactly what we do at the IR level (alias analysis, etc.)</span><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"></div></blockquote>Sorry, I didn’t explain this properly. The information is there, you’re right about that. The problem is that finding out whether a load/store A precedes another memory operation in a block potentially requires a scan of every instruction, since instructions are stored as ilists. What chains give, as a side effect of the implementation, is a way to walk through the dependent memory operations without doing a backwards scan of every instruction. A simple block-level cache might work here.
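<br><br>For illustration only, here is a minimal sketch of what such a block-level cache could look like. It is written in Python rather than LLVM C++ and assumes a hypothetical Instr type standing in for MachineInstr. Instr, BlockOrderCache, comes_before and earlier_memory_ops are made-up names, not LLVM APIs. A single forward pass numbers the instructions of a block and records, for each instruction, the nearest preceding memory operation. An ordering query then becomes an index comparison, and earlier memory operations can be walked directly instead of by scanning backwards through the list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)  # eq=False keeps identity hashing, so instrs can be dict keys
class Instr:
    """Hypothetical stand-in for a MachineInstr (not an LLVM API)."""
    opcode: str
    may_load: bool = False
    may_store: bool = False

    @property
    def is_memory_op(self) -> bool:
        return self.may_load or self.may_store


class BlockOrderCache:
    """Hypothetical per-block cache: one forward pass numbers every
    instruction and records the nearest preceding memory op, so later
    queries do not rescan the instruction list."""

    def __init__(self, block: List[Instr]):
        self.block = block
        self.valid = False
        self.index: Dict[Instr, int] = {}
        self.prev_mem: Dict[Instr, Optional[Instr]] = {}

    def invalidate(self) -> None:
        # Must be called after any insertion, erasure or move in the block.
        self.valid = False

    def _rebuild(self) -> None:
        self.index.clear()
        self.prev_mem.clear()
        last_mem: Optional[Instr] = None
        for position, mi in enumerate(self.block):
            self.index[mi] = position
            self.prev_mem[mi] = last_mem  # nearest memory op strictly before mi
            if mi.is_memory_op:
                last_mem = mi
        self.valid = True

    def _ensure_valid(self) -> None:
        if not self.valid:
            self._rebuild()

    def comes_before(self, a: Instr, b: Instr) -> bool:
        """True if a precedes b; O(1) once the O(n) numbering is built."""
        self._ensure_valid()
        return self.index[b] > self.index[a]

    def earlier_memory_ops(self, mi: Instr) -> List[Instr]:
        """Memory ops before mi, nearest first, skipping non-memory instrs."""
        self._ensure_valid()
        result: List[Instr] = []
        cur = self.prev_mem[mi]
        while cur is not None:
            result.append(cur)
            cur = self.prev_mem[cur]
        return result
```

The hard part, as with any such cache, is invalidation. Anything that inserts, erases or moves instructions must call invalidate() or renumber incrementally. The links also ignore aliasing. They recover program order among memory operations, not the finer dependences that chains or alias analysis would provide.<br><br>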
As you say, if we want more information, then something akin to MemorySSA might be useful one day for whole-CFG analysis.<br><blockquote type="cite"><div><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">Cheers,</span><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><span style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline!important">-Quentin</span><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><blockquote type="cite"
style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"></blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite" style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br>* More extensive tests that exercise multiple stages of the pipeline. One<br>advantage of using MIR with GISel is that individual passes can be easily<br>tested by feeding the exact input expected for a particular pass, and<br>checking the immediate output of the pass. However, this approach can leave<br>holes in the test coverage. To help mitigate this, we will be exploring<br>writing/generating whole-pipeline tests, tracking some IR through each pass<br>and checking how the MIR is mutated. We also currently have a proposed<br>change to allow usage of FileCheck as a library, not just as a stand-alone<br>tool.
This would allow us to use FileCheck-style checks and improve testing<br>of currently unused code paths.<br><br><br></blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite" style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none">Roadmap for enabling optimizations</blockquote></div></blockquote></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div><blockquote type="cite" style="font-family:SFMono-Regular;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><br><br>I’ve filed a few PRs that people can follow or comment on to track the<br>progress towards enabling the -Os optimization level. The rough outline is:<br><br>PR 38365 - [AArch64][GlobalISel] Never fall back on CTMark or benchmarks<br>(Darwin)<br>PR 38366 - GlobalISel: Lightweight CSE<br>PR 32561 - GlobalISel: placement of constants in the entry-block and fast<br>regalloc result in lots of reloaded constant<br>PR 38367 - GlobalISel: Implement support for obtaining known bits<br>information<br>PR 38368 - GlobalISel: Investigate an efficient way to ensure load/store<br>orderings<br><br>These, along with general design and implementation work on the combiner,<br>will then lead on to a long road of performance analysis, inevitable bug<br>fixing, and implementing more optimizations.<br><br>If anyone is interested in discussing this in more detail, feel free to reach out<br>on the list, or to any of the GlobalISel developers.
We’d especially like to<br>hear about any issues or concerns about porting targets to GlobalISel.<br><br>Thanks,<br>Amara<br><br>_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></blockquote></div></blockquote></div></div></blockquote></div></div>