<div class="__aliyun_email_body_block"><div  style="clear:both;"><span  style="font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;color:#000000;">Thanks for your reply. It is meaning and inspiring! It looks like that I didn't understand ThinLTO enough. It looks like I need to redesign my proposal for how to import functions to specialize.</span></div><div  style="clear:both;"><span  style="font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;color:#000000;"><br ></span></div><div  style="clear:both;"><div  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;caret-color:#000000;color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;text-decoration:none;"><div  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;clear:both;"><span  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;font-family:calibri,arial,helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">>> To <span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-family:Calibri,sans-serif;font-size:15.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">Teresa Johnson:</span></span></span></div><div  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;clear:both;"><span  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;font-family:calibri,arial,helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">>>> <span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">One important thing to note is that we import functions with available_externally linkage, because they still live in their original translation unit. The intention is to make them available for inlining, and then they get dropped. But specialization presumably mainly benefits the out of line copy, I think? 
So I'm not sure that the approach of allowing the specialization candidates to get imported will ultimately do what you want.</span></span></span></div><div  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;clear:both;"><span  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;font-family:calibri,arial,helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><br ></span></span></div><div  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">>> For imported functions with </span><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">available_externally linkage, </span><span  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">the strategy now is that if we decide to specialize it, we would copy the importanted function and change the linkage for cloned to local so that it wouldn't affect other translation unit. It looks fine to me.  Correct me if there is any problems. Since linking is really magic.</span></span></div></div><div  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;caret-color:#000000;color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;text-decoration:none;"><br ></div><div  style="margin:.0px;padding:.0px;border:.0px;outline:.0px;caret-color:#000000;color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;text-decoration:none;">> Ah ok, I missed that aspect of the approach. Yes, making them internal copies should work, in that they will get kept and linking will work fine. However, this could definitely exacerbate the code size increases from specialization. 
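To make that strategy concrete, here is a minimal sketch of what I have in mind, assuming the existing CloneFunction utility; the helper name and the ".specialized" suffix are only illustrative, and the actual patch may differ:

```cpp
// Minimal sketch: clone an imported (available_externally) callee so that
// the copy we specialize is local to this module and cannot clash with the
// real definition that still lives in its original translation unit.
#include "llvm/IR/Function.h"
#include "llvm/Transforms/Utils/Cloning.h"
#include <cassert>

using namespace llvm;

static Function *cloneImportedForSpecialization(Function &Imported) {
  assert(Imported.hasAvailableExternallyLinkage() &&
         "expected an imported function");
  ValueToValueMapTy VMap;
  // CloneFunction inserts the clone into the same module as Imported.
  Function *Clone = CloneFunction(&Imported, VMap);
  Clone->setLinkage(GlobalValue::InternalLinkage);
  Clone->setName(Imported.getName() + ".specialized");
  return Clone; // calls in this TU can then be redirected to the clone
}
```

The existing specialization logic would then rewrite the relevant call sites to use the clone and fold the specialized argument inside it.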
> Ah ok, I missed that aspect of the approach. Yes, making them internal copies should work, in that they will get kept and linking will work fine. However, this could definitely exacerbate the code size increases from specialization. Here is a possible case:

> Assume a.cc:A() calls b.cc:B() (i.e. A and B are all in different TUs initially). Let's say B could be specialized when called via A. And further, assume A is small and hot, and is called from many many other TUs. Normally, we would be likely to import A aggressively since it is likely to be inlined (as it is small and hot). But if B is very large we may not want to normally import and inline it (and as such function specialization is very desirable). With this approach, B could be imported and specialized as a local copy many many times (wherever A is imported and inlined). It is possible that linker ICF could merge these copies, but that's not enabled by default and I'm not sure if it would always be able to do that. This could seem like an extreme example, but we have applications with 20K+ TUs internally.

For the example, assume there are c.cc, d.cc and e.cc, and all of them call A very often but none of them calls B. In my original imagination, B would get imported into a.cc only. But from your description, it looks like B could also get imported into c.cc, d.cc and e.cc. So it looks like ThinLTO does importing in many iterations rather than a single one. If that is the case, I agree that the current design is not scalable.
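For reference, a tiny hypothetical source layout for this scenario (the bodies are made up, only the call structure matters):

```cpp
// b.cc: B is large, but very profitable to specialize when Mode is known.
int B(int Mode) {
  // ... imagine a large body whose control flow depends heavily on Mode ...
  return Mode * 2;
}

// a.cc: A is small and hot and always calls B with a constant.
int B(int Mode);
int A() { return B(/*Mode=*/1); }

// c.cc, d.cc, e.cc, ...: many TUs call A, but never call B directly.
int A();
int UseA() { return A(); }
```

Once A is imported and inlined into c.cc, d.cc, e.cc and so on, each of those modules ends up containing a call to B(1), so under my current design each of them could import B and create its own local specialized copy.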
> Actually, ThinLTO does utilize cross-module synchronization. The synchronization is implemented via the index provided to the ThinLTO backends in which whole program optimization decisions are recorded.

> I could envision providing via the index a way to record the arg number and specialized value on the FunctionSummary for the function to be specialized, along with a globally unique suffix to append (decided during the thin link), for each specialization, and then on each caller FunctionSummary, which callsite should call a specialized version (indicated by the globally unique suffix to apply to the called function name). I actually need to do something similar for context sensitive heap optimization (WIP), so presumably they could rely on a similar summary mechanism to coordinate the cross module renaming.

> This approach will also require some additional info in the index for the callsite arguments that are constant (i.e. pass the constant instead of a marker that it is a constant) so we can make these decisions during the thin link.

> It's definitely a bit more work, but probably something we'd want to do longer term, rather than creating many specialized local copies, so I'd just make sure that the work done for specialization via importing is extensible to this approach.

If that is the case, I agree that we need to specialize functions in their own TUs. It also shows that I need to take more time to look into the details of the ThinLTO implementation, and the redesign and development would take more time.
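If I understand the suggestion correctly, the extra summary information might look roughly like this; these are purely hypothetical field names for discussion, not existing FunctionSummary members:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record attached to the summary of the function being
// specialized; the values would be decided during the thin link.
struct SpecializationInfo {
  uint32_t ArgNo;       // which formal argument is specialized
  uint64_t ConstValue;  // the constant it is specialized for
  std::string Suffix;   // globally unique suffix appended to the clone's name
};

// Hypothetical record attached to each caller's summary, telling its
// backend which call site should be redirected to which specialized clone.
struct SpecializedCallsite {
  uint32_t CallsiteIndex; // index of the call edge in the caller summary
  std::string Suffix;     // suffix identifying the specialized callee to call
};
```

Plus, as you mentioned, the call-edge information would need to carry the actual constant argument values so that these decisions can be made during the thin link.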
> The omnetpp increase is a little large, and it is more likely representative of large C++ applications. Do you know how much of this is due to the ArgUsage info vs the FuncSpecCostInfo? Not sure if there is a good way to shrink these though (I've only skimmed your patch).

I would say the increased size is caused by the ArgUsage info, since FuncSpecCostInfo is simply an unsigned number plus an array of pairs (each pair consists of two unsigned numbers standing for the argument index and the base bonus for specializing it). I think the increase is easy to reduce, since we could shrink the data widths directly. For example, I currently use i64 to represent the argument index, but we clearly don't need that many bits; i4 or i8 would be enough and would save a lot of space. It is the same situation for the number of IR instructions: we use i64 now, but i32 should be enough. So I am not so worried about the size increase.
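For reference, the kind of narrowing I have in mind is roughly the following; this is a hypothetical in-memory layout just to illustrate the widths, not the actual bitcode record format in the patch:

```cpp
#include <cstdint>
#include <vector>

// Per-argument specialization bonus; the prototype serializes the index
// as i64 today, but i8 (or even i4) would be plenty.
struct ArgSpecBonus {
  uint8_t ArgIndex;   // argument position
  uint32_t BaseBonus; // base bonus for specializing on this argument
};

// Per-function cost info: one count (e.g. the number of IR instructions,
// i64 today, i32 is enough) plus the per-argument bonuses.
struct FuncSpecCostInfo {
  uint32_t NumInsts;
  std::vector<ArgSpecBonus> Bonuses;
};
```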
Many Thanks,
Chuanqi


------------------------------------------------------------------
From: Teresa Johnson <tejohnson@google.com>
Send Time: Friday, August 20, 2021, 01:05
To: chuanqi.xcq <yedeng.yd@linux.alibaba.com>
Cc: Sjoerd Meijer <Sjoerd.Meijer@arm.com>; llvm-dev <llvm-dev@lists.llvm.org>; Florian Hahn <florian_hahn@apple.com>
Subject: Re: [llvm-dev] [ThinLTO] Import functions to enable function specialization in ThinLTO

Sorry for the slow reply, I've been out of office. Some replies inline below. Teresa

On Wed, Aug 11, 2021 at 9:00 PM chuanqi.xcq <yedeng.yd@linux.alibaba.com> wrote:

Sorry for the late reply. Thanks everyone for looking into this!

> To Sjoerd Meijer:
>> This general direction is that we change the cost-model, import more functions as a result, so that we can do more transformations. The questions that need answering are: what is this going to cost in compile-times, and what is the performance benefit?

For the cost and benefit, I noted them in https://reviews.llvm.org/D105966. Briefly: for compile time there is no significant change except that 502.gcc_r increased by 30%; for code size the average increase is 6.7% and the maximum is 19%; for performance, the ThinLTO version of function specialization gains all the benefit of the FullLTO version. Obviously the numbers are not satisfying yet, which is why I marked that patch as WIP. The things I want to say are:
- The increase in compile time and code size mainly comes from the newly imported functions being specialized, not from calculating and storing the extra heuristics.
- The main purpose of this mail is to make sure we are on the right direction; in other words, that ThinLTO support for function specialization is worth pursuing.

> To Sjoerd Meijer:
>> To have a more efficient discussion on this, I think it is best if you prototype your proposal. This makes things more concrete and also allows to get some first costs and benefits of the approach to test the whole idea. Without this, it all just remains a bit abstract.

Yeah, I thought I had made the proposal in D105524, D105966 and D107136. Since the code for function specialization changes relatively fast, the patches may not apply directly; I will try to rebase them soon. Another way to measure it is to `git reset --hard d4840175c95f6edcba21baae411589468`.

> To Sjoerd Meijer:
>> Regarding https://reviews.llvm.org/D105524, yes it is NFC and harmless, but it is not important. I.e., it is trivial and I'm sure we can get it approved in no-time.

The problem is that the function specialization code changes fast, so this patch couldn't simply be approved as-is and keeping it around is an extra cost. If we decide that ThinLTO for function specialization is a long-term direction, we should check it in first to avoid that extra cost.
style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">One important thing to note is that we import functions with available_externally linkage, because they still live in their original translation unit. The intention is to make them available for inlining, and then they get dropped. But specialization presumably mainly benefits the out of line copy, I think? So I'm not sure that the approach of allowing the specialization candidates to get imported will ultimately do what you want.</span></span></span></div><div  style="clear:both;"><span  style="font-family:calibri,arial,helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><br ></span></span></div><div  style="clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">For imported functions with </span><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">available_externally linkage, </span><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">the strategy now is that if we decide to specialize it, we would copy the importanted function and change the linkage for cloned to local so that it wouldn't affect other translation unit. It looks fine to me.  Correct me if there is any problems. Since linking is really magic.</span></span></div></div><div ><br ></div><div >Ah ok, I missed that aspect of the approach. Yes, making them internal copies should work, in that they will get kept and linking will work fine. However, this could definitely exacerbate the code size increases from specialization. Here is a possible case:</div><div ><br ></div><div >Assume a.cc:A() calls b.cc:B() (i.e. A and B are all in different TUs initially). Let's say B could be specialized when called via A. And further, assume A is small and hot, and is called from many many other TUs. Normally, we would be likely to import A aggressively since it is likely to be inlined (as it is small and hot). But if B is very large we may not want to normally import and inline it (and as such function specialization is very desirable). With this approach, B could be imported and specialized as a local copy many many times (wherever A is imported and inlined). It is possible that linker ICF could merge these copies, but that's not enabled by default and I'm not sure if it would always be able to do that. 
This could seem like an extreme example, but we have applications with 20K+ TUs internally.</div><div > </div><div ><div  style="clear:both;"></div><div  style="clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">> To </span><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;font-family:Calibri,sans-serif;font-size:15.0px;float:none;display:inline;">Teresa Johnson:</span></span></span></div><div  style="clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">>> <span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">It probably would be better to allow the necessary info to be propagated via the index back to the original TU so that it can be specialized in its original location.</span></span></span></div><div  style="clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><br ></span></span></span></div><div  style="clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" 
style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">It looks hard to do. I think the key point is that we make decision to specialize functions in the process of ThinLTO compiling, which is pararrel to other compilation unit. It looks like we need to do synchronization for ThinLTO, which sounds like to violate the design principles for ThinLTO.</span></span></span></div></div><div ><br ></div><div >Actually, ThinLTO does utilize cross-module synchronization. The synchronization is implemented via the index provided to the ThinLTO backends in which whole program optimization decisions are recorded.</div><div ><br ></div><div >I could envision providing via the index a way to record the arg number and specialized value on the FunctionSummary for the function to be specialized, along with a globally unique suffix to append (decided during the thin link), for each specialization, and then on each caller FunctionSummary, which callsite should call a specialized version (indicated by the globally unique suffix to apply to the the called function name). I actually need to do something similar for context sensitive heap optimization (WIP), so presumably they could rely on a similar summary mechanism to coordinate the cross module renaming.</div><div ><br ></div><div >This approach will also require some additional info in the index for the callsite arguments that are constant (i.e. pass the constant instead of a marker that it is a constant) so we can make these decisions during the thin link.</div><div ><br ></div><div >It's definitely a bit more work, but probably something we'd want to do longer term, rather than creating many specialized local copies, so I'd just make sure that the work done for specialization via importing is extensible to this approach.</div><div ><br ></div><div ><div  style="clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><br ></span></span></span></div><div  style="clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">> <span  class=" __aliyun_node_has_color 
__aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;text-decoration:none;background-color:#ffffff;float:none;display:inline;">To </span><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;text-decoration:none;background-color:#ffffff;font-family:Calibri,sans-serif;font-size:15.0px;float:none;display:inline;">Teresa Johnson:</span></span></span></span></div><div  style="clear:both;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">> Agree with others that the solution should be prototyped so you can show some results in terms of not only the benefit, but also the cost. I.e. adding information about all of the function arguments will increase the size of the index - how significant is that increase? We want to ensure that the thin link portion of the ThinLTO build stays efficient as that is the only monolithic, serial portion of a ThinLTO build.</span></span></span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><br ></span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">It's a great point to measure the size increased for index. I forget to measure it before. Then here are the numbers for SPEC2017 int. The method I used to measure the size for index is to compile the sizefor the `.o` objects compiled by `-flto=thin`. Correct me if the measurement is not precise.</span></div><div  style="clear:both;"></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">The results shows that for SPEC2017 intrate, the overall increased size is about  2.8%. 
More precisely,</span></div><div  style="clear:both;"></div><div  style="clear:both;"></div><div  style="clear:both;"><br ></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">| benchamrk       | size change for all the .o files |<br >|-----------------|--------|<br >| 500.perlbench_r | +1.9%  |<br >| 502.gcc_r       | +3.8%  |<br >| 505.mcf_r       | +3%    |</span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">| 520.omnetpp_r | +12% |</span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">| 525.x264_r | +2.9% |</span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">| 531.deepsjeng_r | +1.4% |</span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">| 541.leela_r | +2% |</span></div></div><div ><br ></div><div >The omnetpp increase is a little large, and it is more likely representative of large C++ applications. Do you know how much of this is due to the ArgUsage info vs the FuncSpecCostInfo? Not sure if there is a good way to shrink these though (I've only skimmed your patch).</div><div ><br ></div><div >Thanks,</div><div >Teresa</div><div > </div><div ><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><br ></span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">The size changes less than 1% are omitted.</span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><br ></span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;"> </span><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;float:none;display:inline;">To </span><span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="margin:.0px;padding:.0px;border:.0px;outline:.0px;color:#000000;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;font-family:Calibri,sans-serif;font-size:15.0px;float:none;display:inline;">Teresa Johnson:</span></span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">>> <span  class=" __aliyun_node_has_color __aliyun_node_has_bgcolor" style="color:#000000;font-family:Tahoma,Arial,STHeiti,SimSun;font-size:14.0px;font-style:normal;font-variant-caps:normal;font-weight:normal;text-align:start;text-indent:.0px;text-transform:none;background-color:#ffffff;text-decoration:none;float:none;display:inline;">That being said, I think it is a great idea to try to 
extend function specialization to ThinLTO if it is giving good benefits with regular LTO.</span></span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;"><br ></span></div><div  style="clear:both;"><span  style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16.0px;">Thanks for your approval for the direction very much!</span></div><div  style="clear:both;"><br ></div><div  style="clear:both;">Thanks,</div><div  style="clear:both;">Chuanqi</div><div  style="clear:both;"><br ></div></div></div><div ><div ><div ><span  style="font-family:Times;font-size:medium;"></span></div></div></div></div></blockquote><div ><br ></div></div>