<div dir="ltr">Hi Arnold,<div><br></div><div>After a Christmas hiatus, please find attached the latest version of this patch. I also attach a larger patch which contains the entire implementation of function call vectorization (scalarization soon to follow).</div>

<div><br></div><div>I followed all your advice except about creating a library function to transform the call. The reason for this is that the logic is absolutely trivial - most of the difficulty is in converting the arguments and this is a per-vectorizer thing - a utility function wouldn't be able to help.</div>

<div><br></div><div>Please review!</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 25 December 2013 17:03, James Molloy <span dir="ltr"><<a href="mailto:james@jamesmolloy.co.uk" target="_blank">james@jamesmolloy.co.uk</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Arnold,<div><br></div><div>Thanks for the reply over the Christmas season :)</div><div><br></div><div>

Yep, your proposed scheme would work for me and I will implement that. Given that I'm ditching uniforms (at least for now), I don't think a library function is really required as the generation of the vectorized function call is trivial.</div>


<div><br></div><div>Merry Christmas,</div><div><br></div><div>James</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On 24 December 2013 22:43, Arnold <span dir="ltr"><<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto"><div><br><br>Sent from my iPhone</div><div><div><br>On Dec 22, 2013, at 4:04 AM, James Molloy <<a href="mailto:james@jamesmolloy.co.uk" target="_blank">james@jamesmolloy.co.uk</a>> wrote:<br>


<br></div><blockquote type="cite"><div><div dir="ltr">Hi Arnold,<div><br></div><div>No worries, it's Christmas season so I expect long delays between replies (hence the day delay with this reply!)</div><div><br></div>


<div><span style="font-family:arial,sans-serif;font-size:13px">> I don't think that TargetLibraryInfo should do the transformation from scalar to vectorized call instruction itself.</span><br style="font-family:arial,sans-serif;font-size:13px">


<span style="font-family:arial,sans-serif;font-size:13px">> I think, TargetLibraryInfo should just provide a mapping from function name to function name (including the vector factor, etc). There should > be a function that lives in lib/transforms that does the transformation. I don’t think IR type transformations should go into lib/Target.</span><br>


</div><div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">> </span><font face="arial, sans-serif">Do we need to reuse the LibFunc enum to identify these functions?</font></div>


<div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><font face="arial, sans-serif">OK. I tried to go via LibFuncs purely so that TLI exposed a more coherent interface (call getLibFunc on a function name, get back an identifier for that function which you can then use to query the rest of TLI). I'm very happy with going direct from fn name to fn name, and leaving the creation of the CallInst to something else (probably LoopVectorizer itself).</font></div>


</div></div></blockquote><div><br></div></div>We will want to vectorize function calls in both vectorizers so a library function would be best.<div><br><blockquote type="cite"><div><div dir="ltr">

<div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">Regarding uniforms - I think the best way to handle these is to ignore them for the moment. At least in OpenCL, any function that can take a scalar uniform can also take a vector non-uniform. So mapping from all arguments scalar to all arguments vector is always valid, and will simplify the logic a lot. Then a later pass could, if it wanted, reidentify uniforms and change which function is called.</font></div>


</div></div></blockquote><div><br></div></div>Okay.<div><br><blockquote type="cite"><div><div dir="ltr">

<div><font face="arial, sans-serif"><br></font></div><div><span style="font-family:arial,sans-serif;font-size:13px">> I don't understand why the function calls need to be virtual. The mapping function name to family of functions should capture everything we need?</span><br style="font-family:arial,sans-serif;font-size:13px">


<span style="font-family:arial,sans-serif;font-size:13px">>  (Fun name, VF) -> {#num_entries, vector fun, {vector fun with uniform params, 1}, …}</span><br style="font-family:arial,sans-serif;font-size:13px"><font face="arial, sans-serif"><br>


</font></div><div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">As Nadav mentioned, the entire suite of OpenCL built in functions is massive - hundreds of functions. Pre-seeding a map with all of these functions is something I'd quite like to avoid. There are two options - parameterise TLI with some map, like you said, or make TLI's functionality overridable. I prefer the latter, because it allows an implementation to do a lot of work lazily which is important for compile time performance (see the performance improvements I had to implement in the LLVM bitcode linker as an example of how much lazy link performance matters - small kernel, big library).</span></div>


<div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><font face="arial, sans-serif">Making TLI's functionality overridable is, however, a soft requirement and I'm keen to satisfy reviewers. So if this use case doesn't wash with you, let me know and I'll make it take a map and sort out laziness my own way downstream.</font></div>


</div></div></blockquote><div><br></div><div><br></div></div><div><div>I don’t think we need to use virtual function calls to lazily setup the table. </div><div><br></div><div>One of the reasons why I don’t like subclassing as an extension mechanism is that it does not compose. Say we had VecLibA and VecLibB (and other times just one of them) it is hard to express this with subclassing.</div>


<div><br></div><div>How about a scheme similar to this:</div><div><br></div><div>VecDesc *veclibtbl = {{ “cos”, “cos2”, 2}, {"cos", “cos4”, 4}, {"sin", "sin4", 4}, ...};</div><div>SmallVector<VecDesc *, 4> VectorTables;</div>


<div>mutable StringMap<const char *> VecToScalar;</div><div><br></div><div>during initialization we push VectorDescs onto the vector. </div><div><br></div><div>intialize(*tli) {</div><div>  if (data layout.contains(“OpenCL”)</div>


<div>    tli->addVectorTable(openclvectbl);</div><div>  if (data layout.contains(“veclib”);</div><div>    tli->addVectorTable(veclibtbl);</div><div>  ...</div><div>}</div><div><br></div><div>When we query TLI for a vector variant we binary search each VecDesc for the function (similar to how we do for LibFuncs entries now).</div>


<div><br></div><div>The VecToScalar map can be lazily initialized the first time it is queried.</div><div><br></div><div><br></div><div>Would this work for you?</div></div><div><div><br><blockquote type="cite">

<div><div dir="ltr">

<div><font face="arial, sans-serif"><br></font></div><div><span style="font-family:arial,sans-serif;font-size:13px">> Can you infer the vectorized function name from the scalar function name in a predictable way (but why use an enum then)? I don’t understand the use case that requires virtual functions.</span><font face="arial, sans-serif"><br>


</font></div><div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">Alas no, CL (my main use case) functions are overloaded and thus are name-mangled. I don't want to put  logic in *anywhere*.</span></div>


</div><div class="gmail_extra"><br><br><div class="gmail_quote">On 20 December 2013 18:45, Arnold <span dir="ltr"><<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi James,<br>

<br>

Again thank you for moving this forward! Sorry for not chiming in earlier, I am in the midst of a move.<br>

<br>

I don't think that TargetLibraryInfo should do the transformation from scalar to vectorized call instruction itself.<br>

I think, TargetLibraryInfo should just provide a mapping from function name to function name (including the vector factor, etc). There should be a function that lives in lib/transforms that does the transformation. I don’t think IR type transformations should go into lib/Target.<br>


<br>

I don't understand why the function calls need to be virtual. The mapping function name to family of functions should capture everything we need?<br>

 (Fun name, VF) -> {#num_entries, vector fun, {vector fun with uniform params, 1}, …}<br>

Can you infer the vectorized function name from the scalar function name in a predictable way (but why use an enum then)? I don’t understand the use case that requires virtual functions.<br>

<br>

We can then initialize this mapping in a static function like we do now or at later point have this mapping additionally initialized by Module metadata.<br>

<br>

Do we need to reuse the LibFunc enum to identify these functions? Do you want to add (before-patch) TargetLibraryInfo::LibFunc style optimizations to the optimizer? (This would go beyond just enabling vectorization of call instruction).<br>


It seems to me you are solving two separate things with this patch: one is vectorization of call instructions and the second one is adding target (really "target language") defined “LibFunc” functions?<br>

<br>

For the former, I think a map of function names should be enough if we just want to know scalar and vector variants of functions? Otherwise, we would have to go through a map of function name string to LibFunc to vectorized function name. I think we can omit the intermediate step. If we had a string mapping, this would also readily support Module metadata.<br>


<br>

The latter I think is out of scope for this patch and something that needs to be discussed separately.<br>

<br>

Thanks,<br>

Arnold<br>

<div><div><br>

<br>

> On Dec 19, 2013, at 5:43 AM, James Molloy <<a href="mailto:James.Molloy@arm.com" target="_blank">James.Molloy@arm.com</a>> wrote:<br>

><br>

> Hi all,<br>

><br>

> Attached is the first patch in a sequence to implement this behaviour in the way described in this thread.<br>

><br>

> This patch simply:<br>

> * Changes most methods on TargetLibraryInfo to be virtual, to allow clients to override them.<br>

> * Adds three new functions for querying the vectorizability (and scalarizability) of library functions. The default implementation returns failure for all of these queries.<br>

><br>

> Please review!<br>

><br>

> James<br>

><br>

>> -----Original Message-----<br>

>> From: Hal Finkel [mailto:<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>]<br>

>> Sent: 16 December 2013 21:10<br>

>> To: Arnold Schwaighofer<br>

>> Cc: llvm-commits; James Molloy<br>

>> Subject: Re: RFC: Enable vectorization of call instructions in the loop<br>

>> vectorizer<br>

>><br>

>> ----- Original Message -----<br>

>>> From: "Arnold Schwaighofer" <<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>><br>

>>> To: "Hal Finkel" <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>

>>> Cc: "llvm-commits" <<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>>, "James Molloy"<br>

>> <<a href="mailto:James.Molloy@arm.com" target="_blank">James.Molloy@arm.com</a>><br>

>>> Sent: Monday, December 16, 2013 3:08:13 PM<br>

>>> Subject: Re: RFC: Enable vectorization of call instructions in the loop<br>

>> vectorizer<br>

>>><br>

>>><br>

>>>> On Dec 16, 2013, at 2:59 PM, Hal Finkel <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>> wrote:<br>

>>>><br>

>>>> ----- Original Message -----<br>

>>>>> From: "Arnold Schwaighofer" <<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>><br>

>>>>> To: "James Molloy" <<a href="mailto:James.Molloy@arm.com" target="_blank">James.Molloy@arm.com</a>><br>

>>>>> Cc: "llvm-commits" <<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>><br>

>>>>> Sent: Monday, December 16, 2013 12:03:02 PM<br>

>>>>> Subject: Re: RFC: Enable vectorization of call instructions in the<br>

>>>>> loop     vectorizer<br>

>>>>><br>

>>>>><br>

>>>>> On Dec 16, 2013, at 11:08 AM, James Molloy <<a href="mailto:James.Molloy@arm.com" target="_blank">James.Molloy@arm.com</a>><br>

>>>>> wrote:<br>

>>>>><br>

>>>>>> Hi Renato, Nadav,<br>

>>>>>><br>

>>>>>> Attached is a proof of concept[1] patch for adding the ability to<br>

>>>>>> vectorize calls. The intended use case for this is in domain<br>

>>>>>> specific languages such as OpenCL where tuned implementation of<br>

>>>>>> functions for differing vector widths exist and can be guaranteed<br>

>>>>>> to be semantically the same as the scalar version.<br>

>>>>>><br>

>>>>>> I’ve considered two approaches to this. The first was to create a<br>

>>>>>> set of hooks that allow the LoopVectorizer to interrogate its<br>

>>>>>> client as to whether calls are vectorizable and if so, how.<br>

>>>>>> Renato<br>

>>>>>> argued that this was suboptimal as it required a client to invoke<br>

>>>>>> the LoopVectorizer manually and couldn’t be tested through opt. I<br>

>>>>>> agree.<br>

>>>>><br>

>>>>> I don’t understand this argument.<br>

>>>>><br>

>>>>> We could extend target library info with additional api calls to<br>

>>>>> query whether a function is vectorizable at a vector factor.<br>

>>>>> This can be tested by providing the target triple string (e.g<br>

>>>>> “target<br>

>>>>> triple = x86_64-gnu-linux-with_opencl_vector_lib") in the .ll file<br>

>>>>> that informs the optimizer that a set of vector library calls is<br>

>>>>> available.<br>

>>>>><br>

>>>>> The patch seems to restrict legal vector widths dependent on<br>

>>>>> available vectorizable function calls. I don’t think this should<br>

>>>>> work like this.<br>

>>>>> I believe, there should be an api on TargetTransformInfo for<br>

>>>>> library<br>

>>>>> function calls. The vectorizer chooses the cheapest of either an<br>

>>>>> intrinsic call or a library function call.<br>

>>>>> The overall cost model determines which VF will be chosen.<br>

>>>><br>

>>>> We don't have a good model currently for non-intrinsic function<br>

>>>> calls. Once we do, we'll want to know how expensive the vectorized<br>

>>>> versions are compared to the scalar version. Short of that, I<br>

>>>> think that a reasonable approximation is that any function calls<br>

>>>> will be the most expensive things in a loop, and the ability to<br>

>>>> vectorize them will be the most important factor in determining<br>

>>>> the vectorization factor.<br>

>>><br>

>>> Yes and we can easily model this in the cost model by asking what is<br>

>>> the cost of a (library) function call (vectorized or not) and have<br>

>>> this return a reasonably high value.<br>

>><br>

>> Sounds good to me.<br>

>><br>

>> -Hal<br>

>><br>

>>><br>

>>><br>

>>>><br>

>>>> -Hal<br>

>>>><br>

>>>>><br>

>>>>> Thanks,<br>

>>>>> Arnold<br>

>>>>><br>

>>>>><br>

>>>>> _______________________________________________<br>

>>>>> llvm-commits mailing list<br>

>>>>> <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

>>>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

>>>><br>

>>>> --<br>

>>>> Hal Finkel<br>

>>>> Assistant Computational Scientist<br>

>>>> Leadership Computing Facility<br>

>>>> Argonne National Laboratory<br>

>><br>

>> --<br>

>> Hal Finkel<br>

>> Assistant Computational Scientist<br>

>> Leadership Computing Facility<br>

>> Argonne National Laboratory<br>

><br>

><br>

> -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.<br>


><br>

> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No:  2557590<br>

> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No:  2548782<br>

</div></div>> <vectorizer-tli.diff><br>

<div><div><br>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

</div></div></blockquote></div><br></div>

</div></blockquote></div></div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>