<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Arnold,<div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Feb 26, 2015, at 1:49 PM, Arnold Schwaighofer <<a href="mailto:aschwaighofer@apple.com" class="">aschwaighofer@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Michael,<div class=""><br class=""></div><div class="">thanks for following up on this. I think it is important to move forward with this.</div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Feb 26, 2015, at 12:01 PM, Michael Zolotukhin <<a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">Some time ago James posted a patch for vectorization of calls [1]. Though it seems to be a missing feature in the vectorizer, it wasn’t committed, so I decided to revive the discussion.</div><div class=""><br class=""></div><div class="">To begin with, I rebased the patch and split it to several semantically separated parts. The approach used here is pretty natural: TargetLibraryInfo has a mapping for a set of scalar functions to their vector counterparts, TargetTransformInfo consults with it when cost of call instruction is calculated, if the vector-version is cheaper, vectorizer emits vector-function.</div><div class=""><br class=""></div><div class="">There are several question on this patch:</div><div class="">1. Is this approach still viable in general and worth pursuing?</div></div></div></blockquote><div class=""><br class=""></div>I believe this is still important. We want to be able to vectorize function calls if vectorized versions of math library functions are available.</div></div></div></div></blockquote>Thanks, I’ll continue digging this then:)<br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">2. The cost model is changed. In theory it should become more accurate, but I haven’t measured performance impact of this change yet. Does these changes look good in general? If yes, do we need to tune some target-cost-model parameters, or those are the values that no one has been actually using yet?</div></div></div></blockquote><div class=""><br class=""></div><div class="">I think we will have to test performance impact of this change since we are most likely the first clients. We should test with/without the target tuple that enables the vectorized function calls. I think if we don’t see egregious regressions in the run that uses the modified cost model but is a run without vectorized versions of the math library being available we can commit this patch set such that it gets testing by the community.</div></div></div></div></div></blockquote>That sounds good.<br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">3. What is the best way to populate the scalar to vector functions map? For now, I populate it when the target-triple contains ‘Accelerate’. If we have a lot of other cases, we might think of something different here.</div></div></div></blockquote><div class=""><br class=""></div>We use the environment substring of the target tuple (in this case quadruple). This is in spirit to what we are already to conditionally turn on/off features in TargetLibraryInfo. I think we should address the issue of scalability when we have a client where this gets to be an issue.</div></div></div></div></blockquote>Sounds good.<br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""></div><div class="">Any feedback or ideas are appreciated!</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div><div class=""><br class=""></div><div class="">[1]: <a href="http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/169736" class="">http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/169736</a></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""></div></div><span id="cid:72C9864E-B093-4908-B077-E26C83CD717A@apple.com" class=""><0001-Make-ToVectorTy-static.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:F3E4FF9B-FA63-49D4-99B4-D4C4C2A9011D@apple.com" class=""><0002-Fix-a-copy-paste-bug.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:8C2AE955-87AD-4C6D-8FFC-E33C0CDE2025@apple.com" class=""><0003-TLI-Use-lambda.-NFC.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:1271C1E2-A213-46C6-833D-E7F4F95D921A@apple.com" class=""><0004-LoopVectorizer-Add-TargetTransformInfo.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:10AF974B-E24C-47DD-8B86-AEBCD1C454C3@apple.com" class=""><0005-TLI-Factor-out-sanitizeFunctionName.-NFC.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:15AF3830-1C84-49D2-BB51-2DF8A6F87ED6@apple.com" class=""><0006-TLI-Add-interface-for-querying-whether-a-function-is.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:A97C03FA-C950-4BA5-8016-C7FF76DD6A58@apple.com" class=""><0007-TTI-Add-getCallInstrCost.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:EECD50FA-D49F-4131-BD52-D52D1CCBCEC0@apple.com" class=""><0008-LoopVectorize-teach-loop-vectorizer-to-vectorize-cal.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:E4B0D28F-4DC2-4981-9A80-663D4CBCA5E9@apple.com" class=""><0009-TTI-Honour-cost-model-for-estimating-cost-of-vector-.patch></span><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div class=""></div></div><span id="cid:AA4F87BF-D1C9-4C6E-B9DD-5FA84F41C58B@apple.com" class=""><0010-TLI-Add-several-entries-to-VectorizableFunctions-tab.patch></span><meta http-equiv="Content-Type" content="text/html charset=us-ascii" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""></div></div></div></blockquote><br class=""></div><div class="">Some comments on the code:</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div class="">This function could use some documentation. What are Tys? (Argument types).</div></div></div></div></div></blockquote>I’ll add some, thanks.<br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><div class=""><br class=""></div><div class=""><div class="">+ unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys,</div><div class="">+ const TargetLibraryInfo *TLI) {</div><div class="">+</div><div class="">+ // Scalar function calls are always expensive.</div><div class="">+ if (!RetTy->isVectorTy())</div><div class="">+ return 10;</div><div class="">+</div><div class="">+ // Functions with a vector form are no more expensive than a scalar call.</div><div class="">+ if (TLI &&</div><div class="">+ TLI->isFunctionVectorizable(F->getName(),</div><div class="">+ RetTy->getVectorNumElements()))</div><div class="">+ return 10;</div><div class="">+</div><div class="">+ // We have to scalarize this function call. Estimate the cost.</div><div class="">+ unsigned ScalarizationCost = getScalarizationOverhead(RetTy, true, false);</div><div class="">+ unsigned ScalarCalls = RetTy->getVectorNumElements();</div><div class="">+ for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {</div><div class="">+ if (Tys[i]->isVectorTy()) {</div><div class="">+ ScalarizationCost += getScalarizationOverhead(Tys[i], false, true);</div><div class=""><br class=""></div><div class="">This is not obvious. Why could we have more elements in the vector type of a function arguments than the vector return type when vectorizing a function such that we need more function calls when scalarizing?</div></div></div></div></div></div></blockquote>It’s an attempt to handle ‘unusual’ functions here. Possible examples would be converts, where input element size differs from output element size, or reductions.I think that for now we can just assert that we only deal with functions where operand types are the same as the result type.<br class=""><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><div class=""><div class=""><br class=""></div><div class="">+ ScalarCalls = std::max(ScalarCalls, Tys[i]->getVectorNumElements());</div><div class="">+ }</div><div class="">+ }</div><div class="">+</div><div class="">+ return ScalarCalls * 10 + ScalarizationCost;</div><div class="">+ }</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">There seems to be some repetition in the code below. Can we refactor that?</div></div></div></div></div></blockquote>After re-examining this code I spotted strange part (I didn’t change it in the patches):</div><div> Type *Tys[] = {CI->getType()};<br class=""> if (VF > 1)<br class=""><div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div class=""><div class=""><div class=""><div class=""> Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);</div><div class="">I assume that only works for instrinsics with one argument, and those are the only cases we handle now. But shouldn’t we guard it with an assert or try to handle general case here?</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div></div></div></div></div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><div class=""><br class=""></div><div class=""><div class="">+ default:</div><div class="">+ bool HasScalarOpd = hasVectorInstrinsicScalarOpd(ID, 1);</div><div class="">+ for (unsigned Part = 0; Part < UF; ++Part) {</div><div class="">+ SmallVector<Value *, 4> Args;</div><div class="">+ for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {</div><div class="">+ if (HasScalarOpd && i == 1) {</div><div class="">+ Args.push_back(CI->getArgOperand(i));</div><div class="">+ continue;</div><div class="">+ }</div><div class="">+ VectorParts &Arg = getVectorValue(CI->getArgOperand(i));</div><div class="">+ Args.push_back(Arg[Part]);</div><div class="">+ }</div><div class="">+ Type *Tys[] = {CI->getType()};</div><div class="">+ if (VF > 1)</div><div class="">+ Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);</div><div class="">+</div><div class="">+ Function *F = Intrinsic::getDeclaration(M, ID, Tys);</div><div class="">+ Entry[Part] = Builder.CreateCall(F, Args);</div><div class="">+ }</div><div class="">+</div><div class="">+ propagateMetadata(Entry, it);</div><div class="">+ break;</div><div class="">+ }</div><div class="">+ // Otherwise, perform the lib call.</div><div class="">+ } else if (TLI && TLI->isFunctionVectorizable(FnName, VF)) {</div><div class="">+ // This is a function with a vector form.</div><div class="">+ StringRef VFnName = TLI->getVectorizedFunction(FnName, VF);</div><div class="">+ assert(!VFnName.empty());</div><div class="">+</div><div class="">+ Function *VectorF = M->getFunction(VFnName);</div><div class="">+ if (!VectorF) {</div><div class="">+ // Generate a declaration</div><div class="">+ FunctionType *FTy = FunctionType::get(RetTy, Tys, false);</div><div class="">+ VectorF =</div><div class="">+ Function::Create(FTy, Function::ExternalLinkage, VFnName, M);</div><div class="">+ assert(VectorF);</div><div class="">+ }</div><div class="">+</div><div class=""> for (unsigned Part = 0; Part < UF; ++Part) {</div><div class=""> SmallVector<Value *, 4> Args;</div><div class=""> for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {</div><div class="">- if (HasScalarOpd && i == 1) {</div><div class="">- Args.push_back(CI->getArgOperand(i));</div><div class="">- continue;</div><div class="">- }</div><div class=""> VectorParts &Arg = getVectorValue(CI->getArgOperand(i));</div><div class=""> Args.push_back(Arg[Part]);</div><div class=""> }</div><div class="">- Type *Tys[] = {CI->getType()};</div><div class="">- if (VF > 1)</div><div class="">- Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);</div><div class="">-</div><div class="">- Function *F = Intrinsic::getDeclaration(M, ID, Tys);</div><div class="">- Entry[Part] = Builder.CreateCall(F, Args);</div><div class="">+ Entry[Part] = Builder.CreateCall(VectorF, Args);</div><div class=""> }</div><div class=""><br class=""></div></div><div class=""><br class=""></div></div><br class=""></div></div></div></blockquote></div><br class=""></div></body></html>