<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Eric,<div class=""><br class=""></div><div class="">I guess you’re talking about 0009 patch, where I changed cost model for intrinsics and adjusted tests accordingly (please correct me if I’m wrong).</div><div class=""><br class=""></div><div class="">We have several constants in this patch (again, I’m not sure which of them are you talking about, so I’ll try to explain all of them):</div><div class=""><br class=""></div><div class="">1. 10 in 'return ScalarCalls * 10 + ScalarizationCost’. The constant 10 is indeed magic here, but this constant is used everywhere for a cost of a call. It might be a good idea to parametrize it, but I’d say that would be an independent change then.</div><div class=""><br class=""></div><div class="">2. Concrete costs in the tests, like these:</div><div class=""><div class="">-; CORE2: Cost Model: Found an estimated cost of 400 for instruction:   %2 = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> %wide.load)</div><div class="">+; CORE2: Cost Model: Found an estimated cost of 46 for instruction:   %2 = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> %wide.load)</div><div class=""><br class=""></div><div class="">The idea of the test is to check that if an intrinsics maps well to a vector instruction, it should be much cheaper, than in the case where it should be scalarized. In the original test the cheap cost is 1, and the expensive cost is 400. In the modified test the cheap cost is 1, and the expensive cost is 46. I think we can replace ’46’ with a regexp {{[1-9][0-9]+}}, which will match 2-digit numbers, and that would be more general. What do you think?</div></div><div class=""><br class=""></div><div class="">3. Constants in scalarize.ll tests, e.g. these:</div><div class=""><div class="">-; CHECK32: cost of 12 {{.*}}bswap.v4i32</div><div class="">-; CHECK64: cost of 12 {{.*}}bswap.v4i32</div><div class="">+; CHECK32: cost of 48 {{.*}}bswap.v4i32</div><div class="">+; CHECK64: cost of 48 {{.*}}bswap.v4i32</div></div><div class=""><br class=""></div><div class="">This test checks the specific costs, so if we change the cost model, we need to change constants here. Some constants have changed quite significantly in this test, that’s why I especially wanted to hear opinions on the change of the cost model. I still need to measure performance impact of this change, but if there are no regressions, I’d assume nobody really relies on exactly these values, and the new ones would serve perfectly fine.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">It also worth noticing, that it’s not just a random change in the cost model - the intention is to make it more accurate. We have parameters like ScalarizationOverhead, we know number of scalar calls and we know cost of each scalar cost, but we didn’t use that info, and ended up with overly conservative cost (10 * CostOfCall * NumOfCalls). With this change the new formula is (CostOfCall*NumOfCall + ScalarizationCost), which should be pretty accurate, providing that CostOfCall and ScalarizationCall are accurate.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Feb 28, 2015, at 1:14 PM, Eric Christopher <<a href="mailto:echristo@gmail.com" class="">echristo@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><br class=""><br class=""><div class="gmail_quote">On Fri, Feb 27, 2015 at 4:10 PM Michael Zolotukhin <<a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a>> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hi Arnold,<div class=""><br class=""></div><div class="">These are sligthly updated patches 0007 and 0008 - I added some description for getCallInstrCost and refactored changes in loop vectorizer. I don’t know if these changes actually look better now, but at least we don’t have duplicated code :)</div><div class=""><br class=""></div><div class=""></div></div><div style="word-wrap:break-word" class=""><div class=""></div></div><div style="word-wrap:break-word" class=""><div class=""></div><div class=""><br class=""></div></div></blockquote><div class=""><br class=""></div><div class="">How'd you get the magic constants?</div><div class=""><br class=""></div><div class="">-eric</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""></div><div class="">What is the best way to proceed with this set of the patches? Should I create a separate phabricator review for each of them?</div><div class=""><br class=""></div><div class="">Also, the 0002 patch is an independent bug fix - is it ok to commit it?</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div></div><div style="word-wrap:break-word" class=""><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Feb 26, 2015, at 4:57 PM, Michael Zolotukhin <<a href="mailto:mzolotukhin@apple.com" target="_blank" class="">mzolotukhin@apple.com</a>> wrote:</div><br class=""><div class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important" class="">Hi Arnold,</span><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Feb 26, 2015, at 1:49 PM, Arnold Schwaighofer <<a href="mailto:aschwaighofer@apple.com" target="_blank" class="">aschwaighofer@apple.com</a>> wrote:</div><br class=""><div class=""><div style="word-wrap:break-word" class="">Hi Michael,<div class=""><br class=""></div><div class="">thanks for following up on this. I think it is important to move forward with this.</div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Feb 26, 2015, at 12:01 PM, Michael Zolotukhin <<a href="mailto:mzolotukhin@apple.com" target="_blank" class="">mzolotukhin@apple.com</a>> wrote:</div><br class=""><div class=""><div style="word-wrap:break-word" class="">Hi,<div class=""><br class=""></div><div class="">Some time ago James posted a patch for vectorization of calls [1]. Though it seems to be a missing feature in the vectorizer, it wasn’t committed, so I decided to revive the discussion.</div><div class=""><br class=""></div><div class="">To begin with, I rebased the patch and split it to several semantically separated parts. The approach used here is pretty natural: TargetLibraryInfo has a mapping for a set of scalar functions to their vector counterparts, TargetTransformInfo consults with it when cost of call instruction is calculated, if the vector-version is cheaper, vectorizer emits vector-function.</div><div class=""><br class=""></div><div class="">There are several question on this patch:</div><div class="">1. Is this approach still viable in general and worth pursuing?</div></div></div></blockquote><div class=""><br class=""></div>I believe this is still important. We want to be able to vectorize function calls if vectorized versions of math library functions are available.</div></div></div></div></blockquote>Thanks, I’ll continue digging this then:)<br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class="">2. The cost model is changed. In theory it should become more accurate, but I haven’t measured performance impact of this change yet. Does these changes look good in general? If yes, do we need to tune some target-cost-model parameters, or those are the values that no one has been actually using yet?</div></div></div></blockquote><div class=""><br class=""></div><div class="">I think we will have to test performance impact of this change since we are most likely the first clients. We should test with/without the target tuple that enables the vectorized function calls. I think if we don’t see egregious regressions in the run that uses the modified cost model but is a run without vectorized versions of the math library being available we can commit this patch set such that it gets testing by the community.</div></div></div></div></div></blockquote>That sounds good.<br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class="">3. What is the best way to populate the scalar to vector functions map? For now, I populate it when the target-triple contains ‘Accelerate’. If we have a lot of other cases, we might think of something different here.</div></div></div></blockquote><div class=""><br class=""></div>We use the environment substring of the target tuple (in this case quadruple). This is in spirit to what we are already to conditionally turn on/off features in TargetLibraryInfo. I think we should address the issue of scalability when we have a client where this gets to be an issue.</div></div></div></div></blockquote>Sounds good.<br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class=""><br class=""></div><div class="">Any feedback or ideas are appreciated!</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div><div class=""><br class=""></div><div class="">[1]: <a href="http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/169736" target="_blank" class="">http://thread.gmane.org/gmane.comp.compilers.llvm.cvs/169736</a></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""></div></div><span class=""><0001-Make-ToVectorTy-static.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0002-Fix-a-copy-paste-bug.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0003-TLI-Use-lambda.-NFC.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0004-LoopVectorizer-Add-TargetTransformInfo.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0005-TLI-Factor-out-sanitizeFunctionName.-NFC.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0006-TLI-Add-interface-for-querying-whether-a-function-is.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0007-TTI-Add-getCallInstrCost.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0008-LoopVectorize-teach-loop-vectorizer-to-vectorize-cal.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0009-TTI-Honour-cost-model-for-estimating-cost-of-vector-.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div><span class=""><0010-TLI-Add-several-entries-to-VectorizableFunctions-tab.patch></span><div style="word-wrap:break-word" class=""><div class=""></div></div></div></blockquote><br class=""></div><div class="">Some comments on the code:</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div class="">This function could use some documentation. What are Tys? (Argument types).</div></div></div></div></div></blockquote>I’ll add some, thanks.<br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class=""><div class=""><div class=""><br class=""></div><div class=""><div class="">+  unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys,</div><div class="">+                            const TargetLibraryInfo *TLI) {</div><div class="">+</div><div class="">+    // Scalar function calls are always expensive.</div><div class="">+    if (!RetTy->isVectorTy())</div><div class="">+      return 10;</div><div class="">+</div><div class="">+    // Functions with a vector form are no more expensive than a scalar call.</div><div class="">+    if (TLI &&</div><div class="">+        TLI->isFunctionVectorizable(F->getName(),</div><div class="">+                                    RetTy->getVectorNumElements()))</div><div class="">+      return 10;</div><div class="">+</div><div class="">+    // We have to scalarize this function call. Estimate the cost.</div><div class="">+    unsigned ScalarizationCost = getScalarizationOverhead(RetTy, true, false);</div><div class="">+    unsigned ScalarCalls = RetTy->getVectorNumElements();</div><div class="">+    for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {</div><div class="">+      if (Tys[i]->isVectorTy()) {</div><div class="">+        ScalarizationCost += getScalarizationOverhead(Tys[i], false, true);</div><div class=""><br class=""></div><div class="">This is not obvious. Why could we have more elements in the vector type of a function arguments than the vector return type when vectorizing a function such that we need more function calls when scalarizing?</div></div></div></div></div></div></blockquote>It’s an attempt to handle ‘unusual’ functions here. Possible examples would be converts, where input element size differs from output element size, or reductions.I think that for now we can just assert that we only deal with functions where operand types are the same as the result type.<br class=""><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class=""><div class=""><div class=""><div class=""><br class=""></div><div class="">+        ScalarCalls = std::max(ScalarCalls, Tys[i]->getVectorNumElements());</div><div class="">+      }</div><div class="">+    }</div><div class="">+</div><div class="">+    return ScalarCalls * 10 + ScalarizationCost;</div><div class="">+  }</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">There seems to be some repetition in the code below. Can we refactor that?</div></div></div></div></div></blockquote>After re-examining this code I spotted strange part (I didn’t change it in the patches):</div><div class="">          Type *Tys[] = {CI->getType()};<br class="">          if (VF > 1)<br class=""><div style="word-wrap:break-word" class=""><div class=""><div class=""><div class=""><div class="">            Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);</div><div class="">I assume that only works for instrinsics with one argument, and those are the only cases we handle now. But shouldn’t we guard it with an assert or try to handle general case here?</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div></div></div></div></div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap:break-word" class=""><div class=""><div class=""><div class=""><br class=""></div><div class=""><div class="">+        default:</div><div class="">+          bool HasScalarOpd = hasVectorInstrinsicScalarOpd(ID, 1);</div><div class="">+          for (unsigned Part = 0; Part < UF; ++Part) {</div><div class="">+            SmallVector<Value *, 4> Args;</div><div class="">+            for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {</div><div class="">+              if (HasScalarOpd && i == 1) {</div><div class="">+                Args.push_back(CI->getArgOperand(i));</div><div class="">+                continue;</div><div class="">+              }</div><div class="">+              VectorParts &Arg = getVectorValue(CI->getArgOperand(i));</div><div class="">+              Args.push_back(Arg[Part]);</div><div class="">+            }</div><div class="">+            Type *Tys[] = {CI->getType()};</div><div class="">+            if (VF > 1)</div><div class="">+              Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);</div><div class="">+</div><div class="">+            Function *F = Intrinsic::getDeclaration(M, ID, Tys);</div><div class="">+            Entry[Part] = Builder.CreateCall(F, Args);</div><div class="">+          }</div><div class="">+</div><div class="">+          propagateMetadata(Entry, it);</div><div class="">+          break;</div><div class="">+        }</div><div class="">+        // Otherwise, perform the lib call.</div><div class="">+      } else if (TLI && TLI->isFunctionVectorizable(FnName, VF)) {</div><div class="">+        // This is a function with a vector form.</div><div class="">+        StringRef VFnName = TLI->getVectorizedFunction(FnName, VF);</div><div class="">+        assert(!VFnName.empty());</div><div class="">+</div><div class="">+        Function *VectorF = M->getFunction(VFnName);</div><div class="">+        if (!VectorF) {</div><div class="">+          // Generate a declaration</div><div class="">+          FunctionType *FTy = FunctionType::get(RetTy, Tys, false);</div><div class="">+          VectorF =</div><div class="">+              Function::Create(FTy, Function::ExternalLinkage, VFnName, M);</div><div class="">+          assert(VectorF);</div><div class="">+        }</div><div class="">+</div><div class="">         for (unsigned Part = 0; Part < UF; ++Part) {</div><div class="">           SmallVector<Value *, 4> Args;</div><div class="">           for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {</div><div class="">-            if (HasScalarOpd && i == 1) {</div><div class="">-              Args.push_back(CI->getArgOperand(i));</div><div class="">-              continue;</div><div class="">-            }</div><div class="">             VectorParts &Arg = getVectorValue(CI->getArgOperand(i));</div><div class="">             Args.push_back(Arg[Part]);</div><div class="">           }</div><div class="">-          Type *Tys[] = {CI->getType()};</div><div class="">-          if (VF > 1)</div><div class="">-            Tys[0] = VectorType::get(CI->getType()->getScalarType(), VF);</div><div class="">-</div><div class="">-          Function *F = Intrinsic::getDeclaration(M, ID, Tys);</div><div class="">-          Entry[Part] = Builder.CreateCall(F, Args);</div><div class="">+          Entry[Part] = Builder.CreateCall(VectorF, Args);</div><div class="">         }</div><div class=""><br class=""></div></div><div class=""><br class=""></div></div><br class=""></div></div></div></blockquote></div><br class=""></div><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important" class="">_______________________________________________</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important" class="">llvm-commits mailing list</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><a href="mailto:llvm-commits@cs.uiuc.edu" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank" class="">llvm-commits@cs.uiuc.edu</a><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a></div></blockquote></div><br class=""></div></div>______________________________<u class=""></u>_________________<br class="">

llvm-commits mailing list<br class="">

<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank" class="">llvm-commits@cs.uiuc.edu</a><br class="">

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank" class="">http://lists.cs.uiuc.edu/<u class=""></u>mailman/listinfo/llvm-commits</a><br class="">

</blockquote></div></div>

</div></blockquote></div><br class=""></div></body></html>