<div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><span style="font-family:verdana,sans-serif">SGTM. </span><br></div><div><div style="font-family:verdana,sans-serif">Providing a fixed set of replacements for specific intrinsics is all NVPTX needs now.</div><div style="font-family:verdana,sans-serif">Expanding intrinsics late may miss some optimization opportunities, </div><div style="font-family:verdana,sans-serif">so we may consider doing it earlier and/or more than once, in case we happen to materialize new intrinsics in the later passes.</div></div></div></div></blockquote><div><br></div><div>Good old phase ordering. I don't think we've got any optimisations that target the nv/oc named functions, and I would personally prefer never to implement any.</div><div><br></div><div>We do have some that target llvm.libm, and some that target extern C functions with the same names as the libm functions. There's some code in clang that converts some libm functions into llvm intrinsics, and I think some other code in clang that converts in the other direction, possibly depending on various math flags.</div><div><br></div><div>So it seems we either canonicalise libm-like code and rearrange optimisations to work on the canonical form, or we write optimisations that know there are N names for essentially the same function. I'd prefer the canonical form approach, e.g. we could rewrite calls to __nv_sin into calls to sin early on in the pipeline (or perhaps ignore them, since applications seem likely to call libm functions directly), and rewrite calls to sin into calls to __nv_sin late on, with optimisations written against sin.</div><div><br></div><div>Thanks!</div><div><br></div></div></div>