<div class="gmail_quote">On Sat, Dec 24, 2011 at 10:25 PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com">craig.topper@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Author: ctopper<br>
Date: Sun Dec 25 00:25:37 2011<br>
New Revision: 147263<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=147263&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=147263&view=rev</a><br>
Log:<br>
Add intrinsics for lzcnt and tzcnt instructions.<br></blockquote><div><br></div><div>I have similar concerns as Benjamin, these appear to be the wrong semantics. More specific comments below.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+// LZCNT<br>
+BUILTIN(__builtin_clzs, "UsUs", "")<br></blockquote><div><br></div><div>What has this to do with lzcnt? This is just a short version of __builtin_clz, which we can implement for x86 w/o lzcnt....</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+<br>
+// BMI<br>
+BUILTIN(__builtin_ctzs, "UsUs", "")<br></blockquote><div><br></div><div>Same question. I don't understand your comments or why these builtins are grouped separately.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+ case X86::BI__builtin_clzs: {<br>
+ Value *ArgValue = EmitScalarExpr(E->getArg(0));<br>
+<br>
+ llvm::Type *ArgType = ArgValue->getType();<br>
+ Value *F = CGM.getIntrinsic(Intrinsic::ctlz, ArgType);<br>
+<br>
+ llvm::Type *ResultType = ConvertType(E->getType());<br>
+ Value *Result = Builder.CreateCall2(F, ArgValue, Builder.getTrue());<br></blockquote><div><br></div><div>If you're intending to provide builtins which model the semantics of LZCNT and TZCNT (a noble goal), then this is wrong. You want False here as Benjamin alluded to, as both of those instructions provide *defined* behavior for zero inputs.</div>
<div><br></div><div>I feel like this patch has two intermixed components:</div><div><br></div><div>1) extend the existing __builtin_clz?(...) intrinsics to support shorts</div><div>2) add intrinsics based around the LZCNT and TZCNT instructions.</div>
<div><br></div><div>#1 is fine, but should be commented as having the same bizarre semantics as the other __builtin_clz functions, and grouped with them. There is nothing specific to LZCNT or TZCNT here.</div><div><br></div>
<div>#2 is actually fine with the existing intrinsics, but you need to define them a bit differently:</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+static __inline__ unsigned short __attribute__((__always_inline__, __nodebug__))<br>
+__tzcnt16(unsigned short __X)<br>
+{<br>
+ return __builtin_ctzs(__X);<br>
+}<br>
+<br>
+static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__))<br>
+__tzcnt32(unsigned int __X)<br>
+{<br>
+ return __builtin_ctz(__X);<br>
+}<br>
+<br>
+#ifdef __x86_64__<br>
+static __inline__ unsigned long long __attribute__((__always_inline__, __nodebug__))<br>
+__tzcnt64(unsigned long long __X)<br>
+{<br>
+ return __builtin_ctzll(__X);<br>
+}<br></blockquote><div><br></div><div>These all need to follow the pattern:</div><div><br></div><div> return (__X == 0) ? sizeof(X) * 8 : __builtin_ctz*(__X);</div><div><br></div><div>As otherwise, a zero input produces an undefined result.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">+static __inline__ unsigned short __attribute__((__always_inline__, __nodebug__))<br>
+__lzcnt16(unsigned short __X)<br>
+{<br>
+ return __builtin_clzs(__X);<br>
+}<br>
+<br>
+static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__))<br>
+__lzcnt32(unsigned int __X)<br>
+{<br>
+ return __builtin_clz(__X);<br>
+}<br>
+<br>
+#ifdef __x86_64__<br>
+static __inline__ unsigned long long __attribute__((__always_inline__, __nodebug__))<br>
+__lzcnt64(unsigned long long __X)<br>
+{<br>
+ return __builtin_clzll(__X);<br>
+}<br></blockquote><div><br></div><div>These also need the ?: treatment to give them a defined result for zero inputs.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
==============================================================================<br>
--- cfe/trunk/test/CodeGen/bmi-builtins.c (added)<br>
+++ cfe/trunk/test/CodeGen/bmi-builtins.c Sun Dec 25 00:25:37 2011<br>
@@ -0,0 +1,24 @@<br>
+// RUN: %clang_cc1 %s -O3 -triple=x86_64-apple-darwin -target-feature +bmi -S -o - | FileCheck %s<br>
+<br>
+// Don't include mm_malloc.h, it's system specific.<br>
+#define __MM_MALLOC_H<br>
+<br>
+#include <x86intrin.h><br>
+<br>
+unsigned short test__tzcnt16(unsigned short __X)<br>
+{<br>
+ // CHECK: tzcntw<br>
+ return __tzcnt16(__X);<br></blockquote><div><br></div><div>Please do not test x86 instructions inside the Clang test suite. These tests need to be defined in terms of the LLVM IR the produce, not in terms of the x86 instructions they produce, even though they are x86-specific intrinsics. Otherwise, random and unrelated optimizations in LLVM may at any time break these tests without cause.</div>
</div>