<div dir="ltr">Thanks Fiona and Stephen, let me go over this. I've been reading the Intel optimization manual rather than Agner's docs.<div><br></div><div>The bug actually has 2 parts:</div><div><br></div><div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div>(1) In LowerToBT() there is some logic that decides whether to use a BT or a TEST, but it gets the dividing line between the two wrong:</div><div><br></div><div><div>11994,11995c11994,11997</div><div>< // Use BT if the immediate can't be encoded in a TEST instruction.</div><div>< if (!isUInt<32>(AndRHSVal) && isPowerOf2_64(AndRHSVal)) {</div><div>---</div><div>> // 16 bit mode: 32 bit mode: 64 bit mode:</div><div>> // TEST reg,imm 24b TEST reg,imm 40b TEST reg,imm 80b</div><div>> // BT reg,imm 32b BT reg,imm 32b BT reg,imm 40b</div><div>> if (AndRHSVal >= 256 && isPowerOf2_64(AndRHSVal)) {</div></div><div><br></div></blockquote></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px">(2) For an expression like (var & (1 << 37)) Clang emits LSHR/AND.<br>This actually confuses the hell out of LLVM, which generates fairly similar assembly even at -O3. It would be better for Clang to emit AND reg, #(1<<37) and let LLVM recognize and optimize that.<br></blockquote><div><br></div>You can argue whether BT should be emitted, and I'll work through the Intel/Agner docs. But LLVM is generating bad code from bad IR. I'll have a patch for Clang later today and you can look these over.<div><br><div><div>BTW, macrofusion of TEST applies only to conditional jumps, not to CMOV.<div class="gmail_extra">
</div></div></div></div></div>
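<div dir="ltr">To make the change in (1) concrete, here is a minimal standalone sketch of the two conditions from the diff. It is not the LLVM code itself: isPowerOfTwo and fitsInUInt32 are hypothetical stand-ins for isPowerOf2_64 and isUInt<32>, and useBTOld, useBTNew and bitIndex are names made up for illustration.</div>

```cpp
#include <cstdint>

// Hypothetical stand-in for LLVM's isPowerOf2_64: exactly one bit set.
constexpr bool isPowerOfTwo(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical stand-in for LLVM's isUInt<32>: value fits in 32 unsigned bits.
constexpr bool fitsInUInt32(std::uint64_t v) {
  return v <= 0xFFFFFFFFull;
}

// Condition before the patch: BT only when the single-bit mask
// does not fit in a 32-bit immediate.
constexpr bool useBTOld(std::uint64_t mask) {
  return !fitsInUInt32(mask) && isPowerOfTwo(mask);
}

// Condition after the patch: BT for any single-bit mask of 256 or more,
// i.e. any single bit at index 8 or above.
constexpr bool useBTNew(std::uint64_t mask) {
  return mask >= 256 && isPowerOfTwo(mask);
}

// Bit index of a single-bit mask; this is the immediate a BT would test.
// Assumes the caller has already checked isPowerOfTwo(mask).
constexpr unsigned bitIndex(std::uint64_t mask) {
  unsigned n = 0;
  while (mask > 1) {
    mask >>= 1;
    ++n;
  }
  return n;
}
```

<div dir="ltr">With these definitions, a mask of 1 << 37 selects BT under both conditions, 1 << 10 selects BT only under the new one, and 1 << 3 stays a TEST under both.</div>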