<div dir="ltr">Hi LLVM,<div><br></div><div>I think the X86 target does not fully exploit the potential of the divq instruction.</div><div><br></div><div>The following IR can be lowered to one divq instruction:</div><div><div>define i64 @div128by64lo(i128 %d, i64 %n) nounwind readnone {</div><div>  %m = zext i64 %n to i128</div><div>  %q = udiv i128 %d, %m</div><div>  %q.l = trunc i128 %q to i64</div><div>  ret i64 %q.l</div><div>}</div></div><div><br></div><div>And this one can be lowered to two divq instructions:</div><div><div>define i128 @div128by64full(i128 %d, i64 %n) nounwind readnone {</div><div>  %m = zext i64 %n to i128</div><div>  %q = udiv i128 %d, %m</div><div>  ret i128 %q</div><div>}</div></div><div><br></div><div>In the current implementation, wherever the i128 type shows up in a udiv, codegen emits a call to the __udivti3 builtin function.</div><div><br></div><div>Am I missing something?</div><div>- Paweł</div></div>
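<div dir="ltr"><div>For illustration, here is a minimal C sketch of the two-step decomposition behind the div128by64full case. It is not taken from LLVM: the helper divq_model and the typedef u128 are hypothetical names, and unsigned __int128 is assumed to be available as a GCC/Clang extension. divq divides the 128-bit value in rdx:rax by a 64-bit operand and raises #DE when the quotient does not fit in 64 bits, so the sketch first reduces the high half modulo %n. A single divq would only be enough when the high half of %d is already known to be smaller than %n.</div></div>

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: unsigned __int128 is a GCC/Clang extension, not standard C. */
typedef unsigned __int128 u128;

/* Hypothetical model of one divq: divides the 128-bit value hi:lo by n.
 * The hardware instruction raises #DE unless hi < n, so the model asserts
 * that precondition instead of faulting. */
static uint64_t divq_model(uint64_t hi, uint64_t lo, uint64_t n, uint64_t *rem)
{
    assert(n != 0 && hi < n);
    u128 d = ((u128)hi << 64) | lo;
    *rem = (uint64_t)(d % n);
    return (uint64_t)(d / n);
}

/* Full 128-bit quotient of d / n built from two modelled divq steps. */
static u128 div128by64full(u128 d, uint64_t n)
{
    uint64_t hi = (uint64_t)(d >> 64);
    uint64_t lo = (uint64_t)d;
    uint64_t r, r2;

    /* Step 1: 0:hi / n gives the high quotient word and r = hi % n. */
    uint64_t q_hi = divq_model(0, hi, n, &r);
    /* Step 2: r < n holds here, so r:lo / n cannot overflow 64 bits. */
    uint64_t q_lo = divq_model(r, lo, n, &r2);

    return ((u128)q_hi << 64) | q_lo;
}

/* Low 64 bits of the quotient, mirroring the trunc in the IR. */
static uint64_t div128by64lo(u128 d, uint64_t n)
{
    return (uint64_t)div128by64full(d, n);
}
```

<div dir="ltr"><div>The model itself uses the compiler's own 128-bit division, so it only demonstrates the arithmetic of the decomposition, not the code a backend would emit.</div></div>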