[PATCH] Improved udivmodsi4 with support for ARMv4

Joerg Sonnenberger joerg at NetBSD.org
Thu Jan 23 04:43:33 PST 2014


  Tim:

  Just calling them in a loop with an incrementing numerator and a constant denominator, I get the following timings on an ARM1176JZ-S (2nd generation rpi).

  | denominator | old code | new code (-march=armv6) | new code (-march=armv4) |
  |       65534 |     9.43 |                    3.53 |                    4.06 |
  |          16 |     9.58 |                    3.52 |                    4.06 |
  |         128 |     8.17 |                    3.21 |                    3.76 |
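
  For reference, a loop of the kind described above might look roughly like the sketch below. This is not the harness that produced the numbers; the function name divmod_loop, the accumulator and the volatile trick are assumptions made for illustration. On an ARM target without a hardware divide instruction, the / and % operators are normally lowered to runtime library calls, which is how a loop like this would end up exercising the division routines.

```c
#include <stdint.h>

/* Hypothetical benchmark body (not the original harness): divide an
 * incrementing numerator by a constant denominator and accumulate the
 * results so the compiler cannot discard the divisions. */
uint32_t divmod_loop(uint32_t iterations, uint32_t denominator)
{
    /* volatile keeps the compiler from replacing the division with
     * shifts or a multiply by a precomputed reciprocal. */
    volatile uint32_t d = denominator;
    uint32_t acc = 0;

    for (uint32_t n = 0; n < iterations; ++n) {
        acc += n / d;   /* quotient */
        acc += n % d;   /* remainder */
    }
    return acc;
}
```

  Timing a call such as divmod_loop(count, 65534) for a large count, once per library build, would give per-denominator figures comparable in spirit to the table.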


================
Comment at: udivmodsi4.S:62
@@ +61,3 @@
+	sub	r3, r3, ip
+	/* r1 > r0 implies r3 >= 0. */
+	adr	ip, LOCAL_LABEL(div0block)
----------------
Renato Golin wrote:
> I'd have thought that the code would only get here if r0 >= r1, because of the BCC above.
> 
> If r1 > r0, this is a case for quotient0, no?
quotient0 is used for r0 < r1, so at this point r0 >= r1. This means r3 >= ip and therefore r3 - ip >= 0.
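
The inequality can be checked with a small C model. It assumes that r3 and ip hold the leading-zero counts of r1 and r0 respectively; the quoted hunk does not show how they are computed, so that is an assumption, and clz32 and shift_difference are hypothetical names used only here.

```c
#include <stdint.h>

/* Portable count of leading zero bits, standing in for a clz
 * instruction. Returns 32 for x == 0. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
}

/* Models `sub r3, r3, ip` under the assumption r3 = clz(r1) and
 * ip = clz(r0). For r0 >= r1 > 0 the result is never negative,
 * because a larger value cannot have more leading zeros. */
int shift_difference(uint32_t r0, uint32_t r1)
{
    return (int)clz32(r1) - (int)clz32(r0);
}
```

For example, with r0 = 100 and r1 = 7 the model gives 29 - 25 = 4, and with r0 == r1 it gives 0.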


http://llvm-reviews.chandlerc.com/D2595


