[LLVMdev] Optimizing math code
David Chisnall
David.Chisnall at cl.cam.ac.uk
Tue Feb 18 00:43:32 PST 2014
On 18 Feb 2014, at 03:13, Michael Hamburg <mike at shiftleft.org> wrote:
> Yes. However, I’m using a redundant representation, with 56-bit digits in a 64-bit limb. (The real code uses uint64_t’s instead of ints.) That way up to 8 bits of carries can accumulate, and I can propagate them all at a more opportune time.
Two things to note here:
1) Your representation makes it very hard to vectorise. If you were doing the accumulate on every cycle, then you wouldn't even need to treat them as a vector and they could simply be folded to the largest integer type that the target supports, so if the vector registers can contain 128-bit integers that would work too.
2) By deferring the carry calculation, there is a significant chance that you are inserting a timing attack into your crypto code. I'm finding it quite difficult to think of ways that it is possible to compile code using this representation that doesn't leak information about the key to sufficiently patient attackers. You might like to look up some of the problems with 64-bit multiply on older ARM systems for an idea why - many of those theoretical attacks are now practical for a remote attacker in a couple of hours.
David
(Who is glad he doesn't have to write crypto code)
More information about the llvm-dev
mailing list