Hi Kevin,

I think the load/store intrinsics are unnecessary; you can use a normal load or store instruction here quite happily (perhaps with a bitcast around it, though even that's probably excessive).

The pmull changes look reasonable though.

Cheers.

Tim.

http://llvm-reviews.chandlerc.com/D2344