PS: As a small optimization, in the case where Amt is 32 or bigger (or whatever NVTBits is), you might want to use an "and" to mask off the top bits of Amt rather than subtracting 32. For Amt between 32 and 63 the two give the same value, and if Amt is 64 or greater then the result of the shift was undefined anyway, so it is ok to mask off all the upper bits.
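
To make the equivalence concrete, here is a minimal sketch in plain C++, not the actual legalizer code. It assumes a 64-bit shift-left expanded into two 32-bit halves; `kNVTBits`, `Halves`, `shlBigSub` and `shlBigMask` are hypothetical names made up for the illustration, and the mask trick relies on the half width being a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for NVTBits: the width of each half of the expanded value.
static const unsigned kNVTBits = 32;

// A 64-bit value held as two 32-bit halves (illustrative type, not an LLVM one).
struct Halves {
    uint32_t lo;
    uint32_t hi;
};

// Shift left by amt, valid for kNVTBits <= amt < 2 * kNVTBits.
// The low half becomes zero and the old low half moves into the high half.
// This version computes the remaining shift with a subtraction.
inline Halves shlBigSub(Halves v, unsigned amt) {
    Halves r;
    r.lo = 0u;
    r.hi = v.lo << (amt - kNVTBits);
    return r;
}

// Same operation, but the remaining shift is computed by masking off the
// top bits of amt. For amt in [32, 63], amt & 31 equals amt - 32.
inline Halves shlBigMask(Halves v, unsigned amt) {
    Halves r;
    r.lo = 0u;
    r.hi = v.lo << (amt & (kNVTBits - 1));
    return r;
}
```

Both functions agree for every Amt from 32 through 63. They differ only for Amt of 64 or more, where the original shift had no defined result to preserve.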