RKSimon added a comment.

Almost there - the only other improvement I can think of would be to use vcmptrueps(undef, undef) for OptForSize. It doesn't break the dependency on the source register, so there would be a slight perf regression in exchange for dropping the vxorps.

https://reviews.llvm.org/D32416