r205436 - Extend the SSE2 comment lexing to AVX2. Only 16byte align when not on AVX2.
Roman Divacky
rdivacky at freebsd.org
Thu Apr 3 10:38:20 PDT 2014
On Thu, Apr 03, 2014 at 10:13:15AM +0100, Jay Foad wrote:
> Hi Roman,
>
> On 2 April 2014 18:27, Roman Divacky <rdivacky at freebsd.org> wrote:
> > #ifdef __SSE2__
> > - __m128i Slashes = _mm_set1_epi8('/');
> > - while (CurPtr+16 <= BufferEnd) {
> > - int cmp = _mm_movemask_epi8(_mm_cmpeq_epi8(*(const __m128i*)CurPtr,
> > - Slashes));
> > +#define VECTOR_TYPE __m128i
> > +#define SET1_EPI8(v) _mm_set1_epi8(v)
> > +#define CMPEQ_EPI8(v1,v2) _mm_cmpeq_epi8(v1,v2)
> > +#define MOVEMASK_EPI8(v) _mm_movemask_epi8(v)
> > +#define STEP 16
> > +#elif __AVX2__
> > +#define VECTOR_TYPE __m256i
> > +#define SET1_EPI8(v) _mm256_set1_epi8(v)
> > +#define CMPEQ_EPI8(v1,v2) _mm256_cmpeq_epi8(v1,v2)
> > +#define MOVEMASK_EPI8(v) _mm256_movemask_epi8(v)
> > +#define STEP 32
> > +#endif
>
> Surely any machine with AVX2 also has SSE2, and if both are defined
> then your code will prefer to use the SSE2 intrinsics. This doesn't
> seem right. Am I missing something?
You're absolutely right. I fixed that, rebenchmarked, and now there's
no difference at all. I have no explanation for the earlier 3% speedup
(though it was measured at 95% significance over 10 samples).
I'll revert my commit. Sorry for the noise.
Roman
More information about the cfe-commits mailing list