[PATCH] D14761: [X86][SSE] Detect AVG pattern during instruction combine for SSE2/AVX2/AVX512BW.

Sun Nov 22 13:26:47 PST 2015

congh added a comment.

In http://reviews.llvm.org/D14761#294566, @RKSimon wrote:

> Out of curiosity - how well does this work if InstCombiner::visitCallInst is used to convert _mm_avg_epu16 (etc.) calls to general IR? It should constant fold if possible - but could the lowering work if only one input is constant?

I didn't consider the case where one input is constant. In that case we are detecting (a + C) / 2, where C is an 8-bit constant greater than zero. Since PAVG computes (x + y + 1) >> 1, we could then perform PAVGW on a and C-1. I will update this patch to handle this case. Thanks!
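To make the constant case concrete, here is a minimal sketch that models the rounding average in plain Python and checks the identity by brute force. It is not code from the patch: the function names (pavg, avg_with_constant, identity_holds) are hypothetical, and the model assumes the usual PAVGB/PAVGW semantics of (x + y + 1) >> 1 evaluated in widened precision.

```python
def pavg(x: int, y: int, bits: int = 8) -> int:
    """Model of an unsigned rounding average on one lane: (x + y + 1) >> 1.

    Python integers do not overflow, which stands in for the widened
    intermediate precision the hardware instruction uses.
    """
    mask = (1 << bits) - 1
    if not (0 <= x <= mask and 0 <= y <= mask):
        raise ValueError("operands must fit in the lane width")
    return (x + y + 1) >> 1


def avg_with_constant(a: int, c: int) -> int:
    """The pattern under discussion: (a + C) / 2 with no overflow on the add."""
    return (a + c) >> 1


def identity_holds(bits: int = 8) -> bool:
    """Check (a + C) / 2 == pavg(a, C - 1) for every a and every C > 0."""
    mask = (1 << bits) - 1
    for c in range(1, mask + 1):  # C must be > 0 so that C - 1 is still unsigned
        for a in range(mask + 1):
            if avg_with_constant(a, c) != pavg(a, c - 1, bits):
                return False
    return True


if __name__ == "__main__":
    print(identity_holds(8))
```

The requirement C > 0 is what keeps C - 1 representable as an unsigned lane value; for C = 0 the expression is a plain shift and the rewrite would not apply.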

================
Comment at: test/CodeGen/X86/avg.ll:4
@@ +3,3 @@
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+avx512bw | FileCheck %s --check-prefix=AVX512BW
+
+define void @avg_v4i8(<4 x i8> %a, <4 x i8> %b) {
----------------
RKSimon wrote:
> AVX2/AVX512BW can share an additional AVX prefix - reduce test duplication:
> 
> FileCheck %s --check-prefix=AVX --check-prefix=AVX2
> FileCheck %s --check-prefix=AVX --check-prefix=AVX512BW
> 
I don't follow: I didn't use an AVX prefix at all. Should I test all SSE versions?

================
Comment at: test/CodeGen/X86/avg.ll:5
@@ +4,3 @@
+
+define void @avg_v4i8(<4 x i8> %a, <4 x i8> %b) {
+; SSE2-LABEL: avg_v4i8
----------------
RKSimon wrote:
> What does the code look like if we load the args instead of passing them in registers? Non-legal types in these cases often make the test cases less clear - in this case with all the pand/packuswb calls.
If we load v4i8 from memory, those packing instructions will be gone. I will update the test cases.

http://reviews.llvm.org/D14761