[PATCH] D36213: [InstCombine] Remove check for sext of vector icmp from shouldOptimizeCast

Wed Aug 2 09:19:52 PDT 2017

spatel added subscribers: efriedma, mcrosier, t.p.northover.
spatel added a comment.

I ran 'test7' through llc for x86 and PPC64LE and saw no problems. But then I tried AArch64 and ARM, and codegen for the narrowed form ('test7_better' below) got much worse, whether the logic op was an 'xor' or an 'and':

  define <2 x i64> @test7(<4 x float> %a, <4 x float> %b) {
    %cmp = fcmp ult <4 x float> %a, zeroinitializer
    %cmp4 = fcmp ult <4 x float> %b, zeroinitializer
    %sext = sext <4 x i1> %cmp to <4 x i32>
    %sext5 = sext <4 x i1> %cmp4 to <4 x i32>
    %and = and <4 x i32> %sext, %sext5
    %conv = bitcast <4 x i32> %and to <2 x i64>
    ret <2 x i64> %conv
  }

  define <2 x i64> @test7_better(<4 x float> %a, <4 x float> %b) {
    %cmp = fcmp ult <4 x float> %a, zeroinitializer
    %cmp4 = fcmp ult <4 x float> %b, zeroinitializer
    %and1 = and <4 x i1> %cmp, %cmp4
    %and = sext <4 x i1> %and1 to <4 x i32>
    %conv = bitcast <4 x i32> %and to <2 x i64>
    ret <2 x i64> %conv
  }

$ ./llc -o - vcmp.ll -mtriple=aarch64

  test7:                           // @test7
  	fcmge	v0.4s, v0.4s, #0.0
  	mvn	 v0.16b, v0.16b
  	fcmge	v1.4s, v1.4s, #0.0
  	bic	v0.16b, v0.16b, v1.16b
  	ret
  test7_better:                           // @test7_better
  // BB#0:
  	fcmge	v0.4s, v0.4s, #0.0
  	fcmge	v1.4s, v1.4s, #0.0
  	mvn	 v0.16b, v0.16b
  	mvn	 v1.16b, v1.16b
  	xtn	v0.4h, v0.4s
  	xtn	v1.4h, v1.4s
  	and	v0.8b, v0.8b, v1.8b
  	ushll	v0.4s, v0.4h, #0
  	shl	v0.4s, v0.4s, #31
  	sshr	v0.4s, v0.4s, #31
  	ret
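
For reference, the two IR functions above should be equivalent lane by lane, because sign-extending an i1 commutes with a bitwise 'and' or 'xor'. Here is a minimal Python sketch of that check. It is not part of the patch: the helper names are hypothetical, and each lane is modeled as a plain 32-bit integer. The difference seen above is only in how the backend lowers the two forms.

```python
import operator
from itertools import product

MASK32 = 0xFFFFFFFF  # all-ones i32 lane

# Logic ops mentioned in the comment above.
OPS = {"and": operator.and_, "xor": operator.xor}

def sext_i1_to_i32(bit):
    """Model 'sext i1 to i32': 1 becomes all-ones, 0 stays 0."""
    return MASK32 if bit else 0

def wide_form(op, x, y):
    """test7 shape: sext each compare bit, then apply the op on i32."""
    return op(sext_i1_to_i32(x), sext_i1_to_i32(y)) & MASK32

def narrow_form(op, x, y):
    """test7_better shape: apply the op on i1, then sext once."""
    return sext_i1_to_i32(op(x, y) & 1)

def forms_agree(op):
    """Exhaustively compare both shapes over all i1 input pairs."""
    return all(
        wide_form(op, x, y) == narrow_form(op, x, y)
        for x, y in product((0, 1), repeat=2)
    )

if __name__ == "__main__":
    for name, op in OPS.items():
        print(name, forms_agree(op))
```

Both ops agree on all four input pairs, so the extra instructions in 'test7_better' come from lowering the <4 x i1> logic op, not from any semantic difference.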

Given that the more common problem patterns already exist independently of this patch, I would agree to proceed. But let's ping people with an ARM stake for their opinions: @t.p.northover @efriedma @mcrosier?

https://reviews.llvm.org/D36213