[Libclc-dev] [PATCH v2 1/1] Implement generic mad_sat

Tue Aug 5 13:08:10 PDT 2014

On 08/05/2014 01:06 PM, Aaron Watry wrote:
> On Tue, Aug 5, 2014 at 2:58 PM, Matt Arsenault
> <Matthew.Arsenault at amd.com> wrote:
>> On 08/05/2014 12:51 PM, Aaron Watry wrote:
>>> Either way, I've successfully tested this version of the code with your
>>> LLVM FlattenCFG.cpp patch and gotten successful unit test passes on CEDAR
>>> (Radeon 5400). I believe that radeonsi will probably still fail due to the
>>> ulong instruction selection issue that I noted yesterday
>> What operation is not selecting? I thought most of those were taken care of
>> already
> I've attached the bitcode and resulting LLVM Error from the mad_sat
> ulong2 test kernel.
>
> The kernel source is:
> kernel void test_2_mad_sat_ulong(global ulong* out, global ulong* in0,
> global ulong* in1, global ulong* in2){
>    vstore2(mad_sat(vload2(0, in0), vload2(0, in1), vload2(0, in2)), 0, out);
> }
>
> Note that it's likely that mad_sat is fine, and the mul_hi and/or
> add_sat call embedded in mad_sat is actually where the issue is
> generated.

OK, I think this is because of the 64-bit ANDs being in control flow,
which currently blocks selecting the scalar version. With the current
workarounds for SGPRs and control flow, a 64-bit VALU AND pattern needs
to be added (I recall seeing a patch for this recently).
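
For reference, here is a rough plain-C model of how an unsigned 64-bit
mad_sat can be built from mul_hi and add_sat, as the note quoted above
suggests. This is only a sketch and not the code from the patch: the
names mul_hi_u64, add_sat_u64 and mad_sat_u64 are made up for
illustration, and the vector test kernel would apply the same logic to
each component.

```c
#include <stdint.h>

/* Hypothetical helper: upper 64 bits of the 128-bit product a * b,
 * computed from 32-bit halves so no 128-bit type is required. */
static uint64_t mul_hi_u64(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum the middle terms plus the carry out of the low word.
     * This sum cannot exceed UINT64_MAX. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Hypothetical helper: unsigned saturating add. */
static uint64_t add_sat_u64(uint64_t a, uint64_t b) {
    uint64_t sum = a + b;                 /* wraps modulo 2^64 */
    return (sum < a) ? UINT64_MAX : sum;  /* wrapped means overflow */
}

/* Sketch of unsigned mad_sat: a * b + c, clamped to UINT64_MAX. */
static uint64_t mad_sat_u64(uint64_t a, uint64_t b, uint64_t c) {
    if (mul_hi_u64(a, b) != 0)
        return UINT64_MAX;                /* product alone overflows */
    return add_sat_u64(a * b, c);
}
```

With this model, mad_sat_u64(2, 3, 4) gives 10, while any product whose
high half is non-zero saturates regardless of the addend.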