[Mesa-dev] [PATCH] R600/SI: fix MIMG writemask adjustement

Tue Sep 24 03:32:59 PDT 2013

> Sorry, I didn't investigate why. Maybe some pass is executed twice?
Probably. Well when I wrote the function the idea was to figure out 
(once) if a result of MIMG is used or not and if it's not used to adjust 
the mask, but that doesn't seems to be the case any more.

Whatever, the patch itself looks good to me and obviously fixes a bug, 
so feel free to commit it. We should just keep in mind to maybe rework 
that at some point, cause it definitely only needs to be done once.

Christian.

Am 24.09.2013 12:25, schrieb Marek Olšák:
> Sorry, I didn't investigate why. Maybe some pass is executed twice?
>
> Marek
>
> On Tue, Sep 24, 2013 at 11:48 AM, Christian König
> <deathsimple at vodafone.de> wrote:
>> Sorry, my fault let me refine the question: Why the heck is the function
>> applied twice?
>>
>> Christian.
>>
>> Am 24.09.2013 11:44, schrieb Marek Olšák:
>>
>>> If the TGSI writemask is .xzw and the initial dmask is 0xf, the first
>>> application of the function sees 4 VGPRs and correctly sets the dmask
>>> to 13 (xzw) because the second VGPR is unused, but the second
>>> application of the function now sees 3 VGPRs, ignores the fact that
>>> dmask is 13 meaning xzw, and sets the writemask to 7 meaning xyz,
>>> which breaks the texture instruction. I thought it was obvious from
>>> the comments I added.
>>>
>>> Marek
>>>
>>> On Tue, Sep 24, 2013 at 9:00 AM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>> Why should that be necessary?
>>>>
>>>> Christian.
>>>>
>>>> Am 23.09.2013 21:09, schrieb maraeo at gmail.com:
>>>>
>>>>> From: Marek Olšák <marek.olsak at amd.com>
>>>>>
>>>>> This fixes piglit:
>>>>> - shaders/glsl-fs-texture2d-masked
>>>>> - shaders/glsl-fs-texture2d-masked-4
>>>>>
>>>>> Signed-off-by: Marek Olšák <marek.olsak at amd.com>
>>>>> ---
>>>>>     lib/Target/R600/SIISelLowering.cpp | 27 +++++++++++++++++++++------
>>>>>     1 file changed, 21 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/lib/Target/R600/SIISelLowering.cpp
>>>>> b/lib/Target/R600/SIISelLowering.cpp
>>>>> index 2174753..891a51b 100644
>>>>> --- a/lib/Target/R600/SIISelLowering.cpp
>>>>> +++ b/lib/Target/R600/SIISelLowering.cpp
>>>>> @@ -1065,7 +1065,9 @@ static unsigned SubIdx2Lane(unsigned Idx) {
>>>>>     void SITargetLowering::adjustWritemask(MachineSDNode *&Node,
>>>>>                                            SelectionDAG &DAG) const {
>>>>>       SDNode *Users[4] = { };
>>>>> -  unsigned Writemask = 0, Lane = 0;
>>>>> +  unsigned Lane = 0;
>>>>> +  unsigned OldDmask = Node->getConstantOperandVal(0);
>>>>> +  unsigned NewDmask = 0;
>>>>>         // Try to figure out the used register components
>>>>>       for (SDNode::use_iterator I = Node->use_begin(), E =
>>>>> Node->use_end();
>>>>> @@ -1076,29 +1078,42 @@ void
>>>>> SITargetLowering::adjustWritemask(MachineSDNode *&Node,
>>>>>             I->getMachineOpcode() != TargetOpcode::EXTRACT_SUBREG)
>>>>>           return;
>>>>>     +    /* Lane means which subreg of %VGPRa_VGPRb_VGPRc_VGPRd is used.
>>>>> +     * Note that subregs are packed, i.e. Lane==0 is the first bit set
>>>>> +     * in OldDmask, so it can be any of X,Y,Z,W; Lane==1 is the second
>>>>> bit
>>>>> +     * set, etc. */
>>>>>         Lane = SubIdx2Lane(I->getConstantOperandVal(1));
>>>>>     +    // Set which texture component corresponds to the lane.
>>>>> +    unsigned Comp;
>>>>> +    for (unsigned i = 0, Dmask = OldDmask; i <= Lane; i++) {
>>>>> +      assert(Dmask);
>>>>> +      Comp = ffs(Dmask)-1;
>>>>> +      Dmask &= ~(1 << Comp);
>>>>> +    }
>>>>> +
>>>>>         // Abort if we have more than one user per component
>>>>>         if (Users[Lane])
>>>>>           return;
>>>>>           Users[Lane] = *I;
>>>>> -    Writemask |= 1 << Lane;
>>>>> +    NewDmask |= 1 << Comp;
>>>>>       }
>>>>>     -  // Abort if all components are used
>>>>> -  if (Writemask == 0xf)
>>>>> +  // Abort if there's no change
>>>>> +  if (NewDmask == OldDmask)
>>>>>         return;
>>>>>         // Adjust the writemask in the node
>>>>>       std::vector<SDValue> Ops;
>>>>> -  Ops.push_back(DAG.getTargetConstant(Writemask, MVT::i32));
>>>>> +  Ops.push_back(DAG.getTargetConstant(NewDmask, MVT::i32));
>>>>>       for (unsigned i = 1, e = Node->getNumOperands(); i != e; ++i)
>>>>>         Ops.push_back(Node->getOperand(i));
>>>>>       Node = (MachineSDNode*)DAG.UpdateNodeOperands(Node, Ops.data(),
>>>>> Ops.size());
>>>>>         // If we only got one lane, replace it with a copy
>>>>> -  if (Writemask == (1U << Lane)) {
>>>>> +  // (if NewDmask has only one bit set...)
>>>>> +  if (NewDmask && (NewDmask & (NewDmask-1)) == 0) {
>>>>>         SDValue RC = DAG.getTargetConstant(AMDGPU::VReg_32RegClassID,
>>>>> MVT::i32);
>>>>>         SDNode *Copy = DAG.getMachineNode(TargetOpcode::COPY_TO_REGCLASS,
>>>>>                                           SDLoc(),
>>>>> Users[Lane]->getValueType(0),
>>>>