<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/86669">86669</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86] Add X86 unfold instruction pass
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:X86
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          RKSimon
      </td>
    </tr>
</table>

<pre>
There are a number of cases where we might want to unfold a memory operand from an instruction into a separate load (and, for RMW instructions, a separate store), while keeping the load/store in the same basic block:

- [ ] Instructions that are notably slower in their folded form (#72530, #14640)
- [ ] optsize/minsize builds: loading a vector constant that can then be compressed by X86FixupVectorConstants
- [ ] RMW scalar arithmetic on many Intel targets (#40176)

I imagine this being similar to MachineLICM: driven by register pressure and scheduler throughput/latency, but operating within a single basic block.
</pre>