<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/135619>135619</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            clang-format-diff.py Fails to Correctly Handle Filenames with Spaces
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang-format
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          selimkeles
      </td>
    </tr>
</table>

<pre>
I’ve encountered an issue with clang-format-diff.py: the script does not correctly detect filenames that contain spaces. Looking at the source code, I found that the filename extraction relies on the following code block:
```python
filename = None
lines_by_file = {}
for line in sys.stdin:
    match = re.search(r'^\+\+\+\ (.*?/){%s}(\S*)' % args.p, line)
    if match:
        filename = match.group(2)
    if filename is None:
        continue  # skip lines that precede the first "+++" header
```
Because this regular expression uses \S* to capture the filename, the capture stops at the first space. A path that contains spaces is therefore truncated rather than recognized in its entirety, and the diff is processed incorrectly.
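
To make the truncation concrete, here is a minimal sketch that applies the same pattern to two sample header lines. The helper name `extract_filename_current` and the sample lines are my own for illustration; `p` plays the role of `args.p`.

```python
import re


def extract_filename_current(line, p):
    """Mimic the extraction quoted above: group 2 is the \\S* capture."""
    match = re.search(r'^\+\+\+\ (.*?/){%s}(\S*)' % p, line)
    return match.group(2) if match else None


# Path without spaces: the whole remainder is captured.
print(extract_filename_current("+++ b/src/file.c\n", 1))        # src/file.c
# Path with a space: the capture stops at the space.
print(extract_filename_current("+++ b/my folder/file.c\n", 1))  # my
```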

**Steps to Reproduce:**

1. Create or rename a file so that its path contains one or more spaces (e.g., my folder/file.c).
2. Make some changes and stage them so that a diff is generated.
3. Run clang-format-diff.py on the diff (e.g., via a pre-commit hook).
4. Observe that the script fails to detect the file correctly, because the regex does not capture filenames with spaces.

**Proposed Fix:** 

One straightforward solution is to update the regex to capture everything after the stripped path components, even if that includes spaces. For example, replacing (\S*) with (.+)$, making the prefix group non-capturing so that the filename becomes group 1, and applying .strip() can address the issue:
```python
match = re.search(r'^\+\+\+\s+(?:.*?/){%s}(.+)$' % args.p, line)
if match:
    filename = match.group(1).strip()
```
This change, which matches one or more characters up to the end of the line, should correctly capture filenames with spaces, while .strip() removes any extraneous whitespace. I have tested this modification with various file paths and it resolves the issue.
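
As a quick check of the proposed pattern, the sketch below wraps it in a standalone helper and runs it on a few sample header lines. The helper name `extract_filename_proposed` and the inputs are hypothetical and not part of the script; `p` again stands in for `args.p`.

```python
import re


def extract_filename_proposed(line, p):
    """Apply the proposed pattern: group 1 is everything up to end of line."""
    match = re.search(r'^\+\+\+\s+(?:.*?/){%s}(.+)$' % p, line)
    return match.group(1).strip() if match else None


# The space inside the path is kept.
print(extract_filename_proposed("+++ b/my folder/file.c\n", 1))    # my folder/file.c
# A trailing tab after the name is removed by .strip().
print(extract_filename_proposed("+++ b/my folder/file.c\t\n", 1))  # my folder/file.c
# Paths without spaces behave as before.
print(extract_filename_proposed("+++ b/src/file.c\n", 1))          # src/file.c
```

One case this sketch does not address: tools such as `diff -u` append a tab and a timestamp after the filename, which (.+) would capture as well. Splitting on the first tab before stripping might be one way to handle that.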

**Additional Note:**

In my local integration of clang-format-diff.py, I also noticed that the script always exits with a status code of 0, even when formatting differences are present. This can be misleading in automated checks, where one might expect a non-zero exit code when there are formatting issues. While I understand that the design intention might be simply to output the diff, it would be useful to mention this behavior in the documentation or to consider an opt-in flag that affects the exit code.
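
Until something like that exists, a hook can derive the exit code from the script's output. The following is only a rough sketch of such a wrapper, assuming the script behaves as described above and that both `git` and `clang-format-diff.py` are on PATH. The function names and the exact invocation are illustrative, not part of clang-format-diff.py.

```python
import subprocess
import sys


def exit_code_for(formatter_output):
    """Return 1 if the formatter printed a non-empty diff, else 0."""
    return 1 if formatter_output.strip() else 0


def main():
    # Assumption: the staged changes are what should be checked.
    diff = subprocess.run(
        ["git", "diff", "-U0", "--no-color", "--cached"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Without -i the script prints the suggested changes instead of applying them.
    result = subprocess.run(
        ["clang-format-diff.py", "-p1"],
        input=diff, capture_output=True, text=True,
    )
    sys.stdout.write(result.stdout)
    # Keep a real failure code if there is one; otherwise derive it from the output.
    return result.returncode or exit_code_for(result.stdout)
```

A pre-commit hook could then call `sys.exit(main())` so that the commit is rejected whenever the script prints a diff.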

**Would you be open to a patch based on the above changes?** I’d be happy to contribute a patch if the maintainers agree that this is an improvement.

</pre>