[clang] [emacs][clang-format] Add elisp API for clang-format on git diffs (PR #112792)

Sun Nov 17 18:33:51 PST 2024

================
@@ -206,20 +191,30 @@ which can be passed directly to ‘clang-format’."
        ((= status 0) nil)
        ;; Return of 1 indicates found diffs and no error.
        ((= status 1)
-        ;; Iterate through all lines in diff buffer and collect all
-        ;; lines in current buffer that have a diff.
-        (goto-char (point-min))
-        (while (not (eobp))
-          (let ((diff-line (clang-format--vc-diff-match-diff-line
-                            (buffer-substring-no-properties
-                             (line-beginning-position)
-                             (line-end-position)))))
-            (when diff-line
-              ;; Create list line regions with diffs to pass to
-              ;; clang-format.
-              (push (concat "--lines=" diff-line) diff-lines)))
-          (forward-line 1))
-        (reverse diff-lines))
+        ;; Find and collect all diff lines.
+        ;; We are matching something like:
+        ;; "@@ -80 +80 @@" or "@@ -80,2 +80,2 @@"
+        (let ((diff-lines-re
+               "^@@\s-[0-9,]+\s\\+\\([0-9]+\\)\\(,\\([0-9]+\\)\\)?\s@@$")
+              (index 0)
+              (all-lines
+               (buffer-substring-no-properties (point-min) (point-max))))
----------------
ideasman42 wrote:

Did you try searching within the current buffer?

It's quite strange to make a copy of the whole buffer just to search it.

Even if you can measure some performance improvement, I think it's reasonable to use common practices, instead of relying on performance characteristics of the current implementation. Unless there are really significant user visible advantages. Based on my own experience with ELisp, I've never seen scripts duplicate a buffer into a string to search it. Besides relying on some details of the performance of your version of emacs, subtle changes to ELisp may render this odd way of searching the buffer invalid. There is also the down side of storing the buffer twice in memory - probably it's not going to cause problems on modern systems, but it could cause some GC/performance issues with very large buffers still.

Furthermore `(while (re-search-forward ...) ...)` is such a common pattern, any bottleneck here could be caused by the arguments your using ... or some other buffer setting (case sensitivity.. or other arguments that control behavior). It seems quite suspicious that there should be a significant difference.

https://github.com/llvm/llvm-project/pull/112792