[all-commits] [llvm/llvm-project] c49d14: [trace][intel pt] Simple detection of infinite dec...
walter erquinigo via All-commits
all-commits at lists.llvm.org
Tue Oct 25 10:21:36 PDT 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: c49d14aca5c7b06a4f896e9923bdab8a33f2866d
https://github.com/llvm/llvm-project/commit/c49d14aca5c7b06a4f896e9923bdab8a33f2866d
Author: Walter Erquinigo <wallace at fb.com>
Date: 2022-10-25 (Tue, 25 Oct 2022)
Changed paths:
M lldb/include/lldb/Core/PluginManager.h
M lldb/source/Core/PluginManager.cpp
M lldb/source/Plugins/Trace/intel-pt/CMakeLists.txt
M lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
M lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
M lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
M lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
M lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.h
A lldb/source/Plugins/Trace/intel-pt/TraceIntelPTProperties.td
M lldb/test/API/commands/trace/TestTraceDumpInfo.py
M lldb/test/API/commands/trace/TestTraceLoad.py
Log Message:
-----------
[trace][intel pt] Simple detection of infinite decoding loops
The low-level decoder might fall into an infinite decoding loop for
various reasons, the simplest being an infinite direct loop reached due
to wrong handling of self-modified code in the kernel, e.g. it might
reach
```
0x0A: pause
0x0C: jump to 0x0A
```
In this case, all the code is sequential and requires no packets to be
decoded. The low-level decoder would produce an output like the
following
```
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
... infinite amount of times
```
These cases require stopping the decoder to avoid infinite work and signal this
at least as a trace error.
- Add a check that breaks decoding of a single PSB once 500k instructions have been decoded since the last packet was processed.
- Add a check that looks for infinite loops after certain amount of instructions have been decoded since the last packet was processed.
- Add some `settings` properties for tweaking the thresholds of the checks above. This is also nice because it does the basic work needed for future settings.
- Add an AnomalyDetector class that inspects the DecodedThread and the libipt decoder in search for anomalies. These anomalies are then signaled as fatal errors in the trace.
- Add an ErrorStats class that keeps track of all the errors in a DecodedThread, with a special counter for fatal errors.
- Add an entry for decoded thread errors in the `dump info` command.
Some notes are added in the code and in the documention of the settings,
so please read them.
Besides that, I haven't been unable to create a test case in LLVM style, but
I've found an anomaly in the thread #12 of the trace
72533820-3eb8-4465-b8e4-4e6bf0ccca99 at Meta. We have to figure out how to
artificially create traces with this kind of anomalies in LLVM style.
With this change, that anomalous thread now shows:
```
(lldb)thread trace dump instructions 12 -e -i 23101
thread #12: tid = 8
...missing instructions
23101: (error) anomalous trace: possible infinite loop detected of size 2
vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
23100: 0xffffffff81342785 pause
vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
23099: 0xffffffff81342787 jmp 0xffffffff81342785 ; <+5> [inlined] rep_nop at processor.h:13:2
vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
23098: 0xffffffff81342785 pause
vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
23097: 0xffffffff81342787 jmp 0xffffffff81342785 ; <+5> [inlined] rep_nop at processor.h:13:2
vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
23096: 0xffffffff81342785 pause
vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
23095: 0xffffffff81342787 jmp 0xffffffff81342785 ; <+5> [inlined] rep_nop at processor.h:13:2
```
It used to be in an infinite loop where the decoder never stopped.
Besides that, the dump info command shows
```
(lldb) thread trace dump info 12
Errors:
Number of individual errors: 32
Number of fatal errors: 1
Number of other errors: 31
```
and in json format
```
(lldb) thread trace dump info 12 -j
"errors": {
"totalCount": 32,
"libiptErrors": {},
"fatalErrors": 1,
"otherErrors": 31
}
```
Differential Revision: https://reviews.llvm.org/D136557
More information about the All-commits
mailing list