[PATCH] D113251: [analyzer][doc] Add user documenation for taint analysis

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Nov 5 10:43:50 PDT 2021


steakhal added a comment.

Excellent work. Don't be afraid of the number of nits I dump on you!
Good job.



================
Comment at: clang/docs/analyzer/checkers.rst:2338
 
+Default sources defined by `GenericTaintChecker`:
+``fdopen``, ``fopen``, ``freopen``, ``getch``, ``getchar``, ``getchar_unlocked``, ``gets``, ``scanf``, ``socket``, ``wgetch``
----------------
Same for the rest.


================
Comment at: clang/docs/analyzer/checkers.rst:2351
+
+ clang --analyze ... -Xclang -analyzer-config -Xclang alpha.security.taint.TaintPropagation:Config=taint_config.yaml
+
----------------
Per https://reviews.llvm.org/D113004#inline-1078695 we should not advocate users use the `-Xclang` machinery, we should rather refer to it by other tools such as `scan-build`. However, we haven't reached a consensus about this decision yet.
Consider moving some parts of this doc to the proposed Configuration documentation file - housing the //more// user-facing analyzer options.


================
Comment at: clang/docs/analyzer/checkers.rst:2356
+
+For a more detailed description of configuration options, please see the :doc:`user-docs/TaintAnalysisConfiguration`. For an example see :ref:`taint-configuration-example`.
+
----------------
Please note that this configuration file is not stable (yet).
We might change it in a non-backward compatible way without any notice in the upcoming releases.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:19
+Taint analysis works by checking for the occurrence of special events during the symbolic execution of the program.
+Taint analysis defines sources, sinks, and propagation rules. It identifies errors by detecting a flow of information that originates in a taint source, touches a taint sink, and propagates through the program paths via propagation rules.
+A source, sink, or an event that propagates taint is mainly domain-specific knowledge, but there are some built-in defaults provided by :ref:`alpha-security-taint-TaintPropagation`.
----------------



================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:41-51
+  # sources:
+    # signature:
+    # size_t fread(void *ptr, size_t size, size_t nmemb, FILE * stream)
+    #
+    # example:
+    # FILE* f = fopen("file.txt");
+    # char buf[1024];
----------------
Since there the `SrcArgs` is //empty//, `fread` will work as an //unconditional// propagator - aka. a taint source.
It's probably better to phrase it something like that.

By doing so you could signify that in the next example you have a //non-empty// `SrcArgs` list, thus the propagation is //conditional//.

I think it would be better to follow this pattern in the corresponding section later as well.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:49
+    # size_t read = fread(buf, sizeof(buf[0]), sizeof(buf)/sizeof(buf[0]), f);
+    # // read and buf is tainted
+    - Name: fread
----------------



================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:76
+In the example file above, the entries under the `Propagation` key implement the conceptual sources and propagations, and sinks have their dedicated `Sinks` key.
+The user can define program points where the tainted values should be cleansed by listing entries under the `Filters` key.
+Filters model the sanitization of values done by the programmer, and providing these is key to avoiding false-positive findings.
----------------
The literature probably refers to //cleansing// functions as **sanitizers**. [[ https://en.wikipedia.org/wiki/Taint_checking | Wikipedia ]]


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:89
+
+Under the `Filters` entry, the user can specify a list of events that remove taint (see :ref:`taint-filter-details` for details).
+
----------------
Previously it was referred to by the `key` word. Chose one for consistency.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:89
+
+Under the `Filters` entry, the user can specify a list of events that remove taint (see :ref:`taint-filter-details` for details).
+
----------------
steakhal wrote:
> Previously it was referred to by the `key` word. Chose one for consistency.
I would rather use the //operation// or //function call// instead of //event// everywhere.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:91
+
+Under the `Propagations` entry, the user can specify a list of events that generate and propagate taint (see :ref:`taint-propagation-details` for details).
+The user can identify taint sources with a `SrcArgs` key in the `Propagation` entry, while propagations have none.
----------------



================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:92
+Under the `Propagations` entry, the user can specify a list of events that generate and propagate taint (see :ref:`taint-propagation-details` for details).
+The user can identify taint sources with a `SrcArgs` key in the `Propagation` entry, while propagations have none.
+
----------------
This sentence definitely needs some polishing.
1) It should emphasize that a source is an unconditional propagation - at least in the current design.
2) The user might want to //enlist// or //mark// a specific function as //taint source// but definitely not //identify// them.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:94
+
+Under the `Sinks` entry, the user can specify a list of events where the checker should emit a bug report if taint reaches there (see :ref:`taint-sink-details` for details).
+
----------------



================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:102
+ - `Name` is a string that specifies the name of a function.
+   Encountering this function during symbolic execution will clean taint on some parameters or the return value.
+ - `Args` is a list of numbers in the range of [-1..int_max].
----------------



================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:109
+The following keys are optional:
+ - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls.
+
----------------



================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:114
+Propagation syntax and semantics
+################################
+An entry under `Propagation` is a `YAML <https://yaml.org/>`_ object with the following mandatory keys:
----------------
In the following section, you have an empty line after this marker line. Consolidate these.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:117
+ - `Name` is a string that specifies the name of a function.
+   Encountering this function during symbolic execution propagate taint from one or more parameters to other parameters and possibly the return value.
+   It helps model the taint-related behavior of functions that are not analyzable otherwise.
----------------
You sometimes use `parameter` and in other cases `argument` seemingly interchangeably. You should consider consolidating these.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:133
+   The value of ``None`` will not consider the arguments that are part of a variadic argument list (this option is redundant but can be used to temporarily switch off handling of a particular variadic argument option without removing the entire variadic entry).
+ - `VariadicIndex` is a number in the range of [0..int_max]. It indicates the starting index of the variadic argument in the signature of the function.
+
----------------
It's not exactly for this patch, but we should investigate If we could infer this index from the declaration of the function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113251/new/

https://reviews.llvm.org/D113251



More information about the cfe-commits mailing list