[PATCH] D113251: [analyzer][doc] Add user documenation for taint analysis

Endre Fülöp via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 23 01:28:46 PST 2021


gamesh411 added inline comments.


================
Comment at: clang/docs/analyzer/checkers.rst:2341-2342
+
+Default propagations defined by `GenericTaintChecker`:
+``atoi``, ``atol``, ``atoll``, ``fgetc``, ``fgetln``, ``fgets``, ``fscanf``, ``sscanf``, ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``pread``, ``read``, ``strchr``, ``strrchr``, ``tolower``, ``toupper``
+
----------------
whisperity wrote:
> What does it mean that these are "default propagations"? That taint passes through calls to them trivially?
Added just 2 sentences about what default means.
Please see them a bit further up in the new version of the patch. 


================
Comment at: clang/docs/analyzer/checkers.rst:2345
+Default sinks defined in `GenericTaintChecker`:
+``printf``, ``setproctitle``, ``system``, ``popen``, ``execl``, ``execle``, ``execlp``, ``execv``, ``execvp``, ``execvP``, ``execve``, ``dlopen``, ``memcpy``, ``memmove``, ``strncpy``, ``strndup``, ``malloc``, ``calloc``, ``alloca``, ``memccpy``, ``realloc``, ``bcopy``
+
----------------
whisperity wrote:
> `execvp` and `execvP`? What is the capital `P` for? I can't find this overload in the POSIX docs.
I have found a refence to this function inside `TargetLibraryInfoTest.cpp`.
```
 926   │   case LibFunc_execv:
 927   │   case LibFunc_execvp:
 928   │     return (NumParams == 2 && FTy.getParamType(0)->isPointerTy() &&
 929   │             FTy.getParamType(1)->isPointerTy() &&
 930   │             FTy.getReturnType()->isIntegerTy(32));
 931   │   case LibFunc_execvP:
 932   │   case LibFunc_execvpe:
 933   │   case LibFunc_execve:
 934   │     return (NumParams == 3 && FTy.getParamType(0)->isPointerTy() &&
 935   │             FTy.getParamType(1)->isPointerTy() &&
 936   │             FTy.getParamType(2)->isPointerTy() &&
 937   │             FTy.getReturnType()->isIntegerTy(32));
```
It seems like an OSX-specific spelling and seems like it has the semantics of `execve`.
During the implementation of this patch, I just collected the references to functions inside  `GenericTaintChecker.cpp`.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:5
+
+Clang Static Analyzer uses taint analysis to detect security-related issues in code.
+The backbone of taint analysis in Clang is the `GenericTaintChecker`, which the user can access via the :ref:`alpha-security-taint-TaintPropagation` checker alias and this checker has a default taint-related configuration.
----------------
whisperity wrote:
> whisperity wrote:
> > //The// Clang Static [...]?
> uses, or can use?
For brevity and clarity, I will leave it at `uses`, it is just a generic statement about the availability of taint analysis as a method, and I think it would not improve the understanding to change it to `can use`. IMHO it is implicit, and `can use` would make people wonder as to what does Clang need to use it etc.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:6
+Clang Static Analyzer uses taint analysis to detect security-related issues in code.
+The backbone of taint analysis in Clang is the `GenericTaintChecker`, which the user can access via the :ref:`alpha-security-taint-TaintPropagation` checker alias and this checker has a default taint-related configuration.
+The checker also provides a configuration interface for extending the default settings by providing a configuration file in `YAML <https://yaml.org/>`_ format.
----------------
whisperity wrote:
> whisperity wrote:
> > Clang -> Clang SA / CSA?
> So the checker has the defaults all the time, or only by enabling the alias can I get the defaults?
Good point, added a line to clarify this.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:23
+
+.. _taint-configuration-example:
+
----------------
whisperity wrote:
> AFAIK, RST anchors are global identifiers for the project, so maybe we should take care in adding "clang-sa" or some other sort of namespaceing...
> 
> (Although this might depend on how exactly the docs are built!)
added `clangsa-` prefix


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:30
+
+  Filters:
+    # signature:
----------------
whisperity wrote:
> //Filters// are not mentioned earlier.
Added a mention with references in the Overview section.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:31-36
+    # signature:
+    # void cleanse_first_arg(int* arg)
+    #
+    # example:
+    # int x; // x is tainted
+    # cleanse_first_arg(&x); // x is not tainted anymore
----------------
whisperity wrote:
> If none of the comments themselves are just commented out ("optional") keys in the YAML, it would be better readable with a bit of formatting.
Reformatted the examples section according to your suggestions.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:41
+  Propagations:
+  # sources:
+    # signature:
----------------
whisperity wrote:
> steakhal wrote:
> > Since there the `SrcArgs` is //empty//, `fread` will work as an //unconditional// propagator - aka. a taint source.
> > It's probably better to phrase it something like that.
> > 
> > By doing so you could signify that in the next example you have a //non-empty// `SrcArgs` list, thus the propagation is //conditional//.
> > 
> > I think it would be better to follow this pattern in the corresponding section later as well.
> Why is this comment at indent 0?
It is because `Propagations` `YAML`-keys are used to define both sources and propagations. So to comment the "source-section" 0-indent is used, and to comment on individual entries within, 2-indent comments are used.

I have consolidated the indents and introduced vertical whitespace instead of multiple indentation levels.


================
Comment at: clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst:51
+    - Name: fread
+      DstArgs: [0, -1]
+
----------------
whisperity wrote:
> What are these magic numbers and what is their meaning?
Added an extra comment on top of the example YAML file.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113251/new/

https://reviews.llvm.org/D113251



More information about the cfe-commits mailing list