[www] r295851 - Add missing spaces in abstracts that were lost during processing

Kevin Streit via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 22 09:15:58 PST 2017


Author: streit
Date: Wed Feb 22 11:15:58 2017
New Revision: 295851

URL: http://llvm.org/viewvc/llvm-project?rev=295851&view=rev
Log:

Add missing spaces in abstracts that were lost during processing

On behalf of Johannes Doerfert <johannes at jdoerfert.de> (Wed Feb 22 18:23:50 2017 +0100)

Modified:
    www/trunk/devmtg/2017-03/2017/02/20/accepted-sessions.html

Modified: www/trunk/devmtg/2017-03/2017/02/20/accepted-sessions.html
URL: http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2017-03/2017/02/20/accepted-sessions.html?rev=295851&r1=295850&r2=295851&view=diff
==============================================================================
--- www/trunk/devmtg/2017-03/2017/02/20/accepted-sessions.html (original)
+++ www/trunk/devmtg/2017-03/2017/02/20/accepted-sessions.html Wed Feb 22 11:15:58 2017
@@ -208,8 +208,24 @@ Keynotes
        LLVM for the future of Supercomputing
      </p>
      <p class="abstract">
-LLVM is solidifying its foothold in high-performance computing, and as we look forward toward the exascale computing era, LLVM promises to be a cornerstone of our programming environments. In this talk, I'll discuss several of the ways in which we're working to improve LLVM in support of this vision. Ongoing work includes better handling of restrict-qualified pointers [2], optimization of OpenMP constructs [3], and extending LLVM's IR to support an explicit representation of parallelism [4]. We're exploring several ways in which LLVM can be better integrated with autotuning technologies, how we can improve optimization reporting and profiling, and a myriad of other ways we can help move LLVM forward. Much of this effort is now a part of the US Department of Energy's Exascale Computing Project [1]. This talk will start by presenting the big picture, in part discussing goals of performance portability and how those maps into technical requirements, and then discuss details of current 
 and planned development.<br /><br />[1] https://exascaleproject.org/2016/11/10/ecp-awards-34m-for-software-development/<br />[2] https://reviews.llvm.org/D9375 (and dependent patches)<br />[3] https://reviews.llvm.org/D28870 (a first step in this direction)<br />[4] http://lists.llvm.org/pipermail/llvm-dev/2017-January/108906.html
-     </p>
+LLVM is solidifying its foothold in high-performance computing, and as we look
+forward toward the exascale computing era, LLVM promises to be a cornerstone of
+our programming environments. In this talk, I'll discuss several of the ways in
+which we're working to improve LLVM in support of this vision. Ongoing work
+includes better handling of restrict-qualified pointers [2], optimization of
+OpenMP constructs [3], and extending LLVM's IR to support an explicit
+representation of parallelism [4]. We're exploring several ways in which LLVM
+can be better integrated with autotuning technologies, how we can improve
+optimization reporting and profiling, and a myriad of other ways we can help
+move LLVM forward. Much of this effort is now a part of the US Department of
+Energy's Exascale Computing Project [1]. This talk will start by presenting the
+big picture, in part discussing goals of performance portability and how those
+map into technical requirements, and then discuss details of current and
+planned development.<br /><br />[1]
+https://exascaleproject.org/2016/11/10/ecp-awards-34m-for-software-development/<br />[2]
+https://reviews.llvm.org/D9375 (and dependent patches)<br />[3]
+https://reviews.llvm.org/D28870 (a first step in this direction)<br />[4]
+http://lists.llvm.org/pipermail/llvm-dev/2017-January/108906.html </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -261,8 +277,32 @@ Full Talks
        Adventures in Fuzzing Instruction Selection
      </p>
      <p class="abstract">
-Recently there has been a lot of work on GlobalISel, which aims to entirelyreplace the existing instruction selectors for LLVM. In order to approach sucha transition, we need an effective way to test instruction selection andevaluate the new selector compared to the older ones.<br /><br />This talk will focus on our experiments and results in using fuzzing and inputgeneration to test instruction selection. We'll discuss the tradeoffs in how tofind valuable test inputs as well as the approach to validating the generatedcode. This will essentially consist of three parts:<br /><br />- Generating useful inputs to test instruction selection<br />- Evaluating the output of instruction selection effectively<br />- Results and lessons learned<br /><br />Generating Inputs<br />-----------------<br /><br />We will discuss the tradeoffs between types of input generation and look at theoptions in terms of the level of abstraction of those inputs. Here we talkabout how we improved on the input g
 eneration of the llvm-stress tool byleveraging libfuzzer and embracing coverage guided testing and input mutation.<br />We also go into the relative effectiveness of generating LLVM IR versusgenerating machine-level IR directly in terms of finding valuable test cases.<br /><br />Evaluating Outputs<br />------------------<br /><br />Given that we're feeding instruction selection arbitrary inputs, we need tocome up with ways to evaluate whether the results are sane. Here we'll discussthe kinds of bugs that were found simply by looking for crashes and error pathsversus those found by comparing against the older instruction selectors. Wealso explain the complexity of trying to compare instruction selectors andevaluate whether or not differences are functionally relevant.<br /><br />Results<br />-------<br /><br />Finally, we'll talk about the effectiveness of these experiments and theadaptibility of these methods to other problem spaces.
-     </p>
+Recently there has been a lot of work on GlobalISel, which aims to
+entirely replace the existing instruction selectors for LLVM. In order to
+approach such a transition, we need an effective way to test instruction
+selection and evaluate the new selector compared to the older ones.<br /><br />This
+talk will focus on our experiments and results in using fuzzing and
+input generation to test instruction selection. We'll discuss the trade-offs in
+how to find valuable test inputs as well as the approach to validating the
+generated code. This will essentially consist of three parts:<br /><br />-
+Generating useful inputs to test instruction selection<br />- Evaluating the
+output of instruction selection effectively<br />- Results and lessons
+learned<br /><br />Generating Inputs<br />-----------------<br /><br />We will
+discuss the trade-offs between types of input generation and look at the options
+in terms of the level of abstraction of those inputs. Here we talk about how we
+improved on the input generation of the llvm-stress tool by leveraging libFuzzer
+and embracing coverage-guided testing and input mutation.<br />We also go into
+the relative effectiveness of generating LLVM IR versus generating machine-level
+IR directly in terms of finding valuable test cases.<br /><br />Evaluating
+Outputs<br />------------------<br /><br />Given that we're feeding instruction
+selection arbitrary inputs, we need to come up with ways to evaluate whether the
+results are sane. Here we'll discuss the kinds of bugs that were found simply by
+looking for crashes and error paths versus those found by comparing against the
+older instruction selectors. We also explain the complexity of trying to compare
+instruction selectors and evaluate whether or not differences are functionally
+relevant.<br /><br />Results<br />-------<br /><br />Finally, we'll talk about the
+effectiveness of these experiments and the adaptability of these methods to other
+problem spaces.  </p>
      </td>
     </tr>
     <tr class="separator" />
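
As an illustration of the coverage-guided setup described above, here is a
minimal libFuzzer-style harness sketch in C++ (our illustration, not the
authors' harness; runInstructionSelection is a hypothetical hook): the fuzzer's
byte buffer is parsed as textual LLVM IR, and only inputs that parse are handed
to the selector under test.

    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/IRReader/IRReader.h"
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/SourceMgr.h"
    #include <cstddef>
    #include <cstdint>
    #include <memory>

    // Hypothetical hook into the instruction selector under test.
    void runInstructionSelection(llvm::Module &M);

    // libFuzzer entry point: treat the input bytes as textual LLVM IR and
    // discard anything that does not parse.
    extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
      llvm::LLVMContext Ctx;
      llvm::SMDiagnostic Err;
      auto Buf = llvm::MemoryBuffer::getMemBufferCopy(
          llvm::StringRef(reinterpret_cast<const char *>(Data), Size), "fuzz");
      if (std::unique_ptr<llvm::Module> M =
              llvm::parseIR(Buf->getMemBufferRef(), Err, Ctx))
        runInstructionSelection(*M);
      return 0;
    }
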
@@ -297,8 +337,13 @@ Recently there has been a lot of work on
        ARM Code Size Optimisations
      </p>
      <p class="abstract">
-Last year, we've done considerable ARM code size optimisations in LLVM as that's an area that LLVM was lacking, see also e.g. Samsung's and Intel's EuroLLVM talks. In this presentation, we want to present lessons learned and insights gained from our work, leading to about 200 commits. The areas that we identified that are most important for code size are:<br />I) turn off specific optimisations when optimising for size,<br />II) tuning optimisations,<br />III) constants, and<br />IV) bit twiddling.
-     </p>
+Last year, we've done considerable ARM code size optimisations in LLVM, as that's
+an area where LLVM was lacking; see also e.g. Samsung's and Intel's EuroLLVM
+talks. In this presentation, we want to present lessons learned and insights
+gained from our work, leading to about 200 commits. The areas we identified as
+most important for code size are:<br />I) turning off specific optimisations
+when optimising for size,<br />II) tuning optimisations,<br />III) constants,
+and<br />IV) bit twiddling.  </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -321,8 +366,35 @@ Last year, we've done considerable ARM c
        AVX-512 Mask Registers Code Generation Challenges in LLVM
      </p>
      <p class="abstract">
-In the past years LLVM has been extended to support Intel AVX512 [1] [2] instructions. One of the features introduced by the AVX-512 architecture is the concept of masked operations. In the Euro LLVM 2015 developer meeting Intel presented the new masked vector intrinsics, which assist LLVM IR optimizations (e.g. Loop Vectorizer) in selecting vector masked operations [3].<br /><br />In this talk, we are going to cover some of the key problems encountered when extending the LLVM code generator to support the AVX-512 mask registers.<br /><br />The current implementation of mask lowering, favors assigning LLVM IR conditions (i1 data type) to mask registers over General Purpose Registers (GPR). The decision leads to sub-optimal code generation when compiling for AVX-512 targets. This exposes a fundamental limitation of the existing instruction selection framework when a type can be lowered to different register classes. In addition, we will show that achieving optimal mask register selec
 tion requires a global analysis [5]. We will overview the various issues caused by the current approach, followed by a solution that achieves better results by favoring GPRs over mask registers [4]. In addition, we will overview a suggested optimization that mitigates artifacts created by the instruction selection phase.<br /><br />Additionally, AVX-512 mask registers create a dilemma with the memory representation of LLVM IR vectors of i1 - Is a mask a bit or a byte in memory? AVX2 and older vector instruction sets can efficiently support masks in bytes. AVX-512 favors representation by bits, thus achieving a smaller memory footprint. However, this creates a possible cross-generation interoperability conflict which needs to be addressed. We will overview the issue and explore the alternatives.<br /><br />[1] https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf<br />[2] http://llvm.org/devmtg/2013-11/slides/De
 mikhovsky-Poster.pdf<br />[3] http://llvm.org/devmtg/2015-04/slides/MaskedIntrinsics.pdf<br />[4] https://groups.google.com/forum/#!topic/llvm-dev/-OmfyIY3SaU<br />[5] http://llvm.org/devmtg/2016-11/Slides/Colombet-GlobalISel.pdf
-     </p>
+In the past years, LLVM has been extended to support Intel AVX-512 [1] [2]
+instructions. One of the features introduced by the AVX-512 architecture is the
+concept of masked operations. In the Euro LLVM 2015 developer meeting Intel
+presented the new masked vector intrinsics, which assist LLVM IR optimizations
+(e.g. Loop Vectorizer) in selecting vector masked operations [3].<br /><br />In
+this talk, we are going to cover some of the key problems encountered when
+extending the LLVM code generator to support the AVX-512 mask
+registers.<br /><br />The current implementation of mask lowering favors
+assigning LLVM IR conditions (i1 data type) to mask registers over General
+Purpose Registers (GPR). The decision leads to sub-optimal code generation when
+compiling for AVX-512 targets. This exposes a fundamental limitation of the
+existing instruction selection framework when a type can be lowered to different
+register classes. In addition, we will show that achieving optimal mask register
+selection requires a global analysis [5]. We will overview the various issues
+caused by the current approach, followed by a solution that achieves better
+results by favoring GPRs over mask registers [4]. In addition, we will overview
+a suggested optimization that mitigates artifacts created by the instruction
+selection phase.<br /><br />Additionally, AVX-512 mask registers create a dilemma
+with the memory representation of LLVM IR vectors of i1 - Is a mask a bit or a
+byte in memory? AVX2 and older vector instruction sets can efficiently support
+masks in bytes. AVX-512 favors representation by bits, thus achieving a smaller
+memory footprint. However, this creates a possible cross-generation
+interoperability conflict which needs to be addressed. We will overview the
+issue and explore the alternatives.<br /><br />[1]
+https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf<br />[2]
+http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf<br />[3]
+http://llvm.org/devmtg/2015-04/slides/MaskedIntrinsics.pdf<br />[4]
+https://groups.google.com/forum/#!topic/llvm-dev/-OmfyIY3SaU<br />[5]
+http://llvm.org/devmtg/2016-11/Slides/Colombet-GlobalISel.pdf </p>
      </td>
     </tr>
     <tr class="separator" />
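
To make the bit-versus-byte dilemma above concrete, a small C++ sketch of the
two storage layouts for an 8-lane mask (our illustration, not code from the
talk): one byte per lane, which pre-AVX-512 vector code handles efficiently,
versus one bit per lane, the denser layout AVX-512 mask registers favor.

    #include <cstdint>
    #include <cstdio>

    // One byte per lane: 8 bytes in memory (the AVX2-friendly layout).
    struct ByteMask8 { uint8_t lane[8]; };

    // One bit per lane: 1 byte in memory (the AVX-512-friendly layout).
    struct BitMask8 { uint8_t bits; };

    static BitMask8 pack(const ByteMask8 &m) {
      BitMask8 r{0};
      for (int i = 0; i < 8; ++i)
        r.bits |= (m.lane[i] & 1u) << i;  // lane i becomes bit i
      return r;
    }

    int main() {
      ByteMask8 m{{1, 0, 1, 1, 0, 0, 1, 0}};
      std::printf("bytes: %zu vs %zu, packed=0x%02x\n",
                  sizeof(ByteMask8), sizeof(BitMask8),
                  (unsigned)pack(m).bits);
    }
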
@@ -351,8 +423,27 @@ In the past years LLVM has been extended
        Clank: Java-port of C/C++ compiler frontend
      </p>
      <p class="abstract">
-Clang was written in a way that allows to use it inside IDEs as a provider for various things - fromnavigation and code completion to refactorings. But is it possible to use it with the modern IDE writtenin pure Java?Our team spent some time porting Clang into Java and got "Clank - the Java equivalent of nativeClang".<br />We will tell you why we failed to use native Clang, how porting to Java was done, what difficultieswe faced and what outcome we have at this point.<br /><br />Extended Abstract:<br />We will present the project Clank (with last K) - the Java port of native Clang.<br />The goal was to get the Java code as close to the original C++ code of Clang as possible:<br />preserving structure, names, comments and formatting of original code, but built once to runeverywhere.<br />In this talk we will describe which tooling (also based on Clang) we created to automateconversion of C++ LLVM/Clang codebase into Clank Java codebase. The tooling for upgrade Clankcode base when new
  version of Clang is released will be described as well.<br />We will present our experience with evaluating native Clang/libClang technology as the providerfor Open Source NetBeans IDE project for C++ language support. We will describe why we failed touse native Clang in the IDE written in pure Java and why created the Java-port named Clank.<br />Will consider C++ constructions used in Clank codebase without direct equivalent in Java and howwe resolved the challenges to keep code as close to the original as possible.<br />Also we will mention how Clank was finally used in the production of Open Source NetBeansproject.
-     </p>
+Clang was written in a way that allows it to be used inside IDEs as a provider for
+various things - from navigation and code completion to refactorings. But is it
+possible to use it with a modern IDE written in pure Java? Our team spent some
+time porting Clang into Java and got "Clank - the Java equivalent of
+native Clang".<br />We will tell you why we failed to use native Clang, how
+porting to Java was done, what difficulties we faced and what outcome we have at
+this point.<br /><br />Extended Abstract:<br />We will present the project Clank
+(with last K) - the Java port of native Clang.<br />The goal was to get the Java
+code as close to the original C++ code of Clang as possible:<br />preserving
+structure, names, comments and formatting of original code, but built once to
+run everywhere.<br />In this talk we will describe which tooling (also based on
+Clang) we created to automate conversion of the C++ LLVM/Clang codebase into the
+Clank Java codebase. The tooling for upgrading the Clank code base when a new
+version of Clang is released will be described as well.<br />We will present
+our experience with
+evaluating native Clang/libClang technology as the provider of C++ language
+support for the Open Source NetBeans IDE project. We will describe why we failed
+to use native Clang in an IDE written in pure Java and why we created the Java
+port named Clank.<br />We will consider C++ constructions used in the Clank
+codebase without a direct equivalent in Java and how we resolved the challenges
+to keep the code as
+close to the original as possible.<br />Also we will mention how Clank was
+finally used in production in the Open Source NetBeans project.  </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -393,8 +484,39 @@ Clang was written in a way that allows t
        CodeCompass: An Open Software Comprehension Framework
      </p>
      <p class="abstract">
-Bugfixing or new feature development requires a confident understanding of all details and consequences of the planned changes. For long existing large telecom systems, where the code base have been developed and maintained for decades byfluctuating teams, original intentions are lost, the documentation is untrustworthy or missing, the only reliable information is the code itself. Code comprehension of such large software systems is an essential, but usually very challenging task. As the method of comprehension is fundamentally different fromwriting new code, development tools are not performing well. During the years, different programs have been developed with various complexity and feature set for code comprehension but none of them fulfilled all requirements.<br /><br />CodeCompass is an open source LLVM/Clang based tool developed by Ericsson Ltd. and the Eötvös Loránd University, Budapest to help understanding large legacy software systems. Based on the LLVM/Clang comp
 iler infrastructure, CodeCompass gives exact information on complex C/C++ language elements like overloading, inheritance, the (read or write) usage of variables, possible call.<br />on function pointers and the virtual functions -- features that various existing tools support only partially. The wide range of interactive visualizations extends further than the usual class and function call diagrams; architectural, component and interface diagrams are a few of the implemented graphs.<br /><br />To make comprehension more extensive, CodeCompass is not restricted to the source code. It also utilizes build information to explore the system architecture as well as version control information when available: git commit history and blame view are also visualized. Clang based static analysis results are also integrated to CodeCompass. Although the tool focuses mainly on C and C++, it also supports Java and Python languages. Having a web-based, pluginable, extensible architecture, the CodeC
 ompass framework can bean open platform to further code comprehension, static analysis and software metrics efforts.<br /><br />Lecture outline:<br />- First we show why current development tools are not satisfactory for code comprehension<br />- Then we specify the requirements for such a tool<br />- Introduce codecompass architectur.<br />- Revail some challenges we have met and how we solve them<br />- Show a live demo<br />- Describe the open architecture and<br />- Talk about future plans and how the community can extend the feature set
-     </p>
+Bug fixing or new feature development requires a confident understanding of all
+details and consequences of the planned changes. For long-existing large telecom
+systems, where the code base has been developed and maintained for decades by
+fluctuating teams, the original intentions are lost, the documentation is
+untrustworthy or missing, and the only reliable information is the code itself. Code
+comprehension of such large software systems is an essential, but usually very
+challenging task. As the method of comprehension is fundamentally different
+from writing new code, development tools are not performing well. Over the
+years, different programs have been developed with varying complexity and
+feature sets for code comprehension, but none of them fulfilled all
+requirements.<br /><br />CodeCompass is an open source LLVM/Clang based tool
+developed by Ericsson Ltd. and the Eötvös Loránd University, Budapest to help
+understanding large legacy software systems. Based on the LLVM/Clang compiler
+infrastructure, CodeCompass gives exact information on complex C/C++ language
+elements like overloading, inheritance, the (read or write) usage of variables,
+and possible calls<br />on function pointers and virtual functions -- features
+that various existing tools support only partially. The wide range of
+interactive visualizations extends further than the usual class and function
+call diagrams; architectural, component and interface diagrams are a few of the
+implemented graphs.<br /><br />To make comprehension more extensive, CodeCompass
+is not restricted to the source code. It also utilizes build information to
+explore the system architecture as well as version control information when
+available: git commit history and blame view are also visualized. Clang based
+static analysis results are also integrated to CodeCompass. Although the tool
+focuses mainly on C and C++, it also supports Java and Python languages. Having
+a web-based, pluggable, extensible architecture, the CodeCompass framework can
+be an open platform to further code comprehension, static analysis and software
+metrics efforts.<br /><br />Lecture outline:<br />- First we show why current
+development tools are not satisfactory for code comprehension<br />- Then we
+specify the requirements for such a tool<br />- Introduce CodeCompass
+architecture.<br />- Reveal some challenges we have met and how we solved
+them<br />- Show a live demo<br />- Describe the open architecture and<br />- Talk
+about future plans and how the community can extend the feature set </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -429,8 +551,32 @@ Bugfixing or new feature development req
        Cross Translational Unit Analysis in Clang Static Analyzer: Prototype and measurements
      </p>
      <p class="abstract">
-Today Clang Static Analyzer [4] can perform (context-sensitive) interproceduralanalysis for C,C++ and Objective C les by inlining the calledfunction into the callers' context. This means that that the full calling context(assumptions about the values of function parameters, global variables) ispassed when analyzing the called function and then the assumptions aboutthe returned value is passed back to the caller. This works well for functioncalls within a translation unit (TU), but when the symbolic execution reachesa function that is implemented in another TU, the analyzer engine skips theanalysis of the called function denition. In particular, assumptions aboutreferences and pointers passed as function parameters get invalidated, andthe return value of the function will be unknown. Losing information thisway may lead to false positive and false negative ndings.<br />The cross translation unit (CTU) feature allows the analysis of calledfunctions even if the denition of the function 
 is external to the currentlyanalyzed TU. This would allow detection of bugs in library functions stemmingfrom incorrect usage (e.g. a library assumes that the user will free amemory block allocated by the library), and allows for more precise analysisof the caller in general if a TU external function is invoked (by not losingassumptions).<br />We implemented (based on the prototype by A. Sidorin, et al. [2]) theCross Translation Unit analysis feature for Clang SA (4.0) and evaluated itsperformance on various open source projects. In our presentation, we showthat by using the CTU feature we found many new true positive reports andeliminated some false positives in real open source projects. We show thatwhile the total analysis time increases by 2-3 times compared to the non-CTUanalysis time, the execution remains scalable in the number of CPUs.<br />We also point out how the analysis coverage changes that may lead to the loss ofreports compared to the non-CTU baseline version.
-     </p>
+Today Clang Static Analyzer [4] can perform (context-sensitive)
+interprocedural analysis for C, C++ and Objective C files by inlining the
+called function into the caller's context. This means that the full calling
+context (assumptions about the values of function parameters, global variables)
+is passed when analyzing the called function, and then the assumptions about the
+returned value are passed back to the caller. This works well for function calls
+within a translation unit (TU), but when the symbolic execution reaches a
+function that is implemented in another TU, the analyzer engine skips
+the analysis of the called function definition. In particular, assumptions
+about references and pointers passed as function parameters get invalidated,
+and the return value of the function will be unknown. Losing information this way
+may lead to false positive and false negative warnings.<br />The cross translation
+unit (CTU) feature allows the analysis of called functions even if the definition
+of the function is external to the currently analyzed TU. This would allow
+detection of bugs in library functions stemming from incorrect usage (e.g. a
+library assumes that the user will free a memory block allocated by the library),
+and allows for more precise analysis of the caller in general if a TU external
+function is invoked (by not losing assumptions).<br />We implemented (based on the
+prototype by A. Sidorin, et al. [2]) the Cross Translation Unit analysis feature
+for Clang SA (4.0) and evaluated its performance on various open source projects.
+In our presentation, we show that by using the CTU feature we found many new true
+positive reports and eliminated some false positives in real open source
+projects. We show that while the total analysis time increases by 2-3 times
+compared to the non-CTU analysis time, the execution remains scalable in the
+number of CPUs.<br />We also point out how the analysis coverage changes, which may
+lead to the loss of reports compared to the non-CTU baseline version.  </p>
      </td>
     </tr>
     <tr class="separator" />
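
A tiny two-translation-unit C++ example of the effect described above (our
illustration; the function names are hypothetical): without CTU the analyzer
must drop its assumptions at the call to make_buffer, so the leak in the caller
goes unnoticed; with CTU the definition in the other TU is available and the
leak can be reported.

    // lib.cpp -- a separate translation unit
    char *make_buffer(int n) {
      return new char[n];  // the caller is expected to delete[] the result
    }

    // main.cpp -- the analyzed translation unit
    char *make_buffer(int n);

    void use() {
      char *p = make_buffer(16);
      p[0] = 'x';
    }  // leak: p is never delete[]'d -- visible only with CTU analysis
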
@@ -453,8 +599,27 @@ Today Clang Static Analyzer [4] can perf
        Delivering Sample-based PGO for PlayStation(R)4 (and the impact on optimized debugging)
      </p>
      <p class="abstract">
-Users of the PlayStation(R)4 toolchain have a number of expectations from their development tools: good runtime performance is vitally important, as is the ability to debug fully optimized code.  The team at Sony Interactive Entertainment have been working on delivering a Profile Guided Optimization solution to our users to allow them to maximize their runtime performance.  First we provided instrumentation-based PGO which has been successfully used by a number of our users.  More recently we followed this up by also providing a Sample-based PGO approach, built upon the work of and working together with the LLVM community, and integrated with the PS4 SDK's profiling tools for a simple and seamless workflow.<br /><br />In this talk, we'll present real-world case-studies showing how the Sample-based approach compares with Instrumented PGO in terms of user workflow, runtime intrusion while profiling, and final runtime performance improvement.  We'll show with the aid of real code examp
 les how the performance results of Sample-based PGO are heavily impacted by the accuracy of the compiler's line table debugging information and how by improving the propagation of debug data in some transformations both the Sample-based PGO runtime performance results and the overall user experience of debugging optimized code have been improved, so that anyone implementing new transformations can take this into account, especially as debug information is increasingly being used by consumers other than traditional debuggers that rely on its accuracy.
-     </p>
+Users of the PlayStation(R)4 toolchain have a number of expectations from their
+development tools: good runtime performance is vitally important, as is the
+ability to debug fully optimized code.  The team at Sony Interactive
+Entertainment have been working on delivering a Profile Guided Optimization
+solution to our users to allow them to maximize their runtime performance.
+First we provided instrumentation-based PGO which has been successfully used by
+a number of our users.  More recently we followed this up by also providing a
+Sample-based PGO approach, built upon the work of and working together with the
+LLVM community, and integrated with the PS4 SDK's profiling tools for a simple
+and seamless workflow.<br /><br />In this talk, we'll present real-world
+case-studies showing how the Sample-based approach compares with Instrumented
+PGO in terms of user workflow, runtime intrusion while profiling, and final
+runtime performance improvement.  We'll show with the aid of real code examples
+how the performance results of Sample-based PGO are heavily impacted by the
+accuracy of the compiler's line table debugging information, and how, by improving
+the propagation of debug data in some transformations, both the Sample-based PGO
+runtime performance results and the overall user experience of debugging
+optimized code have been improved, so that anyone implementing new
+transformations can take this into account, especially as debug information is
+increasingly being used by consumers other than traditional debuggers that rely
+on its accuracy.  </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -489,8 +654,17 @@ Users of the PlayStation(R)4 toolchain h
        Effective Compilation of Higher-Order Programs
      </p>
      <p class="abstract">
-Many modern programming languages support both imperative and functional idioms.<br />However, state-of-the-art SSA-based intermediate representations like LLVM cannot natively represent crucial functional concepts like higher-order functions.<br />On the other hand, functional intermediate representations like GHC's Core employ an explicit scope nesting, which is cumbersome to maintain across certain transformations.<br />In this talk we present the functional, higher-order intermediate representation Thorin.<br />Thorin is based upon continuation-passing style and abandons explicit scope nesting in favor of a dependency graph.<br />Based on Thorin, we discuss an aggressive closure elimination phase and how we lower this higher-order intermediate representation to LLVM.<br />
-     </p>
+Many modern programming languages support both imperative and functional
+idioms.<br />However, state-of-the-art SSA-based intermediate representations
+like LLVM cannot natively represent crucial functional concepts like
+higher-order functions.<br />On the other hand, functional intermediate
+representations like GHC's Core employ an explicit scope nesting, which is
+cumbersome to maintain across certain transformations.<br />In this talk we
+present the functional, higher-order intermediate representation
+Thorin.<br />Thorin is based upon continuation-passing style and abandons
+explicit scope nesting in favor of a dependency graph.<br />Based on Thorin, we
+discuss an aggressive closure elimination phase and how we lower this
+higher-order intermediate representation to LLVM.<br /> </p>
      </td>
     </tr>
     <tr class="separator" />
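
For readers unfamiliar with continuation-passing style, a minimal C++ sketch of
the idea Thorin builds on (illustrative only, not Thorin code): instead of
returning, a CPS function receives an explicit continuation and passes its
result forward, so control never returns.

    #include <cstdio>
    #include <functional>

    // Direct style: the result flows back through a return.
    int add_direct(int a, int b) { return a + b; }

    // Continuation-passing style: the result flows forward into 'k'.
    void add_cps(int a, int b, const std::function<void(int)> &k) {
      k(a + b);
    }

    int main() {
      std::printf("%d\n", add_direct(1, 2));
      add_cps(1, 2, [](int r) { std::printf("%d\n", r); });
    }
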
@@ -519,8 +693,33 @@ Many modern programming languages suppor
        Expressing high level optimizations within LLVM
      </p>
      <p class="abstract">
-At Azul we are building a production quality, state of the art LLVM based JIT compiler for Java. Originally targeted for C and C++, the LLVM IR is a rather low-level representation, which makes it challenging to represent and utilize high level Java semantics in the optimizer. One of the approaches is to perform all the high-level transformations over another IR before lowering the code to the LLVM IR, like it is done in the Swift compiler. However, this involves building a new IR and related infrastructure. In our compiler we have opted to express all the information we need in the LLVM IR instead. In this talk we will outline the embedded high level IR which enables us to perform high level Java specific optimizations over the LLVM IR. We will show the optimizations based on top of it and discuss some pros and cons of the approach we chose.<br /><br />The java type framework is the core of the system we built. It allows us to express the information about java types of the objects
  referenced by pointer values. One of the sources of this information is the bytecode. Our frontend uses metadata and attributes to annotate the IR with the types known from the bytecode. On the optimizer side we have a type inference analysis which computes the type for any given value using frontend generated facts and other information, like type checks in the code. This analysis is used by Java-specific optimizations, like devirtualization and simplification of type checks. We also taught some of the existing LLVM analyses and passes to take Java type information into account. For example, we use the java type of the pointer to infer the dereferenceability and aliasing properties of the pointer. We made inline cost analysis more accurate in the presence of java type based optimizations. We will discuss the optimizations we built on top of the java type framework and will show how the existing optimizations interact with it. Some parts of the system we built can be useful for oth
 ers, so we would like to start the discussion about upstreaming some of the parts.
-     </p>
+At Azul we are building a production-quality, state-of-the-art LLVM-based JIT
+compiler for Java. Originally designed for C and C++, LLVM IR is a rather
+low-level representation, which makes it challenging to represent and utilize
+high level Java semantics in the optimizer. One of the approaches is to perform
+all the high-level transformations over another IR before lowering the code to
+the LLVM IR, as is done in the Swift compiler. However, this involves
+building a new IR and related infrastructure. In our compiler we have opted to
+express all the information we need in the LLVM IR instead. In this talk we will
+outline the embedded high level IR which enables us to perform high level Java
+specific optimizations over the LLVM IR. We will show the optimizations based on
+top of it and discuss some pros and cons of the approach we chose.<br /><br />The
+Java type framework is the core of the system we built. It allows us to express
+information about the Java types of the objects referenced by pointer values.
+One of the sources of this information is the bytecode. Our frontend uses
+metadata and attributes to annotate the IR with the types known from the
+bytecode. On the optimizer side we have a type inference analysis which computes
+the type for any given value using frontend-generated facts and other
+information, like type checks in the code. This analysis is used by
+Java-specific optimizations, like devirtualization and simplification of type
+checks. We also taught some of the existing LLVM analyses and passes to take
+Java type information into account. For example, we use the Java type of a
+pointer to infer its dereferenceability and aliasing properties.
+We made inline cost analysis more accurate in the presence of Java-type-based
+optimizations. We will discuss the optimizations we built on top of the Java
+type framework and will show how the existing optimizations interact with it.
+Some parts of the system we built can be useful for others, so we would like to
+start the discussion about upstreaming some of the parts.  </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -549,8 +748,20 @@ At Azul we are building a production qua
        Formalizing the Concurrency Semantics of an LLVM Fragment
      </p>
      <p class="abstract">
-The LLVM compiler follows closely the concurrency model of C/C++ 2011, but with a crucial difference. While in C/C++ a data race between a non-atomic read and a write is declared to be undefined behavior, in LLVM such a race has defined behavior: the read returns the special `undef' value. This subtle difference in the semantics of racy programs has profound consequences on the set of allowed program transformations, but it has been not formally been studied before.<br /><br />This work closes this gap by providing a formal memory model for a substantial fragment of LLVM and showing that it is correct as a concurrency model for a compiler intermediate language:<br />(1) it is stronger than the C/C++ model.<br />(2) weaker than the known hardware models, an.<br />(3) supports the expected program transformations.<br />In order to support LLVM's semantics for racy accesses, our formal model does not work on the level of single executions as the hardware and the C/C++ models do, but ra
 ther uses more elaborate structures called event structures.
-     </p>
+The LLVM compiler closely follows the concurrency model of C/C++ 2011, but with
+a crucial difference. While in C/C++ a data race between a non-atomic read and a
+write is declared to be undefined behavior, in LLVM such a race has defined
+behavior: the read returns the special 'undef' value. This subtle difference in
+the semantics of racy programs has profound consequences on the set of allowed
+program transformations, but it has not been formally studied
+before.<br /><br />This work closes this gap by providing a formal memory model
+for a substantial fragment of LLVM and showing that it is correct as a
+concurrency model for a compiler intermediate language:<br />(1) it is stronger
+than the C/C++ model,<br />(2) weaker than the known hardware models, and<br />(3)
+supports the expected program transformations.<br />In order to support LLVM's
+semantics for racy accesses, our formal model does not work on the level of
+single executions as the hardware and the C/C++ models do, but rather uses more
+elaborate structures called event structures.  </p>
      </td>
     </tr>
     <tr class="separator" />
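
A minimal C++ illustration of the semantic difference discussed above (our
example, not the paper's): the unsynchronized read below races with the writer
thread, which makes the whole program undefined in C/C++11, whereas in the LLVM
model the racy load merely yields the special 'undef' value.

    #include <thread>

    int shared = 0;  // non-atomic, unsynchronized

    int main() {
      std::thread writer([] { shared = 1; });
      int r = shared;  // data race: UB in C/C++11; in the LLVM model the
                       // load is defined and simply returns 'undef'
      writer.join();
      (void)r;
      return 0;
    }
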
@@ -579,8 +790,37 @@ The LLVM compiler follows closely the co
        Introducing VPlan to the Loop Vectorizer
      </p>
      <p class="abstract">
-This talk describes our efforts to refactor LLVM’s Loop Vectorizer following the RFC posted on llvm-dev mailing list[1] and the presentation delivered at LLVM-US 2016[2]. We describe the design and initial implementation of VPlan which models the vectorized code and drives its transformation.<br /><br />In this talk we cover the main aspects implemented in our first proposed major patch[3]. These include introducing a Planning step into the Loop Vectorizer which follows its Legality step. The refactored Loop Vectorizer records in VPlans all vectorization decisions taken inside a candidate vectorized loop body, and uses the best VPlan to carry them out. These decisions specify which instructions are to  + be vectorized naturally, or  + be part of an interleave group, or  + be scalarized, and  + be packed or unpacked - at the definition rather than at its uses - to  provide both scalarized and vectorized forms.<br /><br />VPlan also explicitly represents all control-flow within t
 he loop body of the vectorized code. The Planner can optionally sink to-be scalarized instructions into predicated basic blocks in VPlan, thereby converting a current post-vectorization optimization of the Loop Vectorizer into the Planning step. Once the Planning step concludes a best VPlan is selected; this VPlan drives the vectorization transformation itself, including both the generation of basic-blocks and the generation of new instructions filling them, reusing existing Loop Vectorizer routines.<br /><br />The VPlan model implemented strives to be compact, addressing compile-time concerns. We conclude the talk by presenting ongoing and planned future steps for incremental refactoring of the Loop Vectorizer following our proposed patch[3] and the roadmap outlined in the LLVM-US presentation[2].<br /><br />Joint work with the Intel vectorization team.<br /><br />[1] [llvm-dev] RFC: Extending LV to vectorize outerloops, http://lists.llvm.org/pipermail/llvm-dev/2016-September/10505
 7.htm.<br />[2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer loop auto-vectorization, 2016 LLVM Developers' Meeting, https://www.youtube.com/watch?v=XXAvdUwO7k.<br />[3] [LV] Introducing VPlan to model the vectorized code and drive its transformation, https://reviews.llvm.org/D28975
-     </p>
+This talk describes our efforts to refactor LLVM’s Loop Vectorizer following the
+RFC posted on llvm-dev mailing list[1] and the presentation delivered at LLVM-US
+2016[2]. We describe the design and initial implementation of VPlan which models
+the vectorized code and drives its transformation.<br /><br />In this talk we
+cover the main aspects implemented in our first proposed major patch[3]. These
+include introducing a Planning step into the Loop Vectorizer which follows its
+Legality step. The refactored Loop Vectorizer records in VPlans all
+vectorization decisions taken inside a candidate vectorized loop body, and uses
+the best VPlan to carry them out. These decisions specify which instructions are
+to be vectorized naturally, or be part of an interleave group, or be
+scalarized, and be packed or unpacked - at the definition rather than at its
+uses - to provide both scalarized and vectorized forms.<br /><br />VPlan also
+explicitly represents all control-flow within the loop body of the vectorized
+code. The Planner can optionally sink to-be scalarized instructions into
+predicated basic blocks in VPlan, thereby converting a current
+post-vectorization optimization of the Loop Vectorizer into the Planning step.
+Once the Planning step concludes a best VPlan is selected; this VPlan drives the
+vectorization transformation itself, including both the generation of
+basic-blocks and the generation of new instructions filling them, reusing
+existing Loop Vectorizer routines.<br /><br />The VPlan model implemented strives
+to be compact, addressing compile-time concerns. We conclude the talk by
+presenting ongoing and planned future steps for incremental refactoring of the
+Loop Vectorizer following our proposed patch[3] and the roadmap outlined in the
+LLVM-US presentation[2].<br /><br />Joint work with the Intel vectorization
+team.<br /><br />[1] [llvm-dev] RFC: Extending LV to vectorize outerloops,
+http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.htm.<br />[2]
+Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer loop
+auto-vectorization, 2016 LLVM Developers' Meeting,
+https://www.youtube.com/watch?v=XXAvdUwO7k.<br />[3] [LV] Introducing VPlan to
+model the vectorized code and drive its transformation,
+https://reviews.llvm.org/D28975 </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -603,8 +843,25 @@ This talk describes our efforts to refac
        LLVM performance optimization for z Systems
      </p>
      <p class="abstract">
-Since we initially added support for the IBM z Systemsline of mainframe processors back in 2013, one of themain goals of ongoing LLVM back-end development workhas been to improve the performance of generated code.<br /><br />Now, we have for the first time reached parity withGCC: the latest benchmark results of LLVM 4.0 matchthose measured with current GCC.<br /><br />In this talk I'll report on the most important changeswe had to make to the back-end to achieve this goal.<br />On the one hand, this includes changes to fully exploitall relevant instruction-set architecture features tomake best possible use of z/Architecture instructions,e.g. including support for condition code values, theregister high-word facility, and conditional execution.<br /><br />On the other hand, I'll talk about some of the changesnecessary to tune generated code for the micro-architectureof selected z Systems processors, in particular z13.<br />This includes considerations like instruction scheduling,but 
 also tuning loop unrolling, vectorization, and otherinstruction selection choices.<br /><br />Finally, I'll show some opportunities for even furtherperformance optimization, with focus on those wherewe are currently unable to fully exploit some hardwarecapabilities due to limitations in common-code partsof LLVM's code generator.<br />
-     </p>
+Since we initially added support for the IBM z Systems line of mainframe
+processors back in 2013, one of the main goals of ongoing LLVM back-end
+development work has been to improve the performance of generated
+code.<br /><br />Now, we have for the first time reached parity with
+GCC:<br /><br />The
+latest benchmark results of LLVM 4.0 match those measured with current
+GCC.<br /><br />In this talk I'll report on the most important changes we had to
+make to the back-end to achieve this goal.<br />On the one hand, this includes
+changes to fully exploit all relevant instruction-set architecture features
+to make best possible use of z/Architecture instructions, e.g. including support
+for condition code values, the register high-word facility, and conditional
+execution.<br /><br />On the other hand, I'll talk about some of the
+changes necessary to tune generated code for the micro-architecture of selected z
+Systems processors, in particular z13.<br />This includes considerations like
+instruction scheduling, but also tuning loop unrolling, vectorization, and
+other instruction selection choices.<br /><br />Finally, I'll show some
+opportunities for even further performance optimization, with focus on those
+where we are currently unable to fully exploit some hardware capabilities due to
+limitations in common-code parts of LLVM's code generator.<br /> </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -633,8 +890,12 @@ Since we initially added support for the
        LLVMTuner: An Autotuning framework for LLVM
      </p>
      <p class="abstract">
-We present LLVMTuner, an autotuning framework targeting whole program autotuning (instead of just small computation kernels). LLVMTuner significantly speeds up search by extracting the hottest top-level loop nests into separate LLVM modules, along with private copies of the functions most frequently called from each such loop nest and individually applying some search strategy to optimize each such extracted module.
-     </p>
+We present LLVMTuner, an autotuning framework targeting whole program autotuning
+(instead of just small computation kernels). LLVMTuner significantly speeds up
+search by extracting the hottest top-level loop nests into separate LLVM
+modules, along with private copies of the functions most frequently called from
+each such loop nest, and individually applying some search strategy to optimize
+each such extracted module.  </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -669,8 +930,14 @@ We present LLVMTuner, an autotuning fram
        Path Invariance Based Partial Loop Un-switching
      </p>
      <p class="abstract">
-Loop un-switching is a well-known compiler optimization technique, it moves a conditional inside a loop outside by duplicating the loop's body and placing a version of it inside each of the if and else clauses of the conditional. Efficient Loop un-switching is inhibited in cases where a condition inside a loop is not loop-invariant or invariant in any of the conditional-paths inside the loop but not invariant in all the paths. We propose here a novel, efficient technique to identify partial invariant cases and optimize them by using partial loop un-switching. 
-     </p>
+Loop un-switching is a well-known compiler optimization technique: it moves a
+conditional inside a loop outside by duplicating the loop's body and placing a
+version of it inside each of the if and else clauses of the conditional.
+Efficient loop un-switching is inhibited in cases where a condition inside a
+loop is not loop-invariant, or is invariant in some of the conditional paths
+inside the loop but not in all of them. We propose here a novel, efficient
+technique to identify partial invariant cases and optimize them by using partial
+loop un-switching.  </p>
      </td>
     </tr>
     <tr class="separator" />
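
For readers new to the transformation, a small C++ before/after sketch of
classical loop un-switching (our illustration): the loop-invariant test is
hoisted and the body duplicated, so neither copy re-evaluates the branch on
every iteration. Partial un-switching, as proposed above, targets conditions
that are invariant only on some paths through the loop body.

    // Before: the branch on 'flag' is loop-invariant but tested every time.
    void before(int *a, int n, bool flag) {
      for (int i = 0; i < n; ++i) {
        if (flag)
          a[i] *= 2;
        else
          a[i] += 1;
      }
    }

    // After un-switching: one test, two specialized loops.
    void after(int *a, int n, bool flag) {
      if (flag) {
        for (int i = 0; i < n; ++i)
          a[i] *= 2;
      } else {
        for (int i = 0; i < n; ++i)
          a[i] += 1;
      }
    }
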
@@ -711,8 +978,15 @@ Loop un-switching is a well-known compil
        Register Allocation and Instruction Scheduling in Unison
      </p>
      <p class="abstract">
-This talk presents Unison - a simple, flexible and potentially optimal tool that solves register allocation and instruction scheduling simultaneously. Unison is integrated with LLVM's code generator and can be used as a complement to the existing heuristic algorithms.<br /><br />The ability to deliver optimal code makes Unison a powerful tool for LLVM users and developers: LLVM users can trade compilation time for code quality beyond the usual -O{0,1,2,3,..} optimization levels; LLVM developers can identify improvement opportunities in the existing heuristic algorithms. The talk discusses some of the improvement opportunities identified so far with the help of Unison.
-     </p>
+This talk presents Unison - a simple, flexible and potentially optimal tool that
+solves register allocation and instruction scheduling simultaneously. Unison is
+integrated with LLVM's code generator and can be used as a complement to the
+existing heuristic algorithms.<br /><br />The ability to deliver optimal code
+makes Unison a powerful tool for LLVM users and developers: LLVM users can trade
+compilation time for code quality beyond the usual -O{0,1,2,3,..} optimization
+levels; LLVM developers can identify improvement opportunities in the existing
+heuristic algorithms. The talk discusses some of the improvement opportunities
+identified so far with the help of Unison.  </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -765,8 +1039,30 @@ tbd.
        Using LLVM for Safety-Critical Applications
      </p>
      <p class="abstract">
-Would you step into a car if you knew that the software for the brakes was compiled with LLVM? The question is not academic. Compiled code is used today for many of the safety-critical components in modern cars. For the development of autonomous driving systems, the car industry demands safety qualified, high performance compilers to compile image and radar signal processing libraries written in C++, among other things. Fortunately, there are international standards such as ISO 26262 that describe the requirements for  electronic components, and their software, to be used in safety-critical systems.<br /><br />Perhaps surprisingly, quality and safety are not necessarily the same, although they go together well. A compiler that dumps core during compilation would not be considered good quality, but it would be very safe: no erroneous code is generated that can be used in a safety-critical component.<br /><br />This presentation discusses general techniques used to design safe systems
  and more specifically the steps that are needed to develop sufficient trust for compilation tools to be used in cars, medical equipment and nuclear installations. For compiler libraries, often an invisible part to the user of an SDK, safety requirements are actually set higher than those of the compiler itself. This is logical to the extent that the compiler itself does not, and the library code does become part of the safety-critical component.<br /><br />We will look at the steps that are necessary to qualify compilers and libraries, the V-model of software engineering, MC/DC analysis, the MISRA coding guidelines, how LLVM's engineering can be improved, what this means for the developer, and if you, as a compiler developer, can be held responsible for a car breaking down with fatal consequences.
-     </p>
+Would you step into a car if you knew that the software for the brakes was
+compiled with LLVM? The question is not academic. Compiled code is used today
+for many of the safety-critical components in modern cars. For the development
+of autonomous driving systems, the car industry demands safety qualified, high
+performance compilers to compile image and radar signal processing libraries
+written in C++, among other things. Fortunately, there are international
+standards such as ISO 26262 that describe the requirements for electronic
+components, and their software, to be used in safety-critical
+systems.<br /><br />Perhaps surprisingly, quality and safety are not necessarily
+the same, although they go together well. A compiler that dumps core during
+compilation would not be considered good quality, but it would be very safe: no
+erroneous code is generated that can be used in a safety-critical
+component.<br /><br />This presentation discusses general techniques used to
+design safe systems and more specifically the steps that are needed to develop
+sufficient trust for compilation tools to be used in cars, medical equipment and
+nuclear installations. For compiler libraries, often an invisible part to the
+user of an SDK, safety requirements are actually set higher than those of the
+compiler itself. This is logical to the extent that the compiler itself does not
+become part of the safety-critical component, whereas the library code
+does.<br /><br />We will look at the steps that are necessary to qualify
+compilers and libraries, the V-model of software engineering, MC/DC analysis,
+the MISRA coding guidelines, how LLVM's engineering can be improved, what this
+means for the developer, and if you, as a compiler developer, can be held
+responsible for a car breaking down with fatal consequences.  </p>
      </td>
     </tr>
     <tr class="separator" />
@@ -795,8 +1091,50 @@ Would you step into a car if you knew th
        Using LLVM in a scalable, high-available, in-memory database server
      </p>
      <p class="abstract">
-In this presentation we would like to show you how we at SAP are using LLVM within our HANA database. We will show the benefits we have from using LLVM as well as the specific challenges of working in an in-memory database server. Thereby we will explain the changes we have to do in the LLVM source and why we have a significant delay until we can move to the latest LLVM version.<br /><br />A key differentiator of a compiler integrated into a server compared to a standalone compiler is that within the server you may not crash whatever input you get. Even in out-of-memory situation you have to stop and cleanup your current work and return back to your starting state. This is doable but requires to immediately assign all resource allocations to an owner and to take special care when working at the edge of C++ memory handling e.g. when overloading operator new. About two thirds of the changes to LLVM we are doing on our version of the LLVM source are related to out-of-memory situations.
 <br /><br />Within the HANA database we use LLVM to compile stored procedures and query plans. For stored procedures several domain specific languages are available which are translated to LLVM IR via an intermediate language. The domain specific languages have powerful features and through the layered code generation the resulting LLVM IR code can become rather large. Furthermore, within our domain specific languages all code is often put into one function which results in having one large function in the LLVM IR. Since the runtime of many optimizer passes and of the register allocator increases non-linear with the size of the functions our compile times exploded up to many hours. To reduce the compile time we are now trying to split large functions automatically into smaller pieces.<br /><br />In contrast when compiling query plans to machine code the resulting functions typically have small to medium size. The overall response time of the query is determined by the compile time o
 f the query plan plus the execution time of the resulting machine code. So in this scenario the compile time for small and medium sized functions becomes important, sometimes it exceeds the actual execution time. If the time to execute a query without compilation is X microseconds per data row and the time to compile the execution plan of the query is Y microseconds then you need to process Y/X data rows to amortize the cost of compilation. We made several tries to speed up the compilation by reducing the number of optimization passes but are currently stuck at the actual machine code generation. Currently our break-even point between interpreted execution and compiled execution is at about 10.000 data rows.<br /><br />The key factor why we are happy to use LLVM is the excellent quality we experienced. We use LLVM for 6 years and we had less than a handful issues which were caused by bugs in LLVM. Also when upgrading from one LLVM version to another we did not experience new bugs (b
 esides handling of out-of-memory situations). Further we like the available traces and supportability features to track down problems that occur, the easy to consume APIs and we are very pleased that it is possible to generate debug info for the compiled code so debugging with GDB and profiling is possible even when we have a mixture of C++ and LLVM stack frames.
-     </p>
+In this presentation we would like to show you how we at SAP are using LLVM
+within our HANA database. We will show the benefits we gain from using LLVM as
+well as the specific challenges of working in an in-memory database server. We
+will also explain the changes we have to make to the LLVM source and why there
+is a significant delay before we can move to the latest LLVM
+version.<br /><br />A key differentiator of a compiler integrated into a
+server, compared to a standalone compiler, is that within the server you must
+not crash, whatever input you get. Even in out-of-memory situations you have to
+stop, clean up your current work, and return to your starting state. This is
+doable, but it requires immediately assigning all resource allocations to an
+owner and taking special care when working at the edge of C++ memory handling,
+e.g. when overloading operator new. About two thirds of the changes we make to
+our version of the LLVM source are related to out-of-memory
+situations.<br /><br />Within the HANA database we use LLVM to compile stored
+procedures and query plans. For stored procedures, several domain-specific
+languages are available which are translated to LLVM IR via an intermediate
+language. The domain-specific languages have powerful features, and through the
+layered code generation the resulting LLVM IR code can become rather large.
+Furthermore, within our domain-specific languages all code is often put into
+one function, which results in one large function in the LLVM IR. Since the
+runtime of many optimizer passes and of the register allocator increases
+non-linearly with the size of the functions, our compile times exploded up to
+many hours. To reduce the compile time we are now trying to split large
+functions automatically into smaller pieces.<br /><br />In contrast, when
+compiling query plans to machine code, the resulting functions are typically
+small to medium sized. The overall response time of the query is determined by
+the compile time of the query plan plus the execution time of the resulting
+machine code. So in this scenario the compile time for small and medium-sized
+functions becomes important; sometimes it exceeds the actual execution time. If
+the time to execute a query without compilation is X microseconds per data row
+and the time to compile the execution plan of the query is Y microseconds, then
+you need to process Y/X data rows to amortize the cost of compilation. We made
+several attempts to speed up the compilation by reducing the number of
+optimization passes but are currently stuck at the actual machine code
+generation. Currently our break-even point between interpreted execution and
+compiled execution is at about 10,000 data rows.<br /><br />The key factor why
+we are happy to use LLVM is the excellent quality we have experienced. We have
+used LLVM for six years and have had fewer than a handful of issues caused by
+bugs in LLVM. Also, when upgrading from one LLVM version to another we did not
+experience new bugs (besides the handling of out-of-memory situations).
+Furthermore, we like the available traces and supportability features for
+tracking down problems, the easy-to-consume APIs, and we are very pleased that
+it is possible to generate debug info for the compiled code, so debugging with
+GDB and profiling are possible even when we have a mixture of C++ and LLVM
+stack frames.</p>
      </td>
     </tr>
     <tr class="separator" />
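A concrete instance of the break-even arithmetic in the abstract above, as a
minimal C++ sketch; the per-row and compile costs are hypothetical numbers
chosen to land near the ~10,000-row figure, and the model assumes the compiled
plan's per-row cost is negligible:

    #include <iostream>

    // Break-even model from the abstract: interpreting costs X us per row,
    // compiling the plan costs Y us once, so compilation amortizes after
    // Y/X rows (compiled per-row cost treated as negligible -- a
    // simplification).
    int main() {
        const double interpret_us_per_row = 2.0;   // X (hypothetical)
        const double compile_us = 20000.0;         // Y (hypothetical)
        std::cout << "break-even at ~"
                  << compile_us / interpret_us_per_row
                  << " rows\n";                    // prints 10000
    }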
@@ -879,8 +1217,11 @@ In this presentation we would like to sh
        XLA: Accelerated Linear Algebra
      </p>
      <p class="abstract">
-We'll introduce XLA, a domain-specific optimizing compiler and runtime for linear algebra. XLA compiles a graph of linear algebra operations to LLVM IR and then uses LLVM to compile IR to CPU or GPU executables. We integrated XLA to TensorFlow, and XLA sped up a variety of internal and open-source TensorFlow benchmarks by up to 4.7x with a geometric mean of 1.4x. 
-     </p>
+We'll introduce XLA, a domain-specific optimizing compiler and runtime for
+linear algebra. XLA compiles a graph of linear algebra operations to LLVM IR
+and then uses LLVM to compile IR to CPU or GPU executables. We integrated XLA
+into TensorFlow, and XLA sped up a variety of internal and open-source
+TensorFlow benchmarks by up to 4.7x with a geometric mean of 1.4x.</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -903,8 +1244,15 @@ We'll introduce XLA, a domain-specific o
        XRay in LLVM
      </p>
      <p class="abstract">
-Debugging high throughput, low-latency C/C++ systems in production is hard. At Google we developed XRay, a function call tracing system that allows Google engineers to get accurate function call traces with negligible overhead when off and moderate overhead when on, suitable for services deployed in production. XRay enables efficient function call entry/exit logging with high accuracy timestamps, and can be dynamically enabled and disabled. This talk is about the ongoing developments with XRay in the LLVM project, what you can do with it now, and what to look forward to as we continue working on XRay in the LLVM project.
-     </p>
+Debugging high-throughput, low-latency C/C++ systems in production is hard. At
+Google we developed XRay, a function call tracing system that allows Google
+engineers to get accurate function call traces with negligible overhead when
+off and moderate overhead when on, suitable for services deployed in
+production. XRay enables efficient function call entry/exit logging with
+high-accuracy timestamps, and can be dynamically enabled and disabled. This
+talk is about the ongoing developments with XRay in the LLVM project, what you
+can do with it now, and what to look forward to as we continue working on
+XRay.</p>
      </td>
     </tr>
     <tr class="separator" />
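For readers who want to try the workflow the abstract describes, a minimal
hedged sketch; the `-fxray-instrument` and threshold flags and the
`XRAY_OPTIONS` spellings below match early XRay releases and may differ in
later ones:

    // Build:  clang++ -fxray-instrument -fxray-instruction-threshold=1 \
    //             demo.cpp -o demo
    // Run, patching the instrumentation points in at startup:
    //   XRAY_OPTIONS="patch_premain=true xray_naive_log=true" ./demo
    #include <cstdio>

    // Force instrumentation even though the function is tiny.
    [[clang::xray_always_instrument]] void hot_path(int n) {
        std::printf("n = %d\n", n);  // entry/exit timestamps are logged
    }

    int main() { hot_path(42); }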
@@ -932,8 +1280,23 @@ Student Research Competition (SRC)
        Automated Combination of Tolerance and Control Flow Integrity Countermeasures against Multiple Fault Attacks
      </p>
      <p class="abstract">
-Fault injection attacks are considered as one of the most fearsome threats against secure embedded systems.<br />Existing software countermeasures are either applied at the source code level where cautions must be taking to prevent the compiler from altering the countermeasure during compilation, or at the assembly code level where the code lacks semantic information, which as a result, limits the possibilities of code transformation and leads to significant overheads. Moreover, to protect against various fault models, countermeasures are usually applied incrementally without taking into account the impact one can have on another.<br /><br />This paper presents an automated application of several countermeasures against fault attacks,  that combines fault tolerance and control flow integrity.<br />The fault tolerance schemes are parameterizable over the width of the fault injection, and the number of fault injections that the secured code must be protected against.<br />The counterm
 easures are applied by a modified compiler based on clang/LLVM.<br />As a result, the produced code is both optimized and secure by design.<br />Performance and security evaluations on different benchmarks show reduced performance overheads compared to existing solutions, with the expected security level.
-     </p>
+Fault injection attacks are considered one of the most fearsome threats
+against secure embedded systems.<br />Existing software countermeasures are
+either applied at the source code level, where caution must be taken to
+prevent the compiler from altering the countermeasure during compilation, or
+at the assembly code level, where the code lacks semantic information, which,
+as a result, limits the possibilities of code transformation and leads to
+significant overheads. Moreover, to protect against various fault models,
+countermeasures are usually applied incrementally without taking into account
+the impact one can have on another.<br /><br />This paper presents an automated
+application of several countermeasures against fault attacks that combines
+fault tolerance and control flow integrity.<br />The fault tolerance schemes
+are parameterizable over the width of the fault injection and the number of
+fault injections that the secured code must be protected against.<br />The
+countermeasures are applied by a modified compiler based on clang/LLVM.<br />As
+a result, the produced code is both optimized and secure by
+design.<br />Performance and security evaluations on different benchmarks show
+reduced performance overheads compared to existing solutions, with the
+expected security level.</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -974,8 +1337,32 @@ Fault injection attacks are considered a
        Bringing Next Generation C++ to GPUs: The LLVM-based PACXX Approach
      </p>
      <p class="abstract">
-In this paper, we describe PACXX -- our approach for programming Graphics Processing Unit (GPU) in C++. PACXX is based on Clang and LLVM and allows to compile arbitrary C++ code for GPU execution. PACXX enables developers to use all the convenient features of modern C++14: type deduction, lambda expressions, and algorithms from the Standard Template Library (STL).<br />Using PACXX, a GPU program is written as a single C++ program, rather than two distinct host and kernel programs as in CUDA or OpenCL.<br />Using LLVM's just-in-time compilation capabilities, PACXX generates efficient GPU code at runtime.<br /><br />We demonstrate how PACXX supports a composable GPU programming approach: developers compose their applications from simple and reusable patterns.<br />We extend the range-v3 library which is currently developed as the next generation of the C++ Standard Template Library (STL) to allow for GPU programming using ranges.<br /><br />We describe how PACXX enables developers to 
 use multi-staging in C++ to optimize their GPU programs at runtime. PACXX provides an easy-to-use and type-safe API  avoiding the pitfalls of string manipulation for multi-staging known from other GPU programming models (e.g., OpenCL).<br /><br />Our evaluation shows that using PACXX achieves competitive performance to CUDA, and our extended range-v3 programming approach can outperform Nvidia's highly-tuned Thrust library.<br /><br />---<br /><br />This submission is a compilation of:<br />Multi-Stage Programming for GPUs in Modern C++ using PACXX published in the proceedings of the 9th GPGPU Workshop @ PPoPP 2016 - http://dl.acm.org/citation.cfm?id=2884049Towards Composable GPU Programming: Programming GPUs with Eager Actions and Lazy Views published in the proceedings of the 8th PMAM Workshop @ PPoPP 2017 (to appear) https://github.com/michel-steuwer/publications/raw/master/2017/PMAM-2017.pdf
-     </p>
+In this paper, we describe PACXX -- our approach for programming Graphics
+Processing Units (GPUs) in C++. PACXX is based on Clang and LLVM and makes it
+possible to compile arbitrary C++ code for GPU execution. PACXX enables
+developers to use all the convenient features of modern C++14: type deduction,
+lambda expressions, and algorithms from the Standard Template Library
+(STL).<br />Using PACXX, a GPU program is written as a single C++ program,
+rather than two distinct host and kernel programs as in CUDA or
+OpenCL.<br />Using LLVM's just-in-time compilation capabilities, PACXX
+generates efficient GPU code at runtime.<br /><br />We demonstrate how PACXX
+supports a composable GPU programming approach: developers compose their
+applications from simple and reusable patterns.<br />We extend the range-v3
+library, which is currently being developed as the next generation of the C++
+Standard Template Library (STL), to allow for GPU programming using
+ranges.<br /><br />We describe how PACXX enables developers to use
+multi-staging in C++ to optimize their GPU programs at runtime. PACXX provides
+an easy-to-use and type-safe API that avoids the pitfalls of string
+manipulation for multi-staging known from other GPU programming models (e.g.,
+OpenCL).<br /><br />Our evaluation shows that PACXX achieves performance
+competitive with CUDA, and our extended range-v3 programming approach can
+outperform Nvidia's highly-tuned Thrust library.<br /><br />---<br /><br />This
+submission is a compilation of:<br />Multi-Stage Programming for GPUs in Modern
+C++ using PACXX, published in the proceedings of the 9th GPGPU Workshop @
+PPoPP 2016 - http://dl.acm.org/citation.cfm?id=2884049<br />Towards Composable
+GPU Programming: Programming GPUs with Eager Actions and Lazy Views, published
+in the proceedings of the 8th PMAM Workshop @ PPoPP 2017 (to appear) -
+https://github.com/michel-steuwer/publications/raw/master/2017/PMAM-2017.pdf
+</p>
      </td>
     </tr>
     <tr class="separator" />
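The "composable patterns" style the PACXX abstract refers to can be
illustrated on the CPU with standard lazy range views (shown here with C++20
std::ranges for brevity; PACXX itself extends range-v3 and runs such pipelines
on the GPU, which this sketch does not model):

    #include <iostream>
    #include <ranges>
    #include <vector>

    int main() {
        std::vector<int> xs{1, 2, 3, 4, 5};
        // Two lazy views composed into one pipeline; nothing is computed
        // until the loop below pulls values through it.
        auto squared_evens =
            xs | std::views::filter([](int x) { return x % 2 == 0; })
               | std::views::transform([](int x) { return x * x; });
        for (int v : squared_evens)
            std::cout << v << ' ';   // prints: 4 16
        std::cout << '\n';
    }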
@@ -1010,8 +1397,18 @@ In this paper, we describe PACXX -- our
        Data Reuse Analysis for Automated Synthesis of Custom Instructions in Sliding Window Applications
      </p>
      <p class="abstract">
-The efficiency of accelerators supporting complex instruc- tions is often limited by their input/output bandwidth re- quirements. To overcome this bottleneck, we herein intro- duce a novel methodology that, following a static code anal- ysis approach, harnesses data reuse in-between multiple it- eration of loop bodies to reduce the amount of data trans- fers. Our methodology, building upon the features offered by the LLVM-Polly framework, enables the automated de- sign of fully synthesisable and highly-efficient accelerators. Our approach is targeted towards sliding window kernels, which are employed in many applications in the signal and image processing domain.<br /><br />NOTE: This paper has been published in IMPACT 2017 Seventh International Workshop on Polyhedral Compilation Techniques Jan 23, 2017, Stockholm, Sweden<br />In conjunction with HiPEAC 2017. http://impact.gforge.inria.fr/impact2017
-     </p>
+The efficiency of accelerators supporting complex instructions is often
+limited by their input/output bandwidth requirements. To overcome this
+bottleneck, we herein introduce a novel methodology that, following a static
+code analysis approach, harnesses data reuse between multiple iterations
+of loop bodies to reduce the amount of data transfers. Our methodology,
+building upon the features offered by the LLVM-Polly framework, enables the
+automated design of fully synthesisable and highly efficient accelerators. Our
+approach is targeted towards sliding window kernels, which are employed in many
+applications in the signal and image processing domain.<br /><br />NOTE: This
+paper has been published in IMPACT 2017, the Seventh International Workshop on
+Polyhedral Compilation Techniques, Jan 23, 2017, Stockholm, Sweden, in
+conjunction with HiPEAC 2017. http://impact.gforge.inria.fr/impact2017</p>
      </td>
     </tr>
     <tr class="separator" />
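The data-reuse idea is easiest to see on a 1-D 3-tap window: consecutive
iterations share two of their three inputs, so carrying them in locals removes
two of every three loads. A hand-written sketch of the pattern (the paper
derives such transformations automatically via Polly; this code is
illustrative only):

    #include <cstddef>

    void smooth3(const float* in, float* out, std::size_t n) {
        if (n < 3) return;            // edges left unhandled for brevity
        float a = in[0], b = in[1];   // window state reused across iterations
        for (std::size_t i = 2; i < n; ++i) {
            float c = in[i];          // only one new load per output
            out[i - 1] = (a + b + c) / 3.0f;
            a = b;                    // shift the window by one element
            b = c;
        }
    }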
@@ -1052,8 +1449,27 @@ The efficiency of accelerators supportin
        ELF GOT Problems? C F I Can Help.
      </p>
      <p class="abstract">
-Control-Flow Integrity (CFI) techniques make the deployment of malicious exploits harder by constraining the control flow of programs to that of a statically analyzed control-flow graph (CFG). This is made harder when position-independent dynamically shared objects are compiled separately, and then linked together only at runtime by a dynamic linker. Deploying CFI only on statically linked objects ensures that control flow enters only the correct procedure linkage table (PLT) entry, not where that trampoline jumps to; it leaves a weak link at the boundaries of shared objects that attackers can use to gain control. We show that manipulation of the PLT GOT has a long history of exploitation, and is still being used today against real binaries - even with state of the art CFI enforcement. PLT-CFI is a CFI implementation for the ELF dynamic-linkage model, designed to work along-side existing CFI implementations that ensure correct control flow within a single dynamic shared object (DSO)
 . We make modifications to the LLVM stack to insert dynamic checks into the PLT that ensure correct control flow even in the presence of an unknown base address of a dynamic library, while maintaining the ability to link in a lazy fashion and allowing new implementations (e.g., plug-ins) to be loaded at runtime. We make only minor ABI changes, and still offer full backwards compatibility with binaries compiled without our scheme. Furthermore, we deployed our CFI scheme for both AMD64 and AArch64 on the FreeBSD operating system and measured performance.
-     </p>
+Control-Flow Integrity (CFI) techniques make the deployment of malicious
+exploits harder by constraining the control flow of programs to that of a
+statically analyzed control-flow graph (CFG). This is made harder when
+position-independent dynamically shared objects are compiled separately and
+then linked together only at runtime by a dynamic linker. Deploying CFI only on
+statically linked objects ensures that control flow enters only the correct
+procedure linkage table (PLT) entry, not where that trampoline jumps to; it
+leaves a weak link at the boundaries of shared objects that attackers can use
+to gain control. We show that manipulation of the PLT GOT has a long history of
+exploitation and is still being used today against real binaries - even with
+state-of-the-art CFI enforcement. PLT-CFI is a CFI implementation for the ELF
+dynamic-linkage model, designed to work alongside existing CFI implementations
+that ensure correct control flow within a single dynamic shared object (DSO).
+We modify the LLVM stack to insert dynamic checks into the PLT that ensure
+correct control flow even in the presence of an unknown base address of a
+dynamic library, while maintaining the ability to link in a lazy fashion and
+allowing new implementations (e.g., plug-ins) to be loaded at runtime. We make
+only minor ABI changes and still offer full backwards compatibility with
+binaries compiled without our scheme. Furthermore, we deployed our CFI scheme
+for both AMD64 and AArch64 on the FreeBSD operating system and measured its
+performance.</p>
      </td>
     </tr>
     <tr class="separator" />
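A toy model of the attack class and the defense, in ordinary C++ (the talk's
scheme performs an analogous check inside the PLT itself, which this
user-level sketch does not attempt to reproduce):

    #include <cstdio>
    #include <cstdlib>

    void expected() { std::puts("expected callee"); }
    void gadget()   { std::puts("attacker-chosen code"); }

    using Fn = void (*)();
    Fn slot = expected;  // stands in for a writable GOT entry

    // CFI-style dynamic check: validate the target against the allowed
    // set (here of size one) before making the indirect call.
    void checked_call(Fn f) {
        if (f != expected) {
            std::fputs("CFI violation\n", stderr);
            std::abort();
        }
        f();
    }

    int main() {
        checked_call(slot);  // fine
        slot = gadget;       // simulated GOT overwrite
        checked_call(slot);  // aborts instead of running the gadget
    }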
@@ -1082,8 +1498,17 @@ Control-Flow Integrity (CFI) techniques
        LifeJacket: Verifying Precise Floating-Point Optimizations in LLVM
      </p>
      <p class="abstract">
-Users depend on correct compiler optimizations but floating-point arithmetic is difficult to optimize transparently. Manually reasoning about all of floating-point arithmetic’s esoteric properties is error-prone and increases the cost of adding new optimizations.<br />We present an approach to automate reasoning about precise floating-point optimizations using satisfiability modulo theories (SMT) solvers. We implement the approach in LifeJacket, a system for automatically verifying precise floating-point optimizations for the LLVM assembly language. We have used LifeJacket to verify 43 LLVM optimizations and to discover eight incorrect ones, including three previously unreported problems. LifeJacket is an open source extension of the Alive system for optimization verification.
-     </p>
+Users depend on correct compiler optimizations, but floating-point arithmetic
+is difficult to optimize transparently. Manually reasoning about all of
+floating-point arithmetic’s esoteric properties is error-prone and increases
+the cost of adding new optimizations.<br />We present an approach to automate
+reasoning about precise floating-point optimizations using satisfiability
+modulo theories (SMT) solvers. We implement the approach in LifeJacket, a
+system for automatically verifying precise floating-point optimizations for
+the LLVM assembly language. We have used LifeJacket to verify 43 LLVM
+optimizations and to discover eight incorrect ones, including three previously
+unreported problems. LifeJacket is an open-source extension of the Alive
+system for optimization verification.</p>
      </td>
     </tr>
     <tr class="separator" />
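An example of the corner cases that make such verification necessary: the
tempting rewrite x + 0.0 -> x is wrong for x == -0.0, because IEEE-754 defines
-0.0 + (+0.0) as +0.0, which differs in sign. A few lines of C++ make the
discrepancy visible:

    #include <cmath>
    #include <cstdio>

    int main() {
        double x = -0.0;
        double y = x + 0.0;  // IEEE-754 (round-to-nearest): result is +0.0
        std::printf("signbit(x) = %d, signbit(x + 0.0) = %d\n",
                    std::signbit(x), std::signbit(y));  // prints 1, then 0
    }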
@@ -1112,8 +1537,19 @@ Users depend on correct compiler optimiz
        Software Prefetching for Indirect Memory Accesses
      </p>
      <p class="abstract">
-Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. However, these are difficult to insert to effectively improve performance, and techniques for automatic insertion are currently limited.<br /><br />This paper develops a novel compiler pass to automatically generate software prefetches for indirect memory accesses, a special class of irregular accesses often seen in high-performance workloads. We evaluate this across a wide set of systems, all of which gain benefit from the technique. Across a set of memory-bound benchmarks, our automated pass achieves average speedups of 1.3x and 1.1x for an Intel Haswell processor and an ARM Cortex-A57, both out-of-order cores, and improvements of 2.1x and 3.7x for the in-order ARM Cortex-A53 and Intel Xeon Phi.
-     </p>
+Many modern data processing and HPC workloads are heavily memory-latency
+bound. A tempting proposition to solve this is software prefetching, where
+special non-blocking loads are used to bring data into the cache hierarchy
+just before it is required. However, such prefetches are difficult to insert
+in a way that actually improves performance, and techniques for automatic
+insertion are currently limited.<br /><br />This paper develops a novel
+compiler pass to automatically generate software prefetches for indirect
+memory accesses, a special class of irregular accesses often seen in
+high-performance workloads. We evaluate this across a wide set of systems, all
+of which benefit from the technique. Across a set of memory-bound benchmarks,
+our automated pass achieves average speedups of 1.3x and 1.1x for an Intel
+Haswell processor and an ARM Cortex-A57, both out-of-order cores, and
+improvements of 2.1x and 3.7x for the in-order ARM Cortex-A53 and Intel Xeon
+Phi.</p>
      </td>
     </tr>
     <tr class="separator" />
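The shape of the transformation the pass automates, written by hand for the
canonical indirect pattern data[index[i]] (the look-ahead distance is a tuning
assumption, not a value from the paper):

    #include <cstddef>

    double sum_indirect(const double* data, const int* index, std::size_t n) {
        const std::size_t dist = 16;  // look-ahead distance (assumed)
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            // Non-blocking hint: fetch the element needed 'dist' iterations
            // from now while the current one is being processed.
            if (i + dist < n)
                __builtin_prefetch(&data[index[i + dist]]);
            sum += data[index[i]];
        }
        return sum;
    }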
@@ -1147,8 +1583,26 @@ Lightning Talks
        ClrFreqPrinter: A Tool for Frequency Annotated Control Flow Graphs Generation
      </p>
      <p class="abstract">
-Recent LLVM distributions have been offering the option to print the Control Flow Graph (CFG) of functions in the Intermediate Representation (IR) level. This feature is fairly useful as it enables the visualization of the CFG of a function, thus providing a better overview of the control flow among the Basic Blocks (BBs). In many occasions, though, more information than that is needed in order to obtain quickly an adequate high level view of the execution of a function. One such desired attribute, that could lead to a better understanding, is the execution frequency of each Basic Block. We have developed our own LLVM analysis pass which makes use of the BB Frequency Info Analysis pass methods, as well as the profiling information gathered by the use of the llvm-profdata tool. Our analysis pass gathers the execution frequency of each BB in every function of an application. Subsequently, the other part of our toolchain, exploiting the default LLVM CFG printer, makes use of this data 
 and assigns a specific colour to each BB in a CFG of a function. The colour scheme followed was inspired by a typical weather map, as it can bee seen in Figure 1. An example of the generated colour annotated CFG of a jpeg function can be seen in Figure 2. Our tool, ClrFreqPrinter, can be applied in any benchmark and can be used to provide instant intuition regarding the execution frequency of BBs inside a function. A feature that can be useful for any developer or researcher working with the LLVM framework.
-     </p>
+Recent LLVM distributions have offered the option to print the Control
+Flow Graph (CFG) of functions at the Intermediate Representation (IR) level.
+This feature is fairly useful as it enables the visualization of the CFG of a
+function, thus providing a better overview of the control flow among the Basic
+Blocks (BBs). On many occasions, though, more information than that is needed
+in order to quickly obtain an adequate high-level view of the execution of a
+function. One such desired attribute, which could lead to a better
+understanding, is the execution frequency of each Basic Block. We have
+developed our own LLVM analysis pass which makes use of the BB Frequency Info
+analysis pass methods, as well as the profiling information gathered by the
+use of the llvm-profdata tool. Our analysis pass gathers the execution
+frequency of each BB in every function of an application. Subsequently, the
+other part of our toolchain, exploiting the default LLVM CFG printer, makes
+use of this data and assigns a specific colour to each BB in the CFG of a
+function. The colour scheme followed was inspired by a typical weather map, as
+can be seen in Figure 1. An example of the generated colour-annotated CFG of a
+jpeg function can be seen in Figure 2. Our tool, ClrFreqPrinter, can be
+applied to any benchmark and can be used to provide instant intuition
+regarding the execution frequency of BBs inside a function, a feature that can
+be useful for any developer or researcher working with the LLVM framework.</p>
      </td>
     </tr>
     <tr class="separator" />
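A hedged sketch of the analysis half of such a tool, using the era's legacy
pass manager and the real BlockFrequencyInfo interface; the colour mapping and
profile loading are omitted, and all names are illustrative rather than the
authors' code:

    #include "llvm/Analysis/BlockFrequencyInfo.h"
    #include "llvm/IR/Function.h"
    #include "llvm/Pass.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    namespace {
    struct BBFreqPrinter : public FunctionPass {
      static char ID;
      BBFreqPrinter() : FunctionPass(ID) {}

      void getAnalysisUsage(AnalysisUsage &AU) const override {
        AU.addRequired<BlockFrequencyInfoWrapperPass>();
        AU.setPreservesAll();  // analysis only; the IR is not modified
      }

      bool runOnFunction(Function &F) override {
        auto &BFI = getAnalysis<BlockFrequencyInfoWrapperPass>().getBFI();
        for (BasicBlock &BB : F)
          errs() << F.getName() << "/" << BB.getName() << ": "
                 << BFI.getBlockFreq(&BB).getFrequency() << "\n";
        return false;
      }
    };
    } // namespace

    char BBFreqPrinter::ID = 0;
    static RegisterPass<BBFreqPrinter>
        X("bb-freq-print", "Print basic block frequencies");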
@@ -1171,8 +1625,18 @@ Recent LLVM distributions have been offe
        DIVA (Debug Information Visual Analyzer)
      </p>
      <p class="abstract">
-In this lightning talk, Phillip will present DIVA (Debug Information Visual Analyzer). DIVA is a new command line tool that processes DWARF debug information contained within ELF files and prints the semantics of that debug information. The DIVA output is designed with an aim to be understandable by software programmers without any low-level compiler or DWARF knowledge; as such, it can be used to report debug information bugs to the compiler provider. DIVA's output can also be used as the input to DWARF tests, to compare the debug information generated from multiple compilers, from different versions of the same compiler, from different compiler switches and from the use of different DWARF specifications (i.e. DWARF 3, 4 and 5).  DIVA will be open sourced in 2017 to be used in the LLVM project to test and validate the output of clang to help improve the quality of the debug experience.
-     </p>
+In this lightning talk, Phillip will present DIVA (Debug Information Visual
+Analyzer). DIVA is a new command line tool that processes DWARF debug
+information contained within ELF files and prints the semantics of that debug
+information. The DIVA output is designed to be understandable by software
+programmers without any low-level compiler or DWARF knowledge; as such, it can
+be used to report debug information bugs to the compiler provider. DIVA's
+output can also be used as the input to DWARF tests, to compare the debug
+information generated from multiple compilers, from different versions of the
+same compiler, from different compiler switches, and from the use of different
+DWARF specifications (i.e. DWARF 3, 4 and 5). DIVA will be open sourced in
+2017 so it can be used in the LLVM project to test and validate the output of
+clang, helping improve the quality of the debug experience.</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1195,8 +1659,23 @@ In this lightning talk, Phillip will pre
        Generalized API checkers for the Clang Static Analyzer
      </p>
      <p class="abstract">
-I present three modified API checkers, that use external metadata, to warn on improper function calls. We aim to upstream these checkers to replace existing hard-coded data and duplicated code. The goal is to allow anyone to check any API, using the Static Analyzer as a black box.<br /><br />Draft Presentation:<br /><br />    Slide 1: Background        - At Sony we have some custom Clang Static Analyzer checkers for APIs        - They read in YAML data describing an API so they know what to check for    Slide 2: Unchecked Return Checker (Simple Example)        - Warn when error returns from functions are not checked        - Based on upstream checkers with hard-coded lists of functions    Slide 3: Async Argument Checker (Simple Example)        - Warn on calls to asynchronous functions with pointers to the stack        - These bad calls can result in timing dependent stack corruptions        - Also based on upstream checkers    Slide 4: Argument Value Checker (Simple Example)        
 - Warn when incorrect values are passed to functions        - These calls may only occur in very specific circumstances    Slide 5:<br />        - We aim to upstream these checkers so anyone can check APIs, without knowing anything about the static analyzer implementation
-     </p>
+I present three modified API checkers that use external metadata to warn on
+improper function calls. We aim to upstream these checkers to replace existing
+hard-coded data and duplicated code. The goal is to allow anyone to check any
+API, using the Static Analyzer as a black box.<br /><br />Draft
+presentation:<br /><br />Slide 1: Background<br />- At Sony we have some custom
+Clang Static Analyzer checkers for APIs<br />- They read in YAML data
+describing an API so they know what to check for<br />Slide 2: Unchecked
+Return Checker (Simple Example)<br />- Warn when error returns from functions
+are not checked<br />- Based on upstream checkers with hard-coded lists of
+functions<br />Slide 3: Async Argument Checker (Simple Example)<br />- Warn on
+calls to asynchronous functions with pointers to the stack<br />- These bad
+calls can result in timing-dependent stack corruptions<br />- Also based on
+upstream checkers<br />Slide 4: Argument Value Checker (Simple Example)<br />-
+Warn when incorrect values are passed to functions<br />- These calls may only
+occur in very specific circumstances<br />Slide 5:<br />- We aim to upstream
+these checkers so anyone can check APIs, without knowing anything about the
+static analyzer implementation</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1219,8 +1698,14 @@ I present three modified API checkers, t
        LibreOffice loves LLVM
      </p>
      <p class="abstract">
-LibreOffice (with its StarOffice/OpenOffice.org ancestry) is one of the behemoths in the open source C++ project zoo.  On the one hand, we are always looking for tools that help us in keeping its code in shape and maintainable.  On the other hand, the sheer size of the code base and its diversity are a welcome test bed for any tool to run against.  Whatever clever static analysis feat you come up with, you'll be sure to find at least one hit in the LibreOffice code base.<br /><br />This talk gives a short overview of how we use Clang-based tooling in LibreOffice development.
-     </p>
+LibreOffice (with its StarOffice/OpenOffice.org ancestry) is one of the
+behemoths in the open source C++ project zoo.  On the one hand, we are always
+looking for tools that help us keep its code in shape and maintainable.  On
+the other hand, the sheer size of the code base and its diversity are a
+welcome test bed for any tool to run against.  Whatever clever static analysis
+feat you come up with, you'll be sure to find at least one hit in the
+LibreOffice code base.<br /><br />This talk gives a short overview of how we
+use Clang-based tooling in LibreOffice development.</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1243,8 +1728,14 @@ LibreOffice (with its StarOffice/OpenOff
        Simple C++ reflection with a Clang plugin
      </p>
      <p class="abstract">
-Static and dynamic reflection is a mechanism that can be used for various purposes: serialization of arbitrary data structures, scripting, remote procedure calls, etc. Currently, the C++ programming language lacks a standard solution for it, but it is not that difficult to implement a simple reflection framework as a library with a custom Clang plugin.<br /><br />In this talk, I will present a simple solution for visualizing algorithm execution in C++ programs which consists of a runtime library, a Clang plugin, and a web application for displaying animations.
-     </p>
+Static and dynamic reflection is a mechanism that can be used for various
+purposes: serialization of arbitrary data structures, scripting, remote
+procedure calls, etc. Currently, the C++ programming language lacks a standard
+solution for it, but it is not that difficult to implement a simple reflection
+framework as a library with a custom Clang plugin.<br /><br />In this talk, I
+will present a simple solution for visualizing algorithm execution in C++
+programs that consists of a runtime library, a Clang plugin, and a web
+application for displaying animations.</p>
      </td>
     </tr>
     <tr class="separator" />
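For readers unfamiliar with the plugin side, a minimal self-contained Clang
plugin of this general shape (a sketch, not the speaker's code) that walks the
AST and dumps record fields -- the raw material a reflection library would
register:

    #include "clang/AST/ASTConsumer.h"
    #include "clang/AST/RecursiveASTVisitor.h"
    #include "clang/Frontend/CompilerInstance.h"
    #include "clang/Frontend/FrontendPluginRegistry.h"
    #include "llvm/Support/raw_ostream.h"
    #include <memory>
    #include <string>
    #include <vector>

    using namespace clang;

    namespace {
    class FieldVisitor : public RecursiveASTVisitor<FieldVisitor> {
    public:
      bool VisitCXXRecordDecl(CXXRecordDecl *D) {
        if (D->isCompleteDefinition()) {
          llvm::outs() << "record " << D->getNameAsString() << "\n";
          for (const FieldDecl *F : D->fields())
            llvm::outs() << "  field " << F->getNameAsString() << "\n";
        }
        return true;  // continue traversing the rest of the AST
      }
    };

    class FieldConsumer : public ASTConsumer {
      void HandleTranslationUnit(ASTContext &Ctx) override {
        FieldVisitor V;
        V.TraverseDecl(Ctx.getTranslationUnitDecl());
      }
    };

    class DumpFieldsAction : public PluginASTAction {
      std::unique_ptr<ASTConsumer>
      CreateASTConsumer(CompilerInstance &, llvm::StringRef) override {
        return std::make_unique<FieldConsumer>();
      }
      bool ParseArgs(const CompilerInstance &,
                     const std::vector<std::string> &) override {
        return true;
      }
    };
    } // namespace

    static FrontendPluginRegistry::Add<DumpFieldsAction>
        X("dump-fields", "dump record fields for reflection");

It would be loaded with something like clang -Xclang -load -Xclang
./DumpFields.so -Xclang -plugin -Xclang dump-fields -c input.cpp (the plugin
name and path here are illustrative).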
@@ -1272,8 +1763,21 @@ BoFs
        Alternative Backend Design
      </p>
      <p class="abstract">
-While LLVM has a modern mostly graph-based intermediate language in SSA form, its backend infrastructure relies upon a more classic imperative approach.<br />In this session, I want to present and discuss a design for backends which heavily uses a single graph-based representation in SSA form.<br />In this approach, code generation is seen as the process of adding more invariants to this graph, e.g. an instruction schedule and a register assignment, until it is suitable for assembly emission.<br />During the entire process, this representation is invariantly kept in SSA form.<br />I take a close look at the advantages this property has for the steps in code generation.<br />For example it allows to decouple spilling from register allocation, which mitigates the phase ordering problem of these two steps.<br />I also examine some typical challenges during code generation, both caused by the chosen program representation and the target machine.<br />This includes SSA reconstruction dur
 ing spilling as well as live-range splitting and copy coalescing to tackle instructions with constraints on registers.
-     </p>
+While LLVM has a modern, mostly graph-based intermediate language in SSA form,
+its backend infrastructure relies upon a more classic imperative
+approach.<br />In this session, I want to present and discuss a design for
+backends which heavily uses a single graph-based representation in SSA
+form.<br />In this approach, code generation is seen as the process of adding
+more invariants to this graph, e.g. an instruction schedule and a register
+assignment, until it is suitable for assembly emission.<br />During the entire
+process, this representation is invariantly kept in SSA form.<br />I take a
+close look at the advantages this property has for the steps in code
+generation.<br />For example, it allows decoupling spilling from register
+allocation, which mitigates the phase ordering problem of these two
+steps.<br />I also examine some typical challenges during code generation,
+caused both by the chosen program representation and by the target
+machine.<br />This includes SSA reconstruction during spilling as well as
+live-range splitting and copy coalescing to tackle instructions with
+constraints on registers.</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1296,8 +1800,16 @@ While LLVM has a modern mostly graph-bas
        Clangd: A new Language Server Protocol implementation leveraging Clang
      </p>
      <p class="abstract">
-Clangd is a new tool developed as part of clang-tools-extra. It aimsat implementing the Language Server Protocol, a protocol that provides<br />IDEs and code editors all the language "smartness". Work in this areais only just beginning however there is already a large interestsurrounding it. This BoF session will be a nice opportunity for theattendees to get to know each other as well as discuss several topicsthat will help make this tool a success.<br /><br />Possible agenda/topics:<br />- Introductions<br />- Goals and scope of Clangd<br />- Existing language server implementations. Comparisons,advantages/disadvantages, etc.<br />- Challenges<br />- Proposed architecture<br />- Collaborations and planning
-     </p>
+Clangd is a new tool developed as part of clang-tools-extra. It aims at
+implementing the Language Server Protocol, a protocol that provides IDEs and
+code editors with all the language "smartness". Work in this area is only just
+beginning; however, there is already large interest surrounding it. This BoF
+session will be a nice opportunity for the attendees to get to know each other
+as well as discuss several topics that will help make this tool a
+success.<br /><br />Possible agenda/topics:<br />- Introductions<br />- Goals
+and scope of Clangd<br />- Existing language server implementations.
+Comparisons, advantages/disadvantages, etc.<br />- Challenges<br />- Proposed
+architecture<br />- Collaborations and planning</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1332,8 +1844,12 @@ Clangd is a new tool developed as part o
        GlobalISel
      </p>
      <p class="abstract">
-Global ISel is catching up, with stride progress being made on AArch64, ARM, x86 and AMDGPU back-ends, and we need to decide what the next steps are.<br /><br />* Do we start building it by default? How do we validate it across buildbots and Jenkins builders?<br />* When do we turn it on by default?<br />* Is self-hosting + test-suite enough?<br />* How do we validate Chromium, BSD, and Linux distros?
-     </p>
+GlobalISel is catching up, with great strides being made on the AArch64, ARM,
+x86 and AMDGPU back-ends, and we need to decide what the next steps
+are.<br /><br />* Do we start building it by default? How do we validate it
+across buildbots and Jenkins builders?<br />* When do we turn it on by
+default?<br />* Is self-hosting + test-suite enough?<br />* How do we validate
+Chromium, BSD, and Linux distros?</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1367,8 +1883,20 @@ Posters
        AnyDSL: A Compiler-Framework for Domain-Specific Libraries (DSLs)
      </p>
      <p class="abstract">
-AnyDSL is a framework for the rapid development of domain-specific libraries (DSLs). AnyDSL's main ingredient is AnyDSL's intermediate representation Thorin. In contrast to other intermediate representations, Thorin features certain abstractions which allow to maintain domain-specific types and control-flow. On these grounds, a DSL compiler gains two major advantages:<br />- The domain expert can focus on the semantics of the DSL. The DSL's code generator can leave low-level details like exact iteration order of looping constructs or detailed memory layout of data types open. Nevertheless, the code generator can emit Thorin code which acts as interchange format.<br />- The expert of a certain target machine just has to specify the required details once. These details are linked like a library to the abstract Thorin code. Thorin's analyses and transformations will then optimize the resulting Thorin code in a way such that the resulting Thorin code appears to be written by an expert o
 f that target machine.
-     </p>
+AnyDSL is a framework for the rapid development of domain-specific libraries
+(DSLs). AnyDSL's main ingredient is its intermediate representation, Thorin.
+In contrast to other intermediate representations, Thorin features certain
+abstractions which make it possible to maintain domain-specific types and
+control flow. On these grounds, a DSL compiler gains two major
+advantages:<br />- The domain expert can focus on the semantics of the DSL.
+The DSL's code generator can leave low-level details, like the exact iteration
+order of looping constructs or the detailed memory layout of data types, open.
+Nevertheless, the code generator can emit Thorin code which acts as an
+interchange format.<br />- The expert for a certain target machine just has to
+specify the required details once. These details are linked like a library to
+the abstract Thorin code. Thorin's analyses and transformations will then
+optimize the resulting Thorin code such that it appears to have been written
+by an expert for that target machine.</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1391,8 +1919,12 @@ AnyDSL is a framework for the rapid deve
        Binary Instrumentation of ELF Objects on ARM
      </p>
      <p class="abstract">
-Often application source code is not available to compiler engineers, which can make program analysis more difficult. Binary instrumentation is a process of binary modification, where code is inserted into an already existing binary, which can help understand how the program performs. We have created an LLVM-based binary instrumenter, building upon llvm-objdump, to enable us to gather static and runtime information of ELF binaries.<br /><br />
-     </p>
+Often application source code is not available to compiler engineers, which
+can make program analysis more difficult. Binary instrumentation is a process
+of binary modification, where code is inserted into an already existing
+binary, which can help in understanding how the program performs. We have
+created an LLVM-based binary instrumenter, building upon llvm-objdump, to
+enable us to gather static and runtime information from ELF binaries.</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1433,8 +1965,39 @@ Often application source code is not ava
        CodeCompass: An Open Software Comprehension Framework
      </p>
      <p class="abstract">
-Bugfixing or new feature development requires a confident understanding of all details and consequences of the planned changes. For long existing large telecom systems, where the code base have been developed and maintained for decades byfluctuating teams, original intentions are lost, the documentation is untrustworthy or missing, the only reliable information is the code itself. Code comprehension of such large software systems is an essential, but usually very challenging task. As the method of comprehension is fundamentally different fromwriting new code, development tools are not performing well. During the years, different programs have been developed with various complexity and feature set for code comprehension but none of them fulfilled all requirements.<br /><br />CodeCompass is an open source LLVM/Clang based tool developed by Ericsson Ltd. and the Eötvös Loránd University, Budapest to help understanding large legacy software systems. Based on the LLVM/Clang comp
 iler infrastructure, CodeCompass gives exact information on complex C/C++ language elements like overloading, inheritance, the (read or write) usage of variables, possible call.<br />on function pointers and the virtual functions -- features that various existing tools support only partially. The wide range of interactive visualizations extends further than the usual class and function call diagrams; architectural, component and interface diagrams are a few of the implemented graphs.<br /><br />To make comprehension more extensive, CodeCompass is not restricted to the source code. It also utilizes build information to explore the system architecture as well as version control information when available: git commit history and blame view are also visualized. Clang based static analysis results are also integrated to CodeCompass. Although the tool focuses mainly on C and C++, it also supports Java and Python languages. Having a web-based, pluginable, extensible architecture, the CodeC
 ompass framework can bean open platform to further code comprehension, static analysis and software metrics efforts.<br /><br />Lecture outline:<br />- First we show why current development tools are not satisfactory for code comprehension<br />- Then we specify the requirements for such a tool<br />- Introduce codecompass architectur.<br />- Revail some challenges we have met and how we solve them<br />- Show a live demo<br />- Describe the open architecture and<br />- Talk about future plans and how the community can extend the feature set
-     </p>
+Bug fixing or new feature development requires a confident understanding of
+all details and consequences of the planned changes. For long-existing large
+telecom systems, where the code base has been developed and maintained for
+decades by fluctuating teams, original intentions are lost and the
+documentation is untrustworthy or missing; the only reliable information is
+the code itself. Code comprehension of such large software systems is an
+essential, but usually very challenging task. As the method of comprehension
+is fundamentally different from writing new code, development tools do not
+perform well at it. Over the years, different programs have been developed,
+with varying complexity and feature sets, for code comprehension, but none of
+them fulfilled all requirements.<br /><br />CodeCompass is an open source
+LLVM/Clang based tool developed by Ericsson Ltd. and the Eötvös Loránd
+University, Budapest, to help in understanding large legacy software systems.
+Based on the LLVM/Clang compiler infrastructure, CodeCompass gives exact
+information on complex C/C++ language elements like overloading, inheritance,
+the (read or write) usage of variables, and possible calls on function
+pointers and virtual functions -- features that various existing tools support
+only partially. The wide range of interactive visualizations extends further
+than the usual class and function call diagrams; architectural, component and
+interface diagrams are a few of the implemented graphs.<br /><br />To make
+comprehension more extensive, CodeCompass is not restricted to the source
+code. It also utilizes build information to explore the system architecture,
+as well as version control information when available: git commit history and
+blame view are also visualized. Clang based static analysis results are also
+integrated into CodeCompass. Although the tool focuses mainly on C and C++, it
+also supports the Java and Python languages. Having a web-based, pluginable,
+extensible architecture, the CodeCompass framework can be an open platform for
+further code comprehension, static analysis and software metrics
+efforts.<br /><br />Lecture outline:<br />- First we show why current
+development tools are not satisfactory for code comprehension<br />- Then we
+specify the requirements for such a tool<br />- Introduce the CodeCompass
+architecture<br />- Reveal some challenges we have met and how we solved
+them<br />- Show a live demo<br />- Describe the open architecture<br />- Talk
+about future plans and how the community can extend the feature set</p>
      </td>
     </tr>
     <tr class="separator" />
@@ -1463,8 +2026,31 @@ Bugfixing or new feature development req
        Hydra LLVM: Instruction Selection with Threads
      </p>
      <p class="abstract">
-By the rise of program complexity and some specific usages like JIT(Just-In-Time) compilation, compilation speed becomes more and more important in recent years.<br />Instruction selection in LLVM, on the other hand, is the most time-consuming part among all the LLVM components, which can take nearly 50% of total compilation time. We believe that by reducing time consumption of instruction selection, the total compilation speed can get a significant increase.<br />Thus, we propose a (work-in-progress) prototype design that use multi-thread programming to parallelize the instruction selector in order to reach the goal mentioned above. The original instruction selector is implemented as a bytecode interpreter, which executes the operation codes generated by TableGen files that models the machine instructions, and transform IR selection graph into machine-dependent selection graph at the end. The selector, to our surprised, shows some great properties which we can benefit from in creat
 ing multi-thread version of that. For example, an opcode scope that save the current context before executing the following opcodes sequence, and restore the context after finishing them. While preserving the original algorithm of the selector, we also try hard to reduce the concurrency overhead by replacing unnecessary mutex lock with better one like read/write lock and atomic variables.<br />Though the experiments didn’t show promising result, we are still looking forward to the potential of reducing the consuming time of instruction selection in order to increase the overall compilation speed. In the future, we will try different compilation regions to parallelize for the sake of finding the optimal one that causes less overhead. At the same time, we are also going to combine this project with existing JIT framework in LLVM in order to reduce the execution latency caused by runtime compilation.
-     </p>
+With the rise of program complexity and specific usages like JIT
+(Just-In-Time) compilation, compilation speed has become more and more
+important in recent years.<br />Instruction selection in LLVM, on the other
+hand, is the most time-consuming part among all the LLVM components; it can
+take nearly 50% of total compilation time. We believe that by reducing the
+time consumed by instruction selection, the total compilation speed can
+increase significantly.<br />Thus, we propose a (work-in-progress) prototype
+design that uses multi-threaded programming to parallelize the instruction
+selector in order to reach the goal mentioned above. The original instruction
+selector is implemented as a bytecode interpreter, which executes the
+operation codes generated from the TableGen files that model the machine
+instructions, and transforms the IR selection graph into a machine-dependent
+selection graph at the end. The selector, to our surprise, shows some great
+properties which we can benefit from in creating a multi-threaded version of
+it. For example, there is an opcode scope that saves the current context
+before executing the following opcode sequence and restores the context after
+finishing it. While preserving the original algorithm of the selector, we also
+try hard to reduce the concurrency overhead by replacing unnecessary mutex
+locks with better alternatives like read/write locks and atomic
+variables.<br />Though the experiments didn’t show promising results, we still
+look forward to the potential of reducing the time consumed by instruction
+selection in order to increase the overall compilation speed. In the future,
+we will try to parallelize different compilation regions for the sake of
+finding the optimal one that causes the least overhead. At the same time, we
+are also going to combine this project with the existing JIT framework in LLVM
+in order to reduce the execution latency caused by runtime compilation.</p>
      </td>
     </tr>
     <tr class="separator" />
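The locking tactics the abstract mentions, shown in isolation (generic C++17,
not the prototype's code): a read-mostly table guarded by std::shared_mutex so
selector threads can read in parallel, plus a lock-free atomic counter for
statistics:

    #include <atomic>
    #include <shared_mutex>
    #include <string>
    #include <unordered_map>

    class OpcodeStats {
      std::unordered_map<std::string, int> table_;
      mutable std::shared_mutex mtx_;
      std::atomic<long> lookups_{0};

    public:
      int lookup(const std::string &key) const {
        lookups_.fetch_add(1, std::memory_order_relaxed);  // no lock needed
        std::shared_lock<std::shared_mutex> lock(mtx_);    // readers overlap
        auto it = table_.find(key);
        return it == table_.end() ? -1 : it->second;
      }

      void insert(const std::string &key, int value) {
        std::unique_lock<std::shared_mutex> lock(mtx_);    // writers alone
        table_[key] = value;
      }

      long lookup_count() const { return lookups_.load(); }
    };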
@@ -1499,8 +2085,44 @@ By the rise of program complexity and so
        Intelligent selection of compiler options to optimize compile time and performance
      </p>
      <p class="abstract">
-The efficiency of the optimization process during the compilation is crucial for the later execution behavior of the code.<br />The achieved performance depends on the hardware architecture and the compiler's capabilities to extract this performance.<br /><br />Code optimization can be a CPU- and memory-intensive process which -- for large codes -- can lead to high compilation times during development.<br />Optimization also influences the debuggability of the resulting binary; for example, by storing data in registers.<br />During development, it would be interesting to compile files individually with appropriate flags that enable debugging and provide high (near-production) performance during the testing but with moderate compile times.<br />We are exploring to create a tool to identify code regions that are candidates for higher optimization levels.<br />We follow two different approaches to identify the most efficient code optimization:<br />1) compiling different files with dif
 ferent options by brute force;2) using profilers to identify the relevant code regions that should be optimized.<br /><br />Since big projects comprise hundreds of files, brute force is not efficient.<br />The problem in, e.g., climate applications is that codes have too many files to test them individually.<br />Improving this strategy using a profiler, we can identify the time consuming regions (and files) and then repeatedly refine our selection.<br />Then, the relevant files are evaluated with different compiler flags to determine a good compromise of the flags.<br />Once the appropriate flags are determined, this information could be retained across builds and shared between users.<br /><br /><br />In our poster, we motivate and demonstrate this strategy on a stencil code derived from climate applications.<br />The experiments done throughout this work are carried out on a recent Intel Skylake (i7-6700 CPU @ 3.40GHz) machine.<br />We compare performance of the compilers clang (
 version 3.9.1) and gcc (version 6.3.0) for various optimization flags and using profile guided optimization (PGO) with the traditional compile with instrumentation/run/compile phase and when using the perf tool for dynamic instrumentation.<br />The results show that more time (2x) is spent for compiling code using higher optimization levels in general, though gcc takes a little less time in general than clang.<br />Yet the performance of the application were comparable after compiling the whole code with O3 to that of applying O3 optimization to the right subset of files.<br />Thus, the approach proves to be effective for repositories where compilation is analyzed to guide subsequent compilations.<br /><br />Based on these results, we are building a prototype tool that can be embedded into building systems that realizes the aforementioned strategies of brute-force testing and profile guided analysis of relevant compilation flags.
-     </p>
+The efficiency of the optimization process during compilation is crucial for
+the later execution behavior of the code.<br />The achieved performance
+depends on the hardware architecture and the compiler's capabilities to
+extract this performance.<br /><br />Code optimization can be a CPU- and
+memory-intensive process which -- for large codes -- can lead to high
+compilation times during development.<br />Optimization also influences the
+debuggability of the resulting binary; for example, by storing data in
+registers.<br />During development, it would be desirable to compile files
+individually with appropriate flags that enable debugging and provide high
+(near-production) performance during testing, but with moderate compile
+times.<br />We are exploring the creation of a tool to identify code regions
+that are candidates for higher optimization levels.<br />We follow two
+different approaches to identify the most efficient code optimization:<br />1)
+compiling different files with different options by brute force; 2) using
+profilers to identify the relevant code regions that should be
+optimized.<br /><br />Since big projects comprise hundreds of files, brute
+force is not efficient.<br />The problem in, e.g., climate applications is
+that codes have too many files to test them individually.<br />Improving this
+strategy using a profiler, we can identify the time-consuming regions (and
+files) and then repeatedly refine our selection.<br />Then, the relevant files
+are evaluated with different compiler flags to determine a good compromise of
+flags.<br />Once the appropriate flags are determined, this information could
+be retained across builds and shared between users.<br /><br />In our poster,
+we motivate and demonstrate this strategy on a stencil code derived from
+climate applications.<br />The experiments done throughout this work are
+carried out on a recent Intel Skylake (i7-6700 CPU @ 3.40GHz) machine.<br />We
+compare the performance of the compilers clang (version 3.9.1) and gcc
+(version 6.3.0) for various optimization flags, and using profile-guided
+optimization (PGO) both with the traditional
+compile-with-instrumentation/run/compile phases and when using the perf tool
+for dynamic instrumentation.<br />The results show that more time (2x) is
+spent compiling code at higher optimization levels in general, though gcc
+takes a little less time than clang.<br />Yet the performance of the
+application after compiling the whole code with O3 was comparable to that of
+applying O3 optimization to the right subset of files.<br />Thus, the approach
+proves to be effective for repositories where compilation is analyzed to guide
+subsequent compilations.<br /><br />Based on these results, we are building a
+prototype tool that can be embedded into build systems and that realizes the
+aforementioned strategies of brute-force testing and profile-guided analysis
+of relevant compilation flags.</p>
      </td>
     </tr>
     <tr class="separator" />
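As an aside for readers, the profile-then-refine strategy in the abstract above
can be sketched in a few lines of Python. Everything here is a hypothetical
illustration: the ./build.sh script (assumed to accept FILE=FLAGS overrides),
the ./app benchmark binary, and the helper names (build_and_time, refine,
CANDIDATE_FLAGS) are assumptions for this sketch, not part of the authors'
prototype tool.

    # Minimal sketch of per-file optimization-flag refinement guided by a
    # profile. Hypothetical: ./build.sh and ./app are stand-ins for a real
    # build system and application.
    import subprocess
    import time

    CANDIDATE_FLAGS = ["-O1", "-O2", "-O3"]  # assumed candidate set

    def build_and_time(per_file_flags):
        """Rebuild with a {file: flags} mapping and return the run time."""
        overrides = [f"{f}={flags}" for f, flags in per_file_flags.items()]
        subprocess.run(["./build.sh", *overrides], check=True)
        start = time.perf_counter()
        subprocess.run(["./app"], check=True)
        return time.perf_counter() - start

    def refine(hot_files, baseline="-O1"):
        """Greedily pick a flag per hot file, hottest file first."""
        chosen = {}
        for f in hot_files:
            best_flag, best_time = baseline, float("inf")
            for flag in CANDIDATE_FLAGS:
                t = build_and_time({**chosen, f: flag})
                if t < best_time:
                    best_flag, best_time = flag, t
            chosen[f] = best_flag
        return chosen

The hot_files list would come from a profiler run (for example, perf record
followed by perf report), so only the files that dominate the run time are
re-tested; that pruning is what makes the brute-force search tractable.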
@@ -1541,8 +2163,19 @@ The efficiency of the optimization proce
        Modeling Universal Instruction Selection
      </p>
      <p class="abstract">
-Instruction selection implements a program under compilation by selectingprocessor instructions and has tremendous impact on the performance of the codegenerated by a compiler. We have introduced a graph-based universalrepresentation that unifies data and control flow for both programs andprocessor instructions. The representation is the essential prerequisite for aconstraint model for instruction selection introduced in this paper. The modelis demonstrated to be expressive in that it supports many processor featuresthat are out of reach of state-of-the-art approaches, such as advanced branchinginstructions, multiple register banks, and SIMD instructions. The resultingmodel can be solved for small to medium size input programs and sophisticatedprocessor instructions and is competitive with LLVM in code quality. Model andrepresentation are significant due to their expressiveness and their potentialto be combined with models for other code generation tasks.
-     </p>
+Instruction selection implements a program under compilation by
+selecting processor instructions and has a tremendous impact on the performance
+of the code generated by a compiler. We have introduced a graph-based
+universal representation that unifies data and control flow for both programs
+and processor instructions. The representation is the essential prerequisite for
+a constraint model for instruction selection introduced in this paper. The
+model is demonstrated to be expressive in that it supports many processor
+features that are out of reach of state-of-the-art approaches, such as advanced
+branching instructions, multiple register banks, and SIMD instructions. The
+resulting model can be solved for small to medium-sized input programs and
+sophisticated processor instructions and is competitive with LLVM in code
+quality. Model and representation are significant due to their expressiveness and
+their potential to be combined with models for other code generation tasks.  </p>
      </td>
     </tr>
     <tr class="separator" />
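To make the constraint-model idea above concrete, here is a deliberately tiny
Python illustration of instruction selection as an exact-cover problem: choose
instruction patterns that cover every IR operation exactly once at minimum
cost. The operations, patterns, and costs are invented for illustration; the
model in the abstract unifies data and control flow and handles far richer
features (register banks, SIMD, advanced branching).

    # Toy model of instruction selection as exact cover: pick patterns that
    # cover each operation exactly once at minimum total cost. All names,
    # patterns, and costs here are invented for illustration.
    from itertools import combinations

    OPS = {"mul", "add"}  # hypothetical IR ops of: r = a * b + c

    PATTERNS = [            # (instruction, ops covered, cost)
        ("MUL",  {"mul"},        1),
        ("ADD",  {"add"},        1),
        ("MADD", {"mul", "add"}, 1),  # fused multiply-add
    ]

    def select(ops, patterns):
        """Return the cheapest pattern set covering every op exactly once."""
        best = None
        for r in range(1, len(patterns) + 1):
            for combo in combinations(patterns, r):
                covered = [op for _, pat_ops, _ in combo for op in pat_ops]
                if sorted(covered) == sorted(ops):  # exact cover, no overlap
                    cost = sum(c for _, _, c in combo)
                    if best is None or cost < best[1]:
                        best = (combo, cost)
        return best

    combo, cost = select(OPS, PATTERNS)
    print([name for name, _, _ in combo], cost)  # ['MADD'] 1

A real solver would use constraint programming rather than enumeration, but
the exact-cover structure, and the fact that a fused pattern such as MADD can
beat two separate instructions, captures why such models can compete with
heuristic selectors on code quality.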
@@ -1565,8 +2198,20 @@ Instruction selection implements a progr
        Preparing LLVM for the Future of Supercomputing
      </p>
      <p class="abstract">
-LLVM is solidifying its foothold in high-performance computing, and as we look forward toward the exascale computing era, LLVM promises to be a cornerstone of our programming environments. In this talk, I'll discuss several of the ways in which we're working to improve LLVM in support of this vision. Ongoing work includes better handling of restrict-qualified pointers [2], optimization of OpenMP constructs [3], and extending LLVM's IR to support an explicit representation of parallelism [4]. We're exploring several ways in which LLVM can be better integrated with autotuning technologies, how we can improve optimization reporting and profiling, and a myriad of other ways we can help move LLVM forward. Much of this effort is now a part of the US Department of Energy's Exascale Computing Project [1]. This talk will start by presenting the big picture, in part discussing goals of performance portability and how those maps into technical requirements, and then discuss details of current 
 and planned development.
-     </p>
+LLVM is solidifying its foothold in high-performance computing, and as we look
+forward toward the exascale computing era, LLVM promises to be a cornerstone of
+our programming environments. In this talk, I'll discuss several of the ways in
+which we're working to improve LLVM in support of this vision. Ongoing work
+includes better handling of restrict-qualified pointers [2], optimization of
+OpenMP constructs [3], and extending LLVM's IR to support an explicit
+representation of parallelism [4]. We're exploring several ways in which LLVM
+can be better integrated with autotuning technologies, how we can improve
+optimization reporting and profiling, and a myriad of other ways we can help
+move LLVM forward. Much of this effort is now a part of the US Department of
+Energy's Exascale Computing Project [1]. This talk will start by presenting the
+big picture, in part discussing goals of performance portability and how those
+map into technical requirements, and then discuss details of current and
+planned development.  </p>
      </td>
     </tr>
 </tbody>



