Research Projects
Digital Agriculture: 21st-century agriculture presents a number of major challenges and opportunities for researchers in computer science and engineering: heterogeneous data analysis, IoT and sensor networks, computer vision, robotics, edge computing, machine learning with limited and sparse data, and others. I co-founded and co-lead the Center for Digital Agriculture (CDA) at UIUC. CDA is a campus-level organization that brings together multidisciplinary teams of researchers and educators spanning Computer Science, Engineering, Agriculture, Food Sciences, Biology, Sustainability, Ecology, and related areas for joint research, education, and outreach. Initial funding for CDA has been provided by the Office of the Provost and by the Colleges of Engineering and Agriculture. I also lead the AIFARMS Institute, one of the national artificial intelligence (AI) research institutes launched under the aegis of the US National AI R&D Strategic Plan: 2019 Update. AIFARMS, funded primarily by USDA NIFA, is exploring foundational AI challenges and how they can help tackle major agricultural challenges, most importantly, how to sustainably feed a growing global population whose food needs are projected to increase by roughly 70% by 2050.
CropWizard Project on Generative AI for Agriculture: This project, which I initiated as part of AIFARMS, explores ways in which advances in generative AI can be useful in modern agriculture, from agronomic advice to on-farm decision making to interactive data analytics. The CropWizard system is an interactive question-answering service for agricultural professionals based on generative AI and knowledge retrieval, using a large database of over 450K documents from land-grant universities and open-access research. CropWizard supports multimodal inputs, including text and images, and automatically invokes (predefined) computational tools for data-driven decision making. One research direction is developing multiple QA data sets for training and benchmarking AI-driven automated QA services for agricultural topics. These data sets will form a basis for AI AgriBench, a public benchmarking consortium recently announced by our group along with external partners. Another direction is improving the accuracy of image analysis through an enhanced retrieval-augmented generation (RAG) pipeline for multimodal data. A third is a multi-agent framework combining reasoning, knowledge extraction, and computational tools, guided by user interactions, to achieve more sophisticated data-driven decision making in precision agriculture for open-ended questions. The CropWizard system has been licensed for commercial use by multiple companies. The CropWizard project is sponsored by USDA NIFA through the AIFARMS Institute, with additional funding provided by UIUC through the Center for Digital Agriculture, Intel Corporation, the Amazon-Illinois AICE Center, and the State of Illinois through the Discovery Partners Institute (DPI).
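At its core, a retrieval-based question-answering service of this kind follows a retrieve-then-generate loop: embed the question, find the most relevant documents, and prompt a language model with that context. The sketch below is only an illustration of that general pattern, not CropWizard's implementation; the embed() and generate() functions are hypothetical stand-ins for whatever embedding model and LLM a deployed system would use.

```python
# Minimal retrieval-augmented QA sketch (illustrative only; not CropWizard's code).
# embed() and generate() are hypothetical stand-ins for an embedding model and an LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding model: map text to a fixed-size vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    """Hypothetical LLM call: return an answer grounded in the prompt."""
    return f"[answer generated from a prompt of {len(prompt)} characters]"

corpus = [
    "Corn rootworm management relies on crop rotation and Bt hybrids.",
    "Nitrogen should be applied close to the period of peak crop uptake.",
    "Soybean cyst nematode is best managed with resistant varieties.",
]
doc_vectors = np.stack([embed(d) for d in corpus])

def answer(question: str, k: int = 2) -> str:
    q = embed(question)
    # Cosine similarity between the question and every document in the corpus.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(corpus[i] for i in np.argsort(sims)[::-1][:k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

print(answer("How do I manage corn rootworm?"))
```

A production system adds multimodal retrieval, tool invocation, and answer attribution on top of this basic loop, but the control flow remains the same.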
Edge Computing with HPVM, ApproxHPVM, and ApproxTuner: Many important and emerging application domains require increasingly powerful computing capabilities at the “edge of the network,” near the sensors, displays, and actuators that interact with the physical world. Heterogeneous computing with specialized accelerators, often in system-on-chip designs, is critical to delivering these capabilities under tight constraints on memory capacity, compute capacity, energy, power, weight, and heat dissipation. Building on the HPVM project, we are exploring how compilers, autotuners, and approximate computing techniques can be used to meet such tight resource constraints. In particular, we are exploring automatically tuned approximation algorithms that trade off small amounts of accuracy (or result “quality”) for reduced energy consumption, higher performance, or both. We are focusing on three broad edge computing application domains: mobile robots in agriculture, autonomous vehicles, and distributed AR/VR systems.
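The essence of such autotuning can be conveyed with a toy example (this is illustrative only, not the ApproxTuner implementation): treat each approximation choice as a "knob," measure both the accuracy loss and the running time of each setting, and keep the fastest configuration whose error stays within a user-specified budget.

```python
# Toy accuracy/performance trade-off search (illustrative; not the ApproxTuner implementation).
import time
import numpy as np

def workload(x: np.ndarray, skip: int) -> np.ndarray:
    """Smooth the signal; 'skip' is an approximation knob (process only every skip-th sample)."""
    return np.convolve(x[::skip], np.ones(16) / 16, mode="same")

def tune(x: np.ndarray, knobs=(1, 2, 4, 8), max_rel_error=0.05):
    exact = workload(x, 1)
    best = None
    for skip in knobs:
        t0 = time.perf_counter()
        approx = workload(x, skip)
        elapsed = time.perf_counter() - t0
        # Error of the approximate output at the points it actually computed.
        rel_error = np.abs(approx - exact[::skip]).mean() / (np.abs(exact).mean() + 1e-12)
        if rel_error <= max_rel_error and (best is None or elapsed < best[1]):
            best = (skip, elapsed, rel_error)
    return best

signal = np.sin(np.linspace(0, 200 * np.pi, 1_000_000))   # a slowly varying input
print(tune(signal))   # chosen knob setting, its runtime, and its relative error
```

Real tuners search much larger, per-operation knob spaces (precision, perforation, algorithmic variants) and calibrate error on representative inputs, but the accuracy-versus-cost feedback loop is the same.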
Hydride and MISAAL: Building Compilers Automatically Using Program Synthesis: Hydride is a novel approach to compiling for complex, emerging hardware architectures. Hydride uses vendor-defined pseudocode specifications of multiple hardware ISAs to automatically design retargetable instructions for AutoLLVM IR, an extensible compiler IR consisting of (formally defined) language-independent and target-independent LLVM IR instructions, and to automatically generate instruction selection passes that lower AutoLLVM IR to each of the specified hardware ISAs. Hydride also includes a code synthesizer that automatically generates code generation support for schedule-based languages, such as Halide, to optimally generate AutoLLVM IR.
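The flavor of the approach can be sketched with a toy example (Hydride's actual machinery uses solver-based synthesis over real vendor pseudocode, not exhaustive testing): given executable semantics for candidate target instructions and for an IR-level pattern, an equivalence check over a small bit-width can stand in for the solver when deciding which instruction implements the pattern.

```python
# Toy synthesis-style instruction selection (illustrative; Hydride uses SMT-based
# synthesis over vendor pseudocode rather than exhaustive checking).
BITS = 4
MASK = (1 << BITS) - 1

# Hypothetical "vendor pseudocode" semantics for a few target instructions.
target_isa = {
    "vadd": lambda a, b: (a + b) & MASK,
    "vsub": lambda a, b: (a - b) & MASK,
    "vavg": lambda a, b: ((a + b) >> 1) & MASK,
}

# An IR-level pattern we want to select an instruction for: the average of two lanes.
def ir_pattern(a, b):
    return ((a + b) // 2) & MASK

def select_instruction(pattern):
    """Return a target instruction whose semantics match the pattern on all inputs."""
    for name, sem in target_isa.items():
        if all(sem(a, b) == pattern(a, b)
               for a in range(1 << BITS) for b in range(1 << BITS)):
            return name
    return None

print(select_instruction(ir_pattern))   # -> "vavg"
```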
MISAAL dramatically speeds up Hydride (and improves performance) by moving program synthesis entirely offline. MISAAL employs a novel strategy that uses the formal semantics of hardware instructions to automatically prune a large search space of rewrite rules for modern, complex instructions in an offline stage. It then uses a term-rewriting process in the online stage (at compile time), instead of pattern matching, to perform optimized code generation that achieves closer-to-optimal code. MISAAL uses a novel methodology to make online term rewriting extremely lightweight, enabling programs to compile in seconds.
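A highly simplified picture of the online stage (not MISAAL's implementation): expressions are trees, and a small table of offline-derived rewrite rules is applied bottom-up until no rule matches, replacing generic IR operations with target-specific instructions.

```python
# Toy term rewriting over expression trees (illustrative; not MISAAL's implementation).
# A rule maps a generic-IR pattern to a (hypothetical) target-specific instruction.

def rewrite(expr, rules):
    """Apply rules bottom-up until no rule matches anywhere in the tree."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    expr = (op, *[rewrite(a, rules) for a in args])        # rewrite children first
    for pattern, replacement in rules:
        binding = match(pattern, expr)
        if binding is not None:
            return rewrite(substitute(replacement, binding), rules)
    return expr

def match(pattern, expr, binding=None):
    binding = dict(binding or {})
    if isinstance(pattern, str) and pattern.startswith("?"):   # pattern variable
        if pattern in binding and binding[pattern] != expr:
            return None
        binding[pattern] = expr
        return binding
    if isinstance(pattern, tuple) and isinstance(expr, tuple) and len(pattern) == len(expr):
        for p, e in zip(pattern, expr):
            binding = match(p, e, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == expr else None

def substitute(template, binding):
    if isinstance(template, tuple):
        return tuple(substitute(t, binding) for t in template)
    return binding.get(template, template)

# Offline-derived rule: a multiply feeding an add fuses into a hypothetical "fma" instruction.
rules = [(("add", ("mul", "?x", "?y"), "?z"), ("fma", "?x", "?y", "?z"))]

print(rewrite(("add", ("mul", "a", "b"), "c"), rules))   # -> ('fma', 'a', 'b', 'c')
```

The point of doing rule generation offline is that the compile-time work reduces to cheap matching and substitution like this, rather than expensive synthesis queries.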
Heterogeneous Parallel Virtual Machine (HPVM): The slowdown of Moore’s Law and the end of Dennard scaling are putting an end to computing speed increases from general-purpose architectures and processor design techniques. Modern computing systems, ranging from smart speakers to mobile phones to laptops to the cloud, now rely on increasing numbers of specialized computing elements (or “accelerators”), such as GPUs, DSPs, FPGAs, image processors, cryptographic hardware, and, increasingly, tensor accelerators for machine learning. The major drawback of these heterogeneous systems is that they are highly challenging to program. The underlying hardware usually has diverse programming interfaces (both languages and instruction sets), differing forms of parallelism, different memory architectures, and even diverse, mutually incompatible programming tools that prevent developing unified applications that can use all components flexibly. In the HPVM project, we are developing a retargetable compiler infrastructure and parallel program representation for heterogeneous parallel systems, with the goal of making it much easier to write performance-portable parallel programs. A single program in the HPVM representation can be compiled to diverse hardware compute units in a heterogeneous system, using the HPVM program representation to capture parallelism and data movement. Current targets include GPUs, CPUs with vector extensions, FPGAs, and custom accelerators for convolutional neural networks, FFT, and Viterbi decoding.
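Conceptually, the program is a hierarchical dataflow graph: nodes are units of parallel computation, edges are explicit data transfers, and the compiler chooses how to map nodes onto the available hardware. The sketch below is only a conceptual illustration in Python (with hypothetical node and target names); the real HPVM representation is embedded in LLVM IR.

```python
# Conceptual sketch of a hierarchical dataflow program representation
# (illustrative only; HPVM's actual representation lives at the LLVM IR level).
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    targets: list                                   # hardware units this node may be mapped to
    children: list = field(default_factory=list)    # hierarchical sub-graph, if any

@dataclass
class Edge:
    src: str
    dst: str
    bytes_moved: int                                # explicit data movement between nodes

pipeline = Node("camera_pipeline", ["cpu"], children=[
    Node("demosaic",   ["gpu", "cpu-vector"]),
    Node("denoise",    ["gpu", "fpga"]),
    Node("cnn_detect", ["cnn-accelerator", "gpu"]),
])
edges = [Edge("demosaic", "denoise", 8 << 20), Edge("denoise", "cnn_detect", 8 << 20)]

# A trivial mapping policy: pick the first listed target that exists in this system.
available = {"gpu", "cpu-vector", "cnn-accelerator"}
mapping = {n.name: next(t for t in n.targets if t in available) for n in pipeline.children}
print(mapping)   # e.g. {'demosaic': 'gpu', 'denoise': 'gpu', 'cnn_detect': 'cnn-accelerator'}
```

Because parallelism and data movement are explicit in the graph, the same program can be re-mapped to a different mix of compute units without rewriting the application.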
Previous Projects
The LLVM Compiler Infrastructure: A novel virtual instruction set and compiler infrastructure that enables lifelong analysis and transformation of programs in arbitrary programming languages. The LLVM infrastructure has been distributed in open-source form under a liberal license since October 2003. Since then it has been adopted by a large number of commercial and academic organizations and is a foundation for major commercial products. For example, virtually all mobile apps for Apple devices (iPhone, iPad, Apple Watch) are shipped by app developers in the LLVM virtual instruction set and compiled for individual devices in the Apple App Store. NVIDIA GPUs, Android smartphones, Sony PlayStations, the Chrome browser, and many other commercial products also make extensive use of LLVM. LLVM is also used worldwide for academic research and continues to be a key foundation for some of our other ongoing projects, including HPVM and Hydride.
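To give a concrete sense of the virtual instruction set, the snippet below builds a tiny function in LLVM IR and prints its textual form. It uses llvmlite, a third-party Python binding, purely for brevity; LLVM's own APIs are in C++.

```python
# Build a tiny function in LLVM IR using the llvmlite binding (pip install llvmlite).
from llvmlite import ir

module = ir.Module(name="demo")
i32 = ir.IntType(32)
add_fn = ir.Function(module, ir.FunctionType(i32, (i32, i32)), name="add")
builder = ir.IRBuilder(add_fn.append_basic_block(name="entry"))
a, b = add_fn.args
builder.ret(builder.add(a, b, name="sum"))

print(module)   # textual LLVM IR, ready for LLVM's analyses, optimizers, and code generators
```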
Automated Debugging for Software Failures: We are developing automated static and dynamic analysis techniques to understand the causes of failures in software systems, in order to help programmers diagnose and fix software bugs with as little effort as possible. The project is investigating automated fault localization and diagnosis techniques for both standalone and distributed programs.
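As one concrete example of the flavor of automated fault localization, a widely used spectrum-based technique ranks statements by how strongly their execution correlates with failing tests. The sketch below shows the generic Ochiai formula on made-up coverage data; it is illustrative only and is not necessarily the technique developed in this project.

```python
# Spectrum-based fault localization sketch (a generic textbook technique, shown for
# illustration; not necessarily the analyses developed in this project).
from math import sqrt

# coverage[test] = set of statement ids executed by that test; 'failed' = failing tests.
coverage = {
    "t1": {1, 2, 3},      "t2": {1, 2, 4},
    "t3": {1, 3, 4},      "t4": {1, 2, 3, 4},
}
failed = {"t2", "t4"}

def ochiai(stmt):
    """Ochiai suspiciousness: failing runs covering stmt / sqrt(#failing runs * all runs covering stmt)."""
    ef = sum(1 for t in failed if stmt in coverage[t])        # failing runs covering stmt
    cov = sum(1 for t in coverage if stmt in coverage[t])     # all runs covering stmt
    return ef / sqrt(len(failed) * cov) if cov else 0.0

stmts = sorted({s for c in coverage.values() for s in c}, key=ochiai, reverse=True)
print(stmts)   # statements ranked from most to least suspicious
```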
ALLVM: Exploring the benefits for software performance, security, and reliability if all software on a system (either all user-space software, or user-space plus OS software) is available in a rich virtual instruction set that can be analyzed and transformed by sophisticated compiler techniques (think Java bytecode, but for all software).
Deterministic-by-Default Parallel Programming: Provably deterministic parallel programs have major productivity advantages over today’s multithreaded programming models: programmers can reason about a program as if it had sequential semantics; they do not need to be concerned with complex issues such as atomicity, deadlock, and memory models (not even sequential consistency); debugging can happen with standard tools and mechanisms, similar to those for sequential programs; programs only need to be tested once for each input instead of many times; and, during porting, parallelism can be introduced incrementally without the worry that program behavior might change due to parallelism. Today, determinism is not available for most commonly used programming styles, such as imperative, object-oriented programming. We are developing Deterministic Parallel Java (DPJ): an extension to the sequential subset of Java that aims to provide a deterministic-by-default programming model for object-oriented languages via compile-time type checking where possible, falling back on run-time mechanisms where needed. Algorithms that wish to exploit non-deterministic behavior must explicitly request such behavior (hence the label “deterministic-by-default”) and, where possible, encapsulate and isolate it behind interfaces with enforceable contracts. With minor changes, these language extensions should be applicable to other base object-oriented languages, such as C++ and C#.
SVA: Secure Virtual Architecture: A compiler-based virtual machine for commodity operating systems. SVA runs below the operating system (like Xen or VMware Server), but uses a virtual instruction set and a compiler-based execution model (like the JVM, but for C rather than Java). We have ported the standard Linux kernel to SVA as a new (virtual) architecture, changing only about 150 lines of code in the architecture-independent parts of the kernel. The combination of a compiler and a privileged run-time enables novel solutions to a wide range of security and reliability challenges in systems, including memory safety, OS recovery, and information flow. In fact, SVA is the first system we know of that can provide a safe execution environment for a complete commodity operating system such as the Linux kernel.
SAFECode: Static Analysis For safe Execution of Code: SAFECode is a compiler that enforces memory safety and partial type safety fully automatically for unmodified C programs. It uses a combination of novel techniques to guarantee array bounds integrity, absence of uninitialized pointer uses, control flow integrity, type safety for a subset of objects, and sound analysis; furthermore, it does so without requiring wrappers for linking with externally compiled code, although it cannot detect all errors in such external code. For production code, SAFECode does not eliminate dangling pointer errors but guarantees that such errors cannot violate any of the previous guarantees; it is the first and only compiler we know of that can do so. For debugging, SAFECode can also be used to detect all dangling pointer references in an execution. We have also identified a proper subset of the C language for which SAFECode can enforce memory safety without any run-time checks or garbage collection; this is aimed at embedded programs where excessive “under the covers” run-time overheads or automatic memory management are undesirable.