Research Projects
Digital Agriculture: 21st-century agriculture presents a number of major challenges and opportunities for researchers in computer science and engineering: heterogeneous data analysis, IoT and sensor networks, computer vision, robotics, edge computing, machine learning with limited and sparse data, and others. I co-founded and co-lead the Center for Digital Agriculture (CDA) at UIUC. CDA is a campus-level organization that brings together multidisciplinary teams of researchers and educators spanning Computer Science, Engineering, Agriculture, Food Sciences, Biology, Sustainability, Ecology, and related areas for joint research, education, and outreach. Initial funding for CDA has been provided by the Office of the Provost and by the Colleges of Engineering and Agriculture. I also lead the AIFARMS Institute, one of the national artificial intelligence (AI) research institutes launched under the aegis of the US National AI R&D Strategic Plan: 2019 Update. AIFARMS, funded primarily by USDA NIFA, is exploring foundational AI challenges and how they can help tackle major agricultural challenges, most importantly, how to sustainably feed a growing global population whose food needs are projected to increase by roughly 70% by 2050.
CropWizard Project on Generative AI for Agriculture: This project, which I initiated as part of AIFARMS, explores ways in which advances in generative AI can be useful in modern agriculture, from agronomic advice to on-farm decision making to interactive data analytics. The CropWizard system is an interactive question-answering service for agricultural professionals based on generative AI and knowledge retrieval, using a large database of over 450K documents from land-grant universities and open-access research. CropWizard supports multimodal inputs, including text and images, and automatically invokes (predefined) computational tools for data-driven decision making. One research direction is developing multiple QA data sets for training and benchmarking AI-driven automated QA services for agricultural topics. These data sets will form a basis for AI AgriBench, a public benchmarking consortium recently announced by our group along with external partners. Another direction is improving the accuracy of image analysis through an enhanced retrieval-augmented generation (RAG) pipeline for multimodal data. A third is a multi-agent framework combining reasoning, knowledge extraction, and computational tools, guided by user interactions, to achieve more sophisticated data-driven decision making in precision agriculture for open-ended questions. The CropWizard system has been licensed for commercial use by multiple companies. The CropWizard project is sponsored by USDA NIFA through the AIFARMS Institute, with additional funding provided by UIUC through the Center for Digital Agriculture, Intel Corporation, the Amazon-Illinois AICE Center, and the State of Illinois through the Discovery Partners Institute (DPI).
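At its core, a retrieval-based question-answering service of this kind follows a retrieve-then-generate loop: embed the question, find the most relevant documents, and prompt a language model with that context. The sketch below is only an illustration of that general pattern, not CropWizard's implementation; the embed() and generate() functions are hypothetical stand-ins for whatever embedding model and LLM a deployed system would use.

```python
# Minimal retrieval-augmented QA sketch (illustrative only; not CropWizard's code).
# embed() and generate() are hypothetical stand-ins for an embedding model and an LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding model: map text to a fixed-size vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    """Hypothetical LLM call: return an answer grounded in the prompt."""
    return f"[answer generated from a prompt of {len(prompt)} characters]"

corpus = [
    "Corn rootworm management relies on crop rotation and Bt hybrids.",
    "Nitrogen should be applied close to the period of peak crop uptake.",
    "Soybean cyst nematode is best managed with resistant varieties.",
]
doc_vectors = np.stack([embed(d) for d in corpus])

def answer(question: str, k: int = 2) -> str:
    q = embed(question)
    # Cosine similarity between the question and every document in the corpus.
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(corpus[i] for i in np.argsort(sims)[::-1][:k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

print(answer("How do I manage corn rootworm?"))
```

A production system adds multimodal retrieval, tool invocation, and answer attribution on top of this basic loop, but the control flow remains the same.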
Edge Computing with HPVM, ApproxHPVM, and ApproxTuner: Many important and emerging application domains require increasingly powerful computing capabilities at the “edge of the network,” near the sensors, displays, and actuators that interact with the physical world. Heterogeneous computing with specialized accelerators, often in system-on-chip designs, is critical to delivering these capabilities under tight constraints on memory capacity, compute capacity, energy, power, weight, and heat dissipation. Building on the HPVM project, we are exploring how compilers, autotuners, and approximate computing techniques can be used to meet such tight resource constraints. In particular, we are exploring automatically tuned approximation algorithms that trade off small amounts of accuracy (or result “quality”) for reduced energy consumption, higher performance, or both. We are focusing on three broad edge computing application domains: mobile robots in agriculture, autonomous vehicles, and distributed AR/VR systems.
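The essence of such autotuning can be conveyed with a toy example (this is illustrative only, not the ApproxTuner implementation): treat each approximation choice as a "knob," measure both the accuracy loss and the running time of each setting, and keep the fastest configuration whose error stays within a user-specified budget.

```python
# Toy accuracy/performance trade-off search (illustrative; not the ApproxTuner implementation).
import time
import numpy as np

def workload(x: np.ndarray, skip: int) -> np.ndarray:
    """Smooth the signal; 'skip' is an approximation knob (process only every skip-th sample)."""
    return np.convolve(x[::skip], np.ones(16) / 16, mode="same")

def tune(x: np.ndarray, knobs=(1, 2, 4, 8), max_rel_error=0.05):
    exact = workload(x, 1)
    best = None
    for skip in knobs:
        t0 = time.perf_counter()
        approx = workload(x, skip)
        elapsed = time.perf_counter() - t0
        # Error of the approximate output at the points it actually computed.
        rel_error = np.abs(approx - exact[::skip]).mean() / (np.abs(exact).mean() + 1e-12)
        if rel_error <= max_rel_error and (best is None or elapsed < best[1]):
            best = (skip, elapsed, rel_error)
    return best

signal = np.sin(np.linspace(0, 200 * np.pi, 1_000_000))   # a slowly varying input
print(tune(signal))   # chosen knob setting, its runtime, and its relative error
```

Real tuners search much larger, per-operation knob spaces (precision, perforation, algorithmic variants) and calibrate error on representative inputs, but the accuracy-versus-cost feedback loop is the same.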
Hydride and MISAAL: Building Compilers Automatically Using Program Synthesis: Hydride is a novel approach to compiling for complex, emerging hardware architectures. Hydride uses vendor-defined pseudocode specifications of multiple hardware ISAs to automatically design retargetable instructions for AutoLLVM IR, an extensible compiler IR consisting of (formally defined) language-independent and target-independent LLVM IR instructions, and to automatically generate instruction selection passes that lower AutoLLVM IR to each of the specified hardware ISAs. Hydride also includes a code synthesizer that automatically generates code generation support for schedule-based languages, such as Halide, to optimally generate AutoLLVM IR.
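The flavor of the approach can be sketched with a toy example (Hydride's actual machinery uses solver-based synthesis over real vendor pseudocode, not exhaustive testing): given executable semantics for candidate target instructions and for an IR-level pattern, an equivalence check over a small bit-width can stand in for the solver when deciding which instruction implements the pattern.

```python
# Toy synthesis-style instruction selection (illustrative; Hydride uses SMT-based
# synthesis over vendor pseudocode rather than exhaustive checking).
BITS = 4
MASK = (1 << BITS) - 1

# Hypothetical "vendor pseudocode" semantics for a few target instructions.
target_isa = {
    "vadd": lambda a, b: (a + b) & MASK,
    "vsub": lambda a, b: (a - b) & MASK,
    "vavg": lambda a, b: ((a + b) >> 1) & MASK,
}

# An IR-level pattern we want to select an instruction for: the average of two lanes.
def ir_pattern(a, b):
    return ((a + b) // 2) & MASK

def select_instruction(pattern):
    """Return a target instruction whose semantics match the pattern on all inputs."""
    for name, sem in target_isa.items():
        if all(sem(a, b) == pattern(a, b)
               for a in range(1 << BITS) for b in range(1 << BITS)):
            return name
    return None

print(select_instruction(ir_pattern))   # -> "vavg"
```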
MISAAL dramatically speeds up Hydride (and improves performance) by moving program synthesis entirely offline. MISAAL employs a novel strategy that uses the formal semantics of hardware instructions to automatically prune a large search space of rewrite rules for modern, complex instructions in an offline stage. It then uses a term-rewriting process in the online stage (at compile time), instead of pattern matching, to perform optimized code generation that achieves closer-to-optimal code. MISAAL uses a novel methodology to make online term rewriting extremely lightweight, enabling programs to compile in seconds.
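A highly simplified picture of the online stage (not MISAAL's implementation): expressions are trees, and a small table of offline-derived rewrite rules is applied bottom-up until no rule matches, replacing generic IR operations with target-specific instructions.

```python
# Toy term rewriting over expression trees (illustrative; not MISAAL's implementation).
# A rule maps a generic-IR pattern to a (hypothetical) target-specific instruction.

def rewrite(expr, rules):
    """Apply rules bottom-up until no rule matches anywhere in the tree."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    expr = (op, *[rewrite(a, rules) for a in args])        # rewrite children first
    for pattern, replacement in rules:
        binding = match(pattern, expr)
        if binding is not None:
            return rewrite(substitute(replacement, binding), rules)
    return expr

def match(pattern, expr, binding=None):
    binding = dict(binding or {})
    if isinstance(pattern, str) and pattern.startswith("?"):   # pattern variable
        if pattern in binding and binding[pattern] != expr:
            return None
        binding[pattern] = expr
        return binding
    if isinstance(pattern, tuple) and isinstance(expr, tuple) and len(pattern) == len(expr):
        for p, e in zip(pattern, expr):
            binding = match(p, e, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == expr else None

def substitute(template, binding):
    if isinstance(template, tuple):
        return tuple(substitute(t, binding) for t in template)
    return binding.get(template, template)

# Offline-derived rule: a multiply feeding an add fuses into a hypothetical "fma" instruction.
rules = [(("add", ("mul", "?x", "?y"), "?z"), ("fma", "?x", "?y", "?z"))]

print(rewrite(("add", ("mul", "a", "b"), "c"), rules))   # -> ('fma', 'a', 'b', 'c')
```

The point of doing rule generation offline is that the compile-time work reduces to cheap matching and substitution like this, rather than expensive synthesis queries.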
Heterogeneous Parallel Virtual Machine (HPVM): The slowdown of Moore’s Law and the end of Dennard scaling are putting an end to computing speed increases from general-purpose architectures and processor design techniques. Modern computing systems, ranging from smart speakers to mobile phones to laptops to the cloud, now rely on increasing numbers of specialized computing elements (or “accelerators”), such as GPUs, DSPs, FPGAs, image processors, cryptographic hardware, and, increasingly, tensor accelerators for machine learning. The major drawback of these heterogeneous systems is that they are highly challenging to program. The underlying hardware usually has diverse programming interfaces (both languages and instruction sets), differing forms of parallelism, different memory architectures, and even diverse, mutually incompatible programming tools that prevent developing unified applications that can use all components flexibly. In the HPVM project, we are developing a retargetable compiler infrastructure and parallel program representation for heterogeneous parallel systems, with the goal of making it much easier to write performance-portable parallel programs. A single program in the HPVM representation can be compiled to diverse hardware compute units in a heterogeneous system, using the HPVM program representation to capture parallelism and data movement. Current targets include GPUs, CPUs with vector extensions, FPGAs, and custom accelerators for convolutional neural networks, FFT, and Viterbi decoding.
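Conceptually, the program is a hierarchical dataflow graph: nodes are units of parallel computation, edges are explicit data transfers, and the compiler chooses how to map nodes onto the available hardware. The sketch below is only a conceptual illustration in Python (with hypothetical node and target names); the real HPVM representation is embedded in LLVM IR.

```python
# Conceptual sketch of a hierarchical dataflow program representation
# (illustrative only; HPVM's actual representation lives at the LLVM IR level).
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    targets: list                                   # hardware units this node may be mapped to
    children: list = field(default_factory=list)    # hierarchical sub-graph, if any

@dataclass
class Edge:
    src: str
    dst: str
    bytes_moved: int                                # explicit data movement between nodes

pipeline = Node("camera_pipeline", ["cpu"], children=[
    Node("demosaic",   ["gpu", "cpu-vector"]),
    Node("denoise",    ["gpu", "fpga"]),
    Node("cnn_detect", ["cnn-accelerator", "gpu"]),
])
edges = [Edge("demosaic", "denoise", 8 << 20), Edge("denoise", "cnn_detect", 8 << 20)]

# A trivial mapping policy: pick the first listed target that exists in this system.
available = {"gpu", "cpu-vector", "cnn-accelerator"}
mapping = {n.name: next(t for t in n.targets if t in available) for n in pipeline.children}
print(mapping)   # e.g. {'demosaic': 'gpu', 'denoise': 'gpu', 'cnn_detect': 'cnn-accelerator'}
```

Because parallelism and data movement are explicit in the graph, the same program can be re-mapped to a different mix of compute units without rewriting the application.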
Previous Projects
The LLVM Compiler Infrastructure: A novel virtual instruction set and compiler infrastructure that enables lifelong analysis and transformation of programs in arbitrary programming languages. The LLVM infrastructure has been distributed in open-source form under a liberal license since October 2003. Since then it has been adopted by a large number of commercial and academic organizations and is a foundation for major commercial products. For example, virtually all mobile apps for Apple devices (iPhone, iPad, Apple Watch) are shipped by app developers in the LLVM virtual instruction set and compiled for individual devices in the Apple App Store. NVIDIA GPUs, Android smartphones, Sony PlayStations, the Chrome browser, and many other commercial products also make extensive use of LLVM. LLVM is also used worldwide for academic research and continues to be a key foundation for some of our other ongoing projects, including HPVM and Hydride.
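To give a concrete sense of the virtual instruction set, the snippet below builds a tiny function in LLVM IR and prints its textual form. It uses llvmlite, a third-party Python binding, purely for brevity; LLVM's own APIs are in C++.

```python
# Build a tiny function in LLVM IR using the llvmlite binding (pip install llvmlite).
from llvmlite import ir

module = ir.Module(name="demo")
i32 = ir.IntType(32)
add_fn = ir.Function(module, ir.FunctionType(i32, (i32, i32)), name="add")
builder = ir.IRBuilder(add_fn.append_basic_block(name="entry"))
a, b = add_fn.args
builder.ret(builder.add(a, b, name="sum"))

print(module)   # textual LLVM IR, ready for LLVM's analyses, optimizers, and code generators
```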
Automated Debugging for Software Failures: We are developing automated static and dynamic analysis techniques to understand the causes of failures in software systems, in order to help programmers diagnose and fix software bugs with as little effort as possible. The project is investigating automated fault localization and diagnosis techniques for both standalone and distributed programs.
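As one concrete example of the flavor of automated fault localization, a widely used spectrum-based technique ranks statements by how strongly their execution correlates with failing tests. The sketch below shows the generic Ochiai formula on made-up coverage data; it is illustrative only and is not necessarily the technique developed in this project.

```python
# Spectrum-based fault localization sketch (a generic textbook technique, shown for
# illustration; not necessarily the analyses developed in this project).
from math import sqrt

# coverage[test] = set of statement ids executed by that test; 'failed' = failing tests.
coverage = {
    "t1": {1, 2, 3},      "t2": {1, 2, 4},
    "t3": {1, 3, 4},      "t4": {1, 2, 3, 4},
}
failed = {"t2", "t4"}

def ochiai(stmt):
    """Ochiai suspiciousness: failing runs covering stmt / sqrt(#failing runs * all runs covering stmt)."""
    ef = sum(1 for t in failed if stmt in coverage[t])        # failing runs covering stmt
    cov = sum(1 for t in coverage if stmt in coverage[t])     # all runs covering stmt
    return ef / sqrt(len(failed) * cov) if cov else 0.0

stmts = sorted({s for c in coverage.values() for s in c}, key=ochiai, reverse=True)
print(stmts)   # statements ranked from most to least suspicious
```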
ALLVM: Exploring the benefits for software performance, security, and reliability if all software on a system (either all user-space software, or user-space plus OS software) is available in a rich virtual instruction set that can be analyzed and transformed by sophisticated compiler techniques (think Java bytecode, but for all software).
Deterministic-by-Default Parallel Programming: Provably deterministic parallel programs have major productivity advantages over today’s multithreaded programming models: programmers can reason about a program as if it had sequential semantics; they do not need to be concerned with complex issues such as atomicity, deadlock, and memory models (not even sequential consistency); debugging can happen with standard tools and mechanisms, similar to those for sequential programs; programs only need to be tested once for each input instead of many times; and, during porting, parallelism can be introduced incrementally without the worry that program behavior might change due to parallelism. Today, determinism is not available for most commonly used programming styles, such as imperative, object-oriented programming. We are developing Deterministic Parallel Java (DPJ): an extension to the sequential subset of Java that aims to provide a deterministic-by-default programming model for object-oriented languages via compile-time type checking where possible, falling back on run-time mechanisms where needed. Algorithms that wish to exploit non-deterministic behavior must explicitly request such behavior (hence the label “deterministic-by-default”) and, where possible, encapsulate and isolate it behind interfaces with enforceable contracts. With minor changes, these language extensions should be applicable to other base object-oriented languages, such as C++ and C#.
SVA: Secure Virtual Architecture: A compiler-based virtual machine for commodity operating systems. SVA runs below the operating system (like Xen or VMware Server), but uses a virtual instruction set and a compiler-based execution model (like the JVM, but for C rather than Java). We have ported the standard Linux kernel to SVA as a new (virtual) architecture, changing only about 150 lines of code in the architecture-independent parts of the kernel. The combination of a compiler and a privileged run-time enables novel solutions to a wide range of security and reliability challenges in systems, including memory safety, OS recovery, and information flow. In fact, SVA is the first system we know of that can provide a safe execution environment for a complete commodity operating system such as the Linux kernel.
SAFECode: Static Analysis For safe Execution of Code: SAFECode is a compiler that enforces memory safety and partial type safety fully automatically for unmodified C programs. It uses a combination of novel techniques to guarantee array bounds integrity, absence of uninitialized pointer uses, control flow integrity, type safety for a subset of objects, and sound analysis; furthermore, it does so without requiring wrappers for linking with externally compiled code, although it cannot detect all errors in such external code. For production code, SAFECode does not eliminate dangling pointer errors but guarantees that such errors cannot violate any of the previous guarantees; it is the first and only compiler we know of that can do so. For debugging, SAFECode can also be used to detect all dangling pointer references in an execution. We have also identified a proper subset of the C language for which SAFECode can enforce memory safety without any run-time checks or garbage collection; this is aimed at embedded programs where excessive “under the covers” run-time overheads or automatic memory management are undesirable.