Research Experiences for Undergraduates and Masters Students

Plaid Research Group: Programming Languages and Software Engineering

The Plaid research group does research applying programming language techniques to software engineering problems.  We work on all kinds of languages, but many projects focus in particular on object-oriented systems.  Projects range from foundational development of new object models and core calculi, to pragmatic language extensions or analyses that can verify properties of programs written in industrial languages like Java.  The group name comes from Plaid, a new object-oriented language currently in planning based on technologies developed within the group and elsewhere.

Logistics

We have several research opportunities open for Summer 2007.  Positions are open until filled, and compensation is negotiable (or students can get course credit).  Please contact jonathan.aldrich@cs.cmu.edu to apply.

The NSF REU program emphasizes broadening participation in computing, and accordingly members of underrepresented groups are especially encouraged to apply.

Current Projects

We have multiple opportunities available in the general area of programming language and software engineering research.  Descriptions of some potential projects are below, but I'm happy to discuss alternative possibilities as well; just contact me if you're interested.

1. An Educational Tool for Formalizating, Implementing, and Verifying Languages

Former students: Key Shin, Lutz Wrage.  Positions open for continuing work.

This project focuses on building a tool to aid in teaching students how to more effectively write formal models of a programming language o rlogic and prove properties about those models.  Right now, many students find mathematical formalism and proof difficult.  There are many reasons for this, but one of the most important is that students do not get immediate feedback on what they are doing.  A tool that can check the students’ proofs and provide helpful error messages could go a long way towards resolving this problem.

Many proof assistant tools have been proposed, and a few have been developed specifically for education.  Our proposed approach is unique in its focus on allowing students to define a simple formal system from scratch, and then proving meta-theorems about the formal system.  It thus facilitates creative design and model building over brute calculation within a given domain; we believe the former skills are more important to computer science in general, and are especially relevant for teaching the formal foundations of programming languages and mathematical logic.  In order to facilitate this style of modeling, we build on CMU’s Twelf logical framework, which is widely recognized as the system most appropriate for formally modeling languages and logics.

Proofs.  The student working on this project would build on an early prototype built by Key Shin.  This prototype allows a student to define a formal system as a grammar and set of inference rules.  Ongoing work will focus on proving theorems about the system using rule induction, case analysis, and natural-deduction style derivations.  Our work will then address research questions like:
Implementation.  A machine-readable formalizaiton of a language could also be used to produce a simple implementation of that language.  Another direction for a summer project, therefore, would be exploring how to generate efficient interpreters or compilers for a language, based on it's specification in terms of inference rules.

2. Ownership Type Inference

Former students: Will Cooper, Owen Yamauchi.  Positions open for continuing work.

Ownership types promise to provide a practical mechanism for enforcing stronger encapsulation by controlling aliasing in object-oriented languages. Ownership is a central feature of the ArchJava language, developed in the Plaid research group, which ensures that an object-oriented implementation is consistent with its software architecture.  This paper provides an introduction to ownership types, and a compiler for Java with ownership is available at the ArchJava web site.

A major barrier to the adoption of ownership-and thus to achieving the benefits of systems built using ownership-is that it is difficult and time-consuming to annotate existing code with ownership types.  While previous work (including previous CMU REUs) has developed basic algorithms for inferring ownership types, these algorithms are not yet precise enough or scalable enough to be practical.

This project will investigate a new approach to the problem: incremental, programmer-guided ownership inference.  Because ownership requires knowledge of the programmer's design intent, our system gathers simple design intent from the programmer and uses this together with analysis technology to annotate the program semi-automatically.  Because ownership, as a form of context-sensitive aliasing information, is intractable to compute for large programs, our approach will provide tool support for incrementally annotating a few classes of the program at a time.

The student working on this project will begin by adapting previous ownership inference algorithms to a purely local setting.  Local inference will rely on partial program annotations added by the developer; these annotations will cut the recursive program dependencies that have made inference intractable in the past.  As a result, the student should be able to prove that inference results in a principal type: ownership annotations that generalize all other possible annotations for the same partially annotated source code.  Based on this design, the student will implement an Eclipse plugin that allows a developer to specify a set of classes on which to run inference, to provide an initial set of annotations, to run the inference algorithm, and to examine and accept the results of inference.  Finally, the student will evaluate the final tool on code examples at scale, and write up a report describing the algorithm, tool, and experience.

3. Language Design: Combining Strengths of Functional and Object-Oriented Languages

Functional and object-oriented languages have complimentary strengths.  Functional languages allow programmers to define fixed datatypes and then easily write many different operations over those datatypes: however extending datatypes to add a new case is difficult, requiring the datatype definition and all functions to be edited.  Object-oriented (OO) languages allow programmers to define classes that are similar to datatype cases, with a fixed set of operations that can be overridden by subclasses.  OO languages make it easy to add new cases to a datatype, simply by adding new subclasses, but adding new functions that dispatch over a hierarchy is difficult and requires awkward workarounds like the visitor design pattern.

This project is aimed at developing a new foundational language that captures the best of both language paradigms.  It is based on research on multi-methods and open classes in the object-oriented community, which attempt to bring some of the extensibility of functional programming to the object-oriented paradigm.  However, these existing approaches retain a declaration-oriented style that is common to OO languages but is much more limited from a modularity and reuse point of view compared to "value-oriented" functional languages like ML or Lisp.  New approaches are needed to fully unify these two programming paradigms.

4. Data Flow Analysis Infrastructure

Former students: Tim Kirchner.  Positions open for continuing work.

The goal of this project would be to refine a data flow analysis framework to support richer forms of analysis (e.g. interprocedural analysis, flow sensitivity).  The target data flow analysis framework is Crystal, an Eclipse-based project currently used to teach CMU courses on software analysis.

5. Model Checking for Objects

This project would involve investigating ways to model check object-oriented software systems.  Very sophisticated predicate abstraction refinement techniques have been developed in projects like MAGIC, SLAM, and Blast for system programs written in C-like languages.  However, much less has been done for object-oriented software.  Two major challenges are handling the recursive functions that are common in OO systems, as well as dealing with the structure of the heap.  This research project is currently less well defined than those above, but could be refined for a very interesting REU.

6. Protocol Specification for Objects in JML

This project would apply our previous research specifying protocols of interaction between objects, to a practical setting.  The Java Modeling Language (JML) is a standard specification language for Java; the goal of this project would be to take a draft specification of an extension to JML that defines protocols and implement it, by translation to existing JML constructs.

Past Undergraduate Research Projects

We've had a lot of great REUs in the past.  Here are some of the undergraduates that have done research in the Plaid group, along with a brief description of their project and a paper, if applicable.