Research Experiences for Undergraduates and Masters Students
Plaid Research Group: Programming Languages and Software Engineering
The Plaid research group does research applying programming language
techniques to software engineering problems. We work on all kinds
of languages, but many projects focus in particular on object-oriented
systems. Projects range from foundational development of new
object models and core calculi, to pragmatic language extensions or
analyses that can verify properties of programs written in industrial
languages like Java. The group name comes from Plaid, a new
object-oriented language currently in planning based on technologies
developed within the group and elsewhere.
Logistics
We have several research opportunities open for Summer 2007.
Positions are open
until filled, and compensation is negotiable (or students can get
course credit). Please contact jonathan.aldrich@cs.cmu.edu
to apply.
The NSF REU program emphasizes broadening participation in computing,
and accordingly members of underrepresented groups are especially
encouraged to apply.
Current Projects
We have multiple opportunities available in the general area of
programming
language and software engineering research. Descriptions of some
potential projects are below, but I'm happy to discuss alternative
possibilities as well; just contact me if you're interested.
1. An Educational Tool for Formalizating, Implementing, and
Verifying Languages
Former students: Key Shin, Lutz Wrage. Positions open for continuing work.
This project focuses on building a tool to aid in teaching
students how to more effectively write formal models of a programming
language o rlogic and prove properties about those models. Right
now, many students find mathematical formalism and proof
difficult. There are many reasons for this, but one of the most
important is that students do not get immediate feedback on what they
are doing. A tool that can check the students’ proofs and provide
helpful error messages could go a long way towards resolving this
problem.
Many proof assistant tools have been proposed, and a few have been
developed specifically for education. Our proposed approach is
unique in its focus on allowing students to define a simple formal
system from scratch, and then proving meta-theorems about the formal
system. It thus facilitates creative design and model building
over brute calculation within a given domain; we believe the former
skills are more important to computer science in general, and are
especially relevant for teaching the formal foundations of programming
languages and mathematical logic. In order to facilitate this
style of modeling, we build on CMU’s Twelf logical framework,
which is widely recognized as the system most appropriate for formally
modeling languages and logics.
Proofs. The student
working on this project would build on an early prototype built by Key
Shin. This prototype allows a student to define a formal system
as a grammar and set of inference rules. Ongoing work will focus
on proving theorems about the system using rule induction, case
analysis, and natural-deduction style derivations. Our work will
then address research questions like:
- What kinds of error messages should the system give in order to
point students most effectively towards their mistakes?]
- What mathematical conventions are used on paper that would make
the tool more usable if they were directly supported by the tool?
- Could the system provide support to the student by, for example,
automatically searching for counterexamples to a theorem the student is
trying to prove, but which may not in fact be true?
- What kind of user interface is necessary for students to be
productive in working with the environment?
Implementation. A
machine-readable formalizaiton of a language could also be used to
produce a simple implementation of that language. Another
direction for a summer project, therefore, would be exploring how to
generate efficient interpreters or compilers for a language, based on
it's specification in terms of inference rules.
2. Ownership Type Inference
Former students: Will Cooper, Owen Yamauchi. Positions open for continuing work.
Ownership types promise to provide a practical mechanism for enforcing
stronger encapsulation by controlling aliasing in object-oriented
languages. Ownership is a central feature of the ArchJava language, developed in
the Plaid research group, which ensures that an object-oriented
implementation is consistent with its software architecture. This paper
provides an introduction to ownership types, and a compiler for Java
with ownership is available at the ArchJava
web site.
A major barrier to the adoption of ownership-and thus to achieving the
benefits of systems built using ownership-is that it is difficult and
time-consuming to annotate existing code with ownership types.
While previous work (including previous CMU REUs) has developed basic
algorithms for inferring ownership types, these algorithms are not yet
precise enough or scalable enough to be practical.
This project will investigate a new approach to the problem:
incremental,
programmer-guided ownership inference. Because ownership requires
knowledge of the programmer's design intent, our system gathers simple
design intent from the programmer and uses this together with analysis
technology to annotate the program semi-automatically. Because
ownership, as a form of context-sensitive aliasing information, is
intractable to compute for large programs, our approach will provide
tool support for incrementally annotating a few classes of the program
at a time.
The student working on this project will begin by adapting previous
ownership inference algorithms to a purely local setting. Local
inference will rely on partial program annotations added by the
developer; these annotations will cut the recursive program
dependencies that have made inference intractable in the past. As
a result, the student should be able to prove that inference results in
a principal type: ownership annotations that generalize all other
possible annotations for the same partially annotated source
code. Based on this design, the student will implement an Eclipse
plugin that allows a developer to specify a set of classes on which to
run inference, to provide an initial set of annotations, to run the
inference algorithm, and to examine and accept the results of
inference. Finally, the student will evaluate the final tool on
code examples at scale, and write up a report describing the algorithm,
tool, and experience.
3. Language Design: Combining Strengths of Functional and
Object-Oriented Languages
Functional and object-oriented languages have complimentary
strengths. Functional languages allow programmers to define fixed
datatypes and then easily write many different operations over those
datatypes: however extending datatypes to add a new case is difficult,
requiring the datatype definition and all functions to be edited.
Object-oriented (OO) languages allow programmers to define classes that
are similar to datatype cases, with a fixed set of operations that can
be overridden by subclasses. OO languages make it easy to add new
cases to a datatype, simply by adding new subclasses, but adding new
functions that dispatch over a hierarchy is difficult and requires
awkward workarounds like the visitor design pattern.
This project is aimed at developing a new foundational language that
captures the best of both language paradigms. It is based on
research on multi-methods and open classes in the object-oriented
community, which attempt to bring some of the extensibility of
functional programming to the object-oriented paradigm. However,
these existing approaches retain a declaration-oriented style that is
common to OO languages but is much more limited from a modularity and
reuse point of view compared to "value-oriented" functional languages
like ML or Lisp. New approaches are needed to fully unify these
two programming paradigms.
4. Data Flow Analysis Infrastructure
Former students: Tim Kirchner. Positions
open for continuing work.
The goal of this project would be to refine a data flow analysis
framework to support richer forms of analysis (e.g. interprocedural
analysis, flow sensitivity). The target data flow analysis
framework is Crystal, an Eclipse-based project currently used to teach
CMU courses on software analysis.
5. Model Checking for Objects
This project would involve investigating ways to model check
object-oriented software systems. Very sophisticated predicate
abstraction refinement techniques have been developed in projects like
MAGIC, SLAM, and Blast for system programs written in C-like
languages. However, much less has been done for object-oriented
software. Two major challenges are handling the recursive
functions that are common in OO systems, as well as dealing with the
structure of the heap. This research project is currently less
well defined than those above, but could be refined for a very
interesting REU.
6. Protocol Specification for Objects in JML
This project would apply our previous
research specifying protocols of interaction between objects, to a
practical setting. The Java Modeling Language
(JML) is a standard specification language for Java; the goal of this
project would be to take a draft specification of an extension to JML
that defines protocols and implement it, by translation to existing JML
constructs.
Past Undergraduate Research Projects
We've had a lot of great REUs in the past. Here are some of the
undergraduates that have done research in the Plaid group, along with a
brief description of their project and a paper, if applicable.
- Andi Bejlieri
- Ego: a new core language with linear objects and methods, which
supports type-safe method addition, removal, and update.
- Andi Bejleri, Jonathan Aldrich, and Kevin Bierhoff. Ego:
Controlling the Power of Simplicity.
In Proceedings of the POPL '06 Workshop on the Foundations of
Object-Oriented
Languages, January 2006.
- Matthew Kehrt
- Refined the Ego core language to simplify the design and add
support for a region-based borrowing technique.
- Paper in preparation.
- Will Cooper
- Worked on ownership type inference, a predecessor to the
related project described above.
- Paper in preparation.
- Lee Salzman
- Kevin Donnelly
- Implementation and empirical evaluation work on selective open
recursion, a technique that provides a more modular version of
inheritance (and solves the fragile base class problem) by specifying
where callbacks to subclasses are expected.
- Jonathan Aldrich and Kevin Donnelly. Selective
Open Recursion: Modular Reasoning about Components and Inheritance.
Proc. FSE 2004 Workshop on Specification and Verification of
Component-Based Systems, November 2004.