MIT System Automatically Repairs Security Bugs In Code
The CodePhage research system fixes crash-causing bugs by ‘grafting’ security checks from a ‘donor’ program
MIT has presented a system designed to automate the repair of security bugs in computer code by borrowing ideas from medicine, using a process the institute compares to a graft of healthy tissue from a donor.
CodePhage, as MIT’s system is called, is designed to fix bugs by analysing the security checks of a “donor” program that isn’t susceptible to the same problem. The system then crafts code based on the “healthy” program and integrates it into the “recipient”.
Automated repair
The system doesn’t require access to the donor’s source code, and as a result can import checks from applications written in languages other than that of the recipient.
MIT said security checks can make up 80 percent or more of the code in current commercial software, so automating the insertion of those checks could drastically reduce the grunt work carried out by developers.
Stelios Sidiroglou-Douskos, the research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) who led CodePhage’s development, said the system could reduce the duplication of effort that currently takes place in code development.
“We have tons of source code available in open-source repositories, millions of projects, and a lot of these projects implement similar specifications,” he said in a statement. With CodePhage, “over time, what you’d be doing is building this hybrid system that takes the best components from all these implementations”.
Sidiroglou-Douskos and co-authors Martin Rinard, Fan Long, and Eric Lahtinen tested CodePhage on seven common open source programs in which bugs had been detected, importing repairs from between two and four donors for each. In all instances CodePhage was able to effectively patch the vulnerable code, with patching taking two to 10 minutes per repair, MIT said.
“The longer-term vision is that you never have to write a piece of code that somebody else has written before,” stated MIT professor of computer science and engineering Martin Rinard. “The system finds that piece of code and automatically puts it together with whatever pieces of code you need to make your program work.”
Code donation
CodePhage starts off with two sample inputs, one that causes the recipient to crash and one that doesn’t. It introduces the “safe” input to the donor, and tracks the sequence of operations the donor executes.
Then it feeds the donor the crash-inducing input and again tracks the donor’s actions. Where the two sequences of actions diverge, the system guesses that this might represent a security check missing in the recipient.
It translates the donor’s actions at that point into the language of the recipient and adds this code, testing to see if the problem has been solved. If it hasn’t, the system continues to try the same technique at other points of divergence until it finds a security check that protects against the crash.
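The loop described above can be illustrated with a deliberately simplified sketch. It is not CodePhage's implementation: the real system works on the donor's binary and translates the check into the recipient's language, a step this toy skips entirely. The donor conditions, the `recipient` function and the helper names are all hypothetical, and the sketch assumes a grafted check simply rejects any input that fails it.

```python
from typing import Callable, List, Optional, Tuple

Program = Callable[[int], Optional[int]]
Condition = Tuple[str, Callable[[int], bool]]

# Hypothetical donor: the branch conditions it evaluates on every input.
DONOR_CONDITIONS: List[Condition] = [
    ("is_non_negative", lambda v: v >= 0),
    ("uses_large_mode", lambda v: v >= 10),  # ordinary branch, not a security check
    ("index_in_bounds", lambda v: v < 8),    # the check the toy recipient lacks
]


def recipient(value: int) -> int:
    """Toy recipient: writes into a fixed buffer without a bounds check."""
    buffer = [0] * 8
    buffer[value] = 1  # raises IndexError when value >= 8
    return buffer.index(1)


def trace(value: int) -> List[Tuple[str, bool]]:
    """Record the outcome of every donor condition for one input."""
    return [(name, cond(value)) for name, cond in DONOR_CONDITIONS]


def divergences(safe: int, crash: int) -> List[str]:
    """Names of donor conditions whose outcome differs between the two inputs."""
    return [name for (name, a), (_, b) in zip(trace(safe), trace(crash)) if a != b]


def graft(program: Program, cond: Callable[[int], bool]) -> Program:
    """Wrap the program so that inputs failing the condition are rejected (None)."""
    def patched(value: int) -> Optional[int]:
        if not cond(value):
            return None
        return program(value)
    return patched


def crashes(program: Program, value: int) -> bool:
    """True if running the program on the input raises an exception."""
    try:
        program(value)
    except Exception:
        return True
    return False


def repair(program: Program, safe: int, crash: int) -> Optional[Tuple[str, Program]]:
    """Try each point of divergence until a grafted check stops the crash
    while leaving the result for the safe input unchanged."""
    conditions = dict(DONOR_CONDITIONS)
    for name in divergences(safe, crash):
        patched = graft(program, conditions[name])
        if not crashes(patched, crash) and patched(safe) == program(safe):
            return name, patched
    return None


if __name__ == "__main__":
    found = repair(recipient, safe=3, crash=12)
    print(found[0] if found else "no repair found")
```

In this toy the first point of divergence, `uses_large_mode`, is an ordinary branch: grafting it still lets the crashing input through, so the loop moves on to `index_in_bounds`, which blocks the crash and leaves the safe input's result untouched.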
Emery Berger, a professor of computer science at the University of Massachusetts at Amherst, said the fact that the system works as well as it does is “surprising”.
“The donor program was not written by the same people,” he stated. “They have different coding standards; they name variables differently; they use all kinds of different variables; the variables could be local; or they could be higher up in the stack. And CodePhage is able to identify these connections and say, ‘These variables correlate to these variables.’ Speaking in terms of organ donation, it transforms that code to make it a perfect graft, as if it had been written that way in the beginning.”
The system was presented at this month’s Association for Computing Machinery conference on Programming Language Design and Implementation.