IBM Unveils OCR-Based Data Masking Technology

IBM pulled the covers off new technology that uses optical character recognition to conceal data, filtering it before it reaches the PC screen.

IBM researchers have developed new data masking technology they say mixes screen scraping and optical character recognition (OCR) to conceal confidential data.

The platform-agnostic software, codenamed MAGEN (Masking Gateway for Enterprises), works by treating the information on the screen as a picture and relying on OCR to determine which on-screen fields need to be blanked out or replaced with random values. According to IBM, unlike other tools, MAGEN does not change the underlying data. Instead, it applies a list of rules to conceal sensitive data before it reaches the PC screen.

“Say, for example, we have a call centre in India where agents provide customer support for ABC Rental Car in the US,” IBM public affairs manager Ari Fishkind said. “The agent in India isn’t running the application on his computer, rather he is running an application from the company’s headquarters in the US. We add a MAGEN computer to the system – whether in India, the US or somewhere in between… Basically, our computer catches a bitmap of the screen that is on its way to the agent in India.”

“We take the picture of the screen and use OCR to identify where there are structures and labels,” he continued. “Initially, the administrator [who can be sitting at any remote location] is shown the picture of the screen and also a summary of the type of data that appears in this screen. The administrator can then decide what she wants to do with each type of label, table, field and so forth.”
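
The rule step described above can be illustrated with a minimal Python sketch. It is not IBM's implementation: the OCR stage is assumed to have already run and is represented by a hand-written list of fields, and every name in it (Field, RULES, mask_fields, the labels and the sample values) is hypothetical. It only shows how a per-label rule list could blank a field or replace it with random values while leaving the source data untouched.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One on-screen field, as a hypothetical OCR step might report it."""
    label: str   # label text found next to the field
    value: str   # recognised field contents
    box: tuple   # (x, y, width, height) of the value on the bitmap


# Hypothetical rule list chosen by the administrator: label -> action.
# Labels that are not listed are shown unchanged.
RULES = {"Credit card": "blank", "Phone": "randomise"}


def randomise(value, rng):
    """Replace every digit with a random digit; keep all other characters."""
    return "".join(str(rng.randrange(10)) if c.isdigit() else c for c in value)


def mask_fields(fields, rules, rng=None):
    """Return masked copies of the fields; the originals are not modified."""
    rng = rng or random.Random()
    masked = []
    for f in fields:
        action = rules.get(f.label, "show")
        if action == "blank":
            new_value = "*" * len(f.value)
        elif action == "randomise":
            new_value = randomise(f.value, rng)
        else:
            new_value = f.value
        masked.append(Field(f.label, new_value, f.box))
    return masked


# Invented sample standing in for the OCR output of one captured screen.
screen = [
    Field("Name", "J. Smith", (10, 10, 80, 12)),
    Field("Credit card", "4111 1111 1111 1111", (10, 30, 150, 12)),
    Field("Phone", "555-0100", (10, 50, 70, 12)),
]

masked = mask_fields(screen, RULES, random.Random(0))
for f in masked:
    print(f"{f.label}: {f.value}")
```

In a real gateway the masked values would then be painted back over each field's box on the bitmap before it is forwarded to the agent; that rendering step is left out of this sketch.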

The innovation marks the second data privacy breakthrough to come out of IBM Labs in the past month. A few weeks ago, a researcher at IBM developed a fully homomorphic encryption scheme that allows data to be manipulated without being exposed – a discovery the company contended could be important for securing cloud-computing environments and fighting spam.

For now, MAGEN is still at the proof-of-concept stage and is some way from being market-ready. However, IBM officials said the technology would make it unnecessary to create a new sanitised copy of the data for each reader, thereby reducing complexity.

“MAGEN’s screen masking approach eliminates the need to painstakingly tailor ‘data masking’ solutions to specific environments,” said Haim Nelken, manager of Integration Technologies at IBM’s research lab in Haifa, Israel, in a statement. “The bottom line is faster performance, simpler database security and reduced costs for protecting sensitive data.”