The next generations of supercomputers will be powerful enough to analyze complex interactions, like those between the users of a social network or between the neurons in a human brain. But the match between a machine that works on the exascale (10¹⁸ operations per second) and a complex dataset on the petascale (10¹⁵ elements) presents multiple challenges and must be carefully orchestrated.
George Slota, a computer scientist at Rensselaer Polytechnic Institute, has been granted a prestigious National Science Foundation Faculty Early Career Development (CAREER) award to develop approaches to the problem.
“How do we best understand and get insight from this kind of data? To do that, we have to map the data to the hardware, with consideration of the algorithm itself,” said Slota, who has been awarded a $490,000 grant. “Each aspect is fairly challenging because of the complexity of the data and the complexity of modern hardware.”
Imagine a social network as large as Twitter or Facebook. In the world of data, each user can be described as a point or “node,” and each interaction between users is a line or “edge” between the nodes. There are nearly limitless attributes for a node – geographical location, age, favorite tree – and the number of edges connected to each user is apt to be highly irregular, given that some people have millions of connections and others just a handful. The collection of nodes and edges, called “graph structured data,” presents two problems to computer scientists: the data set can be both massive and irregular.
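To make the node-and-edge picture concrete, here is a minimal sketch in Python. The user names and interactions are invented for illustration, and `build_adjacency` is a hypothetical helper, not part of any software described in this article. It stores each user's connections and counts them, which shows how uneven the edge counts can be even in a tiny network.

```python
from collections import defaultdict

# Hypothetical toy data: each pair is one interaction ("edge")
# between two users ("nodes").
edges = [
    ("ana", "ben"), ("ana", "cy"), ("ana", "dee"), ("ana", "eli"),
    ("ben", "cy"), ("eli", "fay"),
]

def build_adjacency(edge_list):
    """Map every node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)  # edges are undirected, so record both directions
        adjacency[v].add(u)
    return adjacency

adjacency = build_adjacency(edges)

# The "degree" of a node is its number of connections.
degrees = {user: len(neighbors) for user, neighbors in adjacency.items()}
print(degrees)
```

In this toy network one user has four connections while two others have only one each. At the scale of a real social network, the same imbalance spans many orders of magnitude.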
“There isn’t any inherent kind of pattern or structure to a lot of these networks,” Slota said. “So it becomes computationally challenging to work with these data sets because you can’t make any assumptions about what’s going on under the hood without actually first studying what’s going on under the hood.”
With the grant, Slota will develop a “graph layout,” a high-quality and scalable means of partitioning, ordering, and storing the data given the data type, the relevant algorithms, and the hardware platform that will be used to analyze it.
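The article does not describe how Slota's graph layout works, so the following is only an illustrative sketch of what "storing" and "partitioning" can mean in practice. It assumes a common textbook storage format, compressed sparse row (CSR), and a deliberately simple partitioning rule that balances edges rather than nodes. The function names `to_csr` and `partition_by_edges` and the sample graph are hypothetical.

```python
def to_csr(num_nodes, edge_list):
    """Store an undirected graph as two flat arrays (compressed sparse row).

    offsets[n] .. offsets[n + 1] marks the slice of `neighbors`
    that holds the neighbors of node n.
    """
    degree = [0] * num_nodes
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1

    offsets = [0] * (num_nodes + 1)
    for n in range(num_nodes):
        offsets[n + 1] = offsets[n] + degree[n]

    neighbors = [0] * offsets[-1]
    cursor = offsets[:-1]  # slicing copies: next free slot for each node
    for u, v in edge_list:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors

def partition_by_edges(offsets, num_parts):
    """Assign contiguous runs of nodes to parts so that each part
    holds roughly the same number of neighbor slots."""
    total = offsets[-1]
    owner = []
    for n in range(len(offsets) - 1):
        part = offsets[n] * num_parts // total if total else 0
        owner.append(min(part, num_parts - 1))
    return owner

# Hypothetical six-node graph in which node 0 is a "hub".
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (4, 5)]
offsets, neighbors = to_csr(6, edges)
owner = partition_by_edges(offsets, 2)
print(offsets, owner)
```

Splitting these six nodes evenly by count (nodes 0 to 2 and nodes 3 to 5) would leave one part with eight neighbor entries and the other with four. The edge-balanced split above gives each part six. Real layouts must also weigh communication between parts, the memory hierarchy, and the algorithm being run, which is the harder problem the grant targets.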
Once the data is loaded in a way that makes sense, the second challenge is processing it. In many ways, these are similar problems.
Slota will work to map the data and the algorithmic analysis method to the equally complex and irregular architecture of the supercomputer, with its network of interconnected computers, processors, and multiple levels of memory.
And then, Slota said, “we want to bring it home,” by which he means producing scalable open-source software, software frameworks, and toolkits that will enable the broader scientific community to easily address these challenges as related to their specific dataset, analytical problem, and hardware.
“The ideal solution is a general purpose way that, given any arbitrary dataset, we can load it, process it, and solve some arbitrary problem that’s relevant to a biologist who’s studying the brain, or a physicist studying particle interactions, or whatever the application may be,” Slota said.
“A key challenge pacing data-driven scientific advancement is the ability to organize and extract meaning from massive datasets associated with many fields of present-day research. George’s work points the way towards new solutions to this critical data analysis bottleneck. This CAREER award recognizes his potential, as well as the importance of this research, and we congratulate him on this recognition,” said Curt Breneman, dean of the School of Science.