The National Science Foundation (NSF) has awarded a five-year, $5 million grant to Rensselaer Polytechnic Institute and partners to expand United States leadership and engagement in the international data community through the Research Data Alliance (RDA). RDA is accelerating the development of global infrastructure for data sharing and exchange among diverse research areas, including the tools, code, institutional policy, and best practices that provide the foundation for new data-driven insights and discoveries. RDA/US comprises the United States members of RDA, which was officially launched in March 2013 with support from the NSF, the European Commission, and the Australian Commonwealth Government.

The $5 million grant broadens the reach of RDA/US with three pilot programs in the U.S. and expanded participation in the international RDA. The three pilot programs will engage more U.S. data-oriented organizations in the RDA, disseminate techniques and tools developed by the RDA within the U.S., and provide opportunities for students and young professionals in the U.S. to work with the RDA, according to Francine Berman, chair of RDA/US and the Edward P. Hamilton Distinguished Professor of Computer Science at Rensselaer. The new grant is a collaboration between Rensselaer, Indiana University Bloomington, and the Corporation for National Research Initiatives (CNRI), and is part of a suite of activities at the new Rensselaer Institute for Data Exploration and Applications (IDEA).

“In one year, the Research Data Alliance has grown from an organizing group of eight individuals to a vibrant community of more than 1,000 members from 53 countries and all sectors,” said Berman. “The dramatic growth of the RDA emphasizes the need for data infrastructure and coordination worldwide and has enhanced U.S. leadership and competitiveness in the data arena. U.S. engagement in the RDA is particularly timely, as U.S. ‘big data’ and ‘public access to research’ initiatives are emerging as top national priorities to spur innovation.”

“The new grant promotes practical advances in data sharing that are key to scientific collaborations enabling discoveries to address needs of our global society,” said Robert Chadduck, NSF program director for data and cyberinfrastructure. “NSF is proud to join our global colleagues in continuing to support the RDA initiative.”

RDA supports interest groups that discuss broad-based topics ranging from preservation of e-infrastructure to agricultural data interoperability. It also supports 12- to 18-month working groups that focus on the development and use of specific tools, software, standards, and other infrastructure to facilitate data sharing for a targeted purpose.

Mark Parsons, managing director of RDA/US and the Rensselaer Center for the Digital Society, said the organization produces deliberately flexible tools that can be adapted to suit diverse research needs. For example, said Parsons, one RDA working group is facilitating research that correlates pollution levels with asthma risk, and so must link health data with environmental data. The group may devise a simple protocol aligning the format of physical addresses in datasets, a step that could be widely applicable to other research.
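As a rough illustration of the kind of address alignment Parsons describes, the Python sketch below normalizes addresses to one format and then links two datasets on the result. It is not an RDA protocol or product: the function names, the suffix table, the field names, and the toy records are all assumptions made up for this example.

```python
import re

# Hypothetical suffix table; a real protocol would agree on a fuller, shared list.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}


def normalize_address(raw: str) -> str:
    """Return an upper-case, punctuation-free address with abbreviated suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.upper())  # replace punctuation with spaces
    tokens = cleaned.split()                         # split() also collapses whitespace
    return " ".join(SUFFIXES.get(tok, tok) for tok in tokens)


def link_by_address(health_rows, env_rows):
    """Pair each health record with the environmental record at the same address."""
    env_by_addr = {normalize_address(r["address"]): r for r in env_rows}
    linked = []
    for row in health_rows:
        match = env_by_addr.get(normalize_address(row["address"]))
        if match is not None:
            # Keep the health record's address and add the environmental fields.
            extra = {k: v for k, v in match.items() if k != "address"}
            linked.append({**row, **extra})
    return linked


# Made-up toy records, for illustration only.
health = [{"address": "12 Main Street, Troy", "asthma_visits": 3}]
env = [{"address": "12 main st. troy", "pm25": 9.5}]
linked = link_by_address(health, env)
```

Because both spellings normalize to the same string, the two records are joined even though their original address formats differ. Real address matching would also need to handle unit numbers, misspellings, and geocoding, which this sketch ignores.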

“We know that what we need is something like an Internet for data, but we don’t know what the best configuration for that infrastructure will be,” said Parsons. “So instead of building a monolithic one-size-fits-all system, we are providing small building blocks—protocols, pieces of code—for specific situations, and then looking for other research scenarios where those building blocks might be applicable.”

Indiana University Bloomington will lead two of the three pilot programs. The first is a series of “data challenges” that will encourage the community to adapt and use the techniques and tools developed by RDA working groups, and reward it for doing so. These challenges will range from computer science tools, such as a registry of data types, to specific applications that could, for example, help improve crop yields by integrating diverse agricultural data.

“The data challenges bring together multiple teams over a compressed and energetic timeframe. These are researchers who can benefit from an RDA emergent technology, and that gives them a chance to ‘kick the tires,’” said Beth Plale, professor in the School of Informatics and Computing at Indiana University, and RDA/US executive committee member.

Indiana University Bloomington will also lead a program to engage emerging data scientists and users. RDA/US will provide internships for students and fellowships for early-career professionals to contribute to and use the products of RDA, according to Plale.

“The career trajectory of the data scientist is a topic of considerable attention in academia and industry. The emerging scholar program gives the RDA/US community an opportunity to enhance and integrate the emerging scholar into the fabric of RDA in ways that strengthen their career path,” said Plale.

The Corporation for National Research Initiatives will take the lead on the third pilot, a program to engage more data-oriented organizations in RDA. RDA/US will offer informational and working sessions at key meetings of data-oriented organizations across a wide variety of domains. It will coordinate relevant activities with such organizations and develop joint activities with the RDA/US student and young professionals program, according to Larry Lannom of CNRI.

“We want to develop RDA as the foremost venue for organizations and individuals to come together to craft the tools and approaches that will let us unlock the potential in the vast amounts of scientific data now being gathered by research institutions worldwide, enabling that knowledge to be applied to the great challenges we face as a global society,” said Lannom.

Big Data, broad data, high performance computing, data analytics, and Web science are creating a significant transformation globally in the way we make connections, make discoveries, make decisions, make products, and ultimately make progress. RDA/US is part of the university-wide effort at Rensselaer to maximize the capabilities of these tools and technologies for the purpose of expediting scientific discovery and innovation, developing the next generation of these digital enablers, and preparing our students to succeed and lead in this new data-driven world.