WashU Libraries Take Action to Preserve Federal Data

Avianna Wooten and Jason Murray, WashU staffers at a Data Rescue hackathon in John M. Olin Library. Photo by Hiba Ahmad/St. Louis Public Radio.

During the week of January 27, 2025, several federal government websites and datasets went offline. According to The New York Times, 8,000 web pages were taken down. Approximately 3,000 datasets relating to public health research, widely used large-scale national health surveys, climate change, environmental policy, social science research, and more were removed. Most of the affected datasets came from the websites of federal agencies, including the Centers for Disease Control and Prevention, the Census Bureau, and the Department of Health and Human Services.

Many of the missing websites and datasets were subsequently restored. However, concerns persist that the restoration was only partial and that documentation on the websites is missing or has been altered. In May, NOAA decommissioned several datasets related to climate and ocean monitoring. Numerous other datasets on government websites remain at serious risk of deletion.

Public data lies at the heart of scientific research, inquiry, and public health policy decisions. The possibility of its elimination has galvanized institutions and universities across the nation to launch urgent data rescue efforts.

“We heard alarm from every corner of our campus and the community over the loss of critical, scientific research datasets. The research community has made significant strides in making data publicly accessible, so we can see a return on our investment. Since these data are taxpayer-funded, the public literally owns them. Our vice provost and university librarian heard the call and supported library staff in organizing data rescue at WashU. Although our main objective was to ensure these data persist, we underestimated how much individuals needed to engage in rescue efforts to feel a sense of empowerment and hope,” said Jennifer Moore, head of Data Services at WashU Libraries.

Data Rescue Project

WashU’s Data Rescue series is part of a larger global endeavor. The Data Rescue Project is a joint effort among a group of data organizations, including the International Association for Social Science Information, Research Data Access and Preservation, and members of the Data Curation Network, to serve as a clearinghouse for data rescue efforts nationally and internationally.

WashU Libraries staff organized six Data Rescue hackathons in the spring semester to identify and safeguard at-risk websites and federal data. A total of 135 librarians, students, faculty, researchers, and community members participated in the hackathons held at John M. Olin Library. Volunteers were able to participate in three tracks: Advocacy & Education, Web Archiving, and Data Capture.

The events garnered significant media attention. Both the St. Louis Post-Dispatch and St. Louis Public Radio documented the dedicated efforts of volunteers at the events.

“As a library and information professional, access to information is one of the key tenets of library work. So, for me personally, regardless of what the data is about and its potential uses, the elimination of access to those datasets is sort of an affront to my professional ethics,” Esther Gabriel, a WashU librarian, said in an interview with St. Louis Public Radio.

Vice Provost and University Librarian Mimi Calter points out that the Data Rescue Project has enabled institutions to collaborate more effectively in archiving and making available critical federal and public data. “I’m thrilled that WashU Libraries have been able to participate in the program. Accessibility is core to the mission of the libraries, and having this opportunity has enabled us to directly address a need of our faculty and students,” said Calter.

Working with a consortium of data organizations helps universities like WashU avoid duplicative efforts, said Avianna Wooten, lead organizer and the Data Management and Sharing specialist at the Libraries. “What a centralized effort does is it helps the organizations and universities that are participating in this effort to see what we are all working on,” Wooten remarked.

The Data Rescue Project has a selection of federal datasets that have been categorized as at risk of removal and are designated for rescue. In the Web Archiving track, WashU Data Rescue volunteers investigated 719 websites and nominated a total of 372.

The process of preserving a dataset during a hackathon involves several steps. Organizations and universities can access the Data Rescue Project tracker, which lists at-risk datasets. An institution such as WashU can then identify and claim datasets to work on, and their URLs are compiled in the WashU Data Rescue Tracker. Next, a volunteer claims a specific dataset. The volunteer collects the data, gathers metadata or other relevant information about it, packages everything, and transfers the package to the Libraries team. A coordinator then reviews the data and deposits it into DataLumos, a publicly accessible archive and repository of valuable government data managed by the University of Michigan.
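
The packaging step can be pictured with a small sketch. This is not the Libraries' actual tooling: the function names, the manifest.json layout, and the metadata fields are all hypothetical, chosen only to show how collected files might be bundled with descriptive metadata and checksums so a coordinator can confirm nothing changed in transfer.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_dataset(src_dir: Path, metadata: dict, out_zip: Path) -> dict:
    """Zip every file under src_dir together with a manifest.json.

    The manifest layout (metadata plus per-file checksums) is a
    hypothetical convention, not a DataLumos requirement.
    """
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    manifest = {
        "metadata": metadata,
        "files": {
            p.relative_to(src_dir).as_posix(): sha256_of(p) for p in files
        },
    }
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for p in files:
            # Store paths relative to the dataset folder.
            archive.write(p, p.relative_to(src_dir).as_posix())
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

A volunteer might call package_dataset(Path("downloads/my_dataset"), {"title": "...", "source_url": "..."}, Path("my_dataset.zip")) and hand over the resulting zip file. The reviewer can recompute the checksums and compare them with the manifest.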

A total of 176 datasets were processed and preserved by volunteers at the six hackathons at the Libraries. Of these, 40 have been verified and deposited in DataLumos by Libraries coordinators. These include a wide variety of public data, such as the USDA’s bird influenza surveillance statistics, NOAA’s National Weather Service Historical Heat Risk Data, and Bureau of Transportation Statistics data on the transport of hazardous materials.

In addition to the datasets identified by the Data Rescue Project, volunteers at WashU have also worked toward preserving environmental datasets from the Environmental Data & Governance Initiative (EDGI), which has its own priority list of at-risk data. Several datasets highlighted for rescue by the Libraries were nominated by the WashU community as critically important information worth saving. These include datasets from the CDC’s National Institute for Occupational Safety and Health.

The data rescue hackathons were open to all, and volunteers of varying levels of expertise contributed to advocacy, web archiving, and data preservation efforts. The datasets, Wooten explains, range from simple CSV files to sprawling data webpages with complex dashboards, requiring both basic and advanced technical skills. To save a large, multi-layered dataset, volunteers have been writing scripts that crawl the webpages and download their contents automatically.
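
A crawl script of this kind might look roughly like the sketch below. It is an illustration only, not the volunteers' code: the list of file extensions, the function names, and the single-page scope are assumptions, and a real dashboard would likely need more than following plain links.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Assumed set of file types worth saving; adjust per dataset.
DATA_SUFFIXES = (".csv", ".json", ".xlsx", ".zip")


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def find_data_links(html: str, base_url: str) -> list:
    """Return unique links, in page order, whose path ends in a data suffix."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    found = []
    for link in collector.links:
        path = urlparse(link).path.lower()
        if path.endswith(DATA_SUFFIXES) and link not in found:
            found.append(link)
    return found


def download_all(page_url: str, dest: Path) -> list:
    """Fetch one page and save every data file it links to into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    saved = []
    for link in find_data_links(html, page_url):
        target = dest / Path(urlparse(link).path).name
        with urlopen(link) as response:
            target.write_bytes(response.read())
        saved.append(target)
    return saved
```

Separating link discovery (find_data_links) from downloading (download_all) lets a volunteer review the list of files before fetching anything, which also makes it easier to respect a site's rate limits.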

While the spring semester data rescues have ended, volunteers can still get involved through the asynchronous Summer Data Rescue. “Unfortunately, efforts to save federal data remain urgent and needed. We immensely appreciate the work of our volunteers and welcome continued participation in the summer series,” said Wooten.

Asynchronous Summer Data Rescue

This summer, WashU Libraries is piloting an asynchronous version of the Data Rescue workflows that is open to all. Listed below are the tracks available for volunteers and links to additional information on how to get involved.

Volunteer Tracks: Please see our How to Start page to get started.

Track 1: Advocacy & Education

While it is not required, we would appreciate your sharing any of the materials you create for this track with us.

Track 2: Web Archiving

This track is wholly self-paced. Please let us know if you have any questions.

Track 3: Data Capture

a. To participate in this track, you will need to create a free Smartsheet account.

b. After creating a Smartsheet account, you will be able to claim a dataset on the WashU 2025 Data Rescue Tracker.

c. You will then have 3 days after claiming a dataset to work on it.

Before getting started on any of the Data Rescue volunteer tracks, please complete an anonymous survey. If you have any questions, please let us know at [email protected].