Overview

To meet the requirements of the NIH Genomic Data Sharing Policy, regulated storage is configured for datasets covered by Data Use Certifications (DUCs).

The regulated storage service is currently designed to meet the data management and access requirements of the NIH Genomic Data Sharing (GDS) Policy, as outlined in NIH notice NOT-OD-24-157. Regulated storage is not yet being configured for other DUCs or for Data Use Agreements (DUAs), but this is on the roadmap for the service.

For more information about accessing and using regulated storage, please email SciComp. To learn more about creating an appropriate data stewardship plan, please visit the Data Governance team’s page about NIH Repository Data Access.

Provisioning

Regulated data storage is provisioned by SciComp after a data stewardship plan has been created and executed. You can visit the Data Governance team’s page about NIH Repository Data Access for more information, including how to start that process.

Data Loss Safeguards

Data protection in regulated storage is minimal. There are two snapshots, taken approximately 30 minutes apart. A snapshot is a point-in-time image of a file system that is accessible from within the file system; snapshots are not substitutes for backups. There are no backups (copies of data taken and stored elsewhere so that they can be used to restore the original after a data loss event) and no replicas, either on or off campus.

Backups are being considered, but there are no immediate plans to provide backups of data in regulated storage.

Data Lifecycle

Data in regulated directories are generally not lifecycled or purged. The exception is data in the temp directories, which is purged after 30 days, as in the temp filesystem. See the section below about how temp directories are organized.

The data steward for the project is responsible for removing regulated data when the data use agreement expires.
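Because temp directory contents are purged after 30 days, it can be useful to see which of your files are approaching that age. The following is a minimal Python sketch, assuming the purge is based on each file's last-modification time (mtime); the criterion the actual purge job uses is not documented here, and `files_near_purge` is a hypothetical helper, not a SciComp-provided tool.

```python
import time
from pathlib import Path

# Purge age for regulated temp directories, as stated on this page.
PURGE_AGE_DAYS = 30
SECONDS_PER_DAY = 86400


def files_near_purge(temp_dir, age_days=PURGE_AGE_DAYS, now=None):
    """Return files under temp_dir that are at least age_days old.

    ASSUMPTION: age is measured from each file's last-modification time
    (mtime). The actual purge job's criterion is not documented here.
    """
    now = time.time() if now is None else now
    cutoff = now - age_days * SECONDS_PER_DAY
    old_files = []
    # Walk the whole tree below temp_dir; only regular files are checked.
    for path in Path(temp_dir).rglob("*"):
        if path.is_file() and path.stat().st_mtime <= cutoff:
            old_files.append(path)
    return sorted(old_files)
```

For example, `files_near_purge("/fh/regulated/pi_n/temp/user/usera", age_days=25)` would list files at least 25 days old in that user's temp directory, giving a few days of warning before the 30-day mark. `pi_n` and `usera` are the placeholder names from the layout shown under Accessing Regulated Storage.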

Accessing Regulated Storage

Regulated storage is available in the rhino/gizmo compute environment under the path /fh/regulated. This is organized by PI, with each regulated data set stored in its own directory:

/fh/regulated/pi_n/
├── 12345
├── 54321

There are also directories configured for each user with access to a regulated directory. These are intended for derived and intermediate data that is still covered by the DUA, and they are located under temp in the PI's regulated data directory:

/fh/regulated/pi_n/
   ...
└── temp
    └── user
        ├── npi
        ├── usera
        ├── userb
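The layout above can be expressed as a few path helpers, for example to build output paths in a script and to check that derived data is being written inside regulated storage rather than somewhere else. This is a minimal Python sketch that assumes only the layout shown above; the helper names are hypothetical, and `pi_n`, `12345`, and `usera` are the placeholder names from the example trees.

```python
import getpass
from pathlib import Path

# Root of regulated storage in the rhino/gizmo environment.
REGULATED_ROOT = Path("/fh/regulated")

# Hypothetical helpers; the layout is taken from the directory trees on this page.


def dataset_dir(pi, dataset_id, root=REGULATED_ROOT):
    """Directory for one regulated data set, e.g. /fh/regulated/pi_n/12345."""
    return Path(root) / pi / str(dataset_id)


def user_temp_dir(pi, user=None, root=REGULATED_ROOT):
    """Per-user temp directory for derived data still covered by the DUA."""
    user = user or getpass.getuser()
    return Path(root) / pi / "temp" / "user" / user


def is_regulated_path(path, root=REGULATED_ROOT):
    """True if path resolves (following symlinks and '..') to a location under root."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False
```

Resolving the path before comparing means a path such as `/fh/regulated/../scratch/out.vcf` is correctly reported as outside regulated storage, even though its text begins with `/fh/regulated`.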

PROOF Compatibility

We recommend using PROOF to orchestrate the analysis of genomic data stored on /fh/regulated. PROOF includes features that ensure some, but not all, of the intermediate files and artifacts created during genomic data analysis are handled in compliance with the NIH GDS Policy. See the PROOF guide and our page about PROOF Regulated for more information.

Quotas and Charges

There are currently no quotas or charges for regulated storage.

This will change. The exact parameters have not been determined at this time (Sep 2025), but there will be some chargeback for usage and a quota to manage space on the underlying storage server.
