Regulated File System
Overview
To meet the NIH Genomic Data Security Policy, regulated storage is configured for datasets covered by Data Use Certifications (DUCs).
The regulated storage service is currently designed to meet the data management and access requirements in the NIH Genomic Data Sharing (GDS) Policy outlined in NIH notice NOT-OD-24-157. We are not yet configuring regulated storage for other DUCs or data use agreements (DUAs), but this is on the roadmap for this service.
For more information about accessing and using regulated storage, please email scicomp. If you would like to learn more about the process of creating an appropriate data stewardship plan, please visit the Data Governance team’s page about NIH Repository Data Access.
Provisioning
Regulated data storage is provisioned by SciComp after a data stewardship plan has been created and executed. You can visit the Data Governance team’s page about NIH Repository Data Access for more information, including how to start that process.
Data Loss Safeguards
Data protection in regulated storage is minimal. There are two snapshots, taken approximately 30 minutes apart. There are no backups or replicas (on or off campus).
A snapshot is a point-in-time image of a file system that is accessible from within the file system. Snapshots are not substitutes for backups. A backup is a copy of data taken and stored elsewhere so that it may be used to restore the original after a data loss event.
Backups are being considered, but there are no immediate plans to provide backups of data in regulated storage.
Data Lifecycle
Data in regulated directories are generally not lifecycled or purged. The exception is data in the temp directories, which are purged after 30 days, similar to data in the temp filesystem. See the section below about how temp directories are organized.
The data steward for the project is responsible for removing regulated data upon the expiry of the data use agreement.
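Because temp data is purged after 30 days, it can help to check which files are getting close to that age. The following is a minimal sketch, not a SciComp tool: the function name and the `days` threshold are hypothetical, and it assumes file age is measured by modification time, which this page does not specify.

```python
import time
from pathlib import Path

PURGE_AFTER_DAYS = 30  # purge window stated on this page


def files_older_than(root, days, now=None):
    """Return files under `root` whose modification time is more than `days` days old.

    Assumption: age is judged by mtime. The actual purge criterion
    (mtime, atime, or something else) is not documented here.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    old = []
    for path in Path(root).rglob("*"):
        # Skip directories; only regular files are reported.
        if path.is_file() and path.stat().st_mtime < cutoff:
            old.append(path)
    return sorted(old)
```

For example, calling `files_older_than(my_temp_dir, days=25)` on your own temp directory would list files that, under the mtime assumption, are within about five days of the 30-day mark.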
Accessing Regulated Storage
Regulated storage is available in the rhino/gizmo compute environment under the path /fh/regulated. This is organized by PI, with each regulated data set stored in its own directory:
/fh/regulated/pi_n/
├── 12345
├── 54321
There are also directories configured for each user with access to a regulated directory. These are to be used for derived and intermediate data that is still covered by the DUA. They are located underneath temp in the PI regulated data directory:
/fh/regulated/pi_n/
...
└── temp
    └── user
        ├── npi
        ├── usera
        ├── userb
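The layout above can be expressed as a small path-building helper for use in analysis scripts. This is an illustrative sketch only: the function names are hypothetical, and it assumes the nesting is temp/user/<username> as read from the directory tree.

```python
from pathlib import PurePosixPath

REGULATED_ROOT = PurePosixPath("/fh/regulated")  # root path given on this page


def _check_name(name):
    """Reject names that would escape the intended directory."""
    name = str(name)
    if name in ("", ".", "..") or "/" in name:
        raise ValueError(f"invalid path component: {name!r}")
    return name


def dataset_dir(pi_dir, dataset):
    """Directory holding one regulated data set, e.g. /fh/regulated/pi_n/12345."""
    return REGULATED_ROOT / _check_name(pi_dir) / _check_name(dataset)


def user_temp_dir(pi_dir, username):
    """Per-user directory for derived and intermediate data.

    Assumes the temp/user/<username> nesting shown in the tree.
    """
    return REGULATED_ROOT / _check_name(pi_dir) / "temp" / "user" / _check_name(username)


print(user_temp_dir("pi_n", "usera"))  # /fh/regulated/pi_n/temp/user/usera
```

The name check is there because joining an absolute path onto a `PurePosixPath` replaces the whole path, so an unchecked component containing `/` could point outside the regulated tree.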
PROOF Compatibility
We recommend using PROOF to orchestrate the analysis of genomic data stored on /fh/regulated. PROOF has features to ensure that some but not all of the intermediate files and artifacts created during genomic data analysis are handled in compliance with the NIH GDS Policy. See the PROOF guide and our page about PROOF Regulated for more information.
Quotas and Charges
There are currently no quotas or charges for regulated storage.
This will change. The exact parameters have not been determined at this time (Sep 2025), but there will be some chargeback for usage and a quota to manage space on the underlying storage server.