Uploading Tracks to view with the UCSC Genome Browser
Updated: October 6, 2023
This demo provides specific examples of how to upload tracks (or track hubs) for viewing in the UCSC Genome Browser.
The UCSC Genome Browser provides two different facilities for viewing your own data: Custom Tracks and Track Hubs.
If you just want to view some of your own data quickly in the UCSC Genome Browser, Custom Tracks are the easiest way to go.
If you want a more fully-featured and sustainable solution, consider Track Hubs.
SciComp supports the use of Amazon S3 to facilitate both approaches.
Getting set up with S3
In order to use S3, you will need credentials. You can obtain the credentials by following the instructions.
You will be uploading your genome data into your PI’s public S3 bucket.
By default, public buckets at the Hutch do not allow access (even with credentials) from outside the Hutch network. So you will need to email CLD and tell us that you plan to host publicly accessible files in your PI’s public bucket, and we will enable sharing.
Uploading files to S3 for public access
Important Note: Viewing your own data in the UCSC Genome Browser (whether you use the custom track or track hub approach) involves uploading data and making it publicly accessible on the Internet. Even though the URL of the data may not be obvious, it is still public, and security through obscurity is not a recommended or supported approach to security. Therefore, you should never upload and make publicly accessible any data that contains PHI/PII or requires HIPAA compliance. If you have these needs, you may instead need to set up a mirror of the UCSC Genome Browser locally inside the Fred Hutch network. Contact scicomp for help with this.
You can use the AWS Command Line Interface (CLI) to upload files to S3, even before the CLD team has enabled the share on your public bucket.
This Wiki contains some basic instructions for interacting with S3 using the CLI, and AWS provides the full documentation.
The main thing to remember when uploading data for use with the UCSC Genome Browser is that the data must be publicly accessible. This can be handled through the S3 bucket resource policy; otherwise, add the --acl public-read flag at the end of your aws s3 cp or aws s3 sync commands.
Example
Assume the following:
- Your PI is Jane Doe, so your PI bucket is called fh-pi-doe-j-eco-public.
- You have a single VCF file (foo.vcf.gz) and an index (foo.vcf.gz.tbi) in a directory called vcfs, which is in your current directory.
You can use the following commands to upload these files:
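# Unload any currently loaded modules, then load the AWS CLI
ml purge
ml awscli
# Copy the VCF and its index under the ucsc-tracks/ prefix
aws s3 cp vcfs/foo.vcf.gz s3://fh-pi-doe-j-eco-public/ucsc-tracks/
aws s3 cp vcfs/foo.vcf.gz.tbi s3://fh-pi-doe-j-eco-public/ucsc-tracks/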
After running these commands, the files will be available at the following URLs (which you can provide to the UCSC Genome Browser):
https://fh-pi-doe-j-eco-public.s3.amazonaws.com/ucsc-tracks/foo.vcf.gz
https://fh-pi-doe-j-eco-public.s3.amazonaws.com/ucsc-tracks/foo.vcf.gz.tbi
Of course, you will need to substitute your own bucket name and file names.
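As a minimal sketch of how the URLs above are formed, the helpers below build the public URL from a bucket name and object key, and print a custom track line for a tabix-indexed VCF. The function names (s3_public_url, http_status, vcf_track_line) are hypothetical helpers introduced here for illustration, not part of the AWS CLI or the Genome Browser. The track line assumes the vcfTabix custom track type with a bigDataUrl setting; check the UCSC custom track documentation for the options that apply to your data.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not AWS CLI commands.

# Build the public URL for an object, following the URL pattern shown
# above: https://<bucket>.s3.amazonaws.com/<key>
s3_public_url() {
  local bucket="$1" key="$2"
  printf 'https://%s.s3.amazonaws.com/%s\n' "$bucket" "$key"
}

# Print the HTTP status code returned for a HEAD request to a URL.
# 200 suggests the object is publicly readable; 403 suggests it is not.
http_status() {
  curl -s -o /dev/null -I -w '%{http_code}\n' "$1"
}

# Print a UCSC custom track line for a tabix-indexed VCF (assumed
# track type: vcfTabix). The .tbi index is expected next to the VCF,
# at the same URL plus ".tbi".
vcf_track_line() {
  local name="$1" url="$2"
  printf 'track type=vcfTabix name="%s" bigDataUrl=%s\n' "$name" "$url"
}

url="$(s3_public_url fh-pi-doe-j-eco-public ucsc-tracks/foo.vcf.gz)"
echo "$url"
vcf_track_line "foo" "$url"
# http_status "$url"   # uncomment to check access (needs network)
```

The http_status call is left commented out because it only gives a meaningful answer once the file has been uploaded and sharing has been enabled on the bucket. Running it from outside the Hutch network is the more useful check.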
cp vs sync
If you are setting up a Track Hub, you might need to upload more files, perhaps a whole directory (which may contain subdirectories). In this case, use aws s3 sync instead of aws s3 cp.
Assuming your track hub files are in a directory called hub, underneath your current directory, you can copy all the contents of that directory with this single command:
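# Unload any currently loaded modules, then load the AWS CLI
ml purge
ml awscli
# Recursively upload the contents of hub/ under the track-hub/ prefix
aws s3 sync hub s3://fh-pi-doe-j-eco-public/track-hub/ --acl public-read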
If you had a file in hub called foo.txt, that file would then be accessible at the URL:
https://fh-pi-doe-j-eco-public.s3.amazonaws.com/track-hub/foo.txt
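Because sync preserves the directory layout, files in subdirectories end up at correspondingly nested URLs. The sketch below lists the URL that each local file would have after the sync command above. hub_urls is a hypothetical helper written for this example, and it assumes the directory is given without a trailing slash and that file names need no URL encoding (no spaces or special characters).

```shell
#!/usr/bin/env bash
# Sketch only: hub_urls is a hypothetical helper, not an AWS CLI command.
# It lists the URL each local file would have after
#   aws s3 sync <dir> s3://<bucket>/<prefix>/

hub_urls() {
  local dir="$1" bucket="$2" prefix="$3" path
  find "$dir" -type f | sort | while IFS= read -r path; do
    # Strip the leading "<dir>/" so the key is relative to the prefix.
    printf 'https://%s.s3.amazonaws.com/%s/%s\n' \
      "$bucket" "$prefix" "${path#"$dir"/}"
  done
}

# Only run the example if a hub directory exists here.
if [ -d hub ]; then
  hub_urls hub fh-pi-doe-j-eco-public track-hub
fi
```

To preview what would be uploaded without transferring anything, aws s3 sync also accepts the --dryrun flag.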
Questions?
Please email scicomp for more assistance.