In this section we provide guidance for researchers looking to develop a hypothesis with reasonable statistical power, identify an appropriate set of samples, and execute large-scale data production from those samples.
You could design a house without an architect, but much of the time and money you would spend can be saved by involving people whose job is exactly this kind of procedural and rote knowledge. The same is true for executing a study involving large-scale molecular datasets, where there is an enormous amount of decision-making around study design, sample preparation, genomics data production, bioinformatics, and statistics. To give your study the best chance of succeeding (both in securing funding and in testing your hypotheses), identify a set of collaborators familiar with the aspects of the research in which your team does not already have expertise.
Clearly defining the experimental hypothesis will ease the process of evaluating the available tools and techniques. Finding a collaborator who can help determine whether a given biological feature supports or refutes your hypothesis can be critical when using large-scale molecular data. Multiple testing, difficulty determining biological significance, and power issues are common in large datasets where the number of measurements (p) is much larger than the sample size (n). Several groups at Fred Hutch collaborate with researchers using large-scale datasets and can provide valuable insight into study design, data types, and hypothesis generation, as well as assistance with downstream analytics.
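To illustrate why multiple testing matters when p is much larger than n, here is a minimal, hypothetical simulation (not from any Fred Hutch pipeline): 10,000 features measured on 20 samples split into two groups, with no true group differences at all. Testing every feature at an uncorrected 0.05 threshold still yields hundreds of "significant" hits, while a standard correction such as the Benjamini-Hochberg procedure (implemented here by hand for illustration) brings the count back toward zero.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Hypothetical dataset: p = 10,000 features, n = 20 samples (10 per group),
# simulated with NO real signal -- every null hypothesis is true.
p_features, n_samples = 10_000, 20
group_a = rng.normal(size=(p_features, n_samples // 2))
group_b = rng.normal(size=(p_features, n_samples // 2))

# One two-sample t-test per feature.
_, pvals = ttest_ind(group_a, group_b, axis=1)

# Naive thresholding: roughly 5% of 10,000 null features "pass".
naive_hits = int((pvals < 0.05).sum())

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level alpha."""
    m = len(pvals)
    order = np.argsort(pvals)
    # Compare sorted p-values to the BH step-up thresholds alpha * k / m.
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

bh_hits = int(benjamini_hochberg(pvals).sum())
print(f"uncorrected hits: {naive_hits}, BH-corrected hits: {bh_hits}")
```

The exact counts depend on the random seed, but the gap between the uncorrected and corrected totals is the point: without correction, a study of this shape would report hundreds of false discoveries.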