Environment and background
Our focus in the Center for Applied Molecular Medicine (CAMM) is the discovery of biomarkers that indicate a patient’s likely response to existing therapies. The benefits of a technology that “personalizes” medical treatment by accurately predicting response are hard to overstate, both in improved patient outcomes and in reduced costs of bringing drugs to market and delivering effective care. The barriers to personalized medicine are both technological and biological in nature. First, reliable approaches to routinely and rigorously measure the composition of biological samples must be developed. Then, approaches are required for interpreting those measurements to describe the state of a complex biological system.
CAMM investigators generate large quantities of experimental data across multiple platforms, including mass spectrometry, gene expression, microscopy, ELISA, and protein microarrays. We are seeking a talented statistical data scientist to ensure these data are accurate, derived from reproducible experiments, and answer clearly defined scientific questions. The candidate is expected to have a strong background in the statistical analysis of biological data. Key success factors in the performance of this position include a high level of attentiveness, the ability to collaborate closely with others from diverse disciplines, and a willingness to learn.
Specific responsibilities include:
1. Collaborate with CAMM investigators to design well-powered, randomized, balanced experiments by applying knowledge of laboratory equipment, design of experiments techniques, and the biological questions to be investigated.
2. Design quality assurance procedures and analytical methods for all data collection instrumentation. Collaborate with experimental colleagues to ensure proper performance and evaluation of quality control experiments and internal controls. Quantify and build models of technical variability in each of the experimental platforms and collaborate with experimentalists to reduce this variability. Design and propose improvements to experimental processes. Lead regularly scheduled QC performance discussions at CAMM group meetings.
3. Collaborate with CAMM investigators to define and execute data integration and analysis to answer scientific questions from designed studies. Develop, execute, and maintain relevant data analysis scripts in R.
4. Design, implement, and maintain a center-wide data repository. Define naming conventions and search mechanisms both for CAMM-generated data and for data imported from external resources and collaborators. Implement backup and recovery protocols for the data repository. Perform regularly scheduled tests of repository backup recovery. Report on repository performance and usage statistics at group meetings.
5. Keep up-to-date with best practices and current literature on data repositories, design of experiments, and statistical methods of data analysis. Attend and present at relevant conferences.
Qualifications include:
1. Ph.D. in a relevant quantitative subject (e.g. biostatistics, bioinformatics, biomedical engineering, physical sciences, computer science).
2. At least two first- or senior-author publications on relevant topics (e.g. data warehousing, analysis of genomic/proteomic data sets, development of statistical techniques, machine learning).
3. At least five years’ experience with the R statistical analysis platform. Ability to write R code to professional standards.
4. Experience working with or implementing/maintaining long-term use data repositories (e.g. within an RDBMS or on a file system). Good familiarity with Linux and Linux system administration.
5. Familiarity with design of experiments techniques.
6. Background in biological research and ability to engage in meaningful scientific conversation with pure biologists.
7. Ability to quickly learn about new experimental techniques, biological experiments, and data types.
8. Experience developing and using quality control metrics.