Sr. Data Engineer, Epigenetics

San Francisco, CA
Apr 30, 2018
Required Education
Bachelor's Degree
Position Type
Full time

Other Locations: US - CA - San Francisco - Owens Street Celgene

Celgene is a global biopharmaceutical company leading the way in medical innovation to help patients live longer, better lives. Our purpose as a company is to discover and develop therapies that will change the course of human health. We value our passion for patients, quest for innovation, spirit of independence and love of challenge. With a presence in more than 70 countries and growing, we look for talented people to grow our business, advance our science and contribute to our unique culture.

Celgene seeks a talented, results-oriented individual to contribute to our informatics and data management initiatives in Research and Early Development (R/ED). This hands-on role interfaces with programs spanning both discovery and translational sciences, where the processing and interpretation of multi-platform, multi-dimensional 'omic' data in pre-clinical and clinical settings are used to identify drug targets and mechanistic insights, prioritize clinical indications and generate patient selection hypotheses. We are seeking an individual with extensive experience managing, processing, integrating and applying quality control processes to a wide range of data types in support of R/ED clinical trials and drug development experiments, working in close collaboration with basic research, translational and computational scientists.

Responsibilities include

  • Ensure accurate, complete and timely collection, integration and tracking of analytical information from internal or contract laboratory providers or collaborating laboratories for curation, ingestion and delivery to computational and translational scientists

  • Empower scientists with tools, processes and data structures needed to support project objectives, including data integration efforts that may span multiple studies or experiments

  • Help define, deliver and implement R/ED, collaborator and partner laboratory analytical data management systems, processes and procedures

  • Work with R/ED study teams to develop R/ED information management plans that outline data capture, data flow, data queries, manual checks, and data listings needed to ensure data integrity and interpretability

  • Participate in comprehensive data review activities in coordination with project and study teams

  • Work with computational biologists, computational scientists, biostatisticians and study scientists to resolve data quality issues

  • Make data, including raw/interim data, available to R/ED department personnel as required

  • Acquire user feedback to inform business requirements for future data systems development

Experience and Education

  • Bachelor's degree in a relevant discipline with at least 14 years' experience, Master's degree with at least 12 years' experience, or PhD with at least 6 years' experience in biomedical data management, assay development, specimen data management or a related discipline

  • Demonstrated proficiency with molecular biology assay concepts and ability to support, develop and deploy laboratory and other research data management processes and procedures as they apply to complex, high dimensional data sets; understanding of biological principles governing gene expression regulation a plus

  • Extensive practical experience in curating and working with diverse but highly connected scientific knowledge collections and their query interfaces to enable research hypotheses around compound targets, mechanisms of action, and patient response

  • Demonstrated ability to understand and translate high-level scientific datasets and results into data curation and management strategies

  • Proven ability to work in a team environment with clinical personnel, operational personnel, study monitors, computational biologists, biostatisticians, programmers, and medical writers

  • Demonstrated proficiency with current software engineering methodologies, such as Agile, source control, project management and issue tracking

  • Working knowledge of cloud computing; AWS experience preferred

  • Working knowledge of REST APIs and container strategies strongly preferred

  • Knowledge of distributed database design and implementation (LAMP/MySQL, etc.), with the capability to perform, direct and assess implementation of such databases

  • Excellent skills in R programming and experience in additional computer languages such as Perl, Python, PHP, S-PLUS or Java (or C/C++)

  • Experience producing visualizations of data sets (e.g., R/Shiny, Spotfire)

  • Working knowledge of both Windows and Linux operating systems is required

  • Along with programming proficiency, must demonstrate creativity, a strong capacity for independent thinking and the ability to grasp underlying biological questions

  • Must have excellent written and verbal communication and presentation skills

  • Must have excellent time management and organizational skills

  • Experience working with sequencing datasets a plus



Celgene is committed to equal opportunity in the terms and conditions of employment for all employees and job applicants without regard to race, color, religion, sex, sexual orientation, age, gender identity or gender expression, national origin, disability or veteran status. Celgene complies with all applicable national, state and local laws governing nondiscrimination in employment as well as employment eligibility verification requirements of the Immigration and Nationality Act. All applicants must have authorization to work for Celgene in the U.S.