Senior Data Engineer

Tarrytown, NY
Sep 17, 2019
Required Education: Bachelor's Degree
Position Type: Full time
The Regeneron Genetics Center (RGC) is a wholly owned subsidiary of Regeneron whose goals are to apply large-scale human genetics to identify new drug targets and to guide the development of therapeutic programs and precision medicine. Building on Regeneron's strengths in mouse genetics and genetics-driven drug discovery and development, the RGC specializes in ultra-high-throughput exome sequencing; large-scale informatics and data analysis encompassing genomics and electronic health records; and translating genetic discoveries into new biology and drug discovery opportunities.

The RGC leverages multiple approaches, including large population-based studies, Mendelian genetics and family-based studies, founder population genetics, and large-scale disease-focused projects, and has developed a network of over 50 collaborations with research organizations around the world. Through some of the largest sequencing studies in the world, such as the DiscovEHR study in collaboration with Geisinger Health System and an initiative to sequence 500,000 participants with the UK Biobank, the RGC has built one of the largest human genetics databases, with sequence data from several hundred thousand participants and growing rapidly.

Our interests span all therapeutic areas, and the RGC is highly integrated into all facets of research and development at Regeneron. Program goals include target discovery, indication discovery, and patient-disease stratification. Objectives include advancing basic science around the world through public sharing of discoveries, providing clinically valuable insights to physicians and providers at collaborating health care systems, improving patient outcomes, and identifying novel targets for drug development.

The RGC's Genome Informatics team leads the primary and secondary analysis of more than 500,000 samples a year, including production pipelines, cloud-compute infrastructure, and sequencing and variant quality control. Working closely with other RGC teams, the team maintains an extensive genomics R&D portfolio that supports multi-omics applications (RNA, long reads), unprecedented-scale variant calling, disease association studies, and locus-specific analyses that directly impact cutting-edge drug development.

The Senior Production Engineer will lead the innovation, design, and development of the RGC's production genomics compute infrastructure. Under the direction of the GI-Prod Director, this role will apply the latest hardware, software, and cloud technologies to develop innovative solutions in genomics data structuring, distributed compute workflows, and unprecedented-scale data manipulation and mining.

Responsibilities

• Optimize production workflows and architecture

• Working with GI-Prod leads, innovate on the ultra-large-scale variant freeze process (QC and joint genotyping)

• Work closely with RGC Data Engineering to integrate production with the distributed compute environment

• Develop tools for GI-Prod and RGC users to facilitate at-scale, at-speed genomics (e.g. QC, variant calling, multi-omics)

• Lead the adoption of software development best practices within GI-Prod (version control, code repositories)

• Extend GI-Prod R&D work into scalable, distributable software (e.g., CNV detection, RNA-seq)

Qualifications

• Bachelor's degree in Computer Science or a related field and 5+ years of experience

• Unix, Python (or similar)

• Code repositories (GitHub, SourceForge, Bitbucket)

• Cloud platforms (AWS, GCP)

• Standard bioinformatics tools (Samtools, BCFtools, VCFtools, BWA, GATK, Picard, BEDtools, PLINK)

• Genomics data formats and analysis

• Distributed compute platforms

This is an opportunity to join our select team that is already leading the way in the pharmaceutical/biotech industry. Apply today and learn more about Regeneron's unwavering commitment to combining good science & good business.

To all agencies: Please, no phone calls or emails to any employee of Regeneron about this opening. All resumes submitted by search firms/employment agencies to any employee at Regeneron via email, the internet, or in any form and/or method will be deemed the sole property of Regeneron, unless such search firms/employment agencies were engaged by Regeneron for this position and a valid agreement with Regeneron is in place. In the event a candidate who was submitted outside of the Regeneron agency engagement process is hired, no fee or payment of any kind will be paid.

Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability status, protected veteran status, or any other characteristic protected by law.