Genomics Data Engineer (Data Science) - Regeneron Genetics Center

Tarrytown, NY, United States
Aug 05, 2019
Required Education
Bachelor's Degree
Position Type
Full time
Known for its scientific and operational excellence, Regeneron is a leading science-based biopharmaceutical company that discovers, invents, develops, manufactures, and commercializes medicines for the treatment of serious medical conditions. Regeneron commercializes medicines for eye diseases, high LDL-cholesterol, atopic dermatitis and a rare inflammatory condition and has product candidates in development in other areas of high unmet medical need, including rheumatoid arthritis, asthma, pain, cancer and infectious diseases.

The Regeneron Genetics Center's Genome Informatics & Data Engineering team is looking for an R&D Data Engineer/Data Scientist to work at the interface of genomics, big data engineering, and advanced analytics. The candidate will contribute to the expansion of our Apache Spark-based distributed analytics platform, building production-quality data processing infrastructure and developing scalable algorithms to leverage genomic and health data from millions of individuals. This role encompasses engineering end-to-end solutions that 1) unify and structure diverse data sets efficiently, and 2) perform downstream analyses at scale and derive genomic health insights that support Regeneron's drug development pipelines.
The ideal candidate will have a strong background in computer science, data mining, machine learning, or a related field, with demonstrated experience in engineering scalable and performant data processing software in Spark or another distributed compute environment. Background in bioinformatics or another life sciences domain is a plus, but not essential. This job requires strong communication skills in order to effectively collaborate with multiple cross-functional teams of scientists, analysts, and engineers.
This position will provide exciting opportunities to work on the bleeding edge of big data analytics and genomic medicine. The RGC hosts one of the world's largest data sets encompassing paired genomic and health data, presenting a unique opportunity to incorporate modern big data technologies into the field of precision medicine and to drive novel therapeutic discovery efforts forward.


• Build out a big data distributed architecture capable of efficiently processing terabytes of genomic and clinical data

• Develop algorithms and tools to analyze large data sets consisting of billions of rows

• Develop and deploy machine learning algorithms at scale

• Develop new web applications used by Regeneron scientists to analyze genomic and clinical datasets

• Build automated and production-quality data processing systems

• Interact and collaborate with other scientists to clearly define and iterate on requirements

• Keep abreast of new, state-of-the-art software engineering, data engineering, and data science technologies


This position requires a B.S. (M.S. or Ph.D. preferred) in computer science or a related STEM discipline, with a specialization in high-performance/distributed computing, data mining, machine learning, or bioinformatics.

Additional requirements include:

• Experience in developing scalable, high-performance software, with a deep understanding of algorithm design principles and data processing pipelines

• Knowledge of distributed compute technologies such as Spark, Hadoop, MapReduce, MPI, or other parallel computing frameworks is essential

• Strong foundation in data engineering, data science, and/or machine learning, with demonstrated experience applying these technologies at scale on real-world data sets

• 3+ years of software engineering experience in a modern object-oriented or functional language

• Knowledge of database technologies, indexing/partitioning, and SQL

• Excellent communication and presentation skills required

• Experience with cloud computing (AWS preferred)

• Familiarity with genomics and bioinformatics is preferred, but not required

This is an opportunity to join our select team that is already leading the way in the Pharmaceutical/Biotech industry. Apply today and learn more about Regeneron's unwavering commitment to combining good science & good business.

To all agencies: Please, no phone calls or emails to any employee of Regeneron about this opening. All resumes submitted by search firms/employment agencies to any employee at Regeneron via email, the internet, or in any form and/or method will be deemed the sole property of Regeneron, unless such search firms/employment agencies were engaged by Regeneron for this position and a valid agreement with Regeneron is in place. In the event a candidate who was submitted outside of the Regeneron agency engagement process is hired, no fee or payment of any kind will be paid.

Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability status, protected veteran status, or any other characteristic protected by law.