Data Science Engineer

Tarrytown, NY, United States
Nov 20, 2020
Required Education
Position Type: Full time
The Regeneron Genetics Center's Genome Informatics & Data Engineering R&D team is looking for a Data Science Engineer to work at the interface of genomics, big data engineering, and advanced analytics. We are expanding our Apache Spark-based distributed analytics platform and our open-source Project Glow, which develops and applies scalable algorithms to genomic and health data from millions of individuals. We engineer end-to-end solutions that efficiently unify and structure diverse datasets and perform scalable downstream analyses, deriving genomic health insights that support Regeneron's drug development pipelines.

In this role, a typical day might include the following:
  • Engineer production-quality distributed analytics pipelines capable of processing terabytes of genomic and clinical data
  • Apply modern data mining and machine learning methods to internal genetic and phenotypic datasets, using distributed and high-performance computing technologies to uncover insights that improve drug development efforts
  • Develop novel analytics tools and methods for both our internal pipelines and our open-source Project Glow initiative
  • Contribute to the design and implementation of large-scale data pipelines and infrastructure
  • Interact and collaborate with other scientific and technical teams
  • Keep abreast of state-of-the-art software engineering and data science technologies
This job might be for you if:
  • You have an eye for detail and pride yourself on the quality of your work. To you, operational excellence matters more than simply finishing the task.
  • You roll up your sleeves to solve current problems while thinking ahead to future solutions.
  • You are motivated by the challenge of engineering analytics software that can be automated and scaled to massive datasets.
To be considered for this role, you must have a Ph.D. with relevant experience. A strong background in computer science, data mining, machine learning, or a related field, along with domain knowledge in the life sciences, is preferred but not required. Demonstrated experience engineering scalable, performant data processing software in Spark or another distributed or parallel computing framework, such as Hadoop, MapReduce, or MPI, is essential. You should also have 3+ years of software engineering experience in a modern object-oriented or functional language, as well as knowledge of database technologies, indexing/partitioning, and SQL.
Does this sound like you? Apply now to take your first steps toward living the Regeneron Way! We have an inclusive and diverse culture that provides amazing benefits, including health and wellness programs, fitness centers, and stock for employees at all levels!

Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion or belief (or lack thereof), sex, nationality, national or ethnic origin, civil status, age, citizenship status, membership of the Traveler community, sexual orientation, disability, genetic information, familial status, marital or registered civil partnership status, pregnancy or maternity status, gender identity, gender reassignment, military or veteran status, or any other protected characteristic in accordance with applicable laws and regulations. We will ensure that individuals with disabilities are provided reasonable accommodations to participate in the job application process. Please contact us to discuss any accommodations you think you may need.