Software Engineer – Data Analytics Platform

Tarrytown, New York
Jun 08, 2022
Required Education
Bachelor's Degree
Position Type
Full time

The Regeneron Genetics Center’s Production Data Analytics & Engineering team is looking for a Software Engineer to contribute to the design, implementation, and deployment of the code bases underlying our genomic health data platform. Within the Enterprise Data arm of the RGC, our team strives to democratize the analysis of large-scale genomic data by using modern data platforms, standard methodologies, and thoughtful engineering. By bringing large-scale genomic and phenotypic data into modern cloud data & analytics environments, we are pushing to mature the field of genomic analyses beyond local analysis settings and environment-dependent tools. With well-engineered and generalized data systems, we enable seamless integration into enterprise applications and remove barriers that prevent non-domain experts from using systems and analytics tools on genetic data.

Working with a team of data, ML, and software engineers, you will have the opportunity to develop systems that support one of the largest genotypic & phenotypic data operations in the world. You will build infrastructure & software supporting our data platforms and pipelines, ultimately producing tools that advance our genetics-driven drug development & discovery efforts in an enterprise-quality environment. Our team seeks an individual who will evangelize robust software engineering principles, who can manage the SDLC and CI/CD pipelines, and who can contribute to a breadth of engineering projects spanning data engineering, ML operations, pipeline optimization, and genome informatics. As our systems grow in scope and scale, we aspire to enable faster development cycles, promote systems interoperability, increase modularity & test coverage, reduce technical debt, and enable more agile change management through the best engineering practices. You will spearhead these efforts for the data analytics & engineering team, while working in coordination with devops and other software engineering teams.

In this role, a typical day might include the following:

  • Develop and maintain core software engineering infrastructure within data platform engineering & operations, including SDLC, CI/CD, data QA & logging, performance benchmarking, pipeline orchestration, and multi-platform integration
  • Manage code repositories, code reviews, dev/test/qa/prod environments, etc.
  • Contribute to multiple software engineering efforts, including data engineering/analytic pipelines, data platforms, custom APIs & SDKs, and open-source genomics projects (e.g., Project Glow)
  • Coordinate with other development and devops teams to promote collaborative engineering efforts, interoperability, and shared infrastructure
  • Monitor production systems and code bases to identify inefficiencies and remove bottlenecks
  • Support internal & external R&D projects, working with domain experts to mature POCs into production-quality tools

This job might be for you if:

  • You have a passion to bring state-of-the-art technology to data in the health & life sciences domain.
  • With your sleeves rolled up, you work on current problems while thinking of future solutions.
  • You enjoy collaborating with multiple cross-functional teams of scientists, analysts, and engineers.

To be considered for this role you must have a B.S. or higher degree in computer science, software engineering, bioinformatics, or a related field. A strong understanding of software engineering principles and demonstrated experience contributing to and maintaining multi-contributor software projects (open- or closed-source), pipelines, and/or enterprise systems are required. You should have experience with modern software development tools (version control, CI/CD, IDEs, containers, and debuggers/profilers) and the ability to work in multiple languages (Python, Scala, C/C++, Java, shell, SQL). Exposure to cloud computing environments (AWS preferred) and technologies in the data & analytics engineering domain (Spark, Databricks, Airflow, data lakes, data QA tools, ML tools) is a plus. Additionally, domain knowledge in genomics, bioinformatics, or statistical genetics is a plus, but not required. You must be able to collaborate with other engineers and interface with cross-functional teams of developers, architects, and scientists.


Does this sound like you? Apply now to take your first steps toward living the Regeneron Way! We have an inclusive and diverse culture that provides amazing benefits including health and wellness programs, fitness centers and stock for employees at all levels!

Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion or belief (or lack thereof), sex, nationality, national or ethnic origin, civil status, age, citizenship status, membership of the Traveler community, sexual orientation, disability, genetic information, familial status, marital or registered civil partnership status, pregnancy or maternity status, gender identity, gender reassignment, military or veteran status, or any other protected characteristic in accordance with applicable laws and regulations. We will ensure that individuals with disabilities are provided reasonable accommodations to participate in the job application process. Please contact us to discuss any accommodations you think you may need.