Manager, Data Engineer

Tarrytown, New York
Jun 08, 2022
Required Education
Master’s Degree/MBA
Position Type
Full time

Known for its scientific and operational excellence, Regeneron is a leading science-based biopharmaceutical company that discovers, invents, develops, manufactures, and commercializes medicines for the treatment of serious medical conditions. Regeneron commercializes medicines for eye diseases, high LDL-cholesterol, atopic dermatitis and a rare inflammatory condition and has product candidates in development in other areas of high unmet medical need, including rheumatoid arthritis, alloimmunity, solid organ transplant, asthma, pain, cancer and infectious diseases.


Regeneron is advancing its pipeline portfolio using a data-driven human translational approach. As part of this initiative, the Precision Medicine team in Early Clinical Development & Experimental Sciences (ECD&ES) is integrated with discovery research teams to promote early adoption of biomarker strategies to guide clinical translation. Precision Medicine collaborates with Clinical Sciences colleagues to design clinical experimental sciences (CES) studies that generate clinically testable, mechanistic hypotheses for each molecule. Precision Medicine leads the biomarker strategy and execution to achieve proof of mechanism/concept in early clinical trials. The team also implements Precision Medicine strategies to support late-stage clinical programs and companion diagnostics. Our team includes Precision Medicine Strategy Leads, Precision Medicine Operations specialists, and quantitative analytical and companion diagnostics scientists.

Precision Medicine, Quantitative Translational Sciences is looking for a data engineer who can assist in Precision Medicine initiatives to inform clinical studies through analytical, computational, and novel scientific/technological strategies. 

This position is responsible for the extraction, transformation, and publishing of clinical, operations, and research data. This will include leading the development of tools and dashboards for querying, visualizing, and cleaning data, and working collaboratively with groups inside and outside Regeneron to ensure data integrity. In this role, the individual will be responsible for creating automated data pipelines, implementing best practices for data analytics, architecting database solutions, and offering strategic insights based on data queries. This will require an individual with exceptional attention to detail, advanced computational and programming skills, and the capacity to rapidly develop and act upon in-depth knowledge of our business, products, and processes.

This will include cross-functional project work with many internal stakeholders (data management, sample management, research and clinical IT, PM operations, bioinformaticians, data scientists, and research scientists) and with external collaborators to address the quantitative needs of Precision Medicine.

Responsibilities

  • Help build the infrastructure required for optimal extraction, transformation, and loading of data from various sources and formats.
  • Write complex ETL (extract, transform, load) processes, design database systems, and develop tools for real-time and offline analytical processing.
  • Own the development and maintenance of scalable solutions for ongoing metrics, reports, analyses, dashboards, etc.
  • Work cross-functionally to optimize the organization of raw and processed data from multiple sources to facilitate downstream data retrieval and integrated data analysis.
  • Build data standardization pipelines that clean, transform, and aggregate data from disparate sources, including large, complex datasets.
  • Develop and maintain scalable data pipelines and build out new API integrations to support continuing increases in data volume and complexity.
  • Identify opportunities to design and implement internal process improvements, including redesigning infrastructure for greater scalability, optimizing data delivery, and automating manual processes.
  • Streamline processes to automate data summary reports and data quality checks that ensure data integrity.
  • Collaborate with stakeholders, including data architects, clinical and research scientists, and clinical operations, to resolve data-related technical issues.
  • Work closely with our data scientists to help formulate complex algorithms that extract novel and meaningful insights from multiple data sources.
  • Build analytical tools on top of the data pipeline that provide actionable insight into key business performance metrics, including operational efficiency.
  • Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.
  • Work with data to solve business problems, building and maintaining the infrastructure to answer questions and improve processes.

Required Skills and Qualifications

  • Master’s degree in computer science, information technology, engineering, data science, or a related technical field
  • 3+ years of experience with Python, R, R Shiny, Spark, SQL, and/or AWS, as well as data visualization and exploration tools
  • Experience designing, building, and maintaining data processing systems
  • Demonstrated ability to design and optimize queries for scalable, modular, efficient data pipelines
  • Track record of building scalable, end-to-end data pipelines
  • Strong communication skills, especially the ability to explain technical concepts to non-technical business leaders
  • Relational and dimensional data modeling experience
  • Comfort working in a dynamic, research-oriented team with concurrent projects
  • Advocacy for best practices and continued learning