This job has expired

Informatics Data Engineer

Regeneron Pharmaceuticals, Inc.
Tarrytown, New York
Start date
Feb 1, 2023

Job Details

The Informatics Data Engineer will be responsible for developing and maintaining highly scalable and reliable data management pipelines, tools, and centralized databases for conducting analyses involving clinical and phenotypic data, with an overarching goal of improving clinical phenotyping. You will design, implement, automate, and maintain the ETL pipelines that facilitate approaches for extracting and analyzing large-scale phenotypic datasets, including de-identified EHR data from external collaborators, targeted clinical datasets in selected cohorts, and internal datasets from clinical trials and other human subject research. You will work with analysts, clinical scientists, software developers, and programmers to provide the best data management technology solutions to store, standardize, structure, and mine the clinical and phenotypic datasets.

As an Informatics Data Engineer, a typical day might include:

  • Develop tools and pipelines, and optimize internal data processes for extraction, curation, processing, and storage of clinical data.

  • Develop code and automate production data ETL and quality assurance pipelines.

  • Develop, update, and maintain standards and procedures for data and database access, storage, versioning, and maintenance.

  • Use a customer-focused approach to provide data extraction and storage solutions driven by scientific use cases.

  • Function as a "super user" of data reporting, analysis, and management processes and tools.

  • Build and maintain data standardization and optimization solutions to support the use of AI/ML for advanced phenotyping.

  • Conduct data analysis, including mining and curating phenotypic datasets, with primary responsibility for developing, identifying, and standardizing clinical phenotypes and cohorts of interest. This work supports "phenotype first" genomic analysis of associated samples, as well as efficient data mining and association analysis in both phenotype-first and genotype-first queries.

  • Maintain close collaboration and coordination with external health system collaborators and informatics teams mining EHR and phenotypic datasets. Work with these collaborators to structure data and develop algorithms, rules engines, and querying tools to access and curate phenotypic datasets.

This role might be for you if:

  • You are a data steward.

  • You are interested in data management, mining, clinical databases, and hospital health informatics databases.

  • You can multitask and manage simultaneous projects to meet deadlines with strong attention to detail.

  • You possess the ability to interpret and communicate analytical information clearly and concisely.

  • You have exceptional analytical, organizational, and quantitative problem-solving skills and a willingness to learn and acquire new skills.

  • You excel at managing relationships and projects involving diverse partners.

  • You communicate findings clearly and document work for training and replication purposes.

To be considered for this role, you must have a bachelor's or master's (preferred) degree in Computer Science, Information Science, Informatics, or another relevant data engineering field, and a minimum of 3 years of working experience in data engineering and management, ETL pipeline development, AWS Glue, automation, and management. Healthcare and EHR data management experience is preferred. We are also looking for:

  • Familiarity with data mining, clinical databases, and hospital health informatics databases, including EHR data structures.

  • Familiarity with clinical data standards such as ICD, SNOMED, LOINC, and OMOP, and with database architecture and administration.

  • Experience with HIPAA and with IRB protocols around the use of EHR data.

  • Experience working with investigators and scientists to understand their data requirements and provide the best automated solution for data extraction and consumption.

  • Demonstrated understanding of relational database concepts and querying tools.

  • Experience with CI/CD frameworks.

  • Working knowledge of programming languages such as Python and R.

  • Experience with cloud computing services such as AWS and GCP.

  • Experience with agile methods (Scrum, Kanban) and tools such as Atlassian JIRA and Microsoft Team Foundation Server.

  • PySpark, Spark, and Databricks experience is a plus.

  • Involvement in relevant programs such as PCORnet, eMERGE, HMO Research Network, or other such projects is a plus.

The level is commensurate with education and experience.



Does this sound like you? Apply now to take your first steps toward living the Regeneron Way! We have an inclusive and diverse culture that provides comprehensive benefits including health and wellness programs, fitness centers and equity awards, annual bonuses, and paid time off for eligible employees at all levels!

Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion or belief (or lack thereof), sex, nationality, national or ethnic origin, civil status, age, citizenship status, membership of the Traveler community, sexual orientation, disability, genetic information, familial status, marital or registered civil partnership status, pregnancy or parental status, gender identity, gender reassignment, military or veteran status, or any other protected characteristic in accordance with applicable laws and regulations. We will ensure that individuals with disabilities are provided reasonable accommodations to participate in the job application process. Please contact us to discuss any accommodations you think you may need.

The salary ranges provided are shown in accordance with U.S. law and apply to U.S.-based positions, where the hired candidate will be located in the U.S. If you are outside the U.S., please speak with your recruiter about salaries and benefits in your location.

Salary Range (annually)

$114,900.00 - $187,500.00


Regeneron is a leading biotechnology company that invents life-transforming medicines for people with serious diseases. Founded and led for 30 years by physician-scientists, our unique ability to repeatedly and consistently translate science into medicine has led to seven FDA-approved treatments and numerous product candidates in development, all of which were homegrown in our laboratories. Our medicines and pipeline are designed to help patients with eye disease, allergic and inflammatory diseases, cancer, cardiovascular and metabolic diseases, infectious diseases, pain and rare diseases.
Regeneron is accelerating and improving the traditional drug development process through our proprietary VelociSuite® technologies, such as VelocImmune® which produces optimized fully-human antibodies, and ambitious research initiatives such as the Regeneron Genetics Center, which is conducting one of the largest genetics sequencing efforts in the world.

Stock Symbol: REGN

Stock Exchange: NASDAQ

Find Us
Regeneron Pharmaceuticals, Inc.
Corporate Headquarters
777 Old Saw Mill River Road
New York
