Senior Data Engineer

Sleepy Hollow, NY, United States
Apr 09, 2019
Required Education
Bachelor's Degree
Position Type
Full time
Known for its scientific and operational excellence, Regeneron is a leading science-based biopharmaceutical company that discovers, invents, develops, manufactures, and commercializes medicines for the treatment of serious medical conditions. Regeneron commercializes medicines for eye diseases, high LDL-cholesterol, atopic dermatitis and a rare inflammatory condition and has product candidates in development in other areas of high unmet medical need, including rheumatoid arthritis, asthma, pain, cancer and infectious diseases.


Imagine a health blueprint to improve patient outcomes: that is what we are after. "Make great medicine. And then do it again" is what we are passionate about. Our team provides the data engineering muscle needed to make sense of petabytes of data. Our relentless focus is on enabling researchers to make data-driven decisions as we scale the science.

You will focus on building scalable machine learning and search infrastructure to accelerate Regeneron innovation in search algorithms, recommendation, data classification, and much more. In this role, you will have the opportunity to build the next-generation ML infrastructure that helps us scale our analytic capabilities as we discover and make medicines for our patients. You will work closely with our researchers, product managers, and engineers in the personalization and recommendation domain to help them scale their ad hoc exploration and test hypotheses.

As a Principal Data Engineer on our team, you would help solve problems like:

• Real-time streaming infrastructure: to enable teams to move quickly, delivering accurate data with minimal delay is a core focus of our data integration work.

• Machine learning infrastructure: many products at Regeneron rely on machine learning (ML) to achieve their goals, and we are in the process of developing a common ML infrastructure that saves the company significant development time.

• Data discovery and cataloging: help accelerate our metadata and search capabilities.

• Interactive analysis: our users have a strong need to query data and compute aggregates across various dimensional cuts. To address this, we are building a query tool based on Druid and Jupyter Notebook that lets users interactively slice and dice large datasets.

• Data pipeline services: we are building tools that let users self-serve and schedule data-related workflows using Kylo, Airflow, and AWS data services.


• Dive deep into Regeneron data services in the cloud and into post-production initiatives and data.

• Build incredibly valuable Data Platform Services that will be leveraged across Regeneron Pharmaceuticals Inc.

• Creatively explore how to use data to continually add value to Regeneron. Translate data questions into flexible methodologies that scale to answer broad problems across the organization.

• Be a bridge between data engineering and the business, enabling insight that can empower better decision-making.

• Be comfortable outside of your comfort zone: explore new tech, build your own tools, or find a new way to address an old problem.


• 7+ years relevant engineering work experience

• Working with data at the petabyte scale

• Design and operation of robust distributed systems

• Experience with Java, Python, or Scala is preferred

• Prior experience developing with AWS big data services

• Experience using Hadoop/Spark technologies to acquire and process data (structured, unstructured, semi-structured, document, video, audio, and genomic data)

• Strong scripting ability in Ruby / Python / Bash

• Domain expertise in working with complex, high-volume data pipelines, data visualization, and using the data for large-scale machine learning applications

• Working knowledge of relational databases and query authoring (SQL)

• A love of using and developing open-source technologies such as Kafka, Hadoop, Hive, Presto, and Spark

• General understanding of the contextual bandit framework for explore/exploit data pipelines, and of serverless computing

• Rigor around code quality, automated testing, and other engineering best practices

• BS/MS in Computer Science or a related field (ideal)

• Biotech and pharma industry experience is a plus

Minimum Educational Requirements:
Bachelor's Degree (or 7 years equivalent experience)

This is an opportunity to join our select team that is already leading the way in the Pharmaceutical/Biotech industry. Apply today and learn more about Regeneron's unwavering commitment to combining good science & good business.

To all agencies: Please, no phone calls or emails to any employee of Regeneron about this opening. All resumes submitted by search firms/employment agencies to any employee at Regeneron via email, the internet, or in any form and/or method will be deemed the sole property of Regeneron, unless such search firms/employment agencies were engaged by Regeneron for this position and a valid agreement with Regeneron is in place. In the event a candidate who was submitted outside of the Regeneron agency engagement process is hired, no fee or payment of any kind will be paid.

Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability status, protected veteran status, or any other characteristic protected by law.