Spark R&D Developer - Tarrytown, NY | Biospace

Spark R&D Developer

Regeneron Pharmaceuticals, Inc.

Tarrytown, NY
Position Type:
Full time
Required Education:
Master's Degree

Job Description

The Regeneron Genetics Center is a wholly-owned subsidiary of the Company organized to collaborate with health systems and research groups to elucidate, on a large scale, genetic factors that cause or influence a range of human diseases. Building upon Regeneron's strengths in mouse genetics and genetics-driven drug discovery and development, the Center will specialize in ultra-high-throughput exome sequencing and computational biology; discovery of genotype-phenotype associations through linkage to well-annotated de-identified patient electronic medical records; and validation of discoveries using Regeneron's VelociGene® technology. Our interests encompass a breadth of different areas such as Mendelian and family frameworks, large-scale population genetics (both common and rare variants), and gene-gene interactions. Program goals include target discovery, indication discovery, and patient-disease stratification. Objectives include advancing basic science around the world through public sharing of discoveries, providing clinically-valuable insights to physicians and patients of collaborating health-care systems, and identifying novel targets for drug development.

We are looking for a Spark R&D Developer to join the Genome Informatics team to expand the RGC's big data infrastructure and develop new algorithms and tools to support workflows and analyses throughout the RGC and Regeneron. Specifically, the candidate will implement solutions within our Databricks Apache Spark ecosystem, collaborating closely with team members at the RGC to (i) establish efficient data representations for genotypes, phenotypes, and association results, (ii) implement scalable production workflows, and (iii) develop novel machine learning approaches to uncover new relationships between genotypes and phenotypes.
The ideal candidate will have a strong background in computer science, specializing in distributed systems and/or machine learning; experience analyzing large datasets; and strong communication skills, as this job requires collaboration among multiple cross-functional teams.
This position will provide exciting opportunities to work on the bleeding edge of genome informatics and genomic medicine. The RGC hosts a vast amount of data encompassing thousands of phenotypes derived from electronic medical records, integrated with genomic data. Together, these represent a landmark collection of information that will move precision medicine and novel therapeutic discovery forward as a new data-driven paradigm in healthcare.

* Build out a big data distributed architecture capable of efficiently processing terabytes of genomic and clinical data
* Develop algorithms and tools to analyze large datasets consisting of billions of rows
* Develop and deploy machine learning algorithms
* Develop new web applications used by Regeneron scientists to analyze genomic and clinical datasets
* Build automation around various components of the system
* Interact and collaborate with other scientists to clearly define and iterate on requirements
* Keep abreast of state-of-the-art software technologies and best practices, including Spark, Hadoop, various NoSQL databases, AWS, React, and functional programming


This position requires an MS (Ph.D. preferred) with 3 or more years of experience in computer science, specializing in distributed systems and/or machine learning.

Additional requirements include:

* Expertise in large distributed systems, such as Spark, Hadoop, or related frameworks/databases, is essential
* 3+ years of software engineering experience in a modern object-oriented or functional language (e.g., Scala)
* Experience in developing and applying machine learning algorithms
* Experience with client-side software development (e.g., HTML, JavaScript, CSS, D3)
* Excellent communication and presentation skills required
* Working knowledge of SQL
* Experience with cloud computing (AWS preferred)
* Familiarity with genomics and bioinformatics is preferred, but not required

Level is commensurate with experience

This is an opportunity to join our select team that is already leading the way in the Pharmaceutical/Biotech industry. Apply today and learn more about Regeneron Genetics Center's unwavering commitment to combining good science & good business.

RGC is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability status, protected veteran status, or any other characteristic protected by law.