RGC Data Architect

Tarrytown, New York
Dec 22, 2021
Required Education
Master’s Degree/MBA
Position Type
Full time

The Genomic Informatics & Data Engineering team seeks a Data Architect who will be the primary steward of the RGC’s Enterprise Data strategy. These data span more than one million samples’ worth of genomic variants, clinical and research phenotypes, genotype/phenotype association results and the entirety of the RGC’s production metadata. They are generated by the RGC’s multi-disciplinary teams at an unprecedented scale through a spectrum of automated and manual processes, then ingested into RGC data repositories and ultimately consumed by RGC Enterprise Research Applications, internal researchers and more than 100 partner institutions.

As the RGC Data Architect, you will be ultimately responsible for the development, communication and maintenance of a consensus Enterprise Data strategy that embodies FAIR principles, ensuring that all RGC Enterprise Data are robustly structured, scalable, validated and available. You will provide deep technical expertise and hands-on support to RGC team leads for the design, implementation and execution of data systems and protocols that manifest the Enterprise Data strategy.

The primary responsibilities of this role are to:

  • Develop and maintain the Enterprise Data strategy: a FAIR data lifecycle, including data ingestion, validation, persistence, access and deletion, within a change-management infrastructure.
  • Continually engage all RGC teams, building consensus for new policies and communicating the strategic vision.
  • Continually audit all RGC systems for compliance, identifying areas requiring support.
  • Coordinate compliance support with Operations and Technical leads.
  • Support Operations team leads (Sequencing Lab, LIMS, Genome Informatics, Clinical Informatics) in the integration of RGC Production metadata management and workflows and the migration of legacy data into FAIR infrastructure.
  • Support Engineering team leads (Data Engineering, Software Engineering) in the design and development of RGC data infrastructure, comprising independent platforms, a unified data schema, and standardized APIs.
  • Support Research team leads (Genetics, Therapeutics) in the transition of R&D processes to FAIR production.
  • Generate complete documentation and SOPs.
  • Ensure all RGC data and processes are optimized for consumption by Enterprise Research Applications.
  • Devise innovative logistical solutions to the unique challenges of the RGC’s scale, speed and data composition.

This job might be for you if:

  • You are a proven consensus-builder of data strategies with deep experience coordinating multi-functional projects.
  • You excel at shepherding new ideas and technologies through rigorous vetting processes, providing both technical assessments and constructive strategic feedback.
  • You have excellent written and verbal communication skills with expertise in documentation and visual presentations.

To be considered for this role you must have:

  • A Master’s degree or equivalent experience in Computer Science, Software Engineering, Computer Engineering or a similar field.
  • Extensive technical experience in data modeling, visualization, ingestion, automation and validation.
  • Experience with production-scale data and metadata management.
  • Experience with distributed compute (Spark, Databricks, Hadoop), enterprise cloud services (AWS, Kubernetes, DNAnexus) and workflow management tools (e.g. WDL, CWL, Nextflow).
  • Experience with industry-standard enterprise database technologies (SQL, Sapio LIMS, data warehousing).
  • Proven dedication to customer service and hands-on troubleshooting.
  • Experience with large-scale sequencing and genomics processes, preferred.
  • Detailed knowledge of biological science data formats, preferred.
  • Knowledge of compliance environments (e.g. GDPR, FISMA, CAP/CLIA), preferred.


Does this sound like you? Apply now to take your first steps toward living the Regeneron Way! We have an inclusive and diverse culture that provides amazing benefits including health and wellness programs, fitness centers and stock for employees at all levels!

Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion or belief (or lack thereof), sex, nationality, national or ethnic origin, civil status, age, citizenship status, membership of the Traveler community, sexual orientation, disability, genetic information, familial status, marital or registered civil partnership status, pregnancy or maternity status, gender identity, gender reassignment, military or veteran status, or any other protected characteristic in accordance with applicable laws and regulations. We will ensure that individuals with disabilities are provided reasonable accommodations to participate in the job application process. Please contact us to discuss any accommodations you think you may need.