This job has expired

Senior Data Engineer

Employer
Formation Bio
Location
New York, NY
Start date
Sep 29, 2022

Job Details

About TrialSpark

TrialSpark is a technology-driven drug development company that runs end-to-end clinical trials, focused on bringing new treatments to patients faster and more efficiently.

The biggest bottleneck in bringing new treatments to patients is the clinical trial. On average, getting a drug through the trial process takes nearly a decade and frequently costs $1B+. To combat this industry problem, TrialSpark is building a technology platform that optimizes all aspects of a clinical trial, enabling more efficient trial design, faster trial completion, and higher trial data quality.

TrialSpark recently raised its Series C and is putting the capital to work by in-licensing and co-developing drug programs through in-house development, joint ventures, and NewCos. Together with doctors, patients, and communities, TrialSpark is working to develop the treatments of tomorrow.

About the Position

As a Senior Data Engineer on our Data Platform Team, you will own TrialSpark’s data infrastructure. You will make key data architecture decisions and lead significant greenfield initiatives to implement the next generation of our data platform and pipelines. You’ll enable product and analytical teams with timely, high-quality data from diverse sources, including application databases, partner Electronic Health Record systems, and medical devices. You’ll become a domain expert in clinical data and its application to products and operations across the company.

In this role, you’ll collaborate with Product Engineers and our Infrastructure Team to build data transformations that unlock efficiencies in the clinical trial process: for example, ingesting hundreds of millions of complicated health records, cleaning and structuring this data for analytical and product use cases, and identifying patients who will be served by previously inaccessible treatments. You’ll partner with the Analytics, Product, and Medical teams to set and achieve targets for data quality and latency, and build a learning feedback loop to improve these metrics over time. As a founding member of the Data Platform team, you will play a significant role in developing the team’s culture and strategy. Ultimately, you will leverage data to bring treatments to patients who may not have had access otherwise.

Responsibilities
  • Evolve our infrastructure and data architecture to accommodate growing business needs as well as increasing data volume and complexity
  • Collaborate with operational and product partners to achieve business and mission outcomes
  • Design and build data pipelines to clean and structure clinical data
  • Deploy tools to continuously monitor, test, and optimize data pipelines to ensure timely delivery and high data quality
  • Partner with Analytical stakeholders to assess the quality of our data and automate targeted improvements
  • Safeguard patient privacy and trial data integrity by implementing data privacy and security controls, for example de-identification of Personally Identifiable Information
  • Partner with our Analytics team to maintain and evolve our modern data stack as necessary (Looker, Redshift, dbt, Stitch)
  • Help enforce best practices and promote testability and maintainability throughout our systems and codebase

Qualifications

  • Three or more years of professional software development experience, preferably in a data-oriented role (e.g., Data Engineer)
  • Professional experience building and maintaining data pipelines (e.g. Airflow, Prefect, Luigi, AWS Glue or Batch)
  • Fluency in SQL and at least one other programming language (Python preferred)
  • Strong knowledge of data modeling with a track record of creating simple models to solve for nuanced product needs from complex data
  • Experience architecting data systems
  • Comfortable with Linux, Docker, and cloud technologies
  • Excellent problem solving and debugging skills
  • Strong written and verbal communication skills with the ability to convey complicated systems to both technical and non-technical audiences

Nice to have

  • Experience building cross-functional feedback loops
  • Experience with infrastructure-as-code tools (e.g., Terraform, Ansible, Pulumi)
  • Experience performance-tuning row-based (e.g., PostgreSQL) and columnar (e.g., Redshift) data stores
  • Experience working in a regulated environment (Healthcare, Finance, etc.)
  • Experience working with healthcare data (Electronic Health Records, Insurance Claims, etc.)
  • B.S. in Computer Science or related field

You will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.

Company

Formation Bio is a tech-driven pharma company differentiated by radically more efficient drug development. Formation Bio has built a technology platform that optimizes all aspects of drug development, enabling more efficient trial design, faster trial completion, and higher quality trial data capture.

Formation Bio acquires clinical-stage drugs from pharma and biotech and develops them faster and more efficiently, unlocking greater value per program and accelerating access to new treatments for patients.

Join our culture of innovation where your work directly contributes to transforming patient care in areas such as rheumatology, dermatology, CNS, and cardiometabolic diseases. Our dynamic environment blends advanced technology with strategic drug development, speeding up the delivery of new treatments. Here, every role plays a part in our mission to bring new treatments to patients faster and more efficiently.

Company info
Phone
+1 510-545-3803
Location
16 East 34th Street floor 10
New York
NY
10016
United States
