Software Engineer - Scientific Pipelines

insitro inc
South San Francisco, CA
Oct 07, 2021
Biotech Bay
Required Education
Bachelor’s Degree
Position Type
Full time
The Opportunity
Data Engineering plays a key role in insitro’s approach to rethinking drug development. Our team is responsible for ensuring that our biological data factory’s robots and instruments produce high-quality data, and for optimizing the storage, querying, and ingestion of petabytes of experimental results. On top of this stack, we build the infrastructure that our machine learning engineers and scientists use to train powerful models that solve key problems in the drug development process.

You will work closely with a cross-functional team of scientists, bioengineers, and data scientists to identify areas where data engineering can make a difference by developing data architectures and systems on the high-throughput platforms that enable our scientists to be maximally productive. You will design, implement, and deploy novel methods that use a broad spectrum of data engineering approaches, including techniques at the forefront of the field. You will work as part of a team to rigorously design our data platform, identify key architectural performance improvements, and support ongoing discovery and automation platforms.

Here are some examples of the kinds of projects you can expect to work on:
- Design cheminformatic tools and pipelines that enable us to wrangle multi-billion molecule screening libraries.
- Build image processing pipelines that transform raw microscopy data into phenotype predictions through our machine learning processes.
- Architect data cataloging and provenance tracking solutions that accelerate multidisciplinary scientific teams.

You will be joining the founding team of a biotech startup that has long-term stability thanks to significant funding, yet is still very much in formation. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!

About You
- BS, MS, or Ph.D. in computer science, statistics, mathematics, physics, engineering, or equivalent practical experience.
- Expertise in one or more general-purpose programming languages such as Python, C/C++, or Go. We primarily use Python.
- Demonstrated ability to write high-quality, production-ready code with well-designed APIs.
- Familiarity with cloud computing services. We use AWS.
- Familiarity with database technologies, data pipelines, workflow engines, and distributed computing technologies such as Spark. We primarily use Postgres and Spark.
- Familiarity with web services and application frameworks such as Django and Flask.
- Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions.
- Proficiency in Linux environments, including shell scripting, and experience with version control practices and tools such as Git.
- Passion for making a positive impact through your work.

Nice to Have
- Experience with biological data such as DNA sequences, RNA-seq, proteomics, and microscopy images.
- Experience with medium-sized data sets (100 TB+).
- Experience with the SciPy/PyData ecosystem (numpy, pandas, scipy, dask, etc.)
- Demonstrated ability to develop novel data engineering methods that go beyond assembling existing code, and to apply problem-solving skills to complex issues.
- 4+ years of real-world work experience in software development for high-end data processing engines.

Benefits at insitro
- Excellent medical, dental, and vision coverage
- Open vacation policy
- Team lunches (catered daily)
- Commuter benefits
- Paid parental leave

About insitro
insitro is a drug discovery and development company using machine learning and data generation at scale to transform the way that drugs are discovered and delivered to patients. We rely on human genetic cohorts, human-derived cellular disease models, and high-throughput biology and chemistry to identify coherent patient segments, actionable therapeutic targets, and new or existing chemical matter. The goal is to deliver predictive insights to improve the probability of success and reduce the number of costly dead ends along the R&D journey. The company has established enabling collaborations with Gilead in NASH and Bristol Myers Squibb in ALS and is building a pipeline of wholly owned and partnered medicines leveraging its unique insights on patient biomarkers, targets, and molecules. insitro is located in South San Francisco, CA and has raised over $600M from top tech, biotech, and crossover investors since formation in 2018. For more information on insitro, please visit the company’s website at