Data Engineering Intern

insitro inc
South San Francisco, CA
Oct 06, 2021
Biotech Bay
Required Education
Bachelors Degree
Position Type
Full time
The Opportunity
Data Engineering plays a key role in insitro’s approach to rethinking drug development. Our team is responsible for ensuring that our biological data factory’s robots and instruments produce high-quality data, and for optimizing the storage, querying, and ingestion of petabytes of experimental results. On top of this stack, we build the infrastructure that our machine learning engineers and scientists leverage to train powerful models that solve key problems in the drug development process.

This summer we are looking for highly motivated interns who want to work at the intersection of software engineering and the life sciences. You will be paired directly with an engineering mentor and lead a project from inception to prototype over the course of the summer.

Here are some examples of the style of project you can expect to work on:
- Design and implement an extraction and transformation pipeline for a new microscope.
- Integrate a new physical instrument into our robotic automation stack.
- Add a feature to our internal ML experimentation system to improve performance profiling of multi-GPU training jobs.

Beyond your primary project, you will work closely with a talented team of engineers and scientists, learn a broad range of skills, and have the opportunity to present your work to the rest of the company.

Join us, and help make a difference to patients!

About You
- Working towards a BS, MS, or Ph.D. in an engineering, mathematics or life sciences discipline.
- Experience in one or more general-purpose programming languages. We primarily use Python.
- Ability to write high-quality code, as demonstrated by prior experience, a GitHub account, or a personal webpage.
- Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions.
- Passion for making a difference in the world.

Nice to Have
- Experience with biological data (e.g., DNA sequences, RNAseq, proteomics, microscopy).
- Experience with Linux environments, database languages (e.g., SQL, NoSQL), and version control practices and tools such as Git or Mercurial.
- Familiarity with the SciPy/PyData ecosystem (numpy, pandas, scipy, dask, etc.).
- Familiarity with web services and application frameworks (Django, Flask).
- Familiarity with cloud computing services (AWS or GCP).
- Familiarity with data pipelines, workflow engines, and distributed computing technologies (Spark, Hadoop, etc.).

Benefits at insitro
- Excellent medical, dental, and vision coverage
- Open vacation policy
- Team lunches (catered daily)
- Complimentary onsite barista and snacks
- Commuter benefits
- Paid parental leave
- Flexible work schedule (on site and remote)

We are fortunate to have strong support from top investors in both biotech and tech: ARCH Ventures, Foresite Capital, A16Z, GV, and Third Rock Ventures. We are building a remarkable team that embodies a new type of culture, one based on a true partnership between scientists, engineers, and data scientists. Together we are working to define the problems, design experiments, analyze the data, and derive the insights that will lead us to new therapeutics. Join us, and help make a difference to patients!

About insitro
insitro is a drug discovery and development company using machine learning and data generation at scale to transform the way that drugs are discovered and delivered to patients. We rely on human genetic cohorts, human-derived cellular disease models, and high-throughput biology and chemistry to identify coherent patient segments, actionable therapeutic targets, and new or existing chemical matter. The goal is to deliver predictive insights to improve the probability of success and reduce the number of costly dead ends along the R&D journey. The company has established enabling collaborations with Gilead in NASH and Bristol Myers Squibb in ALS and is building a pipeline of wholly owned and partnered medicines leveraging its unique insights on patient biomarkers, targets, and molecules. insitro is located in South San Francisco, CA and has raised over $600M from top tech, biotech, and crossover investors since formation in 2018. For more information on insitro, please visit the company’s website at