Senior Software Engineer - ML Research Platform

Location: Working from Home
Posted: Jul 09, 2021
Ref: 142
Required Education: Bachelor’s Degree
Position Type: Full-time

The Opportunity

insitro’s machine learning research platform is central to our approach to rethinking drug discovery. Our tools empower a team of 25+ data scientists and engineers to conduct cutting-edge applied ML research with diverse types of biological data. We provide the foundations for reproducible ML experimentation, including frameworks for defining and running experiments, tools for experiment tracking and hyperparameter search, and primitives for constructing inference pipelines. We also develop tooling to support rapid experimentation in Jupyter notebooks by a diverse set of users, including both software engineers and wet lab scientists. Our tools directly enable insitro’s data science and ML engineering teams to train and evaluate ML models on multi-petabyte collections of biological data spanning high-content imaging, functional genomics, and biomolecular structures. You will work as part of a team to define, build, and improve key components of insitro’s ML experimentation platform, elevating the rigor and efficiency of ML research company-wide.


This is a unique role that sits at the interface of insitro’s software engineering and data science teams, weaving together the fabric of tools, systems, and interfaces that enable ML-powered discoveries from our large-scale biological data collections. An ideal candidate has experience implementing and training ML models, which allows them to relate to ML researchers, and significant software engineering craft, which enables them to design and implement extensible ML systems. The role does not focus on conducting research but rather on developing novel tools and capabilities that enable researchers to be more productive. While not required, some knowledge of biological or chemical data is especially valuable in understanding the unique requirements and applications of ML to biology and drug discovery.


You will be joining a biotech startup that has long-term stability due to significant funding, providing many opportunities for meaningful impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!


About You

  • BS, MS, or Ph.D. in computer science, statistics, mathematics, physics, engineering, or equivalent practical experience
  • Expertise in one or more general-purpose programming languages (strong preference for significant experience in scientific Python; Java, Scala, C/C++, and Go are also relevant) 
  • Demonstrated ability to critique, design, and implement ML abstractions that balance experimental flexibility with constraints that enable reusability and portability
  • Experience training DNNs in PyTorch or TensorFlow, including knowledge of key performance metrics for common tasks, diagnosing learning curves, and troubleshooting optimization dynamics
  • Familiarity with current approaches to distributed training and inference
  • Knowledge of performance characteristics of modern GPUs and other hardware accelerators, experience troubleshooting CUDA/cuDNN/GPU drivers running in containers, and experience with profiling GPU code to identify potential performance improvements 
  • Ability to empathize with diverse ML platform users, balancing pragmatic fixes that support short-term experimental iteration against identifying non-obvious underlying needs and designing longer-term solutions
  • Comfort with the ambiguity and changing requirements of supporting early-stage ML research
  • Ability to identify and lead redesigns of ML code to support reusability, robustness, and readability
  • Experience making buy-vs-build decisions and evaluating third-party ML tools (commercial and/or open source), and exposure to managing relationships with software vendors
  • Passion for making a difference in the world


Nice to Have

  • Experience with optimizing datasets and file formats for ML use cases (e.g., HDF5, Parquet, Zarr), and/or using database or distributed query systems (e.g., PostgreSQL/MySQL, Presto/Athena/BigQuery)
  • Experience with image, molecular structure, genetic, or genomic data modalities
  • Previous open-source contributions or publications demonstrating impact in relevant projects


Benefits at insitro

  • Excellent medical, dental, and vision coverage
  • Open vacation policy
  • Team lunches (catered daily)
  • Commuter benefits
  • Paid parental leave
  • Complimentary onsite barista and snacks
  • Flexible work schedule (on site and remote)


About insitro

insitro is a drug discovery and development company using machine learning and data generation at scale to transform the way that drugs are discovered and delivered to patients. We rely on human genetic cohorts, human-derived cellular disease models, and high-throughput biology and chemistry to identify coherent patient segments, actionable therapeutic targets, and new or existing chemical matter. The goal is to deliver predictive insights to improve the probability of success and reduce the number of costly dead ends along the R&D journey. The company has established enabling collaborations with Gilead in NASH and Bristol Myers Squibb in ALS and is building a pipeline of wholly owned and partnered medicines leveraging its unique insights on patient biomarkers, targets, and molecules. insitro is located in South San Francisco, CA and has raised over $600M from top tech, biotech, and crossover investors since formation in 2018. For more information on insitro, please visit the company’s website at www.insitro.com.