Big Data Engineer

Cambridge, MA
Jun 07, 2022
Required Education
Bachelor's Degree
Position Type
Full time

Omega Therapeutics is a development-stage biotechnology company pioneering the first systematic approach to use mRNA therapeutics as a new class of programmable epigenetic medicines by leveraging its OMEGA Epigenomic Programming™ platform. The OMEGA™ platform harnesses the power of epigenetics, the mechanism that controls gene expression and every aspect of an organism's life from cell genesis, growth, and differentiation to cell death. Using a suite of technologies, paired with Omega's process of systematic, rational, and integrative drug design, the OMEGA platform enables control of fundamental epigenetic processes to correct the root cause of disease by returning aberrant gene expression to a normal range without altering native nucleic acid sequences. Omega's engineered, modular, and programmable mRNA-encoded epigenetic medicines, Omega Epigenomic Controllers™, target specific intervention points among the thousands of mapped and validated proprietary and novel DNA-sequence-based epigenomic loci, EpiZips™, to durably tune single or multiple genes to treat and cure disease through Precision Genomic Control™. Omega is currently advancing a broad pipeline of development candidates spanning a range of disease areas, including oncology, regenerative medicine, multigenic diseases including immunology, and select monogenic diseases.

About the Role:

Omega Therapeutics, Inc. is seeking a Big Data Engineer to join our Data and IT team. This person will collaborate with the internal team to develop, maintain, test, and help administer a robust, scalable, company-wide bioinformatics Data Warehouse/Lake and Data-Integration Platform built on a combination of Big-Data, SQL, NoSQL, and Graph DB technologies. The ideal candidate is a strong communicator and a fast learner who is passionate about all aspects of Big Data, reliability, and security, and open to new technologies such as AI/ML/NLP and IoT. This person will have the ability to work in a fast-paced environment in an agile and collaborative manner.

Key Responsibilities:

  • Using appropriate software, acquire, cleanse, curate, wrangle, analyze, report, store, and efficiently index multi-format data (binary, text, image, audio, structured, unstructured …) from multiple sources (labs, manufacturing sites, websites, DBs, files …), to and from the Data Warehouse/Lake, via the Data-Integration Platform
  • Develop, test, and operate multiple data processing pipelines, including but not limited to AI/ML, IoT, and real-time technologies
  • Help establish data engineering processes and best practices for the company-wide Data Lake and Data-Integration Platform

Required Skills:

  • Strong knowledge of most or all software aspects of Big-Data engineering:
    • Design, build, test, and maintain multiple parallel data pipelines
    • Aggregate and transform raw data from a variety of data sources to fulfill functional and non-functional business needs
    • Performance optimization: automating processes, optimizing data delivery, and redesigning the complete architecture to improve performance
    • Handling, transforming, and managing Big Data using Big-Data frameworks and SQL and NoSQL databases
    • Building complete infrastructure to ingest, transform, and store data for further analysis and business requirements
  • Programming knowledge of at least one Big-Data framework: Spark or Hadoop
  • Programming knowledge of at least one SQL DB: MySQL, Postgres, AWS Redshift, MSFT SQL Server, or Oracle
  • Programming knowledge of at least one NoSQL DB: MongoDB, AWS DynamoDB, Cassandra, Memcached, or Redis
  • Programming knowledge for creating a Data-Integration Platform: REST APIs, JSON, XML, HTTP/HTTPS, TCP/IP, UDP, multithreaded servers, and microservices
  • Excellent knowledge of Big Data principles, Big Data management, and Big Data security methods
  • Excellent computing fundamentals, such as: data structures and algorithms, parallel and distributed processing, operating systems, networking, and data storage
  • Excellent software skills in at least one of: Python, Java, C/C++, or similar; plus SQL, scripting (any of: Perl, Awk, Sed, Linux shell scripts, PowerShell, or macOS scripting), and ETL
  • Knowledge of software and Big-Data testing tools (or similar tools) such as: Selenium, JMeter, JUnit, WebLOAD, Acunetix, Netsparker, Wireshark, or Malwarebytes
  • Preferred: programming knowledge of at least one Graph DB, such as Neo4j, TigerGraph, or AWS Neptune
  • Preferred: programming knowledge of various AWS Big-Data services, such as S3, EMR, and Athena

Required Qualifications:

  • BS/MS (electrical or computer engineering, computer science, math, physics, or related field)
  • 3+ years of industry experience as a Big-Data Engineer using cloud-based infrastructure (AWS preferred; GCP or Azure also considered)
  • 3+ years of server-side and DB software development and testing
  • Preferred: experience with front-end GUI development and testing
  • Preferred: experience with installation, design and architecture, schema creation, indexing, backups, tuning, querying, resource allocation, scripting, data transfer, data deduplication, data integrity and quality, ETL, testing, and troubleshooting in a heterogeneous Big-Data DB environment