Metadata Integration Engineer (Data Engineer II)
The mission of the Allen Institute is to unlock the complexities of bioscience and advance our knowledge to improve human health. Using an open science, multi-scale, team-oriented approach, the Allen Institute focuses on accelerating foundational research, developing standards and models, and cultivating new ideas to make a broad, transformational impact on science.
The mission of the Allen Institute for Brain Science is to accelerate the understanding of how the human brain works in health and disease. Using a big science approach, we generate useful public resources, drive technological and analytical advances, and discover fundamental brain properties through integration of experiments, modeling and theory.
We are looking for an experienced Metadata Integration Engineer to join our Data Integration Team. This position will work closely with scientists, ontologists, engineering teams, and consortia members to develop solutions for data integration and interoperability to support data analysis and reuse.
The ideal candidate understands the complexities of combining biological data from heterogeneous sources and the difficulty of standardizing (meta)data, and can advocate for the value of standardization in supporting interoperability and data reuse. They will have a strong background in data standards and data modeling for scientific data, as well as familiarity with software and data engineering, and will interact regularly with neuroscientists and a wide variety of engineers, both internally and as part of a large consortium, collaborating on the development of a vast resource on the brain.
To gain more insight into the projects you will impact, please visit the Allen Brain Map data portal (portal.brain-map.org) and the BICCN portal (biccn.org).
The Allen Institute believes that team science significantly benefits from the participation of diverse voices, experiences, and backgrounds. High-quality science can only be produced when it includes different perspectives. We are committed to increasing diversity across every team and encourage people from all backgrounds to apply for this role.
Responsibilities
- Support development of data management standards, best practices, and policies for internal and consortium ecosystems; contribute to a data life-cycle framework, ensuring long-term value of data assets, according to FAIR principles
- Lead and contribute to efforts with community partners and scientists to develop and document (meta)data standards
- Assess data and product needs to define data requirements for data integration; facilitate requirements review
- Research existing domain-specific data models and schemas; develop and extend data models and schemas
- Build consensus for data integration efforts across platforms and organizations and advocate for FAIR principles
- Participate in outreach efforts to curate, publish and publicize high-dimensional biomedical data sets, research tools and publications
- Co-develop requirements and collaborate on building infrastructure and tools to support data ingestion, ETL, and dashboarding
- Support ontology extensions as needed
- Support data wrangling, data curation and data validation as needed
Required Education and Experience
- Bachelor's degree in a relevant technical discipline (e.g., informatics, neuroscience, genomics, MLIS, MIS)
- At least 2 years’ experience with scientific data modeling, schema development, metadata management, data governance or data quality technologies
- Proficiency with one or more of the following or similar technologies: Python, Scala, OWL, RDF, JSON-LD, or graph query languages
Preferred Education and Experience
- Advanced degree (M.S., Ph.D.) in a relevant technical discipline (e.g., informatics, neuroscience, genomics, MLIS, MIS)
- Deep understanding of data integration and management concepts such as semantic data integration, ontology management, and FAIR principles
- In-depth understanding of data domains relevant to neuroscience, imaging, and genomics
- Demonstrable experience with community standards development (for metadata, data formats, and/or quality metrics); good knowledge of domain-relevant standards
- Experience with non-relational/unstructured, graph or semantic technologies; good knowledge of relational databases and SQL
- History of contributing to open source and/or community-based projects
- Hands-on knowledge of cloud-based infrastructure (AWS, Azure, or GCP) and experience using data-related services
- Strong interpersonal, communication and presentation skills
- Strong project management and organizational skills
- Excellent analytical and problem-solving skills combined with capacity for complex, detail-oriented work
Physical Demands and Work Environment
- Occasional exposure to laboratory atmosphere - possible exposure to chemical, biological or other hazardous substances
- Sitting, standing, bending, squatting as found in typical office environment
Position Type/Expected Hours of Work
- This role is currently able to work remotely due to COVID-19 and our focus on employee safety. We are a Washington State employer, and remote work must be performed in Washington State. We continue to evaluate the safest options for our employees. As COVID-19 restrictions are lifted, this role will return to working onsite.
It is the policy of the Allen Institute to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, the Allen Institute will provide reasonable accommodations for qualified individuals with disabilities.