HPC DevOps Engineer

Employer: PacBio
Location: Menlo Park, CA, United States
Start date: Feb 3, 2021

Discipline: Engineering, Software Engineer
Required Education: Bachelor's Degree
Position Type: Full time
Hotbed: Biotech Bay

The High-Performance Computing and Infrastructure group at Pacific Biosciences seeks a smart, creative, energetic new teammate to take our DevOps, automation, and engineering technology to the next level. The ideal candidate combines deep technical knowledge with the interpersonal skills to work cooperatively with a skilled group of passionate, friendly engineers, and has a drive to improve compute and storage systems through automation.

Responsibilities:

Monitor and manage operation of the HPC Linux and clustered network storage environments
- Anticipate and stay ahead of long-term capacity growth and short-term needs
- Archive and marshal data to keep costs low
- Maintain function and reduce downtime of shared systems
Streamline and automate HPC system deployment and updates
Work with HPC cluster users to ensure efficient resource utilization
Generate and communicate compute and storage cluster usage guidelines
Work closely within the team, as well as with cross-functional engineering, software, and bioinformatics groups to support and improve robustness and efficiency of systems
- Capture current needs and potential improvements
- Develop, test, re-test, deploy, and document cluster modifications
- Interface with and train groups as needed
Be part of an on-call rotation for evening and weekend support

All listed tasks and responsibilities are deemed essential functions of this position; however, business conditions may require reasonable accommodations for additional tasks and responsibilities.

Position Requirements:

Bachelor's degree in computer science or a related field and/or 6+ years of relevant full-time experience
Advanced working knowledge of:
- Linux (CentOS, RHEL, etc.)
- NetApp ONTAP clustered storage
- VMware virtualization
- Slurm/SGE schedulers
- Configuration management (Ansible, Chef, Puppet, CFEngine, …)
Experience architecting and administering Linux-based HPC clusters, including resource scheduling, deployment, and hardware troubleshooting
Strong architectural knowledge of networked storage systems and file-sharing protocols
Experience with cloud deployment and integration (routing, automation)
Scripting skills (Python, YAML, Shell, Perl)
Experience with monitoring and alerting tools (Zabbix, PRTG)
Experience with Kafka, Redis, Elasticsearch
Knowledge of GPU workflows for machine learning
Ability to handle occasional after-hours or weekend support
Must be able to lift 40+ lb equipment (for datacenter work)
Must be a self-starter and be able to work effectively with minimal guidance and supervision

All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, national origin, protected veteran status, disability, gender identity, or sexual orientation.
