HPC DevOps Engineer

Location: Working from home
Posted: Jul 29, 2021
Ref: 2342
Required Education: Bachelor's Degree
Position Type: Full time

The High-Performance Computing and Infrastructure group at Pacific Biosciences seeks a smart, creative, energetic new teammate to take our DevOps, automation, and engineering technology to the next level. The ideal candidate combines a high level of technical knowledge with the interpersonal skills to work cooperatively with a skilled group of passionate, friendly engineers, and has both a penchant for improving compute and storage systems and a thirst for automation.

Responsibilities:

  • Monitor and manage operation of the HPC Linux and clustered network storage environments
    • Anticipate and stay ahead of long-term capacity growth and short-term needs
    • Archive and marshal data to keep costs low
    • Maintain function and reduce downtime of shared systems
  • Streamline and automate HPC system deployment and updates
  • Work with HPC cluster users to ensure efficient resource utilization
  • Generate and communicate compute and storage cluster usage guidelines
  • Work closely within the team, as well as with cross-functional engineering, software, and bioinformatics groups, to support and improve the robustness and efficiency of systems
    • Capture current needs and potential improvements
    • Develop, test, re-test, deploy, and document cluster modifications
    • Interface with and train groups as needed
  • Be part of an on-call rotation for evening and weekend support

 


All listed tasks and responsibilities are deemed essential functions of this position; however, business conditions may require reasonable accommodations for additional tasks and responsibilities.

Position Requirements:

  • Bachelor's degree in computer science or related field and/or 6+ years of relevant full-time experience
  • Advanced working knowledge of:
    • Linux (CentOS, RHEL, etc.)
    • NetApp ONTAP clustered storage
    • VMware virtualization
    • Slurm/SGE schedulers
    • Configuration management (Ansible, Chef, Puppet, CFEngine, …)
  • Experience architecting and administering Linux-based HPC clusters, including resource scheduling, deployment, and hardware troubleshooting
  • Strong architectural knowledge of networked storage systems and file-sharing protocols
  • Experience with cloud deployment and integration (routing, automation)
  • Scripting skills (Python, YAML, Shell, Perl)
  • Experience with monitoring and alerting tools (Zabbix, PRTG)
  • Experience with Kafka, Redis, Elasticsearch
  • Knowledge of GPU workflows for machine learning
  • Ability to handle occasional after-hours or weekend support
  • Must be able to lift 40+ lb equipment (for datacenter work)
  • Must be a self-starter and be able to work effectively with minimal guidance and supervision

 


All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, national origin, protected veteran status, disability, gender identity, or sexual orientation.