Site Reliability Engineer (London), Anthropic

London 12-05-2023
We're looking for a Site Reliability Engineer who can work in the UK time zone. You'll ensure the high availability and performance of our Kubernetes clusters that power machine learning research and services.

Responsibilities:

    • Own Kubernetes clusters with thousands of nodes
    • Troubleshoot and resolve issues across the stack, from networking to applications
    • Improve monitoring, alerting, and incident response
    • Automate operations and infrastructure management
    • Partner with ML researchers and engineers to meet their infrastructure needs
    • Tune autoscaling and resource allocation for ML jobs
    • Build fault-tolerance into infrastructure to handle node failures
    • Monitor clusters and set up alerts and on-call playbooks
    • Migrate cloud deployments to Kubernetes using Terraform

Requirements:

    • Significant experience with Kubernetes and cloud-native infrastructure
    • A DevOps/SRE mindset: you enjoy debugging complex systems and automating solutions
    • Strong communication skills to work with a range of technical and non-technical colleagues
    • An interest in the societal impacts of ML and a commitment to building robust, reliable systems

While not required, experience with the following would be a bonus:

    • Cloud infrastructure on AWS/GCP
    • Terraform/Infrastructure as Code
    • Monitoring/alerting tools like Prometheus/Grafana
    • Python and Linux sysadmin skills
    • Significant experience with Kubernetes architecture and administration
    • Strong Linux skills and cloud infrastructure expertise
    • Familiarity with networking, caching, and storage optimizations
    • Track record of building resilient, scalable systems
    • Comfort debugging complex, distributed systems
    • Excellent communication and collaboration skills

Annual Salary (GBP)

    • The expected salary range for this position is £230k to £430k.

Hybrid policy: For this role, we prefer candidates who are able to be in our office more than 25% of the time, though we encourage you to apply even if you don’t think you will be able to do that.


Applications are now closed.