Site Reliability Engineer
Company: Amiri Recruiting
Location: Mountain View
Posted on: February 19, 2026
Job Description:
Site Reliability Engineer
Onsite - Bay Area, CA

Relevant Skills and Experience

What You’ll Do (Day-to-Day):
- Own and manage our cloud infrastructure (GCP or AWS, on-prem).
- Build, maintain, and optimize Kubernetes clusters (including GPU-backed clusters).
- Implement and improve CI/CD pipelines (GitHub Actions).
- Write and maintain Infrastructure as Code (Terraform).
- Monitor system health and performance using Grafana and other observability tools.
- Ensure high availability, reliability, and uptime across platforms.
- Handle infrastructure maintenance, upgrades, and scaling.
- Administer and improve our platform architecture and apply general security best practices across the stack.

Note: This is an internal-facing role with no customer interaction.

Must-Have:
- 4 years in SRE, DevOps, or Infrastructure Engineering
- Solid experience with GCP or AWS (hybrid/on-prem a plus)
- Experience with Kubernetes cluster management (GPU experience a bonus)
- Hands-on experience with Terraform and CI/CD (GitHub)
- Experience with monitoring/observability (Grafana, etc.)
- Strong understanding of high availability and infrastructure reliability
- Familiarity with platform/cluster architecture and administration
- Security mindset and ability to apply best practices

Nice-to-Have:
- Startup experience (you enjoy building, not just maintaining)
- Experience with scalable GPU infrastructure for AI/ML
Keywords: Amiri Recruiting, Lodi, Site Reliability Engineer, IT / Software / Systems, Mountain View, California