Senior/Staff Site Reliability Engineer
Company: Mochi Health
Location: San Francisco
Posted on: February 15, 2026
Job Description:
Healthcare is broken at the
first step: patients can't find the right care, understand what it
costs, or access the medications they need. Mochi Health is fixing
this. We're building an AI-driven marketplace that makes healthcare
discoverable—connecting patients to the right providers,
transparent pharmacy pricing, and affordable medications. Over the
past few years, we've grown rapidly by combining clinical expertise
with technology that actually works for real people, not just
hospital systems. Our platform does what legacy healthcare can't:
it gives patients transparent pricing before they pay, personalized
medication management that follows them across providers, and
long-term access to their own medical records. We're proving that
healthcare can be more affordable, more human, and far more
intuitive than what exists today. Join a team that's rebuilding
healthcare from the patient up. At Mochi Health, you'll work
alongside people who value bold thinking, inclusive collaboration,
and getting meaningful work into the world. If you want to do the
most impactful work of your career, this is where to do it.

$230,000 - $280,000
Full-time / Onsite (5 days/week)

About The Role

We're looking for a Senior/Staff Site Reliability Engineer to build
Mochi's AI-driven APM and incident management system, one that
doesn't just alert and page, but learns. This is a foundational role
at the intersection of SRE, platform engineering, and applied AI:
you'll design the feedback loops (human-in-the-loop / RLHF-style),
guardrails, and automation that let our reliability posture improve
over time. You'll own the systems and workflows that turn incidents
into intelligence: automated triage, root cause analysis,
remediation, and bug-fix proposals (PRs, test runs, staged rollouts)
when issues are code-level. If you're excited by the idea of building
a self-improving SRE "copilot", this job is for you.

What You'll Do

Build an AI-driven SRE platform that ingests telemetry
(logs/metrics/traces), deploy events, and incident artifacts to
detect anomalies, summarize failures, and propose mitigations.

Design a human-in-the-loop learning loop (RLHF-style) so the system
gets better with every incident: capturing decisions, outcomes, and
postmortems into training/evaluation data.

Create safe auto-remediation capabilities: runbook execution,
automated rollbacks, and feature-flag actions, all with strong
guardrails, auditability, and progressive rollout controls.

Build tooling that can propose bug fixes: generate well-scoped PRs,
run tests, and support canary releases, with clear handoff and
approval flows.

Define and operationalize SLOs/SLIs and error budgets for critical
user journeys (patient onboarding, provider workflows, pharmacy
fulfillment, billing, etc.).

Level up observability end-to-end: alert quality, dashboarding,
tracing standards, and "unknown unknown" detection.

Lead incident response excellence: on-call improvements, incident
command, blameless postmortems, and driving systemic fixes that
reduce repeat failures.

Partner with product engineering teams to reduce toil and improve
reliability via better architecture, load testing, resilience
testing, and capacity planning.

Establish reliability standards and patterns across the org (golden
signals, deployment safety, dependency management, fault isolation).

Who You Are

7 years in SRE / platform /
infrastructure engineering, with a track record of owning
production reliability at scale.

Deep experience operating Kubernetes-based systems in the cloud (AWS
preferred), including networking, autoscaling, rollout strategies,
and incident mitigation.

Strong software engineering ability: you can debug production issues
across services, understand failure modes, and contribute code when
needed (Python/Go/TypeScript are all great).

Expert-level grasp of observability and incident response: metrics,
logs, tracing, alerting design, and postmortem-driven improvements.

Comfortable building automation that touches production, and
obsessive about safety: least-privilege access, audit logs,
approvals, canaries, and rollback.

Excited by AI tooling and agentic workflows (or already experienced):
LLM-based triage/summarization, retrieval over runbooks/postmortems,
evaluation harnesses, and feedback loops.

Strong communication and collaboration skills: you can lead during
incidents, write clearly, and align teams around reliability
priorities.

Startup mindset: you move fast, take end-to-end ownership, and love
turning ambiguity into shipped systems.

Excited to work in-person with our team in San Francisco.

Nice to Haves

Experience building LLM-powered
internal tools (incident copilots, automated debugging, RAG over
docs/runbooks) and/or RLHF-style feedback pipelines.

Familiarity with security and compliance in regulated environments
(HIPAA, SOC 2, audit requirements, PHI handling).

Experience with chaos engineering / game days and resilience testing
programs.

Experience building CI/CD guardrails and progressive delivery systems
(canaries, automated verification, safe rollout policies).

Prior work on distributed tracing standards (OpenTelemetry), service
meshes, or large-scale event-driven systems.

Our Core Technologies Include:

AWS, Kubernetes, Postgres, Redis, TypeScript/Node.js, Python, SQL
(plus whatever we need to build a world-class reliability platform)

Life at Mochi

At Mochi, we believe your best
work happens when you feel your best, so we've designed an
environment that fuels your creativity, supports your growth, and
makes every day exciting.

Daily Meals and Espresso Bar - Breakfast, lunch, and dinner every
weekday. Our on-site barista keeps the espresso and matcha flowing
all day.

Pre-Tax Commuter Perks - Save on transit and parking through pre-tax
commuter benefits.

Top-of-Market Compensation - We offer competitive salaries along with
generous equity packages so you can share in the success you help
create.

Profitable and Rapid Growth - We're scaling fast, with financial
discipline and long-term vision. No VC constraints, just sustainable
momentum and smart decisions.

High-Impact Work - Help shape the future of digital healthcare. Your
work here directly improves lives and scales nationwide.

World-Class Team - Collaborate with teammates from Tesla, SpaceX,
Citadel, Harvard, IIT, and more. We value excellence, humility, and
empathy in equal measure.

Comprehensive Benefits - 401(k) with match, generous time off, life
insurance, and high-quality medical, dental, and vision plans.

Mochi Health Membership - We cover your monthly subscription fee so
you can experience the same care as our patients (medications not
included).

Time to Recharge - Enjoy unlimited PTO, generous company holidays,
and true flexibility. We trust you to take the time you need to rest,
reset, and thrive.

Wellness First - From weekly mindfulness sessions to group workouts
and fitness perks, your physical and mental health are top priority.

Team Socials and Community - We make time to connect through regular
socials, happy hours, and spontaneous events. Our stocked kitchen
doesn't hurt either.

Downtown SF HQ - Our San Francisco office is just steps from BART,
Muni, and great food. It's designed for deep work and casual
collaboration.

The base salary for this full-time
position ranges from $230,000 to $280,000, in addition to equity
and benefits. The salary range listed in each job posting
represents the minimum and maximum targets for new hire salaries
across all locations. Actual compensation within this range is
determined by various factors, such as job-related skills,
experience, relevant education or training, and location.

Workplace Policy

Mochi Health is an in-person company based
in San Francisco, CA. Our team works together in person five days a
week to foster collaboration, innovation, and strong connections.
We believe that face-to-face interaction builds a culture of
excellence and allows us to deliver the best outcomes for the
patients and providers we serve.

Equal Opportunity

Mochi Health is
an Equal Opportunity Employer. We make all employment decisions
based solely on merit. We provide equal employment opportunities to
all applicants and employees without discrimination on the basis of
race, religion, color, national origin, gender (including
pregnancy, childbirth, or related medical conditions), sexual
orientation, gender identity, gender expression, age, status as a
protected veteran, disability status, or any other applicable
legally protected characteristic. We prohibit any form of
discrimination or harassment. This policy applies to all terms and
conditions of employment, including hiring.

Candidate Privacy Notice

Please review Mochi Health's Candidate Privacy Notice here.

Accommodations

Mochi Health complies with the Americans with
Disabilities Act (ADA), as amended by the ADA Amendments Act, and
all applicable state or local laws. We will reasonably accommodate
qualified individuals with a disability during the application
process and throughout employment as required by law. If you need
any assistance or accommodations due to a disability, please
contact us at hr@joinmochi.com.
Keywords: Mochi Health, Lodi, Senior/Staff Site Reliability Engineer, IT / Software / Systems, San Francisco, California