Staff Site Reliability Engineer Job at Plume Design, Inc, Palo Alto, CA

  • Plume Design, Inc
  • Palo Alto, CA

Job Description

We’re looking for a seasoned Site Reliability Engineer, experienced with Customer Facing environments, to provide Technical Leadership for our Site Reliability Engineering Team. This team is focused on deployments, Production Infrastructure, Availability and Reliability. The right candidate has held several Infrastructure-oriented roles and has strong technical knowledge of the DevOps/SRE technology stack while staying focused on customer satisfaction.

What You’ll Do

  • Supervise a team of Site Reliability Engineers who provide first-line support to Customer Clouds. Routine tasks include Deployments, On-call, and Application Provisioning.
  • Run stand-ups for the team and manage tickets
  • Participate in the Sprints and close tickets with the team
  • Attend and conduct customer Meetings for Project and Roadmap specification.
  • Be able to step in and execute or triage issues. Some examples are as follows:
  • Provision and scale Kubernetes Infrastructure and Applications (EKS)
  • Deploy Software in multiple Production Environments
  • Own monitoring and alerting for production systems, including improvements and changes
  • Contribute improvements to the current automation
  • Contribute improvements to our on-call process and alerting

What You’ll Bring

  • 4+ years of Kubernetes Knowledge (operate)
  • 2+ years of Terraform Knowledge
  • Experience both setting up and utilizing Monitoring and observability tools (e.g. New Relic, Nagios/Icinga, Grafana, Prometheus)
  • 2+ years of experience Programming/Scripting in one of the following (e.g. Perl, Python, PHP, GoLang, Java, etc.)
  • 8+ years of experience with modern Linux Operating systems
  • 6+ years of experience with modern cloud infrastructure, preferably AWS
  • Availability to be in on-call rotation for Production issues
  • Availability to work with a distributed team in different timezones
  • Advanced communication skills
  • Experience leading efforts and reporting up

Desired Skill Set

  • 10+ Years of experience with Production Troubleshooting
  • 4+ Years of experience leading teams
  • Executive Communication skills
  • Bachelor’s degree in a related field or equivalent experience; advanced degree preferred
  • This is a leadership role, but you must have Technical knowledge and working experience with:
      • Kubernetes (operate)
      • Basic Terraform Knowledge
      • Programming/Scripting in one of the following (e.g. Perl, Python, PHP, GoLang, Java, etc.)
      • Modern cloud infrastructure, preferably AWS
      • Modern Linux Operating systems (Enterprise Linux or Debian based)
      • Setting up and utilizing self-managed Monitoring and observability tools (e.g. Nagios/Icinga, Grafana, Prometheus)

Differentiators

  • Troubleshooting production performance/service degradation or outage issues at scale
  • Experience with Infrastructure Troubleshooting in VMs and/or Bare Metal (ssh/Linux)
  • Advanced Kubernetes knowledge
  • Advanced Terraform knowledge
  • Customer Facing experience in previous roles
  • Experience operating Kafka in Production
  • Experience operating NoSQL Databases in Production
  • Experience operating Relational Databases in Production
  • Configuration Management experience

Please note that this is a HYBRID position requiring work in the office three days a week. We’re looking for candidates who are within a commutable distance. We are unable to provide relocation assistance or visa sponsorship at this time.

The Total Compensation package includes an anticipated compensation range of $177,000 - $208,000, plus bonus, equity, and benefits. Benefits include a 401k plan with a company match, basic life insurance, and unparalleled health, dental, vision and other benefits and perks. For more details please see:

An employee’s base salary and its position within the range may depend on a number of factors including job related knowledge, education, skills, experience and other business related considerations. Published ranges are provided in good faith at the time of posting.
