AWS Site Reliability Engineer (SRE)

Job Details:

QUALIFICATIONS

  • Advanced English
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • At least 10 years of experience as an SRE or in a similar operational role with a focus on AWS cloud services.
  • Strong background in Linux/Unix and system administration.
  • Proficiency with AWS services such as EC2, RDS, S3, DynamoDB, IAM, VPC, Lambda, and CloudWatch.
  • Expertise in scripting languages like Python, Bash, or similar for automation.
  • Familiarity with DevOps practices and tools (CI/CD pipelines, configuration management, etc.).
  • Knowledge of networking principles and protocols, including DNS, HTTP/HTTPS, and TCP/IP.
  • Understanding of security practices in AWS and experience implementing security controls.
  • Excellent problem-solving, critical thinking, and communication skills.
  • AWS certifications are a plus, such as AWS Certified SysOps Administrator or AWS Certified DevOps Engineer.

KEY RESPONSIBILITIES 

  • Work in the US Central Time Zone.
  • Collaborate with development and operations teams to design, enhance, and manage scalable and reliable cloud infrastructure on AWS.
  • Develop automation tools for health checks, deployments, patching, and security configurations.
  • Employ Infrastructure as Code (IaC) practices using tools like AWS CDK, CloudFormation, or Terraform for repeatable and consistent environment setup.
  • Design and implement monitoring and alerting solutions in Datadog that enable proactive incident management and increase visibility into operational efficiency.
  • Reduce errors and improve customer experience by implementing modern software deployment techniques, such as canary releases and blue/green deployments.
  • Participate in on-call rotations, proactively respond to system outages, and serve as incident commander as needed.
  • Drive root cause analysis (RCA) and a post-mortem culture to identify the contributing factors behind incidents and devise preventive measures.
  • Work closely with the security team to enforce best practices, security policies, and compliance with applicable regulations.
  • Contribute to capacity planning and demand forecasting, software performance analysis, and system tuning.
  • Develop and maintain runbooks, playbooks, and documentation covering system configurations, operating procedures, and service records.

Skills and Knowledge:

Linux/Windows system administration

AWS Services

EC2

RDS

S3

DynamoDB

IAM

VPC

Lambda

CloudWatch

Python/Bash

Benefits:

Home office allowance

Educational scholarships

Benefits card

Day off

Paternity leave

Career plan

Health plan

Dental plan

Life insurance

Via Recrutei
São Paulo (SP) or Remote

LabVantage Solutions LATAM

https://interfusaoti.com.br/

Not disclosed

Contractor (Pessoa Jurídica)

Posted 1 week ago
