Site Reliability Engineer
12/12/2021
BMC is looking for a passionate, self-motivated Site Reliability Engineer. Come join a dynamic group at BMC working on a next-generation cloud-based solution for IT management.
In this role, you will work with a team to maintain and evolve our cloud-native direction. The role drives business value through technical innovation such as simplification, extensibility, automation, and supportability.
Responsibilities:
- Apply Site Reliability Engineering (SRE) principles to develop tools, automation, and dashboards that operationalize new BMC SaaS services and transform BMC’s legacy applications to a SaaS continuous delivery model.
- Collaborate with multiple teams within the company, including R&D, DevOps, Infrastructure, Support, and Operations.
- Help deliver solutions for cloud services, e.g. operationalizing new cloud services and building Grafana and Kibana dashboards.
- Automate and apply development principles to the operational practice of SaaS management.
- Transform legacy technology to cloud native.
- Take part in evolving cloud observability toward industry-standard tools, including hands-on coding.
- Conduct root cause analysis with cross-functional teams after service disruptions and identify opportunities for improvement, including the development of supporting tools.
Requirements:
- 2+ years of experience working for a software engineering company
- Coding experience in Python, Java, Ansible, Shell, Go, etc.
- Experience working in an Agile methodology with cross-functional teams (R&D, DevOps, Customer Success, etc.)
- Experience with open-source monitoring and visualization systems and tools, e.g. Prometheus (monitoring + tracing), Grafana/Kibana (dashboards)
- Experience with container solutions (Kubernetes and Docker)
- Experience with CI/CD pipelines and tools, such as Jenkins
- Experience implementing web service protocols (REST, JSON)
- Experience with Relational DBs (e.g. PostgreSQL, MS SQL, MySQL)
Advantages:
- Knowledge of defining and monitoring system quality measures, including SLOs and SLAs
- Experience building tooling to improve system reliability, automate issue remediation, or improve scalability.
- Experience with public cloud, private cloud, and hybrid cloud deployments.
- Knowledge of or experience with OpenShift and Rancher
- Knowledge of logging and tracing solutions such as Logstash, Filebeat, Fluent Bit, and Jaeger
- Knowledge of BMC products such as the ITSM stack and Helix ITSM
- Knowledge of Helix Operations Management and KMs (Knowledge Modules)
This position is closed, and résumés can no longer be submitted.