Job Description
Overview:
Are you passionate about ensuring a great customer experience behind the scenes?
This role will be predominantly operational, focused on improving & supporting front-line SRE operations. The focus will centre on operational readiness, resiliency & quality standards. In addition, there will be the opportunity to contribute to & define exciting, scalable reliability engineering projects.
Not sure what skills you will need for this opportunity? Simply read the full description below to get a complete picture of candidate requirements.
*The role is based in London, UK, and the current expectation is for the contractor to work three days in the Battersea office, with flexible days on Monday/Friday.
For this contractor position, you will:
- Triage, troubleshoot & resolve front-line production support alerts and tickets.
- Monitor high-traffic systems responsible for operations, including applications, containers, middleware, physical hardware & databases.
- Have a systematic, test-and-measure approach to continually improving operations & monitoring.
- Understand Web Service APIs, Internet architecture, & common client-server technology stacks.
- Provide a high level of customer experience to our internal and external stakeholders.
- Work closely with third parties, interacting with many of our partners to resolve platform issues.
- Be a confident communicator, leading telephone or video calls with internal teams & external partners.
- Support new projects when launched.
- Have previous production or application support experience, preferably with large-scale distributed systems.
The ideal candidate will have the following:
- Proficiency in handling incident management & problem management at an application support level.
- Experience troubleshooting, analysing log files & resolving technical problems with Java-based applications in a fast-paced environment.
- Strong background in monitoring and logging of large-scale platforms (Prometheus, Grafana, Splunk, etc.).
- Familiarity with configuration and deployment management (AWS, Unix, Java, Databases, Kubernetes, Docker, etc.)
- Competency in one or more coding or scripting languages, such as Python, Ruby, Ansible, etc.
- Understanding of KPIs, metrics and SLOs.
- Experience automating manual tasks.
- Knowledge of version control tools such as Git.
- Familiarity with iOS, macOS, watchOS and iPadOS.
- An exceptionally high level of attention to detail.
- Self-motivated, proactive, and able to work independently and as part of a team.
- Effective verbal and written communication skills.
- Most importantly, a passion for ensuring our customers and stakeholders always get the best possible experience.
Competencies:
Well organised, learning on the fly, self-development, problem-solving, functional/technical skills.
Prior experience as an SRE, System Administrator, Application Support, or similar role.