Site Reliability Engineer
Marshall Wace is built on a culture of innovation, and we build and maintain cutting-edge hardware and software solutions. We are looking for an ambitious, enthusiastic and driven Site Reliability Engineer to join our global Production Engineering team in London and help take our reliability to the next level. The team is focussed on an unrelenting push to improve the automation, testing and monitoring of systems and processes, making our business more resilient and enabling a high velocity of change.
- Standardisation of monitoring methodologies, systems, tools and libraries
- Automation of operational processes to improve reliability and efficiency and to reduce alert fatigue
- Owning and evolving our systems through pushing for changes that improve resilience and reliability
- Developing and enabling development of high-quality, resilient, scalable and secure systems
- Wearing a strategic resilience and reliability hat in architecture and design discussions
- Maintaining the highest levels of systems availability across the enterprise, mostly for proprietary applications
- A passion for automation and continual improvement, with a track record of identifying high-value automation opportunities
- Intense focus on improving system availability and resilience through testing, standardisation and automation
- Ability to build positive and collaborative relationships with colleagues across teams and geographies
- Broad technical knowledge and strong communication skills, credible across the full technology stack
- Systematic and methodical approach to problem-solving and debugging
- Knowledge of cyber security risks
The ideal candidate will have:
- Expert-level scripting / coding in Python / Ruby / PowerShell / C# / Java / Go or equivalent
- Experience implementing / using containerisation technologies such as Docker / Podman / Kubernetes / OpenShift etc.
- Experience using configuration management tools such as Puppet / Chef / Ansible / DSC / Terraform etc.
- Experience implementing distributed systems such as Hadoop / Spark / Kafka / Flink etc.
- Experience implementing centralised logging and monitoring / alerting systems such as Nagios / Sensu / Zabbix / Grafana / Kibana / Prometheus etc.