Marshall Wace

Site Reliability Engineer (London)

Founded in 1997, Marshall Wace is one of Europe's leading hedge fund managers, with approximately $39 billion in assets under management. It enjoys a strong reputation in the industry for its success, influence and innovation, built by a dedicated team of people working in a dynamic, entrepreneurial culture. Our firm is made up of over 250 professionals operating from established offices in London, New York and Hong Kong.

Full Description

Marshall Wace is built on a culture of innovation, and we build and maintain cutting-edge hardware and software solutions. We are looking for an ambitious, enthusiastic and driven Site Reliability Engineer to join our global Production Engineering team in London and help take our reliability to the next level. The team pushes relentlessly to improve the automation, testing and monitoring of systems and processes, making our business more resilient and enabling a high velocity of change.

 Core Responsibilities

  • Standardisation of monitoring methodologies, systems, tools and libraries
  • Automation of operational processes to improve reliability and efficiency and to reduce alert fatigue
  • Owning and evolving our systems through pushing for changes that improve resilience and reliability
  • Developing, and enabling the development of, high-quality, resilient, scalable and secure systems
  • Bringing a strategic resilience and reliability perspective to architecture and design discussions
  • Maintaining the highest levels of systems availability across the enterprise, mostly for proprietary applications

 Key Skills

  • A passion for automation and continual improvement, with a track record of identifying high-value automation opportunities
  • Intense focus on improving system availability and resilience through testing, standardisation and automation
  • Ability to build positive and collaborative relationships with colleagues across teams and geographies
  • Broad technical knowledge and strong communication skills, credible across the full technology stack
  • Systematic and methodical approach to problem-solving and debugging
  • Knowledge of cyber security risks


The ideal candidate will have:

  • Expert-level scripting / coding in Python / Ruby / PowerShell / C# / Java / Go or equivalent
  • Experience implementing / using containerisation technologies such as Docker / Podman / Kubernetes / OpenShift etc.
  • Experience using configuration management tools such as Puppet / Chef / Ansible / DSC / Terraform etc.
  • Experience implementing distributed systems such as Hadoop / Spark / Kafka / Flink etc.
  • Experience implementing centralised logging and monitoring / alerting systems such as Nagios / Sensu / Zabbix / Grafana / Kibana / Prometheus etc.