Around a year ago we embarked on a new journey: building a Site Reliability Engineering (SRE) team to help drive innovation, efficiency and scalability, and to create an overall autonomous state. This post is the first of many about our SRE journey, and we hope that sharing our progress will help others on a similar path.
In 2017 we grew by roughly 50 per cent in software engineering, and we are en route to do the same in 2018. Paul Voccio, our new Head of Software Engineering and former VP of Software Engineering at Rackspace, joined in June 2017. On realising that we were prioritising feature work, Paul introduced the SRE mentality into the business. A plan was subsequently formed to hire a team that would sit apart from feature and quant work, but close enough to it to understand and build the infrastructure tooling, and the changes, that migration to cloud platforms would require.
We intend to make huge investments both in people and technology over the next two to three years. Led by Matt Barsby, Head of Platform Engineering & Production Development, G-Research have hired four SREs alongside a new SRE Manager, who will be joining us in September.
‘Having been at G-Research for over eight years I’ve seen the organisation increase in size massively. What’s clear to me though is that the growth of our technical footprint is somewhat exponential in comparison as we take advantage of on demand infrastructure and modern design patterns such as microservices and containers. Many of our developers took on (unknowingly) dual roles and would manage the scale and reliability issues alongside their feature development work, but a platform of our scale needs dedicated resources who have both the time and expertise to address the engineering challenges at the micro and macro level. This is where SRE was born.
There is no shortage of interesting challenges and G-Research being as ambitious as it is, we’ll continue to see demand for SRE resources for the foreseeable future. Whether it is enabling application teams to take advantage of cloud computing, making systems more autonomous or by tuning the platform for the best performance, we’ll likely be involved. Due to the nature of the business, we have a large and fairly unique dispersed global network that needs to handle both high performance computing, cloud and low-latency workloads, much more than “just web scale”.’ – Matt Barsby
In April we hired Richard Semmens, previously an SRE at Google and Bloomberg. He has brought with him experience and innovative ideas in large-scale distributed monitoring and cross-team root-cause analysis.
‘After leading a strong and varied career, I came to G-Research after spending time discussing with senior management what SRE could and should be here. The decisions and reasoning resonated well with my engineering history at Google, but without the corporate challenges.
We have a great start-up feel, where everyone is social, friendly and keen to improve everything from infrastructure and code to optimal cake delivery.
As SRE is a new function here, we get to focus on future-facing projects, evangelising the best of breed technologies available to us and providing consultancy to the already data-driven production teams for implementation. Compared to my corporate experience, G-Research is far more enjoyable as decisions aren’t pre-made by people unconnected to the end result and are instead collaborations between the people who care about the outcomes.’ – Richard Semmens.
We’re only just getting started, but the team has already been involved in a wide range of projects that have provided a lot of value to the business. Highlights so far include:
- Working with security and infrastructure teams to plan how best to take advantage of a Hybrid Cloud platform.
- Building a telemetry platform that meets the requirements of application and infrastructure teams now and for years to come.
- Working with our platform operation teams to help automate many of the processes they run so that they can add more value in releasing research.
- Providing consultancy for projects across the company to advise on reliability, scaling and monitoring best practices.
In the next blog post we’ll talk about the hiring process for the SRE team, both engineers and managers.
Sounds interesting? Why not apply?
Ben Guinchard & Matt Barsby