Software Engineering 26/03/2020

The Magic of Maintenance

“Doing” open source isn’t just about committing code: it’s also about committing to the upkeep of the code you’ve committed. There’s nothing glamorous about maintenance. You have to really love data to appreciate and maintain Sparkmagic. That’s why G-Research recently volunteered to help maintain this data liaison. It’s a clear demonstration of our willingness to put our backs into our work and offer something we find valuable to the greater data science community.

As background, Sparkmagic is an intermediary that allows data scientists to get one tool (Spark) to talk effectively with another tool (Jupyter). Spark is a great way to process huge amounts of data, and Jupyter helps data scientists explore and analyse their data. Meanwhile, programmatic access to Spark can be set up via Livy, a REST server.
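
To make the "REST server" part concrete, here is a minimal sketch of the kind of HTTP requests a client could prepare for Livy's documented `/sessions` and `/sessions/{id}/statements` endpoints. It only builds the requests and sends nothing. The host and port (`localhost:8998`), the helper names, and the session id `0` are assumptions for illustration; this is not Sparkmagic's actual code.

```python
import json
from urllib.request import Request

# Assumption: a Livy server reachable locally on its usual default port.
LIVY_URL = "http://localhost:8998"


def new_session_request(base_url=LIVY_URL, kind="pyspark"):
    """Build (but do not send) a request asking Livy for an interactive session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(
        f"{base_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def statement_request(session_id, code, base_url=LIVY_URL):
    """Build (but do not send) a request that submits code to an existing session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: session id 0 stands in for whatever id Livy returns.
req = statement_request(0, "1 + 1")
print(req.get_method(), req.full_url)
```

A notebook-side tool in this position would send requests like these and then poll Livy for the statement's result, so that the user never has to write the HTTP calls by hand.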

Unfortunately, as often happens in the open source ecosystem, the maintainers of Sparkmagic didn’t have time to keep up with the changing Livy APIs, and Sparkmagic started accumulating bugs. To their credit, the creators of Sparkmagic had built a tool that people wanted to use, and the community was eager to see it keep up with the gradual evolution of the ecosystem surrounding it. Eventually, frustration with the project’s relative inactivity led to forks and fractures, with alternative versions appearing on PyPI.

So, in order to support its own use of Sparkmagic, G-Research funded maintenance time to update the code and more generally keep development going. It was important for us to get technical enhancements added but it was also important for us to see that the community of developers around this product was able to contribute to and work together on the same thing.

This effort succeeded brilliantly. Thanks to the work of the Sparkmagic community, including at least six developers who contributed code and others who helped with testing, Sparkmagic now:

  • Works with the latest Livy release, as well as modern versions of Jupyter Notebook and JupyterLab
  • Supports server-side rendering of images, so large amounts of data don’t need to be shipped to the client for visualizations
  • Supports progress bars for long-running tasks and works better with Papermill

Funding maintenance time was key to getting these contributions in. Besides writing actual code, open source projects also need someone to review code, merge patches, triage issues, and generally maintain the project. That’s a big part of what underwrites our success and makes possible advances in tools such as Sparkmagic.

G-Research takes software seriously. We have to, as it underpins our entire business. Most of the best software today is open source, and we’re excited to put our resources into the projects that matter to making our systems stronger, our processes better and our results more robust. We take real joy in maintaining the tools that we use, and we allot time and resources to our team to work on them. If this sounds interesting to you, please look into Sparkmagic, and be sure to see what positions are currently open on our team.

Alex Scammon – Head of Open Source Development
