Blog | Linux Foundation

If You Can't Measure It, You Can't Improve It: CHAOSS Project Creates Tools to Analyze Software Development and Measure Open Source Community Health

Written by Jim Zemlin | Sep 11, 2017 7:00:00 AM

Today over 80 percent of the software in any technology product or service is open source. And this trend is growing. According to a recent study by Sonatype, every day the supply of open source across all ecosystems increases by about 1,100 new projects and 10,000 new versions.

This raises important questions about which open source projects matter. What code should I bet my product, my company, or career on? Will those projects grow or shrink? Is the code base stable or changing? Does the project depend on one organization or many? Is the community healthy or hopelessly ill?

At The Linux Foundation, we want to grow and sustain the most important software in the world. One of the ways we can do this is by measuring the general health of an open source community and developing methodologies and tools for analyzing modern software development. With this in place, we can improve the health of projects and make it easier to answer the questions above.

We need software that will help benchmark and analyze project health along several dimensions as well as robust guidelines for what makes an open source community healthy. We need the means to apply analytics to the big data produced by all the systems supporting software development.

Welcome CHAOSS Project

With these needs in mind, I’m excited to announce the Community Health Analytics Open Source Software project (CHAOSS). CHAOSS is a new Linux Foundation project focused on creating the analytics and metrics to help define community health.

Initial members contributing to the project include Bitergia, Eclipse Foundation, Jono Bacon Consulting, Laval University (Canada), Linaro, Mozilla, OpenStack, Polytechnique Montreal (Canada), Red Hat, Sauce Labs, Software Sustainability Institute, Symphony Software Foundation, University of Missouri, University of Mons (Belgium), University of Nebraska at Omaha, and University of Victoria.

The project aims to:

  • Establish standard implementation-agnostic metrics for measuring community activity, contributions, and health, which are objective and repeatable.
  • Produce integrated open source software for analyzing software community development.

Developing Metrics and Tools

To date, efforts to look at metrics across open source communities have been fragmented and individual. Researchers from academia and practitioners from industry are getting together in the CHAOSS metrics committee to define a neutral, implementation-agnostic set of reference metrics to be used to describe communities in a common way. Founding members of the metrics committee include members from the Symphony Software Foundation, Mozilla, the Software Sustainability Institute, the SECOHealth interdisciplinary research project, the University of Nebraska at Omaha, and the University of Missouri.

Definitions without implementations aren’t very practical, so the CHAOSS software committee is being formed to provide a framework for establishing an open source GPLv3 reference implementation of the CHAOSS metrics.

Several members, as well as individuals, have open sourced pieces of software to serve as the initial building blocks of the project. Initial projects to be contributed and integrated into the CHAOSS community are Prospector from Red Hat, GrimoireLab from Bitergia, and Cregit from Daniel M. German, Professor in the Department of Computer Science at the University of Victoria.

Red Hat’s contribution, Prospector, is a tool for automated collection and continuous tracking of a wide range of open source project metrics that are useful in evaluating the health and trends of projects. Red Hat is open sourcing Prospector under GPLv3 as part of the launch of CHAOSS.

Bitergia’s contribution, GrimoireLab, is a set of free, open source software tools for software development analytics. The tools gather data from many development support systems (git, GitHub, Jira, Bugzilla, Gerrit, mailing lists, Jenkins, Slack, Discourse, Confluence, StackOverflow, etc.); merge and organize it in a database; and produce visualizations, actionable dashboards, and analytics of the collected data. GrimoireLab is focused on analyzing activity, community, and processes, but can be easily tailored for other aims. GrimoireLab is being developed under a community model and is licensed under GPLv3. You can see an example of a dashboard produced with it at http://cncf.biterg.io.

Cregit is a tool to improve provenance of source code in git repositories. It increases the granularity of blame information from line to token level. This information is very useful to help ascertain the true origin of source code. Cregit has recently expanded to link source with the email-based code reviews where it was introduced. It is currently in use by the Linux kernel at cregit.linuxsources.org, and was originally developed by Daniel M. German, Professor of the Department of Computer Science of the University of Victoria.

Building off Current Metrics

In addition, several community projects are already planning to implement the metrics in their own projects as the metrics are defined. These projects include GHData, from Sean P. Goggins, Computer Science Professor at the University of Missouri, and Matt Germonprez, Information Science & Technology Professor at the University of Nebraska at Omaha, as well as velocity and gha2db from the Cloud Native Computing Foundation.

GHData is a community-initiated prototype implementation of individual metrics developed by the CHAOSS working group. It is a Python library and REST server that provides a community-based implementation of CHAOSS metrics. GHData is initially aimed at GitHub-hosted projects, using the GHTorrent database. All software is released under the MIT license. The repository is available at: https://github.com/OSSHealth/ghdata

Velocity, developed by the Cloud Native Computing Foundation, is a set of tools for analyzing and visualizing data about project velocity. It enables reports such as the 30 highest-velocity open source projects. gha2db is an emerging project for populating a time series database of project status by analyzing the freely available GitHub Archives.

Participating in CHAOSS

The CHAOSS project has a governing board, which is responsible for oversight of the project and for coordinating the efforts of the technical committees. You can learn more about the governing board and how to participate in the project on the CHAOSS community website: https://chaoss.community.

If you are currently attending Open Source Summit North America, sessions there will discuss this new initiative and how to get involved.

Industry and Academic Support of CHAOSS

Jesus M. Gonzalez-Barahona, Co-Founder, Bitergia and Professor, University of Rey Juan Carlos:

“Over the years, I have witnessed how more and more people need to understand the complexities of modern software development. Bitergia was founded to address this need. Grimoire Lab is our second generation of tools, designed and implemented to be production-ready, and capable of running analytics on software projects of any size and any complexity.”

Jono Bacon, Founder of Jono Bacon Consulting, and Leading Community Strategist and Author:

“I am delighted to be able to participate in CHAOSS. Effective metrics, and the insights they provide, are key to the success of ecosystems and open source more broadly. Measuring community value, both tangible and intangible, is critical not merely for evolving our communities, but for understanding where we move the art and science of community strategy forward. I am passionate about the broader success of communities, and I am delighted to play a role on CHAOSS as one component of this work.”

Dr. Tom Mens, University of Mons (Belgium); Dr. Bram Adams, Polytechnique Montreal (Canada); and Dr. Josianne Marsan, Laval University (Canada):

“As principal investigators of the SECOHealth interdisciplinary research project on software ecosystem health, we are highly interested in working hand-in-hand with the CHAOSS project to enhance the capacity of developing useful tools and guidelines for software ecosystems and communities. The success of open source development is not only due to the technical excellence of individual projects and their developers, but especially due to the social collaboration and interaction of hundreds of projects into software ecosystems, such as the Debian/Ubuntu ecosystem (comprising upstream projects and the kernel), the npm ecosystem or the Android app ecosystem. What makes such ecosystems tick? Can one accurately measure the health of an ecosystem in order to predict (and avoid) disruptive events? We aim to define and evaluate health metrics for ecosystems inspired by ecological ecosystems of living organisms.”

Andrea Gallo, Vice President of Segment Groups, Linaro:

“Successful open source projects rely on open governance and continuous collaboration between skilled developers across the community, as opposed to unsupported and non-maintained free source code, or projects with closed governance. We are pleased to collaborate with the Linux Foundation on this project to define the relevant metrics and tools required to measure the health and success of open source community projects.”

Don Marti, Strategist, Mozilla:

“As a member of CHAOSS, Mozilla is committed to supporting research that will help maintainers pick the right open source metrics to focus on — metrics that will help open source projects make great software and provide a rewarding experience for contributors.”

Ildiko Vancsa, Board Member, OpenStack:

“Society is based on a rapidly growing amount of information being exchanged every second; being a part of this society makes data invaluable. Both in open source and commercial environments we cannot avoid measuring and analyzing different aspects such as community health, progress or diversity. Having a set of metrics with a common understanding about what they mean and what their context is provides a sizable challenge. I’m very excited to be part of the CHAOSS project where we have a publicly visible forum to find the common ground for these data points from definition through implementation to visualization.”

Michael R. Cunningham, Executive Vice President and General Counsel, Red Hat:

“The enormous expansion of open source provides great opportunity to developers and companies. Red Hat created Prospector with a view to better understanding and tracking how this innovation is being unleashed, and to provide insight on the speed of its growth and the health of individual projects. We are pleased at the enthusiasm these efforts have generated, and look forward to being part of CHAOSS.”

Jonathan Lipps, Director of Open Source, Sauce Labs:

“Sauce Labs grew from and remains committed to open source projects and communities. As the instigator and primary contributor to the Appium open source project, we have a great need to understand how it is growing and what we can do to help it become sustainable for the long haul. The CHAOSS project (and its working group) have been pivotal in helping us to decide what to look at when it comes to open source metrics, and we are thrilled to see consensus forming around this formerly-murky aspect of OSS maintenance.”

Dr. Sean P. Goggins, Faculty of Computer Science, University of Missouri:

“Measuring performance in software development is a sociotechnical challenge. I am excited about participating in this process of identifying important project milestones and then figuring out how to represent them using the trace data from version control and other systems. I have been doing this work across measures and technologies that implement them for the past decade, and it’s invigorating to see a community effort to address the challenge more systematically.”

Dr. Matt Germonprez, Faculty of Information Science & Technology, University of Nebraska at Omaha:

“As a member of CHAOSS, I’m excited to be part of an effort aimed at understanding open source community health more deeply. Open source community health can mean many different things to many different people. Yet, I think there is an underlying way that metrics can be used to signal important characteristics such as communal diversity, project maturity, impact within an ecosystem, contributor rewards, and project risk. Within CHAOSS, we aim to develop implementation-agnostic metrics to which organization and community managers can better align their own goals with the dynamics and complexities of open source communities.”

Daniel M. German, Professor, Department of Computer Science, University of Victoria:

“Creating metrics is easy; creating metrics that are useful and meaningful is difficult, especially in the context of free and open source systems, where information is often incomplete. The ability to measure the health of a project will allow its stakeholders to understand and help improve its sustainability.”

Wayne Beaton, Director of Open Source Projects, The Eclipse Foundation:

“The Eclipse Foundation regards the principles of transparency and openness as critical to the success of open source software; the development of consistent means of gathering, interpreting, and disseminating metrics is an important part of this and we’re excited to be engaged in this effort.”