By Marc Prioleau, Executive Director of Overture Maps Foundation
In the open source world, we often draw parallels between various collaborative projects. However, as we've discovered through our work at the Overture Maps Foundation (Overture), open data presents a unique and increasingly relevant set of challenges that sets it apart from traditional open source software initiatives and requires fresh approaches and solutions.
As data becomes the backbone of modern technologies—powering everything from AI models to digital infrastructure—the need for well-managed, accessible, and high-quality open data has never been more critical. While open data projects can leverage the decades of experience inside the Linux Foundation in building open source software communities, the emerging field of open data presents novel considerations that deserve careful attention.
At Overture, we envision a world where the best maps are not proprietary assets but shared, open resources, built and enriched by the very people who rely on them. Achieving this vision, however, means overcoming key differences between open data and open source software projects. By addressing these complexities and encouraging collaboration, we aim to inspire new approaches to open data within the Linux Foundation community and beyond to advance the future of open data.
Maps are hard, and the hard part is the data. A modern digital map consists of multiple intricate layers: land cover, addresses, divisions, buildings, geographic names, transportation networks, and more. Each layer is a complex dataset that must be accurately captured, maintained, and updated as the physical world changes. Our experience with these challenges offers valuable lessons for any organization working with open data at scale.
Data Origins: Proprietary To Open
Unlike open source software, where contributions often start as open from inception, open data usually begins with proprietary sources. For example, proprietary aerial imagery can generate open map data through computer vision. This creates a unique dynamic that raises several challenges:
To address these challenges, Overture introduced the "Contributors Club," which provides exclusive benefits—such as access to unique data insights—to incentivize participation while ensuring the sustainability of open data initiatives.
License Complexity
While open source software projects typically operate under a single license, open data projects must navigate a patchwork of licenses. Our upstream data sources come with various licenses including share-alike provisions, attribution requirements, and custom agreements. This complexity underscores the need for:
In map data, the sources are fragmented, making it difficult to align all the parties around commonly used open licenses. As open data evolves, a broader knowledge of best practices can emerge and be communicated through organizations like the Linux Foundation.
The Scale and Cost of Data
Data, particularly geospatial data, is massive. Our November 2024 release alone was approximately half a terabyte. This scale presents significant challenges:
As we like to say, "Open data is free...like a free puppy." Because maintenance and infrastructure costs are ongoing, open data projects need stable sources of funding to be sustainable.
Data Requires a Factory Approach
Map data requires a continuous production approach instead of incremental development, including:
This "factory" model demands different organizational structures and workflows compared to traditional open source projects. It can also create significant efficiencies, since the work can be done once rather than redundantly by many players in the ecosystem. At Overture, we’ve built workflows that operate like a data “factory,” ensuring consistent quality and timely updates to meet user expectations. In addition, members of open data projects may want the project to assume responsibility for some of the production roles, freeing members' resources to focus on new data creation and architecture.
Quality Assurance
Data describes something real. Therefore, accuracy (or an estimate of accuracy) is important when using data. Unlike code, where competing versions can coexist, open data projects must resolve conflicting data to avoid disseminating inaccuracies. This requires proactive quality standards and processes to evaluate data contributions.
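To make the idea of resolving conflicting data concrete, here is a minimal sketch. It assumes each contributed record carries a confidence score between 0 and 1; the `Candidate` type, the `resolve` function, the source names, and the 0.5 threshold are all hypothetical illustrations, not a description of Overture's actual process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    source: str        # hypothetical name of the contributing source
    name: str          # the value this source reports for the feature
    confidence: float  # assumed accuracy estimate, 0.0 to 1.0


def resolve(candidates: list[Candidate], min_confidence: float = 0.5) -> Optional[Candidate]:
    """Pick one published value: the highest-confidence candidate above a threshold."""
    eligible = [c for c in candidates if c.confidence >= min_confidence]
    if not eligible:
        return None  # nothing trustworthy enough to publish
    return max(eligible, key=lambda c: c.confidence)


candidates = [
    Candidate("source_a", "Main Street Cafe", 0.72),
    Candidate("source_b", "Main St. Cafe", 0.91),
    Candidate("source_c", "Cafe on Main", 0.40),
]
winner = resolve(candidates)
```

A real process would likely weigh recency, source reputation, and human review as well; the point is only that a single answer has to be chosen before release.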
Managing Personal Identifying Information
Contributed data may contain personal identifying information (PII), posing significant ethical and legal risks. Open data projects must have clear policies to manage PII and ensure that future datasets are free from such issues.
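One small piece of such a policy could be an automated screen that flags suspicious fields for review before release. The sketch below assumes records are simple dictionaries and checks only for email-like strings; the `flag_pii` helper and the sample record are hypothetical, and a real screen would need to cover far more than email addresses.

```python
import re

# Deliberately simple pattern for email-like strings; an illustration, not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def flag_pii(record: dict) -> list[str]:
    """Return the names of string fields that contain an email-like value."""
    return [
        field
        for field, value in record.items()
        if isinstance(value, str) and EMAIL_RE.search(value)
    ]


flagged = flag_pii(
    {"name": "Corner Bakery", "note": "contact jane@example.com", "height": 4}
)
```

Flagged fields would then go to a reviewer or be dropped, depending on the project's policy.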
Open Data Projects Also Build Code
Running an open data project often involves substantial coding efforts. From pipelines to schema matching and provenance tracking, code plays a vital role in the workflows that sustain open data initiatives.
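As a rough illustration of what schema matching and provenance tracking can look like in code, the sketch below renames a source's fields into a common schema and attaches the record's origin and license. The field names, the `to_common_schema` helper, the source name, and the license identifier are assumptions made up for this example; they do not reflect the Overture schema.

```python
def to_common_schema(record: dict, field_map: dict, source: str, license_id: str) -> dict:
    """Rename known source fields to common names and attach provenance."""
    # Keep only the fields we know how to map; unknown source fields are dropped.
    mapped = {target: record[src] for src, target in field_map.items() if src in record}
    # Provenance travels with the record so downstream users can honor license terms.
    mapped["provenance"] = {"source": source, "license": license_id}
    return mapped


# Hypothetical mapping from one contributor's column names to the common schema.
FIELD_MAP = {"bldg_name": "name", "ht_m": "height"}

row = to_common_schema(
    {"bldg_name": "Town Hall", "ht_m": 18.5, "internal_id": 7},
    FIELD_MAP,
    source="example_city_portal",
    license_id="CC-BY-4.0",
)
```

Keeping the license alongside each record is one simple way to make attribution and share-alike obligations traceable when many sources are merged.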
The Future: Where Code Meets Data
AI is reshaping our digital landscape, blurring the lines between open data and open source code. AI systems rely on vast datasets for training, making high-quality open data critical for innovation.
At Overture, we know that running an open data project requires elements of both worlds:
The open data movement represents an exciting frontier in collaborative innovation, but it requires its own playbook. As we continue to build and maintain open map data at Overture, we're developing new models for collaboration, data sharing, and sustainable operations.
The lessons we're learning extend beyond mapping; they apply to any organization working with open data at scale. As AI and machine learning transform our digital landscape, the importance of well-managed, accessible, and high-quality open data will only grow. We invite the Linux Foundation community to join us in advancing this exciting frontier.
Here are a few ways you can contribute:
The open data movement is an exciting frontier for collaborative innovation and requires a playbook distinct from open source software. At Overture, we’re learning, adapting, and sharing as we go. By working together, we can unlock the full potential of open data and shape a more collaborative future.
The basis of this post is a presentation by Marc Prioleau, executive director of Overture Maps Foundation. For more information about Overture Maps Foundation and our work, visit overturemaps.org.