The Unique Challenges of Open Data Projects: Lessons From Overture Maps Foundation
Marc Prioleau | 13 January 2025
By Marc Prioleau, Executive Director of Overture Maps Foundation
In the open source world, we often draw parallels between various collaborative projects. However, as we've discovered through our work at the Overture Maps Foundation (Overture), open data presents a unique and increasingly relevant set of challenges that set it apart from traditional open source software initiatives and require fresh approaches and solutions.
As data becomes the backbone of modern technologies—powering everything from AI models to digital infrastructure—the need for well-managed, accessible, and high-quality open data has never been more critical. While open data projects can leverage the decades of experience inside the Linux Foundation in building open source software communities, the emerging field of open data presents novel considerations that deserve careful attention.
At Overture, we envision a world where the best maps are not proprietary assets but shared, open resources, built and enriched by the very people who rely on them. Achieving this vision, however, means overcoming key differences between open data and open source software projects. By addressing these complexities and encouraging collaboration, we aim to inspire new approaches to open data within the Linux Foundation community and beyond to advance the future of open data.
The Complex Nature of Map Data
Maps are hard, and the hard part is the data. A modern digital map consists of multiple intricate layers: land cover, addresses, divisions, buildings, geographic names, transportation networks, and more. Each layer is a complex dataset that must be accurately captured, maintained, and updated as the physical world changes. Our experience with these challenges offers valuable lessons for any organization working with open data at scale.
Key Differences Between Open Data and Open Source Code
Data Origins: Proprietary To Open
Unlike open source software, where contributions often start as open from inception, open data usually begins with proprietary sources. For example, proprietary aerial imagery can generate open map data through computer vision. This creates a unique dynamic that raises several challenges:
- Incentivizing proprietary data holders to contribute
- Developing appropriate licensing frameworks for derivative open data
- Maintaining ongoing relationships with data providers to ensure regular updates
To address these challenges, Overture introduced the "Contributors Club," which provides exclusive benefits—such as access to unique data insights—to incentivize participation while ensuring the sustainability of open data initiatives.
License Complexity
While open source software projects typically operate under a single license, open data projects must navigate a patchwork of licenses. Our upstream data sources come with various licenses including share-alike provisions, attribution requirements, and custom agreements. This complexity underscores the need for:
- Strategic license selection
- Dedicated legal expertise to ensure compliance and compatibility
In map data, the sources are fragmented, making it difficult to align all the parties around commonly used open licenses. As open data evolves, a broader knowledge of best practices can emerge and be communicated through organizations like the Linux Foundation.
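To make the patchwork concrete, the sketch below shows one way a project might summarize the obligations carried by its upstream sources. It is a minimal illustration, not Overture's tooling: the source names, license labels, and the `SourceLicense` fields are hypothetical, and the rule that flags a review when more than one share-alike license is present is an assumed heuristic, not legal guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLicense:
    source: str                 # hypothetical upstream source name
    license_id: str             # hypothetical license label, not a real identifier
    requires_attribution: bool
    share_alike: bool


def summarize_obligations(sources):
    """Collect which sources need attribution and which are share-alike."""
    attribution = sorted(s.source for s in sources if s.requires_attribution)
    share_alike_sources = [s for s in sources if s.share_alike]
    share_alike_ids = {s.license_id for s in share_alike_sources}
    return {
        "attribution": attribution,
        "share_alike": sorted(s.source for s in share_alike_sources),
        # Assumed heuristic: mixing different share-alike licenses
        # is a signal to involve legal expertise.
        "needs_legal_review": len(share_alike_ids) > 1,
    }


# Hypothetical inputs for illustration only.
upstream = [
    SourceLicense("city_addresses", "custom-agreement", True, False),
    SourceLicense("community_map", "share-alike-A", True, True),
    SourceLicense("imagery_buildings", "share-alike-B", False, True),
]
summary = summarize_obligations(upstream)
```

Even a simple inventory like this makes it easier to see where attribution text is owed and where license compatibility needs a closer look.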
The Scale and Cost of Data
Data, particularly geospatial data, is massive. Our November 2024 release alone was approximately half a terabyte. This scale presents significant challenges:
- Storage and compute costs can run into millions of dollars annually.
- Regular processing of large datasets requires substantial infrastructure.
- Weekly update cycles demand robust technical infrastructure.
As we like to say, "Open data is free...like a free puppy." The ongoing maintenance and infrastructure costs mean open data projects need stable sources of funding to be sustainable.
Data Requires a Factory Approach
Map data requires a continuous production approach instead of incremental development, including:
- Regular updates to reflect real-world changes
- Consistent quality assurance and validation processes
- Conflation of multiple data sources into cohesive datasets
- Industry-grade release management
This "factory" model demands different organizational structures and workflows compared to traditional open source projects. It can also create significant efficiencies, since the work is done once rather than redundantly by many players in the ecosystem. At Overture, we’ve built workflows that operate like a data “factory,” ensuring consistent quality and timely updates to meet user expectations. In addition, members of open data projects may want the project to assume responsibility for some production roles, freeing member resources to focus on new data creation and architecture.
Quality Assurance
Data describes something real. Therefore, accuracy (or an estimate of accuracy) is important when using data. Unlike code, where competing versions can coexist, open data projects must resolve conflicting data to avoid disseminating inaccuracies. This requires proactive quality standards and processes to evaluate data contributions.
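One way to picture this resolution step is a rule that keeps a single value per feature attribute and records which source it came from. The sketch below is illustrative only: the `Observation` fields, the source names, and the numeric `confidence` score are assumptions made for this example, not a description of Overture's actual process, and a highest-confidence-wins rule is just one of many possible policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    feature_id: str    # hypothetical identifier for a real-world feature
    attribute: str     # e.g. "height" or "name"
    value: str
    source: str        # hypothetical contributing source
    confidence: float  # assumed score between 0.0 and 1.0


def resolve_conflicts(observations):
    """Keep the highest-confidence observation per (feature, attribute).

    On a tie, the observation seen first is kept.
    """
    best = {}
    for obs in observations:
        key = (obs.feature_id, obs.attribute)
        current = best.get(key)
        if current is None or obs.confidence > current.confidence:
            best[key] = obs
    return best


# Hypothetical inputs: two sources disagree about a building's height.
observations = [
    Observation("b1", "height", "12", "source_a", 0.6),
    Observation("b1", "height", "15", "source_b", 0.9),
    Observation("b1", "name", "Town Hall", "source_a", 0.8),
]
resolved = resolve_conflicts(observations)
```

Because each surviving `Observation` still carries its `source` and `confidence`, downstream users get an accuracy estimate alongside the value rather than a bare number.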
Managing Personal Identifying Information
Contributed data may contain personal identifying information (PII), posing significant ethical and legal risks. Open data projects must have clear policies to manage PII and ensure that future datasets are free from such issues.
Open Data Projects Also Build Code
Running an open data project often involves substantial coding efforts. From pipelines to schema matching and provenance tracking, code plays a vital role in the workflows that sustain open data initiatives.
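As a rough illustration of what schema matching and provenance tracking can look like in code, the sketch below renames source-specific fields into a shared schema and notes where each record came from. The field names, source names, and the `FIELD_MAPS` table are hypothetical and do not reflect Overture's actual schema or pipelines.

```python
# Hypothetical per-source mappings from source field names to common field names.
FIELD_MAPS = {
    "source_a": {"bldg_name": "name", "ht_m": "height"},
    "source_b": {"title": "name", "height_meters": "height"},
}


def to_common_schema(record, source):
    """Rename known fields to the common schema and attach provenance."""
    mapping = FIELD_MAPS[source]
    out = {
        target: record[field]
        for field, target in mapping.items()
        if field in record
    }
    out["provenance"] = {
        "source": source,
        # Fields with no mapping are listed so they can be reviewed, not silently lost.
        "unmapped_fields": sorted(set(record) - set(mapping)),
    }
    return out


# Hypothetical input record for illustration only.
raw = {"title": "Town Hall", "height_meters": 15, "internal_id": 7}
normalized = to_common_schema(raw, "source_b")
```

Keeping the mapping in data rather than in code means a new source can be onboarded by adding one entry to the table, assuming its fields line up with the common schema.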
The Future: Where Code Meets Data
AI is reshaping our digital landscape, blurring the lines between open data and open source code. AI systems rely on vast datasets for training, making high-quality open data critical for innovation.
At Overture, we know that running an open data project requires elements of both worlds:
- Code for data pipelines and processing
- Data management and quality control
- Community engagement and contribution frameworks
- Sustainable funding models
Moving Forward
The open data movement represents an exciting frontier in collaborative innovation, but it requires its own playbook. As we continue to build and maintain open map data at Overture, we're developing new models for collaboration, data sharing, and sustainable operations.
The lessons we're learning extend beyond mapping; they apply to any organization working with open data at scale. As AI and machine learning transform our digital landscape, the importance of well-managed, accessible, and high-quality open data will only grow. We invite the Linux Foundation community to join us in advancing this exciting frontier.
Here are a few ways you can contribute:
- Share your expertise: Do you have experience with licensing, data infrastructure, or managing collaborative projects? Join the conversation and share your insights to help shape best practices for open data.
- Collaborate on solutions: Partner with us to develop innovative models for data sharing, sustainability, and quality assurance.
- Develop and promote best practices: This is especially critical for licensing, where well-chosen licenses can streamline data integration.
- Spread the word: Help us raise awareness about the importance of open data by sharing this blog with your network.
- Get involved: Visit Overture Maps Foundation to learn more about our work, and join our community by becoming a member.
- Stay connected: Follow us on LinkedIn and X for the latest updates and discussions. Sign up for Overture’s monthly newsletter and be part of a growing community.
- Propose a discussion or session: Have an idea for a panel or workshop about open data? Let’s collaborate to bring this topic to Linux Foundation events.
Conclusion
The open data movement is an exciting frontier for collaborative innovation and requires a playbook distinct from open source software. At Overture, we’re learning, adapting, and sharing as we go. By working together, we can unlock the full potential of open data and shape a more collaborative future.
The basis of this post is a presentation by Marc Prioleau, executive director of Overture Maps Foundation. For more information about Overture Maps Foundation and our work, visit overturemaps.org.