Imagine you have created an open source project that has become incredibly popular.  Thousands, if not millions, of developers worldwide, rely on the lines of code that you wrote. You have become an accidental hero of that community — people love your code, contribute to improving it, requesting new features, and encouraging others to use it. Life is amazing, but with great power and influence comes great responsibility.

When code is buggy, people complain. When performance issues crop up in large scale implementations, it needs to be addressed. When security vulnerabilities are discovered — because no code or its dependencies are always perfect — they need to be remediated quickly to keep your community safe.  

To help open source projects better address some of the responsibilities tied to security, many communities hosted by the Linux Foundation have invested countless hours, resources, and code into some important efforts. We’ve worked to improve the security of the Linux kernel, hosted Let’s Encrypt and sigstore, helped steward the ISO standardization for SPDX, and brought together a community building metrics for OSS health and risk through the CHAOSS project — among many others.

Today, we are taking steps with many leading organizations around the world to enhance the security of software supply chains. The Linux Foundation has raised $10 million in new investments to expand and support the Open Source Security Foundation (OpenSSF) and its initiatives. This cross-industry collaboration brings together an ecosystem to collectively identify and fix cybersecurity vulnerabilities in open source software and develop improved tooling, training, research, best practices, and vulnerability disclosure practices. We are also proud to announce that open source luminary, Brian Behlendorf, will serve the OpenSSF community as General Manager. 

Financial commitments for OpenSSF include Premier members such as AWS, Cisco, Dell Technologies, Ericsson, Facebook, Fidelity, GitHub, Google, IBM, Intel, JPMorgan Chase, Microsoft, Morgan Stanley, Oracle, Red Hat, Snyk, and VMware. Additional commitments come from General members, including Aiven, Anchore, Apiiro, AuriStor, Codethink, Cybertrust, Deepfence, Devgistics, DTCC, GitLab, Goldman Sachs, JFrog, Nutanix, StackHawk, Tencent, TideLift, and Wind River.

To learn more about how to join the OpenSSF or to get involved in one of its six working groups, listen in to this brief introduction from Brian Behlendorf recorded this week at KubeCon:

In 2021, the Linux Foundation and its community will continue to support education and share resources critical to improving open source cybersecurity.  For example, this week, we also hosted SupplyChainSecurityCon, where the SLSA and sigstore projects were heavily featured.

If you are an open source software developer, user, or other community participant who just wants to help further protect the software that accelerates innovation around the world, please consider joining one of our six OpenSSF working groups, or suggest a new working group that addresses gaps in software supply chain security needs.

You can follow the latest news from OpenSSF here on our blog, Twitter (@TheOpenSSF), and LinkedIn.

Background

The Academy Software Foundation (ASWF), a project hosted by The Linux Foundation, provides a neutral forum for open source software developers in the motion picture and broader media industries to share resources and collaborate on image creation, visual effects, animation, and sound technologies. 

It was created in 2018 after the conclusion of an investigation by the Academy of Motion Pictures Arts and Sciences (AMPAS) Science and Technology Council holding an 18-month investigation on the state of open source in the industry. This aligned with the need for a vendor-neutral foundation to provide a sustainable home for open source projects that are key to the growth of the industry.

Identifying the need for exemplar assets for community use

As of August 2021, The Academy Software Foundation provides a home for Open Shading Language, OpenColorIO, OpenCue, OpenEXR, OpenTimelineIO, OpenVDB, and MaterialX.

As these projects have progressed in development, there was a need identified to have production-grade digital assets (e.g.,3D scene data, images, image sequences, volumetric data, animation rigs, edit decision lists) available for use in development and testing environments to ensure these projects can scale to the demands of the movie and content creation processes. 

Furthermore, the ASWF identified an additional need to have production-grade assets for general research and learning purposes. 

The ASWF identified two objectives to address these requirements:

  • Provide a vendor-neutral home for both homing the assets and being a curator for exemplar assets that would align with the industry needs.
  • Create a licensing framework striking a balance between the needs in research, learning, and open source development, with the intellectual property concerns of production-grade assets (as they often come from real productions).

An open community comes together

There was some precedent in the industry, with the 2018 release of the Moana Island Scene by Disney Animation. This sparked several discussions in the industry on how to have a larger set of similar assets available for community use leading to the creation of an Asset Repository Working Group at the Academy Software Foundation in 2020.

The culmination of this working group came in July 2021, with the transition of the working group to a formal project that will establish the infrastructure and governance of the Assets Repository. The intention is for the project to function and work like any other open source project, with full transparency and community participation, to identify and curate exemplar assets. 

At the same time, the legal counsel across Academy Software Foundation members came together to align on the ASWF Digital Assets License, which was created in the spirit of licenses used previously in the industry and designed to specifically ensure these assets can be used for education, learning, research, and open source development. The ASWF Digital Assets License helped create a bridge between producers and consumers of these assets, establishing standardized terms to enable collaboration and the re-use of content in an industry where it had previously been limited.

As of August 2021, there is interest from multiple organizations in contributing assets to this repository as it takes form over the next few months.

Conclusion

The Linux Foundation has been the home for vendor-neutral collaboration in both horizontal technology spaces and vertical markets such as automotive, networking, energy, and here motion pictures. In supporting over 750 open source projects, we are starting to see more and more efforts such as these where the collaboration outside of traditional software development and into educational materials, community development, and standards. The Assets Repository project at the Academy Software Foundation is a great example of the unique collaboration opportunities that open source brings and are driven by our open communities.

We’re pleased to announce that Michael Cheng joined the Linux Foundation Board of Directors earlier this year. Michael is a product manager at Facebook, currently supporting open source and standards work across the company. Michael is a former network engineer and M&A attorney. He previously led the product, commercial, and intellectual property functions on Facebook’s M&A legal team.

Michael has built some of the world’s most valuable and innovative open source ecosystems, representing billions of dollars of value, including GraphQL, Magma, Diem, ML Commons, and many others.

In 2018, Michael helped design the Joint Development Foundation — a lightweight, turnkey solution for the development of technology standards and specifications. Michael then brought in GraphQL as the JDF’s first project. GraphQL now powers trillions of API calls every day for some of the world’s largest companies.

Michael Cheng

Michael was one of the founding members of ML Commons, an industry-wide consortium that aims to unlock the next stage of AI/ML adoption by creating useful measures of quality and performance, large-scale open data sets, and common development practices and resources. Michael served as ML Commons’ first treasurer, and it has since grown to more than 50 members and affiliates representing a broad cross-section of the ML ecosystem.

This year, Michael created the Magma Foundation, the first open source platform that enables telecom operators to build modern and efficient mobile networks at scale. Michael now chairs the board of the Magma Foundation — growing its ranks to more than 20 members this year.

Michael is also a champion of diversity. Late last year, at the height of the pandemic, Michael designed and launched the Major League Hacking (MLH) Fellowship program to address challenges faced by both early-career developers who saw many of their job and internship opportunities disappear open source maintainers struggling to keep projects afloat. The Fellowship has been effective at helping students land desirable jobs while increasing the aggregate health of the open source projects that participate in the program. Michael also launched the Black Developer Scholarship for developers who self-identify as Black or African diaspora to participate in the Fellowship.

Michael has also played an integral role in the creation of the Presto Foundation, eBPF Foundation, Ent Foundation, Reactive Foundation, Urban Computing Foundation, and OpenChain.

“Michael is one of the rare breeds of lawyers who possess both a strong technical background and a sharp mind for process improvement.  His leadership at Facebook has made a meaningful impact within the OpenChain project and beyond.  I warmly welcome him to the Linux Foundation board.”

Dave Marr, Vice President, Legal Counsel at Qualcomm Technologies

“Facebook is built on top of open source and has shown a strong commitment to investing back into the communities from which we all benefit. Micheal’s legal background and technical knowledge make him an ideal member of the Linux Foundation board. His leadership is just another example of Facebook’s commitment to open source and collective innovation.” 

Jim Zemlin, Executive Director, Linux Foundation

“Successful open source work requires an intersection of legal, business, technical, and community thinking and Michael brings all those skills in one very integrated way.  And his perspectives from his experience shepherding multiple open source projects at scale and in production is of great value to the Linux Foundation board. I am excited to welcome him to the board and to work with him on advancing open source innovation.” 

Nithya Ruff – Chair, Linux Foundation Board of Directors, Head, Comcast Open Source Program Office

“Michael’s role in growing some of the Linux Foundation’s most valuable communities cannot be understated. He brings a level of technical depth, legal acumen, and industry credibility that has been instrumental in stitching together novel coalitions of companies, NGOs, and individuals into dynamic and sustainable communities. We’re thrilled to have him on the board.”

Chris Aniszczyk, CTO, CNCF

“Michael’s talents, skills, and experience have been brought to bear at Facebook to transform the company’s identity in the open source software community. His leadership, vision and understanding of the importance of collaboration and the development of consensus in the legal and technical communities of important projects have made Facebook a key driver in open source.”

Keith Bergelt, CEO, Open Invention Network

Backed by many of the world’s largest companies for more than a decade, SPDX formally becomes an internationally recognized ISO/IEC JTC 1 standard during a transformational time for software and supply chain security

SAN FRANCISCO, September 9, 2021 – The Linux Foundation, Joint Development Foundation, and the SPDX community, today announced the Software Package Data Exchange® (SPDX®) specification has been published as ISO/IEC 5962:2021 and recognized as the international open standard for security, license compliance, and other software supply chain artifacts. ISO/IEC JTC 1 is an independent, non-governmental standards body. 

Intel, Microsoft, Siemens, Sony, Synopsys, VMware, and WindRiver are just a small sample of the companies already using SPDX to communicate Software Bill of Materials (SBOM) information in policies or tools to ensure compliant, secure development across global software supply chains. 

“SPDX plays an important role in building more trust and transparency in how software is created, distributed, and consumed throughout supply chains. The transition from a de-facto industry standard to a formal ISO/IEC JTC 1 standard positions SPDX for dramatically increased adoption in the global arena,” said Jim Zemlin, executive director, the Linux Foundation. “SPDX is now perfectly positioned to support international requirements for software security and integrity across the supply chain.” 

Between eighty and ninety percent (80%-90%) of a modern application is assembled from open source software components. An SBOM accounts for the software components contained in an application — open source, proprietary, or third-party — and details their provenance, license, and security attributes. SBOMs are used as a part of a foundational practice to track and trace components across software supply chains. SBOMs also help to proactively identify software issues and risks and establish a starting point for their remediation.

SPDX results from ten years of collaboration from representatives across industries, including the leading Software Composition Analysis (SCA) vendors – making it the most robust, mature, and adopted SBOM standard. 

“As new use cases have emerged in the software supply chain over the last decade, the SPDX community has demonstrated its ability to evolve and extend the standard to meet the latest requirements. This really represents the power of collaboration on work that benefits all industries,” said Kate Stewart, SPDX tech team co-lead. “SPDX will continue to evolve with open community input, and we invite everyone, including those with new use cases, to participate in SPDX’s evolution and securing the software supply chain.”  

For more information on how to participate in and benefit from SPDX, please visit: https://spdx.dev.

To learn more about how companies and open source projects are using SPDX, recordings from the “Building Cybersecurity into the Software Supply Chain” Town Hall that was held on August 18th are available and can be viewed at: https://events.linuxfoundation.org/supply-chain-town-hall/ 

ISO/IEC JTC 1 is an independent, non-governmental international organization based in Geneva, Switzerland. Its membership represents more than 165 national standards bodies with experts who share knowledge and develop voluntary, consensus-based, market-relevant international standards that support innovation and provide solutions to global challenges.

Supporting Comments

Intel

“Software security and trust are critical to our Industry’s success. Intel has been an early participant in the development of the SPDX specification and utilizes SPDX both internally and externally for a number of software use-cases,” said Melissa Evers, Vice President – Software and Advanced Technology Group, General Manager of Strategy to Execution, Intel.

Microsoft

“Microsoft has adopted SPDX as our SBOM format of choice for software we produce,” says Adrian Diglio, Principal Program Manager of Software Supply Chain Security at Microsoft. “SPDX SBOMs make it easy to produce U.S. Presidential Executive Order compliant SBOMs, and the direction that SPDX is taking with the design of their next gen schema will help further improve the security of the software supply chain.”

Siemens

“With ISO/IEC 5962:2021 we have the first official standard for metadata of software packages. It’s natural that SPDX is that standard, as it’s been the de facto standard for a decade. This will make license compliance in the supply chain much easier, especially because several open source tools like FOSSology, ORT, scancode, and sw360 already support SPDX,” said Oliver Fendt, senior manager, open source at Siemens. 

Sony

”The Sony team uses various approaches to managing open source compliance and governance,” says Hisashi Tamai, Senior Vice President, Deputy President of R&D Center, Representative of the Software Strategy Committee, Sony Group Corporation. “An example is the use of an OSS management template sheet that is based on SPDX Lite, a compact subset of the SPDX standard. It is important for teams to be able to quickly review the type, version, and requirements of software, and using a clear standard is a key part of this process.”

Synopsys

“The Black Duck team from Synopsys has been involved with SPDX since its inception, and I personally had the pleasure of coordinating the activities of the project’s leadership for more than a decade. Representatives from scores of companies have contributed to the important work of developing a standard way of describing and communicating the content of a software package,” said Phil Odence, General Manager, Black Duck Audits.

VMware

“SPDX is the essential common thread among tools under the Automating Compliance Tooling (ACT) Umbrella. SPDX enables tools written in different languages and for different software targets to achieve coherence and interoperability around SBOM production and consumption. SPDX is not just for compliance, either; the well-defined and ever-evolving spec is also able to represent security and supply chain implications. This is incredibly important for the growing community of SBOM tools as they aim to thoroughly represent the intricacies of modern software,” said Rose Judge, ACT TAC Chair and open source engineer at VMware.

Wind River

“The SPDX format greatly facilitates the sharing of software component data across the supply chain. Wind River has been providing a Software Bill of Materials (SBOM) to its customers using the SPDX format for the past 8 years. Often customers will request SBOM data in a custom format. Standardizing on SPDX has enabled us to deliver a higher quality SBOM at a lower cost,” said Mark Gisi, Wind River Open Source Program Office Director and OpenChain Specification Chair.

About SPDX

SPDX is an open standard for communicating software bill of material information, including provenance, license, security, and other related information. SPDX reduces redundant work by providing common formats for organizations and communities to share important data, thereby streamlining and improving compliance, security, and dependability. For more information, please visit us at spdx.org.

###

The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our trademark usage page:  https://www.linuxfoundation.org/trademark-usage. Linux is a registered trademark of Linus Torvalds.

Media Contact

Jennifer Cloer

for the Linux Foundation

503-867-2304

jennifer@storychangesculture.com

Today, the Linux Foundation announced that Ent, an entity framework for Go that was developed and open sourced by Facebook in 2019, has moved under the governance of the Linux Foundation to help accelerate its development and foster the community of developers and companies using it.

Ent was designed to enable developers to work on complex backend applications. Developers working on these applications faced the challenge of maintaining a codebase used to manage hundreds of different entity types with numerous, complex relationships between them. Ent uses graph concepts to model an application’s schema and employs advanced code-generation techniques to create type-safe, efficient code that greatly simplifies working with databases compared to other approaches.

Ent is similar to traditional ORMs (Object-Relational Mappers) but takes an opinionated approach that is especially effective in improving developer productivity. 

  • First, schemas are modeled in graph concepts (nodes and edges) instead of the more common table-oriented method that makes traversing through datasets and expressing complex queries easier and less error-prone. 
  • Second, the code generated by Ent is completely type-safe, which means that many classes of common bugs are caught very early on in the development process. In addition, code editing software can understand Ent code very well to offer developers useful hints and feedback as they are typing code. 
  • Finally, schemas are defined in actual Go code, which facilitates a very rich feature set ranging from integrations with observability systems to the definition of privacy (authorization) rules right at the data-access layer. 

“From the start it was obvious that Ent would present a unique and compelling value proposition to a diverse range of use cases across any industry with complex technology stacks,” said Ariel Mashraki, Ent’s creator and lead maintainer. “The promise of collaborating with a broad coalition of users was the main reason we open-sourced Ent.” 

Since it was open-sourced in 2019, engineers from many leading companies have contributed code to Ent, including Facebook, GitHub, Mail.ru, Scaleway and VirtaHealth. Ent has also been used by the CNCF projects and by other open source ecosystems. Ariel Mashraki recently started a new company, Ariga, to create a data fabric solutions provider that is built on Ent. “With the move to the Linux Foundation’s neutral governance model, we (on behalf of myself and the rest of the Ent maintainers) hope to double-down on growing Ent into the industry standard for data-access in Go. You should expect to see a lot of exciting developments in the next six months from the community and we invite all to participate,” said Mashraki.

Ent is just the latest in a variety of technologies that Facebook has first open sourced to the public and then transferred control to the community. “This additional step of enabling open source contributors to take direct ownership of a project’s technical vision is part of our longstanding commitment to open and sustainable innovation,” said Michael Cheng, product manager at Facebook. “Enabling a project’s maintainers to chart their course often sparks additional investment, contributions and new companies building products and platforms based on that project, for example, GraphQL, Presto, ONNX, and Magma, to name a few. We see that Ent is already following a similar pattern and we’ll be cheering on the Ent community as it enters this next stage of exciting growth.”


You can learn more about Ent framework for Go, sample the technology, and contribute back to the project at https://github.com/ent/ent.

Open source software (OSS) is vitally important to the functioning of society today; it underpins much of the global economy. However, some OSS is highly secure, while others are not as secure as they need to be.

By its very nature, open source enables worldwide peer review, yet while its transparency has the potential for enhanced software security, that potential isn’t always realized. Many people are working to improve things where it’s needed. Most of that work is done by volunteers or organizations outside the Linux Foundation (LF) who directly pay people to do the work (typically as employees). Often those people work together within a foundation that’s part of the Linux Foundation. Sometimes, however, the LF or an LF foundation/project (e.g., a fund) directly funds people to do security work.

At the Linux Foundation (LF), I have the privilege of overseeing focused work to improve OSS security by the very people paid to do it. This work is funded through various grants and foundations, with credits to organizations like Google, Microsoft, the Open Source Security Foundation (OpenSSF), the LF Public Health foundation, and the LF itself.

The LF and its foundations do much more that I don’t oversee, so I’ve only listed the ones I am personally involved with in the interest of brevity. I hope it will give you a sense of some of the things we’re doing that you might not know about otherwise.

The typical LF oversight process for this work is described in “Post-Approval LF Security Funding.” Generally, performers must provide a periodic summary of their work so they can get paid. Most of those summaries are public, and in those cases, it’s easy for others to learn about their interesting work!

Here’s a sample of the work I oversee:

  • Ariadne Conill is improving Alpine Linux security, including significant improvements to its vulnerability processing and making it reproducible. For example, as noted in the July 2021 report, this resulted in Alpine 3.14 being released with the lowest open vulnerability count in the final release in a long time. Alpine Linux’s security is important because many containers use it. For more information, see “Bits relating to Alpine security initiatives in June” and “Bits relating to Alpine security initiatives in July.”
  • kpcyrd is doing a lot of reproducible build work on Linux distributions, especially Alpine Linux (including on the Raspberry Pi) and Arch Linux. Reproducible builds are a strong countermeasure against build system attacks (such as the devastating attack on SolarWinds Orion). More than half of the currently unreproducible packages in Arch Linux have now been reviewed and classified.
  • David Huseby has been working on modifying git to have a much more flexible cryptographic signing infrastructure. This will make it easier to verify the integrity of software source code; git is widely used to manage source code.
  • Theo de Raadt has also been receiving funding to secure the critical “plumbing” behind modern communications infrastructure:
    • This funding is being used towards improving OpenSSH (a widely-used tool whose security is critical). These include various smaller improvements, an updated configuration file parser, and a transition to using the SFTP protocol rather than the older RCP protocol inside the scp(1) program.
    • It is also being used to improve rpki-client, implementing Resource Public Key Infrastructure (RPKI). RPKI is an important protocol for protecting the Internet’s routing protocols from attack. These improvements implement the RPKI Repository Delta Protocol (RRDP) data transfer protocol and fix various edge cases (e.g., through additional validation checks). The https://irrexplorer.nlnog.net/ service is even using rpki-client behind the scenes.
  • Nathan Chancellor is improving the Linux kernel’s ability to be compiled with clang (instead of just gcc). This includes eliminating warning messages from clang (which helps to reduce kernel bugs even when gcc is used) and fixing/extending the clang compiler (which helps clang users when compiling code other than the Linux kernel). Unsurprisingly this involves changing both the Linux kernel and the clang/LLVM compiler infrastructure, and sometimes other software as well.
    • In the long run, eliminating warnings that by themselves aren’t bugs is important; developers will ignore warnings if there are many irrelevant ones, but if there are only a few warnings, they’ll examine them (making warnings more useful).
    • Of notable mention for security implications is clang support for Control-Flow Integrity (CFI); this can counter many attacks on arm64, and work will eventually enable x86_64 support.
  • I oversee some security audits conducted via the Open Source Technology Improvement Fund (OSTIF) when funded through the LF. We (the LF) often work with OSTIF to conduct security audits. We work with OSTIF to define the audit scope, and then OSTIF runs a bidding process where qualified security audit firms propose to do the work. We then work with OSTIF to select the winner (who isn’t always the cheapest — we want good work, not a box-check). OSTIF & I then oversee the process and review the final result. 
    • Note that we don’t just want to do audits, we also want to fix or mitigate any critical issues the audits identify, but the audits help us find the key problems. Subject matter experts perform the audit reports, and handling bidding is OSTIF’s primary focus, so my main contribution is usually to help ensure these reports are clear to non-experts while still being accurate. Experts sometimes forget to explain their context and jargon, and it’s sometimes hard to fix that (you must know the terminology & technology to explain it).
    • This work included two security audits related to the Linux kernel, one for signing and key management policies and the other for vulnerability reporting and remediation. 
    • I’ve also overseen audits of the exposure notification applications COVID Shield and COVID Green: 
    • It’s not part of my oversight of OSTIF on behalf of the LF, but I also informally talk with OSTIF about other OSS they’re auditing (such as flux2, lodash, jackson-core, jackson-databind, httpcomponents-core, httpcomponents-client, laravel, and slf4j). A little coordination and advice-sharing among experts can make everything better.

The future is hard to predict, but we anticipate that we will be doing more. In late July, the OpenSSF Technical Advisory Council (TAC) recommended approving funding for a security audit of (part of) Symfony, a widely-used web framework. The OpenSSF Governing Board (GB) approved this on 2021-08-05 and I expect OSTIF will soon take bids on it.

The OpenSSF is also taking steps to raise more money via membership dues (this was delayed due to COVID; starting a new foundation is harder during a pandemic). Once the OpenSSF has more money, we expect they’ll be funding a lot more work to identify critical projects, do security audits, fix problems, and improve or create projects to enhance OSS security. The future looks bright.

Please remember that this is only a small part of ongoing work to improve OSS security. Almost all LF projects need to be secure, so most foundations’ projects include security efforts not listed here. As noted earlier, most development work is done by volunteers or by non-LF organizations directly paying people to do the work (typically employees). 

The OpenSSF has several working groups and many projects where people are working together to improve OSS security. These include free courses on how to develop secure software and the CII Best Practices badge project. We (at the LF) also have many other projects working to improve OSS security. For example, sigstore is making cryptographic signatures much easier; sigstore’s “cosign” tool just released its version 1.0. Many organizations have recently become interested in software bill-of-materials (SBOMs), and we’ve been working on SBOMs for a long time.

If you or your organization would like to fund focused work on improving OSS security, please reach out! You can contribute to the OpenSSF (in general or as a directed fund); just contact them (e.g., Microsoft contributed to OpenSSF in December 2020). If you’d prefer, you can create a grant directly with the Linux Foundation itself — just email me at <dwheeler@linuxfoundation.org> if you have questions. For smaller amounts, say to fund a specific project, you can also consider using the LFX crowdfunding tools to fund or request funding. Many people & organizations struggle to pay individual OSS developers because of the need to handle taxes and oversight. If that’s your concern, talk to us. The LF has experience & processes to do all that, letting experts focus on getting the work done.

My sincere thanks to all the performers for their important work and to all the funders for their confidence in us!

About the author: David A. Wheeler is Director of Open Source Supply Chain Security for The Linux Foundation.

One of the greatest strengths of open source development is how it enables collaboration across the entire world. However, because open source development is a global activity, it necessarily involves making available software across national boundaries. Some countries’ export control regulations, such as the United States, may require taking additional steps to ensure that an open source project is satisfying obligations under local laws.

In July of 2020, The Linux Foundation published a whitepaper on how to address these issues in detail, which can be downloaded here. In 2021, the primary update in the paper is to reflect a change in the US Export Administration Regulations.

  • Previously, in order for publicly available encryption software under ECCN 5D002 to be not subject to the EAR, email notifications were required regardless of whether or not the cryptography it implemented was standardized.
  • Following the change, email notifications are only required for software that implements “non-standard cryptography”.

Please see the updated paper and the EAR for more specific details about this change.

A Software Bill of Materials (SBOM) is a complete, formally structured list of components, libraries, and modules required to build (i.e., compile and link) a given piece of software and the supply chain relationships between them. These components can be open source or proprietary, free or paid, and widely available or restricted access. SBOMs that can be shared without friction between teams and companies are a core part of software management for critical industries and digital infrastructure in the coming decades.

SBOMs are especially critical for a national digital infrastructure used within government agencies and in critical industries that present national security risks if penetrated. SBOMs would improve understanding of those software components’ operational and cyber risks from their originating supply chain.

This SBOM readiness survey is the Linux Foundation’s first project addressing how to secure the software supply chain. The foundation of this project is a worldwide survey of IT professionals who understand their organization’s approach to software development, procurement, compliance, or security.  Organizations surveyed will include both software producers and consumers. An important driver for this survey is the recent Executive Order on Cybersecurity, which focuses on producing and consuming SBOMs.

The objectives of the survey are as follows:

  • How concerned are organizations about software security?
  • How familiar are organizations with SBOMs?
  • How ready are organizations to consume and produce SBOMs?
  • What is your commitment to the timeline for addressing SBOMs?
  • What benefits do you expect to derive from SBOMs?
  • What concerns you about SBOMs?
  • What capabilities are needed in SBOMs?
  • What do organizations need to improve their SBOM operability?
  • How important are SBOMS relative to other ways to secure the software supply chain?

Data from this survey will enable the development of a maturity model that will focus on how the increasing value provided by SBOMs as organizations build out their SBOM capabilities.

The survey is available in seven languages:

  • English
  • Chinese
  • Japanese
  • Korean
  • German
  • French
  • Russian

To take the 2021 State of SBOM Readiness Survey, click the button for your desired language/region below:

BONUS

As a thank-you for your participation, you will receive a 20% registration discount to attend the Open Source Summit/Embedded Linux Conference event upon completion of the survey. Please note this discount is not transferable, and may not be combined with other offers.

PRIVACY

Personally identifiable information will not be published. Reviews are attributed to your role, company size, and industry. Responses will be subject to the Linux Foundation’s Privacy Policy, available at https://linuxfoundation.org/privacy. 

VISIBILITY

We will summarize the survey data and share the findings at the Open Source Summit/Embedded Linux Conference in September.

QUESTIONS

If you have questions regarding this survey, please email us at research@linuxfoundation.org. 

The Linux Foundation is pleased to announce the release of the CDLA-Permissive-2.0 license agreement, which is now available on the CDLA website at https://cdla.dev/permissive-2-0/. We believe that CDLA-Permissive-2.0 will meet a genuine need for a short, simple, and broadly permissive license agreement to enable wider sharing and usage of open data, particularly to bring clarity to the use of open data for artificial intelligence and machine learning models. 

We’re happy to announce that IBM and Microsoft are making data sets available today using CDLA-Permissive-2.0.

In this blog post, we’ll share some background about the original versions of the Community Data License Agreement (CDLA), why we worked with the community to develop the new CDLA-Permissive-2.0 agreement, and why we think it will benefit producers, users, and redistributors of open data sets.

Background: Why would you need an open data license agreement?

Licenses and license agreements are legal documents that define how content can be used, modified, and shared. They operate within the legal frameworks for copyrights, patents, and other rights that are established by laws and regulations around the world. These laws and regulations are not always clear and are not always in sync with one another.

Decades of practice have established a collection of open source software licenses and open content licenses that are widely used. These licenses typically work within the frameworks established by laws and regulations mentioned above to permit broad use, modification, and sharing of software and other copyrightable content in exchange for following the license requirements.

Open data is different. Various laws and regulations treat data differently from software or other creative content. Depending on what the data is and which country’s laws you’re looking at, the data often may not be subject to copyright protection, or it might be subject to different laws specific to databases, i.e., sui generis database rights in the European Union. 

Additionally, data may be consumed, transformed, and incorporated into Artificial Intelligence (AI) and Machine Learning (ML) models in ways that are different from how software and other creative content are used. Because of all of this, assumptions made in commonly-used licenses for software and creative content might not apply in expected ways to open data.

Choice is often a good thing, but too many choices can be problematic. To be clear, there are other licenses in use today for open data use cases. In particular, licenses and instruments from Creative Commons (such as CC-BY-4.0 and CC0-1.0) are used to share data sets and creative content. It was also important in drafting the CDLA agreements to enable collaboration with similar licenses. The CDLA agreements are in no way meant as a criticism of those alternatives, but rather the CDLA agreements are focused on addressing newer concerns born out of AI and ML use cases. AI and ML models generated from open data are the primary use case organizations have struggled with — CDLA was designed to address those concerns. Our goal was to strike a balance between updated choices and too many options.

First steps: CDLA version 1.0

Several years ago, in talking with members of the Linux Foundation member counsel community, we began collaborating to develop a license agreement that would clearly enable use, modification, and open data sharing, with a particular eye to AI and ML applications.

In October 2017, The Linux Foundation launched version 1.0 of the CDLA. The CDLA was intended to provide clear and explicit rights for recipients of data under CDLA to use, share and modify the data for any purpose. Importantly, it also explicitly permitted using the results from analyzed data to create AI and ML models, without any of the obligations that apply under the CDLA to sharing the data itself. It was launched with two initial types: a Permissive variant, with attribution-style obligations, and a Sharing variant, with a “copyleft”-style reciprocal commitment when resharing the raw data.

The CDLA-Permissive-1.0 agreement saw some amount of uptake and use. However, subsequent feedback revealed that some potential licensors and users of data under the CDLA-Permissive-1.0 agreement found it to be overly complex for non-lawyers to use. Many of its provisions were targeted at addressing specific and nuanced considerations for open data under various legal frameworks. While these considerations were worthwhile, we saw that communities may balance that specificity and clarity against the value of a concise set of easily comprehensible terms to lawyers and non-lawyers alike.

Partly in response to this, in 2019, Microsoft launched the Open Use of Data Agreement (O-UDA-1.0) to provide a more concise and simplified set of terms around the sharing and use of data for similar purposes. Microsoft graciously contributed stewardship of the O-UDA-1.0 to the CDLA effort. Given the overlapping scope of the O-UDA-1.0 and the CDLA-Permissive-1.0, we saw an opportunity to converge on a new draft for a CDLA-Permissive-2.0. 

Moving to version 2.0: Simplifying, clarifying, and making it easier

Following conversations with various stakeholders and after a review and feedback period with the Linux Foundation Member Counsel community, we have prepared and released CDLA-Permissive-2.0

In response to perceptions of CDLA-Permissive-1.0 as overly complex, CDLA-Permissive-2.0 is short and uses plain language to express the grant of permissions and requirements. Like version 1.0, the version 2.0 agreement maintains the clear rights to use, share and modify the data, as well as to use without restriction any “Results” generated through computational analysis of the data.

Unlike version 1.0, the new CDLA-Permissive-2.0 is less than a page in length.

  • The only obligation it imposes when sharing data is to “make available the text of this agreement with the shared Data,” including the disclaimer of warranties and liability. 

In a sense, you might compare its general “character” to that of the simpler permissive open source licenses, such as the MIT or BSD-2-Clause licenses, albeit specific to data (and with even more limited obligations).

One key point of feedback from users of the license and lawyers from organizations involved in Open Data were the challenges involved with associating attribution information with data (or versions of data sets). 

Although “attribution-style” provisions may be common in permissive open source software licenses, there was feedback that:

  • As data technologies continue to evolve beyond what the CDLA drafters might anticipate today, it is unclear whether typical ways of sharing attributions for open source software will fit well with open data sharing. 
  • Removing this as a mandated requirement was seen as preferable.

Recipients of Data under CDLA-Permissive-2.0 may still choose to provide attribution about the data sources. Attribution will often be important for appropriate norms in communities, and understanding its origination source is often a key aspect of why an open data set will have value. The CDLA-Permissive-2.0 simply does not make it a condition of sharing data.

CDLA-Permissive-2.0 also removes some of the more confusing terms that we’ve learned were just simply unnecessary or not useful in the context of an open data collaboration. Removing these terms enables the CDLA-Permissive-2.0 to present the terms in a concise, easy to read format that we believe will be appreciated by data scientists, AI/ML users, lawyers, and users around the world where English is not a first language.

We hope and anticipate that open data communities will find it easy to adopt it for releases of their own data sets.

Voices from the Community

“The open source licensing and collaboration model has made AI accessible to everyone, and formalized a two-way street for organizations to use and contribute to projects with others helping accelerate applied AI research. CDLA-Permissive-2.0 is a major milestone in achieving that type of success in the Data domain, providing an open source license specific to data that enables access, sharing and using data among individuals and organizations. The LF AI & Data community appreciates the clarity and simplicity CDLA-Permissive-2.0 provides.” Dr. Ibrahim Haddad, Executive Director of LF AI & Data 

“We appreciate the simplicity of the CDLA-Permissive-2.0, and we appreciate the community ensuring compatibility with Creative Commons licensed data sets.” Catherine Stihler, CEO of Creative Commons

“IBM has been at the forefront of innovation in open data sets for some time and as a founding member of the Community Data License Agreement. We have created a rich collection of open data sets on our Data Asset eXchange that will now utilize the new CDLAv2, including the recent addition of CodeNet – a 14-million-sample dataset to develop machine learning models that can help in programming tasks.” Ruchir Puri, IBM Fellow, Chief Scientist, IBM Research

“Sharing and collaborating with open data should be painless – and sharing agreements should be easy to understand and apply. We applaud the clear and understandable approach in the new CDLA-Permissive-2.0 agreement.” Jennifer Yokoyama, Vice President and Chief IP Counsel, Microsoft

“It’s exciting to see communities of legal and AI/ML experts come together to work on cross-organizational challenges to develop a framework to support data collaboration and sharing.” Nithya Ruff, Chair of the Board, The Linux Foundation and Executive Director, Open Source Program Office, Comcast

“Data is an essential component of how companies build their operations today, particularly around Open Data sets that are available for public use. At OpenUK, we welcome the CDLA-Permissive-2.0 license as a tool to make Open Data more available and more manageable over time, which will be key to addressing the challenges that organisations have coming up. This new approach will make it easier to collaborate around Open Data and we hope to use it in our upcoming work in this space.” Amanda Brock, CEO of OpenUK

“Verizon supports community efforts to develop clear and scalable solutions to legal issues around building artificial intelligence and machine learning, and we welcome the CDLA-Permissive-2.0 as a mechanism for data providers and software developers to work together in building new technology.” Meghna Sinha, VP – AI Center, Verizon

“Sony believes that the spread of clear and simple Open Data licenses like CDLA-2.0 activates Open Data ecosystem and contributes to innovation with AI. We support CDLA’s effort and hope CDLA will be used widely.” Hisashi Tamai, SVP, Sony Group Corporation

Data Sets Available under CDLA-Permissive-2.0

With today’s release of CDLA-Permissive-2.0, we are also pleased to announce several data sets that are now available under the new agreement. 

The IBM Center for Open Source Data and AI Technologies (CODAIT) will begin to re-license its public datasets hosted here using the CDLA-Permissive 2.0, starting with Project CodeNet, a large-scale dataset with 14 million code samples developed to drive algorithmic innovations in AI for code tasks like code translation, code similarity, code classification, and code search.

Microsoft Research is announcing that the following data sets are now being made available under CDLA-Permissive-2.0:

  • The Hippocorpus dataset, which comprises diary-like short stories about recalled and imagined events to help examine the cognitive processes of remembering and imagining and their traces in language;
  • The Public Perception of Artificial Intelligence data set, comprising analyses of text corpora over time to reveal trends in beliefs, interest, and sentiment about a topic;
  • The Xbox Avatars Descriptions data set, a corpus of descriptions of Xbox avatars created by actual gamers;         
  • A Dual Word Embeddings data set, trained on Bing queries, to facilitate information retrieval about documents; and
  • A GPS Trajectory data set, containing 17,621 trajectories with a total distance of about 1.2 million kilometers and a total duration of 48,000+ hours.

Next Steps and Resources

If you’re interested in learning more, please check out the following resources: