Author Archives: Google

Supply chain security for Go, Part 1: Vulnerability management

High profile open source vulnerabilities have made it clear that securing the supply chains underpinning modern software is an urgent, yet enormous, undertaking. As supply chains get more complicated, enterprise developers need to manage the tidal wave of vulnerabilities that propagate up through dependency trees. Open source maintainers need streamlined ways to vet proposed dependencies and protect their projects. A rise in attacks coupled with increasingly complex supply chains means that supply chain security problems need solutions on the ecosystem level.

One way developers can manage this enormous risk is by choosing a more secure language. As part of Google’s commitment to advancing cybersecurity and securing the software supply chain, Go maintainers are focused this year on hardening supply chain security, streamlining security information to our users, and making it easier than ever to make good security choices in Go.

This is the first in a series of blog posts about how developers and enterprises can secure their supply chains with Go. Today’s post covers how Go helps teams with the tricky problem of managing vulnerabilities in their open source packages.

Extensive Package Insights

Before adopting a dependency, it’s important to have high-quality information about the package. Seamless access to comprehensive information can be the difference between an informed choice and a future security incident from a vulnerability in your supply chain. Along with providing package documentation and version history, the Go package discovery site links to Open Source Insights. The Open Source Insights page includes vulnerability information, a dependency tree, and a security score provided by the OpenSSF Scorecard project. Scorecard evaluates projects on more than a dozen security metrics, each backed up with supporting information, and assigns the project an overall score out of ten to help users quickly judge its security stance (example). The Go package discovery site puts all these resources at developers’ fingertips when they need them most—before taking on a potentially risky dependency.

Curated Vulnerability Information

Large consumers of open source software must manage many packages and a high volume of vulnerabilities. For enterprise teams, filtering out noisy, low quality advisories and false positives from critical vulnerabilities is often the most important task in vulnerability management. If it is difficult to tell which vulnerabilities are important, it is impossible to properly prioritize their remediation. With granular advisory details, the Go vulnerability database removes barriers to vulnerability prioritization and remediation.

All vulnerability database entries are reviewed and curated by the Go security team. As a result, entries are accurate and include detailed metadata to improve the quality of vulnerability scans and to make vulnerability information more actionable. This metadata includes information on affected functions, operating systems, and architectures. With this information, vulnerability scanners can reduce the number of false positives using symbol information to filter out vulnerabilities that aren’t called by client code.

Consider the case of GO-2022-0646, which describes an unfixed vulnerability present in all versions of the package. It can only be triggered, though, if a particular, deprecated function is called. For the majority of users, this vulnerability is a false positive—but every user would need to spend time and effort to manually determine whether they’re affected if their vulnerability database doesn’t include function metadata. This amounts to enormous wasted effort that could be spent on more productive security efforts.

The Go vulnerability database streamlines this process by including accurate affected function level metadata for GO-2022-0646. Vulnerability scanners can then use static analysis to accurately determine if the project uses the affected function. Because of Go’s high quality metadata, a vulnerability such as this one can automatically be excluded with less frustration for developers, allowing them to focus on more relevant vulnerabilities. And for projects that do incorporate the affected function, Go’s metadata provides a remediation path: at the time of writing, it’s not possible to upgrade the package to fix the vulnerability, but you can stop using the vulnerable function. Whether or not the function is called, Go’s high quality metadata provides the user with the next step.

Entries in the Go vulnerability database are served as JSON files in the OSV format from vuln.go.dev. The OSV format is a minimal and precise industry-accepted reporting format for open source vulnerabilities that covers 16 ecosystems. OSV treats open source as a first class citizen by including information specific to open source, like git commit hashes. The OSV format ensures that the vulnerability information is both machine readable and easy for developers to understand. That means that not only are the database entries easy to read and browse, but that the format is also compatible with automated tools like scanners. Go provides such a scanner that intelligently matches vulnerabilities to Go codebases.
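
For a concrete sense of how tools consume these entries, here is a minimal sketch in Go that fetches a single entry and prints its affected import paths and symbols. It assumes entries are served as JSON at https://vuln.go.dev/ID/<id>.json and decodes only the handful of OSV fields it needs; it is an illustration, not part of the official tooling.

// Sketch: fetch one Go vulnerability database entry and print its affected
// import paths and symbols. Assumes entries are served at
// https://vuln.go.dev/ID/<id>.json; only the fields used here are decoded.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// osvEntry is a minimal subset of the OSV schema as used by the Go
// vulnerability database (ecosystem_specific.imports carries the symbols).
type osvEntry struct {
	ID       string `json:"id"`
	Affected []struct {
		EcosystemSpecific struct {
			Imports []struct {
				Path    string   `json:"path"`
				Symbols []string `json:"symbols"`
			} `json:"imports"`
		} `json:"ecosystem_specific"`
	} `json:"affected"`
}

func main() {
	resp, err := http.Get("https://vuln.go.dev/ID/GO-2022-0646.json")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var entry osvEntry
	if err := json.NewDecoder(resp.Body).Decode(&entry); err != nil {
		log.Fatal(err)
	}
	for _, aff := range entry.Affected {
		for _, imp := range aff.EcosystemSpecific.Imports {
			fmt.Printf("%s: %s affects symbols %v\n", entry.ID, imp.Path, imp.Symbols)
		}
	}
}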

Low noise, reliable vulnerability scanning

The Go team released a new command line tool, govulncheck, last September. Govulncheck does more than simply match dependencies to known vulnerabilities in the Go vulnerability database; it uses the additional metadata to analyze your project’s source code and narrow results to vulnerabilities that actually affect the application. This cuts down on false positives, reducing noise and making it easier to prioritize and fix issues.

You can run govulncheck as a command-line tool throughout your development process to see if a recent change introduced a new exploitable path. You can also run govulncheck directly from your editor using the latest VS Code Go extension. Users have even incorporated govulncheck into their CI/CD pipelines. Finding new vulnerabilities early can help you fix them before they’re in production.
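
As a hedged sketch of what a CI hook could look like, the example below simply shells out to govulncheck over a module and fails the step when findings are reported. It assumes the binary has been installed with go install golang.org/x/vuln/cmd/govulncheck@latest and is on PATH; treat the exit-code handling as an assumption to verify against the govulncheck documentation.

// Sketch: run govulncheck over a module in CI and fail the step on findings.
// Assumes govulncheck is installed and on PATH; the non-zero exit code on
// findings is an assumption of this sketch.
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("govulncheck", "./...")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		// Surface any findings (or tool failure) as a failed CI step.
		log.Fatalf("govulncheck reported findings or failed to run: %v", err)
	}
}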

The Go team has been collaborating with the OSV team to bring source analysis capabilities to OSV-Scanner through a beta integration with govulncheck. OSV-Scanner is a general purpose, multi-ecosystem, vulnerability scanner that matches project dependencies to known vulnerabilities. Go vulnerabilities can now be marked as “unexecuted” thanks to govulncheck’s analysis.

Govulncheck is under active development, and the team appreciates feedback from users. Go package maintainers are also encouraged to contribute vulnerability reports to the Go vulnerability database.

Additionally, you can report a security bug in the Go project itself, following the Go Security Policy. These may be eligible for the Open Source Vulnerability Rewards Program, which gives financial rewards for vulnerabilities found in Google’s open source projects. These contributions improve security for all users and reports are always appreciated.

Security across the supply chain

Google is committed to helping developers use Go software securely across the end-to-end supply chain, connecting users to dependable data and tools throughout the development lifecycle. As supply chain complexities and threats continue to increase, Go’s mission is to provide the most secure development environment for software engineering at scale.

Our next installment in this series on supply chain security will cover how Go’s checksum database can help protect users from compromised dependencies. Watch for it in the coming weeks!

Announcing the deps.dev API: critical dependency data for secure supply chains

Today, we are excited to announce the deps.dev API, which provides free access to the deps.dev dataset of security metadata, including dependencies, licenses, advisories, and other critical health and security signals for more than 50 million open source package versions.

Software supply chain attacks are increasingly common and harmful, with high profile incidents such as Log4Shell, Codecov, and the recent 3CX hack. The overwhelming complexity of the software ecosystem causes trouble for even the most diligent and well-resourced developers.

We hope the deps.dev API will help the community make sense of complex dependency data that allows them to respond to—or even prevent—these types of attacks. By integrating this data into tools, workflows, and analyses, developers can more easily understand the risks in their software supply chains.

The power of dependency data

As part of Google’s ongoing efforts to improve open source security, the Open Source Insights team has built a reliable view of software metadata across 5 packaging ecosystems. The deps.dev data set is continuously updated from a range of sources: package registries, the Open Source Vulnerability database, code hosts such as GitHub and GitLab, and the software artifacts themselves. It covers more than 5 million packages and over 50 million versions from the Go, Maven, PyPI, npm, and Cargo ecosystems.

We collect and aggregate this data and derive transitive dependency graphs, advisory impact reports, OpenSSF Security Scorecard information, and more. Where the deps.dev website allows human exploration and examination, and the BigQuery dataset supports large-scale bulk data analysis, this new API enables programmatic, real-time access to the corpus for integration into tools, workflows, and analyses.

The API is used by a number of teams internally at Google to support the security of our own products. One of the first publicly visible uses is the GUAC integration, which uses the deps.dev data to enrich SBOMs. We have more exciting integrations in the works, but we’re most excited to see what the greater open source community builds!

We see the API as being useful for tool builders, researchers, and tinkerers who want to answer questions like:

  • What versions are available for this package?
  • What are the licenses that cover this version of a package—or all the packages in my codebase?
  • How many dependencies does this package have? What are they?
  • Does the latest version of this package include changes to dependencies or licenses?
  • What versions of what packages correspond to this file?

Taken together, this information can help answer the most important overarching question: how much risk would this dependency add to my project?

The API can help surface critical security information where and when developers can act. This data can be integrated into:

  • IDE Plugins, to make dependency and security information immediately available.
  • CI/CD integrations to prevent rolling out code with vulnerability or license problems.
  • Build tools and policy engine integrations to help ensure compliance.
  • Post-release analysis tools to detect newly discovered vulnerabilities in your codebase.
  • Tools to improve inventory management and mystery file identification.
  • Visualizations to help you discover what your dependency graph actually looks like.

Unique features

The API has a couple of great features that aren’t available through the deps.dev website.

Hash queries

A unique feature of the API is hash queries: you can look up the hash of a file's contents and find all the package versions that contain that file. This can help figure out what version of which package you have even absent other build metadata, which is useful in areas such as SBOMs, container analysis, incident response, and forensics.

Real dependency graphs

The deps.dev dependency data is not just what a package declares (its manifests, lock files, etc.), but rather a full dependency graph computed using the same algorithms as the packaging tools (Maven, npm, Pip, Go, Cargo). This gives a real set of dependencies similar to what you would get by actually installing the package, which is useful when a package changes but the developer doesn’t update the lock file. With the deps.dev API, tools can assess, monitor, or visualize expected (or unexpected!) dependencies.
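
As a rough sketch of how a tool could pull one of these resolved graphs, the example below issues a GET request for the resolved dependencies of a single Maven package version and prints the raw JSON. The ":dependencies" path suffix is an assumption of this sketch, modeled on the versions endpoint shown in the example later in this post; check the API documentation for the exact request shape.

// Sketch: fetch the resolved dependency graph for one package version from
// the deps.dev API. The ":dependencies" suffix is an assumption; see the API
// documentation for the authoritative endpoint shape.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Maven packages are named "group:artifact"; the colon is URL-escaped,
	// matching the curl example elsewhere in this post.
	const u = "https://api.deps.dev/v3alpha/systems/maven/packages/log4j%3Alog4j/versions/1.2.17:dependencies"

	resp, err := http.Get(u)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body)) // raw JSON dependency graph
}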

API in action

For a demonstration of how the API can help software supply chain security efforts, consider the questions it could answer in a situation like the Log4Shell discovery:

  • Am I affected? - A CI/CD integration powered by the free API would automatically detect that a new, critical vulnerability is affecting your codebase, and alert you to act.
  • Where? - A dependency visualization tool pulling from the deps.dev API transitive dependency graphs would help you identify whether you can update one of your direct dependencies to fix the issue. If you were blocked, the tool would point you at the package(s) that are yet to be patched, so you could contribute a PR and help unblock yourself further up the tree.
  • Where else? - You could query the API with hashes of vendored JAR files to check if vulnerable log4j versions were unexpectedly hiding therein.
  • How much of the ecosystem is impacted? - Researchers, package managers, and other interested observers could use the API to understand how their ecosystem has been affected, as we did in this blog post about Log4Shell’s impact.

Getting started

The API service is globally replicated and highly available, meaning that you and your tools can depend on it being there when you need it.

It's also free and immediately available—no need to register for an API key. It's just a simple, unauthenticated HTTPS API that returns JSON objects:

# List the advisories affecting log4j 1.2.17
$ curl https://api.deps.dev/v3alpha/systems/maven/packages/log4j%3Alog4j/versions/1.2.17 \
        | jq '.advisoryKeys[].id'
"GHSA-2qrg-x229-3v8q"
"GHSA-65fg-84f6-3jq3"
"GHSA-f7vh-qwp3-x37m"
"GHSA-fp5r-v3w9-4333"
"GHSA-w9p3-5cr8-m3jj"

A single API call to list all the GHSA advisories affecting a specific version of log4j.

Check out the API Documentation to get started, or jump straight into the code with some examples.

Securing supply chains

Software supply chain security is hard, but it’s in all our interests to make it easier. Every day, Google works hard to create a safer internet, and we’re proud to be releasing this API to help do just that, and make this data universally accessible and useful to everyone.

We look forward to seeing what you might do with the API, and would appreciate your feedback. (What works? What doesn't? What makes it better?) You can reach us at [email protected], or by filing an issue on our GitHub repo.

OSV and the Vulnerability Life Cycle

It is an interesting time for everyone concerned with open source vulnerabilities. The U.S. Executive Order on Improving the Nation's Cybersecurity, with its requirements for vulnerability disclosure programs and assurances for software used by the US government, will go into effect later this year. Finding and fixing security vulnerabilities has never been more important, yet with increasing interest in the area, the vulnerability management space has become fragmented—there are a lot of new tools and competing standards.

In 2021, we announced the launch of OSV, a database of open source vulnerabilities built partially from vulnerabilities found through Google’s OSS-Fuzz program. OSV has grown since then and now includes a widely adopted OpenSSF schema and a vulnerability scanner. In this blog post, we’ll cover how these tools help maintainers track vulnerabilities from discovery to remediation, and how to use OSV together with other SBOM and VEX standards.

Vulnerability Databases

The lifecycle of a known vulnerability begins when it is discovered. To reach developers, the vulnerability needs to be added to a database. CVEs are the industry standard for describing vulnerabilities across all software, but the ecosystem lacked an open source centric database. As a result, several independent vulnerability databases exist across different ecosystems.

To address this, we announced the OSV Schema to unify open source vulnerability databases. The schema is machine readable, and is designed so dependencies can be easily matched to vulnerabilities using automation. The OSV Schema remains the only widely adopted schema that treats open source as a first class citizen. Since becoming a part of OpenSSF, the OSV Schema has seen adoption from services like GitHub, ecosystems such as Rust and Python, and Linux distributions such as Rocky Linux.

Thanks to such wide community adoption of the OSV Schema, OSV.dev is able to provide a distributed vulnerability database and service that pulls from language specific authoritative sources. In total, the OSV.dev database now includes 43,302 vulnerabilities from 16 ecosystems as of March 2023. Users can check OSV for a comprehensive view of all known vulnerabilities in open source.

Every vulnerability in OSV.dev contains package manager versions and git commit hashes, so open source users can easily determine whether their packages are affected using versioning schemes they already know. Maintainers also benefit from OSV’s community driven, distributed approach to developing its database, tools, and schema.
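
As a small illustration of how a tool can consult the database, the sketch below posts one package and version to the public api.osv.dev /v1/query endpoint and prints the IDs of any matching advisories. Only the fields needed here are decoded; the snippet is illustrative rather than part of OSV's official tooling.

// Sketch: ask OSV.dev which known vulnerabilities affect one package version
// via POST https://api.osv.dev/v1/query; only advisory IDs are decoded.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	query := map[string]any{
		"version": "1.2.17",
		"package": map[string]string{
			"name":      "log4j:log4j",
			"ecosystem": "Maven",
		},
	}
	body, err := json.Marshal(query)
	if err != nil {
		log.Fatal(err)
	}

	resp, err := http.Post("https://api.osv.dev/v1/query", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var result struct {
		Vulns []struct {
			ID string `json:"id"`
		} `json:"vulns"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		log.Fatal(err)
	}
	for _, v := range result.Vulns {
		fmt.Println(v.ID)
	}
}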

Matching

The next step in managing vulnerabilities is to determine project dependencies and their associated vulnerabilities. Last December we released OSV-Scanner, a free, open source tool which scans software projects’ lockfiles, SBOMs, or git repositories to identify vulnerabilities found in the OSV.dev database. When a project is scanned, the user gets a list of all known vulnerabilities in the project.

In the two months since launch, OSV-Scanner has seen positive reception from the community, including over 4,600 stars and 130 PRs from 29 contributors. Thank you to the community, which has been incredibly helpful in identifying bugs, supporting new lockfile formats, and helping us prioritize new features for the tool.

Remediation

Once a vulnerability has been identified, it needs to be remediated. Removing a vulnerability through upgrading the package is often not as simple as it seems. Sometimes an upgrade will break your project or cause another dependency to not function correctly. These complex dependency graph constraints can be difficult to resolve. We’re currently working on building features in OSV-Scanner to improve this process by suggesting minimal upgrade paths.

Sometimes, it isn’t even necessary to upgrade a package. A vulnerable component may be present in a project, but that doesn’t mean it is exploitable. For example, it may not be necessary to update a vulnerable component if it is never called. In cases like this, a VEX (Vulnerability Exploitability eXchange) statement can provide that justification and help teams prioritize remediation.

Manually generating VEX statements is time intensive and complex, requiring deep expertise in the project’s codebase and libraries included in its dependency tree. These costs are barriers to VEX adoption at scale, so we’re working on the ability to auto-generate high quality VEX statements based on static analysis and manual ignore files. The format for this will likely be one or more of the current emerging VEX standards.

Compatibility

Not only are there multiple emerging VEX standards (such as OpenVEX, CycloneDX, and CSAF), there are also multiple advisory formats (CVE, CSAF) and SBOM formats (CycloneDX, SPDX). Compatibility is a concern for project maintainers and open source users throughout the process of identifying and fixing project vulnerabilities. A developer may be obligated to use another standard and wonder if OSV can be used alongside it.

Fortunately, the answer is generally yes! OSV provides a focused, first-class experience for describing open source vulnerabilities, while providing an easy bridge to other standards.

CVE 5.0

The OSV team has directly worked with the CVE Quality Working Group on a key new feature of the latest CVE 5.0 standard: a new versioning schema that closely resembles OSV’s own versioning schema. This will enable easy conversion from OSV to CVE 5.0, and vice versa. It also enables OSV to contribute high quality metadata directly back to CVE, and drive better machine readability and data quality across the open source ecosystem.

Other emerging standards

Not all standards will convert as effortlessly as CVE to OSV. Emerging standards like CSAF are comparatively complicated because they support broader use cases. These standards often need to encode affected proprietary software, and CSAF includes rich mechanisms to express complicated nested product trees that are unnecessary for open source. As a result, the spec is roughly six times the size of OSV and difficult to use directly for open source.

OSV Schema's strong adoption shows that the open source community prefers a lightweight standard, tailored for open source. However, the OSV Schema maintains compatibility with CSAF for identification of packages through the Package URL and vers standards. CSAF records that use these mechanisms can be directly converted to OSV, and all OSV entries can be converted to CSAF.

SBOM and VEX standards

Similarly, all emerging SBOM and VEX standards maintain compatibility with OSV through the Package URL specification. OSV-Scanner already provides scanning support for the SPDX and CycloneDX SBOM standards.

OSV in 2023

OSV already provides straightforward compatibility with established standards such as CVE, SPDX, and CycloneDX. While it’s not clear yet which other emerging SBOM and VEX formats will become the standard, OSV has a clear path to supporting all of them. Open source developers and ecosystems will likely find OSV to be convenient for recording and consuming vulnerability information given OSV’s focused, minimal design.

OSV is not just built for open source, it is an open source project. We aim to build tools that will easily fit into your workflow and help you identify and fix vulnerabilities in your projects. Your input, through contributions, questions, and feedback, is very valuable to us as we work towards that goal. Questions can be asked by opening an issue and all of our projects (OSV.dev, OSV-Scanner, OSV-Schema) welcome contributors.


Want to keep up with the latest OSV developments? We’ve just launched a project blog! Check out our first major post, all about how VEX could work at scale.

The US Government says companies should take more responsibility for cyberattacks. We agree.

Should companies be responsible for cyberattacks? The U.S. government thinks so – and frankly, we agree.

Jen Easterly and Eric Goldstein of the Cybersecurity and Infrastructure Security Agency at the Department of Homeland Security planted a flag in the sand:

“The incentives for developing and selling technology have eclipsed customer safety in importance. […] Americans…have unwittingly come to accept that it is normal for new software and devices to be indefensible by design. They accept products that are released to market with dozens, hundreds, or even thousands of defects. They accept that the cybersecurity burden falls disproportionately on consumers and small organizations, which are often least aware of the threat and least capable of protecting themselves.”

We think they’re right. It’s time for companies to step up on their own and work with governments to help fix a flawed ecosystem. Just look at the growing threat of ransomware, where bad actors lock up organizations’ systems and demand payment or ransom to restore access. Ransomware affects every industry, in every corner of the globe – and it thrives on pre-existing vulnerabilities: insecure software, indefensible architectures, and inadequate security investment.

Remember that sophisticated ransomware operators have bosses and budgets too. They increase their return on investment by exploiting outdated and insecure technology systems that are too hard to defend. Alarmingly, the most significant source of compromise is through exploitation of known vulnerabilities, holes sometimes left unpatched for years. While law enforcement works to bring ransomware operators to justice, this merely treats the symptoms of the problem.


Treating the root causes will require addressing the underlying sources of digital vulnerabilities. As Easterly and Goldstein rightly point out, “secure by default” and “secure by design” should be table stakes.

The bottom line: People deserve products that are secure by default and systems that are built to withstand the growing onslaught from attackers. Safety should be fundamental: built-in, enabled out of the box, and not added on as an afterthought. In other words, we need secure products, not security products. That’s why Google has worked to build security in for our users, often making it invisible. Many of our most significant security features, including innovations like Safe Browsing, do their best work behind the scenes for our core consumer products.

There’s come to be an unfortunate belief that security features are cumbersome and hurt user experience. That can be true – but it doesn’t need to be. We can make the safe path the easiest, most helpful path for people using our products. Our approach to multi-factor authentication – one of the most important controls to defend against phishing attacks – provides a great example. Since 2021, we’ve turned on 2-Step Verification (2SV) by default for hundreds of millions of people to add an additional layer of security across their online accounts. If we had simply announced 2SV as an available option for people to enroll in, it would have failed like so many other security add-ons. Instead, we pioneered an approach using in-app notifications that was so seamless and integrated, many of the millions of people we auto-enrolled never noticed they adopted 2SV. We’ve taken this approach even further by building the “second factor” right into phones – giving people the strongest form of account security as soon as they have their device.

As for secure by design: We all have to shift our focus from reactive incident response to upstream software development. That will demand a completely new approach to how companies build products and services. We’ve learned a lot in the past decade about reengineering security architectures, and actively apply those learnings to keep people safe online every day. Ensuring technology is secure by design should be like balancing budgets — a part of business as usual. However, it isn’t easy to cut-and-paste solutions here: developers need to think deeply about the threats their products will face, and design them from the ground up to withstand those attacks. And the same principles are true for securing the development process as they are for users: the secure engineering choice must also be the easiest and most helpful one.

Building security into every stage of the software development process takes work, but recent innovations, like our SLSA framework for secure software supply chains, and new general purpose memory-safe languages, are making it easier. Perhaps most significantly, adopting modern cloud architectures makes it easier to define and enforce secure software development policies.

Persistent collaboration between private and public sector partners is essential. No company can solve the cybersecurity challenge on its own. It’s a collective action problem that demands a collective solution, including international coordination and collaboration. Many public and private initiatives — threat sharing, incident response, law enforcement cooperation — are valuable, but address only symptoms, not root causes. We can do better than just holding attackers to account after the fact.

As Easterly and Goldstein write, “Americans need a new model, one they can trust to ensure the safety and integrity of the technology that they use every hour of every day.” Again, we agree, but in this case we’d take it a step further. Building this model and ensuring it can scale calls for close cooperation between tech companies, standards bodies, and government agencies. But since technologies and companies cross borders, we also need to take a global view: Cybersecurity is a team sport, and international coordination is essential to avoid conflicting requirements that unintentionally make it harder to secure software. Broad regulatory cooperation on cybersecurity will promote secure-by-default principles for everyone. This approach holds enormous promise, and not just for technologically advanced nations. Raising the security benchmark for basic consumer and enterprise technologies that all nations rely on offers far more bang for the buck. A far wider range of countries and companies can take these simple steps than can employ advanced cyber initiatives like detailed threat sharing and close operational collaboration. Given the interdependent nature of the ecosystem, we are only as strong as our weakest link. That means raising cyber standards globally will improve American resilience as well.

Of course, raising the security baseline won’t stop all bad actors, and software will likely always have flaws – but we can start by covering the basics, fixing the most egregious security risks, and coming up with new approaches that eliminate entire classes of threats. Google has made investments in the past two decades, but contributing resources is just a piece of the puzzle. It's work for all of us, but it's the responsible thing to do: The safety and security of our increasingly digitized world depends on it.

Taking the next step: OSS-Fuzz in 2023

Since launching in 2016, Google's free OSS-Fuzz code testing service has helped get over 8,800 vulnerabilities and 28,000 bugs fixed across 850 projects. Today, we’re happy to announce an expansion of our OSS-Fuzz Rewards Program, plus new features in OSS-Fuzz and our involvement in supporting academic fuzzing research.

Refreshed OSS-Fuzz rewards

The OSS-Fuzz project's purpose is to support the open source community in adopting fuzz testing, or fuzzing — an automated code testing technique for uncovering bugs in software. In addition to the OSS-Fuzz service, which provides a free platform for continuous fuzzing to critical open source projects, we established an OSS-Fuzz Reward Program in 2017 as part of our wider Patch Rewards Program.

We’ve operated this successfully for the past 5 years, and to date, the OSS-Fuzz Reward Program has awarded over $600,000 to over 65 different contributors for their help integrating new projects into OSS-Fuzz.

Today, we’re excited to announce that we’ve expanded the scope of the OSS-Fuzz Reward Program considerably, introducing many new types of rewards!

These new reward types cover contributions such as:

  • Project fuzzing coverage increases
  • Notable FuzzBench fuzzer integrations
  • Integrating a new sanitizer (example) that finds two new vulnerabilities

These changes boost the total rewards possible per project integration from a maximum of $20,000 to $30,000 (depending on the criticality of the project). In addition, we’ve also established two new reward categories that reward wider improvements across all OSS-Fuzz projects, with up to $11,337 available per category.

For more details, see the fully updated rules for our dedicated OSS-Fuzz Reward Program.

OSS-Fuzz improvements

We’ve continuously made improvements to OSS-Fuzz’s infrastructure over the years and expanded our language offerings to cover C/C++, Go, Rust, Java, Python, and Swift, and have introduced support for new frameworks such as FuzzTest. Additionally, as part of an ongoing collaboration with Code Intelligence, we’ll soon have support for JavaScript fuzzing through Jazzer.js.
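
To make the idea of a fuzz target concrete, here is a minimal sketch using Go's native fuzzing, one of the languages OSS-Fuzz supports. ParseConfig is a hypothetical stand-in for real parsing code, and the target would live in a _test.go file.

// Sketch: a minimal Go native fuzz target of the kind OSS-Fuzz can run for
// Go projects. ParseConfig is a hypothetical stand-in for real parsing code.
package parser

import "testing"

// ParseConfig stands in for the code under test.
func ParseConfig(data []byte) error {
	// ... real parsing logic would live here ...
	return nil
}

func FuzzParseConfig(f *testing.F) {
	// Seed corpus: a couple of well-formed inputs to start from.
	f.Add([]byte(`key = "value"`))
	f.Add([]byte(""))

	f.Fuzz(func(t *testing.T, data []byte) {
		// The parser may reject malformed input with an error, but it should
		// never panic, whatever bytes the fuzzer supplies.
		_ = ParseConfig(data)
	})
}

A target like this can be exercised locally with go test -fuzz=FuzzParseConfig, and services such as OSS-Fuzz can then run it continuously with far larger compute budgets.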

FuzzIntrospector support

Last year, we launched the OpenSSF FuzzIntrospector tool and integrated it into OSS-Fuzz.

We’ve continued to build on this by adding new language support and better analysis, and now C/C++, Python, and Java projects integrated into OSS-Fuzz have detailed insights on how the coverage and fuzzing effectiveness for a project can be improved.

The FuzzIntrospector tool provides these insights by identifying complex code blocks that are blocked during fuzzing at runtime, as well as suggesting new fuzz targets that can be added. We’ve seen users successfully use this tool to improve the coverage of jsonnet, file, xpdf and bzip2, among others.

Anyone can use this tool to increase the coverage of a project and in turn be rewarded as part of the refreshed OSS-Fuzz rewards. See the full list of all OSS-Fuzz FuzzIntrospector reports to get started.

Fuzzing research and competition

The OSS-Fuzz team maintains FuzzBench, a service that enables security researchers in academia to test fuzzing improvements against real-world open source projects. Approaching its third anniversary in serving free benchmarking, FuzzBench is cited by over 100 papers and has been used as a platform for academic fuzzing workshops such as NDSS’22.

This year, FuzzBench has been invited to participate in the SBFT'23 workshop at ICSE, a premier research conference in the field, which for the first time is hosting a fuzzing competition. During this competition, the FuzzBench platform will be used to evaluate state-of-the-art fuzzers submitted by researchers from around the globe on both code coverage and bug-finding metrics.

Get involved!

We believe these initiatives will help scale security testing efforts across the broader open source ecosystem. We hope to accelerate the integration of critical open source projects into OSS-Fuzz by providing stronger incentives to security researchers and open source maintainers. Combined with our involvement in fuzzing research, these efforts are making OSS-Fuzz an even more powerful tool, enabling users to find more bugs, and, more critically, find them before the bad guys do!

Announcing OSV-Scanner: Vulnerability Scanner for Open Source

Today, we’re launching the OSV-Scanner, a free tool that gives open source developers easy access to vulnerability information relevant to their project.

Last year, we undertook an effort to improve vulnerability triage for developers and consumers of open source software. This involved publishing the Open Source Vulnerability (OSV) schema and launching the OSV.dev service, the first distributed open source vulnerability database. OSV allows all the different open source ecosystems and vulnerability databases to publish and consume information in one simple, precise, and machine readable format.

The OSV-Scanner is the next step in this effort, providing an officially supported frontend to the OSV database that connects a project’s list of dependencies with the vulnerabilities that affect them.

OSV-Scanner

Software projects are commonly built on top of a mountain of dependencies—external software libraries you incorporate into a project to add functionalities without developing them from scratch. Each dependency potentially contains existing known vulnerabilities or new vulnerabilities that could be discovered at any time. There are simply too many dependencies and versions to keep track of manually, so automation is required.

Scanners provide this automated capability by matching your code and dependencies against lists of known vulnerabilities and notifying you if patches or updates are needed. Scanners bring incredible benefits to project security, which is why the 2021 U.S. Executive Order for Cybersecurity included this type of automation as a requirement for national standards on secure software development.

The OSV-Scanner generates reliable, high-quality vulnerability information that closes the gap between a developer’s list of packages and the information in vulnerability databases. Since the OSV.dev database is open source and distributed, it has several benefits in comparison with closed source advisory databases and scanners:

  • Each advisory comes from an open and authoritative source (e.g. the RustSec Advisory Database)
  • Anyone can suggest improvements to advisories, resulting in a very high quality database
  • The OSV format unambiguously stores information about affected versions in a machine-readable format that precisely maps onto a developer’s list of packages
  • The above all results in fewer, more actionable vulnerability notifications, which reduces the time needed to resolve them

Running OSV-Scanner on your project will first find all the transitive dependencies that are being used by analyzing manifests, SBOMs, and commit hashes. The scanner then connects this information with the OSV database and displays the vulnerabilities relevant to your project.

OSV-Scanner is also integrated into the OpenSSF Scorecard’s Vulnerabilities check, which will extend the analysis from a project’s direct vulnerabilities to also include vulnerabilities in all its dependencies. This means that the 1.2M projects regularly evaluated by Scorecard will have a more comprehensive measure of their project security.

What else is new for OSV?

The OSV project has made lots of progress since our last post in June last year. The OSV schema has seen significant adoption from vulnerability databases such as GitHub Security Advisories and Android Security Bulletins. Altogether, OSV.dev now supports 16 ecosystems, including all major language ecosystems, Linux distributions (Debian and Alpine), as well as Android, Linux Kernel, and OSS-Fuzz. This means the OSV.dev database is now the biggest open source vulnerability database of its kind, with a total of over 38,000 advisories, up from 15,000 advisories a year ago.

The OSV.dev website also had a complete overhaul, and now has a better UI and provides more information on each vulnerability. Prominent open source projects, such as DependencyTrack and Flutter, have also started to rely on OSV.dev.

What’s next?

There’s still a lot to do! Our plan for OSV-Scanner is not just to build a simple vulnerability scanner; we want to build the best vulnerability management tool—something that will also minimize the burden of remediating known vulnerabilities. Here are some of our ideas for achieving this:

  • The first step is further integrating with developer workflows by offering standalone CI actions, allowing for easy setup and scheduling to keep track of new vulnerabilities.
  • Improve C/C++ vulnerability support: One of the toughest ecosystems for vulnerability management is C/C++, due to the lack of a canonical package manager to identify C/C++ software. OSV is filling this gap by building a high quality database of C/C++ vulnerabilities, adding precise commit-level metadata to CVEs.
  • We are also looking to add unique features to OSV-Scanner, like the ability to utilize specific function level vulnerability information by doing call graph analysis, and to be able to automatically remediate vulnerabilities by suggesting minimal version bumps that provide the maximal impact.
  • VEX support: Automatically generating VEX statements using, for example, call graph analysis.

Try out OSV-Scanner today!

You can download and try out OSV-Scanner on your projects by following instructions on our new website osv.dev. Alternatively, to automatically run OSV-Scanner on your GitHub project, try Scorecard. Please feel free to let us know what you think! You can give us feedback either by opening an issue on our GitHub, or through the OSV mailing list.

Announcing GUAC, a great pairing with SLSA (and SBOM)!

Supply chain security is at the fore of the industry’s collective consciousness. We’ve recently seen a significant rise in software supply chain attacks, a Log4j vulnerability of catastrophic severity and breadth, and even an Executive Order on Cybersecurity.

It is against this background that Google is seeking contributors to a new open source project called GUAC (pronounced like the dip). GUAC, or Graph for Understanding Artifact Composition, is in the early stages yet is poised to change how the industry understands software supply chains. GUAC addresses a need created by the burgeoning efforts across the ecosystem to generate software build, security, and dependency metadata. True to Google’s mission to organize and make the world’s information universally accessible and useful, GUAC is meant to democratize the availability of this security information by making it freely accessible and useful for every organization, not just those with enterprise-scale security and IT funding.

Thanks to community collaboration in groups such as OpenSSF, SLSA, SPDX, CycloneDX, and others, organizations increasingly have ready access to software security metadata such as SBOMs, SLSA provenance, and OpenSSF Scorecards.

These data are useful on their own, but it’s difficult to combine and synthesize the information for a more comprehensive view. The documents are scattered across different databases and producers, are attached to different ecosystem entities, and cannot be easily aggregated to answer higher-level questions about an organization’s software assets.

To help address this issue we’ve teamed up with Kusari, Purdue University, and Citi to create GUAC, a free tool to bring together many different sources of software security metadata. We’re excited to share the project’s proof of concept, which lets you query a small dataset of software metadata including SLSA provenance, SBOMs, and OpenSSF Scorecards.

What is GUAC

Graph for Understanding Artifact Composition (GUAC) aggregates software security metadata into a high fidelity graph database—normalizing entity identities and mapping standard relationships between them. Querying this graph can drive higher-level organizational outcomes such as audit, policy, risk management, and even developer assistance.

Conceptually, GUAC occupies the “aggregation and synthesis” layer of the software supply chain transparency logical model.

GUAC has four major areas of functionality:

  1. Collection
    GUAC can be configured to connect to a variety of sources of software security metadata. Some sources may be open and public (e.g., OSV); some may be first-party (e.g., an organization’s internal repositories); some may be proprietary third-party (e.g., from data vendors).
  2. Ingestion
    From its upstream data sources GUAC imports data on artifacts, projects, resources, vulnerabilities, repositories, and even developers.
  3. Collation
    Having ingested raw metadata from disparate upstream sources, GUAC assembles it into a coherent graph by normalizing entity identifiers, traversing the dependency tree, and reifying implicit entity relationships, e.g., project → developer; vulnerability → software version; artifact → source repo, and so on.
  4. Query
    Against an assembled graph one may query for metadata attached to, or related to, entities within the graph. Querying for a given artifact may return its SBOM, provenance, build chain, project scorecard, vulnerabilities, and recent lifecycle events — and those for its transitive dependencies.

A CISO or compliance officer in an organization wants to be able to reason about the risk of their organization. An open source organization like the Open Source Security Foundation wants to identify critical libraries to maintain and secure. Developers need richer and more trustworthy intelligence about the dependencies in their projects.

The good news is, increasingly one finds the upstream supply chain already enriched with attestations and metadata to power higher-level reasoning and insights. The bad news is that it is difficult or impossible today for software consumers, operators, and administrators to gather this data into a unified view across their software assets.

To understand something complex like the blast radius of a vulnerability, one needs to trace the relationship between a component and everything else in the portfolio—a task that could span thousands of metadata documents across hundreds of sources. In the open source ecosystem, the number of documents could reach into the millions.

GUAC aggregates and synthesizes software security metadata at scale and makes it meaningful and actionable. With GUAC in hand, we will be able to answer questions at three important stages of software supply chain security:

  • Proactive, e.g.,
    • What are the most used critical components in my software supply chain ecosystem?
    • Where are the weak points in my overall security posture?
    • How do I prevent supply chain compromises before they happen?
    • Where am I exposed to risky dependencies?
  • Operational, e.g.,
    • Is there evidence that the application I’m about to deploy meets organization policy?
    • Do all binaries in production trace back to a securely managed repository?
  • Reactive, e.g.,
    • Which parts of my organization’s inventory are affected by new vulnerability X?
    • A suspicious project lifecycle event has occurred. Where is risk introduced to my organization?
    • An open source project is being deprecated. How am I affected?

Get Involved

GUAC is an Open Source project on Github, and we are excited to get more folks involved and contributing (read the contributor guide to get started)! The project is still in its early stages, with a proof of concept that can ingest SLSA, SBOM, and Scorecard documents and support simple queries and exploration of software metadata. The next efforts will focus on scaling the current capabilities and adding new document types for ingestion. We welcome help and contributions of code or documentation.

Since the project will be consuming documents from many different sources and formats, we have put together a group of “Technical Advisory Members” to help advise the project. These members include representation from companies and groups such as SPDX, CycloneDX, Anchore, Aquasec, IBM, Intel, and many more. If you’re interested in participating as a contributor or advisor representing end users’ needs—or the sources of metadata GUAC consumes—you can register your interest in the relevant GitHub issue.

The GUAC team will be showcasing the project at Kubecon NA 2022 next week. Come by our session if you’ll be there and have a chat with us—we’d be happy to talk in person or virtually!

Fuzzing beyond memory corruption: Finding broader classes of vulnerabilities automatically

Recently, OSS-Fuzz—our community fuzzing service that regularly checks 700 critical open source projects for bugs—detected a serious vulnerability (CVE-2022-3008): a bug in the TinyGLTF project that could have allowed attackers to execute malicious code in projects using TinyGLTF as a dependency.

The bug was soon patched, but the wider significance remains: OSS-Fuzz caught a trivially exploitable command injection vulnerability. This discovery shows that fuzzing, a type of testing once primarily known for detecting memory corruption vulnerabilities in C/C++ code, has considerable untapped potential to find broader classes of vulnerabilities. Though the TinyGLTF library is written in C++, this class of vulnerability can occur in any programming language, and it confirms that fuzzing is a beneficial and necessary testing method for all software projects.

Fuzzing as a public service

OSS-Fuzz was launched in 2016 in response to the Heartbleed vulnerability, discovered in one of the most popular open source projects for encrypting web traffic. The vulnerability had the potential to affect almost every internet user, yet was caused by a relatively simple memory buffer overflow bug that could have been detected by fuzzing—that is, by running the code on randomized inputs to intentionally cause unexpected behaviors or crashes that signal bugs. At the time, though, fuzzing was not widely used and was cumbersome for developers, requiring extensive manual effort.

Google created OSS-Fuzz to fill this gap: it's a free service that runs fuzzers for open source projects and privately alerts developers to the bugs detected. Since its launch, OSS-Fuzz has become a critical service for the open source community, helping get more than 8,000 security vulnerabilities and more than 26,000 other bugs in open source projects fixed. With time, OSS-Fuzz has grown beyond C/C++ to detect problems in memory-safe languages such as Go, Rust, and Python.

Google Cloud’s Assured Open Source Software Service, which provides organizations a secure and curated set of open source dependencies, relies on OSS-Fuzz as a foundational layer of security scanning. OSS-Fuzz is also the basis for free fuzzing tools for the community, such as ClusterFuzzLite, which gives developers a streamlined way to fuzz both open source and proprietary code before committing changes to their projects. All of these efforts are part of Google’s $10B commitment to improving cybersecurity and continued work to make open source software more secure for everyone.

New classes of vulnerabilities

Last December, OSS-Fuzz announced an effort to improve our bug detectors (known as sanitizers) to find more classes of vulnerabilities, by first showing that fuzzing can find Log4Shell. The TinyGLTF bug was found using one of those new sanitizers, SystemSan, which was developed specifically to find bugs that can be exploited to execute arbitrary commands in any programming language. This vulnerability shows that it was possible to inject backticks into the input glTF file format and allow commands to be executed during parsing.

# Craft an input that exploits the vulnerability to insert a string to poc
$ echo '{"images":[{"uri":"a`echo iamhere > poc`"}], "asset":{"version":""}}' > payload.gltf
# Execute the vulnerable program with the input
$ ./loader_exampler payload.gltf
# The string was inserted to poc, proving the vulnerability was successfully exploited
$ cat poc
iamhere

A proof of exploit in TinyGLTF, extended from the input found by OSS-Fuzz with SystemSan. The culprit was the use of the “wordexp” function to expand file paths.

SystemSan uses ptrace, and is built in a language-independent and highly extensible way to allow new bug detectors to be added easily. For example, we’ve built proofs of concept to detect issues in JavaScript and Python libraries, and an external contributor recently added support for detecting arbitrary file access (e.g. through path traversal).

OSS-Fuzz has also continued to work with Code Intelligence to improve Java fuzzing by integrating over 50 additional Java projects into OSS-Fuzz and developing sanitizers for detecting Java-specific issues such as deserialization and LDAP injection vulnerabilities. A number of these types of vulnerabilities have been found already and are pending disclosure.

Rewards for getting involved

Want to get involved with making fuzzing more widely used and get rewarded? There are two ways:

  1. Integrate a new sanitizer into OSS-Fuzz (or fuzzing engines like Jazzer) to detect more classes of bugs. We will pay $11,337 for integrations that find at least 2 new vulnerabilities in OSS-Fuzz projects.
  2. Integrate a new project into OSS-Fuzz. We currently support projects written in C/C++, Rust, Go, Swift, Python, and JVM-based languages; JavaScript is coming soon. This is part of our existing OSS-Fuzz integration rewards.

To apply for these rewards, see the OSS-Fuzz integration reward program.

Fuzzing still has a lot of unexplored potential in discovering more classes of vulnerabilities. Through our combined efforts we hope to take this effective testing method to the next level and enable more of the open source community to enjoy the benefits of fuzzing.

Retrofitting Temporal Memory Safety on C++


Memory safety in Chrome is an ever-ongoing effort to protect our users. We are constantly experimenting with different technologies to stay ahead of malicious actors. In this spirit, this post is about our journey of using heap scanning technologies to improve memory safety of C++.



Let’s start at the beginning though. Throughout the lifetime of an application its state is generally represented in memory. Temporal memory safety refers to the problem of guaranteeing that memory is always accessed with the most up-to-date information about its structure, that is, its type. C++ unfortunately does not provide such guarantees. While there is appetite for languages other than C++ with stronger memory safety guarantees, large codebases such as Chromium will use C++ for the foreseeable future.



auto* foo = new Foo();
delete foo;
// The memory location pointed to by foo is not representing
// a Foo object anymore, as the object has been deleted (freed).
foo->Process();



In the example above, foo is used after its memory has been returned to the underlying system. The out-of-date pointer is called a dangling pointer and any access through it results in a use-after-free (UAF) access. In the best case such errors result in well-defined crashes; in the worst case they cause subtle breakage that can be exploited by malicious actors.



UAFs are often hard to spot in larger codebases where ownership of objects is transferred between various components. The general problem is so widespread that to this date both industry and academia regularly come up with mitigation strategies. The examples are endless: C++ smart pointers of all kinds are used to better define and manage ownership on application level; static analysis in compilers is used to avoid compiling problematic code in the first place; where static analysis fails, dynamic tools such as C++ sanitizers can intercept accesses and catch problems on specific executions.



Chrome’s use of C++ is sadly no different here and the majority of high-severity security bugs are UAF issues. In order to catch issues before they reach production, all of the aforementioned techniques are used. In addition to regular tests, fuzzers ensure that there’s always new input to work with for dynamic tools. Chrome even goes further and employs a C++ garbage collector called Oilpan which deviates from regular C++ semantics but provides temporal memory safety where used. Where such deviation is unreasonable, a new kind of smart pointer called MiraclePtr was introduced recently to deterministically crash on accesses to dangling pointers when used. Oilpan, MiraclePtr, and smart-pointer-based solutions require significant adoption effort in application code.



Over the last years, another approach has seen some success: memory quarantine. The basic idea is to put explicitly freed memory into quarantine and only make it available when a certain safety condition is reached. In the Linux kernel a probabilistic approach was used where memory was eventually just recycled. A more elaborate approach uses heap scanning to avoid reusing memory that is still reachable from the application. This is similar to a garbage collected system in that it provides temporal memory safety by prohibiting reuse of memory that is still reachable. The rest of this article summarizes our journey of experimenting with quarantines and heap scanning in Chrome.



(At this point, one may ask where pointer authentication fits into this picture – keep on reading!)

Quarantining and Heap Scanning, the Basics

The main idea behind assuring temporal safety with quarantining and heap scanning is to avoid reusing memory until it has been proven that there are no more (dangling) pointers referring to it. To avoid changing C++ user code or its semantics, the memory allocator providing new and delete is intercepted.

Upon invoking delete, the memory is actually put in a quarantine, where it is unavailable for being reused for subsequent new calls by the application. At some point a heap scan is triggered which scans the whole heap, much like a garbage collector, to find references to quarantined memory blocks. Blocks that have no incoming references from the regular application memory are transferred back to the allocator where they can be reused for subsequent allocations.



There are various hardening options which come with a performance cost:

  • Overwrite the quarantined memory with special values (e.g. zero);
  • Stop all application threads when the scan is running or scan the heap concurrently;
  • Intercept memory writes (e.g. by page protection) to catch pointer updates;
  • Scan memory word by word for possible pointers (conservative handling) or provide descriptors for objects (precise handling);
  • Segregate application memory into safe and unsafe partitions to opt out certain objects which are either performance sensitive or can be statically proven as being safe to skip;
  • Scan the execution stack in addition to just scanning heap memory;



We call the collection of different versions of these algorithms StarScan [stɑː skæn], or *Scan for short.
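As an illustration of the conservative handling mentioned in the list above, the following sketch treats every pointer-aligned word of a memory region as a potential pointer and checks whether its value falls inside any quarantined block (types and names are illustrative; the real scanner is heavily optimized):

    #include <cstdint>
    #include <vector>

    struct QuarantinedBlock {
      std::uintptr_t begin;
      std::uintptr_t end;    // exclusive
      bool referenced = false;
    };

    // Conservative handling: every pointer-aligned word in [begin, end) is
    // treated as a potential pointer, and a word "references" a quarantined
    // block if its value falls anywhere inside the block. Precise handling
    // would instead consult per-object descriptors and only inspect real
    // pointer fields.
    void ScanRegion(const std::uintptr_t* begin, const std::uintptr_t* end,
                    std::vector<QuarantinedBlock>& quarantine) {
      for (const std::uintptr_t* word = begin; word != end; ++word) {
        const std::uintptr_t value = *word;
        for (QuarantinedBlock& block : quarantine) {
          if (value >= block.begin && value < block.end) {
            block.referenced = true;  // keep this block in quarantine
          }
        }
      }
    }

Blocks whose referenced flag is still false after all regions (and, depending on configuration, the execution stack) have been scanned can safely be returned to the allocator.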

Reality Check

We apply *Scan to the unmanaged parts of the renderer process and use Speedometer2 to evaluate the performance impact. 



We have experimented with different versions of *Scan. To minimize the performance overhead as much as possible, though, we evaluate a configuration that uses a separate thread to scan the heap and that avoids clearing quarantined memory eagerly on delete, instead clearing quarantined memory while running *Scan. We opt in all memory allocated with new and, for simplicity in the first implementation, don’t discriminate between allocation sites and types.


Note that the proposed version of *Scan is not complete. Concretely, a malicious actor may exploit a race condition with the scanning thread by moving a dangling pointer from an unscanned to an already scanned memory region. Fixing this race condition requires keeping track of writes into blocks of already scanned memory, e.g. by using memory protection mechanisms to intercept those accesses, or by stopping all application threads in safepoints so they cannot mutate the object graph at all. Either way, solving this issue comes at a performance cost and exhibits an interesting performance and security trade-off. Note that this kind of attack is not generic and does not work for every UAF. Problems such as the one depicted in the introduction would not be prone to such attacks, as the dangling pointer is not copied around.



Since the security benefits really depend on the granularity of such safepoints and we want to experiment with the fastest possible version, we disabled safepoints altogether.



Running our basic version on Speedometer2 regresses the total score by 8%. Bummer…



Where does all this overhead come from? Unsurprisingly, heap scanning is memory bound and quite expensive as the entire user memory must be walked and examined for references by the scanning thread.



To reduce the regression we implemented various optimizations that improve the raw scanning speed. Naturally, the fastest way to scan memory is to not scan it at all, so we partitioned the heap into two classes: memory that can contain pointers and memory that we can statically prove not to contain pointers, e.g. strings. We avoid scanning memory that cannot contain any pointers. Note that such memory is still part of the quarantine; it is just not scanned.
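A minimal sketch of how such a split could be expressed, assuming a hypothetical compile-time partition choice (Chrome’s actual criteria live in the allocator and its callers):

    #include <type_traits>

    // Illustrative partition choice, not Chrome's actual API: buffers of
    // arithmetic/character data (e.g. string backing stores) cannot hold
    // pointers and can live in a partition that is quarantined but never
    // scanned; everything else is conservatively scanned.
    enum class Partition { kScanned, kUnscanned };

    template <typename T>
    constexpr Partition PartitionFor() {
      return std::is_arithmetic_v<T> ? Partition::kUnscanned
                                     : Partition::kScanned;
    }

    struct Node { Node* next; };
    static_assert(PartitionFor<char>() == Partition::kUnscanned);  // string data
    static_assert(PartitionFor<Node>() == Partition::kScanned);    // has pointers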



We extended this mechanism to also cover allocations that serve as backing memory for other allocators, e.g., zone memory that is managed by V8 for the optimizing JavaScript compiler. Such zones are always discarded at once (cf. region-based memory management) and temporal safety is established through other means in V8.



On top of that, we applied several micro-optimizations to speed up and eliminate computations: we use helper tables for pointer filtering; we rely on SIMD for the memory-bound scanning loop; and we minimize the number of fetches and lock-prefixed instructions.
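As an example of the pointer-filtering idea, the following sketch stands in for the helper tables with a simple set of page indices: most candidate words are rejected with one cheap lookup before any per-block range check. The names and page size are assumptions, not Chrome’s actual data structures:

    #include <cstdint>
    #include <unordered_set>

    // Stand-in for the helper tables: a set of page indices that contain
    // quarantined memory lets the scanner reject most candidate words with
    // one lookup before doing any per-block range check.
    constexpr std::uintptr_t kPageShift = 12;  // 4 KiB pages, for illustration

    struct QuarantinePageFilter {
      std::unordered_set<std::uintptr_t> pages;

      void AddBlock(std::uintptr_t begin, std::uintptr_t end) {  // end exclusive
        for (std::uintptr_t page = begin >> kPageShift;
             page <= (end - 1) >> kPageShift; ++page) {
          pages.insert(page);
        }
      }

      bool MaybePointsIntoQuarantine(std::uintptr_t candidate) const {
        return pages.count(candidate >> kPageShift) != 0;
      }
    };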



We also improved upon the initial scheduling algorithm, which simply started a heap scan upon reaching a certain limit, by adjusting how much time we spend scanning compared to actually executing the application code (cf. mutator utilization in garbage collection literature).
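A simplified sketch of such a scheduling heuristic, assuming a hypothetical ScanScheduler that only starts a scan while the overall fraction of time spent scanning stays below a budget (i.e. mutator utilization stays above a target):

    #include <cstddef>
    #include <cstdint>

    // Simplified scheduling heuristic (an assumption, not the production
    // algorithm): a scan only starts once enough memory is quarantined and
    // the fraction of total time spent scanning stays below a budget.
    class ScanScheduler {
     public:
      explicit ScanScheduler(double max_scan_fraction)
          : max_scan_fraction_(max_scan_fraction) {}

      void ReportScanTime(std::uint64_t micros) { scan_micros_ += micros; }
      void ReportMutatorTime(std::uint64_t micros) { mutator_micros_ += micros; }

      bool ShouldScan(std::size_t quarantined_bytes,
                      std::size_t quarantine_limit) const {
        if (quarantined_bytes < quarantine_limit) return false;
        const double total = static_cast<double>(scan_micros_ + mutator_micros_);
        if (total == 0) return true;
        return static_cast<double>(scan_micros_) / total <= max_scan_fraction_;
      }

     private:
      double max_scan_fraction_;
      std::uint64_t scan_micros_ = 0;
      std::uint64_t mutator_micros_ = 0;
    };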



In the end, the algorithm is still memory bound and scanning remains a noticeably expensive procedure. The optimizations helped to reduce the Speedometer2 regression from 8% down to 2%.



While we improved raw scanning time, the fact that memory sits in a quarantine increases the overall working set of a process. To further quantify this overhead, we use a selected set of Chrome’s real-world browsing benchmarks to measure memory consumption. *Scan in the renderer process regresses memory consumption by about 12%. It’s this increase of the working set that leads to more memory being paged in, which is noticeable on application fast paths.


Hardware Memory Tagging to the Rescue

MTE (Memory Tagging Extension) is a new extension to the ARM v8.5A architecture that helps with detecting errors in software memory use. These errors can be spatial errors (e.g. out-of-bounds accesses) or temporal errors (use-after-free). The extension works as follows: every 16 bytes of memory are assigned a 4-bit tag, and pointers are also assigned a 4-bit tag. The allocator is responsible for returning a pointer whose tag matches the tag of the allocated memory. The load and store instructions verify that the pointer and memory tags match; if the tags of the memory location and the pointer do not match, a hardware exception is raised.
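The following sketch models the tag check in software purely for illustration; on real hardware the comparison happens on every load and store without any extra instructions in the application:

    #include <cassert>
    #include <cstdint>

    // Software model of the tag check, for illustration only. The 4-bit
    // pointer tag lives in the top byte of the 64-bit pointer; each 16-byte
    // granule of memory carries its own 4-bit tag maintained by the allocator.
    constexpr unsigned kTagShift = 56;
    constexpr std::uint64_t kTagMask = std::uint64_t{0xF} << kTagShift;

    std::uint64_t SetPointerTag(std::uint64_t ptr, unsigned tag) {
      return (ptr & ~kTagMask) |
             (static_cast<std::uint64_t>(tag & 0xF) << kTagShift);
    }

    unsigned PointerTag(std::uint64_t ptr) {
      return static_cast<unsigned>((ptr & kTagMask) >> kTagShift);
    }

    // memory_tag is the tag the allocator assigned to the accessed granule; a
    // mismatch on real hardware raises an exception instead of this assert.
    void CheckedAccess(std::uint64_t ptr, unsigned memory_tag) {
      assert(PointerTag(ptr) == memory_tag && "tag mismatch: hardware would fault");
    }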



MTE doesn't offer deterministic protection against use-after-free. Since the number of tag bits is finite, there is a chance that the tags of the memory and the pointer match due to overflow. With 4 bits, at most 16 reallocations are enough for the tags to match again. A malicious actor may exploit the tag bit overflow to get a use-after-free by just waiting until the tag of a dangling pointer matches (again) the memory it is pointing to.



*Scan can be used to fix this problematic corner case. On each delete call the tag for the underlying memory block gets incremented by the MTE mechanism. Most of the time the block will be available for reallocation as the tag can be incremented within the 4-bit range. Stale pointers would refer to the old tag and thus reliably crash on dereference. Upon overflowing the tag, the object is then put into quarantine and processed by *Scan. Once the scan verifies that there are no more dangling pointers to this block of memory, it is returned back to the allocator. This reduces the number of scans and their accompanying cost by ~16x.
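Put together, the delete path could look roughly like the following sketch, with tags modeled in software and illustrative free-list and quarantine containers (not Chrome’s actual PartitionAlloc code):

    #include <cstddef>
    #include <vector>

    // Sketch of the delete path when combining MTE with *Scan. Tags are
    // modeled in software here; on real hardware retagging makes stale
    // pointers fault on their next access.
    constexpr unsigned kNumTags = 16;  // 4-bit tags

    struct Block {
      void* memory;
      std::size_t size;
      unsigned tag = 0;
    };

    std::vector<Block*> g_free_list;             // blocks immediately reusable
    std::vector<Block*> g_star_scan_quarantine;  // blocks awaiting a *Scan pass

    void DeleteWithMte(Block* block) {
      block->tag = (block->tag + 1) % kNumTags;   // retag the memory
      if (block->tag != 0) {
        g_free_list.push_back(block);             // common case: reuse right away
      } else {
        g_star_scan_quarantine.push_back(block);  // tag wrapped: scan before reuse
      }
    }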



The following picture depicts this mechanism: the pointer to foo initially has a tag of 0x0E, which still allows it to be incremented once more when the memory is reallocated for bar. Upon invoking delete for bar, the tag overflows and the memory is actually put into the quarantine of *Scan.

We got our hands on some actual hardware supporting MTE and redid the experiments in the renderer process. The results are promising: the regression on Speedometer2 was within noise, and we only regressed memory footprint by around 1% on Chrome’s real-world browsing stories.



Is this an actual free lunch? It turns out that MTE comes with some cost, which has already been paid for. Specifically, PartitionAlloc, which is Chrome’s underlying allocator, already performs the tag management operations on all MTE-enabled devices by default. Also, for security reasons, memory should really be zeroed eagerly. To quantify these costs, we ran experiments on an early hardware prototype that supports MTE in several configurations:

  1. MTE disabled and without zeroing memory;

  2. MTE disabled but with zeroing memory;

  3. MTE enabled without *Scan;

  4. MTE enabled with *Scan;



(We are also aware that there are synchronous and asynchronous MTE modes, which also affect determinism and performance. For the sake of this experiment we kept using the asynchronous mode.)

The results show that MTE and memory zeroing come with some cost, which is around 2% on Speedometer2. Note that neither PartitionAlloc nor the hardware has been optimized for these scenarios yet. The experiment also shows that adding *Scan on top of MTE comes without measurable cost.


Conclusions

C++ allows for writing high-performance applications, but this comes at a price: security. Hardware memory tagging may fix some security pitfalls of C++ while still allowing high performance. We look forward to seeing broader adoption of hardware memory tagging in the future and suggest using *Scan on top of hardware memory tagging to fix temporal memory safety for C++. Both the MTE hardware used and the implementation of *Scan are prototypes, and we expect that there is still room for performance optimizations.