Over 35,000 Java packages, amounting to over 8% of the Maven Central repository (the most significant Java package repository), have been impacted by the recently disclosed log4j vulnerabilities (1, 2), with widespread fallout across the software industry. The vulnerabilities allow an attacker to perform remote code execution by exploiting the insecure JNDI lookup feature exposed by the logging library log4j. This exploitable feature was enabled by default in various versions of the library.
This vulnerability has captivated the information security ecosystem since its disclosure on December 9th because of both its severity and its widespread impact. As a popular logging tool, log4j is used by thousands of software packages (known as artifacts in the Java ecosystem) and projects across the software industry. Users’ lack of visibility into their dependencies and transitive dependencies has made patching difficult; it has also made it difficult to determine the full blast radius of this vulnerability. Using Open Source Insights, a project to help understand open source dependencies, we surveyed all versions of all artifacts in the Maven Central Repository to determine the scope of the issue in the open source ecosystem of JVM-based languages, and to track the ongoing efforts to mitigate the affected packages.
How widespread is the log4j vulnerability?
As of December 16th, 2021, we found that 35,863 of the available Java artifacts from Maven Central depend on the affected log4j code. This means that more than 8% of all packages on Maven Central have at least one version that is impacted by this vulnerability. (These numbers do not encompass all Java packages, such as directly distributed binaries, but Maven Central is a strong proxy for the state of the ecosystem.) As far as ecosystem impact goes, 8% is enormous. The average ecosystem impact of advisories affecting Maven Central is 2%, with the median less than 0.1%.
Direct dependencies account for around 7,000 of the affected artifacts, meaning that at least one of their versions depends directly on an affected version of log4j-core or log4j-api, as described in the CVEs. The majority of affected artifacts come from indirect dependencies (that is, the dependencies of one’s own dependencies), meaning log4j is not explicitly defined as a dependency of the artifact but gets pulled in as a transitive dependency.
What is the current progress in fixing the open source JVM ecosystem?
We counted an artifact as fixed if it had at least one affected version and has since released a greater stable version (according to semantic versioning) that is unaffected. An artifact affected by log4j is considered fixed if it has updated to 2.16.0 or removed its dependency on log4j altogether. At the time of writing, nearly five thousand of the affected artifacts have been fixed. This represents a rapid response and a mammoth effort by both the log4j maintainers and the wider community of open source consumers. That leaves over 30,000 artifacts affected; many of these depend on another artifact to patch (the transitive dependency) and are likely blocked.
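To make the "fixed" criterion concrete, here is a minimal Python sketch of one way to read it. It is not the actual Open Source Insights implementation. The function names, the input shape (a mapping from version string to an "affected" flag), and the reading of "greater" as "greater than every affected version" are all assumptions made for illustration. It also assumes versions are dotted integers with an optional hyphenated pre-release suffix.

```python
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse '2.16.0' or '2.16.0-rc1' into (2, 16, 0); the suffix is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_stable(version: str) -> bool:
    """Assumption: any hyphenated suffix ('-rc1', '-beta') marks a pre-release."""
    return "-" not in version


def is_fixed(versions: Dict[str, bool]) -> bool:
    """versions maps a version string to True if that version is affected.

    The artifact counts as fixed when it has at least one affected version
    and an unaffected stable version greater than every affected one.
    """
    affected = [parse_version(v) for v, bad in versions.items() if bad]
    if not affected:
        return False  # never affected, so there is nothing to fix
    highest_affected = max(affected)
    return any(
        not bad and is_stable(v) and parse_version(v) > highest_affected
        for v, bad in versions.items()
    )
```

Under these assumptions, an artifact with an affected 1.0.0 and an unaffected 1.0.1 counts as fixed, while one whose only unaffected release is a pre-release such as 1.1.0-rc1 does not.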
Why is fixing the JVM ecosystem so hard?
Most artifacts that depend on log4j do so indirectly. The deeper the vulnerability is in a dependency chain, the more steps are required for it to be fixed. For more than 80% of the packages, the vulnerability is more than one level deep, with a majority affected five levels down. These packages will require fixes throughout all parts of the tree, starting from the deepest dependencies first.
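The idea of a vulnerability being "N levels deep" can be illustrated with a breadth-first search over a dependency graph. The sketch below is hypothetical: the graph shape (artifact name mapped to a list of its direct dependencies) and the package names other than log4j-core are invented for the example, and real Maven graphs also carry versions and scopes.

```python
from collections import deque
from typing import Dict, List, Optional


def dependency_depth(
    graph: Dict[str, List[str]], root: str, target: str
) -> Optional[int]:
    """Return the shortest number of dependency hops from root to target.

    A direct dependency has depth 1. Returns None if target is unreachable.
    """
    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, depth + 1))
    return None


# Hypothetical chain: app -> web-framework -> http-client -> log4j-core
example_graph = {
    "app": ["web-framework"],
    "web-framework": ["http-client"],
    "http-client": ["log4j-core"],
}
print(dependency_depth(example_graph, "app", "log4j-core"))  # prints 3
```

In this example, "app" cannot pick up a fix until "http-client" and then "web-framework" have each released a patched version, unless it overrides the dependency itself.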
Another difficulty is caused by ecosystem-level choices in the dependency resolution algorithm and requirement specification conventions. In the Java ecosystem, it is common practice to specify “soft” version requirements: exact versions that the resolution algorithm uses if no other version of the same package appears earlier in the dependency graph. Propagating a fix often requires explicit action by the maintainers to update the dependency requirements to a patched version. This practice is in contrast to other ecosystems, such as npm, where it is common for developers to specify open ranges for dependency requirements. Open ranges allow the resolution algorithm to select the most recently released version that satisfies the dependency requirements, thereby pulling in new fixes. Consumers can get a patched version on the next build after the patch is available, which propagates fixes up the dependency chain more quickly.
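The contrast between the two conventions can be sketched in a few lines of Python. This is a simplified model, not Maven's or npm's actual resolver: the registry layout, the function names, and the package names ("app", "lib", "logger") are hypothetical, and the soft-requirement resolver simply lets the first declaration found in a breadth-first walk win.

```python
from collections import deque
from typing import Dict, List, Tuple

Version = Tuple[int, ...]
Requirement = Tuple[str, Version]
# package name -> version -> list of (dependency name, requested version)
Registry = Dict[str, Dict[Version, List[Requirement]]]


def resolve_soft(registry: Registry, root: str, version: Version) -> Dict[str, Version]:
    """Soft requirements: the version declared earliest (nearest the root) wins."""
    chosen = {root: version}
    queue = deque([(root, version)])
    while queue:
        name, ver = queue.popleft()
        for dep_name, dep_ver in registry[name][ver]:
            if dep_name not in chosen:  # later declarations are ignored
                chosen[dep_name] = dep_ver
                queue.append((dep_name, dep_ver))
    return chosen


def newest_in_range(available: List[Version], minimum: Version, below: Version) -> Version:
    """Open range: pick the newest release v with minimum <= v < below."""
    return max(v for v in available if minimum <= v < below)


# "logger" 1.0.0 stands in for a vulnerable release, 1.0.1 for the patch.
registry: Registry = {
    "app": {(1, 0, 0): [("lib", (1, 0, 0))]},
    "lib": {(1, 0, 0): [("logger", (1, 0, 0))]},
    "logger": {(1, 0, 0): [], (1, 0, 1): []},
}

# The patch exists, but the soft requirement in "lib" still selects 1.0.0.
stale = resolve_soft(registry, "app", (1, 0, 0))

# Explicit action: "app" declares the patched version nearer the root.
registry["app"][(1, 0, 0)].append(("logger", (1, 0, 1)))
patched = resolve_soft(registry, "app", (1, 0, 0))

# With an open range, the next build picks up 1.0.1 with no manifest change.
ranged = newest_in_range(list(registry["logger"]), (1, 0, 0), (2, 0, 0))
```

Here `stale["logger"]` is `(1, 0, 0)`, `patched["logger"]` is `(1, 0, 1)`, and `ranged` is `(1, 0, 1)`. The soft-requirement model needs someone to edit a requirement before the fix propagates, while the range-based model does not.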
How long will it take for this vulnerability to be fixed across the whole ecosystem?
It’s hard to say. We looked at all publicly disclosed critical advisories affecting Maven packages to get a sense of how quickly other vulnerabilities have been fully addressed. Less than half (48%) of the artifacts affected by a vulnerability have been fixed, so we might be in for a long wait, likely years. But things are looking promising on the log4j front. After less than a week, 4,620 affected artifacts (~13%) have been fixed. This, more than any other statistic, speaks to the massive effort by open source maintainers, information security teams, and consumers across the globe.
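As a quick sanity check on the figures quoted in this post, the fixed share and the remaining count can be recomputed from the two reported totals. The variable and function names below are only illustrative.

```python
def fixed_share(fixed: int, affected: int) -> float:
    """Fraction of affected artifacts that have been fixed."""
    return fixed / affected


affected_artifacts = 35_863  # affected artifacts reported as of December 16th
fixed_artifacts = 4_620      # artifacts reported as fixed after less than a week

share = fixed_share(fixed_artifacts, affected_artifacts)
remaining = affected_artifacts - fixed_artifacts

print(f"{share:.1%} fixed, {remaining:,} remaining")
# prints "12.9% fixed, 31,243 remaining"
```

The result matches the rounded figures in the text: roughly 13% fixed and over 30,000 artifacts still affected.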
Thank you for reading. If you have any questions, please feel free to reach out to us.