diff mbox series

[09/10] support/scripts/{pkg-stats, cve.py, cve-checker}: support CPE ID based matching

Message ID 20201104145145.1316167-10-thomas.petazzoni@bootlin.com
State Changes Requested
Headers show
Series Introduce CPE ID matching for CVEs | expand

Commit Message

Thomas Petazzoni Nov. 4, 2020, 2:51 p.m. UTC
This commit modifies cve.py, as well as its users cve-checker and
pkg-stats to support CPE ID based matching, for packages that have CPE
ID information.

One of the non-trivial thing is that we can't simply iterate over all
CVEs, and then iterate over all our packages to see which packages
have CPE ID information that match the CPEs affected by the
CVE. Indeed, this is an O(n^2) operation.

So instead, we do a pre-filtering of packages potentially affected. In
check_package_cves(), we build a cpe_product_pkgs dict that associates
a CPE product name to the packages that have this CPE product
name. The CPE product name is either derived from the CPE information
provided by the package if available, and otherwise we use the package
name, which is what was used prior to this patch.

And then, when we look at CVEs, we only consider the packages that
have a CPE product name matching the CPE products affected by the
CVEs. This is done in check_package_cve_affects().

Note that there is a bit of duplication of logic between cve-checker
and pkg-stats, but we intend in a follow-up series to re-unify those
two scripts.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/cve-checker | 24 +++++++++++++++++-----
 support/scripts/cve.py      | 41 +++++++++++++++++++++++++++++--------
 support/scripts/pkg-stats   | 25 +++++++++++++++-------
 3 files changed, 70 insertions(+), 20 deletions(-)

Comments

Matt Weber Nov. 4, 2020, 6:33 p.m. UTC | #1
Thomas / Greg,

On Wed, Nov 4, 2020 at 8:52 AM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
>
> This commit modifies cve.py, as well as its users cve-checker and
> pkg-stats to support CPE ID based matching, for packages that have CPE
> ID information.
>
> One of the non-trivial thing is that we can't simply iterate over all
> CVEs, and then iterate over all our packages to see which packages
> have CPE ID information that match the CPEs affected by the
> CVE. Indeed, this is an O(n^2) operation.
>
> So instead, we do a pre-filtering of packages potentially affected. In
> check_package_cves(), we build a cpe_product_pkgs dict that associates
> a CPE product name to the packages that have this CPE product
> name. The CPE product name is either derived from the CPE information
> provided by the package if available, and otherwise we use the package
> name, which is what was used prior to this patch.
>
> And then, when we look at CVEs, we only consider the packages that
> have a CPE product name matching the CPE products affected by the
> CVEs. This is done in check_package_cve_affects().
>
> Note that there is a bit of duplication of logic between cve-checker
> and pkg-stats, but we intend in a follow-up series to re-unify those
> two scripts.
>
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> ---
>  support/scripts/cve-checker | 24 +++++++++++++++++-----
>  support/scripts/cve.py      | 41 +++++++++++++++++++++++++++++--------
>  support/scripts/pkg-stats   | 25 +++++++++++++++-------
>  3 files changed, 70 insertions(+), 20 deletions(-)
>
> diff --git a/support/scripts/cve-checker b/support/scripts/cve-checker
> index 421202d049..e1dfab3805 100755
> --- a/support/scripts/cve-checker
> +++ b/support/scripts/cve-checker

I noticed this tool isn't described in the manual with at least an
example invocation noting why the nvd folder is being cached and the
piping of show-info output, etc.

Basic test config:

BR2_aarch64=y
BR2_TOOLCHAIN_EXTERNAL=y
BR2_LINUX_KERNEL=y
BR2_LINUX_KERNEL_CUSTOM_VERSION=y
BR2_LINUX_KERNEL_CUSTOM_VERSION_VALUE="4.16.7"
BR2_LINUX_KERNEL_USE_CUSTOM_CONFIG=y
BR2_LINUX_KERNEL_CUSTOM_CONFIG_FILE="board/qemu/aarch64-virt/linux.config"
BR2_LINUX_KERNEL_NEEDS_HOST_OPENSSL=y
BR2_PACKAGE_LIBSODIUM=y
BR2_PACKAGE_LIBSSH=y
BR2_PACKAGE_LIBSSH2=y
BR2_PACKAGE_MBEDTLS=y
BR2_PACKAGE_OPENSSL=y
BR2_TARGET_ROOTFS_EXT2=y
# BR2_TARGET_ROOTFS_TAR is not set

make show-info | support/scripts/cve-checker --html cve.html
--nvd-path ~/nvd_dl/

Result:  https://pastebin.ubuntu.com/p/H4hZzxgsCZ/

[snip]

> diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
> index 0a48cf9581..f357cbe1b6 100755
> --- a/support/scripts/pkg-stats
> +++ b/support/scripts/pkg-stats

support/scripts/pkg-stats --html cve.html -p linux --nvd-path ~/nvd_dl --cpeid

Results:  https://pastebin.ubuntu.com/p/RKF3F4bCcG/

Tested-by: Matt Weber <matthew.weber@rockwellcollins.com>
Peter Korsgaard Nov. 5, 2020, 8:46 a.m. UTC | #2
>>>>> "Matthew" == Matthew Weber <matthew.weber@rockwellcollins.com> writes:

 > Thomas / Greg,
 >> diff --git a/support/scripts/cve-checker b/support/scripts/cve-checker
 >> index 421202d049..e1dfab3805 100755
 >> --- a/support/scripts/cve-checker
 >> +++ b/support/scripts/cve-checker

 > I noticed this tool isn't described in the manual with at least an
 > example invocation noting why the nvd folder is being cached and the
 > piping of show-info output, etc.

Yes, I noticed that as well while updating CHANGES for 2020.11-rc1. It
also reads stdin before handling the command line arguments, so you
cannot just run ./support/scripts/cve-checker --help to get usage info
:/
Thomas Petazzoni Nov. 5, 2020, 8:55 a.m. UTC | #3
On Thu, 05 Nov 2020 09:46:38 +0100
Peter Korsgaard <peter@korsgaard.com> wrote:

> Yes, I noticed that as well while updating CHANGES for 2020.11-rc1. It
> also reads stdin before handling the command line arguments, so you
> cannot just run ./support/scripts/cve-checker --help to get usage info
> :/

See patch 02/10 in this series, which fixes precisely this.

However, I am about to send a patch series that drops cve-checker
entirely, by making pkg-stats provide the same functionality.

Thomas
Gregory CLEMENT Nov. 5, 2020, 2:55 p.m. UTC | #4
Hello Thomas,

> -    def affects(self, name, version, cve_ignore_list):
> +    def affects(self, name, version, cve_ignore_list, cpeid=None):
>          """
>          True if the Buildroot Package object passed as argument is affected
>          by this CVE.
> @@ -199,8 +220,12 @@ class CVE:
>              print("Cannot parse package '%s' version '%s'" % (name, version))
>              pkg_version = None
>  
> +        # if we don't have a cpeid, build one based on name and version
> +        if not cpeid:
> +            cpeid = "cpe:2.3:*:*:%s:%s:*:*:*:*:*:*:*" % (name, version)
> +
>          for cpe in self.each_cpe():
> -            if cpe['product'] != name:
> +            if not cpe_matches(cpe['id'], cpeid):
>                  continue

Here you compare the full cpeid including the version to the cpeid
associated to the CVE. But if the CVE is about a range of version (using
versionStartIncluding for instance), then this test may file was
actually the package would be affected because the version is inside the
range of version affected.

Or maybe I missed something in this case could you point me where I am
wrong ?

Gregory

>              if not cpe['v_start'] and not cpe['v_end']:
>                  return self.CVE_AFFECTS
> diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
> index 0a48cf9581..f357cbe1b6 100755
Thomas Petazzoni Nov. 5, 2020, 4:59 p.m. UTC | #5
On Thu, 05 Nov 2020 15:55:56 +0100
Gregory CLEMENT <gregory.clement@bootlin.com> wrote:

> > +        # if we don't have a cpeid, build one based on name and version
> > +        if not cpeid:
> > +            cpeid = "cpe:2.3:*:*:%s:%s:*:*:*:*:*:*:*" % (name, version)
> > +
> >          for cpe in self.each_cpe():
> > -            if cpe['product'] != name:
> > +            if not cpe_matches(cpe['id'], cpeid):
> >                  continue  
> 
> Here you compare the full cpeid including the version to the cpeid
> associated to the CVE. But if the CVE is about a range of version (using
> versionStartIncluding for instance), then this test may file was
> actually the package would be affected because the version is inside the
> range of version affected.

So, a package will have a CPE ID like this:

  cpe:2.3:a:vendor:product:1.0.4:*:*:*:*:*:*:*

Then, a CVE will have two cases:

 - Either it has a CPE ID that includes directly a version, like:

   cpe:2.3:a:vendor:product:1.0.3:*:*:*:*:*:*:*

   In this case, the cpe_matches() function will return False, because
   indeed 1.0.3 isn't the same as 1.0.4

 - Or it has a CPE ID that does *NOT* include a version, because the
   version is specified separately through versionStartIncluding and
   similar properties. In this case, the CPE ID of the CVE will look
   like this:

   cpe:2.3:a:vendor:product:*:*:*:*:*:*:*:*

   Notice how the "version" field is "*". The cpe_matches() function
   handles "*" as a wildcard, and will allow it to match any value. So
   "*" matches "1.0.4", which means in this situation, cpe_matches()
   will return True, so the code logic will continue, and test if we're
   in the version range or not.

The code goes like this:

            if not cpe_matches(cpe['id'], cpeid):
                # The CPE doesn't match, so skip
                continue

            if not cpe['v_start'] and not cpe['v_end']:
                # The CPE matches *and* we don't have a version range, so we know the CVE affects us
                return self.CVE_AFFECTS

            if not pkg_version:
                # The version of the package couldn't be parsed, so we're not able to compare it
                # with distutils.version.LooseVersion(), so skip
                continue

            # and then here we handle the version range (code was not changed)

Does this explain how it works ? Let me know if you still see an issue,
because I could also be missing something.

Note: I checked the JSON output of pkg-stats before and after this
commit, and it is identical.

Best regards,

Thomas
Gregory CLEMENT Nov. 6, 2020, 2:48 p.m. UTC | #6
Hi Thomas,

> On Thu, 05 Nov 2020 15:55:56 +0100
> Gregory CLEMENT <gregory.clement@bootlin.com> wrote:
>
>> > +        # if we don't have a cpeid, build one based on name and version
>> > +        if not cpeid:
>> > +            cpeid = "cpe:2.3:*:*:%s:%s:*:*:*:*:*:*:*" % (name, version)
>> > +
>> >          for cpe in self.each_cpe():
>> > -            if cpe['product'] != name:
>> > +            if not cpe_matches(cpe['id'], cpeid):
>> >                  continue  
>> 
>> Here you compare the full cpeid including the version to the cpeid
>> associated to the CVE. But if the CVE is about a range of version (using
>> versionStartIncluding for instance), then this test may file was
>> actually the package would be affected because the version is inside the
>> range of version affected.
>
> So, a package will have a CPE ID like this:
>
>   cpe:2.3:a:vendor:product:1.0.4:*:*:*:*:*:*:*
>
> Then, a CVE will have two cases:
>
>  - Either it has a CPE ID that includes directly a version, like:
>
>    cpe:2.3:a:vendor:product:1.0.3:*:*:*:*:*:*:*
>
>    In this case, the cpe_matches() function will return False, because
>    indeed 1.0.3 isn't the same as 1.0.4
>
>  - Or it has a CPE ID that does *NOT* include a version, because the
>    version is specified separately through versionStartIncluding and
>    similar properties. In this case, the CPE ID of the CVE will look
>    like this:
>
>    cpe:2.3:a:vendor:product:*:*:*:*:*:*:*:*
>
>    Notice how the "version" field is "*". The cpe_matches() function
>    handles "*" as a wildcard, and will allow it to match any value. So
>    "*" matches "1.0.4", which means in this situation, cpe_matches()
>    will return True, so the code logic will continue, and test if we're
>    in the version range or not.
>
> The code goes like this:
>
>             if not cpe_matches(cpe['id'], cpeid):
>                 # The CPE doesn't match, so skip
>                 continue
>
>             if not cpe['v_start'] and not cpe['v_end']:
>                 # The CPE matches *and* we don't have a version range, so we know the CVE affects us
>                 return self.CVE_AFFECTS
>
>             if not pkg_version:
>                 # The version of the package couldn't be parsed, so we're not able to compare it
>                 # with distutils.version.LooseVersion(), so skip
>                 continue
>
>             # and then here we handle the version range (code was not changed)
>
> Does this explain how it works ? Let me know if you still see an issue,
> because I could also be missing something.

Yes it makes sens now. The key point is that when using the keywords
like versionStartIncluding, then the CPE ID referenced in database will
use wildcard for the version. I overlooked this.

Thanks for the explanation.

Gregory

>
> Note: I checked the JSON output of pkg-stats before and after this
> commit, and it is identical.
>
> Best regards,
>
> Thomas
> -- 
> Thomas Petazzoni, CTO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
diff mbox series

Patch

diff --git a/support/scripts/cve-checker b/support/scripts/cve-checker
index 421202d049..e1dfab3805 100755
--- a/support/scripts/cve-checker
+++ b/support/scripts/cve-checker
@@ -23,6 +23,7 @@  import os
 import json
 import sys
 import cve as cvecheck
+from collections import defaultdict
 
 
 class Package:
@@ -34,15 +35,28 @@  class Package:
         self.ignored_cves = ignored_cves
 
 
+def check_package_cve_affects(cve, cpe_product_pkgs):
+    for product in cve.affected_products:
+        if not product in cpe_product_pkgs:
+            continue
+        for pkg in cpe_product_pkgs[product]:
+            if cve.affects(pkg.name, pkg.version, pkg.ignored_cves, pkg.cpeid) == cve.CVE_AFFECTS:
+                pkg.cves.append(cve.identifier)
+
 def check_package_cves(nvd_path, packages):
     if not os.path.isdir(nvd_path):
         os.makedirs(nvd_path)
 
+    cpe_product_pkgs = defaultdict(list)
+    for pkg in packages:
+        if pkg.cpeid:
+            cpe_product = cvecheck.cpe_product(pkg.cpeid)
+            cpe_product_pkgs[cpe_product].append(pkg)
+        else:
+            cpe_product_pkgs[pkg.name].append(pkg)
+
     for cve in cvecheck.CVE.read_nvd_dir(nvd_path):
-        for pkg_name in cve.pkg_names:
-            pkg = packages.get(pkg_name, '')
-            if pkg and cve.affects(pkg.name, pkg.version, pkg.ignored_cves) == cve.CVE_AFFECTS:
-                pkg.cves.append(cve.identifier)
+        check_package_cve_affects(cve, cpe_product_pkgs)
 
 
 html_header = """
@@ -199,7 +213,7 @@  def __main__():
     date = datetime.datetime.utcnow()
 
     print("Checking packages CVEs")
-    check_package_cves(args.nvd_path, {p.name: p for p in packages})
+    check_package_cves(args.nvd_path, packages)
 
     if args.html:
         print("Write HTML")
diff --git a/support/scripts/cve.py b/support/scripts/cve.py
index e7472cd470..6e97ea193f 100755
--- a/support/scripts/cve.py
+++ b/support/scripts/cve.py
@@ -47,6 +47,24 @@  ops = {
 }
 
 
+# Check if two CPE IDs match each other
+def cpe_matches(cpe1, cpe2):
+    cpe1_elems = cpe1.split(":")
+    cpe2_elems = cpe2.split(":")
+
+    remains = filter(lambda x: x[0] not in ["*", "-"] and x[1] not in ["*", "-"] and x[0] != x[1],
+                     zip(cpe1_elems, cpe2_elems))
+    return len(list(remains)) == 0
+
+
+def cpe_product(cpe):
+    return cpe.split(':')[4]
+
+
+def cpe_version(cpe):
+    return cpe.split(':')[5]
+
+
 class CVE:
     """An accessor class for CVE Items in NVD files"""
     CVE_AFFECTS = 1
@@ -134,7 +152,11 @@  class CVE:
         for cpe in node.get('cpe_match', ()):
             if not cpe['vulnerable']:
                 return
-            vendor, product, version = cpe['cpe23Uri'].split(':')[3:6]
+            product = cpe_product(cpe['cpe23Uri'])
+            version = cpe_version(cpe['cpe23Uri'])
+            # ignore when product is '-', which means N/A
+            if product == '-':
+                return
             op_start = ''
             op_end = ''
             v_start = ''
@@ -163,8 +185,7 @@  class CVE:
                     v_end = cpe['versionEndExcluding']
 
             yield {
-                'vendor': vendor,
-                'product': product,
+                'id': cpe['cpe23Uri'],
                 'v_start': v_start,
                 'op_start': op_start,
                 'v_end': v_end,
@@ -182,11 +203,11 @@  class CVE:
         return self.nvd_cve['cve']['CVE_data_meta']['ID']
 
     @property
-    def pkg_names(self):
-        """The set of package names referred by this CVE definition"""
-        return set(p['product'] for p in self.each_cpe())
+    def affected_products(self):
+        """The set of CPE products referred by this CVE definition"""
+        return set(cpe_product(p['id']) for p in self.each_cpe())
 
-    def affects(self, name, version, cve_ignore_list):
+    def affects(self, name, version, cve_ignore_list, cpeid=None):
         """
         True if the Buildroot Package object passed as argument is affected
         by this CVE.
@@ -199,8 +220,12 @@  class CVE:
             print("Cannot parse package '%s' version '%s'" % (name, version))
             pkg_version = None
 
+        # if we don't have a cpeid, build one based on name and version
+        if not cpeid:
+            cpeid = "cpe:2.3:*:*:%s:%s:*:*:*:*:*:*:*" % (name, version)
+
         for cpe in self.each_cpe():
-            if cpe['product'] != name:
+            if not cpe_matches(cpe['id'], cpeid):
                 continue
             if not cpe['v_start'] and not cpe['v_end']:
                 return self.CVE_AFFECTS
diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index 0a48cf9581..f357cbe1b6 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -540,17 +540,28 @@  async def check_package_latest_version(packages):
         await asyncio.wait(tasks)
 
 
+def check_package_cve_affects(cve, cpe_product_pkgs):
+    for product in cve.affected_products:
+        if not product in cpe_product_pkgs:
+            continue
+        for pkg in cpe_product_pkgs[product]:
+            if cve.affects(pkg.name, pkg.current_version, pkg.ignored_cves, pkg.cpeid) == cve.CVE_AFFECTS:
+                pkg.cves.append(cve.identifier)
+
 def check_package_cves(nvd_path, packages):
     if not os.path.isdir(nvd_path):
         os.makedirs(nvd_path)
 
-    for cve in cvecheck.CVE.read_nvd_dir(nvd_path):
-        for pkg_name in cve.pkg_names:
-            if pkg_name in packages:
-                pkg = packages[pkg_name]
-                if cve.affects(pkg.name, pkg.current_version, pkg.ignored_cves) == cve.CVE_AFFECTS:
-                    pkg.cves.append(cve.identifier)
+    cpe_product_pkgs = defaultdict(list)
+    for pkg in packages:
+        if pkg.cpeid:
+            cpe_product = cvecheck.cpe_product(pkg.cpeid)
+            cpe_product_pkgs[cpe_product].append(pkg)
+        else:
+            cpe_product_pkgs[pkg.name].append(pkg)
 
+    for cve in cvecheck.CVE.read_nvd_dir(nvd_path):
+        check_package_cve_affects(cve, cpe_product_pkgs)
 
 def calculate_stats(packages):
     stats = defaultdict(int)
@@ -1017,7 +1028,7 @@  def __main__():
     loop.run_until_complete(check_package_latest_version(packages))
     if args.nvd_path:
         print("Checking packages CVEs")
-        check_package_cves(args.nvd_path, {p.name: p for p in packages})
+        check_package_cves(args.nvd_path, packages)
     print("Calculate stats")
     stats = calculate_stats(packages)
     if args.html: