[NEXT,06/26] boa: add CPE id

Message ID 1519697441-54194-7-git-send-email-matthew.weber@rockwellcollins.com
State Changes Requested
Series Package CVE Reporting

Commit Message

Matt Weber Feb. 27, 2018, 2:10 a.m. UTC
Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
---
 package/boa/boa.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Thomas Petazzoni Feb. 27, 2018, 10:17 p.m. UTC | #1
Hello,

On Mon, 26 Feb 2018 20:10:21 -0600, Matt Weber wrote:
> Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
> ---
>  package/boa/boa.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/package/boa/boa.mk b/package/boa/boa.mk
> index d8bcaa1..1ded702 100644
> --- a/package/boa/boa.mk
> +++ b/package/boa/boa.mk
> @@ -8,7 +8,7 @@ BOA_VERSION = 0.94.14rc21
>  BOA_SITE = http://www.boa.org
>  BOA_LICENSE = GPL-2.0+
>  BOA_LICENSE_FILES = COPYING
> -
> +BOA_CPE_ID = $(BOE_NAME):$(BOE_NAME):$(BOE_VERSION)

Typo: BOE -> BOA

Also, please keep an empty line before BOA_INSTALL_TARGET_CMDS.

Now, a more general discussion about <pkg>_CPE_ID. Here is the list of
CPE_ID you have added:

BASH_CPE_ID = gnu:$(BASH_NAME):$(BASH_VERSION)
BOA_CPE_ID = $(BOE_NAME):$(BOE_NAME):$(BOE_VERSION)
BOOST_CPE_ID = $(BOOST_NAME):$(BOOST_NAME):$(BOOST_VERSION)
BUSYBOX_CPE_ID = $(BUSYBOX_NAME):$(BUSYBOX_NAME):$(BUSYBOX_VERSION)
BZIP2_CPE_ID=bzip:$(BZIP2_NAME):$(BZIP2_VERSION)
DHCP_CPE_ID = isc:$(DHCP_NAME):$(DHCP_VERSION)
E2FSPROGS_CPE_ID = e2fsprogs_project:$(E2FSPROGS_NAME):$(E2FSPROGS_VERSION)
GDB_CPE_ID = gnu:$(GDB_NAME):$(GDB_VERSION)
GLIBC_CPE_ID = gnu:$(GLIBC_NAME):$(GLIBC_VERSION)
GNUPG_CPE_ID = $(GNUPG_NAME):$(GNUPG_NAME):$(GNUPG_VERSION)
GZIP_CPE_ID = gnu:$(GZIP_NAME):$(GZIP_VERSION)
IPROUTE2_CPE_ID = iproute2_project:$(IPROUTE2_NAME):$(IPROUTE2_VERSION)
LIBGCRYPT_CPE_ID = gnupg:$(LIBGCRYPT_NAME):$(LIBGCRYPT_VERSION)
LIBOPENSSL_CPE_ID = openssl:openssl:$(LIBOPENSSL_VERSION)
LIBZLIB_CPE_ID = gnu:zlib:$(LIBZLIB_VERSION)
LINUX_CPE_ID = $(LINUX_NAME):$(LINUX_NAME)_kernel:$(LINUX_VERSION)
LINUX_HEADERS_CPE_ID = linux:linux_kernel:$(LINUX_HEADERS_VERSION)
OPENSSH_CPE_ID = openbsd:$(OPENSSH_NAME):$(OPENSSH_VERSION)
RSYSLOG_CPE_ID = $(RSYSLOG_NAME):$(RSYSLOG_NAME):$(RSYSLOG_VERSION)
TCPDUMP_CPE_ID = $(TCPDUMP_NAME):$(TCPDUMP_NAME):$(TCPDUMP_VERSION)
UTIL_LINUX_CPE_ID = util-linux_project:$(UTIL_LINUX_NAME):$(UTIL_LINUX_VERSION)
XERCES_CPE_ID = apache:$(XERCES_NAME)-c\+\+:$(XERCES_VERSION)

There is clearly a pattern, no?

Does it make sense to define in the infra:

$(2)_CPE_ID_VENDOR ?= $$($(2)_NAME)
$(2)_CPE_ID_NAME ?= $$($(2)_NAME)
$(2)_CPE_ID_VERSION ?= $$($(2)_VERSION)
$(2)_CPE_ID ?= $$($(2)_CPE_ID_VENDOR):$$($(2)_CPE_ID_NAME):$$($(2)_CPE_ID_VERSION)

With this, in many packages, you can avoid <pkg>_CPE_ID completely. In
a few other packages you'll have to do:

LIBZLIB_CPE_ID_VENDOR = gnu
LIBZLIB_CPE_ID_NAME = zlib

The drawback is obviously that all packages will suddenly have a
<pkg>_CPE_ID value, even if this value may potentially be incorrect
because it hasn't been checked. And this may be a real problem, so
probably your solution is better, I just wanted to point out the
(admittedly limited) duplication.
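
To make the defaulting rule concrete, here is a minimal sketch in
Python (not make, and not part of any patch). The function name and
the version strings are made up for illustration; only the
vendor:name:version layout and the "fall back to the package name"
rule come from the proposal above.

```python
def default_cpe_id(name, version, vendor=None, product=None):
    """Compose vendor:product:version, falling back to the package name.

    Hypothetical helper: it mirrors the `?=` defaults sketched above,
    where vendor and product both default to the package name.
    """
    vendor = vendor if vendor is not None else name
    product = product if product is not None else name
    return f"{vendor}:{product}:{version}"


# No override needed when vendor and product both equal the package name.
print(default_cpe_id("busybox", "1.27.2"))

# Only the fields that differ are spelled out, as in the LIBZLIB case.
print(default_cpe_id("libzlib", "1.2.11", vendor="gnu", product="zlib"))
```

The version numbers above are placeholders, not values taken from the
tree.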

Best regards,

Thomas
Matt Weber Feb. 28, 2018, 4 a.m. UTC | #2
Thomas,

On Tue, Feb 27, 2018 at 4:17 PM, Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
> Hello,
>
> On Mon, 26 Feb 2018 20:10:21 -0600, Matt Weber wrote:
>> Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
>> ---
>>  package/boa/boa.mk | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/package/boa/boa.mk b/package/boa/boa.mk
>> index d8bcaa1..1ded702 100644
>> --- a/package/boa/boa.mk
>> +++ b/package/boa/boa.mk
>> @@ -8,7 +8,7 @@ BOA_VERSION = 0.94.14rc21
>>  BOA_SITE = http://www.boa.org
>>  BOA_LICENSE = GPL-2.0+
>>  BOA_LICENSE_FILES = COPYING
>> -
>> +BOA_CPE_ID = $(BOE_NAME):$(BOE_NAME):$(BOE_VERSION)
>
> Typo: BOE -> BOA
>
> Also, please keep an empty line before BOA_INSTALL_TARGET_CMDS.
>

Noted, will fix in v2

[snip]
> There is clearly a pattern, no?
>
> Does it make sense to define in the infra:
>
> $(2)_CPE_ID_VENDOR ?= $$($(2)_NAME)
> $(2)_CPE_ID_NAME ?= $$($(2)_NAME)
> $(2)_CPE_ID_VERSION ?= $$($(2)_VERSION)
> $(2)_CPE_ID ?= $($(2)_CPE_ID_VENDOR):$($(2)_CPE_ID_NAME):$($(2)_CPE_ID_VERSION)
>
> With this, in many packages, you can avoid <pkg>_CPE_ID completely. In
> a few other packages you'll have to do:
>
> LIBZLIB_CPE_ID_VENDOR = gnu
> LIBZLIB_CPE_ID_NAME = zlib
>

Where it gets difficult is when a package has multiple CPEs so that
you catch all possible combinations that could be relevant for the
package.  Eventually there shouldn't be multiple once the dictionary
at NIST is cleaned up......

One thing I didn't think about with the multiple CPEs is the logic to
detect if a CPE needs updating in the CPE dictionary.  We'd have to
look at all possible CPE id options to see if any match.  This may
lead to us wanting to push to update the primary CPE ID sooner rather
than later and align all our packages to use it instead of supporting
multiple.  Then we could have all packages set <pkg>_CPE_ID_VENDOR
(as in the LIBZLIB example) and key off that to know we should report
that package's info.

> The drawback is obviously that all packages will suddenly have a
> <pkg>_CPE_ID value, even if this value may potentially be incorrect
> because it hasn't been checked. And this may be a real problem, so
> probably your solution is better, I just wanted to point out the
> (admittedly limited) duplication.

Yeah, eventually I'd argue we could revert back to infra logic and
clean it all up once there is a baseline or majority of packages with
valid CPE IDs.  However, for now I default the ones we don't have to
"unknown", so it's clear that a developer needs to research or request
a new CPE.

Matt
Thomas Petazzoni Feb. 28, 2018, 6:38 a.m. UTC | #3
Hello,

On Tue, 27 Feb 2018 22:00:31 -0600, Matthew Weber wrote:

> > LIBZLIB_CPE_ID_VENDOR = gnu
> > LIBZLIB_CPE_ID_NAME = zlib
> >  
> 
> Where it gets difficult is when a package has multiple CPEs so that
> you catch all possible combinations that could be relevant for the
> package.  Eventually there shouldn't be multiple once the dictionary
> at NIST is cleaned up......

Well, my proposal contained:

$(2)_CPE_ID ?= $$($(2)_CPE_ID_VENDOR):$$($(2)_CPE_ID_NAME):$$($(2)_CPE_ID_VERSION)

Notice the ?=, which still allows a given package to override its
<pkg>_CPE_ID value entirely, potentially with a list of multiple values.
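
As a rough model of that override behaviour (again in Python, purely
illustrative): a package that sets its own value wins over the
composed default, and that value may hold several whitespace-separated
IDs. The dictionary keys and the two zlib IDs below are assumptions
made up for the example.

```python
def cpe_ids(pkg):
    """Return the list of CPE IDs for a package description (a dict).

    Hypothetical model of the `?=` semantics: an explicit "cpe_id"
    replaces the default entirely and may list several IDs.
    """
    override = pkg.get("cpe_id")
    if override:
        return override.split()
    name = pkg["name"]
    vendor = pkg.get("cpe_id_vendor", name)
    product = pkg.get("cpe_id_name", name)
    return [f"{vendor}:{product}:{pkg['version']}"]


# Default only: everything is derived from the name and version.
print(cpe_ids({"name": "boa", "version": "0.94.14rc21"}))

# Full override carrying two candidate IDs for the same software.
print(cpe_ids({
    "name": "libzlib",
    "version": "1.2.11",
    "cpe_id": "gnu:zlib:1.2.11 zlib:zlib:1.2.11",
}))
```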

> One thing I didn't think about with the multiple CPEs is the logic to
> detect if a CPE needs updating in the CPE dictionary.  We'd have to
> look at all possible CPE id options to see if any match.  This may
> lead to us wanting to push to update the primary CPE ID sooner then
> later and align all our packages to use it instead of supporting
> multiple.  Then we could use the logic of having all packages set the
> LIBZLIB_CPE_ID_VENDOR and key off that to know we report that packages
> info.

I am not sure what the story is behind multiple CPE IDs. Is it the NIST
database having two entries for what is essentially the same upstream
software? If that's the case, then it really is a bug in the NIST
database, which should be fixed in the database rather than worked
around on our side by supporting multiple CPE IDs, no?

Of course, fixing the NIST database is the ideal situation, but it is
only reasonable if the NIST people are interested in fixing those
issues in a timely fashion. If they are not, then I'd say it's fine to
support multiple CPE IDs in Buildroot for the time being.

> > The drawback is obviously that all packages will suddenly have a
> > <pkg>_CPE_ID value, even if this value may potentially be incorrect
> > because it hasn't been checked. And this may be a real problem, so
> > probably your solution is better, I just wanted to point out the
> > (admittedly limited) duplication.  
> 
> Yeah, eventually I'd argue we could revert back to infra logic and
> clean it all up once there is a baseline or majority of packages with
> valid CPE IDs.    However for now, I default those we don't have to
> unknown so it's clear for a developer to go research or request a new
> CPE.

Yes, that is a possibility. As I said, I do realize that my proposal
has the significant drawback of making it impossible to tell clearly
which package has a correct CPE_ID (deliberately added by a developer)
vs. some potentially random CPE_ID (set by default by the package
infrastructure).

Thomas
Arnout Vandecappelle March 1, 2018, 8:47 p.m. UTC | #4
On 28-02-18 07:38, Thomas Petazzoni wrote:
>>> The drawback is obviously that all packages will suddenly have a
>>> <pkg>_CPE_ID value, even if this value may potentially be incorrect
>>> because it hasn't been checked. And this may be a real problem, so
>>> probably your solution is better, I just wanted to point out the
>>> (admittedly limited) duplication.  
>> Yeah, eventually I'd argue we could revert back to infra logic and
>> clean it all up once there is a baseline or majority of packages with
>> valid CPE IDs.    However for now, I default those we don't have to
>> unknown so it's clear for a developer to go research or request a new
>> CPE.
> Yes, that is a possibility. As I said, I do realize that my proposal
> has the significant drawback of not allowing to identify clearly which
> package has a correct CPE_ID (voluntarily added by a developer) vs.
> some potentially random CPE_ID (set by default by the package
> infrastructure).

 Well, this in the end boils down to the question: what are you going to do with
that cpe-manifest.csv? I think that whatever script is going to use that file,
it will have to be robust against CPE_IDs that don't actually exist in the CPE
database. Indeed, when someone bumps a package, will they have to verify if the
new CPE ID is still valid? If it isn't, what should be done? Report to NIST, of
course, but while the database isn't updated, should we just remove the CPE_ID
or what?

 To help a bit with deciding on this, it could be nice if you could add an RFC
patch with a script that takes the cpe-manifest.csv and outputs something real -
i.e., a list of open and closed CVEs. The latter will also help to evaluate to
what extent this whole thing is useful. In other words, how accurate is that
generated list? It's a question we had at the BR developer meeting and which
still remains unanswered IMO.
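
 As a starting point for the robustness part only (no CVE lookup), a
consumer of the manifest could look roughly like the Python sketch
below. The column names "package" and "cpe_id", the sample rows and
the set standing in for the CPE dictionary are all assumptions; the
real cpe-manifest.csv layout may differ.

```python
import csv
import io


def split_by_dictionary(manifest_file, known_cpes):
    """Split manifest packages into (in dictionary, not in dictionary).

    Unknown CPE IDs are collected instead of raising an error, so the
    script keeps going when an entry does not exist upstream.
    """
    known, unknown = [], []
    for row in csv.DictReader(manifest_file):
        target = known if row["cpe_id"] in known_cpes else unknown
        target.append(row["package"])
    return known, unknown


# Hypothetical manifest contents and dictionary, for illustration only.
manifest = io.StringIO(
    "package,cpe_id\n"
    "busybox,busybox:busybox:1.27.2\n"
    "boa,boa:boa:0.94.14rc21\n"
)
dictionary = {"busybox:busybox:1.27.2"}

known, unknown = split_by_dictionary(manifest, dictionary)
print("in dictionary:", known)
print("needs research or a NIST request:", unknown)
```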

 Regards,
 Arnout
Matt Weber March 1, 2018, 10:55 p.m. UTC | #5
Arnout,

On Thu, Mar 1, 2018 at 2:47 PM, Arnout Vandecappelle <arnout@mind.be> wrote:
>
>
>
> On 28-02-18 07:38, Thomas Petazzoni wrote:
> >>> The drawback is obviously that all packages will suddenly have a
> >>> <pkg>_CPE_ID value, even if this value may potentially be incorrect
> >>> because it hasn't been checked. And this may be a real problem, so
> >>> probably your solution is better, I just wanted to point out the
> >>> (admittedly limited) duplication.
> >> Yeah, eventually I'd argue we could revert back to infra logic and
> >> clean it all up once there is a baseline or majority of packages with
> >> valid CPE IDs.    However for now, I default those we don't have to
> >> unknown so it's clear for a developer to go research or request a new
> >> CPE.
> > Yes, that is a possibility. As I said, I do realize that my proposal
> > has the significant drawback of not allowing to identify clearly which
> > package has a correct CPE_ID (voluntarily added by a developer) vs.
> > some potentially random CPE_ID (set by default by the package
> > infrastructure).
>
>  Well, this in the end boils down to the question: what are you going to do with
> that cpe-manifest.csv? I think that whatever script is going to use that file,
> it will have to be robust against CPE_IDs that don't actually exist in the CPE
> database.

Correct, from our experience so far, the free and paid tools which use
this data are flexible on duplicate or invalid entries.  The key is
how they store the analysis and the manual updates you put in after
the fact.  Most take that manual adjustment and help produce a better
analysis on the next run.  If we're looking at making a script to do a
CPE to CVE analysis as part of Buildroot, I'd argue (at this point)
that getting good data out is a fairly involved task, which is why
I've focused just on getting an initial CPE report and then mechanisms
to automate the upkeep and NIST version bump requests.


> Indeed, when someone bumps a package, will they have to verify if the
> new CPE ID is still valid? If it isn't, what should be done? Report to NIST, of
> course, but while the database isn't updated, should we just remove the CPE_ID
> or what?

We did walk through this discussion a bit at the developer day.  I think
the feeling was that we document the process of suggesting an update to
NIST.  We may catch that the version bump doesn't have a CPE before
merge, but if we don't, we have a nightly/weekly script which runs to
check current Buildroot CPEs vs. their pkg versions for validity, then
sends an email to the DEVELOPERS asking them to submit the update to
NIST (we can have it templated for them to just cut/paste/send).  There
will be windows after a version bump during which the CPE is invalid;
that should be ok.  We could make it a goal to have the LTS/releases
all have valid CPEs before they go out (pkg-stats helping us).

So in retrospect (even against my v2), I'd argue we should just report
everything, as the script we'll propose soon will find the CPEs without
a NIST entry and we can choose to email the DEVELOPERS or catch that as
part of a pkg-stats-like activity.
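
To give a feel for the templated request, here is a small Python
sketch. It is not the script mentioned above; the template wording,
field names and sample data are all invented for illustration, and the
set of known CPEs stands in for whatever dictionary lookup the real
check would do.

```python
from string import Template

# Hypothetical cut/paste text; the real wording is still to be written.
REQUEST = Template(
    "Package: $package\n"
    "Version in Buildroot: $version\n"
    "CPE ID without a dictionary entry: $cpe_id\n"
    "Please submit an update request to NIST for this entry.\n"
)


def draft_requests(packages, known_cpes):
    """Return {package: message text} for CPE IDs missing upstream."""
    return {
        p["package"]: REQUEST.substitute(p)
        for p in packages
        if p["cpe_id"] not in known_cpes
    }


# Made-up input, for illustration only.
packages = [
    {"package": "busybox", "version": "1.27.2",
     "cpe_id": "busybox:busybox:1.27.2"},
    {"package": "boa", "version": "0.94.14rc21",
     "cpe_id": "boa:boa:0.94.14rc21"},
]
drafts = draft_requests(packages, {"busybox:busybox:1.27.2"})
for text in drafts.values():
    print(text)
```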

>
>  To help a bit with deciding on this, it could be nice if you could add an RFC
> patch with a script that takes the cpe-manifest.csv and outputs something real -
> i.e., a list of open and closed CVEs.

Ok but the goal I'm shooting for is just a mostly accurate list and a
way to try and improve that metadata over time.  It won't be perfect.
A good example of this is the duplicate CPEs: over time they can be
discarded as we validate that they are legacy and no one is cataloguing
things against them.  Until they're removed, we will get no hits on
most of the duplicates in whichever tool a person uses to do their CVE
checking.

> The latter will also help to evaluate to
> what extent this whole thing is useful. In other words, how accurate is that
> generated list? It's a question we had ad the BR developer meeting and which
> still remains unanswered IMO.

The CPE list will be as accurate as the people using it make it by
updating the NIST entries, either manually or as Buildroot nudges them
to send an email.  The CVEs those CPEs will find are another story
completely, and most open ways of searching for those don't produce
clean results without a good amount of manual intervention.


Matt
Arnout Vandecappelle March 2, 2018, 8:19 a.m. UTC | #6
On 01-03-18 23:55, Matthew Weber wrote:
> Arnout,
> 
> On Thu, Mar 1, 2018 at 2:47 PM, Arnout Vandecappelle <arnout@mind.be> wrote:
[snip]
>> Indeed, when someone bumps a package, will they have to verify if the
>> new CPE ID is still valid? If it isn't, what should be done? Report to NIST, of
>> course, but while the database isn't updated, should we just remove the CPE_ID
>> or what?
> 
> We did walk this discussion a bit a developer day.  I think the
> feeling was we document the process of suggesting a update to NIST.
> We may catch that the version bump doesn't have a CPE before merge but
> if we don't, we have a nightly/weekly script which runs to check
> current buildroot CPEs vs their pkg versions for validity.  Then send
> an email to the DEVELOPERS asking them to submit the update to NIST
> (we can have the templated for them to just cut/paste/send).  There
> will be gaps between when a version bumps and the CPE is invalid, that
> should be ok.  We could make it a goal to have the LTS/releases all
> have valid CPEs before they go out (pks-stats helping us).
> 
> So in retrospect (even against my v2), I'd argue we should just report
> everything as the script we'll propose soon with find the CPEs without
> a NIST entry and we can choose to email the DEVELOPER or catch that as
> part of a pkg-stats like activity.

 Just to be clear about what you're saying here: in v3, you'll be making it so
that nothing has to be done explicitly in the package to get a CPE ID, but it may
be (and probably will be) invalid? So that the initial report will contain a huge
number of invalid entries, and we can fix them one by one?

 I realize I'm contradicting myself, but that still leaves us with a problem (or
rather, limitation). We have no way of keeping track of which packages have been
checked and which haven't, so we have no way of keeping track of which ones are
broken.

 Or perhaps the pkg-stats script could eventually check the entry in the CPE
database, and add a column to indicate that it's valid or not?


>>  To help a bit with deciding on this, it could be nice if you could add an RFC
>> patch with a script that takes the cpe-manifest.csv and outputs something real -
>> i.e., a list of open and closed CVEs.
> 
> Ok but the goal I'm shooting for is just a mostly accurate list and a
> way to try and improve that metadata over time.  It won't be perfect.
> A good example of this are the duplicate CPEs, over time they can be
> discarded as we validate they are legacy and no one is cataloguing
> things against them.  Until they're removed, we will have no-hits on
> most of the duplicates in whichever tool a person uses to do their CVE
> checking.
> 
>> The latter will also help to evaluate to
>> what extent this whole thing is useful. In other words, how accurate is that
>> generated list? It's a question we had ad the BR developer meeting and which
>> still remains unanswered IMO.
> 
> The CPE list will be as accurate as people using it update the NIST
> entries, either manually or as Buildroot nudges them to send an email.
> The CVEs those CPEs will find is another complete story and most open
> ways of searching for those don't concluded clean results without a
> good amount of manual intervention.

 Yes, I do realize that this is a kind of catch-22 situation, if nobody uses the
database then obviously it will not be up-to-date. But to be honest, right now I
don't even really understand what kind of information you can get from it even
in the ideal case. That's why I asked to post an RFC script, so we can get some
idea of what the end goal is (and ultimately, whether this is worth the effort).

 Instead of posting a script (e.g. if you don't have that script, or you're not
allowed to redistribute it), you could also describe in the cover letter what
additional steps you have taken to get your final CVE list, and some statistics
about e.g. how many additional CVEs you find with manual checking. And maybe
also mention the commercial tools you've used (if you're allowed to do that). Or
even better, if there are any FLOSS tools, point to them.

 Regards,
 Arnout
Thomas Petazzoni March 2, 2018, 9:49 a.m. UTC | #7
Hello,

On Fri, 2 Mar 2018 09:19:25 +0100, Arnout Vandecappelle wrote:

>  Just to be clear about what you're saying here: in v3, you'll making it so that
> nothing has to be done explicitly in the package to get a CPE ID, but it may be
> (and probably will be) invalid? So that the initial report will contain a huge
> amount of invalid entries, and we can fix them one by one?
> 
>  I realize I'm contradicting myself, but that still leaves us with a problem (or
> rather, limitation). We have no way of keeping track of which packages have been
> checked and which haven't, so we have no way of keeping track of which ones are
> broken.
> 
>  Or perhaps the pkg-stats script could eventually check the entry in the CPE
> database, and add a column to indicate that it's valid or not?

Yes, that would be the idea.

> > The CPE list will be as accurate as people using it update the NIST
> > entries, either manually or as Buildroot nudges them to send an email.
> > The CVEs those CPEs will find is another complete story and most open
> > ways of searching for those don't concluded clean results without a
> > good amount of manual intervention.  
> 
>  Yes, I do realize that this is a kind of catch-22 situation, if nobody uses the
> database then obviously it will not be up-to-date. But to be honest, right now I
> don't even really understand what kind of information you can get from it even
> in the ideal case. That's why I asked to post an RFC script, so we can get some
> idea of what the end goal is (and ultimately, whether this is worth the effort).

Yes, I agree. The current proposal, which just adds CPE_ID to packages
and has a mechanism to generate a CSV based on that, isn't very useful
per se. I would also like to see the tooling that processes those
CPE_ID to query the NIST database, and get some useful data out of it.

Even if the data isn't great for the moment because the NIST database
is limited, it would show how the whole thing would be useful.

Right now the patch series just adds some IDs and generates a CSV with
those IDs, which isn't terribly useful.

Best regards,

Thomas
Matt Weber March 2, 2018, 4:14 p.m. UTC | #8
All,

On Fri, Mar 2, 2018 at 3:49 AM, Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
> Hello,
>
> On Fri, 2 Mar 2018 09:19:25 +0100, Arnout Vandecappelle wrote:
>
>>  Just to be clear about what you're saying here: in v3, you'll making it so that
>> nothing has to be done explicitly in the package to get a CPE ID, but it may be
>> (and probably will be) invalid? So that the initial report will contain a huge
>> amount of invalid entries, and we can fix them one by one?
>>
>>  I realize I'm contradicting myself, but that still leaves us with a problem (or
>> rather, limitation). We have no way of keeping track of which packages have been
>> checked and which haven't, so we have no way of keeping track of which ones are
>> broken.
>>
>>  Or perhaps the pkg-stats script could eventually check the entry in the CPE
>> database, and add a column to indicate that it's valid or not?
>
> Yes, that would be the idea.
>
>> > The CPE list will be as accurate as people using it update the NIST
>> > entries, either manually or as Buildroot nudges them to send an email.
>> > The CVEs those CPEs will find is another complete story and most open
>> > ways of searching for those don't concluded clean results without a
>> > good amount of manual intervention.
>>
>>  Yes, I do realize that this is a kind of catch-22 situation, if nobody uses the
>> database then obviously it will not be up-to-date. But to be honest, right now I
>> don't even really understand what kind of information you can get from it even
>> in the ideal case. That's why I asked to post an RFC script, so we can get some
>> idea of what the end goal is (and ultimately, whether this is worth the effort).
>
> Yes, I agree. The current proposal that just adds CPE_ID to packages,
> and has a mechanism to generate a CSV based on that isn't very useful
> per-se. I would also like to see the tooling that processes those
> CPE_ID to query the NIST database, and get some useful data out of it.
>
> Even if the data isn't great for the moment because the NIST database
> is limited, it would show how the whole thing would be useful.
>
> Right now the patch series just adds some IDs and generates a CSV with
> those IDs, which isn't terribly useful.
>

At this point, disregard my current patches and I'll resubmit in a
week or so.  At that point, I'll include scripting and a better set of
patch descriptions which cover the dialogue in this email and the
other thread.

Matt

Patch

diff --git a/package/boa/boa.mk b/package/boa/boa.mk
index d8bcaa1..1ded702 100644
--- a/package/boa/boa.mk
+++ b/package/boa/boa.mk
@@ -8,7 +8,7 @@  BOA_VERSION = 0.94.14rc21
 BOA_SITE = http://www.boa.org
 BOA_LICENSE = GPL-2.0+
 BOA_LICENSE_FILES = COPYING
-
+BOA_CPE_ID = $(BOE_NAME):$(BOE_NAME):$(BOE_VERSION)
 define BOA_INSTALL_TARGET_CMDS
 	$(INSTALL) -D -m 755 $(@D)/src/boa $(TARGET_DIR)/usr/sbin/boa
 	$(INSTALL) -D -m 755 $(@D)/src/boa_indexer $(TARGET_DIR)/usr/lib/boa/boa_indexer