[v5] support/scripts/pkg-stats: add latest upstream version information

Message ID 20190103083818.19702-1-thomas.petazzoni@bootlin.com
State Changes Requested
Series
  • [v5] support/scripts/pkg-stats: add latest upstream version information

Commit Message

Thomas Petazzoni Jan. 3, 2019, 8:38 a.m. UTC
This commit adds fetching the latest upstream version of each package
from release-monitoring.org.

The fetching process first tries to use the package mappings of the
"Buildroot" distribution [1]. This mapping mechanism tells
release-monitoring.org what a package is called in a given
distribution/build-system. For example, the package xutil_util-macros
in Buildroot is named xorg-util-macros on release-monitoring.org. This
mapping can be seen in the "Mappings" section of
https://release-monitoring.org/project/15037/.

If there is no mapping, the script falls back to a regular search and
looks, within the search results, for a project whose name exactly
matches the Buildroot name.

Even though fetching from release-monitoring.org is a bit slow, using
multiprocessing.Pool has proven to not be reliable, with some requests
ending up with an exception. So we keep a serialized approach, but
with a single HTTPSConnectionPool() for all queries. Long term, we
hope to be able to use a database dump of release-monitoring.org
instead.

From an output point of view, the latest version column:

 - Is green when the version in Buildroot matches the latest upstream
   version

 - Is orange when the latest upstream version is unknown because the
   package was not found on release-monitoring.org

 - Is red when the version in Buildroot doesn't match the latest
   upstream version. Note that we are not doing anything smart here:
   we are just testing if the strings are equal or not.

 - The cell contains the link to the project on release-monitoring.org
   if found.

 - The cell indicates if the match was done using a distro mapping, or
   through a regular search.
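
For illustration, the three colour rules above could be sketched as a
small helper. This is only a sketch: version_cell_class() is a
hypothetical name that does not appear in the patch, while the CSS class
names are the ones the patch adds to the stylesheet.

```python
def version_cell_class(current_version, latest_version):
    """Return the CSS class for the "Latest version" cell.

    latest_version is None when release-monitoring.org knows no version
    for the package. (Hypothetical helper, not part of pkg-stats.)
    """
    if latest_version is None:
        return "version-unknown"       # orange
    if latest_version == current_version:
        return "version-good"          # green: plain string equality
    return "version-needs-update"      # red


print(version_cell_class("1.2.3", "1.2.3"))
print(version_cell_class("v1.2.3", "1.2.3"))
print(version_cell_class("1.2.3", None))
```

Because the comparison is a plain string equality, "v1.2.3" and "1.2.3"
are reported as different versions.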

[1] https://release-monitoring.org/distro/Buildroot/

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
Changes since v4:
- Don't use multiprocessing.Pool(), stick to a serialized approach,
  which is more reliable.
- Handle errors/exceptions properly.
- Improve the layout of the resulting table column.

Changes since v3:
- Use Pool(), like is done for the upstream URL checking added by Matt
  Weber
- Use the requests Python module instead of the urllib2 Python module,
  so that we use the same module as the one used for the upstream URL
  checking
- Adjusted to work with the latest pkg-stats code

Changes since v2:
- Use the "timeout" argument of urllib2.urlopen() in order to make
  sure that the requests terminate at some point, even if
  release-monitoring.org is stuck.
- Move a lot of the logic as methods of the Package() class.

Changes since v1:
- Fix flake8 warnings
- Add missing newline in HTML
---
 support/scripts/pkg-stats | 137 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

Comments

Thomas Petazzoni Jan. 3, 2019, 8:40 a.m. UTC | #1
Hello,

On Thu,  3 Jan 2019 09:38:18 +0100, Thomas Petazzoni wrote:
> This commit adds fetching the latest upstream version of each package
> from release-monitoring.org.

If anyone wants to see what the result looks like:

  https://bootlin.com/~thomas/stats.html

Best regards,

Thomas
Brandon Maier Jan. 3, 2019, 7:57 p.m. UTC | #2
This is a useful tool, thanks! I don't have time to test it right now,
but an observation.

On Thu, Jan 3, 2019 at 2:38 AM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
...
> +        elif pkg.latest_version[1] == pkg.current_version:
> +            stats["version-uptodate"] += 1
...
> +    elif pkg.latest_version[1] != pkg.current_version:
> +        td_class.append("version-needs-update")

From https://bootlin.com/~thomas/stats.html, there's a few packages
that are up-to-date, but get marked "version-needs-update" because
either the Buildroot or release-monitoring version string starts with
a "v". For example, the vmtouch and libmodbus packages. It would be
nice to change the version comparison to ignore a leading "v". E.g.

def version_uptodate(pkg):
    latest = remove_prefix(pkg.latest_version[1], 'v')
    current = remove_prefix(pkg.current_version, 'v')
    return latest == current
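
remove_prefix() is not defined in the snippet above and is not a Python
built-in. A minimal sketch of what it could look like follows. Pkg is a
stand-in for the script's Package class, and the status and id values in
the example call are made up.

```python
from collections import namedtuple

# Stand-in for the script's Package class (assumption: only these two
# attributes matter here). latest_version is (status, version, id).
Pkg = namedtuple("Pkg", ["current_version", "latest_version"])


def remove_prefix(text, prefix):
    """Return text without a leading prefix, or unchanged if absent."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def version_uptodate(pkg):
    latest = remove_prefix(pkg.latest_version[1], 'v')
    current = remove_prefix(pkg.current_version, 'v')
    return latest == current


print(version_uptodate(Pkg("v1.2.0", (2, "1.2.0", 1234))))
```

The caller would still need to handle the case where no upstream version
is known (latest_version[1] is None) before calling version_uptodate().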
Matthew Weber Jan. 3, 2019, 8:23 p.m. UTC | #3
All,
(sending this one again for patchwork to see it, my email issue should be fixed)

On Thu, Jan 3, 2019 at 1:31 PM Matthew Weber
<matthew.weber@rockwellcollins.com> wrote:
>
> Thomas,
>
> (Sorry about the urls which got reformatted below....)
>
> On Thu, Jan 3, 2019 at 2:38 AM Thomas Petazzoni
> <thomas.petazzoni@bootlin.com> wrote:
> >
> > This commit adds fetching the latest upstream version of each package
> > from release-monitoring.org.
> >
> > The fetching process first tries to use the package mappings of the
> > "Buildroot" distribution [1]. This mapping mechanism allows to tell
> > release-monitoring.org what is the name of a package in a given
> > distribution/build-system. For example, the package xutil_util-macros
> > in Buildroot is named xorg-util-macros on release-monitoring.org. This
> > mapping can be seen in the section "Mappings" of
> > https://release-monitoring.org/project/15037/.
> >
> > If there is no mapping, then it does a regular search, and within the
> > search results, looks for a package whose name matches the Buildroot
> > name.
> >
> > Even though fetching from release-monitoring.org is a bit slow, using
> > multiprocessing.Pool has proven to not be reliable, with some requests
> > ending up with an exception. So we keep a serialized approach, but
> > with a single HTTPSConnectionPool() for all queries. Long term, we
> > hope to be able to use a database dump of release-monitoring.org
> > instead.
> >
> > From an output point of view, the latest version column:
> >
> >  - Is green when the version in Buildroot matches the latest upstream
> >    version
> >
> >  - Is orange when the latest upstream version is unknown because the
> >    package was not found on release-monitoring.org
> >
> >  - Is red when the version in Buildroot doesn't match the latest
> >    upstream version. Note that we are not doing anything smart here:
> >    we are just testing if the strings are equal or not.
> >
> >  - The cell contains the link to the project on release-monitoring.org
> >    if found.
> >
> >  - The cell indicates if the match was done using a distro mapping, or
> >    through a regular search.
> >
> > [1] https://release-monitoring.org/distro/Buildroot/
> >
> > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> > ---
> > Changes since v4:
> > - Don't use multiprocessing.Pool(), stick to a serialized approach,
> >   which is more reliable.
> > - Handle errors/exceptions properly.
> > - Improve the layout of the resulting table column.
> >
> > Changes since v3:
> > - Use Pool(), like is done for the upstream URL checking added by Matt
> >   Weber
> > - Use the requests Python module instead of the urllib2 Python module,
> >   so that we use the same module as the one used for the upstream URL
> >   checking
> > - Adjusted to work with the latest pkg-stats code
> >
> > Changes since v2:
> > - Use the "timeout" argument of urllib2.urlopen() in order to make
> >   sure that the requests terminate at some point, even if
> >   release-monitoring.org is stuck.
> > - Move a lot of the logic as methods of the Package() class.
> >
> > Changes since v1:
> > - Fix flake8 warnings
> > - Add missing newline in HTML
> > ---
> >  support/scripts/pkg-stats | 137 ++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 137 insertions(+)
> >
> > diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
> > index d0b06b1e74..d32087cda1 100755
> > --- a/support/scripts/pkg-stats
> > +++ b/support/scripts/pkg-stats
> > @@ -25,11 +25,19 @@ import re
> >  import subprocess
> >  import sys
> >  import requests  # URL checking
> > +import json
> > +import certifi
> > +from urllib3 import HTTPSConnectionPool
> >  from multiprocessing import Pool
> >
> >  INFRA_RE = re.compile("\$\(eval \$\(([a-z-]*)-package\)\)")
> >  URL_RE = re.compile("\s*https?://\S*\s*$")
> > +RELEASE_MONITORING_API = "http://release-monitoring.org/api"
>
> This variable isn't used in the new approach.
>
> >
> > +RM_API_STATUS_ERROR = 1
> > +RM_API_STATUS_FOUND_BY_DISTRO = 2
> > +RM_API_STATUS_FOUND_BY_PATTERN = 3
> > +RM_API_STATUS_NOT_FOUND = 4
> >
> >  class Package:
> >      all_licenses = list()
> > @@ -49,6 +57,7 @@ class Package:
> >          self.url = None
> >          self.url_status = None
> >          self.url_worker = None
> > +        self.latest_version = None
> >
> >      def pkgvar(self):
> >          return self.name.upper().replace("-", "_")
> > @@ -297,6 +306,66 @@ def check_package_urls(packages):
> >      for pkg in packages:
> >          pkg.url_status = pkg.url_worker.get(timeout=3600)
> >
> > +def release_monitoring_get_latest_version_by_distro(pool, name):
> > +    try:
> > +        req = pool.request('GET', "/api/project/Buildroot/%s" % name)
> > +    except:
> > +        return (RM_API_STATUS_ERROR, None, None)
> > +
> > +    if req.status != 200:
> > +        return (RM_API_STATUS_NOT_FOUND, None, None)
> > +
> > +    data = json.loads(req.data)
> > +
> > +    if len(data['versions']) > 0:
> > +        return (RM_API_STATUS_FOUND_BY_DISTRO, data['versions'][0], data['id'])
> > +    else:
> > +        return (RM_API_STATUS_FOUND_BY_DISTRO, None, data['id'])
> > +
> > +def release_monitoring_get_latest_version_by_guess(pool, name):
> > +    try:
> > +        req = pool.request('GET', "/api/projects/?pattern=%s" % name)
> > +    except:
> > +        print("Exception release_monitoring_get_latest_version_by_guess for '%s': %s" % (name, err))
> > +        return (RM_API_STATUS_ERROR, None, None)
> > +
> > +    if req.status != 200:
> > +        return (RM_API_STATUS_NOT_FOUND, None, None)
> > +
> > +    data = json.loads(req.data)
> > +
> > +    for p in data['projects']:
> > +        if p['name'] == name and len(p['versions']) > 0:
> > +            return (RM_API_STATUS_FOUND_BY_PATTERN, p['versions'][0], p['id'])
> > +
> > +    return (RM_API_STATUS_NOT_FOUND, None, None)
> > +
> > +def check_package_latest_version(packages):
> > +    """
> > +    Fills in the .latest_version field of all Package objects
> > +
> > +    This field has a special format:
> > +      (status, version, id)
> > +    with:
> > +    - status: one of RM_API_STATUS_ERROR,
> > +      RM_API_STATUS_FOUND_BY_DISTRO, RM_API_STATUS_FOUND_BY_PATTERN,
> > +      RM_API_STATUS_NOT_FOUND
> > +    - version: string containing the latest version known by
> > +      release-monitoring.org for this package
> > +    - id: string containing the id of the project corresponding to this
> > +      package, as known by release-monitoring.org
> > +    """
> > +    pool = HTTPSConnectionPool('release-monitoring.org', port=443,
> > +                               cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
>
> Suggest adding a timeout value (timeout=#). I used 30 seconds for
> the URL checker logic, but since this is serial against one main
> site, maybe a shorter value would be better?
>
> I still can't get the proxy and certificate stuff to work right to
> fully test this patchset at work.  It should work fine with a "normal"
> internet connection.
>
> Functionally I can say I have reviewed the patchset with those minor
> suggestions above.
> Reviewed-by: Matthew Weber <matthew.weber@rockwellcollins.com>
Thomas Petazzoni Jan. 3, 2019, 8:33 p.m. UTC | #4
Hello Brandon,

Thanks for the feedback.

On Thu, 3 Jan 2019 13:57:46 -0600, Brandon Maier wrote:

> From https://bootlin.com/~thomas/stats.html, there's a few packages
> that are up-to-date, but get marked "version-needs-update" because
> either the Buildroot or release-monitoring version string starts with
> a "v". For example, the vmtouch and libmodbus packages. It would be
> nice to change the version comparison to ignore a leading "v". E.g.
> 
> def version_uptodate(pkg):
>     latest = remove_prefix(pkg.latest_version[1], 'v')
>     current = remove_prefix(pkg.current_version, 'v')
>     return latest == current

I'm aware of this problem, but I'm not sure we should handle this with
a hack in the pkg-stats script itself. Indeed sometimes the version
prefix is just "v", but sometimes it's the entire name of the package,
sometimes something slightly different.

Back when I initially started working on this script, I had a series
that changed all the packages that used vX.Y.Z as a version to use just
X.Y.Z, and similarly for other packages in similar but slightly
different situations. I found (with some effort) the old branch I had
with those changes:

  https://git.bootlin.com/users/thomas-petazzoni/buildroot/log/?h=fix-versions

It seems like I never posted them. I thought there was some discussion
on the list about this issue, but I can't find it. Perhaps I should
update this series, submit it for good, and see what the feedback is ?

Best regards,

Thomas
Yann E. MORIN Jan. 3, 2019, 8:38 p.m. UTC | #5
Brandon, All,

On 2019-01-03 13:57 -0600, Brandon Maier spake thusly:
> This is a useful tool, thanks! I don't have time to test it right now,
> but an observation.
> 
> On Thu, Jan 3, 2019 at 2:38 AM Thomas Petazzoni
> <thomas.petazzoni@bootlin.com> wrote:
> ...
> > +        elif pkg.latest_version[1] == pkg.current_version:
> > +            stats["version-uptodate"] += 1
> ...
> > +    elif pkg.latest_version[1] != pkg.current_version:
> > +        td_class.append("version-needs-update")
> 
> From https://bootlin.com/~thomas/stats.html, there's a few packages
> that are up-to-date, but get marked "version-needs-update" because
> either the Buildroot or release-monitoring version string starts with
> a "v".

There was an issue raised (long ago) about this in the anitya tool
(which generates the DB at r-m.o):
    https://github.com/release-monitoring/anitya/issues/374

So, the version strings we get from r-m.o should no longer have a
leading 'v'.

Yet, in our FOO_VERSION, we in Buildroot may still have those leading
'v'. So, we should try to compare the versions as-is, and then remove
our leading 'v' and compare again. Not sure of the exact heuristic,
though...
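
One possible shape for such a heuristic, as a hedged sketch only:
versions_match() and LEADING_V_RE are hypothetical names, and the rule
that a leading 'v' is only stripped when a digit follows is an assumption
of this sketch, not something settled in the thread.

```python
import re

# A leading 'v' only counts as a prefix when a digit follows, so that a
# version such as "vanilla-1" is left alone (assumption of this sketch).
LEADING_V_RE = re.compile(r"^v(?=\d)")


def versions_match(buildroot_version, upstream_version):
    """Compare as-is first, then retry without Buildroot's leading 'v'."""
    if buildroot_version == upstream_version:
        return True
    return LEADING_V_RE.sub("", buildroot_version) == upstream_version


print(versions_match("v3.1.4", "3.1.4"))
print(versions_match("v3.1.4", "3.1.5"))
```

Only the Buildroot side is stripped here, on the assumption that the
strings from r-m.o no longer carry the prefix.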

Regards,
Yann E. MORIN.

> For example, the vmtouch and libmodbus packages. It would be
> nice to change the version comparison to ignore a leading "v". E.g.
> 
> def version_uptodate(pkg):
>     latest = remove_prefix(pkg.latest_version[1], 'v')
>     current = remove_prefix(pkg.current_version, 'v')
>     return latest == current
Brandon Maier Jan. 3, 2019, 9:02 p.m. UTC | #6

On Thu, Jan 3, 2019 at 2:33 PM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
>
> Hello Brandon,
>
> Thanks for the feedback.
>
> On Thu, 3 Jan 2019 13:57:46 -0600, Brandon Maier wrote:
>
> > From https://bootlin.com/~thomas/stats.html, there's a few packages
> > that are up-to-date, but get marked "version-needs-update" because
> > either the Buildroot or release-monitoring version string starts with
> > a "v". For example, the vmtouch and libmodbus packages. It would be
> > nice to change the version comparison to ignore a leading "v". E.g.
> >
> > def version_uptodate(pkg):
> >     latest = remove_prefix(pkg.latest_version[1], 'v')
> >     current = remove_prefix(pkg.current_version, 'v')
> >     return latest == current
>
> I'm aware of this problem, but I'm not sure we should handle this with
> a hack in the pkg-stats script itself. Indeed sometimes the version
> prefix is just "v", but sometimes it's the entire name of the package,
> sometimes something slightly different.

Agreed it's not a pretty hack. I wouldn't want to handle the other
oddities in here, as that would get messy. I suggested it for "v" only
because it seems to be very common. It would be a way to get us most
of the way there, then it could be reverted if e.g. all the github
versions are fixed up.

On those other packages though, perhaps adding an optional variable to
package mk files for custom versions? E.g. package/libnfs/libnfs.mk
defines version as "libnfs-3.0.0", but the r-m version is "3.0.0".
E.g.

package/libnfs/libnfs.mk:
  LIBNFS_RM_VERSION = 3.0.0
  LIBNFS_VERSION = libnfs-$(LIBNFS_RM_VERSION)

>
> Back when I initially started working on this script, I had a series
> that changed all the packages that used vX.Y.Z as a version to use just
> X.Y.Z, and similarly for other packages in similar but slightly
> different situations. I found (with some effort) the old branch I had
> with those changes:
>
>   https://git.bootlin.com/users/thomas-petazzoni/buildroot/log/?h=fix-versions
>
> It seems like I never posted them. I thought there was some discussion
> on the list about this issue, but I can't find it. Perhaps I should
> update this series, submit it for good, and see what the feedback is ?

Regardless of this patch series, this sounds like a good idea to me.
Especially considering Anitya does this too, as Yann mentioned[1].

[1] https://github.com/release-monitoring/anitya/issues/374

>
> Best regards,
>
> Thomas
> --
> Thomas Petazzoni, CTO, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
Thomas Petazzoni Jan. 3, 2019, 9:16 p.m. UTC | #7
Hello,

On Thu, 3 Jan 2019 15:02:24 -0600, Brandon Maier wrote:

> Agreed it's not a pretty hack. I wouldn't want to handle the other
> oddities in here, as that would get messy. I suggested it for "v" only
> because it seems to be very common. It would be a way to get us most
> of the way there, then it could be reverted if e.g. all the github
> versions are fixed up.
> 
> On those other packages though, perhaps adding an optional variable to
> package mk files for custom versions? E.g. package/libnfs/libnfs.mk
> defines version as "libnfs-3.0.0", but the r-m version is "3.0.0".
> E.g.
> 
> package/libnfs/libnfs.mk:
>   LIBNFS_RM_VERSION = 3.0.0
>   LIBNFS_VERSION = libnfs-$(LIBNFS_RM_VERSION)

We could do that, but is that really needed/useful ?

> > It seems like I never posted them. I thought there was some discussion
> > on the list about this issue, but I can't find it. Perhaps I should
> > update this series, submit it for good, and see what the feedback is ?  
> 
> Regardless of this patch series, this sounds like a good idea to me.
> Especially considering Anitya does this too, as Yann mentioned[1].
> 
> [1] https://github.com/release-monitoring/anitya/issues/374

Right. So you say I should submit this "fix-versions" patch series ?

Best regards,

Thomas
Brandon Maier Jan. 3, 2019, 11:29 p.m. UTC | #8
On Thu, Jan 3, 2019 at 3:31 PM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
...
> > package/libnfs/libnfs.mk:
> >   LIBNFS_RM_VERSION = 3.0.0
> >   LIBNFS_VERSION = libnfs-$(LIBNFS_RM_VERSION)
>
> We could do that, but is that really needed/useful ?
>

As discussed on IRC, never mind this. If every package's _VERSION is
changed to match release-monitoring, then it won't matter.

> > > It seems like I never posted them. I thought there was some discussion
> > > on the list about this issue, but I can't find it. Perhaps I should
> > > update this series, submit it for good, and see what the feedback is ?
> >
> > Regardless of this patch series, this sounds like a good idea to me.
> > Especially considering Anitya does this too, as Yann mentioned[1].
> >
> > [1] https://github.com/release-monitoring/anitya/issues/374
>
> Right. So you say I should submit this "fix-versions" patch series ?

Yes, good to get the discussion going.
Ricardo Martincoski Jan. 4, 2019, 2:12 a.m. UTC | #9
Hello,

Overall it looks good, but I think a v6 would be good.

On Thu, Jan 03, 2019 at 06:38 AM, Thomas Petazzoni wrote:

> This commit adds fetching the latest upstream version of each package
> from release-monitoring.org.
> 
> The fetching process first tries to use the package mappings of the
> "Buildroot" distribution [1]. This mapping mechanism allows to tell
> release-monitoring.org what is the name of a package in a given
> distribution/build-system. For example, the package xutil_util-macros
> in Buildroot is named xorg-util-macros on release-monitoring.org. This
> mapping can be seen in the section "Mappings" of
> https://release-monitoring.org/project/15037/.
> 
> If there is no mapping, then it does a regular search, and within the
> search results, looks for a package whose name matches the Buildroot
> name.
> 
> Even though fetching from release-monitoring.org is a bit slow, using
> multiprocessing.Pool has proven to not be reliable, with some requests
> ending up with an exception. So we keep a serialized approach, but
> with a single HTTPSConnectionPool() for all queries. Long term, we

Looks like using urllib3 improves the speed a lot.
For me, v5 took 25 minutes, versus 2 hours with v3 (also a serialized
approach).

[snip]
> ---
> Changes since v4:
> - Don't use multiprocessing.Pool(), stick to a serialized approach,
>   which is more reliable.
> - Handle errors/exceptions properly.

Almost!
Could you fix the warnings from flake8?
https://gitlab.com/RicardoMartincoski/buildroot/-/jobs/141278172

I never used urllib3 but from the docs it seems there is a base exception, so
perhaps this line can be useful to avoid bare except:
from urllib3.exceptions import HTTPError
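
A sketch of how the by-guess lookup from the patch could look with a
named exception instead of a bare except. Binding the exception as err
also gives the print() call a defined name to format. The function body
is otherwise copied from the patch. The ImportError fallback exists only
so the sketch runs where urllib3 is not installed; it would not belong in
the real script.

```python
import json

try:
    from urllib3.exceptions import HTTPError
except ImportError:
    # Fallback so the sketch still runs where urllib3 is not installed.
    HTTPError = OSError

RM_API_STATUS_ERROR = 1
RM_API_STATUS_FOUND_BY_PATTERN = 3
RM_API_STATUS_NOT_FOUND = 4


def release_monitoring_get_latest_version_by_guess(pool, name):
    try:
        req = pool.request('GET', "/api/projects/?pattern=%s" % name)
    except HTTPError as err:
        # Binding the exception gives the message a defined 'err'.
        print("Exception release_monitoring_get_latest_version_by_guess for '%s': %s" % (name, err))
        return (RM_API_STATUS_ERROR, None, None)

    if req.status != 200:
        return (RM_API_STATUS_NOT_FOUND, None, None)

    data = json.loads(req.data)

    for p in data['projects']:
        if p['name'] == name and len(p['versions']) > 0:
            return (RM_API_STATUS_FOUND_BY_PATTERN, p['versions'][0], p['id'])

    return (RM_API_STATUS_NOT_FOUND, None, None)
```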

> - Improve the layout of the resulting table column.

[snip]
> +++ b/support/scripts/pkg-stats
> @@ -25,11 +25,19 @@ import re
>  import subprocess
>  import sys
>  import requests  # URL checking
> +import json
> +import certifi
> +from urllib3 import HTTPSConnectionPool
>  from multiprocessing import Pool
>  
>  INFRA_RE = re.compile("\$\(eval \$\(([a-z-]*)-package\)\)")
>  URL_RE = re.compile("\s*https?://\S*\s*$")
> +RELEASE_MONITORING_API = "http://release-monitoring.org/api"

Not used in v5. Matt already mentioned it, but I highlight it because flake8
does not catch it.


Regards,
Ricardo
Brandon Maier Jan. 4, 2019, 2:54 p.m. UTC | #10
On Thu, Jan 3, 2019 at 2:38 AM Thomas Petazzoni
<thomas.petazzoni@bootlin.com> wrote:
...
> +    if len(data['versions']) > 0:
> +        return (RM_API_STATUS_FOUND_BY_DISTRO, data['versions'][0], data['id'])
> +    else:
> +        return (RM_API_STATUS_FOUND_BY_DISTRO, None, data['id'])
> +
...
> +    for p in data['projects']:
> +        if p['name'] == name and len(p['versions']) > 0:
> +            return (RM_API_STATUS_FOUND_BY_PATTERN, p['versions'][0], p['id'])
> +

In these two places, instead of using "versions", could we use
"version"? It looks like the first item in "versions" isn't always
correct. For example see cdrkit. I did a mock API call with Postman
and there's a separate "version" field without the prefix.

Here's what a GET to
https://release-monitoring.org/api/projects/?pattern=cdrkit returns:
{
    "projects": [
        {
            "backend": "GitHub",
            "created_on": 1460976471,
            "ecosystem": "https://github.com/Distrotech/cdrkit",
            "homepage": "https://github.com/Distrotech/cdrkit",
            "id": 10520,
            "name": "cdrkit",
            "regex": null,
            "updated_on": 1533644065,
            "version": "1.1.11",
            "version_url": "Distrotech/cdrkit",
            "versions": [
                "release_1.1.11",
                "1.1.11"
            ]
        }
    ],
    "total": 1
}
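
To make the difference concrete, here is a small sketch that parses a
trimmed copy of the response above and prints both candidates.
latest_version_of() is a hypothetical helper name. Whether "version" is
populated for every project is an assumption that would need checking
against the API.

```python
import json

# Trimmed copy of the response quoted above.
SAMPLE = '''
{
    "projects": [
        {
            "id": 10520,
            "name": "cdrkit",
            "version": "1.1.11",
            "versions": ["release_1.1.11", "1.1.11"]
        }
    ],
    "total": 1
}
'''


def latest_version_of(project):
    """Prefer the 'version' field; return None when it is missing or empty."""
    return project.get("version") or None


data = json.loads(SAMPLE)
for p in data["projects"]:
    # Prints the name, the first "versions" entry, then the "version" field.
    print(p["name"], p["versions"][0], latest_version_of(p))
```

For cdrkit this prints "release_1.1.11" as the first "versions" entry and
"1.1.11" as the "version" field.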

Otherwise I ran the script on my setup and everything looks good.
Reviewed-By: Brandon Maier <brandon.maier@rockwellcollins.com>

Patch

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index d0b06b1e74..d32087cda1 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -25,11 +25,19 @@  import re
 import subprocess
 import sys
 import requests  # URL checking
+import json
+import certifi
+from urllib3 import HTTPSConnectionPool
 from multiprocessing import Pool
 
 INFRA_RE = re.compile("\$\(eval \$\(([a-z-]*)-package\)\)")
 URL_RE = re.compile("\s*https?://\S*\s*$")
+RELEASE_MONITORING_API = "http://release-monitoring.org/api"
 
+RM_API_STATUS_ERROR = 1
+RM_API_STATUS_FOUND_BY_DISTRO = 2
+RM_API_STATUS_FOUND_BY_PATTERN = 3
+RM_API_STATUS_NOT_FOUND = 4
 
 class Package:
     all_licenses = list()
@@ -49,6 +57,7 @@  class Package:
         self.url = None
         self.url_status = None
         self.url_worker = None
+        self.latest_version = None
 
     def pkgvar(self):
         return self.name.upper().replace("-", "_")
@@ -297,6 +306,66 @@  def check_package_urls(packages):
     for pkg in packages:
         pkg.url_status = pkg.url_worker.get(timeout=3600)
 
+def release_monitoring_get_latest_version_by_distro(pool, name):
+    try:
+        req = pool.request('GET', "/api/project/Buildroot/%s" % name)
+    except:
+        return (RM_API_STATUS_ERROR, None, None)
+
+    if req.status != 200:
+        return (RM_API_STATUS_NOT_FOUND, None, None)
+
+    data = json.loads(req.data)
+
+    if len(data['versions']) > 0:
+        return (RM_API_STATUS_FOUND_BY_DISTRO, data['versions'][0], data['id'])
+    else:
+        return (RM_API_STATUS_FOUND_BY_DISTRO, None, data['id'])
+
+def release_monitoring_get_latest_version_by_guess(pool, name):
+    try:
+        req = pool.request('GET', "/api/projects/?pattern=%s" % name)
+    except:
+        print("Exception release_monitoring_get_latest_version_by_guess for '%s': %s" % (name, err))
+        return (RM_API_STATUS_ERROR, None, None)
+
+    if req.status != 200:
+        return (RM_API_STATUS_NOT_FOUND, None, None)
+
+    data = json.loads(req.data)
+
+    for p in data['projects']:
+        if p['name'] == name and len(p['versions']) > 0:
+            return (RM_API_STATUS_FOUND_BY_PATTERN, p['versions'][0], p['id'])
+
+    return (RM_API_STATUS_NOT_FOUND, None, None)
+
+def check_package_latest_version(packages):
+    """
+    Fills in the .latest_version field of all Package objects
+
+    This field has a special format:
+      (status, version, id)
+    with:
+    - status: one of RM_API_STATUS_ERROR,
+      RM_API_STATUS_FOUND_BY_DISTRO, RM_API_STATUS_FOUND_BY_PATTERN,
+      RM_API_STATUS_NOT_FOUND
+    - version: string containing the latest version known by
+      release-monitoring.org for this package
+    - id: string containing the id of the project corresponding to this
+      package, as known by release-monitoring.org
+    """
+    pool = HTTPSConnectionPool('release-monitoring.org', port=443,
+                               cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
+    count = 0
+    for pkg in packages:
+        v = release_monitoring_get_latest_version_by_distro(pool, pkg.name)
+        if v[0] == RM_API_STATUS_NOT_FOUND:
+            v = release_monitoring_get_latest_version_by_guess(pool, pkg.name)
+
+        pkg.latest_version = v
+        print("[%d/%d] Package %s" % (count, len(packages), pkg.name))
+        count += 1
 
 def calculate_stats(packages):
     stats = defaultdict(int)
@@ -322,6 +391,16 @@  def calculate_stats(packages):
             stats["hash"] += 1
         else:
             stats["no-hash"] += 1
+        if pkg.latest_version[0] == RM_API_STATUS_FOUND_BY_DISTRO:
+            stats["rmo-mapping"] += 1
+        else:
+            stats["rmo-no-mapping"] += 1
+        if not pkg.latest_version[1]:
+            stats["version-unknown"] += 1
+        elif pkg.latest_version[1] == pkg.current_version:
+            stats["version-uptodate"] += 1
+        else:
+            stats["version-not-uptodate"] += 1
         stats["patches"] += pkg.patch_count
     return stats
 
@@ -354,6 +433,7 @@  td.somepatches {
 td.lotsofpatches {
   background: #ff9a69;
 }
+
 td.good_url {
   background: #d2ffc4;
 }
@@ -363,6 +443,20 @@  td.missing_url {
 td.invalid_url {
   background: #ff9a69;
 }
+
+td.version-good {
+  background: #d2ffc4;
+}
+td.version-needs-update {
+  background: #ff9a69;
+}
+td.version-unknown {
+ background: #ffd870;
+}
+td.version-error {
+ background: #ccc;
+}
+
 </style>
 <title>Statistics of Buildroot packages</title>
 </head>
@@ -465,6 +559,36 @@  def dump_html_pkg(f, pkg):
         current_version = pkg.current_version
     f.write("  <td class=\"centered\">%s</td>\n" % current_version)
 
+    # Latest version
+    if pkg.latest_version[0] == RM_API_STATUS_ERROR:
+        td_class.append("version-error")
+    if pkg.latest_version[1] is None:
+        td_class.append("version-unknown")
+    elif pkg.latest_version[1] != pkg.current_version:
+        td_class.append("version-needs-update")
+    else:
+        td_class.append("version-good")
+
+    if pkg.latest_version[0] == RM_API_STATUS_ERROR:
+        latest_version_text = "<b>Error</b>"
+    elif pkg.latest_version[0] == RM_API_STATUS_NOT_FOUND:
+        latest_version_text = "<b>Not found</b>"
+    else:
+        if pkg.latest_version[1] is None:
+            latest_version_text = "<b>Found, but not version</b>"
+        else:
+            latest_version_text = "<a href=\"https://release-monitoring.org/project/%s\"><b>%s</b></a>" % (pkg.latest_version[2], str(pkg.latest_version[1]))
+
+        latest_version_text += "<br/>"
+
+        if pkg.latest_version[0] == RM_API_STATUS_FOUND_BY_DISTRO:
+            latest_version_text += "found by <a href=\"https://release-monitoring.org/distro/Buildroot/\">distro</a>"
+        else:
+            latest_version_text += "found by guess"
+
+    f.write("  <td class=\"%s\">%s</td>\n" %
+            (" ".join(td_class), latest_version_text))
+
     # Warnings
     td_class = ["centered"]
     if pkg.warnings == 0:
@@ -502,6 +626,7 @@  def dump_html_all_pkgs(f, packages):
 <td class=\"centered\">License files</td>
 <td class=\"centered\">Hash file</td>
 <td class=\"centered\">Current version</td>
+<td class=\"centered\">Latest version</td>
 <td class=\"centered\">Warnings</td>
 <td class=\"centered\">Upstream URL</td>
 </tr>
@@ -532,6 +657,16 @@  def dump_html_stats(f, stats):
             stats["no-hash"])
     f.write(" <tr><td>Total number of patches</td><td>%s</td></tr>\n" %
             stats["patches"])
+    f.write("<tr><td>Packages having a mapping on <i>release-monitoring.org</i></td><td>%s</td></tr>\n" %
+            stats["rmo-mapping"])
+    f.write("<tr><td>Packages lacking a mapping on <i>release-monitoring.org</i></td><td>%s</td></tr>\n" %
+            stats["rmo-no-mapping"])
+    f.write("<tr><td>Packages that are up-to-date</td><td>%s</td></tr>\n" %
+            stats["version-uptodate"])
+    f.write("<tr><td>Packages that are not up-to-date</td><td>%s</td></tr>\n" %
+            stats["version-not-uptodate"])
+    f.write("<tr><td>Packages with no known upstream version</td><td>%s</td></tr>\n" %
+            stats["version-unknown"])
     f.write("</table>\n")
 
 
@@ -587,6 +722,8 @@  def __main__():
         pkg.set_url()
     print("Checking URL status")
     check_package_urls(packages)
+    print("Getting latest versions ...")
+    check_package_latest_version(packages)
     print("Calculate stats")
     stats = calculate_stats(packages)
     print("Write HTML")