Message ID | 1452872492-6980-1-git-send-email-patrickdepinguin@gmail.com |
---|---|
State | Changes Requested |
Headers | show |
Hi Thomas, Interesting concept... On 15-01-16 16:41, Thomas De Schampheleire wrote: > From: Thomas De Schampheleire <thomas.de.schampheleire@gmail.com> > > Leverage the CSV files produces by size-stats (make graph-size) to allow > for a comparison of rootfs size between two different buildroot > compilations. > > The script takes the file-size CSV files of two compilations as input, and > produces a textual report of the differences per package. > Using the -d/--detail flag, the report will show the file size changes > instead of package size changes. > The -t/--threshold option allows to ignore file size differences smaller > than the given threshold (in bytes). > > Signed-off-by: Thomas De Schampheleire <thomas.de.schampheleire@gmail.com> > --- > support/scripts/size-stats-compare | 98 ++++++++++++++++++++++++++++++++++++++ > 1 file changed, 98 insertions(+) > create mode 100755 support/scripts/size-stats-compare I think you should also add a reference to the script to docs/manual/common-usage.txt. No full documentation is needed, you can just point to the --help option (otherwise it would just get out of sync). > > diff --git a/support/scripts/size-stats-compare b/support/scripts/size-stats-compare > new file mode 100755 > index 0000000..d7eec09 > --- /dev/null > +++ b/support/scripts/size-stats-compare > @@ -0,0 +1,98 @@ > +#!/usr/bin/env python > + > +# Copyright (C) 2016 Thomas De Schampheleire <thomas.de.schampheleire@gmail.com> > + > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 2 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +# General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write to the Free Software > +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA > + > +# Compare the rootfs size per package/file with that from another compilation > +# (possibly from a different release). > + > +# This code relies on the output CSV files of 'make graph-size'. If you > +# are comparing to an older Buildroot release, you can consider backporting > +# that feature. See commits: > +# 99aca138c20e1259ac8c23252c223e150d460482 (instrumentation hook) > +# 598c80be8fc7b5ddfcba6f7715d90757033723ef (size-stats script) > +# 24c75fbb38ffeb5c5a7278cf40f4f7cf867502e0 (make graph-size) I think this documentation should be part of the argparse description. > + > +import csv > +import argparse > + > +def read_packages_csv(inputf, detail=None): Given the name of the input file read_file_size_stats_csv would be a more natural name. > + sizes = {} > + with open(inputf) as f: > + reader = csv.reader(f) > + # skip first line (header) > + next(reader) Perhaps it would be good to verify here that the header is as expected, at least the first 4 entries. > + for row in csv.reader(f): > + if detail: > + sizes[row[0]] = int(row[2]) > + else: > + sizes[row[1]] = int(row[3]) > + return sizes > + > +def compare_sizes(this, other): > + delta = {} > + removed = {} > + added = {} > + thiskeys = set(this.keys()) > + otherkeys = set(other.keys()) > + > + # packages in both packages or files, actually. > + for pkg in thiskeys.intersection(otherkeys): > + delta[pkg] = this[pkg] - other[pkg] > + # packages only in this > + for pkg in thiskeys.difference(otherkeys): > + added[pkg] = this[pkg] > + # packages only in other > + for pkg in otherkeys.difference(thiskeys): > + removed[pkg] = -other[pkg] > + > + return delta, added, removed > + > +def print_results(result, title, threshold): > + print '%s:' % title I would skip this complete if the result is empty. So if not result: return > + for pkg in result.keys(): > + if abs(result[pkg]) < threshold: I think <= would be more natural here. Then you can pass -t 0 to get rid of all packages that remain equal. Also a nice extension for a future patch would be to support K, M, G in the threshold. > + continue > + print '%12d %s' % (result[pkg], pkg) I think it would be better if the output was also CSV. If it is not CSV, please make it python3-compatible by adding () around the argument. A nice additional feature would be if the output CSV would (optionally?) contain all the info from the original CSVs. But that can be follow-up patch, of course. > + > + > +# main ######################################################################### > + > +parser = argparse.ArgumentParser(description='Compare rootfs size between Buildroot compilations') > + > +parser.add_argument('-f', '--file-size-csv', > + help='CSV file with file and package size statistics') You could use type=argparse.FileType('r') to get automatic help in case the file doesn't exist. And then you have to remove the open() from read_packages_csv, of course. Can you put the actual file name in the help text? Perhaps just with metavar='file-size-stats.csv' And actually, I would remove the -f and -g bits, i.e. make them positional arguments. So add_argument('file-size-csv', help=...). > +parser.add_argument('-g', '--other-file-size-csv', > + help='CSV file with file and package size statistics to compare with') Wrap this line (using '''). argparse will rewrap the help text. Same below. Note that you can even align the help text here, all whitespace will be rewrapped. > +parser.add_argument('-d', '--detail', action='store_true', > + help='detail with individual files rather than packages') > +parser.add_argument('-t', '--threshold', type=int, > + help='threshold value: ignore size differences smaller than this value (bytes)') > +args = parser.parse_args() > + > +sizes = read_packages_csv(args.file_size_csv, args.detail) > +other_sizes = read_packages_csv(args.other_file_size_csv, args.detail) > + > +delta, added, removed = compare_sizes(sizes, other_sizes) > + > +if args.detail: > + keyword = 'file' > +else: > + keyword = 'package' > + > +print_results(delta, 'Size difference per %s (bytes)' % keyword, args.threshold) > +print_results(added, 'Size difference due to added %ss (bytes)' % keyword, args.threshold) > +print_results(removed, 'Size difference due to removed %ss (bytes)' % keyword, args.threshold) > Otherwise looks good. Or in fact, most of my comments are optional, feel free to ignore them. Only the wrapping and the full description I consider essential. Regards, Arnout
Arnout, On Sat, 16 Jan 2016 01:28:23 +0100, Arnout Vandecappelle wrote: > I think you should also add a reference to the script to > docs/manual/common-usage.txt. No full documentation is needed, you can just > point to the --help option (otherwise it would just get out of sync). Agreed. > > +# This code relies on the output CSV files of 'make graph-size'. If you > > +# are comparing to an older Buildroot release, you can consider backporting > > +# that feature. See commits: > > +# 99aca138c20e1259ac8c23252c223e150d460482 (instrumentation hook) > > +# 598c80be8fc7b5ddfcba6f7715d90757033723ef (size-stats script) > > +# 24c75fbb38ffeb5c5a7278cf40f4f7cf867502e0 (make graph-size) > > I think this documentation should be part of the argparse description. I tend to disagree here, and instead who advocate to not have such documentation at all. Nowhere do we document which commits should be backported to bring this or that feature to an older Buildroot release, and I don't think we should start doing this. Best regards, Thomas
On 16-01-16 13:51, Thomas Petazzoni wrote: > Arnout, > > On Sat, 16 Jan 2016 01:28:23 +0100, Arnout Vandecappelle wrote: > >> I think you should also add a reference to the script to >> docs/manual/common-usage.txt. No full documentation is needed, you can just >> point to the --help option (otherwise it would just get out of sync). > > Agreed. > > >>> +# This code relies on the output CSV files of 'make graph-size'. If you >>> +# are comparing to an older Buildroot release, you can consider backporting >>> +# that feature. See commits: >>> +# 99aca138c20e1259ac8c23252c223e150d460482 (instrumentation hook) >>> +# 598c80be8fc7b5ddfcba6f7715d90757033723ef (size-stats script) >>> +# 24c75fbb38ffeb5c5a7278cf40f4f7cf867502e0 (make graph-size) >> >> I think this documentation should be part of the argparse description. > > I tend to disagree here, and instead who advocate to not have such > documentation at all. Nowhere do we document which commits should be > backported to bring this or that feature to an older Buildroot release, > and I don't think we should start doing this. OK, I actually meant that the documentation should be part of the description, I tend to agree that it's not necessary to tell people what to backport. Regards, Arnout > > Best regards, > > Thomas >
Hi Arnout, Thanks for all your suggestions. I addressed them in v2, just some note below... On Sat, Jan 16, 2016 at 1:28 AM, Arnout Vandecappelle <arnout@mind.be> wrote: [..] >> + >> +import csv >> +import argparse >> + >> +def read_packages_csv(inputf, detail=None): > > Given the name of the input file read_file_size_stats_csv would be a more > natural name. I removed the _stats_ suffix to match the argument names of size-stats (and also because it reads better). The history of these 'packages' names was because I started out without the --detail option, and read in the package-size-stats.csv files. [..] > >> + for pkg in result.keys(): >> + if abs(result[pkg]) < threshold: > > I think <= would be more natural here. Then you can pass -t 0 to get rid of all > packages that remain equal. > > Also a nice extension for a future patch would be to support K, M, G in the > threshold. > >> + continue >> + print '%12d %s' % (result[pkg], pkg) > > I think it would be better if the output was also CSV. If it is not CSV, please > make it python3-compatible by adding () around the argument. > > A nice additional feature would be if the output CSV would (optionally?) > contain all the info from the original CSVs. But that can be follow-up patch, of > course. > In order to start small, I added these suggestions in a TODO at the top of the script. For an output CSV, I think it should in any case be in addition of standard textual output, not as a replacement. A CSV of the form: Package name, Package size 1, Package size 2, Difference could easily be created, and similar for the --detail switch with files. However, one big accumulated file-size-stats.csv extension including differences looks difficult to me: how would it be organized? Already now, the package size is duplicated for each file belonging to that package. Adding both file and package size differences to each line would become a nightmare. I think we need to choose between package or file output. /Thomas
diff --git a/support/scripts/size-stats-compare b/support/scripts/size-stats-compare new file mode 100755 index 0000000..d7eec09 --- /dev/null +++ b/support/scripts/size-stats-compare @@ -0,0 +1,98 @@ +#!/usr/bin/env python + +# Copyright (C) 2016 Thomas De Schampheleire <thomas.de.schampheleire@gmail.com> + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + +# Compare the rootfs size per package/file with that from another compilation +# (possibly from a different release). + +# This code relies on the output CSV files of 'make graph-size'. If you +# are comparing to an older Buildroot release, you can consider backporting +# that feature. See commits: +# 99aca138c20e1259ac8c23252c223e150d460482 (instrumentation hook) +# 598c80be8fc7b5ddfcba6f7715d90757033723ef (size-stats script) +# 24c75fbb38ffeb5c5a7278cf40f4f7cf867502e0 (make graph-size) + +import csv +import argparse + +def read_packages_csv(inputf, detail=None): + sizes = {} + with open(inputf) as f: + reader = csv.reader(f) + # skip first line (header) + next(reader) + for row in csv.reader(f): + if detail: + sizes[row[0]] = int(row[2]) + else: + sizes[row[1]] = int(row[3]) + return sizes + +def compare_sizes(this, other): + delta = {} + removed = {} + added = {} + thiskeys = set(this.keys()) + otherkeys = set(other.keys()) + + # packages in both + for pkg in thiskeys.intersection(otherkeys): + delta[pkg] = this[pkg] - other[pkg] + # packages only in this + for pkg in thiskeys.difference(otherkeys): + added[pkg] = this[pkg] + # packages only in other + for pkg in otherkeys.difference(thiskeys): + removed[pkg] = -other[pkg] + + return delta, added, removed + +def print_results(result, title, threshold): + print '%s:' % title + for pkg in result.keys(): + if abs(result[pkg]) < threshold: + continue + print '%12d %s' % (result[pkg], pkg) + + +# main ######################################################################### + +parser = argparse.ArgumentParser(description='Compare rootfs size between Buildroot compilations') + +parser.add_argument('-f', '--file-size-csv', + help='CSV file with file and package size statistics') +parser.add_argument('-g', '--other-file-size-csv', + help='CSV file with file and package size statistics to compare with') +parser.add_argument('-d', '--detail', action='store_true', + help='detail with individual files rather than packages') +parser.add_argument('-t', '--threshold', type=int, + help='threshold value: ignore size differences smaller than this value (bytes)') +args = parser.parse_args() + +sizes = read_packages_csv(args.file_size_csv, args.detail) +other_sizes = read_packages_csv(args.other_file_size_csv, args.detail) + +delta, added, removed = compare_sizes(sizes, other_sizes) + +if args.detail: + keyword = 'file' +else: + keyword = 'package' + +print_results(delta, 'Size difference per %s (bytes)' % keyword, args.threshold) +print_results(added, 'Size difference due to added %ss (bytes)' % keyword, args.threshold) +print_results(removed, 'Size difference due to removed %ss (bytes)' % keyword, args.threshold)