Message ID | 50038580.20299907.1486634551103.JavaMail.zimbra@redhat.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, Feb 9, 2017 at 11:02 AM, Jason Wang <jasowang@redhat.com> wrote: > > > ----- Original Message ----- >> Hello, >> >> I've got the following report while running syzkaller fuzzer on mmotm >> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) >> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: >> > > [...] > >> >> other info that might help us debug this: >> >> Possible interrupt unsafe locking scenario: >> >> CPU0 CPU1 >> ---- ---- >> lock(&(&r->consumer_lock)->rlock); >> local_irq_disable(); >> lock(&(&r->producer_lock)->rlock); >> lock(&(&r->consumer_lock)->rlock); >> <Interrupt> >> lock(&(&r->producer_lock)->rlock); >> > > Thanks a lot for the testing. > > Looks like we could address this by using skb_array_consume_bh() instead. > > Could you pls verify if the following patch works? No, I can't test it, sorry. This happened once on bots. And bots currently test only upstream versions. > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 8a7d6b9..a97c00d 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -520,7 +520,7 @@ static void tun_queue_purge(struct tun_file *tfile) > { > struct sk_buff *skb; > > - while ((skb = skb_array_consume(&tfile->tx_array)) != NULL) > + while ((skb = skb_array_consume_bh(&tfile->tx_array)) != NULL) > kfree_skb(skb); > > skb_queue_purge(&tfile->sk.sk_write_queue); > @@ -1458,7 +1458,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, > struct sk_buff *skb = NULL; > int error = 0; > > - skb = skb_array_consume(&tfile->tx_array); > + skb = skb_array_consume_bh(&tfile->tx_array); > if (skb) > goto out; > if (noblock) { > @@ -1470,7 +1470,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, > current->state = TASK_INTERRUPTIBLE; > > while (1) { > - skb = skb_array_consume(&tfile->tx_array); > + skb = skb_array_consume_bh(&tfile->tx_array); > if (skb) > break; > if (signal_pending(current)) { > > -- > You received this message because you are subscribed to the Google Groups "syzkaller" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com. > For more options, visit https://groups.google.com/d/optout.
On Thu, Feb 09, 2017 at 11:49:30AM +0100, Dmitry Vyukov wrote: > On Thu, Feb 9, 2017 at 11:02 AM, Jason Wang <jasowang@redhat.com> wrote: > > > > > > ----- Original Message ----- > >> Hello, > >> > >> I've got the following report while running syzkaller fuzzer on mmotm > >> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) > >> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: > >> > > > > [...] > > > >> > >> other info that might help us debug this: > >> > >> Possible interrupt unsafe locking scenario: > >> > >> CPU0 CPU1 > >> ---- ---- > >> lock(&(&r->consumer_lock)->rlock); > >> local_irq_disable(); > >> lock(&(&r->producer_lock)->rlock); > >> lock(&(&r->consumer_lock)->rlock); > >> <Interrupt> > >> lock(&(&r->producer_lock)->rlock); > >> > > > > Thanks a lot for the testing. > > > > Looks like we could address this by using skb_array_consume_bh() instead. > > > > Could you pls verify if the following patch works? > > No, I can't test it, sorry. This happened once on bots. And bots > currently test only upstream versions. Which trees are tested? Will linux-next help? > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > index 8a7d6b9..a97c00d 100644 > > --- a/drivers/net/tun.c > > +++ b/drivers/net/tun.c > > @@ -520,7 +520,7 @@ static void tun_queue_purge(struct tun_file *tfile) > > { > > struct sk_buff *skb; > > > > - while ((skb = skb_array_consume(&tfile->tx_array)) != NULL) > > + while ((skb = skb_array_consume_bh(&tfile->tx_array)) != NULL) > > kfree_skb(skb); > > > > skb_queue_purge(&tfile->sk.sk_write_queue); > > @@ -1458,7 +1458,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, > > struct sk_buff *skb = NULL; > > int error = 0; > > > > - skb = skb_array_consume(&tfile->tx_array); > > + skb = skb_array_consume_bh(&tfile->tx_array); > > if (skb) > > goto out; > > if (noblock) { > > @@ -1470,7 +1470,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, > > current->state = TASK_INTERRUPTIBLE; > > > > while (1) { > > - skb = skb_array_consume(&tfile->tx_array); > > + skb = skb_array_consume_bh(&tfile->tx_array); > > if (skb) > > break; > > if (signal_pending(current)) { > > > > -- > > You received this message because you are subscribed to the Google Groups "syzkaller" group. > > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com. > > For more options, visit https://groups.google.com/d/optout.
On Thu, Feb 9, 2017 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Thu, Feb 09, 2017 at 11:49:30AM +0100, Dmitry Vyukov wrote: >> On Thu, Feb 9, 2017 at 11:02 AM, Jason Wang <jasowang@redhat.com> wrote: >> > >> > >> > ----- Original Message ----- >> >> Hello, >> >> >> >> I've got the following report while running syzkaller fuzzer on mmotm >> >> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) >> >> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: >> >> >> > >> > [...] >> > >> >> >> >> other info that might help us debug this: >> >> >> >> Possible interrupt unsafe locking scenario: >> >> >> >> CPU0 CPU1 >> >> ---- ---- >> >> lock(&(&r->consumer_lock)->rlock); >> >> local_irq_disable(); >> >> lock(&(&r->producer_lock)->rlock); >> >> lock(&(&r->consumer_lock)->rlock); >> >> <Interrupt> >> >> lock(&(&r->producer_lock)->rlock); >> >> >> > >> > Thanks a lot for the testing. >> > >> > Looks like we could address this by using skb_array_consume_bh() instead. >> > >> > Could you pls verify if the following patch works? >> >> No, I can't test it, sorry. This happened once on bots. And bots >> currently test only upstream versions. > > Which trees are tested? Will linux-next help? Linus tree, linux-next and mmotm at the moment.
On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote: > > > ----- Original Message ----- > > Hello, > > > > I've got the following report while running syzkaller fuzzer on mmotm > > (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) > > remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: > > > > [...] > > > > > other info that might help us debug this: > > > > Possible interrupt unsafe locking scenario: > > > > CPU0 CPU1 > > ---- ---- > > lock(&(&r->consumer_lock)->rlock); > > local_irq_disable(); > > lock(&(&r->producer_lock)->rlock); > > lock(&(&r->consumer_lock)->rlock); > > <Interrupt> > > lock(&(&r->producer_lock)->rlock); > > > > Thanks a lot for the testing. > > Looks like we could address this by using skb_array_consume_bh() instead. > > Could you pls verify if the following patch works? I think we should use _bh for the produce call as well, since resizing takes the producer lock. > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 8a7d6b9..a97c00d 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -520,7 +520,7 @@ static void tun_queue_purge(struct tun_file *tfile) > { > struct sk_buff *skb; > > - while ((skb = skb_array_consume(&tfile->tx_array)) != NULL) > + while ((skb = skb_array_consume_bh(&tfile->tx_array)) != NULL) > kfree_skb(skb); > > skb_queue_purge(&tfile->sk.sk_write_queue); > @@ -1458,7 +1458,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, > struct sk_buff *skb = NULL; > int error = 0; > > - skb = skb_array_consume(&tfile->tx_array); > + skb = skb_array_consume_bh(&tfile->tx_array); > if (skb) > goto out; > if (noblock) { > @@ -1470,7 +1470,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, > current->state = TASK_INTERRUPTIBLE; > > while (1) { > - skb = skb_array_consume(&tfile->tx_array); > + skb = skb_array_consume_bh(&tfile->tx_array); > if (skb) > break; > if (signal_pending(current)) {
On Thu, Feb 09, 2017 at 06:55:41PM +0100, Dmitry Vyukov wrote: > On Thu, Feb 9, 2017 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Feb 09, 2017 at 11:49:30AM +0100, Dmitry Vyukov wrote: > >> On Thu, Feb 9, 2017 at 11:02 AM, Jason Wang <jasowang@redhat.com> wrote: > >> > > >> > > >> > ----- Original Message ----- > >> >> Hello, > >> >> > >> >> I've got the following report while running syzkaller fuzzer on mmotm > >> >> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) > >> >> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: > >> >> > >> > > >> > [...] > >> > > >> >> > >> >> other info that might help us debug this: > >> >> > >> >> Possible interrupt unsafe locking scenario: > >> >> > >> >> CPU0 CPU1 > >> >> ---- ---- > >> >> lock(&(&r->consumer_lock)->rlock); > >> >> local_irq_disable(); > >> >> lock(&(&r->producer_lock)->rlock); > >> >> lock(&(&r->consumer_lock)->rlock); > >> >> <Interrupt> > >> >> lock(&(&r->producer_lock)->rlock); > >> >> > >> > > >> > Thanks a lot for the testing. > >> > > >> > Looks like we could address this by using skb_array_consume_bh() instead. > >> > > >> > Could you pls verify if the following patch works? > >> > >> No, I can't test it, sorry. This happened once on bots. And bots > >> currently test only upstream versions. > > > > Which trees are tested? Will linux-next help? > > Linus tree, linux-next and mmotm at the moment. OK that works, I'll add the fix to my tree includes in linux-next.
On 2017年02月09日 18:49, Dmitry Vyukov wrote: > On Thu, Feb 9, 2017 at 11:02 AM, Jason Wang<jasowang@redhat.com> wrote: >> ----- Original Message ----- >>> Hello, >>> >>> I've got the following report while running syzkaller fuzzer on mmotm >>> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) >>> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: >>> >> [...] >> >>> other info that might help us debug this: >>> >>> Possible interrupt unsafe locking scenario: >>> >>> CPU0 CPU1 >>> ---- ---- >>> lock(&(&r->consumer_lock)->rlock); >>> local_irq_disable(); >>> lock(&(&r->producer_lock)->rlock); >>> lock(&(&r->consumer_lock)->rlock); >>> <Interrupt> >>> lock(&(&r->producer_lock)->rlock); >>> >> Thanks a lot for the testing. >> >> Looks like we could address this by using skb_array_consume_bh() instead. >> >> Could you pls verify if the following patch works? > No, I can't test it, sorry. This happened once on bots. And bots > currently test only upstream versions. > > No problem, will try to test my self. Thanks
On 2017年02月10日 02:10, Michael S. Tsirkin wrote: > On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote: >> ----- Original Message ----- >>> Hello, >>> >>> I've got the following report while running syzkaller fuzzer on mmotm >>> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) >>> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: >>> >> [...] >> >>> other info that might help us debug this: >>> >>> Possible interrupt unsafe locking scenario: >>> >>> CPU0 CPU1 >>> ---- ---- >>> lock(&(&r->consumer_lock)->rlock); >>> local_irq_disable(); >>> lock(&(&r->producer_lock)->rlock); >>> lock(&(&r->consumer_lock)->rlock); >>> <Interrupt> >>> lock(&(&r->producer_lock)->rlock); >>> >> Thanks a lot for the testing. >> >> Looks like we could address this by using skb_array_consume_bh() instead. >> >> Could you pls verify if the following patch works? > I think we should use _bh for the produce call as well, > since resizing takes the producer lock. Looks not since irq was disabled during resizing? Thanks
On Fri, Feb 10, 2017 at 6:17 AM, Jason Wang <jasowang@redhat.com> wrote: > > > On 2017年02月10日 02:10, Michael S. Tsirkin wrote: >> >> On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote: >>> >>> ----- Original Message ----- >>>> >>>> Hello, >>>> >>>> I've got the following report while running syzkaller fuzzer on mmotm >>>> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) >>>> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: >>>> >>> [...] >>> >>>> other info that might help us debug this: >>>> >>>> Possible interrupt unsafe locking scenario: >>>> >>>> CPU0 CPU1 >>>> ---- ---- >>>> lock(&(&r->consumer_lock)->rlock); >>>> local_irq_disable(); >>>> lock(&(&r->producer_lock)->rlock); >>>> lock(&(&r->consumer_lock)->rlock); >>>> <Interrupt> >>>> lock(&(&r->producer_lock)->rlock); >>>> >>> Thanks a lot for the testing. >>> >>> Looks like we could address this by using skb_array_consume_bh() instead. >>> >>> Could you pls verify if the following patch works? >> >> I think we should use _bh for the produce call as well, >> since resizing takes the producer lock. > > Looks not since irq was disabled during resizing? Hello, Is there a fix for this that we can pick up? This killed 10'000 VMs on our testing infra over the last day. Still happening on linux-next. Thanks
On Sat, Feb 18, 2017 at 6:28 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Fri, Feb 10, 2017 at 6:17 AM, Jason Wang <jasowang@redhat.com> wrote: >> >> >> On 2017年02月10日 02:10, Michael S. Tsirkin wrote: >>> >>> On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote: >>>> >>>> ----- Original Message ----- >>>>> >>>>> Hello, >>>>> >>>>> I've got the following report while running syzkaller fuzzer on mmotm >>>>> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) >>>>> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: >>>>> >>>> [...] >>>> >>>>> other info that might help us debug this: >>>>> >>>>> Possible interrupt unsafe locking scenario: >>>>> >>>>> CPU0 CPU1 >>>>> ---- ---- >>>>> lock(&(&r->consumer_lock)->rlock); >>>>> local_irq_disable(); >>>>> lock(&(&r->producer_lock)->rlock); >>>>> lock(&(&r->consumer_lock)->rlock); >>>>> <Interrupt> >>>>> lock(&(&r->producer_lock)->rlock); >>>>> >>>> Thanks a lot for the testing. >>>> >>>> Looks like we could address this by using skb_array_consume_bh() instead. >>>> >>>> Could you pls verify if the following patch works? >>> >>> I think we should use _bh for the produce call as well, >>> since resizing takes the producer lock. >> >> Looks not since irq was disabled during resizing? > > > Hello, > > Is there a fix for this that we can pick up? > This killed 10'000 VMs on our testing infra over the last day. Still > happening on linux-next. Ah, sorry, I see the patch above with skb_array_consume_bh. It's just that it's not in linux-next. Will manually apply it now then. Should we also do something with produce_skb?
On Sat, Feb 18, 2017 at 06:28:39PM +0100, Dmitry Vyukov wrote: > On Fri, Feb 10, 2017 at 6:17 AM, Jason Wang <jasowang@redhat.com> wrote: > > > > > > On 2017年02月10日 02:10, Michael S. Tsirkin wrote: > >> > >> On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote: > >>> > >>> ----- Original Message ----- > >>>> > >>>> Hello, > >>>> > >>>> I've got the following report while running syzkaller fuzzer on mmotm > >>>> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) > >>>> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e: > >>>> > >>> [...] > >>> > >>>> other info that might help us debug this: > >>>> > >>>> Possible interrupt unsafe locking scenario: > >>>> > >>>> CPU0 CPU1 > >>>> ---- ---- > >>>> lock(&(&r->consumer_lock)->rlock); > >>>> local_irq_disable(); > >>>> lock(&(&r->producer_lock)->rlock); > >>>> lock(&(&r->consumer_lock)->rlock); > >>>> <Interrupt> > >>>> lock(&(&r->producer_lock)->rlock); > >>>> > >>> Thanks a lot for the testing. > >>> > >>> Looks like we could address this by using skb_array_consume_bh() instead. > >>> > >>> Could you pls verify if the following patch works? > >> > >> I think we should use _bh for the produce call as well, > >> since resizing takes the producer lock. > > > > Looks not since irq was disabled during resizing? > > > Hello, > > Is there a fix for this that we can pick up? > This killed 10'000 VMs on our testing infra over the last day. Still > happening on linux-next. > > Thanks I posted a fix. ptr_ring: fix race conditions when resizing Just reposted. I'll push into linux-next ASAP.
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 8a7d6b9..a97c00d 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -520,7 +520,7 @@ static void tun_queue_purge(struct tun_file *tfile) { struct sk_buff *skb; - while ((skb = skb_array_consume(&tfile->tx_array)) != NULL) + while ((skb = skb_array_consume_bh(&tfile->tx_array)) != NULL) kfree_skb(skb); skb_queue_purge(&tfile->sk.sk_write_queue); @@ -1458,7 +1458,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, struct sk_buff *skb = NULL; int error = 0; - skb = skb_array_consume(&tfile->tx_array); + skb = skb_array_consume_bh(&tfile->tx_array); if (skb) goto out; if (noblock) { @@ -1470,7 +1470,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, current->state = TASK_INTERRUPTIBLE; while (1) { - skb = skb_array_consume(&tfile->tx_array); + skb = skb_array_consume_bh(&tfile->tx_array); if (skb) break; if (signal_pending(current)) {