diff mbox series

[2/3] chardev: report blocked write to chardev backend

Message ID 20231116115354.228678-2-npiggin@gmail.com
State New
Headers show
Series [1/3] tests/avocado: reverse_debugging drain console to prevent hang | expand

Commit Message

Nicholas Piggin Nov. 16, 2023, 11:53 a.m. UTC
If a chardev socket is not read, it will eventually fill and QEMU
can block attempting to write to it. A difficult bug in avocado
tests where the console socket was not being read from caused this
hang.

warn if a chardev write is blocked for 100ms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
This is not necessary for the fix but it does trigger in the
failing avocado test without the previous patch applied. Maybe
it would be helpful?

Thanks,
Nick

 chardev/char.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Marc-André Lureau Nov. 20, 2023, 12:06 p.m. UTC | #1
Hi

On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> If a chardev socket is not read, it will eventually fill and QEMU
> can block attempting to write to it. A difficult bug in avocado
> tests where the console socket was not being read from caused this
> hang.
>
> warn if a chardev write is blocked for 100ms.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> This is not necessary for the fix but it does trigger in the
> failing avocado test without the previous patch applied. Maybe
> it would be helpful?
>
> Thanks,
> Nick
>
>  chardev/char.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/chardev/char.c b/chardev/char.c
> index 996a024c7a..7c375e3cc4 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
>  {
>      ChardevClass *cc = CHARDEV_GET_CLASS(s);
>      int res = 0;
> +    int nr_retries = 0;
> +
>      *offset = 0;
>
>      qemu_mutex_lock(&s->chr_write_lock);
> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
>              } else {
>                  g_usleep(100);
>              }
> +            if (++nr_retries == 1000) { /* 100ms */
> +                warn_report("Chardev '%s' write blocked for > 100ms, "
> +                            "socket buffer full?", s->label);
> +            }

That shouldn't happen, the frontend should poll and only write when it
can. What is the qemu command being used here?
I think this change can be worth for debugging though.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Nicholas Piggin Nov. 20, 2023, 1:35 p.m. UTC | #2
On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> Hi
>
> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > If a chardev socket is not read, it will eventually fill and QEMU
> > can block attempting to write to it. A difficult bug in avocado
> > tests where the console socket was not being read from caused this
> > hang.
> >
> > warn if a chardev write is blocked for 100ms.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > This is not necessary for the fix but it does trigger in the
> > failing avocado test without the previous patch applied. Maybe
> > it would be helpful?
> >
> > Thanks,
> > Nick
> >
> >  chardev/char.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/chardev/char.c b/chardev/char.c
> > index 996a024c7a..7c375e3cc4 100644
> > --- a/chardev/char.c
> > +++ b/chardev/char.c
> > @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> >  {
> >      ChardevClass *cc = CHARDEV_GET_CLASS(s);
> >      int res = 0;
> > +    int nr_retries = 0;
> > +
> >      *offset = 0;
> >
> >      qemu_mutex_lock(&s->chr_write_lock);
> > @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> >              } else {
> >                  g_usleep(100);
> >              }
> > +            if (++nr_retries == 1000) { /* 100ms */
> > +                warn_report("Chardev '%s' write blocked for > 100ms, "
> > +                            "socket buffer full?", s->label);
> > +            }
>
> That shouldn't happen, the frontend should poll and only write when it
> can. What is the qemu command being used here?

You can follow it through the thread here

https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/

In short, a console device is attached to a socket pair and nothing
ever reads from it. It eventually fills, and writing to it fails
indefinitely here.

It can be reproduced with:

make check-avocado
AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries


> I think this change can be worth for debugging though.
>
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Thanks,
Nick
Marc-André Lureau Nov. 21, 2023, 9:39 a.m. UTC | #3
Hi

On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> > Hi
> >
> > On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > >
> > > If a chardev socket is not read, it will eventually fill and QEMU
> > > can block attempting to write to it. A difficult bug in avocado
> > > tests where the console socket was not being read from caused this
> > > hang.
> > >
> > > warn if a chardev write is blocked for 100ms.
> > >
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > ---
> > > This is not necessary for the fix but it does trigger in the
> > > failing avocado test without the previous patch applied. Maybe
> > > it would be helpful?
> > >
> > > Thanks,
> > > Nick
> > >
> > >  chardev/char.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/chardev/char.c b/chardev/char.c
> > > index 996a024c7a..7c375e3cc4 100644
> > > --- a/chardev/char.c
> > > +++ b/chardev/char.c
> > > @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> > >  {
> > >      ChardevClass *cc = CHARDEV_GET_CLASS(s);
> > >      int res = 0;
> > > +    int nr_retries = 0;
> > > +
> > >      *offset = 0;
> > >
> > >      qemu_mutex_lock(&s->chr_write_lock);
> > > @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> > >              } else {
> > >                  g_usleep(100);
> > >              }
> > > +            if (++nr_retries == 1000) { /* 100ms */
> > > +                warn_report("Chardev '%s' write blocked for > 100ms, "
> > > +                            "socket buffer full?", s->label);
> > > +            }
> >
> > That shouldn't happen, the frontend should poll and only write when it
> > can. What is the qemu command being used here?
>
> You can follow it through the thread here
>
> https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
>
> In short, a console device is attached to a socket pair and nothing
> ever reads from it. It eventually fills, and writing to it fails
> indefinitely here.
>
> It can be reproduced with:
>
> make check-avocado
> AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
>
>

How reliably? I tried 10/10.

> > I think this change can be worth for debugging though.
> >
> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>
> Thanks,
> Nick
>
Daniel P. Berrangé Nov. 21, 2023, 9:42 a.m. UTC | #4
On Tue, Nov 21, 2023 at 01:39:03PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> > > Hi
> > >
> > > On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > >
> > > > If a chardev socket is not read, it will eventually fill and QEMU
> > > > can block attempting to write to it. A difficult bug in avocado
> > > > tests where the console socket was not being read from caused this
> > > > hang.
> > > >
> > > > warn if a chardev write is blocked for 100ms.
> > > >
> > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > ---
> > > > This is not necessary for the fix but it does trigger in the
> > > > failing avocado test without the previous patch applied. Maybe
> > > > it would be helpful?
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > >  chardev/char.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/chardev/char.c b/chardev/char.c
> > > > index 996a024c7a..7c375e3cc4 100644
> > > > --- a/chardev/char.c
> > > > +++ b/chardev/char.c
> > > > @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> > > >  {
> > > >      ChardevClass *cc = CHARDEV_GET_CLASS(s);
> > > >      int res = 0;
> > > > +    int nr_retries = 0;
> > > > +
> > > >      *offset = 0;
> > > >
> > > >      qemu_mutex_lock(&s->chr_write_lock);
> > > > @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> > > >              } else {
> > > >                  g_usleep(100);
> > > >              }
> > > > +            if (++nr_retries == 1000) { /* 100ms */
> > > > +                warn_report("Chardev '%s' write blocked for > 100ms, "
> > > > +                            "socket buffer full?", s->label);
> > > > +            }
> > >
> > > That shouldn't happen, the frontend should poll and only write when it
> > > can. What is the qemu command being used here?
> >
> > You can follow it through the thread here
> >
> > https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
> >
> > In short, a console device is attached to a socket pair and nothing
> > ever reads from it. It eventually fills, and writing to it fails
> > indefinitely here.
> >
> > It can be reproduced with:
> >
> > make check-avocado
> > AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
> >
> >
> 
> How reliably? I tried 10/10.

It reproduced 100% reliably, but note git master is fixed now, so to
test you'll need to revert cd43f00524070c0267613acc98a153dba0e398d9

With regards,
Daniel
Thomas Huth Nov. 21, 2023, 9:44 a.m. UTC | #5
On 21/11/2023 10.39, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>
>>>> If a chardev socket is not read, it will eventually fill and QEMU
>>>> can block attempting to write to it. A difficult bug in avocado
>>>> tests where the console socket was not being read from caused this
>>>> hang.
>>>>
>>>> warn if a chardev write is blocked for 100ms.
>>>>
>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>> ---
>>>> This is not necessary for the fix but it does trigger in the
>>>> failing avocado test without the previous patch applied. Maybe
>>>> it would be helpful?
>>>>
>>>> Thanks,
>>>> Nick
>>>>
>>>>   chardev/char.c | 6 ++++++
>>>>   1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/chardev/char.c b/chardev/char.c
>>>> index 996a024c7a..7c375e3cc4 100644
>>>> --- a/chardev/char.c
>>>> +++ b/chardev/char.c
>>>> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
>>>>   {
>>>>       ChardevClass *cc = CHARDEV_GET_CLASS(s);
>>>>       int res = 0;
>>>> +    int nr_retries = 0;
>>>> +
>>>>       *offset = 0;
>>>>
>>>>       qemu_mutex_lock(&s->chr_write_lock);
>>>> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
>>>>               } else {
>>>>                   g_usleep(100);
>>>>               }
>>>> +            if (++nr_retries == 1000) { /* 100ms */
>>>> +                warn_report("Chardev '%s' write blocked for > 100ms, "
>>>> +                            "socket buffer full?", s->label);
>>>> +            }
>>>
>>> That shouldn't happen, the frontend should poll and only write when it
>>> can. What is the qemu command being used here?
>>
>> You can follow it through the thread here
>>
>> https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
>>
>> In short, a console device is attached to a socket pair and nothing
>> ever reads from it. It eventually fills, and writing to it fails
>> indefinitely here.
>>
>> It can be reproduced with:
>>
>> make check-avocado
>> AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
>>
>>
> 
> How reliably? I tried 10/10.

It used to fail for me every time I tried - but the fix has already been 
merged yesterday (commit cd43f00524070c026), so if you updated today, you'll 
see the test passing again.

  Thomas
Marc-André Lureau Nov. 21, 2023, 11:47 a.m. UTC | #6
Hi

On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <thuth@redhat.com> wrote:
>
> On 21/11/2023 10.39, Marc-André Lureau wrote:
> > Hi
> >
> > On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >>
> >> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> >>> Hi
> >>>
> >>> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >>>>
> >>>> If a chardev socket is not read, it will eventually fill and QEMU
> >>>> can block attempting to write to it. A difficult bug in avocado
> >>>> tests where the console socket was not being read from caused this
> >>>> hang.
> >>>>
> >>>> warn if a chardev write is blocked for 100ms.
> >>>>
> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>> ---
> >>>> This is not necessary for the fix but it does trigger in the
> >>>> failing avocado test without the previous patch applied. Maybe
> >>>> it would be helpful?
> >>>>
> >>>> Thanks,
> >>>> Nick
> >>>>
> >>>>   chardev/char.c | 6 ++++++
> >>>>   1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/chardev/char.c b/chardev/char.c
> >>>> index 996a024c7a..7c375e3cc4 100644
> >>>> --- a/chardev/char.c
> >>>> +++ b/chardev/char.c
> >>>> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> >>>>   {
> >>>>       ChardevClass *cc = CHARDEV_GET_CLASS(s);
> >>>>       int res = 0;
> >>>> +    int nr_retries = 0;
> >>>> +
> >>>>       *offset = 0;
> >>>>
> >>>>       qemu_mutex_lock(&s->chr_write_lock);
> >>>> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> >>>>               } else {
> >>>>                   g_usleep(100);
> >>>>               }
> >>>> +            if (++nr_retries == 1000) { /* 100ms */
> >>>> +                warn_report("Chardev '%s' write blocked for > 100ms, "
> >>>> +                            "socket buffer full?", s->label);
> >>>> +            }
> >>>
> >>> That shouldn't happen, the frontend should poll and only write when it
> >>> can. What is the qemu command being used here?
> >>
> >> You can follow it through the thread here
> >>
> >> https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
> >>
> >> In short, a console device is attached to a socket pair and nothing
> >> ever reads from it. It eventually fills, and writing to it fails
> >> indefinitely here.
> >>
> >> It can be reproduced with:
> >>
> >> make check-avocado
> >> AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
> >>
> >>
> >
> > How reliably? I tried 10/10.
>
> It used to fail for me every time I tried - but the fix has already been
> merged yesterday (commit cd43f00524070c026), so if you updated today, you'll
> see the test passing again.

Ok so the "frontend" is spapr-vty and there:

void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
{
    SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);

    /* XXX this blocks entire thread. Rewrite to use
     * qemu_chr_fe_write and background I/O callbacks */
    qemu_chr_fe_write_all(&dev->chardev, buf, len);
}

(grep "XXX this blocks", we have a lot...)

Can H_PUT_TERM_CHAR return the number of bytes written?

Is there a way to tell the guest the console is ready to accept more bytes?
Alex Bennée Nov. 22, 2023, 9:55 a.m. UTC | #7
Marc-André Lureau <marcandre.lureau@redhat.com> writes:

> Hi
>
> On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 21/11/2023 10.39, Marc-André Lureau wrote:
>> > Hi
>> >
>> > On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>> >>
>> >> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
>> >>> Hi
>> >>>
>> >>> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>> >>>>
>> >>>> If a chardev socket is not read, it will eventually fill and QEMU
>> >>>> can block attempting to write to it. A difficult bug in avocado
>> >>>> tests where the console socket was not being read from caused this
>> >>>> hang.
>> >>>>
>> >>>> warn if a chardev write is blocked for 100ms.
>> >>>>
>> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
<snip>
>> >>>> index 996a024c7a..7c375e3cc4 100644
>> >>>> --- a/chardev/char.c
>> >>>> +++ b/chardev/char.c
>> >>>> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
>> >>>>   {
>> >>>>       ChardevClass *cc = CHARDEV_GET_CLASS(s);
>> >>>>       int res = 0;
>> >>>> +    int nr_retries = 0;
>> >>>> +
>> >>>>       *offset = 0;
>> >>>>
>> >>>>       qemu_mutex_lock(&s->chr_write_lock);
>> >>>> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
>> >>>>               } else {
>> >>>>                   g_usleep(100);
>> >>>>               }
>> >>>> +            if (++nr_retries == 1000) { /* 100ms */
>> >>>> +                warn_report("Chardev '%s' write blocked for > 100ms, "
>> >>>> +                            "socket buffer full?", s->label);
>> >>>> +            }
>> >>>
>> >>> That shouldn't happen, the frontend should poll and only write when it
>> >>> can. What is the qemu command being used here?
<snip>
>
> Ok so the "frontend" is spapr-vty and there:
>
> void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
> {
>     SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);
>
>     /* XXX this blocks entire thread. Rewrite to use
>      * qemu_chr_fe_write and background I/O callbacks */
>     qemu_chr_fe_write_all(&dev->chardev, buf, len);
> }
>
> (grep "XXX this blocks", we have a lot...)
>
> Can H_PUT_TERM_CHAR return the number of bytes written?
>
> Is there a way to tell the guest the console is ready to accept more bytes?

See also:

  Message-ID: <20231109192814.95977-1-philmd@linaro.org>
  Date: Thu,  9 Nov 2023 20:28:04 +0100
  Subject: [PATCH-for-8.2 v4 00/10] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop
  From: =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= <philmd@linaro.org>

Although it didn't make it into 8.2. I was hoping it would be a template
for fixing up the other cases.
Thomas Huth Nov. 22, 2023, 10:38 a.m. UTC | #8
On 21/11/2023 12.47, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 21/11/2023 10.39, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>
>>>> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
>>>>> Hi
>>>>>
>>>>> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>>>
>>>>>> If a chardev socket is not read, it will eventually fill and QEMU
>>>>>> can block attempting to write to it. A difficult bug in avocado
>>>>>> tests where the console socket was not being read from caused this
>>>>>> hang.
>>>>>>
>>>>>> warn if a chardev write is blocked for 100ms.
>>>>>>
>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>> ---
>>>>>> This is not necessary for the fix but it does trigger in the
>>>>>> failing avocado test without the previous patch applied. Maybe
>>>>>> it would be helpful?
>>>>>>
>>>>>> Thanks,
>>>>>> Nick
>>>>>>
>>>>>>    chardev/char.c | 6 ++++++
>>>>>>    1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/chardev/char.c b/chardev/char.c
>>>>>> index 996a024c7a..7c375e3cc4 100644
>>>>>> --- a/chardev/char.c
>>>>>> +++ b/chardev/char.c
>>>>>> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
>>>>>>    {
>>>>>>        ChardevClass *cc = CHARDEV_GET_CLASS(s);
>>>>>>        int res = 0;
>>>>>> +    int nr_retries = 0;
>>>>>> +
>>>>>>        *offset = 0;
>>>>>>
>>>>>>        qemu_mutex_lock(&s->chr_write_lock);
>>>>>> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
>>>>>>                } else {
>>>>>>                    g_usleep(100);
>>>>>>                }
>>>>>> +            if (++nr_retries == 1000) { /* 100ms */
>>>>>> +                warn_report("Chardev '%s' write blocked for > 100ms, "
>>>>>> +                            "socket buffer full?", s->label);
>>>>>> +            }
>>>>>
>>>>> That shouldn't happen, the frontend should poll and only write when it
>>>>> can. What is the qemu command being used here?
>>>>
>>>> You can follow it through the thread here
>>>>
>>>> https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
>>>>
>>>> In short, a console device is attached to a socket pair and nothing
>>>> ever reads from it. It eventually fills, and writing to it fails
>>>> indefinitely here.
>>>>
>>>> It can be reproduced with:
>>>>
>>>> make check-avocado
>>>> AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
>>>>
>>>>
>>>
>>> How reliably? I tried 10/10.
>>
>> It used to fail for me every time I tried - but the fix has already been
>> merged yesterday (commit cd43f00524070c026), so if you updated today, you'll
>> see the test passing again.
> 
> Ok so the "frontend" is spapr-vty and there:
> 
> void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
> {
>      SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);
> 
>      /* XXX this blocks entire thread. Rewrite to use
>       * qemu_chr_fe_write and background I/O callbacks */
>      qemu_chr_fe_write_all(&dev->chardev, buf, len);
> }
> 
> (grep "XXX this blocks", we have a lot...)
> 
> Can H_PUT_TERM_CHAR return the number of bytes written?

You can find the definition of the hypercall in the LoPAPR spec:

  https://elinux.org/images/a/a4/LoPAPR_DRAFT_v11_24March2016.pdf

... and if I get it right, it does not have a way to tell the guest the 
amount of accepted characters. But it could return H_BUSY if it is not able 
to enqueue all characters at once. As far as I can see, this will make the 
guest spin until it can finally send out the characters... not sure whether 
that's so much better...?

  Thomas
Daniel P. Berrangé Nov. 22, 2023, 10:42 a.m. UTC | #9
On Wed, Nov 22, 2023 at 11:38:28AM +0100, Thomas Huth wrote:
> On 21/11/2023 12.47, Marc-André Lureau wrote:
> > Hi
> > 
> > On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <thuth@redhat.com> wrote:
> > > 
> > > On 21/11/2023 10.39, Marc-André Lureau wrote:
> > > > Hi
> > > > 
> > > > On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > > > 
> > > > > On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> > > > > > Hi
> > > > > > 
> > > > > > On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > > > > > 
> > > > > > > If a chardev socket is not read, it will eventually fill and QEMU
> > > > > > > can block attempting to write to it. A difficult bug in avocado
> > > > > > > tests where the console socket was not being read from caused this
> > > > > > > hang.
> > > > > > > 
> > > > > > > warn if a chardev write is blocked for 100ms.
> > > > > > > 
> > > > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > > > > ---
> > > > > > > This is not necessary for the fix but it does trigger in the
> > > > > > > failing avocado test without the previous patch applied. Maybe
> > > > > > > it would be helpful?
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Nick
> > > > > > > 
> > > > > > >    chardev/char.c | 6 ++++++
> > > > > > >    1 file changed, 6 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/chardev/char.c b/chardev/char.c
> > > > > > > index 996a024c7a..7c375e3cc4 100644
> > > > > > > --- a/chardev/char.c
> > > > > > > +++ b/chardev/char.c
> > > > > > > @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> > > > > > >    {
> > > > > > >        ChardevClass *cc = CHARDEV_GET_CLASS(s);
> > > > > > >        int res = 0;
> > > > > > > +    int nr_retries = 0;
> > > > > > > +
> > > > > > >        *offset = 0;
> > > > > > > 
> > > > > > >        qemu_mutex_lock(&s->chr_write_lock);
> > > > > > > @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> > > > > > >                } else {
> > > > > > >                    g_usleep(100);
> > > > > > >                }
> > > > > > > +            if (++nr_retries == 1000) { /* 100ms */
> > > > > > > +                warn_report("Chardev '%s' write blocked for > 100ms, "
> > > > > > > +                            "socket buffer full?", s->label);
> > > > > > > +            }
> > > > > > 
> > > > > > That shouldn't happen, the frontend should poll and only write when it
> > > > > > can. What is the qemu command being used here?
> > > > > 
> > > > > You can follow it through the thread here
> > > > > 
> > > > > https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
> > > > > 
> > > > > In short, a console device is attached to a socket pair and nothing
> > > > > ever reads from it. It eventually fills, and writing to it fails
> > > > > indefinitely here.
> > > > > 
> > > > > It can be reproduced with:
> > > > > 
> > > > > make check-avocado
> > > > > AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
> > > > > 
> > > > > 
> > > > 
> > > > How reliably? I tried 10/10.
> > > 
> > > It used to fail for me every time I tried - but the fix has already been
> > > merged yesterday (commit cd43f00524070c026), so if you updated today, you'll
> > > see the test passing again.
> > 
> > Ok so the "frontend" is spapr-vty and there:
> > 
> > void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
> > {
> >      SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);
> > 
> >      /* XXX this blocks entire thread. Rewrite to use
> >       * qemu_chr_fe_write and background I/O callbacks */
> >      qemu_chr_fe_write_all(&dev->chardev, buf, len);
> > }
> > 
> > (grep "XXX this blocks", we have a lot...)
> > 
> > Can H_PUT_TERM_CHAR return the number of bytes written?
> 
> You can find the definition of the hypercall in the LoPAPR spec:
> 
>  https://elinux.org/images/a/a4/LoPAPR_DRAFT_v11_24March2016.pdf
> 
> ... and if I get it right, it does not have a way to tell the guest the
> amount of accepted characters. But it could return H_BUSY if it is not able
> to enqueue all characters at once. As far as I can see, this will make the
> guest spin until it can finally send out the characters... not sure whether
> that's so much better...?

If the rest of the guest can get on with useful work that's better. If we
block in QEMU, the entire guest hardware emulation is blocked so nothing
can make progress if it exits from vCPU run context.

With regards,
Daniel
diff mbox series

Patch

diff --git a/chardev/char.c b/chardev/char.c
index 996a024c7a..7c375e3cc4 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -114,6 +114,8 @@  static int qemu_chr_write_buffer(Chardev *s,
 {
     ChardevClass *cc = CHARDEV_GET_CLASS(s);
     int res = 0;
+    int nr_retries = 0;
+
     *offset = 0;
 
     qemu_mutex_lock(&s->chr_write_lock);
@@ -126,6 +128,10 @@  static int qemu_chr_write_buffer(Chardev *s,
             } else {
                 g_usleep(100);
             }
+            if (++nr_retries == 1000) { /* 100ms */
+                warn_report("Chardev '%s' write blocked for > 100ms, "
+                            "socket buffer full?", s->label);
+            }
             goto retry;
         }