diff mbox series

[v2,02/12] tests/avocado: improve behaviour waiting for login prompts

Message ID 20221111145529.4020801-3-alex.bennee@linaro.org
State New
Headers show
Series testing, docs, plugins, arm pre-PR | expand

Commit Message

Alex Bennée Nov. 11, 2022, 2:55 p.m. UTC
This attempts to deal with the problem of login prompts not being
guaranteed to be terminated with a newline. The solution to this is to
peek at the incoming data looking to see if we see an up-coming match
before we fall back to the old readline() logic. The reason to mostly
rely on readline is because I am occasionally seeing the peek stalling
despite data being there.

This seems kinda hacky and gross so I'm open to alternative approaches
and cleaner python code.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

---
v2
  - remove superfluous /r/n
---
 tests/avocado/avocado_qemu/__init__.py | 89 +++++++++++++++++++++++++-
 1 file changed, 88 insertions(+), 1 deletion(-)

Comments

Peter Maydell Nov. 14, 2022, 4:28 p.m. UTC | #1
On Fri, 11 Nov 2022 at 14:58, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> This attempts to deal with the problem of login prompts not being
> guaranteed to be terminated with a newline. The solution to this is to
> peek at the incoming data looking to see if we see an up-coming match
> before we fall back to the old readline() logic. The reason to mostly
> rely on readline is because I am occasionally seeing the peek stalling
> despite data being there.
>
> This seems kinda hacky and gross so I'm open to alternative approaches
> and cleaner python code.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

With this patch, the evb_sdk test fails:

Fetching asset from
./build/arm-clang/tests/avocado/machine_aspeed.py:AST2x00MachineSDK.test_arm_ast2500_evb_sdk
JOB ID     : 542e050c4f7e1ddd6d5cdd350e4c26e1bdfcdee4
JOB LOG    : /home/petmay01/avocado/job-results/job-2022-11-14T16.21-542e050/job.log
 (1/1) ./build/arm-clang/tests/avocado/machine_aspeed.py:AST2x00MachineSDK.test_arm_ast2500_evb_sdk:
ERROR: log() missing 1 required positional argument: 'msg' (82.57 s)
RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0
| CANCEL 0
JOB TIME   : 84.09 s

The avocado log reports a traceback where Python has thrown a
UnicodeDecodeError, and then subsequently an attempted debug
message in the error-handling path has a syntax error
("log() missing 1 required positional argument"):

2022-11-14 16:22:47,952 __init__         L0240 DEBUG| Stopping Host
logger for ttyVUART0...
2022-11-14 16:22:48,240 __init__         L0240 DEBUG| Starting Wait
for /xyz/ope…control/host0/boot/one_time...
2022-11-14 16:22:48,570 stacktrace       L0039 ERROR|
2022-11-14 16:22:48,570 stacktrace       L0041 ERROR| Reproduced
traceback from: /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm
-clang/tests/venv/lib/python3.8/site-packages/avocado/core/test.py:770
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR| Traceback (most
recent call last):
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|   File
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/avocad
o/avocado_qemu/__init__.py", line 149, in _peek_ahead
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|     peek_ahead =
console.peek(min_match).decode()
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position
44-45: unexpec
ted end of data
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR| During handling
of the above exception, another exception occurred:
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR| Traceback (most
recent call last):
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|   File
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/venv/l
ib/python3.8/site-packages/avocado/core/decorators.py", line 90, in wrapper
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|     return
function(obj, *args, **kwargs)
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|   File
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/avocad
o/machine_aspeed.py", line 225, in test_arm_ast2500_evb_sdk
2022-11-14 16:22:48,572 stacktrace       L0045 ERROR|
self.wait_for_console_pattern('ast2500-default login:')
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|   File
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/avocad
o/machine_aspeed.py", line 193, in wait_for_console_pattern
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|
wait_for_console_pattern(self, success_message,
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|   File
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/avocad
o/avocado_qemu/__init__.py", line 286, in wait_for_console_pattern
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|
_console_interaction(test, success_message, failure_message, None,
vm=vm)
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|   File
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/avocad
o/avocado_qemu/__init__.py", line 226, in _console_interaction
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|     msg =
_peek_ahead(console, min_match, success_message)
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|   File
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/avocad
o/avocado_qemu/__init__.py", line 180, in _peek_ahead
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|
console_logger.log("error in decode of peek")
2022-11-14 16:22:48,573 stacktrace       L0045 ERROR| TypeError: log()
missing 1 required positional argument: 'msg'
2022-11-14 16:22:48,573 stacktrace       L0046 ERROR|
2022-11-14 16:22:48,573 test             L0775 DEBUG| Local variables:
2022-11-14 16:22:48,605 test             L0778 DEBUG|  -> obj <class
'machine_aspeed.AST2x00MachineSDK'>:
1-./build/arm-clang/tests/avocado/machine_aspeed.py:AST2x00MachineSDK.test_arm_ast2500_evb_sdk
2022-11-14 16:22:48,605 test             L0778 DEBUG|  -> args <class
'tuple'>: ()
2022-11-14 16:22:48,605 test             L0778 DEBUG|  -> kwargs
<class 'dict'>: {}
2022-11-14 16:22:48,605 test             L0778 DEBUG|  -> condition
<class 'NoneType'>: None
2022-11-14 16:22:48,605 test             L0778 DEBUG|  -> function
<class 'function'>: <function
AST2x00MachineSDK.test_arm_ast2500_evb_sdk at 0x7f05ee0678b0>
2022-11-14 16:22:48,606 test             L0778 DEBUG|  -> message
<class 'str'>: Running on GitLab
2022-11-14 16:22:48,606 test             L0778 DEBUG|  -> negate
<class 'bool'>: False

thanks
-- PMM
Philippe Mathieu-Daudé Nov. 14, 2022, 10:15 p.m. UTC | #2
On 14/11/22 17:28, Peter Maydell wrote:
> On Fri, 11 Nov 2022 at 14:58, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> This attempts to deal with the problem of login prompts not being
>> guaranteed to be terminated with a newline. The solution to this is to
>> peek at the incoming data looking to see if we see an up-coming match
>> before we fall back to the old readline() logic. The reason to mostly
>> rely on readline is because I am occasionally seeing the peek stalling
>> despite data being there.
>>
>> This seems kinda hacky and gross so I'm open to alternative approaches
>> and cleaner python code.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> 
> With this patch, the evb_sdk test fails:
> 
> Fetching asset from
> ./build/arm-clang/tests/avocado/machine_aspeed.py:AST2x00MachineSDK.test_arm_ast2500_evb_sdk
> JOB ID     : 542e050c4f7e1ddd6d5cdd350e4c26e1bdfcdee4
> JOB LOG    : /home/petmay01/avocado/job-results/job-2022-11-14T16.21-542e050/job.log
>   (1/1) ./build/arm-clang/tests/avocado/machine_aspeed.py:AST2x00MachineSDK.test_arm_ast2500_evb_sdk:
> ERROR: log() missing 1 required positional argument: 'msg' (82.57 s)
> RESULTS    : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0
> | CANCEL 0
> JOB TIME   : 84.09 s
> 
> The avocado log reports a traceback where Python has thrown a
> UnicodeDecodeError, and then subsequently an attempted debug
> message in the error-handling path has a syntax error
> ("log() missing 1 required positional argument"):

> _console_interaction(test, success_message, failure_message, None,
> vm=vm)
> 2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|   File
> "/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/avocad
> o/avocado_qemu/__init__.py", line 226, in _console_interaction
> 2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|     msg =
> _peek_ahead(console, min_match, success_message)
> 2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|   File
> "/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/avocad
> o/avocado_qemu/__init__.py", line 180, in _peek_ahead
> 2022-11-14 16:22:48,573 stacktrace       L0045 ERROR|
> console_logger.log("error in decode of peek")
> 2022-11-14 16:22:48,573 stacktrace       L0045 ERROR| TypeError: log()
> missing 1 required positional argument: 'msg'

Indeed, log() expects a Level as first argument. Here we simply want to
report the exception as a warning and continue:

-- >8 --
          except UnicodeDecodeError:
-            console_logger.log("error in decode of peek")
+            console_logger.warning("error in decode of peek")
              return None
---
Cédric Le Goater Nov. 17, 2022, 1:38 p.m. UTC | #3
Hello Alex,

On 11/11/22 15:55, Alex Bennée wrote:
> This attempts to deal with the problem of login prompts not being
> guaranteed to be terminated with a newline. The solution to this is to
> peek at the incoming data looking to see if we see an up-coming match
> before we fall back to the old readline() logic. The reason to mostly
> rely on readline is because I am occasionally seeing the peek stalling
> despite data being there.
> 
> This seems kinda hacky and gross so I'm open to alternative approaches
> and cleaner python code.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

I have pulled this patch in the aspeed tree hoping it would improve tests:

   AST2x00MachineSDK.test_arm_ast2500_evb_sdk
   AST2x00MachineSDK.test_arm_ast2600_evb_sdk

but the failure rate has increased :/ I have seen failures in these also :

   AST2x00Machine.test_arm_ast2500_evb_buildroot
   AST2x00Machine.test_arm_ast2600_evb_buildroot

which used to be quite stable.

Sorry, this is not of much help. the issue might be elsewhere.

Thanks for the time you spend on this.

C.


> ---
> v2
>    - remove superfluous /r/n
> ---
>   tests/avocado/avocado_qemu/__init__.py | 89 +++++++++++++++++++++++++-
>   1 file changed, 88 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
> index 910f3ba1ea..20cba57161 100644
> --- a/tests/avocado/avocado_qemu/__init__.py
> +++ b/tests/avocado/avocado_qemu/__init__.py
> @@ -131,6 +131,58 @@ def pick_default_qemu_bin(bin_prefix='qemu-system-', arch=None):
>               return path
>       return None
>   
> +def _peek_ahead(console, min_match, success_message):
> +    """
> +    peek ahead in the console stream keeping an eye out for the
> +    success message.
> +
> +    Returns some message to process or None, indicating the normal
> +    readline should occur.
> +    """
> +    console_logger = logging.getLogger('console')
> +    peek_len = 0
> +    retries = 0
> +
> +    while True:
> +        try:
> +            old_peek_len = peek_len
> +            peek_ahead = console.peek(min_match).decode()
> +            peek_len = len(peek_ahead)
> +
> +            # if we get stuck too long lets just fallback to readline
> +            if peek_len <= old_peek_len:
> +                retries += 1
> +                if retries > 10:
> +                    return None
> +
> +            # if we see a newline in the peek we can let safely bail
> +            # and let the normal readline() deal with it
> +            if peek_ahead.endswith(('\n', '\r')):
> +                return None
> +
> +            # if we haven't seen enough for the whole message but the
> +            # first part matches lets just loop again
> +            if len(peek_ahead) < min_match and \
> +               success_message[:peek_len] in peek_ahead:
> +                continue
> +
> +            # if we see the whole success_message we are done, consume
> +            # it and pass back so we can exit to the user
> +            if success_message in peek_ahead:
> +                return console.read(peek_len).decode()
> +
> +            # of course if we've seen enough then this line probably
> +            # doesn't contain what we are looking for, fallback
> +            if peek_len > min_match:
> +                return None
> +
> +        except UnicodeDecodeError:
> +            console_logger.log("error in decode of peek")
> +            return None
> +
> +    # we should never get here
> +    return None
> +
>   
>   def _console_interaction(test, success_message, failure_message,
>                            send_string, keep_sending=False, vm=None):
> @@ -139,17 +191,52 @@ def _console_interaction(test, success_message, failure_message,
>           vm = test.vm
>       console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
>       console_logger = logging.getLogger('console')
> +
> +    # Establish the minimum number of bytes we would need to
> +    # potentially match against success_message
> +    if success_message is not None:
> +        min_match = len(success_message)
> +    else:
> +        min_match = 0
> +
> +    console_logger.debug("looking for %d:(%s), sending %s (always=%s)",
> +                         min_match, success_message, send_string, keep_sending)
> +
>       while True:
> +        msg = None
> +
> +        # First send our string, optionally repeating the send next
> +        # cycle.
>           if send_string:
>               vm.console_socket.sendall(send_string.encode())
>               if not keep_sending:
>                   send_string = None # send only once
> +
> +        # If the console has no data to read we briefly
> +        # sleep before continuing.
> +        if not console.readable():
> +            time.sleep(0.1)
> +            continue
> +
>           try:
> -            msg = console.readline().decode().strip()
> +
> +            # First we shall peek ahead for a potential match to cover waiting
> +            # for lines without any newlines.
> +            if min_match > 0:
> +                msg = _peek_ahead(console, min_match, success_message)
> +
> +            # otherwise we block here for a full line
> +            if not msg:
> +                msg = console.readline().decode().strip()
> +
>           except UnicodeDecodeError:
> +            console_logger.debug("skipped unicode error")
>               msg = None
> +
> +        # if nothing came out we continue and try again
>           if not msg:
>               continue
> +
>           console_logger.debug(msg)
>           if success_message is None or success_message in msg:
>               break
Peter Maydell Nov. 17, 2022, 1:50 p.m. UTC | #4
On Thu, 17 Nov 2022 at 13:39, Cédric Le Goater <clg@kaod.org> wrote:
>
> Hello Alex,
>
> On 11/11/22 15:55, Alex Bennée wrote:
> > This attempts to deal with the problem of login prompts not being
> > guaranteed to be terminated with a newline. The solution to this is to
> > peek at the incoming data looking to see if we see an up-coming match
> > before we fall back to the old readline() logic. The reason to mostly
> > rely on readline is because I am occasionally seeing the peek stalling
> > despite data being there.
> >
> > This seems kinda hacky and gross so I'm open to alternative approaches
> > and cleaner python code.
> >
> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>
> I have pulled this patch in the aspeed tree hoping it would improve tests:
>
>    AST2x00MachineSDK.test_arm_ast2500_evb_sdk
>    AST2x00MachineSDK.test_arm_ast2600_evb_sdk
>
> but the failure rate has increased :/ I have seen failures in these also :
>
>    AST2x00Machine.test_arm_ast2500_evb_buildroot
>    AST2x00Machine.test_arm_ast2600_evb_buildroot
>
> which used to be quite stable.

This version of this patch is buggy, so you should drop it from
your tree...

-- PMM
Alex Bennée Nov. 17, 2022, 2:04 p.m. UTC | #5
Cédric Le Goater <clg@kaod.org> writes:

> Hello Alex,
>
> On 11/11/22 15:55, Alex Bennée wrote:
>> This attempts to deal with the problem of login prompts not being
>> guaranteed to be terminated with a newline. The solution to this is to
>> peek at the incoming data looking to see if we see an up-coming match
>> before we fall back to the old readline() logic. The reason to mostly
>> rely on readline is because I am occasionally seeing the peek stalling
>> despite data being there.
>> This seems kinda hacky and gross so I'm open to alternative
>> approaches
>> and cleaner python code.
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>
> I have pulled this patch in the aspeed tree hoping it would improve tests:
>
>   AST2x00MachineSDK.test_arm_ast2500_evb_sdk
>   AST2x00MachineSDK.test_arm_ast2600_evb_sdk
>
> but the failure rate has increased :/ I have seen failures in these also :
>
>   AST2x00Machine.test_arm_ast2500_evb_buildroot
>   AST2x00Machine.test_arm_ast2600_evb_buildroot
>
> which used to be quite stable.
>
> Sorry, this is not of much help. the issue might be elsewhere.

Do you see what is happening in the logs? I've made a couple of tweaks
since and it gets through the negotiation but then timesout:

  console: looking for 22:(ast2600-default login:), sending None (always=False)
  /console: [    0.939039] aspeed-pcie 1e7700c0.pcie: [1] : tx idle timeout [0]
  -console: [    1.385144] spi-nor spi0.1: unrecognized JEDEC id bytes: 00 00 00 00 00 00
  console: [    1.413028] spi-nor spi2.0: unrecognized JEDEC id bytes: 00 00 00 00 00 00                                                                                       
  console: [    1.617539] aspeed-rtc 1e781000.rtc: hctosys: unable to read the hardware clock
  -console: rofs = mtd4 squashfs rwfs = mtd5 jffs2
  /console: [   14.563495] systemd[1]: Failed to find module 'autofs4'                  
  -console: [   18.034371] systemd[176]: /lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
  |console: [FAILED] Failed to start Intel Power Control for the Host 0.
  /console: [FAILED] Failed to start Phosphor C…istening on device /dev/ttyS2.
  console: [DEPEND] Dependency failed for Host logger for ttyS2.
  -console: Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) nodistro.0 ast2600-default ttyS4
  console: ast2600-default login:
   (always=False)g for 9:(Password:), sending root
  console: root
  |console: Password:
   (always=False)g for 23:(root@ast2600-default:~#), sending 0penBmc
  console: Login timed out after 60 seconds.
  |console: Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) nodistro.0 ast2600-default ttyS4
  /avocado.test:

>
> Thanks for the time you spend on this.
>
> C.
>
>
>> ---
>> v2
>>    - remove superfluous /r/n
>> ---
>>   tests/avocado/avocado_qemu/__init__.py | 89 +++++++++++++++++++++++++-
>>   1 file changed, 88 insertions(+), 1 deletion(-)
>> diff --git a/tests/avocado/avocado_qemu/__init__.py
>> b/tests/avocado/avocado_qemu/__init__.py
>> index 910f3ba1ea..20cba57161 100644
>> --- a/tests/avocado/avocado_qemu/__init__.py
>> +++ b/tests/avocado/avocado_qemu/__init__.py
>> @@ -131,6 +131,58 @@ def pick_default_qemu_bin(bin_prefix='qemu-system-', arch=None):
>>               return path
>>       return None
>>   +def _peek_ahead(console, min_match, success_message):
>> +    """
>> +    peek ahead in the console stream keeping an eye out for the
>> +    success message.
>> +
>> +    Returns some message to process or None, indicating the normal
>> +    readline should occur.
>> +    """
>> +    console_logger = logging.getLogger('console')
>> +    peek_len = 0
>> +    retries = 0
>> +
>> +    while True:
>> +        try:
>> +            old_peek_len = peek_len
>> +            peek_ahead = console.peek(min_match).decode()
>> +            peek_len = len(peek_ahead)
>> +
>> +            # if we get stuck too long lets just fallback to readline
>> +            if peek_len <= old_peek_len:
>> +                retries += 1
>> +                if retries > 10:
>> +                    return None
>> +
>> +            # if we see a newline in the peek we can let safely bail
>> +            # and let the normal readline() deal with it
>> +            if peek_ahead.endswith(('\n', '\r')):
>> +                return None
>> +
>> +            # if we haven't seen enough for the whole message but the
>> +            # first part matches lets just loop again
>> +            if len(peek_ahead) < min_match and \
>> +               success_message[:peek_len] in peek_ahead:
>> +                continue
>> +
>> +            # if we see the whole success_message we are done, consume
>> +            # it and pass back so we can exit to the user
>> +            if success_message in peek_ahead:
>> +                return console.read(peek_len).decode()
>> +
>> +            # of course if we've seen enough then this line probably
>> +            # doesn't contain what we are looking for, fallback
>> +            if peek_len > min_match:
>> +                return None
>> +
>> +        except UnicodeDecodeError:
>> +            console_logger.log("error in decode of peek")
>> +            return None
>> +
>> +    # we should never get here
>> +    return None
>> +
>>     def _console_interaction(test, success_message, failure_message,
>>                            send_string, keep_sending=False, vm=None):
>> @@ -139,17 +191,52 @@ def _console_interaction(test, success_message, failure_message,
>>           vm = test.vm
>>       console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
>>       console_logger = logging.getLogger('console')
>> +
>> +    # Establish the minimum number of bytes we would need to
>> +    # potentially match against success_message
>> +    if success_message is not None:
>> +        min_match = len(success_message)
>> +    else:
>> +        min_match = 0
>> +
>> +    console_logger.debug("looking for %d:(%s), sending %s (always=%s)",
>> +                         min_match, success_message, send_string, keep_sending)
>> +
>>       while True:
>> +        msg = None
>> +
>> +        # First send our string, optionally repeating the send next
>> +        # cycle.
>>           if send_string:
>>               vm.console_socket.sendall(send_string.encode())
>>               if not keep_sending:
>>                   send_string = None # send only once
>> +
>> +        # If the console has no data to read we briefly
>> +        # sleep before continuing.
>> +        if not console.readable():
>> +            time.sleep(0.1)
>> +            continue
>> +
>>           try:
>> -            msg = console.readline().decode().strip()
>> +
>> +            # First we shall peek ahead for a potential match to cover waiting
>> +            # for lines without any newlines.
>> +            if min_match > 0:
>> +                msg = _peek_ahead(console, min_match, success_message)
>> +
>> +            # otherwise we block here for a full line
>> +            if not msg:
>> +                msg = console.readline().decode().strip()
>> +
>>           except UnicodeDecodeError:
>> +            console_logger.debug("skipped unicode error")
>>               msg = None
>> +
>> +        # if nothing came out we continue and try again
>>           if not msg:
>>               continue
>> +
>>           console_logger.debug(msg)
>>           if success_message is None or success_message in msg:
>>               break
Cédric Le Goater Nov. 17, 2022, 5:14 p.m. UTC | #6
On 11/17/22 15:04, Alex Bennée wrote:
> 
> Cédric Le Goater <clg@kaod.org> writes:
> 
>> Hello Alex,
>>
>> On 11/11/22 15:55, Alex Bennée wrote:
>>> This attempts to deal with the problem of login prompts not being
>>> guaranteed to be terminated with a newline. The solution to this is to
>>> peek at the incoming data looking to see if we see an up-coming match
>>> before we fall back to the old readline() logic. The reason to mostly
>>> rely on readline is because I am occasionally seeing the peek stalling
>>> despite data being there.
>>> This seems kinda hacky and gross so I'm open to alternative
>>> approaches
>>> and cleaner python code.
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>>
>> I have pulled this patch in the aspeed tree hoping it would improve tests:
>>
>>    AST2x00MachineSDK.test_arm_ast2500_evb_sdk
>>    AST2x00MachineSDK.test_arm_ast2600_evb_sdk
>>
>> but the failure rate has increased :/ I have seen failures in these also :
>>
>>    AST2x00Machine.test_arm_ast2500_evb_buildroot
>>    AST2x00Machine.test_arm_ast2600_evb_buildroot
>>
>> which used to be quite stable.
>>
>> Sorry, this is not of much help. the issue might be elsewhere.
> 
> Do you see what is happening in the logs? I've made a couple of tweaks
> since and it gets through the negotiation but then timesout:
> 
>    console: looking for 22:(ast2600-default login:), sending None (always=False)
>    /console: [    0.939039] aspeed-pcie 1e7700c0.pcie: [1] : tx idle timeout [0]
>    -console: [    1.385144] spi-nor spi0.1: unrecognized JEDEC id bytes: 00 00 00 00 00 00
>    console: [    1.413028] spi-nor spi2.0: unrecognized JEDEC id bytes: 00 00 00 00 00 00
>    console: [    1.617539] aspeed-rtc 1e781000.rtc: hctosys: unable to read the hardware clock
>    -console: rofs = mtd4 squashfs rwfs = mtd5 jffs2
>    /console: [   14.563495] systemd[1]: Failed to find module 'autofs4'
>    -console: [   18.034371] systemd[176]: /lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
>    |console: [FAILED] Failed to start Intel Power Control for the Host 0.
>    /console: [FAILED] Failed to start Phosphor C…istening on device /dev/ttyS2.
>    console: [DEPEND] Dependency failed for Host logger for ttyS2.
>    -console: Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) nodistro.0 ast2600-default ttyS4
>    console: ast2600-default login:
>     (always=False)g for 9:(Password:), sending root
>    console: root
>    |console: Password:
>     (always=False)g for 23:(root@ast2600-default:~#), sending 0penBmc
>    console: Login timed out after 60 seconds.
>    |console: Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) nodistro.0 ast2600-default ttyS4
>    /avocado.test:

Same for me :

console: Starting kernel ...
console: looking for 22:(ast2600-default login:), sending None (always=False)
\console: [    0.845678] aspeed-pcie 1e7700c0.pcie: [1] : tx idle timeout [0]
|console: [    1.388468] spi-nor spi0.1: unrecognized JEDEC id bytes: 00 00 00 00 00 00
console: [    1.409448] spi-nor spi2.0: unrecognized JEDEC id bytes: 00 00 00 00 00 00
console: [    1.564132] aspeed-rtc 1e781000.rtc: hctosys: unable to read the hardware clock
\console: rofs = mtd4 squashfs rwfs = mtd5 jffs2
/console: [   12.708097] systemd[1]: Failed to find module 'autofs4'
|console: [   14.757409] systemd[177]: /lib/systemd/system-generators/systemd-gpt-auto-generator failed with exit status 1.
|console: [DEPEND] Dependency failed for Host logger for ttyS2.
\console: Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) nodistro.0 ast2600-default ttyS4
console:
ast2600-default login:
  (always=False)g for 9:(Password:), sending root
console: root
-console: Password:
  (always=False)g for 23:(root@ast2600-default:~#), sending 0penBmc
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-build/tests/avocado/machine_aspeed.py:AST2x00MachineSDK.test_arm_ast2600_evb_sdk', 'logdir': '/home/legoater/avocado/job-results/job-2022-11-17T18.03-780... (240.28 s)
diff mbox series

Patch

diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index 910f3ba1ea..20cba57161 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -131,6 +131,58 @@  def pick_default_qemu_bin(bin_prefix='qemu-system-', arch=None):
             return path
     return None
 
+def _peek_ahead(console, min_match, success_message):
+    """
+    peek ahead in the console stream keeping an eye out for the
+    success message.
+
+    Returns some message to process or None, indicating the normal
+    readline should occur.
+    """
+    console_logger = logging.getLogger('console')
+    peek_len = 0
+    retries = 0
+
+    while True:
+        try:
+            old_peek_len = peek_len
+            peek_ahead = console.peek(min_match).decode()
+            peek_len = len(peek_ahead)
+
+            # if we get stuck too long lets just fallback to readline
+            if peek_len <= old_peek_len:
+                retries += 1
+                if retries > 10:
+                    return None
+
+            # if we see a newline in the peek we can let safely bail
+            # and let the normal readline() deal with it
+            if peek_ahead.endswith(('\n', '\r')):
+                return None
+
+            # if we haven't seen enough for the whole message but the
+            # first part matches lets just loop again
+            if len(peek_ahead) < min_match and \
+               success_message[:peek_len] in peek_ahead:
+                continue
+
+            # if we see the whole success_message we are done, consume
+            # it and pass back so we can exit to the user
+            if success_message in peek_ahead:
+                return console.read(peek_len).decode()
+
+            # of course if we've seen enough then this line probably
+            # doesn't contain what we are looking for, fallback
+            if peek_len > min_match:
+                return None
+
+        except UnicodeDecodeError:
+            console_logger.log("error in decode of peek")
+            return None
+
+    # we should never get here
+    return None
+
 
 def _console_interaction(test, success_message, failure_message,
                          send_string, keep_sending=False, vm=None):
@@ -139,17 +191,52 @@  def _console_interaction(test, success_message, failure_message,
         vm = test.vm
     console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
     console_logger = logging.getLogger('console')
+
+    # Establish the minimum number of bytes we would need to
+    # potentially match against success_message
+    if success_message is not None:
+        min_match = len(success_message)
+    else:
+        min_match = 0
+
+    console_logger.debug("looking for %d:(%s), sending %s (always=%s)",
+                         min_match, success_message, send_string, keep_sending)
+
     while True:
+        msg = None
+
+        # First send our string, optionally repeating the send next
+        # cycle.
         if send_string:
             vm.console_socket.sendall(send_string.encode())
             if not keep_sending:
                 send_string = None # send only once
+
+        # If the console has no data to read we briefly
+        # sleep before continuing.
+        if not console.readable():
+            time.sleep(0.1)
+            continue
+
         try:
-            msg = console.readline().decode().strip()
+
+            # First we shall peek ahead for a potential match to cover waiting
+            # for lines without any newlines.
+            if min_match > 0:
+                msg = _peek_ahead(console, min_match, success_message)
+
+            # otherwise we block here for a full line
+            if not msg:
+                msg = console.readline().decode().strip()
+
         except UnicodeDecodeError:
+            console_logger.debug("skipped unicode error")
             msg = None
+
+        # if nothing came out we continue and try again
         if not msg:
             continue
+
         console_logger.debug(msg)
         if success_message is None or success_message in msg:
             break