[2/2] pty04: Retry reads when short

Message ID 20200512142824.13063-2-rpalethorpe@suse.com
State Superseded
Series [1/2] pty04: Remove unnecessary volatile and style fix

Commit Message

Richard Palethorpe May 12, 2020, 2:28 p.m. UTC
Even though reads are blocking and packets are flipped into the netdevice
buffer whole, it seems read may return before a full packet is read into user
land. Retrying read should prevent timeouts and read failures on some
machines.

Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Cc: Petr Vorel <pvorel@suse.cz>
---

NOTE: When running this test repeatedly and in parallel, strange things still
happen; see:
https://github.com/linux-test-project/ltp/issues/674#issuecomment-625181783

However, this will hopefully fix the timeout issues and read failures reported
by some testers.

 testcases/kernel/pty/pty04.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

Comments

Cyril Hrubis May 12, 2020, 3:44 p.m. UTC | #1
Hi!
> +static ssize_t try_read(int fd, char *data, ssize_t size, ssize_t *n)
> +{
> +	ssize_t ret = read(fd, data, size);
> +
> +	if (ret < 0)
> +		return -(errno != EAGAIN);
> +
> +	return (*n += ret) >= size;
> +}

I had to read this piece twice, but I think it's correct.
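
For anyone else reading it twice, the return convention can be unpacked as
below. This is only a sketch: demo_try_read() is a hypothetical helper that
is not in pty04.c, and a pipe stands in for the test's socket. try_read()
itself is copied from the patch. As far as I understand TST_RETRY_FUNC with
TST_RETVAL_NOTNULL, a return of 0 means "call again", while 1 and -1 both
end the retry loop.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Same body as try_read() in the quoted patch. */
static ssize_t try_read(int fd, char *data, ssize_t size, ssize_t *n)
{
	ssize_t ret = read(fd, data, size);

	if (ret < 0)
		return -(errno != EAGAIN);	/* EAGAIN: 0, other errors: -1 */

	return (*n += ret) >= size;		/* 1 once size bytes are counted */
}

/*
 * Hypothetical demo helper, not part of pty04.c. Feeds a pipe in two
 * halves and stores what try_read() returns for a short read, for the
 * read that completes the count, and for a read on an invalid fd.
 * Returns 0 on success, -1 if the pipe could not be set up or written.
 */
int demo_try_read(ssize_t res[3], char buf[8])
{
	int fds[2];
	int rc = -1;
	ssize_t n = 0;

	if (pipe(fds) < 0)
		return -1;

	if (write(fds[1], "abcd", 4) != 4)
		goto out;
	res[0] = try_read(fds[0], buf, 8, &n);	/* short read, counter is 4 */

	if (write(fds[1], "efgh", 4) != 4)
		goto out;
	res[1] = try_read(fds[0], buf, 8, &n);	/* counter reaches 8 */
	assert(n == 8);

	res[2] = try_read(-1, buf, 8, &n);	/* invalid fd: EBADF */
	rc = 0;
out:
	close(fds[0]);
	close(fds[1]);
	return rc;
}
```

One thing the sketch makes visible: every call reads into the start of data
with the full size, so in this pipe demo the second chunk lands on top of the
first instead of after it. Whether that can matter for the socket used in
pty04 is not something this sketch establishes.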

>  static void read_netdev(const struct ldisc_info *ldisc)
>  {
> -	int rlen, plen = 0;
> +	int ret, rlen, plen = 0;
> +	ssize_t n;
>  	char *data;
>  
>  	switch (ldisc->n) {
> @@ -305,20 +316,27 @@ static void read_netdev(const struct ldisc_info *ldisc)
>  
>  	tst_res(TINFO, "Reading from socket %d", sk);
>  
> -	SAFE_READ(1, sk, data, plen);
> +	n = 0;
> +	ret = TST_RETRY_FUNC(try_read(sk, data, plen, &n), TST_RETVAL_NOTNULL);
> +	if (ret < 0)
> +		tst_brk(TBROK | TERRNO, "Read %zd of %d bytes", n, plen);

I wonder if a simple loop without exponential backoff would suffice
here. At least the code would probably be more readable.
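
A minimal sketch of what such a loop might look like, under some assumptions:
read_full() is a hypothetical name, not an existing LTP helper, and the loop
treats the descriptor as a blocking byte stream. Whether that matches the
behaviour of the socket in pty04 is not settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch or of the LTP library.
 * Keeps calling read() until size bytes are stored in data, or until
 * EOF or a hard error. Each read continues at offset n, so the data
 * from several short reads ends up contiguous.
 *
 * Assumes fd is blocking: on a non-blocking fd the EAGAIN branch
 * would spin.
 *
 * Returns the number of bytes stored (less than size on EOF), or -1
 * with errno set.
 */
ssize_t read_full(int fd, char *data, size_t size)
{
	size_t n = 0;

	assert(data != NULL);

	while (n < size) {
		ssize_t ret = read(fd, data + n, size - n);

		if (ret < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;	/* transient, try again */
			return -1;
		}

		if (ret == 0)
			break;			/* EOF: report what we have */

		n += (size_t)ret;
	}

	return (ssize_t)n;
}
```

A caller in the test could then compare the return value against plen and
report the mismatch through tst_brk(), much as the patch does with n.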

>  	check_data(ldisc, data, plen);
>  	tst_res(TPASS, "Read netdev 1");
>  
> -	SAFE_READ(1, sk, data, plen);
> +	n = 0;
> +	ret = TST_RETRY_FUNC(try_read(sk, data, plen, &n), TST_RETVAL_NOTNULL);
> +	if (ret < 0)
> +		tst_brk(TBROK | TERRNO, "Read %zd of %d bytes", n, plen);
>  	check_data(ldisc, data, plen);
>  	tst_res(TPASS, "Read netdev 2");
>  
>  	TST_CHECKPOINT_WAKE(0);
> -	while((rlen = read(sk, data, plen)) > 0)
> +	while ((rlen = read(sk, data, plen)) > 0)
>  		check_data(ldisc, data, rlen);

This should have been part of the previous cleanup patch.

>  	tst_res(TPASS, "Reading data from netdev interrupted by hangup");
>  
> +	close(sk);
>  	tst_free_all();
>  }
>  
> @@ -357,6 +375,7 @@ static void cleanup(void)
>  {
>  	ioctl(pts, TIOCVHANGUP);
>  	ioctl(ptmx, TIOCVHANGUP);
> +	close(sk);
>  
>  	tst_reap_children();
>  }
> -- 
> 2.26.1
Petr Vorel May 12, 2020, 3:49 p.m. UTC | #2
Hi Richard,

> Even though reads are blocking and packets are flipped into the netdevice
> buffer whole, it seems read may return before a full packet is read into user
> land. Retrying read should prevent timeouts and read failures on some
> machines.

> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
> Reported-by: Jan Stancek <jstancek@redhat.com>
> Cc: Petr Vorel <pvorel@suse.cz>

Reviewed-by: Petr Vorel <pvorel@suse.cz>

Thanks for taking care of this.
It's still possible to reproduce timeout just with higher -i (-i10),
but it's an improvement.

Kind regards,
Petr
Jan Stancek May 12, 2020, 7:44 p.m. UTC | #3
----- Original Message -----
> Hi Richard,
> 
> > Even though reads are blocking and packets are flipped into the netdevice
> > buffer whole, it seems read may return before a full packet is read into
> > user
> > land. Retrying read should prevent timeouts and read failures on some
> > machines.
> 
> > Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
> > Reported-by: Jan Stancek <jstancek@redhat.com>
> > Cc: Petr Vorel <pvorel@suse.cz>
> 
> Reviewed-by: Petr Vorel <pvorel@suse.cz>
> 
> Thanks for taking care of this.

ACK, I'll update github issue when this runs for a while with recent upstream kernels.

> It's still possible to reproduce timeout just with higher -i (-i10),
> but it's an improvement.
> 
> Kind regards,
> Petr

Patch

diff --git a/testcases/kernel/pty/pty04.c b/testcases/kernel/pty/pty04.c
index bfda08b2b..9b2911421 100644
--- a/testcases/kernel/pty/pty04.c
+++ b/testcases/kernel/pty/pty04.c
@@ -288,9 +288,20 @@  static void check_data(const struct ldisc_info *ldisc,
 		tst_res(TINFO, "Will continue test without data checking");
 }
 
+static ssize_t try_read(int fd, char *data, ssize_t size, ssize_t *n)
+{
+	ssize_t ret = read(fd, data, size);
+
+	if (ret < 0)
+		return -(errno != EAGAIN);
+
+	return (*n += ret) >= size;
+}
+
 static void read_netdev(const struct ldisc_info *ldisc)
 {
-	int rlen, plen = 0;
+	int ret, rlen, plen = 0;
+	ssize_t n;
 	char *data;
 
 	switch (ldisc->n) {
@@ -305,20 +316,27 @@  static void read_netdev(const struct ldisc_info *ldisc)
 
 	tst_res(TINFO, "Reading from socket %d", sk);
 
-	SAFE_READ(1, sk, data, plen);
+	n = 0;
+	ret = TST_RETRY_FUNC(try_read(sk, data, plen, &n), TST_RETVAL_NOTNULL);
+	if (ret < 0)
+		tst_brk(TBROK | TERRNO, "Read %zd of %d bytes", n, plen);
 	check_data(ldisc, data, plen);
 	tst_res(TPASS, "Read netdev 1");
 
-	SAFE_READ(1, sk, data, plen);
+	n = 0;
+	ret = TST_RETRY_FUNC(try_read(sk, data, plen, &n), TST_RETVAL_NOTNULL);
+	if (ret < 0)
+		tst_brk(TBROK | TERRNO, "Read %zd of %d bytes", n, plen);
 	check_data(ldisc, data, plen);
 	tst_res(TPASS, "Read netdev 2");
 
 	TST_CHECKPOINT_WAKE(0);
-	while((rlen = read(sk, data, plen)) > 0)
+	while ((rlen = read(sk, data, plen)) > 0)
 		check_data(ldisc, data, rlen);
 
 	tst_res(TPASS, "Reading data from netdev interrupted by hangup");
 
+	close(sk);
 	tst_free_all();
 }
 
@@ -357,6 +375,7 @@  static void cleanup(void)
 {
 	ioctl(pts, TIOCVHANGUP);
 	ioctl(ptmx, TIOCVHANGUP);
+	close(sk);
 
 	tst_reap_children();
 }