diff mbox series

syscalls/acct02: Check read size.

Message ID 20190925094721.18932-1-chrubis@suse.cz
State Accepted
Headers show
Series syscalls/acct02: Check read size. | expand

Commit Message

Cyril Hrubis Sept. 25, 2019, 9:47 a.m. UTC
This is just a minor adjustements that checks that we get correct size
from read before we attempt to validate the data we read in order to
print more meaningful failure message.

Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
---
 testcases/kernel/syscalls/acct/acct02.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Petr Vorel Sept. 25, 2019, 1:56 p.m. UTC | #1
Hi Cyril,

> This is just a minor adjustements that checks that we get correct size
> from read before we attempt to validate the data we read in order to
> print more meaningful failure message.

> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
Reviewed-by: Petr Vorel <pvorel@suse.cz>

>  testcases/kernel/syscalls/acct/acct02.c | 16 ++++++++++++++--
...
>  	do {
> -		tst_res(TINFO, "== entry %d ==", ++i);
> -
>  		read_bytes = SAFE_READ(0, fd, &acct_struct, acct_size);

> +		if (i == 0 && read_bytes == 0) {
> +			tst_res(TFAIL, "acct file is empty");
> +			goto exit;
> +		}
> +
> +		if (read_bytes != acct_size) {
> +			tst_res(TFAIL, "incomplete read %i bytes, expected %i",
> +			        read_bytes, acct_size);
> +			goto exit;
> +		}
> +
> +		tst_res(TINFO, "== entry %d ==", ++i);
> +
>  		if (v3)
>  			ret = verify_acct(&acct_struct.v3, acct_struct.v3.ac_etime);
>  		else
> @@ -192,6 +203,7 @@ static void run(void)
>  	else
>  		tst_res(TPASS, "acct() wrote correct file contents!");

> +exit:
>  	SAFE_CLOSE(fd);
>  }

OK, on s390 where it's failing it actually catches "incomplete read":

tst_kconfig.c:62: INFO: Parsing kernel config '/proc/config.gz'
tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
tst_kconfig.c:62: INFO: Parsing kernel config '/proc/config.gz'
acct02.c:234: INFO: Verifying using 'struct acct_v3'
acct02.c:187: INFO: == entry 1 ==
acct02.c:147: INFO: ac_version != 3 (131)
acct02.c:187: INFO: == entry 2 ==
acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
acct02.c:133: INFO: ac_exitcode != 32768 (0)
acct02.c:141: INFO: ac_ppid != 54783 (54781)
acct02.c:147: INFO: ac_version != 3 (131)
acct02.c:183: FAIL: incomplete read 0 bytes, expected 64

Kind regards,
Petr
Cyril Hrubis Sept. 25, 2019, 4:14 p.m. UTC | #2
Hi!
> >  	do {
> > -		tst_res(TINFO, "== entry %d ==", ++i);
> > -
> >  		read_bytes = SAFE_READ(0, fd, &acct_struct, acct_size);
> 
> > +		if (i == 0 && read_bytes == 0) {
> > +			tst_res(TFAIL, "acct file is empty");
> > +			goto exit;
> > +		}
> > +
> > +		if (read_bytes != acct_size) {
> > +			tst_res(TFAIL, "incomplete read %i bytes, expected %i",
> > +			        read_bytes, acct_size);
> > +			goto exit;
> > +		}
> > +
> > +		tst_res(TINFO, "== entry %d ==", ++i);
> > +
> >  		if (v3)
> >  			ret = verify_acct(&acct_struct.v3, acct_struct.v3.ac_etime);
> >  		else
> > @@ -192,6 +203,7 @@ static void run(void)
> >  	else
> >  		tst_res(TPASS, "acct() wrote correct file contents!");
> 
> > +exit:
> >  	SAFE_CLOSE(fd);
> >  }
> 
> OK, on s390 where it's failing it actually catches "incomplete read":
> 
> tst_kconfig.c:62: INFO: Parsing kernel config '/proc/config.gz'
> tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
> tst_kconfig.c:62: INFO: Parsing kernel config '/proc/config.gz'
> acct02.c:234: INFO: Verifying using 'struct acct_v3'
> acct02.c:187: INFO: == entry 1 ==
> acct02.c:147: INFO: ac_version != 3 (131)
> acct02.c:187: INFO: == entry 2 ==
> acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> acct02.c:133: INFO: ac_exitcode != 32768 (0)
> acct02.c:141: INFO: ac_ppid != 54783 (54781)
> acct02.c:147: INFO: ac_version != 3 (131)
> acct02.c:183: FAIL: incomplete read 0 bytes, expected 64

Isn't that just end-of-file? I guess that we should just break the loop
if i != 0 and read returned 0.
Cyril Hrubis Sept. 27, 2019, 11:38 a.m. UTC | #3
Hi!
I've just pushed this patch + one more special case for read_bytes == 0,
can you please give it a try and check that at least the test works
where it was working previously?
Petr Vorel Sept. 27, 2019, 12:48 p.m. UTC | #4
Hi Cyril,

> Hi!
> I've just pushed this patch + one more special case for read_bytes == 0,
> can you please give it a try and check that at least the test works
> where it was working previously?

Yep, confirm that it's working as expected :).

Kind regards,
Petr
Jan Stancek Oct. 30, 2019, 9:06 a.m. UTC | #5
----- Original Message -----
> 
> OK, on s390 where it's failing it actually catches "incomplete read":
> 
> tst_kconfig.c:62: INFO: Parsing kernel config '/proc/config.gz'
> tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
> tst_kconfig.c:62: INFO: Parsing kernel config '/proc/config.gz'
> acct02.c:234: INFO: Verifying using 'struct acct_v3'
> acct02.c:187: INFO: == entry 1 ==
> acct02.c:147: INFO: ac_version != 3 (131)
> acct02.c:187: INFO: == entry 2 ==
> acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> acct02.c:133: INFO: ac_exitcode != 32768 (0)
> acct02.c:141: INFO: ac_ppid != 54783 (54781)
> acct02.c:147: INFO: ac_version != 3 (131)
> acct02.c:183: FAIL: incomplete read 0 bytes, expected 64

I'm seeing similar failure:

tst_kconfig.c:62: INFO: Parsing kernel config '/boot/config-3.10.0-1106.el7.cki.s390x'
tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
tst_kconfig.c:62: INFO: Parsing kernel config '/boot/config-3.10.0-1106.el7.cki.s390x'
acct02.c:239: INFO: Verifying using 'struct acct_v3'
acct02.c:192: INFO: == entry 1 ==
acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)
acct02.c:192: INFO: == entry 2 ==
acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)
acct02.c:133: INFO: ac_exitcode != 32768 (0)
acct02.c:141: INFO: ac_ppid != 25608 (25607)
acct02.c:182: FAIL: end of file reached

but I don't get what process is this record from.
I see 3 pids: library, main and 'acct02_helper'. What 'acct02' process
terminated?
Jan Stancek Oct. 30, 2019, 1:03 p.m. UTC | #6
----- Original Message -----
> 
> 
> ----- Original Message -----
> > 
> > OK, on s390 where it's failing it actually catches "incomplete read":
> > 
> > tst_kconfig.c:62: INFO: Parsing kernel config '/proc/config.gz'
> > tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
> > tst_kconfig.c:62: INFO: Parsing kernel config '/proc/config.gz'
> > acct02.c:234: INFO: Verifying using 'struct acct_v3'
> > acct02.c:187: INFO: == entry 1 ==
> > acct02.c:147: INFO: ac_version != 3 (131)
> > acct02.c:187: INFO: == entry 2 ==
> > acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> > acct02.c:133: INFO: ac_exitcode != 32768 (0)
> > acct02.c:141: INFO: ac_ppid != 54783 (54781)
> > acct02.c:147: INFO: ac_version != 3 (131)
> > acct02.c:183: FAIL: incomplete read 0 bytes, expected 64
> 
> I'm seeing similar failure:
> 
> tst_kconfig.c:62: INFO: Parsing kernel config
> '/boot/config-3.10.0-1106.el7.cki.s390x'
> tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
> tst_kconfig.c:62: INFO: Parsing kernel config
> '/boot/config-3.10.0-1106.el7.cki.s390x'
> acct02.c:239: INFO: Verifying using 'struct acct_v3'
> acct02.c:192: INFO: == entry 1 ==
> acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)

^^ This is 'acct02_helper', which seemingly started before 'start_time'.

ac_btime is calculated back from current time and elapsed time at fill_ac():

  ac->ac_btime = get_seconds() - elapsed;

s390s are slower, so my guess is another second ticks before btime gets calculated,
and it appears to have started later.

> acct02.c:192: INFO: == entry 2 ==
> acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)
> acct02.c:133: INFO: ac_exitcode != 32768 (0)
> acct02.c:141: INFO: ac_ppid != 25608 (25607)

The 2nd record is filled at acct(NULL), even though process still runs:

[ 1156.025654] CPU: 0 PID: 11772 Comm: acct02_helper Tainted: G           OE  ------------   3.10.0-1106.el7.s390x #1
<>
[ 1156.027105]  [<00000000001dd630>] fill_ac+0x0/0x440
[ 1156.027108] ([<00000000001ddae8>] do_acct_process+0x78/0x1a0)
[ 1156.027110]  [<00000000001de3ae>] acct_process+0xf6/0x140
[ 1156.027112]  [<000000000014de32>] do_exit+0x752/0xa38
[ 1156.027116]  [<000000000014e216>] do_group_exit+0x66/0xf8
[ 1156.027118]  [<000000000014e2ea>] SyS_exit_group+0x42/0x48

[ 1157.033294] CPU: 1 PID: 11771 Comm: acct02 Tainted: G           OE  ------------   3.10.0-1106.el7.s390x #1
<>
[ 1157.033384]  [<00000000001dd630>] fill_ac+0x0/0x440
[ 1157.033386] ([<00000000001ddae8>] do_acct_process+0x78/0x1a0)
[ 1157.033389]  [<00000000001ddcda>] acct_pin_kill+0x52/0xa8
[ 1157.033391]  [<000000000034a83e>] pin_kill+0xb6/0x178
[ 1157.033395]  [<00000000001ddfe8>] SyS_acct+0x2b8/0x350
Petr Vorel Oct. 30, 2019, 2:46 p.m. UTC | #7
Hi Jan,

> > I'm seeing similar failure:

> > tst_kconfig.c:62: INFO: Parsing kernel config
> > '/boot/config-3.10.0-1106.el7.cki.s390x'
> > tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
> > tst_kconfig.c:62: INFO: Parsing kernel config
> > '/boot/config-3.10.0-1106.el7.cki.s390x'
> > acct02.c:239: INFO: Verifying using 'struct acct_v3'
> > acct02.c:192: INFO: == entry 1 ==
> > acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)

> ^^ This is 'acct02_helper', which seemingly started before 'start_time'.

> ac_btime is calculated back from current time and elapsed time at fill_ac():

>   ac->ac_btime = get_seconds() - elapsed;

> s390s are slower, so my guess is another second ticks before btime gets calculated,
> and it appears to have started later.
Agree.

> > acct02.c:192: INFO: == entry 2 ==
> > acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> > acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)
> > acct02.c:133: INFO: ac_exitcode != 32768 (0)
> > acct02.c:141: INFO: ac_ppid != 25608 (25607)

> The 2nd record is filled at acct(NULL), even though process still runs:

> [ 1156.025654] CPU: 0 PID: 11772 Comm: acct02_helper Tainted: G           OE  ------------   3.10.0-1106.el7.s390x #1
> <>
> [ 1156.027105]  [<00000000001dd630>] fill_ac+0x0/0x440
> [ 1156.027108] ([<00000000001ddae8>] do_acct_process+0x78/0x1a0)
> [ 1156.027110]  [<00000000001de3ae>] acct_process+0xf6/0x140
> [ 1156.027112]  [<000000000014de32>] do_exit+0x752/0xa38
> [ 1156.027116]  [<000000000014e216>] do_group_exit+0x66/0xf8
> [ 1156.027118]  [<000000000014e2ea>] SyS_exit_group+0x42/0x48

> [ 1157.033294] CPU: 1 PID: 11771 Comm: acct02 Tainted: G           OE  ------------   3.10.0-1106.el7.s390x #1
> <>
> [ 1157.033384]  [<00000000001dd630>] fill_ac+0x0/0x440
> [ 1157.033386] ([<00000000001ddae8>] do_acct_process+0x78/0x1a0)
> [ 1157.033389]  [<00000000001ddcda>] acct_pin_kill+0x52/0xa8
> [ 1157.033391]  [<000000000034a83e>] pin_kill+0xb6/0x178
> [ 1157.033395]  [<00000000001ddfe8>] SyS_acct+0x2b8/0x350

Hm, did you reproduce it on 4.18 (rhel8)?

Kind regards,
Petr
Jan Stancek Oct. 30, 2019, 5:41 p.m. UTC | #8
----- Original Message -----
> Hi Jan,
> 
> > > I'm seeing similar failure:
> 
> > > tst_kconfig.c:62: INFO: Parsing kernel config
> > > '/boot/config-3.10.0-1106.el7.cki.s390x'
> > > tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
> > > tst_kconfig.c:62: INFO: Parsing kernel config
> > > '/boot/config-3.10.0-1106.el7.cki.s390x'
> > > acct02.c:239: INFO: Verifying using 'struct acct_v3'
> > > acct02.c:192: INFO: == entry 1 ==
> > > acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)
> 
> > ^^ This is 'acct02_helper', which seemingly started before 'start_time'.
> 
> > ac_btime is calculated back from current time and elapsed time at
> > fill_ac():
> 
> >   ac->ac_btime = get_seconds() - elapsed;
> 
> > s390s are slower, so my guess is another second ticks before btime gets
> > calculated,
> > and it appears to have started later.
> Agree.
> 
> > > acct02.c:192: INFO: == entry 2 ==
> > > acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> > > acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)
> > > acct02.c:133: INFO: ac_exitcode != 32768 (0)
> > > acct02.c:141: INFO: ac_ppid != 25608 (25607)
> 
> > The 2nd record is filled at acct(NULL), even though process still runs:
> 
> > [ 1156.025654] CPU: 0 PID: 11772 Comm: acct02_helper Tainted: G
> > OE  ------------   3.10.0-1106.el7.s390x #1
> > <>
> > [ 1156.027105]  [<00000000001dd630>] fill_ac+0x0/0x440
> > [ 1156.027108] ([<00000000001ddae8>] do_acct_process+0x78/0x1a0)
> > [ 1156.027110]  [<00000000001de3ae>] acct_process+0xf6/0x140
> > [ 1156.027112]  [<000000000014de32>] do_exit+0x752/0xa38
> > [ 1156.027116]  [<000000000014e216>] do_group_exit+0x66/0xf8
> > [ 1156.027118]  [<000000000014e2ea>] SyS_exit_group+0x42/0x48
> 
> > [ 1157.033294] CPU: 1 PID: 11771 Comm: acct02 Tainted: G           OE
> > ------------   3.10.0-1106.el7.s390x #1
> > <>
> > [ 1157.033384]  [<00000000001dd630>] fill_ac+0x0/0x440
> > [ 1157.033386] ([<00000000001ddae8>] do_acct_process+0x78/0x1a0)
> > [ 1157.033389]  [<00000000001ddcda>] acct_pin_kill+0x52/0xa8
> > [ 1157.033391]  [<000000000034a83e>] pin_kill+0xb6/0x178
> > [ 1157.033395]  [<00000000001ddfe8>] SyS_acct+0x2b8/0x350
> 
> Hm, did you reproduce it on 4.18 (rhel8)?

Above was RHEL7. Do you expect different outcome on RHEL8?
I was looking at upstream sources too and it looked similar.

Anyway, I don't think this part is an issue, test will skip it
because "comm" doesn't match.

We can tweak 'ac_btime' condition, but I think the test will remain
race-y in environments with higher steal time:

acct02.c:239: INFO: Verifying using 'struct acct_v3'
acct02.c:192: INFO: == entry 1 ==
acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
acct02.c:192: INFO: == entry 2 ==
acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
acct02.c:133: INFO: ac_exitcode != 32768 (0)
acct02.c:141: INFO: ac_ppid != 43213 (43212)
acct02.c:182: FAIL: end of file reached
Li Wang Oct. 31, 2019, 8:43 a.m. UTC | #9
On Thu, Oct 31, 2019 at 1:41 AM Jan Stancek <jstancek@redhat.com> wrote:

>
>
> ----- Original Message -----
> > Hi Jan,
> >
> > > > I'm seeing similar failure:
> >
> > > > tst_kconfig.c:62: INFO: Parsing kernel config
> > > > '/boot/config-3.10.0-1106.el7.cki.s390x'
> > > > tst_test.c:1118: INFO: Timeout per run is 0h 05m 00s
> > > > tst_kconfig.c:62: INFO: Parsing kernel config
> > > > '/boot/config-3.10.0-1106.el7.cki.s390x'
> > > > acct02.c:239: INFO: Verifying using 'struct acct_v3'
> > > > acct02.c:192: INFO: == entry 1 ==
> > > > acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)
> >
> > > ^^ This is 'acct02_helper', which seemingly started before
> 'start_time'.
> >
> > > ac_btime is calculated back from current time and elapsed time at
> > > fill_ac():
> >
> > >   ac->ac_btime = get_seconds() - elapsed;
> >
> > > s390s are slower, so my guess is another second ticks before btime gets
> > > calculated,
> > > and it appears to have started later.
> > Agree.
> >
> > > > acct02.c:192: INFO: == entry 2 ==
> > > > acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> > > > acct02.c:88: INFO: ac_btime < 1572367954 (1572367953)
> > > > acct02.c:133: INFO: ac_exitcode != 32768 (0)
> > > > acct02.c:141: INFO: ac_ppid != 25608 (25607)
> >
> > > The 2nd record is filled at acct(NULL), even though process still runs:
> >
> > > [ 1156.025654] CPU: 0 PID: 11772 Comm: acct02_helper Tainted: G
> > > OE  ------------   3.10.0-1106.el7.s390x #1
> > > <>
> > > [ 1156.027105]  [<00000000001dd630>] fill_ac+0x0/0x440
> > > [ 1156.027108] ([<00000000001ddae8>] do_acct_process+0x78/0x1a0)
> > > [ 1156.027110]  [<00000000001de3ae>] acct_process+0xf6/0x140
> > > [ 1156.027112]  [<000000000014de32>] do_exit+0x752/0xa38
> > > [ 1156.027116]  [<000000000014e216>] do_group_exit+0x66/0xf8
> > > [ 1156.027118]  [<000000000014e2ea>] SyS_exit_group+0x42/0x48
> >
> > > [ 1157.033294] CPU: 1 PID: 11771 Comm: acct02 Tainted: G           OE
> > > ------------   3.10.0-1106.el7.s390x #1
> > > <>
> > > [ 1157.033384]  [<00000000001dd630>] fill_ac+0x0/0x440
> > > [ 1157.033386] ([<00000000001ddae8>] do_acct_process+0x78/0x1a0)
> > > [ 1157.033389]  [<00000000001ddcda>] acct_pin_kill+0x52/0xa8
> > > [ 1157.033391]  [<000000000034a83e>] pin_kill+0xb6/0x178
> > > [ 1157.033395]  [<00000000001ddfe8>] SyS_acct+0x2b8/0x350
> >
> > Hm, did you reproduce it on 4.18 (rhel8)?
>
> Above was RHEL7. Do you expect different outcome on RHEL8?
> I was looking at upstream sources too and it looked similar.
>
> Anyway, I don't think this part is an issue, test will skip it
> because "comm" doesn't match.
>
> We can tweak 'ac_btime' condition, but I think the test will remain
>

How much time of the ac_btime shaking to be tolerated is proper?

> race-y in environments with higher steal time:
>

I'm sorry, what does it mean here?



>
> acct02.c:239: INFO: Verifying using 'struct acct_v3'
> acct02.c:192: INFO: == entry 1 ==
> acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
> acct02.c:192: INFO: == entry 2 ==
> acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
> acct02.c:133: INFO: ac_exitcode != 32768 (0)
> acct02.c:141: INFO: ac_ppid != 43213 (43212)
> acct02.c:182: FAIL: end of file reached
>
>
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp
>
Jan Stancek Oct. 31, 2019, 8:53 a.m. UTC | #10
----- Original Message -----
> > Above was RHEL7. Do you expect different outcome on RHEL8?
> > I was looking at upstream sources too and it looked similar.
> >
> > Anyway, I don't think this part is an issue, test will skip it
> > because "comm" doesn't match.
> >
> > We can tweak 'ac_btime' condition, but I think the test will remain
> >
> 
> How much time of the ac_btime shaking to be tolerated is proper?

I wanted to go with 1 sec:

@@ -83,7 +83,7 @@ static int verify_acct(void *acc, int elap_time)
                ret = 1;
        }
 
-       if (ACCT_MEMBER(ac_btime) < start_time) {
+       if (ACCT_MEMBER(ac_btime) + 1 < start_time) {
                tst_res(TINFO, "ac_btime < %d (%d)", start_time,
                        ACCT_MEMBER(ac_btime));
                ret = 1;

> 
> > race-y in environments with higher steal time:
> >
> 
> I'm sorry, what does it mean here?

I was referring to failed check below. s390 is scarce resource
and when over-committed, sleeps are less consistent.
E.g. sleep(1) taking 2+ seconds:
  nanosleep({1, 0}, NULL) = 0 <1.926617>

> 
> 
> 
> >
> > acct02.c:239: INFO: Verifying using 'struct acct_v3'
> > acct02.c:192: INFO: == entry 1 ==
> > acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
> > acct02.c:192: INFO: == entry 2 ==
> > acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> > acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
> > acct02.c:133: INFO: ac_exitcode != 32768 (0)
> > acct02.c:141: INFO: ac_ppid != 43213 (43212)
> > acct02.c:182: FAIL: end of file reached
Petr Vorel Oct. 31, 2019, 9:06 a.m. UTC | #11
Hi Jan,

> > Hm, did you reproduce it on 4.18 (rhel8)?

> Above was RHEL7. Do you expect different outcome on RHEL8?
> I was looking at upstream sources too and it looked similar.
I know it was RHEL7. Good it's not affected by old kernel code,
but rather by overloaded host. Agree, that some tolerance is needed.

> Anyway, I don't think this part is an issue, test will skip it
> because "comm" doesn't match.

> We can tweak 'ac_btime' condition, but I think the test will remain
> race-y in environments with higher steal time:

> acct02.c:239: INFO: Verifying using 'struct acct_v3'
> acct02.c:192: INFO: == entry 1 ==
> acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
> acct02.c:192: INFO: == entry 2 ==
> acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
> acct02.c:133: INFO: ac_exitcode != 32768 (0)
> acct02.c:141: INFO: ac_ppid != 43213 (43212)
> acct02.c:182: FAIL: end of file reached

Kind regards,
Petr
Petr Vorel Oct. 31, 2019, 9:09 a.m. UTC | #12
Hi Jan,

> > How much time of the ac_btime shaking to be tolerated is proper?

> I wanted to go with 1 sec:

> @@ -83,7 +83,7 @@ static int verify_acct(void *acc, int elap_time)
>                 ret = 1;
>         }

> -       if (ACCT_MEMBER(ac_btime) < start_time) {
> +       if (ACCT_MEMBER(ac_btime) + 1 < start_time) {
>                 tst_res(TINFO, "ac_btime < %d (%d)", start_time,
>                         ACCT_MEMBER(ac_btime));
>                 ret = 1;

Ack.

> > > race-y in environments with higher steal time:


> > I'm sorry, what does it mean here?

> I was referring to failed check below. s390 is scarce resource
> and when over-committed, sleeps are less consistent.
> E.g. sleep(1) taking 2+ seconds:
>   nanosleep({1, 0}, NULL) = 0 <1.926617>

Lol :)

Kind regards,
Petr
Li Wang Nov. 1, 2019, 6:03 a.m. UTC | #13
On Thu, Oct 31, 2019 at 4:53 PM Jan Stancek <jstancek@redhat.com> wrote:

>
> ----- Original Message -----
> > > Above was RHEL7. Do you expect different outcome on RHEL8?
> > > I was looking at upstream sources too and it looked similar.
> > >
> > > Anyway, I don't think this part is an issue, test will skip it
> > > because "comm" doesn't match.
> > >
> > > We can tweak 'ac_btime' condition, but I think the test will remain
> > >
> >
> > How much time of the ac_btime shaking to be tolerated is proper?
>
> I wanted to go with 1 sec:
>

I'm ok with that. Seems no better way so far.


>
> @@ -83,7 +83,7 @@ static int verify_acct(void *acc, int elap_time)
>                 ret = 1;
>         }
>
> -       if (ACCT_MEMBER(ac_btime) < start_time) {
> +       if (ACCT_MEMBER(ac_btime) + 1 < start_time) {
>                 tst_res(TINFO, "ac_btime < %d (%d)", start_time,
>                         ACCT_MEMBER(ac_btime));
>                 ret = 1;
>
> >
> > > race-y in environments with higher steal time:
> > >
> >
> > I'm sorry, what does it mean here?
>
> I was referring to failed check below. s390 is scarce resource
> and when over-committed, sleeps are less consistent.
> E.g. sleep(1) taking 2+ seconds:
>   nanosleep({1, 0}, NULL) = 0 <1.926617>
>

Okay.

>
> >
> >
> >
> > >
> > > acct02.c:239: INFO: Verifying using 'struct acct_v3'
> > > acct02.c:192: INFO: == entry 1 ==
> > > acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
> > > acct02.c:192: INFO: == entry 2 ==
> > > acct02.c:82: INFO: ac_comm != 'acct02_helper' ('acct02')
> > > acct02.c:127: INFO: elap_time/clock_ticks >= 2 (236/100: 2.00)
> > > acct02.c:133: INFO: ac_exitcode != 32768 (0)
> > > acct02.c:141: INFO: ac_ppid != 43213 (43212)
> > > acct02.c:182: FAIL: end of file reached
>
Jan Stancek Nov. 1, 2019, 8:54 a.m. UTC | #14
----- Original Message -----
> On Thu, Oct 31, 2019 at 4:53 PM Jan Stancek <jstancek@redhat.com> wrote:
> 
> >
> > ----- Original Message -----
> > > > Above was RHEL7. Do you expect different outcome on RHEL8?
> > > > I was looking at upstream sources too and it looked similar.
> > > >
> > > > Anyway, I don't think this part is an issue, test will skip it
> > > > because "comm" doesn't match.
> > > >
> > > > We can tweak 'ac_btime' condition, but I think the test will remain
> > > >
> > >
> > > How much time of the ac_btime shaking to be tolerated is proper?
> >
> > I wanted to go with 1 sec:
> >
> 
> I'm ok with that. Seems no better way so far.

I want to look at this a bit more, and add some extra traces to kernel.
diff mbox series

Patch

diff --git a/testcases/kernel/syscalls/acct/acct02.c b/testcases/kernel/syscalls/acct/acct02.c
index 110d99359..bdb7c1194 100644
--- a/testcases/kernel/syscalls/acct/acct02.c
+++ b/testcases/kernel/syscalls/acct/acct02.c
@@ -171,10 +171,21 @@  static void run(void)
 	acct(NULL);
 
 	do {
-		tst_res(TINFO, "== entry %d ==", ++i);
-
 		read_bytes = SAFE_READ(0, fd, &acct_struct, acct_size);
 
+		if (i == 0 && read_bytes == 0) {
+			tst_res(TFAIL, "acct file is empty");
+			goto exit;
+		}
+
+		if (read_bytes != acct_size) {
+			tst_res(TFAIL, "incomplete read %i bytes, expected %i",
+			        read_bytes, acct_size);
+			goto exit;
+		}
+
+		tst_res(TINFO, "== entry %d ==", ++i);
+
 		if (v3)
 			ret = verify_acct(&acct_struct.v3, acct_struct.v3.ac_etime);
 		else
@@ -192,6 +203,7 @@  static void run(void)
 	else
 		tst_res(TPASS, "acct() wrote correct file contents!");
 
+exit:
 	SAFE_CLOSE(fd);
 }