Message ID | 20201125101633.30154-1-liwang@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | fzsync: skip test when avaliable CPUs less than 2 | expand |
Hello Li, Li Wang <liwang@redhat.com> writes: > It makes no sense to run parallel thread to simulate race conditions on > system with CPU number less than two, especially for kvm guest, it does > not have any chance to get real parallel running and probably encounter > failure as below: Most of the tests using FuzzySync do not need true parallism. We were able to reproduce a number of race conditions on a single vCPU. Infact it may actually benefit some races because one thread has to pause to allow the other to run, perhaps creating a huge race window. > > === 100% reproducible on a 1cpu guest === > > cmdline="af_alg07" > contacts="" > analysis=exit > <<<test_output>>> > tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s > ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period ended > ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = 0 > ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 } > ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a : { avg = 1915ns, avg_dev = 535ns, dev_ratio = 0.28 } > ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b : { avg = 1885ns, avg_dev = 42ns, dev_ratio = 0.02 } > ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b : { avg = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 } > ../../../include/tst_fuzzy_sync.h:318: TINFO: spins : { avg = 554786 , avg_dev = 7355 , dev_ratio = 0.01 } > ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, requesting exit > af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be vulnerable > > Signed-off-by: Li Wang <liwang@redhat.com> > CC: Richard Palethorpe <rpalethorpe@suse.de> > --- > include/tst_fuzzy_sync.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h > index 4141f5c64..2e864b312 100644 > --- a/include/tst_fuzzy_sync.h > +++ b/include/tst_fuzzy_sync.h > @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s) > static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair, > void *(*run_b)(void *)) > { > + if (get_nprocs() < 2) > + tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs available"); > + > tst_fzsync_pair_cleanup(pair); > > tst_init_stat(&pair->diff_ss); Perhaps this test would pass with more loops and a big enough delay range, but this is also wasting time on a single vCPU. I'm not sure whether we should filter this test at the LTP level; it may trigger the bug on some single CPU configs. Why not print a warning instead of refusing to run?
Hi! > Perhaps this test would pass with more loops and a big enough delay > range, but this is also wasting time on a single vCPU. I'm not sure > whether we should filter this test at the LTP level; it may trigger the > bug on some single CPU configs. > > Why not print a warning instead of refusing to run? That's not a solution either, warning would end up in a test results as well. I guess that we can add something as .min_cpus to the tst_test structure and set it for this test?
On 25. 11. 20 12:22, Richard Palethorpe wrote: > Hello Li, > > Li Wang <liwang@redhat.com> writes: > >> It makes no sense to run parallel thread to simulate race conditions on >> system with CPU number less than two, especially for kvm guest, it does >> not have any chance to get real parallel running and probably encounter >> failure as below: > > Most of the tests using FuzzySync do not need true parallism. We were > able to reproduce a number of race conditions on a single vCPU. Infact > it may actually benefit some races because one thread has to pause to > allow the other to run, perhaps creating a huge race window. > >> >> === 100% reproducible on a 1cpu guest === >> >> cmdline="af_alg07" >> contacts="" >> analysis=exit >> <<<test_output>>> >> tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s >> ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period ended >> ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = 0 >> ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 } >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a : { avg = 1915ns, avg_dev = 535ns, dev_ratio = 0.28 } >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b : { avg = 1885ns, avg_dev = 42ns, dev_ratio = 0.02 } >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b : { avg = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 } >> ../../../include/tst_fuzzy_sync.h:318: TINFO: spins : { avg = 554786 , avg_dev = 7355 , dev_ratio = 0.01 } >> ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, requesting exit >> af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be vulnerable >> >> Signed-off-by: Li Wang <liwang@redhat.com> >> CC: Richard Palethorpe <rpalethorpe@suse.de> >> --- >> include/tst_fuzzy_sync.h | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h >> index 4141f5c64..2e864b312 100644 >> --- a/include/tst_fuzzy_sync.h >> +++ b/include/tst_fuzzy_sync.h >> @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s) >> static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair, >> void *(*run_b)(void *)) >> { >> + if (get_nprocs() < 2) >> + tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs available"); >> + >> tst_fzsync_pair_cleanup(pair); >> >> tst_init_stat(&pair->diff_ss); > > Perhaps this test would pass with more loops and a big enough delay > range, but this is also wasting time on a single vCPU. I'm not sure > whether we should filter this test at the LTP level; it may trigger the > bug on some single CPU configs. No, af_alg07 requires 2 CPUs, otherwise it'll report false positives. The test will pass only if fchownat() hits a half-closed socket and returns error. But IIRC the half-closed socket will be destroyed during reschedule which means there's no race window to hit anymore. But it would be better to put the TCONF condition into the test itself.
On 25. 11. 20 12:54, Cyril Hrubis wrote: > Hi! >> Perhaps this test would pass with more loops and a big enough delay >> range, but this is also wasting time on a single vCPU. I'm not sure >> whether we should filter this test at the LTP level; it may trigger the >> bug on some single CPU configs. >> >> Why not print a warning instead of refusing to run? > > That's not a solution either, warning would end up in a test results as > well. > > I guess that we can add something as .min_cpus to the tst_test structure > and set it for this test? +1 for .min_cpus from me.
On Wed, Nov 25, 2020 at 7:56 PM Martin Doucha <mdoucha@suse.cz> wrote: > On 25. 11. 20 12:22, Richard Palethorpe wrote: > > Hello Li, > > > > Li Wang <liwang@redhat.com> writes: > > > >> It makes no sense to run parallel thread to simulate race conditions on > >> system with CPU number less than two, especially for kvm guest, it does > >> not have any chance to get real parallel running and probably encounter > >> failure as below: > > > > Most of the tests using FuzzySync do not need true parallism. We were > > able to reproduce a number of race conditions on a single vCPU. Infact > > it may actually benefit some races because one thread has to pause to > > allow the other to run, perhaps creating a huge race window. > > >> > >> === 100% reproducible on a 1cpu guest === > >> > >> cmdline="af_alg07" > >> contacts="" > >> analysis=exit > >> <<<test_output>>> > >> tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s > >> ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period > ended > >> ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = > 0 > >> ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg > = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 } > >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a : { avg > = 1915ns, avg_dev = 535ns, dev_ratio = 0.28 } > >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b : { avg > = 1885ns, avg_dev = 42ns, dev_ratio = 0.02 } > >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b : { avg > = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 } > >> ../../../include/tst_fuzzy_sync.h:318: TINFO: spins : { avg > = 554786 , avg_dev = 7355 , dev_ratio = 0.01 } > >> ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, > requesting exit > >> af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be > vulnerable > >> > >> Signed-off-by: Li Wang <liwang@redhat.com> > >> CC: Richard Palethorpe <rpalethorpe@suse.de> > >> --- > >> include/tst_fuzzy_sync.h | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h > >> index 4141f5c64..2e864b312 100644 > >> --- a/include/tst_fuzzy_sync.h > >> +++ b/include/tst_fuzzy_sync.h > >> @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s) > >> static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair, > >> void *(*run_b)(void *)) > >> { > >> + if (get_nprocs() < 2) > >> + tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs > available"); > >> + > >> tst_fzsync_pair_cleanup(pair); > >> > >> tst_init_stat(&pair->diff_ss); > > > > Perhaps this test would pass with more loops and a big enough delay > > range, but this is also wasting time on a single vCPU. I'm not sure > > whether we should filter this test at the LTP level; it may trigger the > > bug on some single CPU configs. > > No, af_alg07 requires 2 CPUs, otherwise it'll report false positives. > The test will pass only if fchownat() hits a half-closed socket and > returns error. But IIRC the half-closed socket will be destroyed during > reschedule which means there's no race window to hit anymore. But it > would be better to put the TCONF condition into the test itself. > +1 Correct, I stand by Martin's point. And we can avoid adding this patch to FuzzySync lib, but for af_alg07 2cpus is required. (maybe go with Cyril's suggest to add .min_cpus)
Hi! > And we can avoid adding this patch to FuzzySync lib, but for af_alg07 2cpus > is required. > (maybe go with Cyril's suggest to add .min_cpus) I would go for that and we can make use of it in getcwd04.c as well.
Hello, Martin Doucha <mdoucha@suse.cz> writes: > On 25. 11. 20 12:22, Richard Palethorpe wrote: >> Hello Li, >> >> Li Wang <liwang@redhat.com> writes: >> >>> It makes no sense to run parallel thread to simulate race conditions on >>> system with CPU number less than two, especially for kvm guest, it does >>> not have any chance to get real parallel running and probably encounter >>> failure as below: >> >> Most of the tests using FuzzySync do not need true parallism. We were >> able to reproduce a number of race conditions on a single vCPU. Infact >> it may actually benefit some races because one thread has to pause to >> allow the other to run, perhaps creating a huge race window. >> >>> >>> === 100% reproducible on a 1cpu guest === >>> >>> cmdline="af_alg07" >>> contacts="" >>> analysis=exit >>> <<<test_output>>> >>> tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s >>> ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period ended >>> ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = 0 >>> ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 } >>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a : { avg = 1915ns, avg_dev = 535ns, dev_ratio = 0.28 } >>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b : { avg = 1885ns, avg_dev = 42ns, dev_ratio = 0.02 } >>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b : { avg = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 } >>> ../../../include/tst_fuzzy_sync.h:318: TINFO: spins : { avg = 554786 , avg_dev = 7355 , dev_ratio = 0.01 } >>> ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, requesting exit >>> af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be vulnerable >>> >>> Signed-off-by: Li Wang <liwang@redhat.com> >>> CC: Richard Palethorpe <rpalethorpe@suse.de> >>> --- >>> include/tst_fuzzy_sync.h | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h >>> index 4141f5c64..2e864b312 100644 >>> --- a/include/tst_fuzzy_sync.h >>> +++ b/include/tst_fuzzy_sync.h >>> @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s) >>> static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair, >>> void *(*run_b)(void *)) >>> { >>> + if (get_nprocs() < 2) >>> + tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs available"); >>> + >>> tst_fzsync_pair_cleanup(pair); >>> >>> tst_init_stat(&pair->diff_ss); >> >> Perhaps this test would pass with more loops and a big enough delay >> range, but this is also wasting time on a single vCPU. I'm not sure >> whether we should filter this test at the LTP level; it may trigger the >> bug on some single CPU configs. > > No, af_alg07 requires 2 CPUs, otherwise it'll report false positives. > The test will pass only if fchownat() hits a half-closed socket and > returns error. But IIRC the half-closed socket will be destroyed during > reschedule which means there's no race window to hit anymore. But it > would be better to put the TCONF condition into the test itself. Interesting, I wonder if this is also true for the real-time kernel with the threads set to RT priority?
Hi, >> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives. >> The test will pass only if fchownat() hits a half-closed socket and >> returns error. But IIRC the half-closed socket will be destroyed during >> reschedule which means there's no race window to hit anymore. But it >> would be better to put the TCONF condition into the test itself. > Interesting, I wonder if this is also true for the real-time kernel with > the threads set to RT priority? It looks like the test can fail even with more than one cpu. I've seen this sporadic failure on different hardware with more than two cores, at least on intel denverton (x86_64) and renesas r-car (aarch64) systems. Both with kernel 4.19 with the fix included, on the denverton system the rt parches were included and on the r-car not. The test passes most of the time, but sometimes fails with the message Li posted. It also seems to fail sporadically on other systems as well: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860 Additionally I tested on qemu-x86 with 4.19 with and without rt patches. The test succeeds even with only one virtualized cpu. So either Martin's assumption is wrong or it holds only for newer kernel versions? Jörg
Hi Joerg, On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de> wrote: > Hi, > >> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives. > >> The test will pass only if fchownat() hits a half-closed socket and > >> returns error. But IIRC the half-closed socket will be destroyed during > >> reschedule which means there's no race window to hit anymore. But it > >> would be better to put the TCONF condition into the test itself. > > Interesting, I wonder if this is also true for the real-time kernel with > > the threads set to RT priority? > It looks like the test can fail even with more than one cpu. I've seen > this sporadic failure on different hardware with more than two cores, at > least on intel denverton (x86_64) and renesas r-car (aarch64) systems. > Both with kernel 4.19 with the fix included, on the denverton system the > rt parches were included and on the r-car not. The test passes most of > the time, but sometimes fails with the message Li posted. > > It also seems to fail sporadically on other systems as well: > https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860 > > Additionally I tested on qemu-x86 with 4.19 with and without rt patches. > The test succeeds even with only one virtualized cpu. So either Martin's > assumption is wrong or it holds only for newer kernel versions? > No, Mertin is not wrong, and you are also right. They are totally two different issues of af_alg07, the test on 1CPU should be fixed with TCONF. But the fail with aarch64 is more like a hardware issue, Chunyu has a drafted patch to add init delay value for such a system. Can you try this on your aarm64 platform? ----------------------------- fzsync can't get a random delay range on hpe-moonshot systems, so run with delay=0 during all the tests. This is probably the hardware issue such as cache line design so can't get a stable state during the execution of the critical section. Provide an experience delay value on hpe-moonshot to make it hit the race window immediately without exceeding samples. --- testcases/kernel/crypto/af_alg07.c | 1 + 1 file changed, 1 insertion(+) diff --git a/testcases/kernel/crypto/af_alg07.c b/testcases/kernel/crypto/af_alg07.c index 6ad86f4f3..24f5b8088 100644 --- a/testcases/kernel/crypto/af_alg07.c +++ b/testcases/kernel/crypto/af_alg07.c @@ -47,6 +47,7 @@ static void setup(void) fd = SAFE_OPEN("tmpfile", O_RDWR | O_CREAT, 0644); tst_fzsync_pair_init(&fzsync_pair); + fzsync_pair.delay_bias = 700; } static void *thread_run(void *arg)
Hi Li, On 11/30/2020 9:14 AM, Li Wang wrote: > Hi Joerg, > > On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de > <mailto:lkml@jv-coder.de>> wrote: > > Hi, > >> No, af_alg07 requires 2 CPUs, otherwise it'll report false > positives. > >> The test will pass only if fchownat() hits a half-closed socket and > >> returns error. But IIRC the half-closed socket will be > destroyed during > >> reschedule which means there's no race window to hit anymore. > But it > >> would be better to put the TCONF condition into the test itself. > > Interesting, I wonder if this is also true for the real-time > kernel with > > the threads set to RT priority? > It looks like the test can fail even with more than one cpu. I've > seen > this sporadic failure on different hardware with more than two > cores, at > least on intel denverton (x86_64) and renesas r-car (aarch64) > systems. > Both with kernel 4.19 with the fix included, on the denverton > system the > rt parches were included and on the r-car not. The test passes > most of > the time, but sometimes fails with the message Li posted. > > It also seems to fail sporadically on other systems as well: > https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860 > <https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860> > > Additionally I tested on qemu-x86 with 4.19 with and without rt > patches. > The test succeeds even with only one virtualized cpu. So either > Martin's > assumption is wrong or it holds only for newer kernel versions? > > > No, Mertin is not wrong, and you are also right. > > They are totally two different issues of af_alg07, the test on 1CPU > should be fixed with TCONF. But the fail with aarch64 is more like a > hardware issue, Chunyu has a drafted patch to add init delay value for > such a system. I think you misunderstood something. I see random fails with "TFAIL: fchownat() failed to fail, kernel may be vulnerable" on both x86_64 and aarch64 with more than one cpu core (4 for x86_64 and 2 or 4 for aarch64). I see no error ("TPASS: fchownat() failed successfully: ENOENT (2)") on single core qemu-x86. This is why I think Martin's assumption may be wrong. If it was right, it should never succeed on a single core system right? Jörg
Hello, Li Wang <liwang@redhat.com> writes: > Hi Joerg, > > On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de> wrote: > >> Hi, >> >> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives. >> >> The test will pass only if fchownat() hits a half-closed socket and >> >> returns error. But IIRC the half-closed socket will be destroyed during >> >> reschedule which means there's no race window to hit anymore. But it >> >> would be better to put the TCONF condition into the test itself. >> > Interesting, I wonder if this is also true for the real-time kernel with >> > the threads set to RT priority? >> It looks like the test can fail even with more than one cpu. I've seen >> this sporadic failure on different hardware with more than two cores, at >> least on intel denverton (x86_64) and renesas r-car (aarch64) systems. >> Both with kernel 4.19 with the fix included, on the denverton system the >> rt parches were included and on the r-car not. The test passes most of >> the time, but sometimes fails with the message Li posted. >> >> It also seems to fail sporadically on other systems as well: >> https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860 >> >> Additionally I tested on qemu-x86 with 4.19 with and without rt patches. >> The test succeeds even with only one virtualized cpu. So either Martin's >> assumption is wrong or it holds only for newer kernel versions? >> > > No, Mertin is not wrong, and you are also right. > > They are totally two different issues of af_alg07, the test on 1CPU > should be fixed with TCONF. But the fail with aarch64 is more like a > hardware issue, Chunyu has a drafted patch to add init delay value for > such a system. > > Can you try this on your aarm64 platform? > ----------------------------- > fzsync can't get a random delay range on hpe-moonshot systems, so run with > delay=0 during all the tests. This is probably the hardware issue such as > cache line design so can't get a stable state during the execution of the > critical > section. Provide an experience delay value on hpe-moonshot to make it hit > the race window immediately without exceeding samples. > > --- > testcases/kernel/crypto/af_alg07.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/testcases/kernel/crypto/af_alg07.c > b/testcases/kernel/crypto/af_alg07.c > index 6ad86f4f3..24f5b8088 100644 > --- a/testcases/kernel/crypto/af_alg07.c > +++ b/testcases/kernel/crypto/af_alg07.c > @@ -47,6 +47,7 @@ static void setup(void) > fd = SAFE_OPEN("tmpfile", O_RDWR | O_CREAT, 0644); > > tst_fzsync_pair_init(&fzsync_pair); > + fzsync_pair.delay_bias = 700; I hope there is some way to set this dynamically. Similar to CVE-2016-7117. If we know that we should get some particular error we could modify the bias until the error happens. > } > > static void *thread_run(void *arg) > -- > 2.19.1
Hi Joerg, On Mon, Nov 30, 2020 at 4:39 PM Joerg Vehlow <lkml@jv-coder.de> wrote: > Hi Li, > > On 11/30/2020 9:14 AM, Li Wang wrote: > > Hi Joerg, > > > > On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de > > <mailto:lkml@jv-coder.de>> wrote: > > > > Hi, > > >> No, af_alg07 requires 2 CPUs, otherwise it'll report false > > positives. > > >> The test will pass only if fchownat() hits a half-closed socket > and > > >> returns error. But IIRC the half-closed socket will be > > destroyed during > > >> reschedule which means there's no race window to hit anymore. > > But it > > >> would be better to put the TCONF condition into the test itself. > > > Interesting, I wonder if this is also true for the real-time > > kernel with > > > the threads set to RT priority? > > It looks like the test can fail even with more than one cpu. I've > > seen > > this sporadic failure on different hardware with more than two > > cores, at > > least on intel denverton (x86_64) and renesas r-car (aarch64) > > systems. > > Both with kernel 4.19 with the fix included, on the denverton > > system the > > rt parches were included and on the r-car not. The test passes > > most of > > the time, but sometimes fails with the message Li posted. > > > > It also seems to fail sporadically on other systems as well: > > https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860 > > <https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860> > > > > Additionally I tested on qemu-x86 with 4.19 with and without rt > > patches. > > The test succeeds even with only one virtualized cpu. So either > > Martin's > > assumption is wrong or it holds only for newer kernel versions? > > > > > > No, Mertin is not wrong, and you are also right. > > > > They are totally two different issues of af_alg07, the test on 1CPU > > should be fixed with TCONF. But the fail with aarch64 is more like a > > hardware issue, Chunyu has a drafted patch to add init delay value for > > such a system. > I think you misunderstood something. I see random fails with "TFAIL: > fchownat() failed to fail, kernel may be vulnerable" on both x86_64 and > aarch64 with more than one cpu core (4 for x86_64 and 2 or 4 for aarch64). > Well, seems I was somewhat arbitrary on this problem a moment ago. Probably I was missing the 4cores fails on x86_64 you mentioned, we just observed that FAIL on 1CPU x86_64 and hpe_moonshot(aarch64) so far. The tentative conclusion of our debugging result: 1. FAIL with 1CPU KVM x86_64 is false positives 2. FAIL with hpe_moonshot aarch64 is caused by cache line design > > I see no error ("TPASS: fchownat() failed successfully: ENOENT (2)") on > single core qemu-x86. This is why I think Martin's assumption may be > wrong. If it was right, it should never succeed on a single core system > right? > Hmm, it's hard to say never, it is also possible to create a race window on a single-core system. Anyway, we need to do more investigation.
On 30. 11. 20 10:01, Richard Palethorpe wrote: > Hello, > > Li Wang <liwang@redhat.com> writes: > >> Hi Joerg, >> >> On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de> wrote: >> >>> Hi, >>>>> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives. >>>>> The test will pass only if fchownat() hits a half-closed socket and >>>>> returns error. But IIRC the half-closed socket will be destroyed during >>>>> reschedule which means there's no race window to hit anymore. But it >>>>> would be better to put the TCONF condition into the test itself. >>>> Interesting, I wonder if this is also true for the real-time kernel with >>>> the threads set to RT priority? >>> It looks like the test can fail even with more than one cpu. I've seen >>> this sporadic failure on different hardware with more than two cores, at >>> least on intel denverton (x86_64) and renesas r-car (aarch64) systems. >>> Both with kernel 4.19 with the fix included, on the denverton system the >>> rt parches were included and on the r-car not. The test passes most of >>> the time, but sometimes fails with the message Li posted. >>> >>> It also seems to fail sporadically on other systems as well: >>> https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860 >>> >>> Additionally I tested on qemu-x86 with 4.19 with and without rt patches. >>> The test succeeds even with only one virtualized cpu. So either Martin's >>> assumption is wrong or it holds only for newer kernel versions? >>> >> >> No, Mertin is not wrong, and you are also right. >> >> They are totally two different issues of af_alg07, the test on 1CPU >> should be fixed with TCONF. But the fail with aarch64 is more like a >> hardware issue, Chunyu has a drafted patch to add init delay value for >> such a system. >> >> Can you try this on your aarm64 platform? >> ----------------------------- >> fzsync can't get a random delay range on hpe-moonshot systems, so run with >> delay=0 during all the tests. This is probably the hardware issue such as >> cache line design so can't get a stable state during the execution of the >> critical >> section. Provide an experience delay value on hpe-moonshot to make it hit >> the race window immediately without exceeding samples. >> >> --- >> testcases/kernel/crypto/af_alg07.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/testcases/kernel/crypto/af_alg07.c >> b/testcases/kernel/crypto/af_alg07.c >> index 6ad86f4f3..24f5b8088 100644 >> --- a/testcases/kernel/crypto/af_alg07.c >> +++ b/testcases/kernel/crypto/af_alg07.c >> @@ -47,6 +47,7 @@ static void setup(void) >> fd = SAFE_OPEN("tmpfile", O_RDWR | O_CREAT, 0644); >> >> tst_fzsync_pair_init(&fzsync_pair); >> + fzsync_pair.delay_bias = 700; > > I hope there is some way to set this dynamically. Similar to > CVE-2016-7117. > > If we know that we should get some particular error we could modify the > bias until the error happens. There are three possible outcomes of the race: 1) fchownat() returns 0 => fchownat() was called too early or the kernel is vulnerable, you can adjust bias here 2) fchwonat() fails with ENOENT => kernel is fixed, print TPASS and exit 3) fchownat() fails with EBADF => fchownat() was called too late, you can adjust bias here IIRC I didn't play with bias in this test because on x86_64 it passes almost instantly on a fixed kernel. Feel free to add dynamic bias adjustment for ARM.
diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h index 4141f5c64..2e864b312 100644 --- a/include/tst_fuzzy_sync.h +++ b/include/tst_fuzzy_sync.h @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s) static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair, void *(*run_b)(void *)) { + if (get_nprocs() < 2) + tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs available"); + tst_fzsync_pair_cleanup(pair); tst_init_stat(&pair->diff_ss);
It makes no sense to run parallel thread to simulate race conditions on system with CPU number less than two, especially for kvm guest, it does not have any chance to get real parallel running and probably encounter failure as below: === 100% reproducible on a 1cpu guest === cmdline="af_alg07" contacts="" analysis=exit <<<test_output>>> tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period ended ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = 0 ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 } ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a : { avg = 1915ns, avg_dev = 535ns, dev_ratio = 0.28 } ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b : { avg = 1885ns, avg_dev = 42ns, dev_ratio = 0.02 } ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b : { avg = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 } ../../../include/tst_fuzzy_sync.h:318: TINFO: spins : { avg = 554786 , avg_dev = 7355 , dev_ratio = 0.01 } ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, requesting exit af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be vulnerable Signed-off-by: Li Wang <liwang@redhat.com> CC: Richard Palethorpe <rpalethorpe@suse.de> --- include/tst_fuzzy_sync.h | 3 +++ 1 file changed, 3 insertions(+)