
cpuset_memory_spread: change lowerlimit to 5000kb

Message ID 20230609012740.19097-1-zhanghongchen@loongson.cn
State Changes Requested

Commit Message

Hongchen Zhang June 9, 2023, 1:27 a.m. UTC
When I test the cpuset_memory_spread case, it FAILs too often.
After digging into the code, I found that the following things trigger
the FAIL:
1) random events; the probability is very small and can be ignored
2) get_meminfo, which runs before the signal is sent to test_pid
3) account_memsinfo, which runs before result_check

For 2) and 3), we can increase the value of lowerlimit to keep
the result as SUCCESS. In my testing, 5000kb is a reasonable value.

Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
---
 .../cpuset_memory_spread_test/cpuset_memory_spread_testset.sh   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Martin Doucha June 15, 2023, 2:27 p.m. UTC | #1
Hi,

On 09. 06. 23 3:27, Hongchen Zhang wrote:
> When I test the cpuset_memory_spread case, it FAILs too often.
> After digging into the code, I found that the following things trigger
> the FAIL:
> 1) random events; the probability is very small and can be ignored
> 2) get_meminfo, which runs before the signal is sent to test_pid
> 3) account_memsinfo, which runs before result_check
>
> For 2) and 3), we can increase the value of lowerlimit to keep
> the result as SUCCESS. In my testing, 5000kb is a reasonable value.

we're also seeing these failures but only on architectures like PowerPC 
with pagesize higher than the usual 4KB. On which architectures do you 
see failures and what's the pagesize there?

> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
> ---
>   .../cpuset_memory_spread_test/cpuset_memory_spread_testset.sh   | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh b/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh
> index e2767ef05..d33468525 100755
> --- a/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh
> +++ b/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh
> @@ -38,7 +38,7 @@ nr_mems=$N_NODES
>   # on which it is running. The other nodes' slab space has littler change.(less
>   # than 1000 kb).
>   upperlimit=10000
> -lowerlimit=2000
> +lowerlimit=5000
>   
>   cpus_all="$(seq -s, 0 $((nr_cpus-1)))"
>   mems_all="$(seq -s, 0 $((nr_mems-1)))"
Hongchen Zhang June 16, 2023, 2:10 a.m. UTC | #2
Hi Martin,

On 2023/6/15 pm 10:27, Martin Doucha wrote:
> Hi,
> 
> On 09. 06. 23 3:27, Hongchen Zhang wrote:
>> When I test the cpuset_memory_spread case, it FAILs too often.
>> After digging into the code, I found that the following things trigger
>> the FAIL:
>> 1) random events; the probability is very small and can be ignored
>> 2) get_meminfo, which runs before the signal is sent to test_pid
>> 3) account_memsinfo, which runs before result_check
>>
>> For 2) and 3), we can increase the value of lowerlimit to keep
>> the result as SUCCESS. In my testing, 5000kb is a reasonable value.
> 
> we're also seeing these failures but only on architectures like PowerPC 
> with pagesize higher than the usual 4KB. On which architectures do you 
> see failures and what's the pagesize there?
I tested on a 3C5000+7A2000 machine; the architecture is LoongArch. The
page size we use is 16KB.
> 
>> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
>> ---
>>   .../cpuset_memory_spread_test/cpuset_memory_spread_testset.sh   | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git 
>> a/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh 
>> b/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh 
>>
>> index e2767ef05..d33468525 100755
>> --- 
>> a/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh 
>>
>> +++ 
>> b/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh 
>>
>> @@ -38,7 +38,7 @@ nr_mems=$N_NODES
>>   # on which it is running. The other nodes' slab space has littler 
>> change.(less
>>   # than 1000 kb).
>>   upperlimit=10000
>> -lowerlimit=2000
>> +lowerlimit=5000
>>   cpus_all="$(seq -s, 0 $((nr_cpus-1)))"
>>   mems_all="$(seq -s, 0 $((nr_mems-1)))"
> 

Best Regards
Hongchen Zhang
Martin Doucha June 16, 2023, 9:31 a.m. UTC | #3
On 16. 06. 23 4:10, Hongchen Zhang wrote:
> Hi Martin,
> 
> On 2023/6/15 pm 10:27, Martin Doucha wrote:
>> Hi,
>>
>> On 09. 06. 23 3:27, Hongchen Zhang wrote:
>>> When I test the cpuset_memory_spread case, it FAILs too often.
>>> After digging into the code, I found that the following things trigger
>>> the FAIL:
>>> 1) random events; the probability is very small and can be ignored
>>> 2) get_meminfo, which runs before the signal is sent to test_pid
>>> 3) account_memsinfo, which runs before result_check
>>>
>>> For 2) and 3), we can increase the value of lowerlimit to keep
>>> the result as SUCCESS. In my testing, 5000kb is a reasonable value.
>>
>> we're also seeing these failures but only on architectures like 
>> PowerPC with pagesize higher than the usual 4KB. On which 
>> architectures do you see failures and what's the pagesize there?
> I tested on a 3C5000+7A2000 machine; the architecture is LoongArch. The
> page size we use is 16KB.

So the underlying cause is the same - higher pagesize. That means the 
upperlimit, lowerlimit and DATAFILE size should be calculated from 
pagesize instead.
Hongchen Zhang June 16, 2023, 10:13 a.m. UTC | #4
Hi Martin,

On 2023/6/16 pm 5:31, Martin Doucha wrote:
> On 16. 06. 23 4:10, Hongchen Zhang wrote:
>> Hi Martin,
>>
>> On 2023/6/15 pm 10:27, Martin Doucha wrote:
>>> Hi,
>>>
>>> On 09. 06. 23 3:27, Hongchen Zhang wrote:
>>>> When I test the cpuset_memory_spread case, it FAILs too often.
>>>> After digging into the code, I found that the following things trigger
>>>> the FAIL:
>>>> 1) random events; the probability is very small and can be ignored
>>>> 2) get_meminfo, which runs before the signal is sent to test_pid
>>>> 3) account_memsinfo, which runs before result_check
>>>>
>>>> For 2) and 3), we can increase the value of lowerlimit to keep
>>>> the result as SUCCESS. In my testing, 5000kb is a reasonable value.
>>>
>>> we're also seeing these failures but only on architectures like 
>>> PowerPC with pagesize higher than the usual 4KB. On which 
>>> architectures do you see failures and what's the pagesize there?
>> I tested on a 3C5000+7A2000 machine; the architecture is LoongArch. The
>> page size we use is 16KB.
> 
> So the underlying cause is the same - higher pagesize. That means the 
> upperlimit, lowerlimit and DATAFILE size should be calculated from 
> pagesize instead.
IMO, upperlimit and DATAFILE size will not affect the result.
Should we change the lowerlimit like the following?
lowerlimit = 2000kb*get_pagesize()/SIZE_4K;
But I am not sure which formula we should use.
Any suggestions?
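
For illustration, the proposed formula could be sketched in shell roughly
as follows. This is only a sketch under assumptions: the helper name
scale_limit_kb is hypothetical (it is not part of the LTP test), and the
page size is assumed to be readable via `getconf PAGESIZE`.

```shell
#!/bin/sh
# Sketch only: scale_limit_kb is a hypothetical helper, not part of LTP.
# It scales a limit (in KB) that was tuned for 4096-byte pages linearly
# to another page size, i.e. limit * pagesize / 4096.
scale_limit_kb() {
	# $1 = limit in KB tuned for 4 KB pages, $2 = page size in bytes
	echo $(( $1 * $2 / 4096 ))
}

pagesize=$(getconf PAGESIZE)   # assumption: getconf is available
lowerlimit=$(scale_limit_kb 2000 "$pagesize")
echo "pagesize=${pagesize} lowerlimit=${lowerlimit}kb"
```

With 4KB pages the value stays at 2000kb; with 16KB pages it becomes
8000kb.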
> 

Best Regards
Hongchen Zhang
Hongchen Zhang June 27, 2023, 11:39 a.m. UTC | #5
Hi Chril,
Any suggestion about this patch?

On 2023/6/16 pm 6:13, Hongchen Zhang wrote:
> Hi Martin,
> 
> On 2023/6/16 pm 5:31, Martin Doucha wrote:
>> On 16. 06. 23 4:10, Hongchen Zhang wrote:
>>> Hi Martin,
>>>
>>> On 2023/6/15 pm 10:27, Martin Doucha wrote:
>>>> Hi,
>>>>
>>>> On 09. 06. 23 3:27, Hongchen Zhang wrote:
>>>>> When I test the cpuset_memory_spread case, it FAILs too often.
>>>>> After digging into the code, I found that the following things trigger
>>>>> the FAIL:
>>>>> 1) random events; the probability is very small and can be ignored
>>>>> 2) get_meminfo, which runs before the signal is sent to test_pid
>>>>> 3) account_memsinfo, which runs before result_check
>>>>>
>>>>> For 2) and 3), we can increase the value of lowerlimit to keep
>>>>> the result as SUCCESS. In my testing, 5000kb is a reasonable value.
>>>>
>>>> we're also seeing these failures but only on architectures like 
>>>> PowerPC with pagesize higher than the usual 4KB. On which 
>>>> architectures do you see failures and what's the pagesize there?
>>> I tested on a 3C5000+7A2000 machine; the architecture is LoongArch. The
>>> page size we use is 16KB.
>>
>> So the underlying cause is the same - higher pagesize. That means the 
>> upperlimit, lowerlimit and DATAFILE size should be calculated from 
>> pagesize instead.
> IMO, upperlimit and DATAFILE size will not affect the result.
> Should we change the lowerlimit like the following?
> lowerlimit = 2000kb*get_pagesize()/SIZE_4K;
> But I am not sure which formula we should use.
> Any suggestions?
>>
>

Best Regards
Hongchen Zhang
Richard Palethorpe Aug. 29, 2023, 9:25 a.m. UTC | #6
Hello,

Hongchen Zhang <zhanghongchen@loongson.cn> writes:

> Hi Chril,
> Any suggestion about this patch?
>
> On 2023/6/16 pm 6:13, Hongchen Zhang wrote:
>> Hi Martin,
>> On 2023/6/16 pm 5:31, Martin Doucha wrote:
>>> On 16. 06. 23 4:10, Hongchen Zhang wrote:
>>>> Hi Martin,
>>>>
>>>> On 2023/6/15 pm 10:27, Martin Doucha wrote:
>>>>> Hi,
>>>>>
>>>>> On 09. 06. 23 3:27, Hongchen Zhang wrote:
>>>>>> When I test the cpuset_memory_spread case, it FAILs too often.
>>>>>> After digging into the code, I found that the following things trigger
>>>>>> the FAIL:
>>>>>> 1) random events; the probability is very small and can be ignored
>>>>>> 2) get_meminfo, which runs before the signal is sent to test_pid
>>>>>> 3) account_memsinfo, which runs before result_check
>>>>>>
>>>>>> For 2) and 3), we can increase the value of lowerlimit to keep
>>>>>> the result as SUCCESS. In my testing, 5000kb is a reasonable value.
>>>>>
>>>>> we're also seeing these failures but only on architectures like
>>>>> PowerPC with pagesize higher than the usual 4KB. On which
>>>>> architectures do you see failures and what's the pagesize there?
>>>> I tested on a 3C5000+7A2000 machine; the architecture is LoongArch. The
>>>> page size we use is 16KB.
>>>
>>> So the underlying cause is the same - higher pagesize. That means
>>> the upperlimit, lowerlimit and DATAFILE size should be calculated
>>> from pagesize instead.
>> IMO, upperlimit and DATAFILE size will not affect the result.
>> Should we change the lowerlimit like the following?
>> lowerlimit = 2000kb*get_pagesize()/SIZE_4K;

This formula looks ok, but you need to scale the other values by the
page size as well.

Also I would recommend ensuring all values are multiples of the page
size because the kernel will round up to the nearest page
size.

So lowerlimit = 4096 * 5 = 2048Kb
or lowerlimit = 16384 * 5 = 8192Kb

Maybe the upperlimit should be 5 * lowerlimit? Because we want the
gap/spread to get bigger too. I don't know if the DATAFILE needs to
change in size; it is already 500MB.
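
To illustrate the two suggestions above (page-size multiples and a 5x
spread), a rough shell sketch might look like this. The helper name
round_up_to_pages_kb and the 5000kb starting value are only assumptions
for the example; none of this is taken from the LTP test itself.

```shell
#!/bin/sh
# Sketch only: round_up_to_pages_kb is a hypothetical helper, not part of LTP.
# It rounds a size in KB up to a whole number of pages and prints the
# result in KB.
round_up_to_pages_kb() {
	# $1 = size in KB, $2 = page size in bytes
	bytes=$(( $1 * 1024 ))
	pages=$(( (bytes + $2 - 1) / $2 ))   # ceiling division
	echo $(( pages * $2 / 1024 ))
}

pagesize=$(getconf PAGESIZE)   # assumption: getconf is available
lowerlimit=$(round_up_to_pages_kb 5000 "$pagesize")
upperlimit=$(( 5 * lowerlimit ))   # keep the gap proportional to lowerlimit
echo "lowerlimit=${lowerlimit}kb upperlimit=${upperlimit}kb"
```

For example, 5000kb is not a multiple of a 16KB page, so it would be
rounded up to 5008kb (313 pages).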

Alternatively you could just create a lookup table with values for each
page size we have tested. e.g.

switch (get_pagesize()) {
       case 4096: 2048Kb
       case 16384: 8192Kb
       default: ...
}

This may be better if the values do not scale linearly, which is
totally possible because the page size affects most things and there
could be feedback loops.
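
Since the test is a shell script, the lookup table above could be
sketched as a `case` statement. The function name lowerlimit_for_pagesize
is hypothetical, and the fallback branch (linear scaling from the 4KB
value) is purely an assumption, since the default case is left open
above.

```shell
#!/bin/sh
# Sketch only: lowerlimit_for_pagesize is a hypothetical helper, not part
# of LTP. It prints a lower limit in KB for a given page size in bytes.
lowerlimit_for_pagesize() {
	case "$1" in
	4096)  echo 2048 ;;   # value from the lookup table above
	16384) echo 8192 ;;   # value from the lookup table above
	*)     echo $(( 2048 * $1 / 4096 )) ;;   # ASSUMPTION: linear fallback
	esac
}

pagesize=$(getconf PAGESIZE)   # assumption: getconf is available
lowerlimit=$(lowerlimit_for_pagesize "$pagesize")
echo "pagesize=${pagesize} lowerlimit=${lowerlimit}kb"
```

New page sizes can then be added as extra branches once they have been
tested, without touching the existing values.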

Please submit another patch if you are still interested.
Hongchen Zhang Aug. 30, 2023, 3:28 a.m. UTC | #7
Hi Richard,
Thanks for your review.
On 2023/8/29 pm 5:25, Richard Palethorpe wrote:
> Hello,
> 
> Hongchen Zhang <zhanghongchen@loongson.cn> writes:
> 
>> Hi Chril,
>> Any suggestion about this patch?
>>
>> On 2023/6/16 pm 6:13, Hongchen Zhang wrote:
>>> Hi Martin,
>>> On 2023/6/16 pm 5:31, Martin Doucha wrote:
>>>> On 16. 06. 23 4:10, Hongchen Zhang wrote:
>>>>> Hi Martin,
>>>>>
>>>>> On 2023/6/15 pm 10:27, Martin Doucha wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 09. 06. 23 3:27, Hongchen Zhang wrote:
>>>>>>> When I test the cpuset_memory_spread case, it FAILs too often.
>>>>>>> After digging into the code, I found that the following things trigger
>>>>>>> the FAIL:
>>>>>>> 1) random events; the probability is very small and can be ignored
>>>>>>> 2) get_meminfo, which runs before the signal is sent to test_pid
>>>>>>> 3) account_memsinfo, which runs before result_check
>>>>>>>
>>>>>>> For 2) and 3), we can increase the value of lowerlimit to keep
>>>>>>> the result as SUCCESS. In my testing, 5000kb is a reasonable value.
>>>>>>
>>>>>> we're also seeing these failures but only on architectures like
>>>>>> PowerPC with pagesize higher than the usual 4KB. On which
>>>>>> architectures do you see failures and what's the pagesize there?
>>>>> I tested on a 3C5000+7A2000 machine; the architecture is LoongArch. The
>>>>> page size we use is 16KB.
>>>>
>>>> So the underlying cause is the same - higher pagesize. That means
>>>> the upperlimit, lowerlimit and DATAFILE size should be calculated
>>>> from pagesize instead.
>>> IMO, upperlimit and DATAFILE size will not affect the result.
>>> Should we change the lowerlimit like the following?
>>> lowerlimit = 2000kb*get_pagesize()/SIZE_4K;
> 
> This formula looks ok, but you need to scale the other values by the
> page size as well.
> 
> Also I would recommend ensuring all values are multiples of the page
> size because the kernel will round up to the nearest page
> size.
> 
> So lowerlimit = 4096 * 5 = 2048Kb
> or lowerlimit = 16384 * 5 = 8192Kb
> 
> Maybe the upperlimit should be 5 * lowerlimit? Because we want the
> gap/spread to get bigger too. I don't know if the DATAFILE needs to
> change in size; it is already 500MB.
> 
> Alternatively you could just create a lookup table with values for each
> page size we have tested. e.g.
> 
> switch (get_pagesize()) {
>         case 4096: 2048Kb
>         case 16384: 8192Kb
>         default: ...
> }
> 
> This may be better if the values do not scale linearly, which is
> totally possible because the page size affects most things and there
> could be feedback loops.
> 
> Please submit another patch if you are still interested.
OK, let me send the V2 patch.
>

Patch

diff --git a/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh b/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh
index e2767ef05..d33468525 100755
--- a/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh
+++ b/testcases/kernel/controllers/cpuset/cpuset_memory_spread_test/cpuset_memory_spread_testset.sh
@@ -38,7 +38,7 @@  nr_mems=$N_NODES
 # on which it is running. The other nodes' slab space has littler change.(less
 # than 1000 kb).
 upperlimit=10000
-lowerlimit=2000
+lowerlimit=5000
 
 cpus_all="$(seq -s, 0 $((nr_cpus-1)))"
 mems_all="$(seq -s, 0 $((nr_mems-1)))"