[4/4] mm: numa: Slow PTE scan rate if migration failures occur

Message ID 20150309210147.GA3406@suse.de (mailing list archive)
State Not Applicable
Delegated to: Michael Ellerman

Commit Message

Mel Gorman March 9, 2015, 9:02 p.m. UTC
On Sun, Mar 08, 2015 at 08:40:25PM +0000, Mel Gorman wrote:
> > Because if the answer is 'yes', then we can safely say: 'we regressed 
> > performance because correctness [not dropping dirty bits] comes before 
> > performance'.
> > 
> > If the answer is 'no', then we still have a mystery (and a regression) 
> > to track down.
> > 
> > As a second hack (not to be applied), could we change:
> > 
> >  #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL
> > 
> > to:
> > 
> >  #define _PAGE_BIT_PROTNONE      (_PAGE_BIT_GLOBAL+1)
> > 
> 
> In itself, that's not enough. The SWP_OFFSET_SHIFT would also need updating
> as a partial revert of 21d9ee3eda7792c45880b2f11bff8e95c9a061fb but it
> can be done.
> 

More importantly, _PAGE_BIT_GLOBAL+1 == the special PTE bit, so just
updating the value should crash. For the purposes of testing the idea, I
thought the straightforward option was to break soft-dirty page tracking
and steal its bit for testing (patch below). It took most of the day to
get access to the test machine, so the tests are not long-running and only
the autonuma one has completed:

autonumabench
                                              3.19.0             4.0.0-rc1             4.0.0-rc1             4.0.0-rc1
                                             vanilla               vanilla         slowscan-v2r7        protnone-v3
Time User-NUMA01                  25695.96 (  0.00%)    32883.59 (-27.97%)    35288.00 (-37.33%)    35236.21 (-37.13%)
Time User-NUMA01_THEADLOCAL       17404.36 (  0.00%)    17453.20 ( -0.28%)    17765.79 ( -2.08%)    17590.10 ( -1.07%)
Time User-NUMA02                   2037.65 (  0.00%)     2063.70 ( -1.28%)     2063.22 ( -1.25%)     2072.95 ( -1.73%)
Time User-NUMA02_SMT                981.02 (  0.00%)      983.70 ( -0.27%)      976.01 (  0.51%)      983.42 ( -0.24%)
Time System-NUMA01                  194.70 (  0.00%)      602.44 (-209.42%)      209.42 ( -7.56%)      737.36 (-278.72%)
Time System-NUMA01_THEADLOCAL        98.52 (  0.00%)       78.10 ( 20.73%)       92.70 (  5.91%)       80.69 ( 18.10%)
Time System-NUMA02                    9.28 (  0.00%)        6.47 ( 30.28%)        6.06 ( 34.70%)        6.63 ( 28.56%)
Time System-NUMA02_SMT                3.79 (  0.00%)        5.06 (-33.51%)        3.39 ( 10.55%)        3.60 (  5.01%)
Time Elapsed-NUMA01                 558.84 (  0.00%)      755.96 (-35.27%)      833.63 (-49.17%)      804.50 (-43.96%)
Time Elapsed-NUMA01_THEADLOCAL      382.54 (  0.00%)      382.22 (  0.08%)      395.45 ( -3.37%)      388.12 ( -1.46%)
Time Elapsed-NUMA02                  49.83 (  0.00%)       49.38 (  0.90%)       50.21 ( -0.76%)       48.99 (  1.69%)
Time Elapsed-NUMA02_SMT              46.59 (  0.00%)       47.70 ( -2.38%)       48.55 ( -4.21%)       49.50 ( -6.25%)
Time CPU-NUMA01                    4632.00 (  0.00%)     4429.00 (  4.38%)     4258.00 (  8.07%)     4471.00 (  3.48%)
Time CPU-NUMA01_THEADLOCAL         4575.00 (  0.00%)     4586.00 ( -0.24%)     4515.00 (  1.31%)     4552.00 (  0.50%)
Time CPU-NUMA02                    4107.00 (  0.00%)     4191.00 ( -2.05%)     4120.00 ( -0.32%)     4244.00 ( -3.34%)
Time CPU-NUMA02_SMT                2113.00 (  0.00%)     2072.00 (  1.94%)     2017.00 (  4.54%)     1993.00 (  5.68%)

                  3.19.0      4.0.0-rc1      4.0.0-rc1      4.0.0-rc1
                 vanilla        vanilla  slowscan-v2r7    protnone-v3
User           46119.12       53384.29       56093.11       55882.82
System           306.41         692.14         311.64         828.36
Elapsed         1039.88        1236.87        1328.61        1292.92

So just using a different bit does not appear to be the answer either.

                                3.19.0   4.0.0-rc1   4.0.0-rc1   4.0.0-rc1
                               vanilla     vanilla slowscan-v2r7 protnone-v3
NUMA alloc hit                 1202922     1437560     1472578     1499274
NUMA alloc miss                      0           0           0           0
NUMA interleave hit                  0           0           0           0
NUMA alloc local               1200683     1436781     1472226     1498680
NUMA base PTE updates        222840103   304513172   121532313   337431414
NUMA huge PMD updates           434894      594467      237170      658715
NUMA page range updates      445505831   608880276   242963353   674693494
NUMA hint faults                601358      733491      334334      820793
NUMA hint local faults          371571      511530      227171      565003
NUMA hint local percent             61          69          67          68
NUMA pages migrated            7073177    26366701     8607082    31288355

The patch to use a bit other than the global bit for PROT_NONE is below.

Comments

Mel Gorman March 10, 2015, 1:08 p.m. UTC | #1
On Mon, Mar 09, 2015 at 09:02:19PM +0000, Mel Gorman wrote:
> On Sun, Mar 08, 2015 at 08:40:25PM +0000, Mel Gorman wrote:
> > > Because if the answer is 'yes', then we can safely say: 'we regressed 
> > > performance because correctness [not dropping dirty bits] comes before 
> > > performance'.
> > > 
> > > If the answer is 'no', then we still have a mystery (and a regression) 
> > > to track down.
> > > 
> > > As a second hack (not to be applied), could we change:
> > > 
> > >  #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL
> > > 
> > > to:
> > > 
> > >  #define _PAGE_BIT_PROTNONE      (_PAGE_BIT_GLOBAL+1)
> > > 
> > 
> > In itself, that's not enough. The SWP_OFFSET_SHIFT would also need updating
> > as a partial revert of 21d9ee3eda7792c45880b2f11bff8e95c9a061fb but it
> > can be done.
> > 
> 
> More importantly, _PAGE_BIT_GLOBAL+1 == the special PTE bit, so just
> updating the value should crash. For the purposes of testing the idea, I
> thought the straightforward option was to break soft-dirty page tracking
> and steal its bit for testing (patch below). It took most of the day to
> get access to the test machine, so the tests are not long-running and only
> the autonuma one has completed:
> 
> 

The xfsrepair workload shows no benefit from using a different bit
either:

                                       3.19.0             4.0.0-rc1             4.0.0-rc1             4.0.0-rc1
                                      vanilla               vanilla         slowscan-v2r7        protnone-v3r17
Min      real-fsmark        1164.44 (  0.00%)     1157.41 (  0.60%)     1150.38 (  1.21%)     1173.22 ( -0.75%)
Min      syst-fsmark        4016.12 (  0.00%)     3998.06 (  0.45%)     3988.42 (  0.69%)     4037.90 ( -0.54%)
Min      real-xfsrepair      442.64 (  0.00%)      497.64 (-12.43%)      456.87 ( -3.21%)      489.60 (-10.61%)
Min      syst-xfsrepair      194.97 (  0.00%)      500.61 (-156.76%)      263.41 (-35.10%)      544.56 (-179.30%)
Amean    real-fsmark        1166.28 (  0.00%)     1166.63 ( -0.03%)     1155.97 (  0.88%)     1183.19 ( -1.45%)
Amean    syst-fsmark        4025.87 (  0.00%)     4020.94 (  0.12%)     4004.19 (  0.54%)     4061.64 ( -0.89%)
Amean    real-xfsrepair      447.66 (  0.00%)      507.85 (-13.45%)      459.58 ( -2.66%)      498.71 (-11.40%)
Amean    syst-xfsrepair      202.93 (  0.00%)      519.88 (-156.19%)      281.63 (-38.78%)      569.21 (-180.50%)
Stddev   real-fsmark           1.44 (  0.00%)        6.55 (-354.10%)        3.97 (-175.65%)        9.20 (-537.90%)
Stddev   syst-fsmark           9.76 (  0.00%)       16.22 (-66.27%)       15.09 (-54.69%)       17.47 (-79.13%)
Stddev   real-xfsrepair        5.57 (  0.00%)       11.17 (-100.68%)        3.41 ( 38.66%)        6.77 (-21.63%)
Stddev   syst-xfsrepair        5.69 (  0.00%)       13.98 (-145.78%)       19.94 (-250.49%)       20.03 (-252.05%)
CoeffVar real-fsmark           0.12 (  0.00%)        0.56 (-353.96%)        0.34 (-178.11%)        0.78 (-528.79%)
CoeffVar syst-fsmark           0.24 (  0.00%)        0.40 (-66.48%)        0.38 (-55.53%)        0.43 (-77.55%)
CoeffVar real-xfsrepair        1.24 (  0.00%)        2.20 (-76.89%)        0.74 ( 40.25%)        1.36 ( -9.17%)
CoeffVar syst-xfsrepair        2.80 (  0.00%)        2.69 (  4.06%)        7.08 (-152.54%)        3.52 (-25.51%)
Max      real-fsmark        1167.96 (  0.00%)     1171.98 ( -0.34%)     1159.25 (  0.75%)     1195.41 ( -2.35%)
Max      syst-fsmark        4039.20 (  0.00%)     4033.84 (  0.13%)     4024.53 (  0.36%)     4079.45 ( -1.00%)
Max      real-xfsrepair      455.42 (  0.00%)      523.40 (-14.93%)      464.40 ( -1.97%)      505.82 (-11.07%)
Max      syst-xfsrepair      207.94 (  0.00%)      533.37 (-156.50%)      309.38 (-48.78%)      593.62 (-185.48%)

Patch

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 8c7c10802e9c..1f243323693c 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -20,16 +20,16 @@ 
 #define _PAGE_BIT_SOFTW2	10	/* " */
 #define _PAGE_BIT_SOFTW3	11	/* " */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
-#define _PAGE_BIT_SPECIAL	_PAGE_BIT_SOFTW1
-#define _PAGE_BIT_CPA_TEST	_PAGE_BIT_SOFTW1
+#define _PAGE_BIT_SPECIAL	_PAGE_BIT_SOFTW3
+#define _PAGE_BIT_CPA_TEST	_PAGE_BIT_SOFTW3
 #define _PAGE_BIT_SPLITTING	_PAGE_BIT_SOFTW2 /* only valid on a PSE pmd */
-#define _PAGE_BIT_HIDDEN	_PAGE_BIT_SOFTW3 /* hidden by kmemcheck */
-#define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
+#define _PAGE_BIT_HIDDEN	_PAGE_BIT_SOFTW1 /* hidden by kmemcheck */
+#define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW1 /* software dirty tracking */
 #define _PAGE_BIT_NX           63       /* No execute: only valid after cpuid check */
 
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
-#define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
+#define _PAGE_BIT_PROTNONE	_PAGE_BIT_SOFTW1
 
 #define _PAGE_PRESENT	(_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
 #define _PAGE_RW	(_AT(pteval_t, 1) << _PAGE_BIT_RW)
@@ -98,8 +98,7 @@ 
 
 /* Set of bits not changed in pte_modify */
 #define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
-			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
-			 _PAGE_SOFT_DIRTY)
+			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
 
 /*