[01/14] mm: Serialize access to min_free_kbytes

Message ID 1308575540-25219-2-git-send-email-mgorman@suse.de
State RFC, archived
Delegated to: David Miller

Commit Message

Mel Gorman June 20, 2011, 1:12 p.m. UTC
There is a race between the min_free_kbytes sysctl, memory hotplug
and transparent hugepage support enablement.  Memory hotplug uses a
zonelists_mutex to avoid a race when building zonelists. Reuse it to
serialise watermark updates.

[a.p.zijlstra@chello.nl: Older patch fixed the race with spinlock]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

Comments

Andrew Morton July 6, 2011, 11:44 p.m. UTC | #1
On Mon, 20 Jun 2011 14:12:07 +0100
Mel Gorman <mgorman@suse.de> wrote:

> There is a race between the min_free_kbytes sysctl, memory hotplug
> and transparent hugepage support enablement.  Memory hotplug uses a
> zonelists_mutex to avoid a race when building zonelists. Reuse it to
> serialise watermark updates.

This patch appears to be a standalone fix, unrelated to the overall
patch series?

How does one trigger the race and what happens when it hits, btw?
Mel Gorman July 7, 2011, 9:17 a.m. UTC | #2
On Wed, Jul 06, 2011 at 04:44:47PM -0700, Andrew Morton wrote:
> On Mon, 20 Jun 2011 14:12:07 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > There is a race between the min_free_kbytes sysctl, memory hotplug
> > and transparent hugepage support enablement.  Memory hotplug uses a
> > zonelists_mutex to avoid a race when building zonelists. Reuse it to
> > serialise watermark updates.
> 
> This patch appears to be a standalone fix, unrelated to the overall
> patch series?
> 

Yes. In the original series this would have been a more serious problem
as min_free_kbytes was potentially adjusted more frequently.

> How does one trigger the race and what happens when it hits, btw?

One could trigger the race by having multiple processes on different
CPUs write to min_free_kbytes. One could add memory hotplug events
to the mix for extra fun, but they are unnecessary to trigger the race.
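
As an illustration of that kind of trigger, a minimal user-space sketch
might look like the following. The process count, the values written and
the iteration count are arbitrary choices, and it must run as root since
it writes to /proc/sys/vm/min_free_kbytes:

/* race-min-free-kbytes.c: hammer the min_free_kbytes sysctl from
 * several processes at once.  Values are arbitrary; run as root. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NPROC	4
#define ITERS	10000

static void hammer(int id)
{
	char buf[32];
	int i, fd;

	for (i = 0; i < ITERS; i++) {
		fd = open("/proc/sys/vm/min_free_kbytes", O_WRONLY);
		if (fd < 0) {
			perror("open");
			exit(1);
		}
		/* alternate between two plausible values per writer */
		snprintf(buf, sizeof(buf), "%d\n",
			 65536 + ((id + i) % 2) * 1024);
		if (write(fd, buf, strlen(buf)) < 0)
			perror("write");
		close(fd);
	}
}

int main(void)
{
	int i;

	for (i = 0; i < NPROC; i++) {
		pid_t pid = fork();

		if (pid == 0) {
			hammer(i);
			return 0;
		}
	}
	for (i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}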

The consequence is that the value of min_free_kbytes and the zone
watermarks get out of sync. Whether the zone watermarks end up too
high or too low depends on the timing. For the most part, the result
is simply that the min free level for some zones will be wrong. A more
serious consequence is that totalreserve_pages could get out of sync,
in which case strict no-overcommit mode could fail an mmap() that
should have succeeded for the configured min_free_kbytes, or suspend
could fail because it did not preallocate enough pages.

It's not exactly earth shattering.
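
As a rough way to observe the skew described above (a diagnostic sketch,
not part of the patch or this thread): on a machine without highmem, the
sum of the per-zone min watermarks reported in /proc/zoneinfo should
track min_free_kbytes closely, so a persistent mismatch after concurrent
sysctl writes would suggest the watermarks were left stale:

/* check-wmarks.c: compare min_free_kbytes with the sum of the
 * per-zone "min" watermarks from /proc/zoneinfo.  Rough check only;
 * the two are only expected to be approximately equal, and only on
 * machines without highmem. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	FILE *f;
	char line[256];
	long min_free_kbytes = 0, min_pages = 0;
	long page_kb = sysconf(_SC_PAGESIZE) / 1024;

	f = fopen("/proc/sys/vm/min_free_kbytes", "r");
	if (!f || fscanf(f, "%ld", &min_free_kbytes) != 1) {
		perror("min_free_kbytes");
		return 1;
	}
	fclose(f);

	f = fopen("/proc/zoneinfo", "r");
	if (!f) {
		perror("zoneinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		long val;

		/* accumulate the "min" line of each zone (in pages) */
		if (sscanf(line, " min %ld", &val) == 1)
			min_pages += val;
	}
	fclose(f);

	printf("min_free_kbytes:            %ld kB\n", min_free_kbytes);
	printf("sum of zone min watermarks: %ld kB\n", min_pages * page_kb);
	return 0;
}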

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8985a..a327a72 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5045,14 +5045,7 @@  static void setup_per_zone_lowmem_reserve(void)
 	calculate_totalreserve_pages();
 }
 
-/**
- * setup_per_zone_wmarks - called when min_free_kbytes changes
- * or when memory is hot-{added|removed}
- *
- * Ensures that the watermark[min,low,high] values for each zone are set
- * correctly with respect to min_free_kbytes.
- */
-void setup_per_zone_wmarks(void)
+static void __setup_per_zone_wmarks(void)
 {
 	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
 	unsigned long lowmem_pages = 0;
@@ -5107,6 +5100,20 @@  void setup_per_zone_wmarks(void)
 	calculate_totalreserve_pages();
 }
 
+/**
+ * setup_per_zone_wmarks - called when min_free_kbytes changes
+ * or when memory is hot-{added|removed}
+ *
+ * Ensures that the watermark[min,low,high] values for each zone are set
+ * correctly with respect to min_free_kbytes.
+ */
+void setup_per_zone_wmarks(void)
+{
+	mutex_lock(&zonelists_mutex);
+	__setup_per_zone_wmarks();
+	mutex_unlock(&zonelists_mutex);
+}
+
 /*
  * The inactive anon list should be small enough that the VM never has to
  * do too much work, but large enough that each inactive page has a chance