From patchwork Tue Jun 21 15:55:36 2011
From: Stefan Bader
Date: Tue, 21 Jun 2011 17:55:36 +0200
To: Ubuntu Kernel Team <kernel-team@lists.ubuntu.com>
Subject: [Hardy-xen] SRU: Fix OMG how did this ever work (32bit)
Message-ID: <4E00BEF8.5000001@canonical.com>

SRU Justification:

Impact: On i386, PGDs are kept on a linked list, and for this two
fields of struct page are (mis-)used: to form a backwards pointer, the
private field is assigned a pointer to the index field of the previous
struct page. The main problem was that the list_add and list_del
operations were accidentally done twice; after the first list
operation, the second one accesses innocent struct pages.

Fix: This is a bit more than needed to fix the bug itself, but it
brings our code into a shape that more closely resembles upstream
(strictly speaking there is only a 2.6.18 upstream, but that code did
not do the double list access).

Testcase: Running a 32-bit domU (on a 64-bit Hardy dom0, though that
should not matter) with the xen kernel and doing a lot of process
starts (as the aslr QA regression test does) would quite soon crash,
because the destructor of a PTE (which incidentally is stored in
index) was suddenly overwritten.

The patches are attached directly; they would go into the patches
directory. Note that I attributed the fix to bug 705562, which has not
yet been verified, but the failure mode was exactly the same.

-Stefan

Acked-by: Tim Gardner

From 41961f2ea525a5df9c256765c60cfe16d4b5e776 Mon Sep 17 00:00:00 2001
From: Stefan Bader
Date: Tue, 21 Jun 2011 13:08:25 +0200
Subject: [PATCH 2/2] xen: Avoid double list adds/frees of pgds

This patch also moves around some code to make the flow more obvious
and actually look like the current code in the xen repo. The major
change is that pgd_ctor will not add the pgd to the pgd_list when
compiled with PAE.
Similarly, pgd_dtor will not remove a pgd from that list. This was
already not done for shared KERNEL_PMDs, and in the other case it is
already done by pgd_alloc and pgd_free. Also, the call to
xen_create_contiguous_region is only done for PAE without
SHARED_KERNEL_PMD set, so the call to destroy the region should only
be done in that case, too.

BugLink: http://bugs.launchpad.net/bugs/705562

Signed-off-by: Stefan Bader
---
 arch/x86/mm/pgtable_32-xen.c |  162 ++++++++++++++++++++++-------------
 1 files changed, 87 insertions(+), 75 deletions(-)

diff --git a/arch/x86/mm/pgtable_32-xen.c b/arch/x86/mm/pgtable_32-xen.c
index 021e29e..03dd6ed 100644
--- a/arch/x86/mm/pgtable_32-xen.c
+++ b/arch/x86/mm/pgtable_32-xen.c
@@ -254,65 +254,62 @@ static inline void pgd_list_del(pgd_t *pgd)
 }
 
-
-#if (PTRS_PER_PMD == 1)
-/* Non-PAE pgd constructor */
+/*
+ * The constructors for the PAE and non-PAE case have been combined now.
+ * In case of PAE with non-shared kernel PMD it was wrong to add the pgd
+ * to the pgd_list (already done in pgd_alloc) and setting the pointers
+ * to NULL is also unnecessary (but not fatal) as pgd_alloc will either
+ * freshly set all of them or be aware of partially set pointers in the
+ * OOM case.
+ */
 static void pgd_ctor(void *pgd)
 {
-	unsigned long flags;
-
-	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
-
-	spin_lock_irqsave(&pgd_lock, flags);
-
-	/* must happen under lock */
-	clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
-			swapper_pg_dir + USER_PTRS_PER_PGD,
-			KERNEL_PGD_PTRS);
+	if (PTRS_PER_PMD > 1) {
+		/* PAE, kernel PMD may be shared */
+		if (SHARED_KERNEL_PMD)
+			clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
+					swapper_pg_dir + USER_PTRS_PER_PGD,
+					KERNEL_PGD_PTRS);
+	} else { /* NON-PAE case */
+		unsigned long flags;
 
-	paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
-				__pa(swapper_pg_dir) >> PAGE_SHIFT,
-				USER_PTRS_PER_PGD,
-				KERNEL_PGD_PTRS);
-	pgd_list_add(pgd);
-	spin_unlock_irqrestore(&pgd_lock, flags);
-}
-#else  /* PTRS_PER_PMD > 1 */
-/* PAE pgd constructor */
-static void pgd_ctor(void *pgd)
-{
-	/* PAE, kernel PMD may be shared */
+		spin_lock_irqsave(&pgd_lock, flags);
+		memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 
-	if (SHARED_KERNEL_PMD) {
+		/* must happen under lock */
 		clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
 				swapper_pg_dir + USER_PTRS_PER_PGD,
 				KERNEL_PGD_PTRS);
-#ifndef CONFIG_XEN
-	} else {
-		unsigned long flags;
 
-		memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
-		spin_lock_irqsave(&pgd_lock, flags);
+		paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
+					__pa(swapper_pg_dir) >> PAGE_SHIFT,
+					USER_PTRS_PER_PGD,
+					KERNEL_PGD_PTRS);
 		pgd_list_add(pgd);
 		spin_unlock_irqrestore(&pgd_lock, flags);
-#endif
 	}
 }
-#endif	/* PTRS_PER_PMD */
 
 static void pgd_dtor(void *pgd)
 {
-	unsigned long flags; /* can be called from interrupt context */
-
-	if (SHARED_KERNEL_PMD)
-		return;
+	/*
+	 * In the upstream code it is claimed to be only called for NON-PAE.
+	 * The call for pgd_test_and_unpin was probably duplicate already
+	 * as it is called in pgd_free. Unlinking from the pgd_list is
+	 * definitely wrong here for NON-PAE. Either it was never added
+	 * (if SHARED_KERNEL_PMD is defined) or has already been taken off
+	 * the list.
+	 */
+	if (PTRS_PER_PMD == 1) {
+		unsigned long flags; /* can be called from interrupt context */
 
-	paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
-	spin_lock_irqsave(&pgd_lock, flags);
-	pgd_list_del(pgd);
-	spin_unlock_irqrestore(&pgd_lock, flags);
+		paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
+		spin_lock_irqsave(&pgd_lock, flags);
+		pgd_list_del(pgd);
+		spin_unlock_irqrestore(&pgd_lock, flags);
 
-	pgd_test_and_unpin(pgd);
+		pgd_test_and_unpin(pgd);
+	}
 }
 
 #define UNSHARED_PTRS_PER_PGD \
@@ -354,7 +351,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	int i;
 	pgd_t *pgd = quicklist_alloc(0, GFP_KERNEL, pgd_ctor);
-	pmd_t **pmds = NULL;
+	pmd_t **pmds;
 	unsigned long flags;
 
 	if (!pgd)
@@ -366,39 +363,46 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 		return pgd;
 
 #ifdef CONFIG_XEN
-	if (!SHARED_KERNEL_PMD) {
-		/*
-		 * We can race save/restore (if we sleep during a GFP_KERNEL memory
-		 * allocation). We therefore store virtual addresses of pmds as they
-		 * do not change across save/restore, and poke the machine addresses
-		 * into the pgdir under the pgd_lock.
-		 */
-		pmds = kmalloc(PTRS_PER_PGD * sizeof(pmd_t *), GFP_KERNEL);
-		if (!pmds) {
-			quicklist_free(0, pgd_dtor, pgd);
-			return NULL;
+	/*
+	 * Take care of setting the non-shared pointers here. This duplicates
+	 * a bit of code but allows to early exit for the shared-case.
+	 */
+	if (SHARED_KERNEL_PMD) {
+		for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
+			pmd_t *pmd = pmd_cache_alloc(i);
+			if (!pmd)
+				goto out_oom;
+
+			paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT);
+			set_pgd(&pgd[i], __pgd(1 + __pa(pmd)));
 		}
+		return pgd;
+	}
+
+	/*
+	 * We can race save/restore (if we sleep during a GFP_KERNEL memory
+	 * allocation). We therefore store virtual addresses of pmds as they
+	 * do not change across save/restore, and poke the machine addresses
+	 * into the pgdir under the pgd_lock.
+	 */
+	pmds = kmalloc(PTRS_PER_PGD * sizeof(pmd_t *), GFP_KERNEL);
+	if (!pmds) {
+		quicklist_free(0, pgd_dtor, pgd);
+		return NULL;
 	}
 #endif
 
 	/* Allocate pmds, remember virtual addresses. */
-	for (i = 0; i < UNSHARED_PTRS_PER_PGD; ++i) {
-		pmd_t *pmd = pmd_cache_alloc(i);
+	for (i = 0; i < PTRS_PER_PGD; ++i) {
+		pmds[i] = pmd_cache_alloc(i);
 
-		if (!pmd)
+		if (!pmds[i])
 			goto out_oom;
 
-		paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT);
-		if (pmds)
-			pmds[i] = pmd;
-		else
-			set_pgd(&pgd[i], __pgd(1 + __pa(pmd)));
+		paravirt_alloc_pd(__pa(pmds[i]) >> PAGE_SHIFT);
 	}
 
 #ifdef CONFIG_XEN
-	if (SHARED_KERNEL_PMD)
-		return pgd;
-
 	spin_lock_irqsave(&pgd_lock, flags);
 
 	/* Protect against save/restore: move below 4GB under pgd_lock. */
@@ -437,7 +441,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	return pgd;
 
 out_oom:
-	if (!pmds) {
+	if (SHARED_KERNEL_PMD) {
 		for (i--; i >= 0; i--) {
 			pgd_t pgdent = pgd[i];
 			void* pmd = (void *)__va(pgd_val(pgdent)-1);
@@ -471,22 +475,30 @@ void pgd_free(pgd_t *pgd)
 
 	/* in the PAE case user pgd entries are overwritten before usage */
 	if (PTRS_PER_PMD > 1) {
+		for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
+			pgd_t pgdent = pgd[i];
+			void* pmd = (void *)__va(pgd_val(pgdent)-1);
+			paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
+			pmd_cache_free(pmd, i);
+		}
+
 		if (!SHARED_KERNEL_PMD) {
 			unsigned long flags;
 			spin_lock_irqsave(&pgd_lock, flags);
 			pgd_list_del(pgd);
 			spin_unlock_irqrestore(&pgd_lock, flags);
-		}
 
-		for (i = 0; i < UNSHARED_PTRS_PER_PGD; ++i) {
-			pgd_t pgdent = pgd[i];
-			void* pmd = (void *)__va(pgd_val(pgdent)-1);
-			paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
-			pmd_cache_free(pmd, i);
-		}
+			for (i = USER_PTRS_PER_PGD; i < PTRS_PER_PGD; ++i) {
+				pgd_t pgdent = pgd[i];
+				void* pmd = (void *)__va(pgd_val(pgdent)-1);
+				paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
+				pmd_cache_free(pmd, i);
+			}
 
-		if (!xen_feature(XENFEAT_pae_pgdir_above_4gb))
-			xen_destroy_contiguous_region((unsigned long)pgd, 0);
+			if (!xen_feature(XENFEAT_pae_pgdir_above_4gb))
+				xen_destroy_contiguous_region(
+					(unsigned long)pgd, 0);
+		}
 	}
 
 	/* in the non-PAE case, free_pgtables() clears user pgd entries */
-- 
1.7.4.1
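
To make the failure mode described in the justification concrete, here
is a minimal userspace C sketch of the list scheme: each page chains
forward through its index field, while private holds the address of
whatever pointer currently references the page (the global list head
or a predecessor's index field). The fake_page struct, its field
layout, and the 0xdeadbeef stand-in for the reused index field are
illustrative assumptions, not the kernel's definitions; only the shape
of the pgd_list_add/pgd_list_del helpers mirrors the scheme the patch
works around. A second pgd_list_del of an already unlinked entry
writes through its stale private pointer and clobbers a field of an
uninvolved page:

#include <stdio.h>

/* Stand-ins for the two struct page fields that the real code (mis-)uses. */
struct fake_page {
	unsigned long index;	/* forward pointer, later reused storage */
	unsigned long private;	/* address of the pointer that references us */
};

static struct fake_page *pgd_list;	/* list head */

static void pgd_list_add(struct fake_page *page)
{
	page->index = (unsigned long)pgd_list;
	if (pgd_list)
		pgd_list->private = (unsigned long)&page->index;
	pgd_list = page;
	page->private = (unsigned long)&pgd_list;
}

static void pgd_list_del(struct fake_page *page)
{
	struct fake_page *next = (struct fake_page *)page->index;
	/* private is reinterpreted as "pointer to the pointer to us",
	 * the same type-punning trick the kernel scheme relies on */
	struct fake_page **pprev = (struct fake_page **)page->private;

	*pprev = next;		/* on a double del this pointer is stale */
	if (next)
		next->private = (unsigned long)pprev;
}

int main(void)
{
	struct fake_page a = {0, 0}, b = {0, 0}, c = {0, 0};

	pgd_list_add(&a);
	pgd_list_add(&b);
	pgd_list_add(&c);	/* list: c -> b -> a, so b.private == &c.index */

	pgd_list_del(&b);	/* correct unlink, list: c -> a */
	pgd_list_del(&c);	/* correct unlink, list: a */

	c.index = 0xdeadbeef;	/* field reused, as with the PTE destructor */

	/*
	 * Second (bogus) del of b: its stale private still points at
	 * c.index, so the unlink scribbles over the reused field.
	 */
	pgd_list_del(&b);
	printf("c.index after double del: %#lx\n", c.index);
	return 0;
}

The patch takes the approach of making each pgd pass through add and
del exactly once, rather than hardening the helpers: in the PAE case
pgd_ctor and pgd_dtor no longer touch the list, leaving pgd_alloc and
pgd_free as the sole owners of the list operations.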