
mm/shmem.c: fix division by zero

Message ID 200812190844.57996.yur@emcraft.com (mailing list archive)
State Not Applicable, archived

Commit Message

Yuri Tikhonov Dec. 19, 2008, 5:44 a.m. UTC
The following patch fixes a division by zero which occurs in
shmem_truncate_range() and shmem_unuse_inode() when big
PAGE_SIZE values are used (e.g. 256KB on ppc44x).

With a 256KB PAGE_SIZE the ENTRIES_PER_PAGEPAGE constant becomes
too large to fit in 32 bits (0x1.0000.0000), so this patch just
changes the types from 'unsigned long' to 'unsigned long long'
where necessary.

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
---
 mm/shmem.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)
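
[Editor's note: for context, a minimal user-space sketch of the overflow;
this is a hypothetical stand-alone demo, not kernel code, with the ppc44x
values taken from the commit message and uint32_t standing in for the
32-bit unsigned long.]

#include <stdio.h>
#include <inttypes.h>

/* Assumed ppc44x configuration: 256KB pages, 4-byte unsigned long. */
#define PAGE_CACHE_SIZE  (256u * 1024u)
#define ENTRIES_PER_PAGE (PAGE_CACHE_SIZE / 4u)	/* 65536 */

int main(void)
{
	/* 65536 * 65536 = 0x1.0000.0000 does not fit in 32 bits and wraps
	 * to 0, so ENTRIES_PER_PAGEPAGE becomes 0 and any later division
	 * by it is a division by zero. */
	uint32_t wrapped = (uint32_t)(ENTRIES_PER_PAGE * ENTRIES_PER_PAGE);
	uint64_t widened = (uint64_t)ENTRIES_PER_PAGE * ENTRIES_PER_PAGE;

	printf("32-bit product: %" PRIu32 "\n", wrapped); /* prints 0 */
	printf("64-bit product: %" PRIu64 "\n", widened); /* 4294967296 */
	return 0;
}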

Comments

Andrew Morton Dec. 19, 2008, 9:24 p.m. UTC | #1
On Fri, 19 Dec 2008 08:44:57 +0300
Yuri Tikhonov <yur@emcraft.com> wrote:

> 
> The following patch fixes a division by zero which occurs in
> shmem_truncate_range() and shmem_unuse_inode() when big
> PAGE_SIZE values are used (e.g. 256KB on ppc44x).
> 
> With a 256KB PAGE_SIZE the ENTRIES_PER_PAGEPAGE constant becomes
> too large to fit in 32 bits (0x1.0000.0000), so this patch just
> changes the types from 'unsigned long' to 'unsigned long long'
> where necessary.
> 
> Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
> ---
>  mm/shmem.c |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 0ed0752..99d7c91 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -57,7 +57,7 @@
>  #include <asm/pgtable.h>
>  
>  #define ENTRIES_PER_PAGE (PAGE_CACHE_SIZE/sizeof(unsigned long))
> -#define ENTRIES_PER_PAGEPAGE (ENTRIES_PER_PAGE*ENTRIES_PER_PAGE)
> +#define ENTRIES_PER_PAGEPAGE ((unsigned long long)ENTRIES_PER_PAGE*ENTRIES_PER_PAGE)
>  #define BLOCKS_PER_PAGE  (PAGE_CACHE_SIZE/512)
>  
>  #define SHMEM_MAX_INDEX  (SHMEM_NR_DIRECT + (ENTRIES_PER_PAGEPAGE/2) * (ENTRIES_PER_PAGE+1))
> @@ -95,7 +95,7 @@ static unsigned long shmem_default_max_inodes(void)
>  }
>  #endif
>  
> -static int shmem_getpage(struct inode *inode, unsigned long idx,
> +static int shmem_getpage(struct inode *inode, unsigned long long idx,
>  			 struct page **pagep, enum sgp_type sgp, int *type);
>  
>  static inline struct page *shmem_dir_alloc(gfp_t gfp_mask)
> @@ -533,7 +533,7 @@ static void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end)
>  	int punch_hole;
>  	spinlock_t *needs_lock;
>  	spinlock_t *punch_lock;
> -	unsigned long upper_limit;
> +	unsigned long long upper_limit;
>  
>  	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
>  	idx = (start + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> @@ -1175,7 +1175,7 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
>   * vm. If we swap it in we mark it dirty since we also free the swap
>   * entry since a page cannot live in both the swap and page cache
>   */
> -static int shmem_getpage(struct inode *inode, unsigned long idx,
> +static int shmem_getpage(struct inode *inode, unsigned long long idx,
>  			struct page **pagep, enum sgp_type sgp, int *type)
>  {
>  	struct address_space *mapping = inode->i_mapping;

umm, ok, maybe.  I guess that it would be more appropriate to use
loff_t in here, and things like

	if (!diroff && !offset && upper_limit >= stage) {

need a bit of thought (`stage' is still unsigned long).

I'll queue the patch as-is pending review from Hugh.  Thanks.
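
[Editor's note: a hypothetical stand-alone sketch of Andrew's concern,
with made-up values not taken from the patch. Comparing against a
widened variable is safe, since the unsigned long operand is converted
losslessly, but on a 32-bit build an assignment back into an unsigned
long silently drops the upper bits.]

#include <stdio.h>

int main(void)
{
	unsigned long long upper_limit = 0x100000001ULL; /* needs 33 bits */
	unsigned long stage = 0x10000UL;
	unsigned long narrowed;

	/* Safe: `stage' is promoted to unsigned long long for the compare. */
	if (upper_limit >= stage)
		printf("comparison works as expected\n");

	/* Unsafe on a 32-bit build: the value is truncated to 1. */
	narrowed = upper_limit;
	printf("narrowed = %lu\n", narrowed);
	return 0;
}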
Hugh Dickins Dec. 23, 2008, 12:35 a.m. UTC | #2
On Fri, 19 Dec 2008, Yuri Tikhonov wrote:
> 
> The following patch fixes a division by zero which occurs in
> shmem_truncate_range() and shmem_unuse_inode() when big
> PAGE_SIZE values are used (e.g. 256KB on ppc44x).
> 
> With a 256KB PAGE_SIZE the ENTRIES_PER_PAGEPAGE constant becomes
> too large to fit in 32 bits (0x1.0000.0000), so this patch just
> changes the types from 'unsigned long' to 'unsigned long long'
> where necessary.
> 
> Signed-off-by: Yuri Tikhonov <yur@emcraft.com>

Sorry for the slow reply, but I'm afraid I don't like spattering
around an increasing number of unsigned long longs to fix that division
by zero on an unusual configuration: I doubt that's the right solution.

It's ridiculous for shmem.c to be trying to support a wider address
range than the page cache itself can support, and it's wasteful for
it to be using 256KB pages for its index blocks (not to mention its
data blocks! but we'll agree to differ on that).

Maybe it should be doing a kmalloc(4096) instead of using alloc_pages();
though admittedly that's not a straightforward change, since we do make
use of highmem and page->private.  Maybe I should use this as stimulus
to switch shmem over to storing its swap entries in the pagecache radix
tree.  Maybe we should simply disable its use of swap in such an
extreme configuration.

But I need to understand more about your ppc44x target to make
the right decision.  What's very strange to me is this: since
unsigned long long is the same size as unsigned long on 64-bit,
this change appears to be for a 32-bit machine with 256KB pages.
I wonder what market segment that is targeted at?

Hugh
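
[Editor's note: a back-of-the-envelope check of Hugh's point, as a
hypothetical stand-alone sketch using the constants from the patch;
it assumes a 32-bit unsigned long / pgoff_t and ignores the small
SHMEM_NR_DIRECT term. shmem's index limit comes out around 2^47,
far beyond the ~2^32 pages the page cache itself can index.]

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
	/* 256KB page / 4-byte unsigned long => 65536 entries per index page. */
	uint64_t entries_per_page = (256 * 1024) / 4;
	uint64_t entries_per_pagepage = entries_per_page * entries_per_page;

	/* SHMEM_MAX_INDEX ~= (ENTRIES_PER_PAGEPAGE/2) * (ENTRIES_PER_PAGE+1) */
	uint64_t shmem_max_index =
		(entries_per_pagepage / 2) * (entries_per_page + 1);

	printf("shmem max index: %" PRIu64 "\n", shmem_max_index);       /* ~1.4e14 */
	printf("pgoff_t limit:   %" PRIu64 "\n", (uint64_t)1 << 32);     /* ~4.3e9  */
	return 0;
}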
Yuri Tikhonov Dec. 23, 2008, 2:52 p.m. UTC | #3
Hello Hugh,

On Tuesday, December 23, 2008 you wrote:

> On Fri, 19 Dec 2008, Yuri Tikhonov wrote:
>> 
>> The following patch fixes a division by zero which occurs in
>> shmem_truncate_range() and shmem_unuse_inode() when big
>> PAGE_SIZE values are used (e.g. 256KB on ppc44x).
>> 
>> With a 256KB PAGE_SIZE the ENTRIES_PER_PAGEPAGE constant becomes
>> too large to fit in 32 bits (0x1.0000.0000), so this patch just
>> changes the types from 'unsigned long' to 'unsigned long long'
>> where necessary.
>> 
>> Signed-off-by: Yuri Tikhonov <yur@emcraft.com>

> Sorry for the slow reply, but I'm afraid I don't like spattering
> around an increasing number of unsigned long longs to fix that division
> by zero on an unusual configuration: I doubt that's the right solution.

> It's ridiculous for shmem.c to be trying to support a wider address
> range than the page cache itself can support, and it's wasteful for
> it to be using 256KB pages for its index blocks (not to mention its
> data blocks! but we'll agree to differ on that).

> Maybe it should be doing a kmalloc(4096) instead of using alloc_pages();
> though admittedly that's not a straightforward change, since we do make
> use of highmem and page->private.  Maybe I should use this as stimulus
> to switch shmem over to storing its swap entries in the pagecache radix
> tree.  Maybe we should simply disable its use of swap in such an
> extreme configuration.

> But I need to understand more about your ppc44x target to make
> the right decision.  What's very strange to me is this: since
> unsigned long long is the same size as unsigned long on 64-bit,
> this change appears to be for a 32-bit machine with 256KB pages.
> I wonder what market segment that is targeted at?

 Right, sizeof(unsigned long long)==8 on our ppc44x target.

 The main processor here is a PPC440SPe from AMCC, which is a 32-bit 
RISC machine with 36-bit physical addressing. 

 The market segment for this is RAID applications. The Linux s/w RAID
driver has been significantly reworked over the last few years, and it
now allows the RAID-related operations (as well as the data copies) to
be offloaded efficiently from the CPU to dedicated engines via the
ASYNC_TX/ADMA API. The 440SPe controller has a rich set of RAID-related
peripherals integrated on chip: an XOR engine and two DMA engines with
different capabilities, including XOR calculation/checking for RAID5/6,
PQ parity calculation/checking for RAID6, memory copy, and so on. All
of this makes the 440SPe a good choice for developing RAID storage
applications.

 By increasing the PAGE_SIZE we improve the performance of RAID
operations, because the RAID stripes (on which the Linux RAID driver
operates) are PAGE_SIZE wide: the bigger the stripe, the fewer CPU
cycles are needed to process the same amount of data. The size of the
improvement differs from case to case, and is greatest for workloads
such as sequential writes.

 For example, on the ppc440spe-based Katmai board we observe the
following performance for sequential writes to a RAID-5 array built
on 16 drives (we can actually achieve higher performance by skipping
RAID caching of the data; the figures below were measured with RAID
caching enabled):

  4K PAGE_SIZE:   s/w:  84 MBps;  h/w accelerated: 172 MBps
 16K PAGE_SIZE:   s/w: 123 MBps;  h/w accelerated: 361 MBps
 64K PAGE_SIZE:   s/w: 125 MBps;  h/w accelerated: 409 MBps
256K PAGE_SIZE:   s/w: 132 MBps;  h/w accelerated: 473 MBps

 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com

Patch

diff --git a/mm/shmem.c b/mm/shmem.c
index 0ed0752..99d7c91 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -57,7 +57,7 @@ 
 #include <asm/pgtable.h>
 
 #define ENTRIES_PER_PAGE (PAGE_CACHE_SIZE/sizeof(unsigned long))
-#define ENTRIES_PER_PAGEPAGE (ENTRIES_PER_PAGE*ENTRIES_PER_PAGE)
+#define ENTRIES_PER_PAGEPAGE ((unsigned long long)ENTRIES_PER_PAGE*ENTRIES_PER_PAGE)
 #define BLOCKS_PER_PAGE  (PAGE_CACHE_SIZE/512)
 
 #define SHMEM_MAX_INDEX  (SHMEM_NR_DIRECT + (ENTRIES_PER_PAGEPAGE/2) * (ENTRIES_PER_PAGE+1))
@@ -95,7 +95,7 @@  static unsigned long shmem_default_max_inodes(void)
 }
 #endif
 
-static int shmem_getpage(struct inode *inode, unsigned long idx,
+static int shmem_getpage(struct inode *inode, unsigned long long idx,
 			 struct page **pagep, enum sgp_type sgp, int *type);
 
 static inline struct page *shmem_dir_alloc(gfp_t gfp_mask)
@@ -533,7 +533,7 @@  static void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end)
 	int punch_hole;
 	spinlock_t *needs_lock;
 	spinlock_t *punch_lock;
-	unsigned long upper_limit;
+	unsigned long long upper_limit;
 
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 	idx = (start + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
@@ -1175,7 +1175,7 @@  static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
  * vm. If we swap it in we mark it dirty since we also free the swap
  * entry since a page cannot live in both the swap and page cache
  */
-static int shmem_getpage(struct inode *inode, unsigned long idx,
+static int shmem_getpage(struct inode *inode, unsigned long long idx,
 			struct page **pagep, enum sgp_type sgp, int *type)
 {
 	struct address_space *mapping = inode->i_mapping;