[v3] lib/bch: Remove VLA usage

Message ID 20180531184525.GA11068@beast
State Accepted
Delegated to: Boris Brezillon
Headers show
Series
  • [v3] lib/bch: Remove VLA usage
Related show

Commit Message

Kees Cook May 31, 2018, 6:45 p.m.
In the quest to remove all stack VLA usage from the kernel[1], this
allocates a fixed size stack array to cover the range needed for
bch. This was done instead of a preallocation on the SLAB due to
performance reasons, shown by Ivan Djelic:

 little-endian, type sizes: int=4 long=8 longlong=8
 cpu: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
 calibration: iter=4.9143µs niter=2034 nsamples=200 m=13 t=4

   Buffer allocation |  Encoding throughput (Mbit/s)
 ---------------------------------------------------
  on-stack, VLA      |   3988
  on-stack, fixed    |   4494
  kmalloc            |   1967

So this change actually improves performance too, it seems.

The resulting stack allocation can get rather large; without
CONFIG_BCH_CONST_PARAMS, it will allocate 4096 bytes, which
trips the stack size checking:

lib/bch.c: In function ‘encode_bch’:
lib/bch.c:261:1: warning: the frame size of 4432 bytes is larger than 2048 bytes [-Wframe-larger-than=]

Even the default case for "allmodconfig" (with CONFIG_BCH_CONST_M=14 and
CONFIG_BCH_CONST_T=4) would have started throwing a warning:

lib/bch.c: In function ‘encode_bch’:
lib/bch.c:261:1: warning: the frame size of 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]

But this is how large it's always been; it was just hidden from
the checker because it was a VLA. So the Makefile has been adjusted to
silence this warning for anything smaller than 4500 bytes, which should
provide room for normal cases, but still low enough to catch any future
pathological situations.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
---
v3: fix r_bytes to whole-word size
v2: switch to fixed-size stack array
---
 lib/Makefile |  1 +
 lib/bch.c    | 23 +++++++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

Comments

Ivan Djelic June 1, 2018, 11:09 a.m. | #1
On Thu, May 31, 2018 at 11:45:25AM -0700, Kees Cook wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> allocates a fixed size stack array to cover the range needed for
> bch. This was done instead of a preallocation on the SLAB due to
> performance reasons, shown by Ivan Djelic:
> 
>  little-endian, type sizes: int=4 long=8 longlong=8
>  cpu: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
>  calibration: iter=4.9143µs niter=2034 nsamples=200 m=13 t=4
> 
>    Buffer allocation |  Encoding throughput (Mbit/s)
>  ---------------------------------------------------
>   on-stack, VLA      |   3988
>   on-stack, fixed    |   4494
>   kmalloc            |   1967
> 
> So this change actually improves performance too, it seems.
> 
> The resulting stack allocation can get rather large; without
> CONFIG_BCH_CONST_PARAMS, it will allocate 4096 bytes, which
> trips the stack size checking:
> 
> lib/bch.c: In function ‘encode_bch’:
> lib/bch.c:261:1: warning: the frame size of 4432 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> 
> Even the default case for "allmodconfig" (with CONFIG_BCH_CONST_M=14 and
> CONFIG_BCH_CONST_T=4) would have started throwing a warning:
> 
> lib/bch.c: In function ‘encode_bch’:
> lib/bch.c:261:1: warning: the frame size of 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> 
> But this is how large it's always been; it was just hidden from
> the checker because it was a VLA. So the Makefile has been adjusted to
> silence this warning for anything smaller than 4500 bytes, which should
> provide room for normal cases, but still low enough to catch any future
> pathological situations.
> 
> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v3: fix r_bytes to whole-word size
> v2: switch to fixed-size stack array
> ---
>  lib/Makefile |  1 +
>  lib/bch.c    | 23 +++++++++++++++--------
>  2 files changed, 16 insertions(+), 8 deletions(-)
 

The patch looks good to me. It also passed my regression tests.

Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
Tested-by: Ivan Djelic <ivan.djelic@parrot.com>

Thanks,
--
Ivan
Kees Cook June 19, 2018, 11:48 p.m. | #2
On Fri, Jun 1, 2018 at 4:09 AM, Ivan Djelic <ivan.djelic@parrot.com> wrote:
> On Thu, May 31, 2018 at 11:45:25AM -0700, Kees Cook wrote:
>> In the quest to remove all stack VLA usage from the kernel[1], this
>> allocates a fixed size stack array to cover the range needed for
>> bch. This was done instead of a preallocation on the SLAB due to
>> performance reasons, shown by Ivan Djelic:
>>
>>  little-endian, type sizes: int=4 long=8 longlong=8
>>  cpu: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
>>  calibration: iter=4.9143µs niter=2034 nsamples=200 m=13 t=4
>>
>>    Buffer allocation |  Encoding throughput (Mbit/s)
>>  ---------------------------------------------------
>>   on-stack, VLA      |   3988
>>   on-stack, fixed    |   4494
>>   kmalloc            |   1967
>>
>> So this change actually improves performance too, it seems.
>>
>> The resulting stack allocation can get rather large; without
>> CONFIG_BCH_CONST_PARAMS, it will allocate 4096 bytes, which
>> trips the stack size checking:
>>
>> lib/bch.c: In function ‘encode_bch’:
>> lib/bch.c:261:1: warning: the frame size of 4432 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>>
>> Even the default case for "allmodconfig" (with CONFIG_BCH_CONST_M=14 and
>> CONFIG_BCH_CONST_T=4) would have started throwing a warning:
>>
>> lib/bch.c: In function ‘encode_bch’:
>> lib/bch.c:261:1: warning: the frame size of 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>>
>> But this is how large it's always been; it was just hidden from
>> the checker because it was a VLA. So the Makefile has been adjusted to
>> silence this warning for anything smaller than 4500 bytes, which should
>> provide room for normal cases, but still low enough to catch any future
>> pathological situations.
>>
>> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>> v3: fix r_bytes to whole-word size
>> v2: switch to fixed-size stack array
>> ---
>>  lib/Makefile |  1 +
>>  lib/bch.c    | 23 +++++++++++++++--------
>>  2 files changed, 16 insertions(+), 8 deletions(-)
>
>
> The patch looks good to me. It also passed my regression tests.
>
> Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
> Tested-by: Ivan Djelic <ivan.djelic@parrot.com>

Thanks for the review and testing!

Who's the best person to carry this patch?

-Kees
Boris Brezillon June 20, 2018, 7:38 a.m. | #3
Hi Kees,

On Tue, 19 Jun 2018 16:48:17 -0700
Kees Cook <keescook@chromium.org> wrote:

> On Fri, Jun 1, 2018 at 4:09 AM, Ivan Djelic <ivan.djelic@parrot.com> wrote:
> > On Thu, May 31, 2018 at 11:45:25AM -0700, Kees Cook wrote:  
> >> In the quest to remove all stack VLA usage from the kernel[1], this
> >> allocates a fixed size stack array to cover the range needed for
> >> bch. This was done instead of a preallocation on the SLAB due to
> >> performance reasons, shown by Ivan Djelic:
> >>
> >>  little-endian, type sizes: int=4 long=8 longlong=8
> >>  cpu: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
> >>  calibration: iter=4.9143µs niter=2034 nsamples=200 m=13 t=4
> >>
> >>    Buffer allocation |  Encoding throughput (Mbit/s)
> >>  ---------------------------------------------------
> >>   on-stack, VLA      |   3988
> >>   on-stack, fixed    |   4494
> >>   kmalloc            |   1967
> >>
> >> So this change actually improves performance too, it seems.
> >>
> >> The resulting stack allocation can get rather large; without
> >> CONFIG_BCH_CONST_PARAMS, it will allocate 4096 bytes, which
> >> trips the stack size checking:
> >>
> >> lib/bch.c: In function ‘encode_bch’:
> >> lib/bch.c:261:1: warning: the frame size of 4432 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> >>
> >> Even the default case for "allmodconfig" (with CONFIG_BCH_CONST_M=14 and
> >> CONFIG_BCH_CONST_T=4) would have started throwing a warning:
> >>
> >> lib/bch.c: In function ‘encode_bch’:
> >> lib/bch.c:261:1: warning: the frame size of 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> >>
> >> But this is how large it's always been; it was just hidden from
> >> the checker because it was a VLA. So the Makefile has been adjusted to
> >> silence this warning for anything smaller than 4500 bytes, which should
> >> provide room for normal cases, but still low enough to catch any future
> >> pathological situations.
> >>
> >> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
> >>
> >> Signed-off-by: Kees Cook <keescook@chromium.org>
> >> ---
> >> v3: fix r_bytes to whole-word size
> >> v2: switch to fixed-size stack array
> >> ---
> >>  lib/Makefile |  1 +
> >>  lib/bch.c    | 23 +++++++++++++++--------
> >>  2 files changed, 16 insertions(+), 8 deletions(-)  
> >
> >
> > The patch looks good to me. It also passed my regression tests.
> >
> > Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
> > Tested-by: Ivan Djelic <ivan.djelic@parrot.com>  
> 
> Thanks for the review and testing!
> 
> Who's the best person to carry this patch?

Looks like all users of this lib are in drivers/mtd, so I can take the
patches if you want, but I can also let you take them if you prefer.

Here is my

Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>

in case you were waiting for it.

Regards,

Boris
Kees Cook June 20, 2018, 6:16 p.m. | #4
On Wed, Jun 20, 2018 at 12:38 AM, Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
> Hi Kees,
>
> On Tue, 19 Jun 2018 16:48:17 -0700
> Kees Cook <keescook@chromium.org> wrote:
>
>> On Fri, Jun 1, 2018 at 4:09 AM, Ivan Djelic <ivan.djelic@parrot.com> wrote:
>> > On Thu, May 31, 2018 at 11:45:25AM -0700, Kees Cook wrote:
>> >> In the quest to remove all stack VLA usage from the kernel[1], this
>> >> allocates a fixed size stack array to cover the range needed for
>> >> bch. This was done instead of a preallocation on the SLAB due to
>> >> performance reasons, shown by Ivan Djelic:
>> >>
>> >>  little-endian, type sizes: int=4 long=8 longlong=8
>> >>  cpu: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
>> >>  calibration: iter=4.9143µs niter=2034 nsamples=200 m=13 t=4
>> >>
>> >>    Buffer allocation |  Encoding throughput (Mbit/s)
>> >>  ---------------------------------------------------
>> >>   on-stack, VLA      |   3988
>> >>   on-stack, fixed    |   4494
>> >>   kmalloc            |   1967
>> >>
>> >> So this change actually improves performance too, it seems.
>> >>
>> >> The resulting stack allocation can get rather large; without
>> >> CONFIG_BCH_CONST_PARAMS, it will allocate 4096 bytes, which
>> >> trips the stack size checking:
>> >>
>> >> lib/bch.c: In function ‘encode_bch’:
>> >> lib/bch.c:261:1: warning: the frame size of 4432 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>> >>
>> >> Even the default case for "allmodconfig" (with CONFIG_BCH_CONST_M=14 and
>> >> CONFIG_BCH_CONST_T=4) would have started throwing a warning:
>> >>
>> >> lib/bch.c: In function ‘encode_bch’:
>> >> lib/bch.c:261:1: warning: the frame size of 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>> >>
>> >> But this is how large it's always been; it was just hidden from
>> >> the checker because it was a VLA. So the Makefile has been adjusted to
>> >> silence this warning for anything smaller than 4500 bytes, which should
>> >> provide room for normal cases, but still low enough to catch any future
>> >> pathological situations.
>> >>
>> >> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
>> >>
>> >> Signed-off-by: Kees Cook <keescook@chromium.org>
>> >> ---
>> >> v3: fix r_bytes to whole-word size
>> >> v2: switch to fixed-size stack array
>> >> ---
>> >>  lib/Makefile |  1 +
>> >>  lib/bch.c    | 23 +++++++++++++++--------
>> >>  2 files changed, 16 insertions(+), 8 deletions(-)
>> >
>> >
>> > The patch looks good to me. It also passed my regression tests.
>> >
>> > Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
>> > Tested-by: Ivan Djelic <ivan.djelic@parrot.com>
>>
>> Thanks for the review and testing!
>>
>> Who's the best person to carry this patch?
>
> Looks like all users of this lib are in drivers/mtd, so I can take the
> patches if you want, but I can also let you take them if you prefer.

I'd love it if you could take it; thank you!

-Kees
Boris Brezillon June 24, 2018, 8:12 p.m. | #5
On Wed, 20 Jun 2018 11:16:11 -0700
Kees Cook <keescook@chromium.org> wrote:

> On Wed, Jun 20, 2018 at 12:38 AM, Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
> > Hi Kees,
> >
> > On Tue, 19 Jun 2018 16:48:17 -0700
> > Kees Cook <keescook@chromium.org> wrote:
> >  
> >> On Fri, Jun 1, 2018 at 4:09 AM, Ivan Djelic <ivan.djelic@parrot.com> wrote:  
> >> > On Thu, May 31, 2018 at 11:45:25AM -0700, Kees Cook wrote:  
> >> >> In the quest to remove all stack VLA usage from the kernel[1], this
> >> >> allocates a fixed size stack array to cover the range needed for
> >> >> bch. This was done instead of a preallocation on the SLAB due to
> >> >> performance reasons, shown by Ivan Djelic:
> >> >>
> >> >>  little-endian, type sizes: int=4 long=8 longlong=8
> >> >>  cpu: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
> >> >>  calibration: iter=4.9143µs niter=2034 nsamples=200 m=13 t=4
> >> >>
> >> >>    Buffer allocation |  Encoding throughput (Mbit/s)
> >> >>  ---------------------------------------------------
> >> >>   on-stack, VLA      |   3988
> >> >>   on-stack, fixed    |   4494
> >> >>   kmalloc            |   1967
> >> >>
> >> >> So this change actually improves performance too, it seems.
> >> >>
> >> >> The resulting stack allocation can get rather large; without
> >> >> CONFIG_BCH_CONST_PARAMS, it will allocate 4096 bytes, which
> >> >> trips the stack size checking:
> >> >>
> >> >> lib/bch.c: In function ‘encode_bch’:
> >> >> lib/bch.c:261:1: warning: the frame size of 4432 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> >> >>
> >> >> Even the default case for "allmodconfig" (with CONFIG_BCH_CONST_M=14 and
> >> >> CONFIG_BCH_CONST_T=4) would have started throwing a warning:
> >> >>
> >> >> lib/bch.c: In function ‘encode_bch’:
> >> >> lib/bch.c:261:1: warning: the frame size of 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> >> >>
> >> >> But this is how large it's always been; it was just hidden from
> >> >> the checker because it was a VLA. So the Makefile has been adjusted to
> >> >> silence this warning for anything smaller than 4500 bytes, which should
> >> >> provide room for normal cases, but still low enough to catch any future
> >> >> pathological situations.
> >> >>
> >> >> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
> >> >>
> >> >> Signed-off-by: Kees Cook <keescook@chromium.org>
> >> >> ---
> >> >> v3: fix r_bytes to whole-word size
> >> >> v2: switch to fixed-size stack array
> >> >> ---
> >> >>  lib/Makefile |  1 +
> >> >>  lib/bch.c    | 23 +++++++++++++++--------
> >> >>  2 files changed, 16 insertions(+), 8 deletions(-)  
> >> >
> >> >
> >> > The patch looks good to me. It also passed my regression tests.
> >> >
> >> > Reviewed-by: Ivan Djelic <ivan.djelic@parrot.com>
> >> > Tested-by: Ivan Djelic <ivan.djelic@parrot.com>  
> >>
> >> Thanks for the review and testing!
> >>
> >> Who's the best person to carry this patch?  
> >
> > Looks like all users of this lib are in drivers/mtd, so I can take the
> > patches if you want, but I can also let you take them if you prefer.  
> 
> I'd love it if you could take it; thank you!

Applied to mtd/next.

Thanks,

Boris

Patch

diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..24331b122d7f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -121,6 +121,7 @@  obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
 obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
 obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
 obj-$(CONFIG_BCH) += bch.o
+CFLAGS_bch.o := $(call cc-option,-Wframe-larger-than=4500)
 obj-$(CONFIG_LZO_COMPRESS) += lzo/
 obj-$(CONFIG_LZO_DECOMPRESS) += lzo/
 obj-$(CONFIG_LZ4_COMPRESS) += lz4/
diff --git a/lib/bch.c b/lib/bch.c
index bc89dfe4d1b3..7b0f2006698b 100644
--- a/lib/bch.c
+++ b/lib/bch.c
@@ -78,15 +78,22 @@ 
 #define GF_M(_p)               (CONFIG_BCH_CONST_M)
 #define GF_T(_p)               (CONFIG_BCH_CONST_T)
 #define GF_N(_p)               ((1 << (CONFIG_BCH_CONST_M))-1)
+#define BCH_MAX_M              (CONFIG_BCH_CONST_M)
 #else
 #define GF_M(_p)               ((_p)->m)
 #define GF_T(_p)               ((_p)->t)
 #define GF_N(_p)               ((_p)->n)
+#define BCH_MAX_M              15
 #endif
 
+#define BCH_MAX_T              (((1 << BCH_MAX_M) - 1) / BCH_MAX_M)
+
 #define BCH_ECC_WORDS(_p)      DIV_ROUND_UP(GF_M(_p)*GF_T(_p), 32)
 #define BCH_ECC_BYTES(_p)      DIV_ROUND_UP(GF_M(_p)*GF_T(_p), 8)
 
+#define BCH_ECC_MAX_WORDS      DIV_ROUND_UP(BCH_MAX_M * BCH_MAX_T, 32)
+#define BCH_ECC_MAX_BYTES      DIV_ROUND_UP(BCH_MAX_M * BCH_MAX_T, 8)
+
 #ifndef dbg
 #define dbg(_fmt, args...)     do {} while (0)
 #endif
@@ -187,7 +194,8 @@  void encode_bch(struct bch_control *bch, const uint8_t *data,
 	const unsigned int l = BCH_ECC_WORDS(bch)-1;
 	unsigned int i, mlen;
 	unsigned long m;
-	uint32_t w, r[l+1];
+	uint32_t w, r[BCH_ECC_MAX_WORDS];
+	const size_t r_bytes = BCH_ECC_WORDS(bch) * sizeof(*r);
 	const uint32_t * const tab0 = bch->mod8_tab;
 	const uint32_t * const tab1 = tab0 + 256*(l+1);
 	const uint32_t * const tab2 = tab1 + 256*(l+1);
@@ -198,7 +206,7 @@  void encode_bch(struct bch_control *bch, const uint8_t *data,
 		/* load ecc parity bytes into internal 32-bit buffer */
 		load_ecc8(bch, bch->ecc_buf, ecc);
 	} else {
-		memset(bch->ecc_buf, 0, sizeof(r));
+		memset(bch->ecc_buf, 0, r_bytes);
 	}
 
 	/* process first unaligned data bytes */
@@ -215,7 +223,7 @@  void encode_bch(struct bch_control *bch, const uint8_t *data,
 	mlen  = len/4;
 	data += 4*mlen;
 	len  -= 4*mlen;
-	memcpy(r, bch->ecc_buf, sizeof(r));
+	memcpy(r, bch->ecc_buf, r_bytes);
 
 	/*
 	 * split each 32-bit word into 4 polynomials of weight 8 as follows:
@@ -241,7 +249,7 @@  void encode_bch(struct bch_control *bch, const uint8_t *data,
 
 		r[l] = p0[l]^p1[l]^p2[l]^p3[l];
 	}
-	memcpy(bch->ecc_buf, r, sizeof(r));
+	memcpy(bch->ecc_buf, r, r_bytes);
 
 	/* process last unaligned bytes */
 	if (len)
@@ -434,7 +442,7 @@  static int solve_linear_system(struct bch_control *bch, unsigned int *rows,
 {
 	const int m = GF_M(bch);
 	unsigned int tmp, mask;
-	int rem, c, r, p, k, param[m];
+	int rem, c, r, p, k, param[BCH_MAX_M];
 
 	k = 0;
 	mask = 1 << m;
@@ -1114,7 +1122,7 @@  static int build_deg2_base(struct bch_control *bch)
 {
 	const int m = GF_M(bch);
 	int i, j, r;
-	unsigned int sum, x, y, remaining, ak = 0, xi[m];
+	unsigned int sum, x, y, remaining, ak = 0, xi[BCH_MAX_M];
 
 	/* find k s.t. Tr(a^k) = 1 and 0 <= k < m */
 	for (i = 0; i < m; i++) {
@@ -1254,7 +1262,6 @@  struct bch_control *init_bch(int m, int t, unsigned int prim_poly)
 	struct bch_control *bch = NULL;
 
 	const int min_m = 5;
-	const int max_m = 15;
 
 	/* default primitive polynomials */
 	static const unsigned int prim_poly_tab[] = {
@@ -1270,7 +1277,7 @@  struct bch_control *init_bch(int m, int t, unsigned int prim_poly)
 		goto fail;
 	}
 #endif
-	if ((m < min_m) || (m > max_m))
+	if ((m < min_m) || (m > BCH_MAX_M))
 		/*
 		 * values of m greater than 15 are not currently supported;
 		 * supporting m > 15 would require changing table base type