[RESEND,v2] devres: Really align data field to unsigned long long

Message ID 20180709044444.6397-1-abrodkin@synopsys.com
State New
Headers show
Series
  • [RESEND,v2] devres: Really align data field to unsigned long long
Related show

Commit Message

Alexey Brodkin July 9, 2018, 4:44 a.m.
Depending on ABI "long long" type of a particular 32-bit CPU
might be aligned by either word (32-bits) or double word (64-bits).
Make sure "data" is really 64-bit aligned for any 32-bit CPU.

At least for 32-bit ARC cores ABI requires "long long" types
to be aligned by normal 32-bit word. This makes "data" field aligned to
12 bytes. Which is still OK as long as we use 32-bit data only.

But once we want to use native atomic64_t type (i.e. when we use special
instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
misaligned access exception.

That's because even on CPUs capable of non-aligned data access LL/SC
instructions require strict alignment.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---

Changes v1 -> v2:

 * Reworded commit message
 * Inserted comment right in source [Thomas]

 drivers/base/devres.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

greg@kroah.com July 9, 2018, 5:48 a.m. | #1
On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> Depending on ABI "long long" type of a particular 32-bit CPU
> might be aligned by either word (32-bits) or double word (64-bits).
> Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> 
> At least for 32-bit ARC cores ABI requires "long long" types
> to be aligned by normal 32-bit word. This makes "data" field aligned to
> 12 bytes. Which is still OK as long as we use 32-bit data only.
> 
> But once we want to use native atomic64_t type (i.e. when we use special
> instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> misaligned access exception.

So is this something you hit today?  If not, why is this needed for
stable kernels?

> That's because even on CPUs capable of non-aligned data access LL/SC
> instructions require strict alignment.

Are you going to hit this code with all types of structures?  What
happens when you do have an unaligned access?

> 
> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

You didn't cc: this address :(

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
> 
> Changes v1 -> v2:
> 
>  * Reworded commit message
>  * Inserted comment right in source [Thomas]
> 
>  drivers/base/devres.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/devres.c b/drivers/base/devres.c
> index f98a097e73f2..466fa59c866a 100644
> --- a/drivers/base/devres.c
> +++ b/drivers/base/devres.c
> @@ -24,8 +24,12 @@ struct devres_node {
>  
>  struct devres {
>  	struct devres_node		node;
> -	/* -- 3 pointers */
> -	unsigned long long		data[];	/* guarantee ull alignment */
> +	/*
> +	 * Depending on ABI "long long" type of a particular 32-bit CPU
> +	 * might be aligned by either word (32-bits) or double word (64-bits).
> +	 * Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> +	 */
> +	unsigned long long		data[] __aligned(sizeof(unsigned long long));
>  };

Does this change the padding today for any other arches?  If so, does
the increased memory usage cause any noticable issues?

thanks,

greg k-h
Alexey Brodkin July 9, 2018, 6:46 a.m. | #2
Hi Greg,

On Mon, 2018-07-09 at 07:48 +0200, Greg KH wrote:
> On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > Depending on ABI "long long" type of a particular 32-bit CPU
> > might be aligned by either word (32-bits) or double word (64-bits).
> > Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> > 
> > At least for 32-bit ARC cores ABI requires "long long" types
> > to be aligned by normal 32-bit word. This makes "data" field aligned to
> > 12 bytes. Which is still OK as long as we use 32-bit data only.
> > 
> > But once we want to use native atomic64_t type (i.e. when we use special
> > instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> > misaligned access exception.
> 
> So is this something you hit today?  If not, why is this needed for
> stable kernels?

Indeed we hit that problem recently when Etnaviv driver was switched to
DRM GPU scheduler, see
commit e93b6deeb45a ("drm/etnaviv: hook up DRM GPU scheduler").
The most important part of DRM GPU scheduler is "job_id_count" member of
"drm_gpu_scheduler" structure of type "atomic64_t". This structure is put
in a buffer allocated by devm_kzalloc() and if "job_id_count" is not 64-bit
aligned atomic instruction fails with an exception.

As for stable requirements - mentioned commit was a part of 4.17 kernel
which broke GPU driver for one of our HSDK board so I guess back-porting
to 4.17 is a no-brainer.

Now given a nature of the problem at hand I may expect other breakages
whenever another driver/fs etc that use "atomic64_t" gets enabled.

> > That's because even on CPUs capable of non-aligned data access LL/SC
> > instructions require strict alignment.
> 
> Are you going to hit this code with all types of structures?

If there're other cases which lead to 4-byte aligned "atomic64_t" variables
there will be a problem as well but it's quite hard to predict those cases.
That said if we manage to reproduce more similar issues there will be more
patches with fixes :)

> What happens when you do have an unaligned access?

Atomic instructions are a bit special as compared to normal loads and stores.
Even if normal loads and stores may deal with unaligned data atomic instructions
still require data to be aligned because it's hard to manage atomic value that
spans through multiple cache lines or even MMU pages. And hardware just
raises an alignment fault exception.

And that's not something special for ARC, I guess all CPUs are the same in
that regard, see here's an extract from ARM(r) Architecture Reference
Manual ARMv7-A and ARMv7-R edition: https://lkml.org/lkml/2017/12/5/440
From "Table A3-1 Alignment requirements of load/store instructions"
it's seen that LDREXD, STREXD instructions will cause alignment fault
even if SCTLR.A=0 (strict alignment fault checking disabled) for non
double-word-aligned data.

> > 
> > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> You didn't cc: this address :(

Sorry, which one?
That what I have in .mbox, see https://patchwork.ozlabs.org/patch/941092/mbox/:
------------------------------>8------------------------------
Cc: linux-arch@vger.kernel.org, Greg KH <greg@kroah.com>,
 Alexey Brodkin <Alexey.Brodkin@synopsys.com>, stable@vger.kernel.org, 
 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
 Thomas Gleixner <tglx@linutronix.de>, linux-snps-arc@lists.infradead.org
------------------------------>8------------------------------

-Alexey
greg@kroah.com July 9, 2018, 7:06 a.m. | #3
On Mon, Jul 09, 2018 at 06:46:50AM +0000, Alexey Brodkin wrote:
> Hi Greg,
> 
> On Mon, 2018-07-09 at 07:48 +0200, Greg KH wrote:
> > On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > > Depending on ABI "long long" type of a particular 32-bit CPU
> > > might be aligned by either word (32-bits) or double word (64-bits).
> > > Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> > > 
> > > At least for 32-bit ARC cores ABI requires "long long" types
> > > to be aligned by normal 32-bit word. This makes "data" field aligned to
> > > 12 bytes. Which is still OK as long as we use 32-bit data only.
> > > 
> > > But once we want to use native atomic64_t type (i.e. when we use special
> > > instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> > > misaligned access exception.
> > 
> > So is this something you hit today?  If not, why is this needed for
> > stable kernels?
> 
> Indeed we hit that problem recently when Etnaviv driver was switched to
> DRM GPU scheduler, see
> commit e93b6deeb45a ("drm/etnaviv: hook up DRM GPU scheduler").
> The most important part of DRM GPU scheduler is "job_id_count" member of
> "drm_gpu_scheduler" structure of type "atomic64_t". This structure is put
> in a buffer allocated by devm_kzalloc() and if "job_id_count" is not 64-bit
> aligned atomic instruction fails with an exception.
> 
> As for stable requirements - mentioned commit was a part of 4.17 kernel
> which broke GPU driver for one of our HSDK board so I guess back-porting
> to 4.17 is a no-brainer.

Ok, so 4.17 is as far back as you need?  Please try to be specific when
asking for stable backports.

> > > That's because even on CPUs capable of non-aligned data access LL/SC
> > > instructions require strict alignment.
> > 
> > Are you going to hit this code with all types of structures?
> 
> If there're other cases which lead to 4-byte aligned "atomic64_t" variables
> there will be a problem as well but it's quite hard to predict those cases.
> That said if we manage to reproduce more similar issues there will be more
> patches with fixes :)
> 
> > What happens when you do have an unaligned access?
> 
> Atomic instructions are a bit special as compared to normal loads and stores.
> Even if normal loads and stores may deal with unaligned data atomic instructions
> still require data to be aligned because it's hard to manage atomic value that
> spans through multiple cache lines or even MMU pages. And hardware just
> raises an alignment fault exception.
> 
> And that's not something special for ARC, I guess all CPUs are the same in
> that regard, see here's an extract from ARM(r) Architecture Reference
> Manual ARMv7-A and ARMv7-R edition: https://lkml.org/lkml/2017/12/5/440
> From "Table A3-1 Alignment requirements of load/store instructions"
> it's seen that LDREXD, STREXD instructions will cause alignment fault
> even if SCTLR.A=0 (strict alignment fault checking disabled) for non
> double-word-aligned data.

Thanks for the better explaination, that helps out a lot.  Can you redo
the patch with all of that information so that others do not have the
same questions as I did?

thanks,

greg k-h
Geert Uytterhoeven July 9, 2018, 7:07 a.m. | #4
On Mon, Jul 9, 2018 at 7:49 AM Greg KH <greg@kroah.com> wrote:
> On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > Depending on ABI "long long" type of a particular 32-bit CPU
> > might be aligned by either word (32-bits) or double word (64-bits).

Or even 16-bit (on e.g. m68k).

> > --- a/drivers/base/devres.c
> > +++ b/drivers/base/devres.c
> > @@ -24,8 +24,12 @@ struct devres_node {
> >
> >  struct devres {
> >       struct devres_node              node;
> > -     /* -- 3 pointers */
> > -     unsigned long long              data[]; /* guarantee ull alignment */
> > +     /*
> > +      * Depending on ABI "long long" type of a particular 32-bit CPU
> > +      * might be aligned by either word (32-bits) or double word (64-bits).

or even 16-bit (on e.g. m68k).

> > +      * Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> > +      */
> > +     unsigned long long              data[] __aligned(sizeof(unsigned long long));
> >  };
>
> Does this change the padding today for any other arches?  If so, does
> the increased memory usage cause any noticable issues?

Yes it does. struct devres_node contains an odd number of pointers, so
there will be a hole between node and data on all 32-bit architectures.

Gr{oetje,eeting}s,

                        Geert
Alexey Brodkin July 9, 2018, 7:17 a.m. | #5
Hi Greg,

On Mon, 2018-07-09 at 09:06 +0200, greg@kroah.com wrote:
> On Mon, Jul 09, 2018 at 06:46:50AM +0000, Alexey Brodkin wrote:
> > Hi Greg,
> > 
> > On Mon, 2018-07-09 at 07:48 +0200, Greg KH wrote:
> > > On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > > > Depending on ABI "long long" type of a particular 32-bit CPU
> > > > might be aligned by either word (32-bits) or double word (64-bits).
> > > > Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> > > > 
> > > > At least for 32-bit ARC cores ABI requires "long long" types
> > > > to be aligned by normal 32-bit word. This makes "data" field aligned to
> > > > 12 bytes. Which is still OK as long as we use 32-bit data only.
> > > > 
> > > > But once we want to use native atomic64_t type (i.e. when we use special
> > > > instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> > > > misaligned access exception.
> > > 
> > > So is this something you hit today?  If not, why is this needed for
> > > stable kernels?
> > 
> > Indeed we hit that problem recently when Etnaviv driver was switched to
> > DRM GPU scheduler, see
> > commit e93b6deeb45a ("drm/etnaviv: hook up DRM GPU scheduler").
> > The most important part of DRM GPU scheduler is "job_id_count" member of
> > "drm_gpu_scheduler" structure of type "atomic64_t". This structure is put
> > in a buffer allocated by devm_kzalloc() and if "job_id_count" is not 64-bit
> > aligned atomic instruction fails with an exception.
> > 
> > As for stable requirements - mentioned commit was a part of 4.17 kernel
> > which broke GPU driver for one of our HSDK board so I guess back-porting
> > to 4.17 is a no-brainer.
> 
> Ok, so 4.17 is as far back as you need?  Please try to be specific when
> asking for stable backports.

Well in that particular case I really wanted to get this fix back-ported
starting from v4.8 so I guess that's what I need to add in Cc tag, right?
----------------------------->8---------------------
Cc: <stable@vger.kernel.org> # 4.8+
----------------------------->8---------------------

> > > > That's because even on CPUs capable of non-aligned data access LL/SC
> > > > instructions require strict alignment.
> > > 
> > > Are you going to hit this code with all types of structures?
> > 
> > If there're other cases which lead to 4-byte aligned "atomic64_t" variables
> > there will be a problem as well but it's quite hard to predict those cases.
> > That said if we manage to reproduce more similar issues there will be more
> > patches with fixes :)
> > 
> > > What happens when you do have an unaligned access?
> > 
> > Atomic instructions are a bit special as compared to normal loads and stores.
> > Even if normal loads and stores may deal with unaligned data atomic instructions
> > still require data to be aligned because it's hard to manage atomic value that
> > spans through multiple cache lines or even MMU pages. And hardware just
> > raises an alignment fault exception.
> > 
> > And that's not something special for ARC, I guess all CPUs are the same in
> > that regard, see here's an extract from ARM(r) Architecture Reference
> > Manual ARMv7-A and ARMv7-R edition: https://urldefense.proofpoint.com/v2/url?u=https-3A__lkml.org_lkml_2017_12_5_440&d=DwIBAg&c=DPL6_X_6JkXFx7AXWq
> > B0tg&r=lqdeeSSEes0GFDDl656eViXO7breS55ytWkhpk5R81I&m=eT1OUXQEt6zlA0bABwdI7sFr7Hys3aHzoCXTDAkM6xY&s=-3n_Xurm4D2TIC-_G_GIvGj9P_3unmq839oGATLE5W0&e=
> > From "Table A3-1 Alignment requirements of load/store instructions"
> > it's seen that LDREXD, STREXD instructions will cause alignment fault
> > even if SCTLR.A=0 (strict alignment fault checking disabled) for non
> > double-word-aligned data.
> 
> Thanks for the better explaination, that helps out a lot.  Can you redo
> the patch with all of that information so that others do not have the
> same questions as I did?

Sure will do it soonish.

-Alexey
Alexey Brodkin July 9, 2018, 7:22 a.m. | #6
Hi Geert,

On Mon, 2018-07-09 at 09:07 +0200, Geert Uytterhoeven wrote:
> On Mon, Jul 9, 2018 at 7:49 AM Greg KH <greg@kroah.com> wrote:
> > On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > > Depending on ABI "long long" type of a particular 32-bit CPU
> > > might be aligned by either word (32-bits) or double word (64-bits).
> 
> Or even 16-bit (on e.g. m68k).

Indeed, thanks for the note!
Will add this in my v3.

For ARC I'd like this fix to be back-ported starting
from v4.8 where we added support of "native" atomic64_t, see commit
ce6365270ecd (" ARCv2: Implement atomic64 based on LLOCKD/SCONDD instructions").

What about m68k, do you have any preference of earliest kernel version
where this fix might be useful?

-Alexey
greg@kroah.com July 9, 2018, 7:33 a.m. | #7
On Mon, Jul 09, 2018 at 07:17:11AM +0000, Alexey Brodkin wrote:
> Hi Greg,
> 
> On Mon, 2018-07-09 at 09:06 +0200, greg@kroah.com wrote:
> > On Mon, Jul 09, 2018 at 06:46:50AM +0000, Alexey Brodkin wrote:
> > > Hi Greg,
> > > 
> > > On Mon, 2018-07-09 at 07:48 +0200, Greg KH wrote:
> > > > On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > > > > Depending on ABI "long long" type of a particular 32-bit CPU
> > > > > might be aligned by either word (32-bits) or double word (64-bits).
> > > > > Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> > > > > 
> > > > > At least for 32-bit ARC cores ABI requires "long long" types
> > > > > to be aligned by normal 32-bit word. This makes "data" field aligned to
> > > > > 12 bytes. Which is still OK as long as we use 32-bit data only.
> > > > > 
> > > > > But once we want to use native atomic64_t type (i.e. when we use special
> > > > > instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> > > > > misaligned access exception.
> > > > 
> > > > So is this something you hit today?  If not, why is this needed for
> > > > stable kernels?
> > > 
> > > Indeed we hit that problem recently when Etnaviv driver was switched to
> > > DRM GPU scheduler, see
> > > commit e93b6deeb45a ("drm/etnaviv: hook up DRM GPU scheduler").
> > > The most important part of DRM GPU scheduler is "job_id_count" member of
> > > "drm_gpu_scheduler" structure of type "atomic64_t". This structure is put
> > > in a buffer allocated by devm_kzalloc() and if "job_id_count" is not 64-bit
> > > aligned atomic instruction fails with an exception.
> > > 
> > > As for stable requirements - mentioned commit was a part of 4.17 kernel
> > > which broke GPU driver for one of our HSDK board so I guess back-porting
> > > to 4.17 is a no-brainer.
> > 
> > Ok, so 4.17 is as far back as you need?  Please try to be specific when
> > asking for stable backports.
> 
> Well in that particular case I really wanted to get this fix back-ported
> starting from v4.8 so I guess that's what I need to add in Cc tag, right?
> ----------------------------->8---------------------
> Cc: <stable@vger.kernel.org> # 4.8+
> ----------------------------->8---------------------

Yes.
Geert Uytterhoeven July 9, 2018, 7:52 a.m. | #8
Hi Alexey,

On Mon, Jul 9, 2018 at 9:22 AM Alexey Brodkin
<Alexey.Brodkin@synopsys.com> wrote:
> On Mon, 2018-07-09 at 09:07 +0200, Geert Uytterhoeven wrote:
> > On Mon, Jul 9, 2018 at 7:49 AM Greg KH <greg@kroah.com> wrote:
> > > On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > > > Depending on ABI "long long" type of a particular 32-bit CPU
> > > > might be aligned by either word (32-bits) or double word (64-bits).
> >
> > Or even 16-bit (on e.g. m68k).
>
> Indeed, thanks for the note!
> Will add this in my v3.

Note that in this particular case, the field will probably always be
aligned to 4 bytes,
as the struct will be allocated using *alloc().

> For ARC I'd like this fix to be back-ported starting
> from v4.8 where we added support of "native" atomic64_t, see commit
> ce6365270ecd (" ARCv2: Implement atomic64 based on LLOCKD/SCONDD instructions").
>
> What about m68k, do you have any preference of earliest kernel version
> where this fix might be useful?

Given m68k is 32-bit, it will access atomic64_t variables while
holding a spinlock,
so it should still be safe without this change.

Not to mention no one will try etnaviv on m68k ;-)

Gr{oetje,eeting}s,

                        Geert
Alexey Brodkin July 9, 2018, 8:37 a.m. | #9
Hi Geert,

On Mon, 2018-07-09 at 09:52 +0200, Geert Uytterhoeven wrote:
> Hi Alexey,
> 
> On Mon, Jul 9, 2018 at 9:22 AM Alexey Brodkin
> <Alexey.Brodkin@synopsys.com> wrote:
> > On Mon, 2018-07-09 at 09:07 +0200, Geert Uytterhoeven wrote:
> > > On Mon, Jul 9, 2018 at 7:49 AM Greg KH <greg@kroah.com> wrote:
> > > > On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > > > > Depending on ABI "long long" type of a particular 32-bit CPU
> > > > > might be aligned by either word (32-bits) or double word (64-bits).
> > > 
> > > Or even 16-bit (on e.g. m68k).
> > 
> > Indeed, thanks for the note!
> > Will add this in my v3.
> 
> Note that in this particular case, the field will probably always be
> aligned to 4 bytes,
> as the struct will be allocated using *alloc().

Well I think maybe it worth mentioning only 32-bit and 64-bit alignments in
this particular case because as it was mentioned initially in the comment there're
3 pointers before "data" field and for 32-bit machines I guess we may safely
assume that a pointer size is 32-bit.

This leaves us only one problematic situation 32-bit instead of 64-bit alignment.

> > For ARC I'd like this fix to be back-ported starting
> > from v4.8 where we added support of "native" atomic64_t, see commit
> > ce6365270ecd (" ARCv2: Implement atomic64 based on LLOCKD/SCONDD instructions").
> > 
> > What about m68k, do you have any preference of earliest kernel version
> > where this fix might be useful?
> 
> Given m68k is 32-bit, it will access atomic64_t variables while
> holding a spinlock,
> so it should still be safe without this change.

Well ARC and ARMv7 are 32-bit machines as well still both have
a special instructions to read/write 64-bit data.

> Not to mention no one will try etnaviv on m68k ;-)

See it was just a trigger case but other GPUs that use or will use
DRM GPU scheduler are good candidates to it that problem and not to
mention some other users of atomic64_t.

But from you above comment I may deduce that m68k doesn't have
instructions for 64-bit data; in that case there's no reason to worry
at least about struct devres_node :)

-Alexey
Geert Uytterhoeven July 9, 2018, 9:03 a.m. | #10
Hi Alexey,

On Mon, Jul 9, 2018 at 10:37 AM Alexey Brodkin
<Alexey.Brodkin@synopsys.com> wrote:
> On Mon, 2018-07-09 at 09:52 +0200, Geert Uytterhoeven wrote:
> > On Mon, Jul 9, 2018 at 9:22 AM Alexey Brodkin
> > <Alexey.Brodkin@synopsys.com> wrote:
> > > On Mon, 2018-07-09 at 09:07 +0200, Geert Uytterhoeven wrote:
> > > > On Mon, Jul 9, 2018 at 7:49 AM Greg KH <greg@kroah.com> wrote:
> > > > > On Mon, Jul 09, 2018 at 07:44:44AM +0300, Alexey Brodkin wrote:
> > > > > > Depending on ABI "long long" type of a particular 32-bit CPU
> > > > > > might be aligned by either word (32-bits) or double word (64-bits).
> > > >
> > > > Or even 16-bit (on e.g. m68k).
> > >
> > > Indeed, thanks for the note!
> > > Will add this in my v3.
> >
> > Note that in this particular case, the field will probably always be
> > aligned to 4 bytes,
> > as the struct will be allocated using *alloc().
>
> Well I think maybe it worth mentioning only 32-bit and 64-bit alignments in
> this particular case because as it was mentioned initially in the comment there're
> 3 pointers before "data" field and for 32-bit machines I guess we may safely
> assume that a pointer size is 32-bit.
>
> This leaves us only one problematic situation 32-bit instead of 64-bit alignment.
>
> > > For ARC I'd like this fix to be back-ported starting
> > > from v4.8 where we added support of "native" atomic64_t, see commit
> > > ce6365270ecd (" ARCv2: Implement atomic64 based on LLOCKD/SCONDD instructions").
> > >
> > > What about m68k, do you have any preference of earliest kernel version
> > > where this fix might be useful?
> >
> > Given m68k is 32-bit, it will access atomic64_t variables while
> > holding a spinlock,
> > so it should still be safe without this change.
>
> Well ARC and ARMv7 are 32-bit machines as well still both have
> a special instructions to read/write 64-bit data.

Sure.

> > Not to mention no one will try etnaviv on m68k ;-)
>
> See it was just a trigger case but other GPUs that use or will use
> DRM GPU scheduler are good candidates to it that problem and not to
> mention some other users of atomic64_t.
>
> But from you above comment I may deduce that m68k doesn't have
> instructions for 64-bit data; in that case there's no reason to worry
> at least about struct devres_node :)

That's correct: m68k has no instructions for 64-bit data.

Gr{oetje,eeting}s,

                        Geert
David Laight July 9, 2018, 9:16 a.m. | #11
From: Alexey Brodkin
> Sent: 09 July 2018 05:45
> Depending on ABI "long long" type of a particular 32-bit CPU
> might be aligned by either word (32-bits) or double word (64-bits).
> Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> 
> At least for 32-bit ARC cores ABI requires "long long" types
> to be aligned by normal 32-bit word. This makes "data" field aligned to
> 12 bytes. Which is still OK as long as we use 32-bit data only.
> 
> But once we want to use native atomic64_t type (i.e. when we use special
> instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> misaligned access exception.

Shouldn't there be a typedef for the actual type.
Perhaps it is even atomic64_t ?
And have the __aligned(8) applied to that typedef ??

> That's because even on CPUs capable of non-aligned data access LL/SC
> instructions require strict alignment.
...
> --- a/drivers/base/devres.c
> +++ b/drivers/base/devres.c
> @@ -24,8 +24,12 @@ struct devres_node {
> 
>  struct devres {
>  	struct devres_node		node;
> -	/* -- 3 pointers */
> -	unsigned long long		data[];	/* guarantee ull alignment */
> +	/*
> +	 * Depending on ABI "long long" type of a particular 32-bit CPU
> +	 * might be aligned by either word (32-bits) or double word (64-bits).
> +	 * Make sure "data" is really 64-bit aligned for any 32-bit CPU.

Just:
	/* data[] must be 64 bit aligned even on 32 bit architectures
	 * because it might be accessed by instructions that require
	 * aligned memory arguments.

> +	 */
> +	unsigned long long		data[] __aligned(sizeof(unsigned long long));

One day assuming that 'unsigned long long' is exactly 64 bits will bite.
This probably ought to be u64 (or similar).
(If not atomic64_t)

	David
Geert Uytterhoeven July 9, 2018, 9:23 a.m. | #12
On Mon, Jul 9, 2018 at 11:15 AM David Laight <David.Laight@aculab.com> wrote:
> From: Alexey Brodkin
> > Sent: 09 July 2018 05:45
> > Depending on ABI "long long" type of a particular 32-bit CPU
> > might be aligned by either word (32-bits) or double word (64-bits).
> > Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> >
> > At least for 32-bit ARC cores ABI requires "long long" types
> > to be aligned by normal 32-bit word. This makes "data" field aligned to
> > 12 bytes. Which is still OK as long as we use 32-bit data only.
> >
> > But once we want to use native atomic64_t type (i.e. when we use special
> > instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> > misaligned access exception.
>
> Shouldn't there be a typedef for the actual type.
> Perhaps it is even atomic64_t ?
> And have the __aligned(8) applied to that typedef ??

That indeed sounds like the best thing to do, as it will fix this issue in other
places, too.

Gr{oetje,eeting}s,

                        Geert
David Laight July 9, 2018, 9:54 a.m. | #13
From: Geert Uytterhoeven
> Sent: 09 July 2018 10:23
> On Mon, Jul 9, 2018 at 11:15 AM David Laight <David.Laight@aculab.com> wrote:
> > From: Alexey Brodkin
> > > Sent: 09 July 2018 05:45
> > > Depending on ABI "long long" type of a particular 32-bit CPU
> > > might be aligned by either word (32-bits) or double word (64-bits).
> > > Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> > >
> > > At least for 32-bit ARC cores ABI requires "long long" types
> > > to be aligned by normal 32-bit word. This makes "data" field aligned to
> > > 12 bytes. Which is still OK as long as we use 32-bit data only.
> > >
> > > But once we want to use native atomic64_t type (i.e. when we use special
> > > instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> > > misaligned access exception.
> >
> > Shouldn't there be a typedef for the actual type.
> > Perhaps it is even atomic64_t ?
> > And have the __aligned(8) applied to that typedef ??
> 
> That indeed sounds like the best thing to do, as it will fix this issue in other
> places, too.

Something like:
typedef struct {
	u64 val __aligned(8);
} atomic64_t;

would pick up most errors.
Including all the places that fail to use atomic_read().

	David
Alexey Brodkin July 9, 2018, 9:59 a.m. | #14
Hi David,

On Mon, 2018-07-09 at 09:16 +0000, David Laight wrote:
> From: Alexey Brodkin
> > Sent: 09 July 2018 05:45
> > Depending on ABI "long long" type of a particular 32-bit CPU
> > might be aligned by either word (32-bits) or double word (64-bits).
> > Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> > 
> > At least for 32-bit ARC cores ABI requires "long long" types
> > to be aligned by normal 32-bit word. This makes "data" field aligned to
> > 12 bytes. Which is still OK as long as we use 32-bit data only.
> > 
> > But once we want to use native atomic64_t type (i.e. when we use special
> > instructions LLOCKD/SCONDD for accessing 64-bit data) we easily hit
> > misaligned access exception.
> 
> Shouldn't there be a typedef for the actual type.
> Perhaps it is even atomic64_t ?
> And have the __aligned(8) applied to that typedef ??

That's a good idea indeed but it doesn't solve the problem with
struct devres_node. Consider the following snippet:
-------------------------------->8-------------------------------
	struct mystruct {
		atomic64_t myvar;
	}

	struct mystruct *p;
	p = devm_kzalloc(dev, sizeof(*p), GFP_KERNEL);
-------------------------------->8-------------------------------

Here myvar address will match address of "data" member of struct devres_node.
So if "data" is has offset of 12 bytes from the beginning of a page then
myvar won't be 64-bit aligned regardless of myvar's attribute, right?


> > That's because even on CPUs capable of non-aligned data access LL/SC
> > instructions require strict alignment.
> 
> ...
> > --- a/drivers/base/devres.c
> > +++ b/drivers/base/devres.c
> > @@ -24,8 +24,12 @@ struct devres_node {
> > 
> >  struct devres {
> >  	struct devres_node		node;
> > -	/* -- 3 pointers */
> > -	unsigned long long		data[];	/* guarantee ull alignment */
> > +	/*
> > +	 * Depending on ABI "long long" type of a particular 32-bit CPU
> > +	 * might be aligned by either word (32-bits) or double word (64-bits).
> > +	 * Make sure "data" is really 64-bit aligned for any 32-bit CPU.
> 
> Just:
> 	/* data[] must be 64 bit aligned even on 32 bit architectures
> 	 * because it might be accessed by instructions that require
> 	 * aligned memory arguments.
> 
> > +	 */
> > +	unsigned long long		data[] __aligned(sizeof(unsigned long long));
> 
> One day assuming that 'unsigned long long' is exactly 64 bits will bite.
> This probably ought to be u64 (or similar).

I agree. Initially I wanted to keep as few changes as possible but
IMHO switching to more predictable data type makes sense.

-Alexey
David Laight July 9, 2018, 10:18 a.m. | #15
From: Alexey Brodkin
> Sent: 09 July 2018 11:00
...
> That's a good idea indeed but it doesn't solve the problem with
> struct devres_node. Consider the following snippet:
> -------------------------------->8-------------------------------
> 	struct mystruct {
> 		atomic64_t myvar;
> 	}
> 
> 	struct mystruct *p;
> 	p = devm_kzalloc(dev, sizeof(*p), GFP_KERNEL);
> -------------------------------->8-------------------------------
> 
> Here myvar address will match address of "data" member of struct devres_node.
> So if "data" is has offset of 12 bytes from the beginning of a page then
> myvar won't be 64-bit aligned regardless of myvar's attribute, right?
...
> > > -	unsigned long long		data[];	/* guarantee ull alignment */

Ahh, that line should be:
	u8 data[] __aligned(8); /* Guarantee 64bit alignment */

	David
Alexey Brodkin July 9, 2018, 10:23 a.m. | #16
Hi David,

On Mon, 2018-07-09 at 10:18 +0000, David Laight wrote:
> From: Alexey Brodkin
> > Sent: 09 July 2018 11:00
> 
> ...
> > That's a good idea indeed but it doesn't solve the problem with
> > struct devres_node. Consider the following snippet:
> > -------------------------------->8-------------------------------
> > 	struct mystruct {
> > 		atomic64_t myvar;
> > 	}
> > 
> > 	struct mystruct *p;
> > 	p = devm_kzalloc(dev, sizeof(*p), GFP_KERNEL);
> > -------------------------------->8-------------------------------
> > 
> > Here myvar address will match address of "data" member of struct devres_node.
> > So if "data" is has offset of 12 bytes from the beginning of a page then
> > myvar won't be 64-bit aligned regardless of myvar's attribute, right?
> 
> ...
> > > > -	unsigned long long		data[];	/* guarantee ull alignment */
> 
> Ahh, that line should be:
> 	u8 data[] __aligned(8); /* Guarantee 64bit alignment */

And that pretty much what I suggested in my initial patch :)

For the record x86 has exactly the same atomic64_t as you suggested,
see https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/atomic64_32.h#L13:
---------------------->8------------------
typedef struct {
	u64 __aligned(8) counter;
} atomic64_t;
---------------------->8------------------

-Alexey
Vineet Gupta July 9, 2018, 6:27 p.m. | #17
On 07/09/2018 03:23 AM, Alexey Brodkin wrote:
> Hi David,
> 
> On Mon, 2018-07-09 at 10:18 +0000, David Laight wrote:
>> From: Alexey Brodkin
>>> Sent: 09 July 2018 11:00
>>
>> ...
>>> That's a good idea indeed but it doesn't solve the problem with
>>> struct devres_node. Consider the following snippet:
>>> -------------------------------->8-------------------------------
>>> 	struct mystruct {
>>> 		atomic64_t myvar;
>>> 	}
>>>
>>> 	struct mystruct *p;
>>> 	p = devm_kzalloc(dev, sizeof(*p), GFP_KERNEL);
>>> -------------------------------->8-------------------------------
>>>
>>> Here myvar address will match address of "data" member of struct devres_node.
>>> So if "data" is has offset of 12 bytes from the beginning of a page then
>>> myvar won't be 64-bit aligned regardless of myvar's attribute, right?
>>
>> ...
>>>>> -	unsigned long long		data[];	/* guarantee ull alignment */
>>
>> Ahh, that line should be:
>> 	u8 data[] __aligned(8); /* Guarantee 64bit alignment */
> 
> And that pretty much what I suggested in my initial patch :)
> 
> For the record x86 has exactly the same atomic64_t as you suggested,
> see https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/atomic64_32.h#L13:
> ---------------------->8------------------
> typedef struct {
> 	u64 __aligned(8) counter;
> } atomic64_t;
> ---------------------->8------------------

And so does the ARC version since when the atomic64_t support was added by commit
ce6365270ecd

Also, you should consider using the pre-canned type aligned_u64.

typedef struct {
       aligned_u64 counter;
       ^^^^^^^^^^^
} atomic64_t;
Alexey Brodkin July 10, 2018, 6:42 a.m. | #18
Hi Vineet,

On Mon, 2018-07-09 at 11:27 -0700, Vineet Gupta wrote:
> On 07/09/2018 03:23 AM, Alexey Brodkin wrote:
> > Hi David,
> > 
> > On Mon, 2018-07-09 at 10:18 +0000, David Laight wrote:
> > > From: Alexey Brodkin
> > > > Sent: 09 July 2018 11:00
> > > 
> > > ...
> > > > That's a good idea indeed but it doesn't solve the problem with
> > > > struct devres_node. Consider the following snippet:
> > > > -------------------------------->8-------------------------------
> > > > 	struct mystruct {
> > > > 		atomic64_t myvar;
> > > > 	}
> > > > 
> > > > 	struct mystruct *p;
> > > > 	p = devm_kzalloc(dev, sizeof(*p), GFP_KERNEL);
> > > > -------------------------------->8-------------------------------
> > > > 
> > > > Here myvar address will match address of "data" member of struct devres_node.
> > > > So if "data" is has offset of 12 bytes from the beginning of a page then
> > > > myvar won't be 64-bit aligned regardless of myvar's attribute, right?
> > > 
> > > ...
> > > > > > -	unsigned long long		data[];	/* guarantee ull alignment */
> > > 
> > > Ahh, that line should be:
> > > 	u8 data[] __aligned(8); /* Guarantee 64bit alignment */
> > 
> > And that pretty much what I suggested in my initial patch :)
> > 
> > For the record x86 has exactly the same atomic64_t as you suggested,
> > see https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/atomic64_32.h#L13:
> > ---------------------->8------------------
> > typedef struct {
> > 	u64 __aligned(8) counter;
> > } atomic64_t;
> > ---------------------->8------------------
> 
> And so does the ARC version since when the atomic64_t support was added by commit
> ce6365270ecd
> 
> Also, you should consider using the pre-canned type aligned_u64.
> 
> typedef struct {
>        aligned_u64 counter;
>        ^^^^^^^^^^^
> } atomic64_t;

As Peter mentioned earlier atomic is signed, see:
http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004030.html

So looks like we need this fix for ARC instead:
--------------------------------->8----------------------------
diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 11859287c52a..ef94b7371de6 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -367,7 +367,7 @@ ATOMIC_OPS(xor, ^=, CTOP_INST_AXOR_DI_R2_R2_R3)
  */
 
 typedef struct {
-       aligned_u64 counter;
+       long long __aligned(8) counter;
 } atomic64_t;
 
 #define ATOMIC64_INIT(a) { (a) }
--------------------------------->8----------------------------

-Alexey

Patch

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index f98a097e73f2..466fa59c866a 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -24,8 +24,12 @@  struct devres_node {
 
 struct devres {
 	struct devres_node		node;
-	/* -- 3 pointers */
-	unsigned long long		data[];	/* guarantee ull alignment */
+	/*
+	 * Depending on ABI "long long" type of a particular 32-bit CPU
+	 * might be aligned by either word (32-bits) or double word (64-bits).
+	 * Make sure "data" is really 64-bit aligned for any 32-bit CPU.
+	 */
+	unsigned long long		data[] __aligned(sizeof(unsigned long long));
 };
 
 struct devres_group {