
[tpmdd-devel] tpm: fix cacheline alignment for DMA-able buffers

Message ID 1469761153-85576-1-git-send-email-apronin@chromium.org
State New

Commit Message

apronin@chromium.org July 29, 2016, 2:59 a.m. UTC
Annotate the buffers used in SPI transactions as ____cacheline_aligned
so that they can be used in DMA transfers.

Signed-off-by: Andrey Pronin <apronin@chromium.org>
---
 drivers/char/tpm/st33zp24/spi.c | 4 ++--
 drivers/char/tpm/tpm_tis_spi.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Jason Gunthorpe July 29, 2016, 5:27 p.m. UTC | #1
On Thu, Jul 28, 2016 at 07:59:13PM -0700, Andrey Pronin wrote:
> Annotate buffers used in spi transactions as ____cacheline_aligned
> to use in DMA transfers.
> 
> Signed-off-by: Andrey Pronin <apronin@chromium.org>
>  drivers/char/tpm/st33zp24/spi.c | 4 ++--
>  drivers/char/tpm/tpm_tis_spi.c  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c
> index 9f5a011..0e9aad9 100644
> --- a/drivers/char/tpm/st33zp24/spi.c
> +++ b/drivers/char/tpm/st33zp24/spi.c
> @@ -70,8 +70,8 @@
>  struct st33zp24_spi_phy {
>  	struct spi_device *spi_device;
>  
> -	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE];
> -	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE];
> +	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
> +	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
>  
>  	int io_lpcpd;
>  	int latency;

Hurm, this still looks wrong to me. Aligning the start of the buffers is
not enough; the DMA'able space must also end on a cache line boundary.

So the buffers must also always be placed at the end of the struct.

IMHO it would be cleaner and safer to always kmalloc the DMA buffer
on its own than to try to optimize like this.

Jason
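
The concern above can be illustrated with a small userspace sketch. It is
not driver code: C11 _Alignas stands in for ____cacheline_aligned, and
CACHELINE (64) and BUF_SIZE (68) are assumed values chosen only for the
illustration. The struct names phy_mid and phy_end and the helper
shares_line() are hypothetical. The idea is to check whether the last byte
of a DMA buffer lands on the same cache line as a neighbouring field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumed cache line size, illustration only */
#define BUF_SIZE  68 /* assumed buffer size, not a multiple of CACHELINE */

/* Layout as in the patch: aligned buffers followed by other fields. */
struct phy_mid {
	void *spi_device;
	_Alignas(CACHELINE) uint8_t tx_buf[BUF_SIZE];
	_Alignas(CACHELINE) uint8_t rx_buf[BUF_SIZE];
	int io_lpcpd;
	int latency;
};

/* Alternative layout: the aligned buffers are the last members. */
struct phy_end {
	void *spi_device;
	int io_lpcpd;
	int latency;
	_Alignas(CACHELINE) uint8_t tx_buf[BUF_SIZE];
	_Alignas(CACHELINE) uint8_t rx_buf[BUF_SIZE];
};

/* Index of the cache line that contains a given struct offset. */
static size_t line_of(size_t offset)
{
	return offset / CACHELINE;
}

/*
 * Returns 1 if the field at other_off sits on the same cache line as
 * the last byte of the buffer at buf_off, 0 otherwise.
 */
static int shares_line(size_t buf_off, size_t buf_len, size_t other_off)
{
	assert(buf_len > 0);
	return line_of(buf_off + buf_len - 1) == line_of(other_off);
}
```

With these assumed sizes, shares_line() reports that io_lpcpd in phy_mid
shares a line with the tail of rx_buf, while no non-buffer field in phy_end
does, and sizeof(struct phy_end) is padded to a multiple of CACHELINE.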

------------------------------------------------------------------------------
Dmitry Torokhov July 29, 2016, 5:30 p.m. UTC | #2
On Fri, Jul 29, 2016 at 10:27 AM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> Hurm, this still looks wrong to me. Aligning the start of buffers is
> not enough, the DMA'able space must also end on a cache line as well.
>
> So, the buffers must also always be placed at the end of the struct.
>
> IMHO It would be cleaner and safer to always kmalloc the DMA buffer
> alone than to try and optimize like this.
>

In this case, moving them to the end of the structure and commenting on why
they have to be at the end might be a less invasive change. It is also more
performance-efficient and more resilient in low-memory situations.

Thanks,
Dmitry
------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity 
planning reports. http://sdm.link/zohodev2dev
Jarkko Sakkinen Aug. 9, 2016, 9:46 a.m. UTC | #3
On Fri, Jul 29, 2016 at 10:30:22AM -0700, Dmitry Torokhov wrote:
>    In this case moving them to the end of the structure and commenting why
>    they have to be at the end might be less invasive change. More
>    performance-efficient and resilient in low memory situations too.

The kmallocs would be done during driver initialization:

* you are rarely in a low-memory situation
* the performance gain/loss is insignificant

I really don't see your point.

>    Thanks,
>    Dmitry

/Jarkko

------------------------------------------------------------------------------
Jarkko Sakkinen Aug. 9, 2016, 3:01 p.m. UTC | #4
On Tue, Aug 09, 2016 at 12:46:10PM +0300, Jarkko Sakkinen wrote:
> kmallocs would be done in the driver initialization:
> 
> * you rarely are in low memory situation
> * performance gain/loss is insignificant
> 
> I really don't see your point.

I'm fine with having them at the end of the structure, mainly for
simplicity, but those arguments just didn't hold at all.

/Jarkko

------------------------------------------------------------------------------
Dmitry Torokhov Aug. 9, 2016, 3:18 p.m. UTC | #5
On Tue, Aug 9, 2016 at 8:01 AM, Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote:

> I'm fine having them at the end of the structure mainly for simplicity
> reasons but those arguments just didn't hold at all.
>

Well, the main reason was the simplicity and low invasiveness of the change.

But I still maintain that doing 3 memory allocations instead of 1 is less
performant and puts more pressure on the kernel. Yes, it is at bind time,
but you do not have to do three times the work when one allocation will
suffice. Also, driver binding does not necessarily happen at boot time: I
can always unbind and rebind the driver or reload the module.

Thanks,
Dmitry
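
The single-allocation variant argued for above might look roughly like the
following userspace sketch. C11 aligned_alloc() stands in for the kernel
allocator and _Alignas for ____cacheline_aligned. CACHELINE, BUF_SIZE,
struct phy_single and phy_single_alloc() are hypothetical names and values
used only for illustration. Whether the kernel allocator itself returns
memory aligned this strictly is a separate question that the sketch does
not answer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHELINE 64 /* assumed cache line size, illustration only */
#define BUF_SIZE  68 /* assumed buffer size, illustration only */

struct phy_single {
	void *spi_device;
	int latency;
	/*
	 * DMA buffers must stay the last members: the alignment on the
	 * members pads sizeof() up to a whole number of cache lines, so
	 * nothing else can follow rx_buf on its final line.
	 */
	_Alignas(CACHELINE) uint8_t tx_buf[BUF_SIZE];
	_Alignas(CACHELINE) uint8_t rx_buf[BUF_SIZE];
};

/* One allocation covers the bookkeeping fields and both buffers. */
static struct phy_single *phy_single_alloc(void)
{
	struct phy_single *phy;

	/* sizeof(*phy) is a multiple of the struct's alignment. */
	phy = aligned_alloc(_Alignof(struct phy_single), sizeof(*phy));
	if (phy)
		memset(phy, 0, sizeof(*phy));
	return phy;
}
```

A caller would pair phy_single_alloc() with a single free() at unbind time,
which is the "one allocation instead of three" point made above.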
------------------------------------------------------------------------------
Jason Gunthorpe Aug. 9, 2016, 10:08 p.m. UTC | #6
On Tue, Aug 09, 2016 at 08:18:00AM -0700, Dmitry Torokhov wrote:

>    Well, the main reason was simplicity and invasiveness of the
>    change.

Well, it isn't simple, because the proposed patches have had subtle
problems with DMA. The simple approach is to use a guaranteed DMA-able
allocation for DMA memory and stop trying to over-optimize.

Jason
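
As a rough userspace analogy for the separate-allocation approach, the
sketch below gives each buffer its own allocation whose start and length
are both whole cache lines. In the driver this role would be played by
kmalloc(); here C11 aligned_alloc() is used instead. CACHELINE, BUF_SIZE,
struct phy and the helper functions are hypothetical and exist only for
the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE 64 /* assumed cache line size, illustration only */
#define BUF_SIZE  68 /* assumed buffer size, illustration only */

struct phy {
	uint8_t *tx_buf; /* separately allocated, nothing else shares its lines */
	uint8_t *rx_buf;
	int latency;
};

/* Round a length up to a whole number of cache lines. */
static size_t round_up_line(size_t n)
{
	return (n + CACHELINE - 1) / CACHELINE * CACHELINE;
}

/* Allocate both buffers; returns 0 on success, -1 on failure. */
static int phy_alloc_bufs(struct phy *phy)
{
	phy->tx_buf = aligned_alloc(CACHELINE, round_up_line(BUF_SIZE));
	phy->rx_buf = aligned_alloc(CACHELINE, round_up_line(BUF_SIZE));
	if (!phy->tx_buf || !phy->rx_buf) {
		free(phy->tx_buf);
		free(phy->rx_buf);
		phy->tx_buf = NULL;
		phy->rx_buf = NULL;
		return -1;
	}
	return 0;
}

/* Release both buffers and clear the pointers. */
static void phy_free_bufs(struct phy *phy)
{
	free(phy->tx_buf);
	free(phy->rx_buf);
	phy->tx_buf = NULL;
	phy->rx_buf = NULL;
}
```

The cost is two extra allocations and an extra error path, which is the
trade-off being debated in this thread.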

------------------------------------------------------------------------------
Jarkko Sakkinen Aug. 10, 2016, 10:36 a.m. UTC | #7
On Tue, Aug 09, 2016 at 08:18:00AM -0700, Dmitry Torokhov wrote:
>    Well, the main reason was simplicity and invasiveness of the change.
>    But I still maintain that doing 3 memory allocations instead of 1 is less
>    performant and puts more pressure on the kernel. Yes, it is at bind time,
>    but you do not have to do 3 times work when one allocation will suffice.
>    Also, driver binding does not necessarily happen at boot time. I can
>    always unbind and rebind the driver or reload the module.

I'm fine with either approach.

>    Thanks,
>    Dmitry

/Jarkko

------------------------------------------------------------------------------

Patch

diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c
index 9f5a011..0e9aad9 100644
--- a/drivers/char/tpm/st33zp24/spi.c
+++ b/drivers/char/tpm/st33zp24/spi.c
@@ -70,8 +70,8 @@ 
 struct st33zp24_spi_phy {
 	struct spi_device *spi_device;
 
-	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE];
-	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE];
+	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
+	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
 
 	int io_lpcpd;
 	int latency;
diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c
index dbaad9c..58d7758 100644
--- a/drivers/char/tpm/tpm_tis_spi.c
+++ b/drivers/char/tpm/tpm_tis_spi.c
@@ -48,8 +48,8 @@  struct tpm_tis_spi_phy {
 	struct tpm_tis_data priv;
 	struct spi_device *spi_device;
 
-	u8 tx_buf[MAX_SPI_FRAMESIZE + 4];
-	u8 rx_buf[MAX_SPI_FRAMESIZE + 4];
+	u8 tx_buf[MAX_SPI_FRAMESIZE + 4] ____cacheline_aligned;
+	u8 rx_buf[MAX_SPI_FRAMESIZE + 4] ____cacheline_aligned;
 };
 
 static inline struct tpm_tis_spi_phy *to_tpm_tis_spi_phy(struct tpm_tis_data *data)