
misc: pci_endpoint_test: Use memcpy_toio()/memcpy_fromio() for BAR tests

Message ID 20240318193019.123795-1-cassel@kernel.org
State New
Series misc: pci_endpoint_test: Use memcpy_toio()/memcpy_fromio() for BAR tests

Commit Message

Niklas Cassel March 18, 2024, 7:30 p.m. UTC
The current code uses writel()/readl(), each of which implies a memory
barrier.

Additionally, reading 4 bytes at a time over the PCI bus is far from
optimal, considering that this code runs in an ioctl handler.

Use memcpy_toio()/memcpy_fromio() for BAR tests.

Before patch with a 4MB BAR:
$ time /usr/bin/pcitest -b 1
BAR1:           OKAY
real    0m 1.56s

After patch with a 4MB BAR:
$ time /usr/bin/pcitest -b 1
BAR1:           OKAY
real    0m 0.54s

Signed-off-by: Niklas Cassel <cassel@kernel.org>
---
 drivers/misc/pci_endpoint_test.c | 52 ++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 12 deletions(-)
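The chunked copy-and-compare loop introduced by the patch below can be modeled in plain userspace C; here memcpy() stands in for memcpy_toio()/memcpy_fromio(), and a heap buffer stands in for the iomapped BAR (names and sizes are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Model of the patched BAR test: fill a bounce buffer with the test
 * pattern, copy it to the "BAR" in buf_size chunks, read each chunk
 * back, and compare. Returns 1 on success, 0 on mismatch or OOM. */
static int bar_test(unsigned char *bar, size_t bar_size,
		    size_t buf_size, unsigned char pattern)
{
	unsigned char *write_buf = malloc(buf_size);
	unsigned char *read_buf = malloc(buf_size);
	size_t iters = bar_size / buf_size;
	size_t remain = bar_size % buf_size;
	int ok = 1;
	size_t j;

	if (!write_buf || !read_buf) {
		ok = 0;
		goto out;
	}

	memset(write_buf, pattern, buf_size);
	for (j = 0; j < iters && ok; j++) {
		memcpy(bar + buf_size * j, write_buf, buf_size);  /* memcpy_toio()   */
		memcpy(read_buf, bar + buf_size * j, buf_size);   /* memcpy_fromio() */
		ok = !memcmp(write_buf, read_buf, buf_size);
	}
	if (ok && remain) {
		memcpy(bar + buf_size * iters, write_buf, remain);
		memcpy(read_buf, bar + buf_size * iters, remain);
		ok = !memcmp(write_buf, read_buf, remain);
	}
out:
	free(write_buf);
	free(read_buf);
	return ok;
}
```

The speedup in the timings above comes from replacing one bus transaction (and one implied barrier) per 4 bytes with large bursts sized by the bounce buffer.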

Comments

Arnd Bergmann March 18, 2024, 8:02 p.m. UTC | #1
On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
> diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> index 705029ad8eb5..cb6c9ccf3a5f 100644
> --- a/drivers/misc/pci_endpoint_test.c
> +++ b/drivers/misc/pci_endpoint_test.c
> @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
>  	0xA5A5A5A5,
>  };
> 
> +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
> +					enum pci_barno barno, int offset,
> +					void *write_buf, void *read_buf,
> +					int size)
> +{
> +	memset(write_buf, bar_test_pattern[barno], size);
> +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
> +
> +	/* Make sure that reads are performed after writes. */
> +	mb();
> +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);

Did you see actual bugs without the barrier? On normal PCI
semantics, a read will always force a write to be flushed first.

     Arnd
Manivannan Sadhasivam March 19, 2024, 4:31 a.m. UTC | #2
On Mon, Mar 18, 2024 at 09:02:21PM +0100, Arnd Bergmann wrote:
> On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
> > diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> > index 705029ad8eb5..cb6c9ccf3a5f 100644
> > --- a/drivers/misc/pci_endpoint_test.c
> > +++ b/drivers/misc/pci_endpoint_test.c
> > @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
> >  	0xA5A5A5A5,
> >  };
> > 
> > +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
> > +					enum pci_barno barno, int offset,
> > +					void *write_buf, void *read_buf,
> > +					int size)
> > +{
> > +	memset(write_buf, bar_test_pattern[barno], size);
> > +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
> > +
> > +	/* Make sure that reads are performed after writes. */
> > +	mb();
> > +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
> 
> Did you see actual bugs without the barrier? On normal PCI
> semantics, a read will always force a write to be flushed first.
> 

Even for non-PCI cases, read/writes to the same region are not reordered, right?

- Mani
Arnd Bergmann March 19, 2024, 6:41 a.m. UTC | #3
On Tue, Mar 19, 2024, at 05:31, Manivannan Sadhasivam wrote:
> On Mon, Mar 18, 2024 at 09:02:21PM +0100, Arnd Bergmann wrote:
>> On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
>> > diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
>> > index 705029ad8eb5..cb6c9ccf3a5f 100644
>> > --- a/drivers/misc/pci_endpoint_test.c
>> > +++ b/drivers/misc/pci_endpoint_test.c
>> > @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
>> >  	0xA5A5A5A5,
>> >  };
>> > 
>> > +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
>> > +					enum pci_barno barno, int offset,
>> > +					void *write_buf, void *read_buf,
>> > +					int size)
>> > +{
>> > +	memset(write_buf, bar_test_pattern[barno], size);
>> > +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
>> > +
>> > +	/* Make sure that reads are performed after writes. */
>> > +	mb();
>> > +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
>> 
>> Did you see actual bugs without the barrier? On normal PCI
>> semantics, a read will always force a write to be flushed first.
>> 
>
> Even for non-PCI cases, read/writes to the same region are not reordered, right?

Correct. The only thing that is special about PCI here is
that writes are explicitly allowed to get posted in the
first place, on other buses the memcpy_toio() by itself
might already require non-posted behavior.

      Arnd
Niklas Cassel March 19, 2024, 4:40 p.m. UTC | #4
Hello Arnd,

On Mon, Mar 18, 2024 at 09:02:21PM +0100, Arnd Bergmann wrote:
> On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
> > diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> > index 705029ad8eb5..cb6c9ccf3a5f 100644
> > --- a/drivers/misc/pci_endpoint_test.c
> > +++ b/drivers/misc/pci_endpoint_test.c
> > @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
> >  	0xA5A5A5A5,
> >  };
> > 
> > +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
> > +					enum pci_barno barno, int offset,
> > +					void *write_buf, void *read_buf,
> > +					int size)
> > +{
> > +	memset(write_buf, bar_test_pattern[barno], size);
> > +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
> > +
> > +	/* Make sure that reads are performed after writes. */
> > +	mb();
> > +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
> 
> Did you see actual bugs without the barrier? On normal PCI
> semantics, a read will always force a write to be flushed first.

I'm aware that a Read Request must not pass a Posted Request under
normal PCI transaction ordering rules.
(As defined in PCIe 6.0, Table 2-42 Ordering Rules Summary)

I was more worried about the compiler or CPU reordering things:
https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L1876-L1878

I did try the patch without the barrier, and did not see any issues.


I did also see this comment:
https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790

Do you think that we need to perform any flushing after the memset(),
to ensure that the data written using memcpy_toio() is actually what
we expect it to be?


Kind regards,
Niklas
Manivannan Sadhasivam March 19, 2024, 4:48 p.m. UTC | #5
On Tue, Mar 19, 2024 at 05:40:33PM +0100, Niklas Cassel wrote:
> Hello Arnd,
> 
> On Mon, Mar 18, 2024 at 09:02:21PM +0100, Arnd Bergmann wrote:
> > On Mon, Mar 18, 2024, at 20:30, Niklas Cassel wrote:
> > > diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
> > > index 705029ad8eb5..cb6c9ccf3a5f 100644
> > > --- a/drivers/misc/pci_endpoint_test.c
> > > +++ b/drivers/misc/pci_endpoint_test.c
> > > @@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
> > >  	0xA5A5A5A5,
> > >  };
> > > 
> > > +static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
> > > +					enum pci_barno barno, int offset,
> > > +					void *write_buf, void *read_buf,
> > > +					int size)
> > > +{
> > > +	memset(write_buf, bar_test_pattern[barno], size);
> > > +	memcpy_toio(test->bar[barno] + offset, write_buf, size);
> > > +
> > > +	/* Make sure that reads are performed after writes. */
> > > +	mb();
> > > +	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
> > 
> > Did you see actual bugs without the barrier? On normal PCI
> > semantics, a read will always force a write to be flushed first.
> 
> I'm aware that a Read Request must not pass a Posted Request under
> normal PCI transaction ordering rules.
> (As defined in PCIe 6.0, Table 2-42 Ordering Rules Summary)
> 
> I was more worried about the compiler or CPU reordering things:
> https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L1876-L1878
> 
> I did try the patch without the barrier, and did not see any issues.
> 

There shouldn't be an issue without the barrier.

> 
> I did also see this comment:
> https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790
> 
> Do you think that we need to perform any flushing after the memset(),
> to ensure that the data written using memcpy_toio() is actually what
> we expect it to be?
> 

The documentation recommends cache flushing only if the normal memory write and
MMIO access are dependent. But here you are just accessing the MMIO. So no
explicit ordering or cache flushing is required.

- Mani
Niklas Cassel March 19, 2024, 4:53 p.m. UTC | #6
On Tue, Mar 19, 2024 at 10:18:26PM +0530, Manivannan Sadhasivam wrote:
> > 
> > I did also see this comment:
> > https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790
> > 
> > Do you think that we need to perform any flushing after the memset(),
> > to ensure that the data written using memcpy_toio() is actually what
> > we expect it to be?
> > 
> 
> The documentation recommends cache flushing only if the normal memory write and
> MMIO access are dependent. But here you are just accessing the MMIO. So no
> explicit ordering or cache flushing is required.

What does dependent mean in this case then?

Since the data that we are writing to the device is the data that was
just written to memory using memset().


Kind regards,
Niklas
Arnd Bergmann March 19, 2024, 7:55 p.m. UTC | #7
On Tue, Mar 19, 2024, at 17:53, Niklas Cassel wrote:
> On Tue, Mar 19, 2024 at 10:18:26PM +0530, Manivannan Sadhasivam wrote:
>> > 
>> > I did also see this comment:
>> > https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790
>> > 
>> > Do you think that we need to perform any flushing after the memset(),
>> > to ensure that the data written using memcpy_toio() is actually what
>> > we expect it to be?
>> > 
>> 
>> The documentation recommends cache flushing only if the normal memory write and
>> MMIO access are dependent. But here you are just accessing the MMIO. So no
>> explicit ordering or cache flushing is required.
>
> What does dependent mean in this case then?
>
> Since the data that we are writing to the device is the data that was
> just written to memory using memset().

You need a barrier for the case where the memset() writes to
a buffer in RAM and then you write the address of that buffer
into a device register, which triggers a DMA read from the
buffer. Without a barrier, the side effect of the MMIO write
may come before the data in the RAM buffer is visible.

A memcpy_fromio() only involves a single master accessing
the memory (i.e. the CPU executing both the memset() and the
memcpy()), so there is no way this can go wrong.

     Arnd
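The distinction Arnd draws can be sketched in kernel-style C (a fragment, not buildable on its own; the register name REG_DMA_SRC and the variables are made up for illustration):

```c
/* Case 1: two masters. The CPU fills a RAM buffer, then an MMIO write
 * tells the device to DMA from it. A write barrier is needed so the
 * buffer contents are visible before the doorbell's side effect. */
memset(dma_buf, 0xA5, len);
wmb();					/* order RAM write vs. MMIO write */
writel(dma_buf_phys, base + REG_DMA_SRC);	/* device starts reading */

/* Case 2: one master. The CPU both writes the BAR and reads it back.
 * PCI ordering already keeps the read behind the posted write, so no
 * extra barrier is required between the two copies. */
memcpy_toio(bar, write_buf, len);
memcpy_fromio(read_buf, bar, len);
```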
Niklas Cassel March 19, 2024, 9:36 p.m. UTC | #8
Hello Arnd,

On Tue, Mar 19, 2024 at 08:55:19PM +0100, Arnd Bergmann wrote:
> On Tue, Mar 19, 2024, at 17:53, Niklas Cassel wrote:
> > On Tue, Mar 19, 2024 at 10:18:26PM +0530, Manivannan Sadhasivam wrote:
> >> > 
> >> > I did also see this comment:
> >> > https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt#L2785-L2790
> >> > 
> >> > Do you think that we need to perform any flushing after the memset(),
> >> > to ensure that the data written using memcpy_toio() is actually what
> >> > we expect it to be?
> >> > 
> >> 
> >> The documentation recommends cache flushing only if the normal memory write and
> >> MMIO access are dependent. But here you are just accessing the MMIO. So no
> >> explicit ordering or cache flushing is required.
> >
> > What does dependent mean in this case then?
> >
> > Since the data that we are writing to the device is the data that was
> > just written to memory using memset().
> 
> You need a barrier for the case where the memset() writes to
> a buffer in RAM and then you write the address of that buffer
> into a device register, which triggers a DMA read from the
> buffer. Without a barrier, the side effect of the MMIO write
> may come before the data in the RAM buffer is visible.
> 
> A memcpy_fromio() only involves a single master accessing
> the memory (i.e. the CPU executing both the memset() and the
> memcpy()), so there is no way this can go wrong.

Thank you for the clarification.


Kind regards,
Niklas

Patch

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 705029ad8eb5..cb6c9ccf3a5f 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -272,31 +272,59 @@ static const u32 bar_test_pattern[] = {
 	0xA5A5A5A5,
 };
 
+static int pci_endpoint_test_bar_memcmp(struct pci_endpoint_test *test,
+					enum pci_barno barno, int offset,
+					void *write_buf, void *read_buf,
+					int size)
+{
+	memset(write_buf, bar_test_pattern[barno], size);
+	memcpy_toio(test->bar[barno] + offset, write_buf, size);
+
+	/* Make sure that reads are performed after writes. */
+	mb();
+	memcpy_fromio(read_buf, test->bar[barno] + offset, size);
+
+	return memcmp(write_buf, read_buf, size);
+}
+
 static bool pci_endpoint_test_bar(struct pci_endpoint_test *test,
 				  enum pci_barno barno)
 {
-	int j;
-	u32 val;
-	int size;
+	int j, bar_size, buf_size, iters, remain;
+	void *write_buf;
+	void *read_buf;
 	struct pci_dev *pdev = test->pdev;
+	struct device *dev = &pdev->dev;
 
 	if (!test->bar[barno])
 		return false;
 
-	size = pci_resource_len(pdev, barno);
+	bar_size = pci_resource_len(pdev, barno);
 
 	if (barno == test->test_reg_bar)
-		size = 0x4;
+		bar_size = 0x4;
 
-	for (j = 0; j < size; j += 4)
-		pci_endpoint_test_bar_writel(test, barno, j,
-					     bar_test_pattern[barno]);
+	buf_size = min(SZ_1M, bar_size);
 
-	for (j = 0; j < size; j += 4) {
-		val = pci_endpoint_test_bar_readl(test, barno, j);
-		if (val != bar_test_pattern[barno])
+	write_buf = kmalloc(buf_size, GFP_KERNEL);
+	if (!write_buf)
+		return false;
+
+	read_buf = kmalloc(buf_size, GFP_KERNEL);
+	if (!read_buf)
+		return false;
+
+	iters = bar_size / buf_size;
+	for (j = 0; j < iters; j++)
+		if (pci_endpoint_test_bar_memcmp(test, barno, buf_size * j,
+						 write_buf, read_buf, buf_size))
+			return false;
+
+	remain = bar_size % buf_size;
+	if (remain)
+		if (pci_endpoint_test_bar_memcmp(test, barno, buf_size * iters,
+						 write_buf, read_buf, remain))
 			return false;
-	}
 
 	return true;
 }