diff mbox

map-ram chip driver: 16-bit block transfer

Message ID 1501851740-14125-1-git-send-email-matthew.weber@rockwellcollins.com
State Changes Requested
Headers show

Commit Message

Matt Weber Aug. 4, 2017, 1:02 p.m. UTC
From: sgtandel <sanjay.tandel@rockwellcollins.com>

This patch adds 16-bit I/O memory write function (map_copy_to16) in map-ram
driver. map-ram driver's block write function(i.e map_copy_to) uses
memcpy_toio() function for block transfer. memcpy_toio() is limited to do
single byte write, irrespective of bankwidth. Therefore,
added map_copy_to16() function to be used by map-ram driver for block
transfers for chips that only supports word(16-bit) transfers. map-ram
driver's single entity read/write functions (such as map_read and
map_write) use bankwidth property(provided by DTS entry) and calls
appropriate readb/readw/readq function. For block write, there was no such
implementation found in map-ram driver. So map_copy_to() calls memcpy_toio
(which in turn calls writeb) for all block write, irrespective of
bankwidth. So wrong or corrupted data might be written into flash memory in
case chips that does not support 8-bit write. This patch adds a condition
in block write function (map_copy_to) to call map_copy_to16(), if bankwidth
is 2. This patch also adds map_copy_from16() function to be used for 16-bit
block read operation. map-ram driver's block read function
(i.e map_copy_from) uses memcpy_fromio() function for all block read,
irrespective of bankwidth. memcpy_fromio is limited to do single byte read.
Therefore, Added map_copy_from16() function to be used by map-ram driver
for block read for chips that only supports word(16-bit) transfers.
map-ram driver's single entity read/write functions (such as map_read and
map_write) already use bankwidth property(provided by DTS entry) and calls
appropriate readb/readw/readq function. For block read, there was no such
implementation found in map-ram driver. So map_copy_from() calls
memcpy_fromio() (which in turn calls writeb) for all block write,
irrespective of bankwidth. This patch adds a condition in block read
function (i.e map_copy_from) to call map_copy_from16(), if bankwidth is 2.

Signed-off-by: Sanjay Tandel <sanjay.tandel@rockwellcollins.com>
Signed-off-by: Matt Weber <matthew.weber@rockwellcollins.com>
---
 include/linux/mtd/map.h | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

Comments

Boris Brezillon Aug. 9, 2017, 6:10 p.m. UTC | #1
+Arnd for the memcpy_to/fromio aspects.

Hi Matt,

Subject should be something like "mtd: map: support 16-bit block
transfers"

Le Fri,  4 Aug 2017 08:02:20 -0500,
Matt Weber <matthew.weber@rockwellcollins.com> a écrit :

> From: sgtandel <sanjay.tandel@rockwellcollins.com>
> 
> This patch adds 16-bit I/O memory write function (map_copy_to16) in map-ram
> driver. map-ram driver's block write function(i.e map_copy_to) uses
> memcpy_toio() function for block transfer. memcpy_toio() is limited to do
> single byte write, irrespective of bankwidth.

That's not true. Depending on the platform (+ config options)
memcpy_to/fromio() can be a simple redirection to memcpy, and
memcpy is likely to be optimized to do 32bit or 64bit accesses when
possible (IOW, when alignment allows it). 

> Therefore,
> added map_copy_to16() function to be used by map-ram driver for block
> transfers for chips that only supports word(16-bit) transfers. map-ram
> driver's single entity read/write functions (such as map_read and
> map_write) use bankwidth property(provided by DTS entry) and calls
> appropriate readb/readw/readq function. For block write, there was no such
> implementation found in map-ram driver. So map_copy_to() calls memcpy_toio
> (which in turn calls writeb) for all block write, irrespective of
> bankwidth.

You're kind of repeating what was said above.

> So wrong or corrupted data might be written into flash memory in
> case chips that does not support 8-bit write.

It's likely to be arch and buffer alignment dependent (see my
explanation about memcpy_to/fromio() being a simple redirection to
memcpy on some architectures).

> This patch adds a condition
> in block write function (map_copy_to) to call map_copy_to16(), if bankwidth
> is 2. This patch also adds map_copy_from16() function to be used for 16-bit
> block read operation. map-ram driver's block read function
> (i.e map_copy_from) uses memcpy_fromio() function for all block read,
> irrespective of bankwidth. memcpy_fromio is limited to do single byte read.
> Therefore, Added map_copy_from16() function to be used by map-ram driver
> for block read for chips that only supports word(16-bit) transfers.
> map-ram driver's single entity read/write functions (such as map_read and
> map_write) already use bankwidth property(provided by DTS entry) and calls
> appropriate readb/readw/readq function. For block read, there was no such
> implementation found in map-ram driver. So map_copy_from() calls
> memcpy_fromio() (which in turn calls writeb) for all block write,
> irrespective of bankwidth. This patch adds a condition in block read
> function (i.e map_copy_from) to call map_copy_from16(), if bankwidth is 2.

Come on! You repeat the same thing again but for the other direction.
I'm sure you can shrink the commit message and make it clearer.

Also, while you're at it, can you please re-format the commit message
to make it readable. Right now it's a huge block of text without any
blank line or line break.

> 
> Signed-off-by: Sanjay Tandel <sanjay.tandel@rockwellcollins.com>
> Signed-off-by: Matt Weber <matthew.weber@rockwellcollins.com>
> ---
>  include/linux/mtd/map.h | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 61 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mtd/map.h b/include/linux/mtd/map.h
> index 3aa56e3..3d7a296 100644
> --- a/include/linux/mtd/map.h
> +++ b/include/linux/mtd/map.h
> @@ -449,17 +449,77 @@ static inline void inline_map_write(struct map_info *map, const map_word datum,
>  	mb();
>  }
>  
> +static void map_copy_from16(void *to, void __iomem *from, size_t count)

You're in a source header and you define a function. If 2 source files
include the same header you'll face a duplicate symbol at link time.
This function should be inlined.

> +{
> +	const unsigned char *t = to;
> +
> +	if (!(IS_ALIGNED((u32)from, sizeof(u16)))) {
> +		from = (void __iomem *)ALIGN((u32)from, sizeof(u16))
> +							- sizeof(u16);
> +		*(u8 *)t = (u8)((readw(from) & 0xff00) >> 8);
> +		count--;
> +		t++;
> +		from += 2;
> +	}
> +	while (count >= 2) {
> +		*(u16 *)t = readw(from);
> +		count -= 2;
> +		t += 2;
> +		from += 2;
> +	};
> +	while (count > 0) {
> +		*(u8 *)t = (u8)(readw(from) & 0x00ff);
> +		count--;
> +		t++;
> +		from++;
> +	}
> +}
> +
>  static inline void inline_map_copy_from(struct map_info *map, void *to, unsigned long from, ssize_t len)
>  {
>  	if (map->cached)
>  		memcpy(to, (char *)map->cached + from, len);
> +	else if (map_bankwidth_is_2(map))
> +		map_copy_from16(to, map->virt + from, len);
>  	else
>  		memcpy_fromio(to, map->virt + from, len);

We're likely to face the same problem with map_bankwidth_is_4()
(AKA 32bit busses), right? Why not patching the code to handler this
case?

BTW, I'm not sure what kind of guarantees are provided by
memcpy_to/fromio(), but I'd expect them to optimize things when they
can, so they might use 32bit or 64bit accesses when alignment
authorizes it. Will that work correctly even on 8bit chips?

>  }
>  
> +static void map_copy_to16(void __iomem *to, const void *from, size_t count)

Ditto -> static inline void


> +{
> +	const unsigned char *f = from;
> +	u16 d;
> +
> +	if (!(IS_ALIGNED((u32)to, sizeof(u16)))) {
> +		to = (void __iomem *)ALIGN((u32)to, sizeof(u16))
> +							- sizeof(u16);
> +		d = (readw(to) & 0x00ff) | ((u16)(*(const u8 *)f) << 8);
> +		writew(d, to);
> +		count--;
> +		to += 2;
> +		f++;
> +	}
> +	while (count >= 2) {
> +		writew(*(const u16 *)f, to);
> +		count -= 2;
> +		to += 2;
> +		f += 2;
> +	};
> +	while (count > 0) {
> +		d = (readw(to) & 0xff00) | (u16)(*(const u8 *)f);
> +		writew(d, to);
> +		count--;
> +		to++;
> +		f++;
> +	}
> +}
> +
>  static inline void inline_map_copy_to(struct map_info *map, unsigned long to, const void *from, ssize_t len)
>  {
> -	memcpy_toio(map->virt + to, from, len);
> +	if (map_bankwidth_is_2(map))
> +		map_copy_to16(map->virt + to, from, len);
> +	else
> +		memcpy_toio(map->virt + to, from, len);
>  }
>  
>  #ifdef CONFIG_MTD_COMPLEX_MAPPINGS
Arnd Bergmann Aug. 9, 2017, 7:51 p.m. UTC | #2
On Wed, Aug 9, 2017 at 8:10 PM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> +Arnd for the memcpy_to/fromio aspects.
>
> Hi Matt,
>
> Subject should be something like "mtd: map: support 16-bit block
> transfers"
>
> Le Fri,  4 Aug 2017 08:02:20 -0500,
> Matt Weber <matthew.weber@rockwellcollins.com> a écrit :
>
>> From: sgtandel <sanjay.tandel@rockwellcollins.com>
>>
>> This patch adds 16-bit I/O memory write function (map_copy_to16) in map-ram
>> driver. map-ram driver's block write function(i.e map_copy_to) uses
>> memcpy_toio() function for block transfer. memcpy_toio() is limited to do
>> single byte write, irrespective of bankwidth.
>
> That's not true. Depending on the platform (+ config options)
> memcpy_to/fromio() can be a simple redirection to memcpy, and
> memcpy is likely to be optimized to do 32bit or 64bit accesses when
> possible (IOW, when alignment allows it).

right. I think we also have a helper that always does 32-bit
transfers, but I can't find the name at the moment.

>> +{
>> +     const unsigned char *t = to;
>> +
>> +     if (!(IS_ALIGNED((u32)from, sizeof(u16)))) {
>> +             from = (void __iomem *)ALIGN((u32)from, sizeof(u16))
>> +                                                     - sizeof(u16);
>> +             *(u8 *)t = (u8)((readw(from) & 0xff00) >> 8);
>> +             count--;
>> +             t++;
>> +             from += 2;
>> +     }
>> +     while (count >= 2) {
>> +             *(u16 *)t = readw(from);
>> +             count -= 2;
>> +             t += 2;
>> +             from += 2;
>> +     };
>> +     while (count > 0) {
>> +             *(u8 *)t = (u8)(readw(from) & 0x00ff);
>> +             count--;
>> +             t++;
>> +             from++;
>> +     }
>> +}

This looks wrong on bit-endian kernels, which do a byte-swap
in readw. This is one of the few places that probably should use
__raw_readw().

>>  static inline void inline_map_copy_from(struct map_info *map, void *to, unsigned long from, ssize_t len)
>>  {
>>       if (map->cached)
>>               memcpy(to, (char *)map->cached + from, len);
>> +     else if (map_bankwidth_is_2(map))
>> +             map_copy_from16(to, map->virt + from, len);
>>       else
>>               memcpy_fromio(to, map->virt + from, len);
>
> We're likely to face the same problem with map_bankwidth_is_4()
> (AKA 32bit busses), right? Why not patching the code to handler this
> case?
>
> BTW, I'm not sure what kind of guarantees are provided by
> memcpy_to/fromio(), but I'd expect them to optimize things when they
> can, so they might use 32bit or 64bit accesses when alignment
> authorizes it. Will that work correctly even on 8bit chips?

I think it may or may not work, depending on the particular bus
interface.

        Arnd
Sanjay Tandel Aug. 11, 2017, 4:50 p.m. UTC | #3
Hi Boris,

On Wed, Aug 9, 2017 at 1:10 PM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> +Arnd for the memcpy_to/fromio aspects.
>
> Hi Matt,
>
> Subject should be something like "mtd: map: support 16-bit block
> transfers"
>
> Le Fri,  4 Aug 2017 08:02:20 -0500,
> Matt Weber <matthew.weber@rockwellcollins.com> a écrit :
>
>> From: sgtandel <sanjay.tandel@rockwellcollins.com>
>>
>> This patch adds 16-bit I/O memory write function (map_copy_to16) in map-ram
>> driver. map-ram driver's block write function(i.e map_copy_to) uses
>> memcpy_toio() function for block transfer. memcpy_toio() is limited to do
>> single byte write, irrespective of bankwidth.
>
> That's not true. Depending on the platform (+ config options)
> memcpy_to/fromio() can be a simple redirection to memcpy, and
> memcpy is likely to be optimized to do 32bit or 64bit accesses when
> possible (IOW, when alignment allows it).
>
Agree with you on memcpy_toio/_fromio.
but map-ram driver's map_copy_to/map_copy_from aren't based on bank-width
provided for specific chip.It relies on memcpy_toio, which is dependent on arch,
alignment and size.For example - like in our case of accessing 16-bit
flash, despite
of chip's bank-width being 16-bit, map_copy_to ends up accessing 8-bit using
memcpy_toio, because memcpy_toio for ARM arch always does 8-bit accesses.

Single entity read/write function (like map_read/map_write), on the other hand,
seems to be doing that correctly by checking the bank-width.

>> Therefore,
>> added map_copy_to16() function to be used by map-ram driver for block
>> transfers for chips that only supports word(16-bit) transfers. map-ram
>> driver's single entity read/write functions (such as map_read and
>> map_write) use bankwidth property(provided by DTS entry) and calls
>> appropriate readb/readw/readq function. For block write, there was no such
>> implementation found in map-ram driver. So map_copy_to() calls memcpy_toio
>> (which in turn calls writeb) for all block write, irrespective of
>> bankwidth.
>
> You're kind of repeating what was said above.
>

agree. lot of repeating!

>> So wrong or corrupted data might be written into flash memory in
>> case chips that does not support 8-bit write.
>
> It's likely to be arch and buffer alignment dependent (see my
> explanation about memcpy_to/fromio() being a simple redirection to
> memcpy on some architectures).
>

That's correct and calling memcpy_toio (and not checking chip's bank-width)
could lead to corrupt data.

>> This patch adds a condition
>> in block write function (map_copy_to) to call map_copy_to16(), if bankwidth
>> is 2. This patch also adds map_copy_from16() function to be used for 16-bit
>> block read operation. map-ram driver's block read function
>> (i.e map_copy_from) uses memcpy_fromio() function for all block read,
>> irrespective of bankwidth. memcpy_fromio is limited to do single byte read.
>> Therefore, Added map_copy_from16() function to be used by map-ram driver
>> for block read for chips that only supports word(16-bit) transfers.
>> map-ram driver's single entity read/write functions (such as map_read and
>> map_write) already use bankwidth property(provided by DTS entry) and calls
>> appropriate readb/readw/readq function. For block read, there was no such
>> implementation found in map-ram driver. So map_copy_from() calls
>> memcpy_fromio() (which in turn calls writeb) for all block write,
>> irrespective of bankwidth. This patch adds a condition in block read
>> function (i.e map_copy_from) to call map_copy_from16(), if bankwidth is 2.
>
> Come on! You repeat the same thing again but for the other direction.
> I'm sure you can shrink the commit message and make it clearer.
>
> Also, while you're at it, can you please re-format the commit message
> to make it readable. Right now it's a huge block of text without any
> blank line or line break.
>

How about below description? :)

"Unlike single entity access functions(map_read/map_write) in map-ram,
block access functions (map_copy_to/map_copy_from) aren't based on
bank-width provided for specific chip. It relies on memcpy_toio, which is
dependent on arch, buffer size and alignment.

So add support for block access based on bank-width."


>>
>> Signed-off-by: Sanjay Tandel <sanjay.tandel@rockwellcollins.com>
>> Signed-off-by: Matt Weber <matthew.weber@rockwellcollins.com>
>> ---
>>  include/linux/mtd/map.h | 62 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 61 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mtd/map.h b/include/linux/mtd/map.h
>> index 3aa56e3..3d7a296 100644
>> --- a/include/linux/mtd/map.h
>> +++ b/include/linux/mtd/map.h
>> @@ -449,17 +449,77 @@ static inline void inline_map_write(struct map_info *map, const map_word datum,
>>       mb();
>>  }
>>
>> +static void map_copy_from16(void *to, void __iomem *from, size_t count)
>
> You're in a source header and you define a function. If 2 source files
> include the same header you'll face a duplicate symbol at link time.
> This function should be inlined.
>

Okay. I will fix this.

>> +{
>> +     const unsigned char *t = to;
>> +
>> +     if (!(IS_ALIGNED((u32)from, sizeof(u16)))) {
>> +             from = (void __iomem *)ALIGN((u32)from, sizeof(u16))
>> +                                                     - sizeof(u16);
>> +             *(u8 *)t = (u8)((readw(from) & 0xff00) >> 8);
>> +             count--;
>> +             t++;
>> +             from += 2;
>> +     }
>> +     while (count >= 2) {
>> +             *(u16 *)t = readw(from);
>> +             count -= 2;
>> +             t += 2;
>> +             from += 2;
>> +     };
>> +     while (count > 0) {
>> +             *(u8 *)t = (u8)(readw(from) & 0x00ff);
>> +             count--;
>> +             t++;
>> +             from++;
>> +     }
>> +}
>> +
>>  static inline void inline_map_copy_from(struct map_info *map, void *to, unsigned long from, ssize_t len)
>>  {
>>       if (map->cached)
>>               memcpy(to, (char *)map->cached + from, len);
>> +     else if (map_bankwidth_is_2(map))
>> +             map_copy_from16(to, map->virt + from, len);
>>       else
>>               memcpy_fromio(to, map->virt + from, len);
>
> We're likely to face the same problem with map_bankwidth_is_4()
> (AKA 32bit busses), right? Why not patching the code to handler this
> case?
>
> BTW, I'm not sure what kind of guarantees are provided by
> memcpy_to/fromio(), but I'd expect them to optimize things when they
> can, so they might use 32bit or 64bit accesses when alignment
> authorizes it. Will that work correctly even on 8bit chips?
>

32-bit access for 8-bit chip may not work. I am working for 32-bit and
8-bit accesses based on bank-width.

>>  }
>>
>> +static void map_copy_to16(void __iomem *to, const void *from, size_t count)
>
> Ditto -> static inline void
>

agree.

>
>> +{
>> +     const unsigned char *f = from;
>> +     u16 d;
>> +
>> +     if (!(IS_ALIGNED((u32)to, sizeof(u16)))) {
>> +             to = (void __iomem *)ALIGN((u32)to, sizeof(u16))
>> +                                                     - sizeof(u16);
>> +             d = (readw(to) & 0x00ff) | ((u16)(*(const u8 *)f) << 8);
>> +             writew(d, to);
>> +             count--;
>> +             to += 2;
>> +             f++;
>> +     }
>> +     while (count >= 2) {
>> +             writew(*(const u16 *)f, to);
>> +             count -= 2;
>> +             to += 2;
>> +             f += 2;
>> +     };
>> +     while (count > 0) {
>> +             d = (readw(to) & 0xff00) | (u16)(*(const u8 *)f);
>> +             writew(d, to);
>> +             count--;
>> +             to++;
>> +             f++;
>> +     }
>> +}
>> +
>>  static inline void inline_map_copy_to(struct map_info *map, unsigned long to, const void *from, ssize_t len)
>>  {
>> -     memcpy_toio(map->virt + to, from, len);
>> +     if (map_bankwidth_is_2(map))
>> +             map_copy_to16(map->virt + to, from, len);
>> +     else
>> +             memcpy_toio(map->virt + to, from, len);
>>  }
>>
>>  #ifdef CONFIG_MTD_COMPLEX_MAPPINGS
>


Thanks,
Sanjay
Sanjay Tandel Aug. 11, 2017, 5:57 p.m. UTC | #4
Hi Arnd,

On Wed, Aug 9, 2017 at 2:51 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wed, Aug 9, 2017 at 8:10 PM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
>> +Arnd for the memcpy_to/fromio aspects.
>>
>> Hi Matt,
>>
>> Subject should be something like "mtd: map: support 16-bit block
>> transfers"
>>
>> Le Fri,  4 Aug 2017 08:02:20 -0500,
>> Matt Weber <matthew.weber@rockwellcollins.com> a écrit :
>>
>>> From: sgtandel <sanjay.tandel@rockwellcollins.com>
>>>
>>> This patch adds 16-bit I/O memory write function (map_copy_to16) in map-ram
>>> driver. map-ram driver's block write function(i.e map_copy_to) uses
>>> memcpy_toio() function for block transfer. memcpy_toio() is limited to do
>>> single byte write, irrespective of bankwidth.
>>
>> That's not true. Depending on the platform (+ config options)
>> memcpy_to/fromio() can be a simple redirection to memcpy, and
>> memcpy is likely to be optimized to do 32bit or 64bit accesses when
>> possible (IOW, when alignment allows it).
>
> right. I think we also have a helper that always does 32-bit
> transfers, but I can't find the name at the moment.
>

agree.

>>> +{
>>> +     const unsigned char *t = to;
>>> +
>>> +     if (!(IS_ALIGNED((u32)from, sizeof(u16)))) {
>>> +             from = (void __iomem *)ALIGN((u32)from, sizeof(u16))
>>> +                                                     - sizeof(u16);
>>> +             *(u8 *)t = (u8)((readw(from) & 0xff00) >> 8);
>>> +             count--;
>>> +             t++;
>>> +             from += 2;
>>> +     }
>>> +     while (count >= 2) {
>>> +             *(u16 *)t = readw(from);
>>> +             count -= 2;
>>> +             t += 2;
>>> +             from += 2;
>>> +     };
>>> +     while (count > 0) {
>>> +             *(u8 *)t = (u8)(readw(from) & 0x00ff);
>>> +             count--;
>>> +             t++;
>>> +             from++;
>>> +     }
>>> +}
>
> This looks wrong on bit-endian kernels, which do a byte-swap
> in readw. This is one of the few places that probably should use
> __raw_readw().
>

Okay. will provide this change with fix-up patch.

>>>  static inline void inline_map_copy_from(struct map_info *map, void *to, unsigned long from, ssize_t len)
>>>  {
>>>       if (map->cached)
>>>               memcpy(to, (char *)map->cached + from, len);
>>> +     else if (map_bankwidth_is_2(map))
>>> +             map_copy_from16(to, map->virt + from, len);
>>>       else
>>>               memcpy_fromio(to, map->virt + from, len);
>>
>> We're likely to face the same problem with map_bankwidth_is_4()
>> (AKA 32bit busses), right? Why not patching the code to handler this
>> case?
>>
>> BTW, I'm not sure what kind of guarantees are provided by
>> memcpy_to/fromio(), but I'd expect them to optimize things when they
>> can, so they might use 32bit or 64bit accesses when alignment
>> authorizes it. Will that work correctly even on 8bit chips?
>
> I think it may or may not work, depending on the particular bus
> interface.
>

agree...

>         Arnd


Thanks,
Sanjay
Arnd Bergmann Aug. 11, 2017, 7:41 p.m. UTC | #5
On Fri, Aug 11, 2017 at 6:50 PM, Sanjay Tandel
<sanjay.tandel@rockwellcollins.com> wrote:
> On Wed, Aug 9, 2017 at 1:10 PM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
>> Le Fri,  4 Aug 2017 08:02:20 -0500,
>> Matt Weber <matthew.weber@rockwellcollins.com> a écrit :
>>
>>> From: sgtandel <sanjay.tandel@rockwellcollins.com>
>>>
>>> This patch adds 16-bit I/O memory write function (map_copy_to16) in map-ram
>>> driver. map-ram driver's block write function(i.e map_copy_to) uses
>>> memcpy_toio() function for block transfer. memcpy_toio() is limited to do
>>> single byte write, irrespective of bankwidth.
>>
>> That's not true. Depending on the platform (+ config options)
>> memcpy_to/fromio() can be a simple redirection to memcpy, and
>> memcpy is likely to be optimized to do 32bit or 64bit accesses when
>> possible (IOW, when alignment allows it).
>>
> Agree with you on memcpy_toio/_fromio.
> but map-ram driver's map_copy_to/map_copy_from aren't based on bank-width
> provided for specific chip.It relies on memcpy_toio, which is dependent on arch,
> alignment and size.For example - like in our case of accessing 16-bit
> flash, despite
> of chip's bank-width being 16-bit, map_copy_to ends up accessing 8-bit using
> memcpy_toio, because memcpy_toio for ARM arch always does 8-bit accesses.

On almost all ARM platforms, it would use 32-bit accesses these days. On what
kernel version, ARM platform and endianess do you see memcpy_toio() use 8-bit
access?

       Arnd
Arnd Bergmann Aug. 11, 2017, 7:42 p.m. UTC | #6
On Fri, Aug 11, 2017 at 7:57 PM, Sanjay Tandel
<sanjay.tandel@rockwellcollins.com> wrote:

>>>> +{
>>>> +     const unsigned char *t = to;
>>>> +
>>>> +     if (!(IS_ALIGNED((u32)from, sizeof(u16)))) {
>>>> +             from = (void __iomem *)ALIGN((u32)from, sizeof(u16))
>>>> +                                                     - sizeof(u16);
>>>> +             *(u8 *)t = (u8)((readw(from) & 0xff00) >> 8);
>>>> +             count--;
>>>> +             t++;
>>>> +             from += 2;
>>>> +     }
>>>> +     while (count >= 2) {
>>>> +             *(u16 *)t = readw(from);
>>>> +             count -= 2;
>>>> +             t += 2;
>>>> +             from += 2;
>>>> +     };
>>>> +     while (count > 0) {
>>>> +             *(u8 *)t = (u8)(readw(from) & 0x00ff);
>>>> +             count--;
>>>> +             t++;
>>>> +             from++;
>>>> +     }
>>>> +}
>>
>> This looks wrong on bit-endian kernels, which do a byte-swap
>> in readw. This is one of the few places that probably should use
>> __raw_readw().
>>
>
> Okay. will provide this change with fix-up patch.

On second thought, __raw_readw() can get combined into 32-bit
accesses on some architectures. If you want to be sure you use only
16-bit reads, then cpu_to_le32(readw_relaxed(from)) would be
the safer option.

      Arnd
Sanjay Tandel Aug. 11, 2017, 9:03 p.m. UTC | #7
On Fri, Aug 11, 2017 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Fri, Aug 11, 2017 at 6:50 PM, Sanjay Tandel
> <sanjay.tandel@rockwellcollins.com> wrote:
>> On Wed, Aug 9, 2017 at 1:10 PM, Boris Brezillon
>> <boris.brezillon@free-electrons.com> wrote:
>>> Le Fri,  4 Aug 2017 08:02:20 -0500,
>>> Matt Weber <matthew.weber@rockwellcollins.com> a écrit :
>>>
>>>> From: sgtandel <sanjay.tandel@rockwellcollins.com>
>>>>
>>>> This patch adds 16-bit I/O memory write function (map_copy_to16) in map-ram
>>>> driver. map-ram driver's block write function(i.e map_copy_to) uses
>>>> memcpy_toio() function for block transfer. memcpy_toio() is limited to do
>>>> single byte write, irrespective of bankwidth.
>>>
>>> That's not true. Depending on the platform (+ config options)
>>> memcpy_to/fromio() can be a simple redirection to memcpy, and
>>> memcpy is likely to be optimized to do 32bit or 64bit accesses when
>>> possible (IOW, when alignment allows it).
>>>
>> Agree with you on memcpy_toio/_fromio.
>> but map-ram driver's map_copy_to/map_copy_from aren't based on bank-width
>> provided for specific chip.It relies on memcpy_toio, which is dependent on arch,
>> alignment and size.For example - like in our case of accessing 16-bit
>> flash, despite
>> of chip's bank-width being 16-bit, map_copy_to ends up accessing 8-bit using
>> memcpy_toio, because memcpy_toio for ARM arch always does 8-bit accesses.
>
> On almost all ARM platforms, it would use 32-bit accesses these days. On what
> kernel version, ARM platform and endianess do you see memcpy_toio() use 8-bit
> access?
>

We use version 4.1.8 with Freescale LS1021A( cortex-a7 core, little-endian).
With the newer versions of kernel also, I can see same 8-bit
implementation for 32-bit ARM arch.

BTW, for other arch also memcpy_toio()  uses combination of  32-bit(or
64-bit) and 8-bit accesses,
which may not work here.Because we need all accesses to be of same
size(bank-width).


>        Arnd
Arnd Bergmann Aug. 11, 2017, 9:30 p.m. UTC | #8
On Fri, Aug 11, 2017 at 11:03 PM, Sanjay Tandel
<sanjay.tandel@rockwellcollins.com> wrote:
> On Fri, Aug 11, 2017 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Fri, Aug 11, 2017 at 6:50 PM, Sanjay Tandel
>> On almost all ARM platforms, it would use 32-bit accesses these days. On what
>> kernel version, ARM platform and endianess do you see memcpy_toio() use 8-bit
>> access?
>>
>
> We use version 4.1.8 with Freescale LS1021A( cortex-a7 core, little-endian).
> With the newer versions of kernel also, I can see same 8-bit
> implementation for 32-bit ARM arch.

I see. Commit 7ddfe625cbc1 ("ARM: optimize
memset_io()/memcpy_fromio()/memcpy_toio()")
was merged in linux-4.2 and changed this to use mostly 32-bit accesses and a
bug in it fixed in commit 1bd46782d08b ("ARM: avoid unwanted GCC
memset()/memcpy() optimisations for IO variants").


> BTW, for other arch also memcpy_toio()  uses combination of  32-bit(or
> 64-bit) and 8-bit accesses,
> which may not work here.Because we need all accesses to be of same
> size(bank-width).

Could you try with a newer kernel or with the 7ddfe625cbc1/1bd46782d08b
commits backported to see if that works for you though? At the very least that
should impact the description of the patch we end up applying, since your
current text no longer matches what the kernel does.

      Arnd
Sanjay Tandel Aug. 11, 2017, 11:42 p.m. UTC | #9
On Fri, Aug 11, 2017 at 4:30 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Fri, Aug 11, 2017 at 11:03 PM, Sanjay Tandel
> <sanjay.tandel@rockwellcollins.com> wrote:
>> On Fri, Aug 11, 2017 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> On Fri, Aug 11, 2017 at 6:50 PM, Sanjay Tandel
>>> On almost all ARM platforms, it would use 32-bit accesses these days. On what
>>> kernel version, ARM platform and endianess do you see memcpy_toio() use 8-bit
>>> access?
>>>
>>
>> We use version 4.1.8 with Freescale LS1021A( cortex-a7 core, little-endian).
>> With the newer versions of kernel also, I can see same 8-bit
>> implementation for 32-bit ARM arch.
>
> I see. Commit 7ddfe625cbc1 ("ARM: optimize
> memset_io()/memcpy_fromio()/memcpy_toio()")
> was merged in linux-4.2 and changed this to use mostly 32-bit accesses and a
> bug in it fixed in commit 1bd46782d08b ("ARM: avoid unwanted GCC
> memset()/memcpy() optimisations for IO variants").
>
>
>> BTW, for other arch also memcpy_toio()  uses combination of  32-bit(or
>> 64-bit) and 8-bit accesses,
>> which may not work here.Because we need all accesses to be of same
>> size(bank-width).
>
> Could you try with a newer kernel or with the 7ddfe625cbc1/1bd46782d08b
> commits backported to see if that works for you though? At the very least that
> should impact the description of the patch we end up applying, since your
> current text no longer matches what the kernel does.
>

I tested back-porting commits 7ddfe625cbc1/1bd46782d08b,
but that didn't work for me.

Before, it used to corrupt every 2nd byte(2nd,4rth,6th ... upto N) in memory,
with memcpy_toio() using 8-bit accesses for our 16-bit chip.

Now, with backporting 7ddfe625cbc1/1bd46782d08b, it corrupts (N + 1)th byte
in memory, when I tried to write N number of bytes (where N is not aligned to
4-byte boundary).That means, it still tries 8-bit access for last odd byte and
ends up corrupting next byte.

>       Arnd

Thanks,
Sanjay
Boris Brezillon Aug. 12, 2017, 5:39 p.m. UTC | #10
Le Fri, 11 Aug 2017 18:42:44 -0500,
Sanjay Tandel <sanjay.tandel@rockwellcollins.com> a écrit :

> On Fri, Aug 11, 2017 at 4:30 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Fri, Aug 11, 2017 at 11:03 PM, Sanjay Tandel
> > <sanjay.tandel@rockwellcollins.com> wrote:  
> >> On Fri, Aug 11, 2017 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:  
> >>> On Fri, Aug 11, 2017 at 6:50 PM, Sanjay Tandel
> >>> On almost all ARM platforms, it would use 32-bit accesses these days. On what
> >>> kernel version, ARM platform and endianess do you see memcpy_toio() use 8-bit
> >>> access?
> >>>  
> >>
> >> We use version 4.1.8 with Freescale LS1021A( cortex-a7 core, little-endian).
> >> With the newer versions of kernel also, I can see same 8-bit
> >> implementation for 32-bit ARM arch.  
> >
> > I see. Commit 7ddfe625cbc1 ("ARM: optimize
> > memset_io()/memcpy_fromio()/memcpy_toio()")
> > was merged in linux-4.2 and changed this to use mostly 32-bit accesses and a
> > bug in it fixed in commit 1bd46782d08b ("ARM: avoid unwanted GCC
> > memset()/memcpy() optimisations for IO variants").
> >
> >  
> >> BTW, for other arch also memcpy_toio()  uses combination of  32-bit(or
> >> 64-bit) and 8-bit accesses,
> >> which may not work here.Because we need all accesses to be of same
> >> size(bank-width).  
> >
> > Could you try with a newer kernel or with the 7ddfe625cbc1/1bd46782d08b
> > commits backported to see if that works for you though? At the very least that
> > should impact the description of the patch we end up applying, since your
> > current text no longer matches what the kernel does.
> >  
> 
> I tested back-porting commits 7ddfe625cbc1/1bd46782d08b,
> but that didn't work for me.
> 
> Before, it used to corrupt every 2nd byte(2nd,4rth,6th ... upto N) in memory,
> with memcpy_toio() using 8-bit accesses for our 16-bit chip.
> 
> Now, with backporting 7ddfe625cbc1/1bd46782d08b, it corrupts (N + 1)th byte
> in memory, when I tried to write N number of bytes (where N is not aligned to
> 4-byte boundary).That means, it still tries 8-bit access for last odd byte and
> ends up corrupting next byte.

Yes, that's expected as the size is not 16bit aligned.

Anyway, I read the whole thread again and AFAIU memcpy_to/fromio() is
only appropriate if the bus controller the device is connected to is
smart enough to hide all alignment problems to its users.

Don't know what controller is causing trouble here, but you should
probably have a dedicated driver (if that's not already the case) that
overloads the ->copy_from()/->copy_to() methods to do the right thing
for your memory controller.
Arnd Bergmann Aug. 14, 2017, 8:08 a.m. UTC | #11
On Sat, Aug 12, 2017 at 7:39 PM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> Le Fri, 11 Aug 2017 18:42:44 -0500,
> Sanjay Tandel <sanjay.tandel@rockwellcollins.com> a écrit :
>> On Fri, Aug 11, 2017 at 4:30 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Fri, Aug 11, 2017 at 11:03 PM, Sanjay Tandel
>> > <sanjay.tandel@rockwellcollins.com> wrote:

>> Before, it used to corrupt every 2nd byte(2nd,4rth,6th ... upto N) in memory,
>> with memcpy_toio() using 8-bit accesses for our 16-bit chip.
>>
>> Now, with backporting 7ddfe625cbc1/1bd46782d08b, it corrupts (N + 1)th byte
>> in memory, when I tried to write N number of bytes (where N is not aligned to
>> 4-byte boundary).That means, it still tries 8-bit access for last odd byte and
>> ends up corrupting next byte.
>
> Yes, that's expected as the size is not 16bit aligned.
>
> Anyway, I read the whole thread again and AFAIU memcpy_to/fromio() is
> only appropriate if the bus controller the device is connected to is
> smart enough to hide all alignment problems to its users.
>
> Don't know what controller is causing trouble here, but you should
> probably have a dedicated driver (if that's not already the case) that
> overloads the ->copy_from()/->copy_to() methods to do the right thing
> for your memory controller.

Right, I think that is a good conclusion. My understanding from the discussion
so far is that the existing code in the core mtd layer works efficiently and
correctly on most bus interfaces, so we should keep that but have
workarounds in the controller driver for any bus interface where this is
not the case.

I think it would be different if this was a regression and the core MTD support
worked correctly in the past, but from what I read, it always behaved like
this.

       Arnd
Sanjay Tandel Aug. 14, 2017, 4:33 p.m. UTC | #12
On Mon, Aug 14, 2017 at 3:08 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sat, Aug 12, 2017 at 7:39 PM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
>> Le Fri, 11 Aug 2017 18:42:44 -0500,
>> Sanjay Tandel <sanjay.tandel@rockwellcollins.com> a écrit :
>>> On Fri, Aug 11, 2017 at 4:30 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> > On Fri, Aug 11, 2017 at 11:03 PM, Sanjay Tandel
>>> > <sanjay.tandel@rockwellcollins.com> wrote:
>
>>> Before, it used to corrupt every 2nd byte(2nd,4rth,6th ... upto N) in memory,
>>> with memcpy_toio() using 8-bit accesses for our 16-bit chip.
>>>
>>> Now, with backporting 7ddfe625cbc1/1bd46782d08b, it corrupts (N + 1)th byte
>>> in memory, when I tried to write N number of bytes (where N is not aligned to
>>> 4-byte boundary).That means, it still tries 8-bit access for last odd byte and
>>> ends up corrupting next byte.
>>
>> Yes, that's expected as the size is not 16bit aligned.
>>
>> Anyway, I read the whole thread again and AFAIU memcpy_to/fromio() is
>> only appropriate if the bus controller the device is connected to is
>> smart enough to hide all alignment problems to its users.
>>
>> Don't know what controller is causing trouble here, but you should
>> probably have a dedicated driver (if that's not already the case) that
>> overloads the ->copy_from()/->copy_to() methods to do the right thing
>> for your memory controller.
>
> Right, I think that is a good conclusion. My understanding from the discussion
> so far is that the existing code in the core mtd layer works efficiently and
> correctly on most bus interfaces, so we should keep that but have
> workarounds in the controller driver for any bus interface where this is
> not the case.
>
> I think it would be different if this was a regression and the core MTD support
> worked correctly in the past, but from what I read, it always behaved like
> this.
>

Okay. I will try to create dedicated driver. Thanks!

>        Arnd

Regards,
Sanjay
Matt Weber Aug. 29, 2017, 2:23 a.m. UTC | #13
All,

On Mon, Aug 14, 2017 at 3:08 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sat, Aug 12, 2017 at 7:39 PM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
>> Le Fri, 11 Aug 2017 18:42:44 -0500,
>> Sanjay Tandel <sanjay.tandel@rockwellcollins.com> a écrit :
>>> On Fri, Aug 11, 2017 at 4:30 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> > On Fri, Aug 11, 2017 at 11:03 PM, Sanjay Tandel
>>> > <sanjay.tandel@rockwellcollins.com> wrote:
>
>>> Before, it used to corrupt every 2nd byte(2nd,4rth,6th ... upto N) in memory,
>>> with memcpy_toio() using 8-bit accesses for our 16-bit chip.
>>>
>>> Now, with backporting 7ddfe625cbc1/1bd46782d08b, it corrupts (N + 1)th byte
>>> in memory, when I tried to write N number of bytes (where N is not aligned to
>>> 4-byte boundary).That means, it still tries 8-bit access for last odd byte and
>>> ends up corrupting next byte.
>>
>> Yes, that's expected as the size is not 16bit aligned.
>>
>> Anyway, I read the whole thread again and AFAIU memcpy_to/fromio() is
>> only appropriate if the bus controller the device is connected to is
>> smart enough to hide all alignment problems to its users.
>>
>> Don't know what controller is causing trouble here, but you should
>> probably have a dedicated driver (if that's not already the case) that
>> overloads the ->copy_from()/->copy_to() methods to do the right thing
>> for your memory controller.
>
> Right, I think that is a good conclusion. My understanding from the discussion
> so far is that the existing code in the core mtd layer works efficiently and
> correctly on most bus interfaces, so we should keep that but have
> workarounds in the controller driver for any bus interface where this is
> not the case.
>
> I think it would be different if this was a regression and the core MTD support
> worked correctly in the past, but from what I read, it always behaved like
> this.
>

Related to a dedicated driver, we've reworked the proposed patches
with the following description and file list.

This patch adds map driver for parallel flash chips interfaced over a
NXP Integrated
Flash Controller (IFC). This driver allows either 8-bit or 16-bit accesses,
depending on bank-width, to parallel flash chips(like Everspin MR0A16A),
which are physically mapped to CPU's memory space. For unaligned accesses,
it performs read-modify-write operations to keep access size same as
bank-width.

 Documentation/devicetree/bindings/mtd/ifc-mram.txt |  34 ++
 drivers/mtd/maps/Kconfig                           |  13 +
 drivers/mtd/maps/Makefile                          |   1 +
 drivers/mtd/maps/ifc_mram.c                        | 343 +++++++++++++++++++++


Thanks for taking a look before we submit, we'd like to prevent another version.
Matt
Boris Brezillon Aug. 29, 2017, 7:12 a.m. UTC | #14
Hi Matt,

On Mon, 28 Aug 2017 21:23:41 -0500
Matthew Weber <matthew.weber@rockwellcollins.com> wrote:

> All,
> 
> On Mon, Aug 14, 2017 at 3:08 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Sat, Aug 12, 2017 at 7:39 PM, Boris Brezillon
> > <boris.brezillon@free-electrons.com> wrote:  
> >> Le Fri, 11 Aug 2017 18:42:44 -0500,
> >> Sanjay Tandel <sanjay.tandel@rockwellcollins.com> a écrit :  
> >>> On Fri, Aug 11, 2017 at 4:30 PM, Arnd Bergmann <arnd@arndb.de> wrote:  
> >>> > On Fri, Aug 11, 2017 at 11:03 PM, Sanjay Tandel
> >>> > <sanjay.tandel@rockwellcollins.com> wrote:  
> >  
> >>> Before, it used to corrupt every 2nd byte(2nd,4rth,6th ... upto N) in memory,
> >>> with memcpy_toio() using 8-bit accesses for our 16-bit chip.
> >>>
> >>> Now, with backporting 7ddfe625cbc1/1bd46782d08b, it corrupts (N + 1)th byte
> >>> in memory, when I tried to write N number of bytes (where N is not aligned to
> >>> 4-byte boundary).That means, it still tries 8-bit access for last odd byte and
> >>> ends up corrupting next byte.  
> >>
> >> Yes, that's expected as the size is not 16bit aligned.
> >>
> >> Anyway, I read the whole thread again and AFAIU memcpy_to/fromio() is
> >> only appropriate if the bus controller the device is connected to is
> >> smart enough to hide all alignment problems to its users.
> >>
> >> Don't know what controller is causing trouble here, but you should
> >> probably have a dedicated driver (if that's not already the case) that
> >> overloads the ->copy_from()/->copy_to() methods to do the right thing
> >> for your memory controller.  
> >
> > Right, I think that is a good conclusion. My understanding from the discussion
> > so far is that the existing code in the core mtd layer works efficiently and
> > correctly on most bus interfaces, so we should keep that but have
> > workarounds in the controller driver for any bus interface where this is
> > not the case.
> >
> > I think it would be different if this was a regression and the core MTD support
> > worked correctly in the past, but from what I read, it always behaved like
> > this.
> >  
> 
> Related to a dedicated driver, we've reworked the proposed patches
> with the following description and file list.
> 
> This patch adds map driver for parallel flash chips interfaced over a
> NXP Integrated
> Flash Controller (IFC). This driver allows either 8-bit or 16-bit accesses,
> depending on bank-width, to parallel flash chips(like Everspin MR0A16A),
> which are physically mapped to CPU's memory space. For unaligned accesses,
> it performs read-modify-write operations to keep access size same as
> bank-width.
> 
>  Documentation/devicetree/bindings/mtd/ifc-mram.txt |  34 ++
>  drivers/mtd/maps/Kconfig                           |  13 +
>  drivers/mtd/maps/Makefile                          |   1 +
>  drivers/mtd/maps/ifc_mram.c                        | 343 +++++++++++++++++++++
> 
> 
> Thanks for taking a look before we submit,

The code is missing (we only the diffstat) :-)

> we'd like to prevent another version.

What's the problem with sending a new version the proper way?

Regards,

Boris
diff mbox

Patch

diff --git a/include/linux/mtd/map.h b/include/linux/mtd/map.h
index 3aa56e3..3d7a296 100644
--- a/include/linux/mtd/map.h
+++ b/include/linux/mtd/map.h
@@ -449,17 +449,77 @@  static inline void inline_map_write(struct map_info *map, const map_word datum,
 	mb();
 }
 
+static void map_copy_from16(void *to, void __iomem *from, size_t count)
+{
+	const unsigned char *t = to;
+
+	if (!(IS_ALIGNED((u32)from, sizeof(u16)))) {
+		from = (void __iomem *)ALIGN((u32)from, sizeof(u16))
+							- sizeof(u16);
+		*(u8 *)t = (u8)((readw(from) & 0xff00) >> 8);
+		count--;
+		t++;
+		from += 2;
+	}
+	while (count >= 2) {
+		*(u16 *)t = readw(from);
+		count -= 2;
+		t += 2;
+		from += 2;
+	};
+	while (count > 0) {
+		*(u8 *)t = (u8)(readw(from) & 0x00ff);
+		count--;
+		t++;
+		from++;
+	}
+}
+
 static inline void inline_map_copy_from(struct map_info *map, void *to, unsigned long from, ssize_t len)
 {
 	if (map->cached)
 		memcpy(to, (char *)map->cached + from, len);
+	else if (map_bankwidth_is_2(map))
+		map_copy_from16(to, map->virt + from, len);
 	else
 		memcpy_fromio(to, map->virt + from, len);
 }
 
+static void map_copy_to16(void __iomem *to, const void *from, size_t count)
+{
+	const unsigned char *f = from;
+	u16 d;
+
+	if (!(IS_ALIGNED((u32)to, sizeof(u16)))) {
+		to = (void __iomem *)ALIGN((u32)to, sizeof(u16))
+							- sizeof(u16);
+		d = (readw(to) & 0x00ff) | ((u16)(*(const u8 *)f) << 8);
+		writew(d, to);
+		count--;
+		to += 2;
+		f++;
+	}
+	while (count >= 2) {
+		writew(*(const u16 *)f, to);
+		count -= 2;
+		to += 2;
+		f += 2;
+	};
+	while (count > 0) {
+		d = (readw(to) & 0xff00) | (u16)(*(const u8 *)f);
+		writew(d, to);
+		count--;
+		to++;
+		f++;
+	}
+}
+
 static inline void inline_map_copy_to(struct map_info *map, unsigned long to, const void *from, ssize_t len)
 {
-	memcpy_toio(map->virt + to, from, len);
+	if (map_bankwidth_is_2(map))
+		map_copy_to16(map->virt + to, from, len);
+	else
+		memcpy_toio(map->virt + to, from, len);
 }
 
 #ifdef CONFIG_MTD_COMPLEX_MAPPINGS