Message ID | 0a3b22ec3bc41c26536f3acc8acfd98f9b3207ed.1466440540.git.cyrille.pitchen@atmel.com |
---|---|
State | Superseded |
Headers | show |
Hello, this patch is kind of awesome. I have a few practical concerns however. On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: > Before this patch, m25p80_read() supported few SPI protocols: > - regular SPI 1-1-1 > - SPI Dual Output 1-1-2 > - SPI Quad Output 1-1-4 > On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. Under typical use my estimate is that huge majority of data is transferred in _read() seconded by _write(). As I understand it the n-n-n means how many bits you transfer in parallel when sending command-address-data. In _read() the command and data overhead is negligible when you can read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not meaningful performance-wise. Are there flash chips that support one but not the other? For _write() the benefits are even harder to assess. You can presumably write at n-n-4 or n-n-2 if your controller and flash supports it transferring the page faster. And then spend possibly large amount of time waiting for the flash to get ready again. If the programming time is fixed transferring the page faster may or may not have benefits. It may at least free the bus for other devices to use. The _reg_ stuff is probably negligible altogether, Lastly the faster transfers of address bytes seem to be achieved with increasingly longer command codes given how much the maximum command length increased. So even in a page write where the address is a few % of the transfer the benefit of these extra modes is dubious. Overall I wonder how much it is worthwhile to complicate the code to get all these modes in every single function. Thanks Michal
On 06/23/2016 10:35 PM, Michal Suchanek wrote: > Hello, Hi, > this patch is kind of awesome. > > I have a few practical concerns however. > > On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: >> Before this patch, m25p80_read() supported few SPI protocols: >> - regular SPI 1-1-1 >> - SPI Dual Output 1-1-2 >> - SPI Quad Output 1-1-4 >> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. > > Under typical use my estimate is that huge majority of data is > transferred in _read() seconded by _write(). > > As I understand it the n-n-n means how many bits you transfer in > parallel when sending command-address-data. > > In _read() the command and data overhead is negligible when you can > read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not > meaningful performance-wise. Are there flash chips that support one > but not the other? That's quite unlikely. > For _write() the benefits are even harder to assess. The page program usually works on 256B pages, so the math is rather easy. > You can > presumably write at n-n-4 or n-n-2 if your controller and flash > supports it transferring the page faster. And then spend possibly > large amount of time waiting for the flash to get ready again. If the > programming time is fixed transferring the page faster may or may not > have benefits. It may at least free the bus for other devices to use. > > The _reg_ stuff is probably negligible altogether, > > Lastly the faster transfers of address bytes seem to be achieved with > increasingly longer command codes given how much the maximum command > length increased. So even in a page write where the address is a few % > of the transfer the benefit of these extra modes is dubious. > > Overall I wonder how much it is worthwhile to complicate the code to > get all these modes in every single function. In my opinion, 1-1-x makes sense as it is supported by most flashes, while n-m-x where n,m>1 does not make sense as it often requires some stateful change to non-volatile register with little gain. > Thanks > > Michal >
On 23 June 2016 at 22:46, Marek Vasut <marex@denx.de> wrote: > On 06/23/2016 10:35 PM, Michal Suchanek wrote: >> Hello, > > Hi, > >> this patch is kind of awesome. >> >> I have a few practical concerns however. >> >> On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: >>> Before this patch, m25p80_read() supported few SPI protocols: >>> - regular SPI 1-1-1 >>> - SPI Dual Output 1-1-2 >>> - SPI Quad Output 1-1-4 >>> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. >> >> Under typical use my estimate is that huge majority of data is >> transferred in _read() seconded by _write(). >> >> As I understand it the n-n-n means how many bits you transfer in >> parallel when sending command-address-data. >> >> In _read() the command and data overhead is negligible when you can >> read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not >> meaningful performance-wise. Are there flash chips that support one >> but not the other? > > That's quite unlikely. > >> For _write() the benefits are even harder to assess. > > The page program usually works on 256B pages, so the math is rather easy. > >> You can >> presumably write at n-n-4 or n-n-2 if your controller and flash >> supports it transferring the page faster. And then spend possibly >> large amount of time waiting for the flash to get ready again. If the >> programming time is fixed transferring the page faster may or may not >> have benefits. It may at least free the bus for other devices to use. >> >> The _reg_ stuff is probably negligible altogether, >> >> Lastly the faster transfers of address bytes seem to be achieved with >> increasingly longer command codes given how much the maximum command >> length increased. So even in a page write where the address is a few % >> of the transfer the benefit of these extra modes is dubious. >> >> Overall I wonder how much it is worthwhile to complicate the code to >> get all these modes in every single function. > > In my opinion, 1-1-x makes sense as it is supported by most flashes, > while n-m-x where n,m>1 does not make sense as it often requires some > stateful change to non-volatile register with little gain. > There is actually one thing that x-x-x modes make easier. If I were to implement dual mode switch on my SPI master controller it would be probably set for whole message and would not change mid-transfer. Still you can probably simulate x-x-x with 1-1-x by scattering the 1-1-x command bits across more bytes. Thanks Michal
On 06/23/2016 11:58 PM, Michal Suchanek wrote: > On 23 June 2016 at 22:46, Marek Vasut <marex@denx.de> wrote: >> On 06/23/2016 10:35 PM, Michal Suchanek wrote: >>> Hello, >> >> Hi, >> >>> this patch is kind of awesome. >>> >>> I have a few practical concerns however. >>> >>> On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: >>>> Before this patch, m25p80_read() supported few SPI protocols: >>>> - regular SPI 1-1-1 >>>> - SPI Dual Output 1-1-2 >>>> - SPI Quad Output 1-1-4 >>>> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. >>> >>> Under typical use my estimate is that huge majority of data is >>> transferred in _read() seconded by _write(). >>> >>> As I understand it the n-n-n means how many bits you transfer in >>> parallel when sending command-address-data. >>> >>> In _read() the command and data overhead is negligible when you can >>> read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not >>> meaningful performance-wise. Are there flash chips that support one >>> but not the other? >> >> That's quite unlikely. >> >>> For _write() the benefits are even harder to assess. >> >> The page program usually works on 256B pages, so the math is rather easy. >> >>> You can >>> presumably write at n-n-4 or n-n-2 if your controller and flash >>> supports it transferring the page faster. And then spend possibly >>> large amount of time waiting for the flash to get ready again. If the >>> programming time is fixed transferring the page faster may or may not >>> have benefits. It may at least free the bus for other devices to use. >>> >>> The _reg_ stuff is probably negligible altogether, >>> >>> Lastly the faster transfers of address bytes seem to be achieved with >>> increasingly longer command codes given how much the maximum command >>> length increased. So even in a page write where the address is a few % >>> of the transfer the benefit of these extra modes is dubious. >>> >>> Overall I wonder how much it is worthwhile to complicate the code to >>> get all these modes in every single function. >> >> In my opinion, 1-1-x makes sense as it is supported by most flashes, >> while n-m-x where n,m>1 does not make sense as it often requires some >> stateful change to non-volatile register with little gain. >> > > There is actually one thing that x-x-x modes make easier. If I were to > implement dual mode switch on my SPI master controller it would be > probably set for whole message and would not change mid-transfer. Your IP would not sell as customers would like to use it with SPI flashes which can only do 1-1-x modes. These flashes are on the market, today, and thus used and thus you have to support them if you want to make profit. In fact, the SPI flash starts in 1-1-1 mode anyway, thus you need to support that mode. To support other modes, you need to implement simple switch in the hardware which either shifts out a bit a time, two bits on two lines at a time or whatever else ; selecting which one it is must be done synchronous to input clock and on a byte boundary, which is trivial to implement in hardware. > Still you can probably simulate x-x-x with 1-1-x by scattering the > 1-1-x command bits across more bytes. That's not how you usually implement it. It's quite often a shift register. > Thanks > > Michal >
On 24 June 2016 at 00:14, Marek Vasut <marex@denx.de> wrote: > On 06/23/2016 11:58 PM, Michal Suchanek wrote: >> On 23 June 2016 at 22:46, Marek Vasut <marex@denx.de> wrote: >>> On 06/23/2016 10:35 PM, Michal Suchanek wrote: >>>> Hello, >>> >>> Hi, >>> >>>> this patch is kind of awesome. >>>> >>>> I have a few practical concerns however. >>>> >>>> On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: >>>>> Before this patch, m25p80_read() supported few SPI protocols: >>>>> - regular SPI 1-1-1 >>>>> - SPI Dual Output 1-1-2 >>>>> - SPI Quad Output 1-1-4 >>>>> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. >>>> >>>> Under typical use my estimate is that huge majority of data is >>>> transferred in _read() seconded by _write(). >>>> >>>> As I understand it the n-n-n means how many bits you transfer in >>>> parallel when sending command-address-data. >>>> >>>> In _read() the command and data overhead is negligible when you can >>>> read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not >>>> meaningful performance-wise. Are there flash chips that support one >>>> but not the other? >>> >>> That's quite unlikely. >>> >>>> For _write() the benefits are even harder to assess. >>> >>> The page program usually works on 256B pages, so the math is rather easy. >>> >>>> You can >>>> presumably write at n-n-4 or n-n-2 if your controller and flash >>>> supports it transferring the page faster. And then spend possibly >>>> large amount of time waiting for the flash to get ready again. If the >>>> programming time is fixed transferring the page faster may or may not >>>> have benefits. It may at least free the bus for other devices to use. >>>> >>>> The _reg_ stuff is probably negligible altogether, >>>> >>>> Lastly the faster transfers of address bytes seem to be achieved with >>>> increasingly longer command codes given how much the maximum command >>>> length increased. So even in a page write where the address is a few % >>>> of the transfer the benefit of these extra modes is dubious. >>>> >>>> Overall I wonder how much it is worthwhile to complicate the code to >>>> get all these modes in every single function. >>> >>> In my opinion, 1-1-x makes sense as it is supported by most flashes, >>> while n-m-x where n,m>1 does not make sense as it often requires some >>> stateful change to non-volatile register with little gain. >>> >> >> There is actually one thing that x-x-x modes make easier. If I were to >> implement dual mode switch on my SPI master controller it would be >> probably set for whole message and would not change mid-transfer. > > >> Still you can probably simulate x-x-x with 1-1-x by scattering the >> 1-1-x command bits across more bytes. > > That's not how you usually implement it. It's quite often a shift register. > Checking the manual there is a bit in a register that switches the master controller to dual mode receive (only). So the master controller can do 1-1-2 read (only). I don't use that feature because afaict there is no code in m25p80 which does the switch and as pointed out the reg_read commands are done in 1-1-1. If there was similar bit for write you could do 2-2-2 write but any other option would be quite challenging. Thanks Michal
On 06/24/2016 12:43 AM, Michal Suchanek wrote: > On 24 June 2016 at 00:14, Marek Vasut <marex@denx.de> wrote: >> On 06/23/2016 11:58 PM, Michal Suchanek wrote: >>> On 23 June 2016 at 22:46, Marek Vasut <marex@denx.de> wrote: >>>> On 06/23/2016 10:35 PM, Michal Suchanek wrote: >>>>> Hello, >>>> >>>> Hi, >>>> >>>>> this patch is kind of awesome. >>>>> >>>>> I have a few practical concerns however. >>>>> >>>>> On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: >>>>>> Before this patch, m25p80_read() supported few SPI protocols: >>>>>> - regular SPI 1-1-1 >>>>>> - SPI Dual Output 1-1-2 >>>>>> - SPI Quad Output 1-1-4 >>>>>> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. >>>>> >>>>> Under typical use my estimate is that huge majority of data is >>>>> transferred in _read() seconded by _write(). >>>>> >>>>> As I understand it the n-n-n means how many bits you transfer in >>>>> parallel when sending command-address-data. >>>>> >>>>> In _read() the command and data overhead is negligible when you can >>>>> read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not >>>>> meaningful performance-wise. Are there flash chips that support one >>>>> but not the other? >>>> >>>> That's quite unlikely. >>>> >>>>> For _write() the benefits are even harder to assess. >>>> >>>> The page program usually works on 256B pages, so the math is rather easy. >>>> >>>>> You can >>>>> presumably write at n-n-4 or n-n-2 if your controller and flash >>>>> supports it transferring the page faster. And then spend possibly >>>>> large amount of time waiting for the flash to get ready again. If the >>>>> programming time is fixed transferring the page faster may or may not >>>>> have benefits. It may at least free the bus for other devices to use. >>>>> >>>>> The _reg_ stuff is probably negligible altogether, >>>>> >>>>> Lastly the faster transfers of address bytes seem to be achieved with >>>>> increasingly longer command codes given how much the maximum command >>>>> length increased. So even in a page write where the address is a few % >>>>> of the transfer the benefit of these extra modes is dubious. >>>>> >>>>> Overall I wonder how much it is worthwhile to complicate the code to >>>>> get all these modes in every single function. >>>> >>>> In my opinion, 1-1-x makes sense as it is supported by most flashes, >>>> while n-m-x where n,m>1 does not make sense as it often requires some >>>> stateful change to non-volatile register with little gain. >>>> >>> >>> There is actually one thing that x-x-x modes make easier. If I were to >>> implement dual mode switch on my SPI master controller it would be >>> probably set for whole message and would not change mid-transfer. >> > >> >>> Still you can probably simulate x-x-x with 1-1-x by scattering the >>> 1-1-x command bits across more bytes. >> >> That's not how you usually implement it. It's quite often a shift register. >> > > Checking the manual there is a bit in a register that switches the > master controller to dual mode receive (only). So the master > controller can do 1-1-2 read (only). I don't use that feature because > afaict there is no code in m25p80 which does the switch and as pointed > out the reg_read commands are done in 1-1-1. I don't think I understand. Are you talking about some specific controller now ? > If there was similar bit for write you could do 2-2-2 write but any > other option would be quite challenging. > > Thanks > > Michal >
On 24 June 2016 at 00:50, Marek Vasut <marex@denx.de> wrote: > On 06/24/2016 12:43 AM, Michal Suchanek wrote: >> On 24 June 2016 at 00:14, Marek Vasut <marex@denx.de> wrote: >>> On 06/23/2016 11:58 PM, Michal Suchanek wrote: >>>> On 23 June 2016 at 22:46, Marek Vasut <marex@denx.de> wrote: >>>>> On 06/23/2016 10:35 PM, Michal Suchanek wrote: >>>>>> Hello, >>>>> >>>>> Hi, >>>>> >>>>>> this patch is kind of awesome. >>>>>> >>>>>> I have a few practical concerns however. >>>>>> >>>>>> On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: >>>>>>> Before this patch, m25p80_read() supported few SPI protocols: >>>>>>> - regular SPI 1-1-1 >>>>>>> - SPI Dual Output 1-1-2 >>>>>>> - SPI Quad Output 1-1-4 >>>>>>> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. >>>>>> >>>>>> Under typical use my estimate is that huge majority of data is >>>>>> transferred in _read() seconded by _write(). >>>>>> >>>>>> As I understand it the n-n-n means how many bits you transfer in >>>>>> parallel when sending command-address-data. >>>>>> >>>>>> In _read() the command and data overhead is negligible when you can >>>>>> read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not >>>>>> meaningful performance-wise. Are there flash chips that support one >>>>>> but not the other? >>>>> >>>>> That's quite unlikely. >>>>> >>>>>> For _write() the benefits are even harder to assess. >>>>> >>>>> The page program usually works on 256B pages, so the math is rather easy. >>>>> >>>>>> You can >>>>>> presumably write at n-n-4 or n-n-2 if your controller and flash >>>>>> supports it transferring the page faster. And then spend possibly >>>>>> large amount of time waiting for the flash to get ready again. If the >>>>>> programming time is fixed transferring the page faster may or may not >>>>>> have benefits. It may at least free the bus for other devices to use. >>>>>> >>>>>> The _reg_ stuff is probably negligible altogether, >>>>>> >>>>>> Lastly the faster transfers of address bytes seem to be achieved with >>>>>> increasingly longer command codes given how much the maximum command >>>>>> length increased. So even in a page write where the address is a few % >>>>>> of the transfer the benefit of these extra modes is dubious. >>>>>> >>>>>> Overall I wonder how much it is worthwhile to complicate the code to >>>>>> get all these modes in every single function. >>>>> >>>>> In my opinion, 1-1-x makes sense as it is supported by most flashes, >>>>> while n-m-x where n,m>1 does not make sense as it often requires some >>>>> stateful change to non-volatile register with little gain. >>>>> >>>> >>>> There is actually one thing that x-x-x modes make easier. If I were to >>>> implement dual mode switch on my SPI master controller it would be >>>> probably set for whole message and would not change mid-transfer. >>> >> >>> >>>> Still you can probably simulate x-x-x with 1-1-x by scattering the >>>> 1-1-x command bits across more bytes. >>> >>> That's not how you usually implement it. It's quite often a shift register. >>> >> >> Checking the manual there is a bit in a register that switches the >> master controller to dual mode receive (only). So the master >> controller can do 1-1-2 read (only). I don't use that feature because >> afaict there is no code in m25p80 which does the switch and as pointed >> out the reg_read commands are done in 1-1-1. > > I don't think I understand. Are you talking about some specific > controller now ? > Yes, the sunxi spi controller can do dual read according to manual but not sure if anyone has tired that, At least the A31 variant can. Thanks Michal
Le 23/06/2016 22:35, Michal Suchanek a écrit : > Hello, > > this patch is kind of awesome. > > I have a few practical concerns however. > > On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: >> Before this patch, m25p80_read() supported few SPI protocols: >> - regular SPI 1-1-1 >> - SPI Dual Output 1-1-2 >> - SPI Quad Output 1-1-4 >> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. > > Under typical use my estimate is that huge majority of data is > transferred in _read() seconded by _write(). > > As I understand it the n-n-n means how many bits you transfer in > parallel when sending command-address-data. > > In _read() the command and data overhead is negligible when you can > read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not > meaningful performance-wise. Are there flash chips that support one > but not the other? > > For _write() the benefits are even harder to assess. You can > presumably write at n-n-4 or n-n-2 if your controller and flash > supports it transferring the page faster. And then spend possibly > large amount of time waiting for the flash to get ready again. If the > programming time is fixed transferring the page faster may or may not > have benefits. It may at least free the bus for other devices to use. > This series of patches is not supposed to improve performances. Lets take a Macronix mx25l25673g as an example with a 512-byte read (3-byte address): 1 - Fast Read 1-1-4 (0 mode cycle, 8 wait states) command: 8 cycles address/mode/dummy: (3 * 8) + 0 + 8 = 32 cycles data: (512 * 8) / 4 = 1024 cycles total: 8 + 32 + 1024 = 1064 cycles 2 - Fast Read 1-4-4 (2 mode cycles, 4 wait states) command: 8 cycles address/mode/dummy: (3 * 8) / 4 + 2 + 4 = 12 cycles data: (512 * 8) / 4 = 1024 cycles total: 8 + 12 + 1024 = 1044 cycles theoretical benefit: < 2% The real purpose of the whole series is to provide solutions to some known issues. A - Currently the spi-nor force use the B7h op code to enter the 4-byte address mode of > 16MB memories. This mode is state-full; it changes the internal state of the SPI memory. Once in its 4-byte mode, the regular 3-byte Fast Read, Page Program and Sector Erase op codes are expected to be followed by a 4-byte address. Then if a spurious reset occurs, some bootloaders might be stuck as they still expect to use 3-byte addresses. On the other hand, when available, the 4-byte address instruction set is stateless. So we'd rather use it whenever possible. It's the purpose of patch 2. However adding the SPI_NOR_4B_OPCODES on some memory entry is not always possible. Again, let's take the Macronix mx25l25673g memory as an example: it shares the very same JEDEC ID with the older mx25l25635e. The 73g supports the 4-byte address instruction set whereas the 35e doesn't. Hence we can't add the SPI_NOR_4B_OPCODES on the 35e entry without introducing a bug. My first approach was to patch the spi-nor framework to allow two or more memory entries to share the same JEDEC ID and to select the right entry according to the compatible DT string. This solution was rejected. Then my latest solution is to rely on the SFDP tables: the 73g supports the optional 4-byte Address Instruction Table whereas the 35e doesn't support SFDP table at all. Hence, when this table is successfully parsed on the 73g, we know we can safely use the 4-byte op codes. For the 35e case, we still enter the 4-byte address mode as before since there is no other solution... So with this series of patches, no regression with 35e memories but a solution to properly use 73g memories without changing their internal state: the bootloader is now happy. B - Setting the old 'mode' argument of spi_nor_read() to SPI_NOR_QUAD just told the SPI controller supports Fast Read 1-1-4 but nothing about whether it also supports Page Program x-y-4 commands. It is interesting to make the difference between read and write hardware capabilities. Indeed we cannot assume that if a controller can do Fast Read 1-1-4 operations then it can also perform Page Program 1-1-4. I'm pretty sure this statement is false. So let's the SPI controller explicitly declares its read and write hardware capabilities. Once again, I'm not targeting performance improvement with Page Program. However, using other SPI protocols when available helps to deal with some memory quirks. This time, let's take the case of Micron n25q512*. Those memory are > 16MB so we fall into the same issue as described in A. For those Micron memories, depending on the part number (telling us whether the Reset pin is available...) the 12h op code stands for two different operations: - either 3-byte address Page Program x-4-4 (the standard 38h op code for this operation is not available and there is no op code for 4-byte Page Program 1-1-1) - either 4-byte address Page Program 1-1-1 (the standard command associated to the 12h op code) Since they are different part numbers of the same memory family, all those memories once again share the very same JEDEC ID and there no mean to dynamically discover the actual part number (or I didn't find such a mean yet). Anyway, even knowing the part number, depending on the result, the 4-byte Page Program 1-1-1 operation is simply not supported. Hopefully, for all part numbers, 4-byte Page Program 1-1-4 is always supported by the 34h op code, which is the standard hope code for this operation. Then with SPI controllers which explicitly supports Page Program 1-1-4, we could implement something to fix the Page Program operation on Micron memory above 16MB. Hence we have a solution! There are just examples, I guess we could find others. My point is that currently we can't use some QPSI memories properly and its a blocking issue. The whole series should be see as a bug fixes enabler rather than a performance improvement. > The _reg_ stuff is probably negligible altogether, > > Lastly the faster transfers of address bytes seem to be achieved with > increasingly longer command codes given how much the maximum command > length increased. So even in a page write where the address is a few % > of the transfer the benefit of these extra modes is dubious. I'm not sure to understand this point but I guess you refer to: -#define MAX_CMD_SIZE 6 +#define MAX_CMD_SIZE 16 If so, the actual command size doesn't change at all, the increase of this macro value is justified by another reason. Indeed, let's have a look into the m25p80_read_reg(): before the patch it was implemented using spi_write_then_read(). This later function uses an intermediate buffer, which might be kmalloc() allocated, to transfer data with spi_sync() and calls memcpy() to copy data from/to this imtermediate buffer to/from buffers provided as function parameters (txbuf and rxbuf). As the comment says, the purpose of this intermediate buffer is to be "DMA-safe". Then, after the patch, spi_write_then_read() is no longer used but we still need a DMA-safe buffer to perform the data transfer with spi_sync(). This can be achieve with the flash->command[] buffer; we just need to increase its size some more since it is now also filled with the read data whereas it was only filled with the command/address/dummy data before. > > Overall I wonder how much it is worthwhile to complicate the code to > get all these modes in every single function. > > Thanks > > Michal > Best regards, Cyrille
Hi all, + Marcin, According to Marcin's series of patches "mtd: spi-nor: DUAL/QUAD I/O read modes implementation" and the Spansion/Cypress S25FS512S datasheet, new Cypress QSPI memories no longer supports Fast Read 1-1-2 or Fast Read 1-1-4 but only Fast Read x-2-2 and Fast Read x-4-4. Hence one more example of the need to extend the spi-nor framework to support QSPI memories correctly. Also, looking at the datasheets of s25fL512s vs s25fS512s, the default factory settings for the number of (mode + wait state) cycles has changed for Fast Read 1-4-4 (EBh /ECh). default mode wait state L512s: 2 4 S512s: 2 8 Those factory default settings are provided by the Basic Flash Parameter Table from the Serial Flash Discoverable Parameters (JESD216). SFDP tables seem to be supported by both families. Hence this series of patches actually adds support to the new Spansion/Cypress QSPI memories. We will just have to care about setting a proper value in the mode cycles to avoid entering the continuous read mode by mistake. Looking at the JESD216 specification the 0xff value is likely to be a common value for all manufacturers to tell the QSPI memory not to enter (or leave) its continuous read/performance enhancement mode. Entering the continuous read mode tells the QSPI memory that the op code value will be implicit in the following command, which then will start from the address cycles. The continuous mode is mainly (only ?) use for XIP application. Adding support to the continuous read mode might be interesting but should be discussed in a dedicated thread. Best regards, Cyrille Le 27/06/2016 11:52, Cyrille Pitchen a écrit : > Le 23/06/2016 22:35, Michal Suchanek a écrit : >> Hello, >> >> this patch is kind of awesome. >> >> I have a few practical concerns however. >> >> On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> wrote: >>> Before this patch, m25p80_read() supported few SPI protocols: >>> - regular SPI 1-1-1 >>> - SPI Dual Output 1-1-2 >>> - SPI Quad Output 1-1-4 >>> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. >> >> Under typical use my estimate is that huge majority of data is >> transferred in _read() seconded by _write(). >> >> As I understand it the n-n-n means how many bits you transfer in >> parallel when sending command-address-data. >> >> In _read() the command and data overhead is negligible when you can >> read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not >> meaningful performance-wise. Are there flash chips that support one >> but not the other? >> >> For _write() the benefits are even harder to assess. You can >> presumably write at n-n-4 or n-n-2 if your controller and flash >> supports it transferring the page faster. And then spend possibly >> large amount of time waiting for the flash to get ready again. If the >> programming time is fixed transferring the page faster may or may not >> have benefits. It may at least free the bus for other devices to use. >> > > This series of patches is not supposed to improve performances. Lets take > a Macronix mx25l25673g as an example with a 512-byte read (3-byte address): > > 1 - Fast Read 1-1-4 (0 mode cycle, 8 wait states) > command: 8 cycles > address/mode/dummy: (3 * 8) + 0 + 8 = 32 cycles > data: (512 * 8) / 4 = 1024 cycles > total: 8 + 32 + 1024 = 1064 cycles > > 2 - Fast Read 1-4-4 (2 mode cycles, 4 wait states) > command: 8 cycles > address/mode/dummy: (3 * 8) / 4 + 2 + 4 = 12 cycles > data: (512 * 8) / 4 = 1024 cycles > total: 8 + 12 + 1024 = 1044 cycles > > theoretical benefit: < 2% > > > The real purpose of the whole series is to provide solutions to some known > issues. > > A - Currently the spi-nor force use the B7h op code to enter the 4-byte > address mode of > 16MB memories. This mode is state-full; it changes the > internal state of the SPI memory. Once in its 4-byte mode, the regular 3-byte > Fast Read, Page Program and Sector Erase op codes are expected to be followed > by a 4-byte address. Then if a spurious reset occurs, some bootloaders might > be stuck as they still expect to use 3-byte addresses. > > On the other hand, when available, the 4-byte address instruction set is > stateless. So we'd rather use it whenever possible. It's the purpose of patch > 2. However adding the SPI_NOR_4B_OPCODES on some memory entry is not always > possible. Again, let's take the Macronix mx25l25673g memory as an example: > it shares the very same JEDEC ID with the older mx25l25635e. The 73g supports > the 4-byte address instruction set whereas the 35e doesn't. Hence we can't > add the SPI_NOR_4B_OPCODES on the 35e entry without introducing a bug. > > My first approach was to patch the spi-nor framework to allow two or more > memory entries to share the same JEDEC ID and to select the right entry > according to the compatible DT string. This solution was rejected. > > Then my latest solution is to rely on the SFDP tables: the 73g supports the > optional 4-byte Address Instruction Table whereas the 35e doesn't support SFDP > table at all. Hence, when this table is successfully parsed on the 73g, we know > we can safely use the 4-byte op codes. For the 35e case, we still enter the > 4-byte address mode as before since there is no other solution... > So with this series of patches, no regression with 35e memories but a solution > to properly use 73g memories without changing their internal state: the > bootloader is now happy. > > > B - Setting the old 'mode' argument of spi_nor_read() to SPI_NOR_QUAD just told > the SPI controller supports Fast Read 1-1-4 but nothing about whether it also > supports Page Program x-y-4 commands. It is interesting to make the difference > between read and write hardware capabilities. Indeed we cannot assume that if > a controller can do Fast Read 1-1-4 operations then it can also perform > Page Program 1-1-4. I'm pretty sure this statement is false. So let's the SPI > controller explicitly declares its read and write hardware capabilities. > > Once again, I'm not targeting performance improvement with Page Program. > However, using other SPI protocols when available helps to deal with some > memory quirks. > > This time, let's take the case of Micron n25q512*. Those memory > are > 16MB so we fall into the same issue as described in A. > For those Micron memories, depending on the part number (telling us whether the > Reset pin is available...) the 12h op code stands for two different operations: > - either 3-byte address Page Program x-4-4 (the standard 38h op code for this > operation is not available and there is no op code for 4-byte Page Program > 1-1-1) > - either 4-byte address Page Program 1-1-1 (the standard command associated to > the 12h op code) > > Since they are different part numbers of the same memory family, all those > memories once again share the very same JEDEC ID and there no mean to > dynamically discover the actual part number (or I didn't find such a mean yet). > Anyway, even knowing the part number, depending on the result, the 4-byte Page > Program 1-1-1 operation is simply not supported. > > Hopefully, for all part numbers, 4-byte Page Program 1-1-4 is always supported > by the 34h op code, which is the standard hope code for this operation. > > Then with SPI controllers which explicitly supports Page Program 1-1-4, we > could implement something to fix the Page Program operation on Micron memory > above 16MB. Hence we have a solution! > > There are just examples, I guess we could find others. My point is that > currently we can't use some QPSI memories properly and its a blocking issue. > The whole series should be see as a bug fixes enabler rather than a performance > improvement. > > >> The _reg_ stuff is probably negligible altogether, >> >> Lastly the faster transfers of address bytes seem to be achieved with >> increasingly longer command codes given how much the maximum command >> length increased. So even in a page write where the address is a few % >> of the transfer the benefit of these extra modes is dubious. > I'm not sure to understand this point but I guess you refer to: > -#define MAX_CMD_SIZE 6 > +#define MAX_CMD_SIZE 16 > > If so, the actual command size doesn't change at all, the increase of this > macro value is justified by another reason. Indeed, let's have a look into > the m25p80_read_reg(): before the patch it was implemented using > spi_write_then_read(). This later function uses an intermediate buffer, > which might be kmalloc() allocated, to transfer data with spi_sync() and calls > memcpy() to copy data from/to this imtermediate buffer to/from buffers > provided as function parameters (txbuf and rxbuf). > As the comment says, the purpose of this intermediate buffer is to be > "DMA-safe". > > Then, after the patch, spi_write_then_read() is no longer used but we still > need a DMA-safe buffer to perform the data transfer with spi_sync(). This can > be achieve with the flash->command[] buffer; we just need to increase its size > some more since it is now also filled with the read data whereas it was only > filled with the command/address/dummy data before. > >> >> Overall I wonder how much it is worthwhile to complicate the code to >> get all these modes in every single function. >> >> Thanks >> >> Michal >> > > Best regards, > > Cyrille >
Hello, > -----Original Message----- > From: Cyrille Pitchen [mailto:cyrille.pitchen@atmel.com] > Sent: Monday, June 27, 2016 2:38 PM > To: Michal Suchanek <hramrach@gmail.com> > Cc: Brian Norris <computersforpeace@gmail.com>; MTD Maling List <linux- > mtd@lists.infradead.org>; Marek Vašut <marex@denx.de>; Boris Brezillon > <boris.brezillon@free-electrons.com>; nicolas.ferre@atmel.com; Linux > Kernel Mailing List <linux-kernel@vger.kernel.org>; Krzeminski, Marcin > (Nokia - PL/Wroclaw) <marcin.krzeminski@nokia.com> > Subject: Re: [PATCH 7/9] mtd: m25p80: add support of dual and quad spi > protocols to all commands > > Hi all, > > + Marcin, > > According to Marcin's series of patches > "mtd: spi-nor: DUAL/QUAD I/O read modes implementation" > and the Spansion/Cypress S25FS512S datasheet, new Cypress QSPI > memories no longer supports Fast Read 1-1-2 or Fast Read 1-1-4 but only Fast > Read x-2-2 and Fast Read x-4-4. > > Hence one more example of the need to extend the spi-nor framework to > support QSPI memories correctly. > > Also, looking at the datasheets of s25fL512s vs s25fS512s, the default factory > settings for the number of (mode + wait state) cycles has changed for Fast > Read 1-4-4 (EBh /ECh). > > default mode wait state > L512s: 2 4 > S512s: 2 8 > > Those factory default settings are provided by the Basic Flash Parameter > Table from the Serial Flash Discoverable Parameters (JESD216). SFDP tables > seem to be supported by both families. > Hello, It is also possible to read number of dummy cycles (wait states) from the flash itself. On this Spansion and Microns(eg. N25Q512) user can change and program dummy cycles count. If we will use default (or hardcoded values like in spi-nor), we will fail fast reads in this case. I will probably implement this in spi-nor for Spansion, since I have use case with low SPI frequency, and dummy cycles count could be 0. > Hence this series of patches actually adds support to the new > Spansion/Cypress QSPI memories. > > We will just have to care about setting a proper value in the mode cycles to > avoid entering the continuous read mode by mistake. > Looking at the JESD216 specification the 0xff value is likely to be a common > value for all manufacturers to tell the QSPI memory not to enter (or leave) its > continuous read/performance enhancement mode. > > Entering the continuous read mode tells the QSPI memory that the op code > value will be implicit in the following command, which then will start from the > address cycles. The continuous mode is mainly (only ?) use for XIP > application. > > Adding support to the continuous read mode might be interesting but should > be discussed in a dedicated thread. > > Best regards, > > Cyrille There is one more think in fs512 and fl512 family. Those devices can have 8x4kB sectors at the beginning or end of the flash. Those sectors can not be erased with erase sector cmd(0xd8), but need to be erased with erase_4k(0x20). To allow erasing those sector mtd eraseregions are needed. According to fs512s datasheet conforming to JESD216 allows to read flash map, so it could be possible to create regions automatically. Best regards, Marcin > > Le 27/06/2016 11:52, Cyrille Pitchen a écrit : > > Le 23/06/2016 22:35, Michal Suchanek a écrit : > >> Hello, > >> > >> this patch is kind of awesome. > >> > >> I have a few practical concerns however. > >> > >> On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen@atmel.com> > wrote: > >>> Before this patch, m25p80_read() supported few SPI protocols: > >>> - regular SPI 1-1-1 > >>> - SPI Dual Output 1-1-2 > >>> - SPI Quad Output 1-1-4 > >>> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. > >> > >> Under typical use my estimate is that huge majority of data is > >> transferred in _read() seconded by _write(). > >> > >> As I understand it the n-n-n means how many bits you transfer in > >> parallel when sending command-address-data. > >> > >> In _read() the command and data overhead is negligible when you can > >> read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not > >> meaningful performance-wise. Are there flash chips that support one > >> but not the other? > >> > >> For _write() the benefits are even harder to assess. You can > >> presumably write at n-n-4 or n-n-2 if your controller and flash > >> supports it transferring the page faster. And then spend possibly > >> large amount of time waiting for the flash to get ready again. If the > >> programming time is fixed transferring the page faster may or may not > >> have benefits. It may at least free the bus for other devices to use. > >> > > > > This series of patches is not supposed to improve performances. Lets > > take a Macronix mx25l25673g as an example with a 512-byte read (3-byte > address): > > > > 1 - Fast Read 1-1-4 (0 mode cycle, 8 wait states) > > command: 8 cycles > > address/mode/dummy: (3 * 8) + 0 + 8 = 32 cycles > > data: (512 * 8) / 4 = 1024 cycles > > total: 8 + 32 + 1024 = 1064 cycles > > > > 2 - Fast Read 1-4-4 (2 mode cycles, 4 wait states) > > command: 8 cycles > > address/mode/dummy: (3 * 8) / 4 + 2 + 4 = 12 cycles > > data: (512 * 8) / 4 = 1024 cycles > > total: 8 + 12 + 1024 = 1044 cycles > > > > theoretical benefit: < 2% > > > > > > The real purpose of the whole series is to provide solutions to some > > known issues. > > > > A - Currently the spi-nor force use the B7h op code to enter the > > 4-byte address mode of > 16MB memories. This mode is state-full; it > > changes the internal state of the SPI memory. Once in its 4-byte mode, > > the regular 3-byte Fast Read, Page Program and Sector Erase op codes > > are expected to be followed by a 4-byte address. Then if a spurious > > reset occurs, some bootloaders might be stuck as they still expect to use 3- > byte addresses. > > > > On the other hand, when available, the 4-byte address instruction set > > is stateless. So we'd rather use it whenever possible. It's the > > purpose of patch 2. However adding the SPI_NOR_4B_OPCODES on some > > memory entry is not always possible. Again, let's take the Macronix > mx25l25673g memory as an example: > > it shares the very same JEDEC ID with the older mx25l25635e. The 73g > > supports the 4-byte address instruction set whereas the 35e doesn't. > > Hence we can't add the SPI_NOR_4B_OPCODES on the 35e entry without > introducing a bug. > > > > My first approach was to patch the spi-nor framework to allow two or > > more memory entries to share the same JEDEC ID and to select the right > > entry according to the compatible DT string. This solution was rejected. > > > > Then my latest solution is to rely on the SFDP tables: the 73g > > supports the optional 4-byte Address Instruction Table whereas the 35e > > doesn't support SFDP table at all. Hence, when this table is > > successfully parsed on the 73g, we know we can safely use the 4-byte > > op codes. For the 35e case, we still enter the 4-byte address mode as > before since there is no other solution... > > So with this series of patches, no regression with 35e memories but a > > solution to properly use 73g memories without changing their internal > > state: the bootloader is now happy. > > > > > > B - Setting the old 'mode' argument of spi_nor_read() to SPI_NOR_QUAD > > just told the SPI controller supports Fast Read 1-1-4 but nothing > > about whether it also supports Page Program x-y-4 commands. It is > > interesting to make the difference between read and write hardware > > capabilities. Indeed we cannot assume that if a controller can do Fast > > Read 1-1-4 operations then it can also perform Page Program 1-1-4. I'm > > pretty sure this statement is false. So let's the SPI controller explicitly > declares its read and write hardware capabilities. > > > > Once again, I'm not targeting performance improvement with Page > Program. > > However, using other SPI protocols when available helps to deal with > > some memory quirks. > > > > This time, let's take the case of Micron n25q512*. Those memory are > > > 16MB so we fall into the same issue as described in A. > > For those Micron memories, depending on the part number (telling us > > whether the Reset pin is available...) the 12h op code stands for two > different operations: > > - either 3-byte address Page Program x-4-4 (the standard 38h op code for > this > > operation is not available and there is no op code for 4-byte Page Program > > 1-1-1) > > - either 4-byte address Page Program 1-1-1 (the standard command > associated to > > the 12h op code) > > > > Since they are different part numbers of the same memory family, all > > those memories once again share the very same JEDEC ID and there no > > mean to dynamically discover the actual part number (or I didn't find such a > mean yet). > > Anyway, even knowing the part number, depending on the result, the > > 4-byte Page Program 1-1-1 operation is simply not supported. > > > > Hopefully, for all part numbers, 4-byte Page Program 1-1-4 is always > > supported by the 34h op code, which is the standard hope code for this > operation. > > > > Then with SPI controllers which explicitly supports Page Program > > 1-1-4, we could implement something to fix the Page Program operation > > on Micron memory above 16MB. Hence we have a solution! > > > > There are just examples, I guess we could find others. My point is > > that currently we can't use some QPSI memories properly and its a blocking > issue. > > The whole series should be see as a bug fixes enabler rather than a > > performance improvement. > > > > > >> The _reg_ stuff is probably negligible altogether, > >> > >> Lastly the faster transfers of address bytes seem to be achieved with > >> increasingly longer command codes given how much the maximum > command > >> length increased. So even in a page write where the address is a few > >> % of the transfer the benefit of these extra modes is dubious. > > I'm not sure to understand this point but I guess you refer to: > > -#define MAX_CMD_SIZE 6 > > +#define MAX_CMD_SIZE 16 > > > > If so, the actual command size doesn't change at all, the increase of > > this macro value is justified by another reason. Indeed, let's have a > > look into the m25p80_read_reg(): before the patch it was implemented > > using spi_write_then_read(). This later function uses an intermediate > > buffer, which might be kmalloc() allocated, to transfer data with > > spi_sync() and calls > > memcpy() to copy data from/to this imtermediate buffer to/from buffers > > provided as function parameters (txbuf and rxbuf). > > As the comment says, the purpose of this intermediate buffer is to be > > "DMA-safe". > > > > Then, after the patch, spi_write_then_read() is no longer used but we > > still need a DMA-safe buffer to perform the data transfer with > > spi_sync(). This can be achieve with the flash->command[] buffer; we > > just need to increase its size some more since it is now also filled > > with the read data whereas it was only filled with the > command/address/dummy data before. > > > >> > >> Overall I wonder how much it is worthwhile to complicate the code to > >> get all these modes in every single function. > >> > >> Thanks > >> > >> Michal > >> > > > > Best regards, > > > > Cyrille > >
diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c index f0a55c01406b..38778eac6c21 100644 --- a/drivers/mtd/devices/m25p80.c +++ b/drivers/mtd/devices/m25p80.c @@ -27,22 +27,64 @@ #include <linux/spi/flash.h> #include <linux/mtd/spi-nor.h> -#define MAX_CMD_SIZE 6 +#define MAX_CMD_SIZE 16 struct m25p { struct spi_device *spi; struct spi_nor spi_nor; u8 command[MAX_CMD_SIZE]; }; +static inline int m25p80_proto2nbits(enum spi_nor_protocol proto, + unsigned *code_nbits, + unsigned *addr_nbits, + unsigned *data_nbits) +{ + if (code_nbits) + *code_nbits = SNOR_PROTO_CODE_FROM_PROTO(proto); + if (addr_nbits) + *addr_nbits = SNOR_PROTO_ADDR_FROM_PROTO(proto); + if (data_nbits) + *data_nbits = SNOR_PROTO_DATA_FROM_PROTO(proto); + + return 0; +} + static int m25p80_read_reg(struct spi_nor *nor, u8 code, u8 *val, int len) { struct m25p *flash = nor->priv; struct spi_device *spi = flash->spi; + unsigned code_nbits, data_nbits; + struct spi_transfer xfers[2]; int ret; - ret = spi_write_then_read(spi, &code, 1, val, len); + /* Check the total length of command op code and data. */ + if (len + 1 > MAX_CMD_SIZE) + return -EINVAL; + + /* Get transfer protocols (addr_nbits is not relevant here). */ + ret = m25p80_proto2nbits(nor->reg_proto, + &code_nbits, NULL, &data_nbits); + if (ret < 0) + return ret; + + /* Set up transfers. */ + memset(xfers, 0, sizeof(xfers)); + + flash->command[0] = code; + xfers[0].len = 1; + xfers[0].tx_buf = flash->command; + xfers[0].tx_nbits = code_nbits; + + xfers[1].len = len; + xfers[1].rx_buf = &flash->command[1]; + xfers[1].rx_nbits = data_nbits; + + /* Process command. */ + ret = spi_sync_transfer(spi, xfers, 2); if (ret < 0) dev_err(&spi->dev, "error %d reading %x\n", ret, code); + else + memcpy(val, &flash->command[1], len); return ret; } @@ -65,12 +107,42 @@ static int m25p80_write_reg(struct spi_nor *nor, u8 opcode, u8 *buf, int len) { struct m25p *flash = nor->priv; struct spi_device *spi = flash->spi; + unsigned code_nbits, data_nbits, num_xfers = 1; + struct spi_transfer xfers[2]; + int ret; + + /* Check the total length of command op code and data. */ + if (buf && (len + 1 > MAX_CMD_SIZE)) + return -EINVAL; + + /* Get transfer protocols (addr_nbits is not relevant here). */ + ret = m25p80_proto2nbits(nor->reg_proto, + &code_nbits, NULL, &data_nbits); + if (ret < 0) + return ret; + + /* Set up transfer(s). */ + memset(xfers, 0, sizeof(xfers)); flash->command[0] = opcode; - if (buf) + xfers[0].len = 1; + xfers[0].tx_buf = flash->command; + xfers[0].tx_nbits = code_nbits; + + if (buf) { memcpy(&flash->command[1], buf, len); + if (data_nbits == code_nbits) { + xfers[0].len += len; + } else { + xfers[1].len = len; + xfers[1].tx_buf = &flash->command[1]; + xfers[1].tx_nbits = data_nbits; + num_xfers++; + } + } - return spi_write(spi, flash->command, len + 1); + /* Process command. */ + return spi_sync_transfer(spi, xfers, num_xfers); } static ssize_t m25p80_write(struct spi_nor *nor, loff_t to, size_t len, @@ -78,27 +150,48 @@ static ssize_t m25p80_write(struct spi_nor *nor, loff_t to, size_t len, { struct m25p *flash = nor->priv; struct spi_device *spi = flash->spi; - struct spi_transfer t[2] = {}; + unsigned code_nbits, addr_nbits, data_nbits, num_xfers = 1; + struct spi_transfer xfers[3]; struct spi_message m; int cmd_sz = m25p_cmdsz(nor); ssize_t ret; - spi_message_init(&m); - if (nor->program_opcode == SPINOR_OP_AAI_WP && nor->sst_write_second) cmd_sz = 1; - flash->command[0] = nor->program_opcode; - m25p_addr2cmd(nor, to, flash->command); + /* Get transfer protocols. */ + ret = m25p80_proto2nbits(nor->write_proto, + &code_nbits, &addr_nbits, &data_nbits); + if (ret < 0) + return ret; - t[0].tx_buf = flash->command; - t[0].len = cmd_sz; - spi_message_add_tail(&t[0], &m); + /* Set up transfers. */ + memset(xfers, 0, sizeof(xfers)); + + flash->command[0] = nor->program_opcode; + xfers[0].len = 1; + xfers[0].tx_buf = flash->command; + xfers[0].tx_nbits = code_nbits; + + if (cmd_sz > 1) { + m25p_addr2cmd(nor, to, flash->command); + if (addr_nbits == code_nbits) { + xfers[0].len += nor->addr_width; + } else { + xfers[1].len = nor->addr_width; + xfers[1].tx_buf = &flash->command[1]; + xfers[1].tx_nbits = addr_nbits; + num_xfers++; + } + } - t[1].tx_buf = buf; - t[1].len = len; - spi_message_add_tail(&t[1], &m); + xfers[num_xfers].len = len; + xfers[num_xfers].tx_buf = buf; + xfers[num_xfers].tx_nbits = data_nbits; + num_xfers++; + /* Process command. */ + spi_message_init_with_transfers(&m, xfers, num_xfers); ret = spi_sync(spi, &m); if (ret) return ret; @@ -109,18 +202,6 @@ static ssize_t m25p80_write(struct spi_nor *nor, loff_t to, size_t len, return ret; } -static inline unsigned int m25p80_rx_nbits(struct spi_nor *nor) -{ - switch (SNOR_PROTO_DATA_FROM_PROTO(nor->read_proto)) { - case 2: - return 2; - case 4: - return 4; - default: - return 0; - } -} - /* * Read an address range from the nor chip. The address range * may be any size provided it is within the physical boundaries. @@ -130,14 +211,22 @@ static ssize_t m25p80_read(struct spi_nor *nor, loff_t from, size_t len, { struct m25p *flash = nor->priv; struct spi_device *spi = flash->spi; - struct spi_transfer t[2]; - struct spi_message m; + unsigned code_nbits, addr_nbits, data_nbits, num_xfers = 1; unsigned int dummy = nor->read_dummy; ssize_t ret; + struct spi_transfer xfers[3]; + struct spi_message m; + + /* Get transfer protocols. */ + ret = m25p80_proto2nbits(nor->read_proto, + &code_nbits, &addr_nbits, &data_nbits); + if (ret < 0) + return ret; /* convert the dummy cycles to the number of bytes */ - dummy /= 8; + dummy = (dummy * addr_nbits) / 8; + /* Use the SPI flash API if supported. */ if (spi_flash_read_supported(spi)) { struct spi_flash_read_message msg; @@ -149,10 +238,9 @@ static ssize_t m25p80_read(struct spi_nor *nor, loff_t from, size_t len, msg.read_opcode = nor->read_opcode; msg.addr_width = nor->addr_width; msg.dummy_bytes = dummy; - /* TODO: Support other combinations */ - msg.opcode_nbits = SPI_NBITS_SINGLE; - msg.addr_nbits = SPI_NBITS_SINGLE; - msg.data_nbits = m25p80_rx_nbits(nor); + msg.opcode_nbits = code_nbits; + msg.addr_nbits = addr_nbits; + msg.data_nbits = data_nbits; ret = spi_flash_read(spi, &msg); if (ret < 0) @@ -160,21 +248,38 @@ static ssize_t m25p80_read(struct spi_nor *nor, loff_t from, size_t len, return msg.retlen; } - spi_message_init(&m); - memset(t, 0, (sizeof t)); + /* Set up transfers. */ + memset(xfers, 0, sizeof(xfers)); flash->command[0] = nor->read_opcode; - m25p_addr2cmd(nor, from, flash->command); + xfers[0].len = 1; + xfers[0].tx_buf = flash->command; + xfers[0].tx_nbits = code_nbits; - t[0].tx_buf = flash->command; - t[0].len = m25p_cmdsz(nor) + dummy; - spi_message_add_tail(&t[0], &m); + m25p_addr2cmd(nor, from, flash->command); + /* + * Clear all dummy/mode cycle bits to avoid sending some manufacturer + * specific pattern, which might make the memory enter its Continuous + * Read mode by mistake. + */ + memset(flash->command + 1 + nor->addr_width, 0, dummy); + + if (addr_nbits == code_nbits) { + xfers[0].len += nor->addr_width + dummy; + } else { + xfers[1].len = nor->addr_width + dummy; + xfers[1].tx_buf = &flash->command[1]; + xfers[1].tx_nbits = addr_nbits; + num_xfers++; + } - t[1].rx_buf = buf; - t[1].rx_nbits = m25p80_rx_nbits(nor); - t[1].len = min(len, spi_max_transfer_size(spi)); - spi_message_add_tail(&t[1], &m); + xfers[num_xfers].len = len; + xfers[num_xfers].rx_buf = buf; + xfers[num_xfers].rx_nbits = data_nbits; + num_xfers++; + /* Process command. */ + spi_message_init_with_transfers(&m, xfers, num_xfers); ret = spi_sync(spi, &m); if (ret) return ret;
Before this patch, m25p80_read() supported few SPI protocols: - regular SPI 1-1-1 - SPI Dual Output 1-1-2 - SPI Quad Output 1-1-4 On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1. This patch adds support to all currently existing SPI protocols to cover as many protocols as possible. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> --- drivers/mtd/devices/m25p80.c | 193 +++++++++++++++++++++++++++++++++---------- 1 file changed, 149 insertions(+), 44 deletions(-)