Message ID | 20190708173330.13217-1-martin.blumenstingl@googlemail.com |
---|---|
Headers | show |
Series | Amlogic 32-bit Meson SDHC MMC controller driver | expand |
Hello Jianxin, I thought I'd put my questions inline again so it's easier to follow me. I hope you can help clarify some of the questions I have. On Mon, Jul 8, 2019 at 7:33 PM Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote: > > WiP - only partially working - see performance numbers. > > Odroid-C1 eMMC (HS-200): > Amlogic's vendor driver @ Linux 3.10: > 7781351936 bytes (7.8 GB) copied, 134.714 s, 57.8 MB/s > This driver: > 7781351936 bytes (7.8 GB, 7.2 GiB) copied, 189.02 s, 41.2 MB/s > > EC-100 eMMC (HS MMC): > Amlogic's vendor driver @ Linux 3.10: > 15762194432 bytes (16 GB) copied, 422.967 s, 37.3 MB/s > This driver: > 15762194432 bytes (16 GB, 15 GiB) copied, 9232.65 s, 1.7 MB/s my EC-100 board uses high-speed MMC (no HS-200) mode only and it's really bad there on Odroid-C1 the MMC the performance is at ~70% of the 3.10 kernel driver my thinking is that phase tuning "fixes" the performance on Odroid-C1 (EC-100 doesn't use tuning because it's not using HS-200 mode). I could be wrong here though. Please let me know if you have any suggestions [...] > + if (mmc->actual_clock > 100000000) { > + rx_clk_phase = 1; > + } else if (mmc->actual_clock > 45000000) { > + if (ios->signal_voltage == MMC_SIGNAL_VOLTAGE_330) > + rx_clk_phase = 15; > + else > + rx_clk_phase = 11; > + } else if (mmc->actual_clock >= 25000000) { > + rx_clk_phase = 15; > + } else if (mmc->actual_clock > 5000000) { > + rx_clk_phase = 23; > + } else if (mmc->actual_clock > 1000000) { > + rx_clk_phase = 55; > + } else { > + rx_clk_phase = 1061; > + } this MMC clock frequency to RX clock phase mapping only seems to work for FCLK_DIV3 how do I calculate this dynamically? [...] > +static int meson_mx_sdhc_register_clks(struct meson_mx_sdhc_host *host) > +{ > + struct clk *mux_parents[MESON_SDHC_PARENT_CLKS]; > + struct clk *mux_clk, *div_clk; > + int i; > + > + for (i = 0; i < MESON_SDHC_PARENT_CLKS; i++) > + mux_parents[i] = host->parent_clks[i].clk; > + > + host->clkc_clk_src_sel.reg = host->base + MESON_SDHC_CLKC; > + host->clkc_clk_src_sel.shift = __ffs(MESON_SDHC_CLKC_CLK_SRC_SEL); > + host->clkc_clk_src_sel.mask = MESON_SDHC_CLKC_CLK_SRC_SEL >> > + host->clkc_clk_src_sel.shift; > + mux_clk = meson_mx_sdhc_register_clk(mmc_dev(host->mmc), > + &host->clkc_clk_src_sel.hw, > + "clk_src_sel", > + MESON_SDHC_PARENT_CLKS, > + mux_parents, > + CLK_SET_RATE_NO_REPARENT, > + &clk_mux_ops); > + if (IS_ERR(mux_clk)) > + return PTR_ERR(mux_clk); > + > + host->clkc_clk_div.reg = host->base + MESON_SDHC_CLKC; > + host->clkc_clk_div.shift = __ffs(MESON_SDHC_CLKC_CLK_DIV); > + host->clkc_clk_div.width = fls(MESON_SDHC_CLKC_CLK_DIV) - > + host->clkc_clk_div.shift; are there any constraints for the divider? the driver from the Amlogic kernel sources does this, but I'm not sure what this is trying to achieve (and why): clk_div = input_rate / clk_ios - !(input_rate%clk_ios); if (!(clk_div & 0x01)) // if even number, turn it to an odd one clk_div++; [...] > + mmc->max_busy_timeout = 0; // TODO: actual value? do you know the actual busy timeout of this IP block? Thank you for your time! Regards Martin
On Mon, 8 Jul 2019 at 19:33, Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote: > > WiP - only partially working - see performance numbers. > > Odroid-C1 eMMC (HS-200): > Amlogic's vendor driver @ Linux 3.10: > 7781351936 bytes (7.8 GB) copied, 134.714 s, 57.8 MB/s > This driver: > 7781351936 bytes (7.8 GB, 7.2 GiB) copied, 189.02 s, 41.2 MB/s > > EC-100 eMMC (HS MMC): > Amlogic's vendor driver @ Linux 3.10: > 15762194432 bytes (16 GB) copied, 422.967 s, 37.3 MB/s > This driver: > 15762194432 bytes (16 GB, 15 GiB) copied, 9232.65 s, 1.7 MB/s > > 1) Amlogic's vendor driver does some magic with the divider: > clk_div = input_rate / clk_ios - !(input_rate%clk_ios); > if (!(clk_div & 0x01)) // if even number, turn it to an odd one > clk_div++; > It's not clear to me whether what the reason behind this is, what is > supposed to be achieved with this? > > 2) The hardcoded RX clock phases are taken from the vendor driver. It > seems that these are only valid when fclk_div3 is used as input > clock (however, there are four more inputs). It's not clear to me how > to calculate the RX clock phases in set_ios based on the input clock > and the ios rate. > > 3) The hardware supports a timeout IRQ but the max_busy_timeout is not > documented anywhere. > > Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Martin, overall this looks good to me. Once you moved from RFC to a formal patch I will check again, of course. There are a couple of calls to readl_poll_timeout(), for different reasons, that I have some questions about, but we can discuss those in the next step. [...] Kind regards Uffe
Hi Ulf, On Thu, Aug 22, 2019 at 3:53 PM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > On Mon, 8 Jul 2019 at 19:33, Martin Blumenstingl > <martin.blumenstingl@googlemail.com> wrote: > > > > WiP - only partially working - see performance numbers. > > > > Odroid-C1 eMMC (HS-200): > > Amlogic's vendor driver @ Linux 3.10: > > 7781351936 bytes (7.8 GB) copied, 134.714 s, 57.8 MB/s > > This driver: > > 7781351936 bytes (7.8 GB, 7.2 GiB) copied, 189.02 s, 41.2 MB/s > > > > EC-100 eMMC (HS MMC): > > Amlogic's vendor driver @ Linux 3.10: > > 15762194432 bytes (16 GB) copied, 422.967 s, 37.3 MB/s > > This driver: > > 15762194432 bytes (16 GB, 15 GiB) copied, 9232.65 s, 1.7 MB/s > > > > 1) Amlogic's vendor driver does some magic with the divider: > > clk_div = input_rate / clk_ios - !(input_rate%clk_ios); > > if (!(clk_div & 0x01)) // if even number, turn it to an odd one > > clk_div++; > > It's not clear to me whether what the reason behind this is, what is > > supposed to be achieved with this? > > > > 2) The hardcoded RX clock phases are taken from the vendor driver. It > > seems that these are only valid when fclk_div3 is used as input > > clock (however, there are four more inputs). It's not clear to me how > > to calculate the RX clock phases in set_ios based on the input clock > > and the ios rate. > > > > 3) The hardware supports a timeout IRQ but the max_busy_timeout is not > > documented anywhere. > > > > Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> > > Martin, overall this looks good to me. Once you moved from RFC to a > formal patch I will check again, of course. OK, great in the meantime I got answers to my questions (off-list) from Jianxin. also someone asked me (just this week) for the .dts patches so he could test on his own board (I have them ready but didn't send them yet) unfortunately he ran into some data corruption on writing I can reproduce it but I didn't have time to debug this yet I'll send an updated version once I have resolved that - as non-RFC > There are a couple of calls to readl_poll_timeout(), for different > reasons, that I have some questions about, but we can discuss those in > the next step. sure. feel free to ask now since I still have to debug that data corruption problem as stated above Martin