mbox series

[RFC,v1,0/2] Amlogic 32-bit Meson SDHC MMC controller driver

Message ID 20190708173330.13217-1-martin.blumenstingl@googlemail.com
Headers show
Series Amlogic 32-bit Meson SDHC MMC controller driver | expand

Message

Martin Blumenstingl July 8, 2019, 5:33 p.m. UTC
Hello,

this is RFC v1 of the attempt to upstream the driver for the "SDHC" MMC
controller found on Meson6, Meson8, Meson8b and Meson8m2 SoCs.

The public S805 (Meson8b) datasheet has some documentation starting on
page 74: [0]

The goal of this RFC v1 is to:
- find out how to set up the MMC clock correctly with Jianxin's help (see
  the description of patch #2 with questions)
- get feedback from the MMC maintainers to see which bits need changing


[0] https://dn.odroid.com/S805/Datasheet/S805_Datasheet%20V0.8%2020150126.pdf


Martin Blumenstingl (2):
  dt-bindings: mmc: Document the Amlogic Meson SDHC MMC host controller
  mmc: host: meson-mx-sdhc: new driver for the Amlogic Meson SDHC host

 .../bindings/mmc/amlogic,meson-mx-sdhc.txt    |   34 +
 drivers/mmc/host/Kconfig                      |   14 +
 drivers/mmc/host/Makefile                     |    1 +
 drivers/mmc/host/meson-mx-sdhc.c              | 1161 +++++++++++++++++
 4 files changed, 1210 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mmc/amlogic,meson-mx-sdhc.txt
 create mode 100644 drivers/mmc/host/meson-mx-sdhc.c

Comments

Martin Blumenstingl July 8, 2019, 5:40 p.m. UTC | #1
Hello Jianxin,

I thought I'd put my questions inline again so it's easier to follow me.
I hope you can help clarify some of the questions I have.

On Mon, Jul 8, 2019 at 7:33 PM Martin Blumenstingl
<martin.blumenstingl@googlemail.com> wrote:
>
> WiP - only partially working - see performance numbers.
>
> Odroid-C1 eMMC (HS-200):
> Amlogic's vendor driver @ Linux 3.10:
>   7781351936 bytes (7.8 GB) copied, 134.714 s, 57.8 MB/s
> This driver:
>   7781351936 bytes (7.8 GB, 7.2 GiB) copied, 189.02 s, 41.2 MB/s
>
> EC-100 eMMC (HS MMC):
> Amlogic's vendor driver @ Linux 3.10:
>   15762194432 bytes (16 GB) copied, 422.967 s, 37.3 MB/s
> This driver:
>   15762194432 bytes (16 GB, 15 GiB) copied, 9232.65 s, 1.7 MB/s
my EC-100 board uses high-speed MMC (no HS-200) mode only and it's
really bad there
on Odroid-C1 the MMC the performance is at ~70% of the 3.10 kernel driver

my thinking is that phase tuning "fixes" the performance on Odroid-C1
(EC-100 doesn't use tuning because it's not using HS-200 mode).
I could be wrong here though. Please let me know if you have any suggestions


[...]
> +               if (mmc->actual_clock > 100000000) {
> +                       rx_clk_phase = 1;
> +               } else if (mmc->actual_clock > 45000000) {
> +                       if (ios->signal_voltage == MMC_SIGNAL_VOLTAGE_330)
> +                               rx_clk_phase = 15;
> +                       else
> +                               rx_clk_phase = 11;
> +               } else if (mmc->actual_clock >= 25000000) {
> +                       rx_clk_phase = 15;
> +               } else if (mmc->actual_clock > 5000000) {
> +                       rx_clk_phase = 23;
> +               } else if (mmc->actual_clock > 1000000) {
> +                       rx_clk_phase = 55;
> +               } else {
> +                       rx_clk_phase = 1061;
> +               }
this MMC clock frequency to RX clock phase mapping only seems to work
for FCLK_DIV3
how do I calculate this dynamically?

[...]
> +static int meson_mx_sdhc_register_clks(struct meson_mx_sdhc_host *host)
> +{
> +       struct clk *mux_parents[MESON_SDHC_PARENT_CLKS];
> +       struct clk *mux_clk, *div_clk;
> +       int i;
> +
> +       for (i = 0; i < MESON_SDHC_PARENT_CLKS; i++)
> +               mux_parents[i] = host->parent_clks[i].clk;
> +
> +       host->clkc_clk_src_sel.reg = host->base + MESON_SDHC_CLKC;
> +       host->clkc_clk_src_sel.shift = __ffs(MESON_SDHC_CLKC_CLK_SRC_SEL);
> +       host->clkc_clk_src_sel.mask = MESON_SDHC_CLKC_CLK_SRC_SEL >>
> +                                     host->clkc_clk_src_sel.shift;
> +       mux_clk = meson_mx_sdhc_register_clk(mmc_dev(host->mmc),
> +                                            &host->clkc_clk_src_sel.hw,
> +                                            "clk_src_sel",
> +                                            MESON_SDHC_PARENT_CLKS,
> +                                            mux_parents,
> +                                            CLK_SET_RATE_NO_REPARENT,
> +                                            &clk_mux_ops);
> +       if (IS_ERR(mux_clk))
> +               return PTR_ERR(mux_clk);
> +
> +       host->clkc_clk_div.reg = host->base + MESON_SDHC_CLKC;
> +       host->clkc_clk_div.shift = __ffs(MESON_SDHC_CLKC_CLK_DIV);
> +       host->clkc_clk_div.width = fls(MESON_SDHC_CLKC_CLK_DIV) -
> +                                  host->clkc_clk_div.shift;
are there any constraints for the divider?
the driver from the Amlogic kernel sources does this, but I'm not sure
what this is trying to achieve (and why):
       clk_div = input_rate / clk_ios - !(input_rate%clk_ios);
       if (!(clk_div & 0x01)) // if even number, turn it to an odd one
          clk_div++;

[...]
> +       mmc->max_busy_timeout = 0; // TODO: actual value?
do you know the actual busy timeout of this IP block?


Thank you for your time!
Regards
Martin
Ulf Hansson Aug. 22, 2019, 1:52 p.m. UTC | #2
On Mon, 8 Jul 2019 at 19:33, Martin Blumenstingl
<martin.blumenstingl@googlemail.com> wrote:
>
> WiP - only partially working - see performance numbers.
>
> Odroid-C1 eMMC (HS-200):
> Amlogic's vendor driver @ Linux 3.10:
>   7781351936 bytes (7.8 GB) copied, 134.714 s, 57.8 MB/s
> This driver:
>   7781351936 bytes (7.8 GB, 7.2 GiB) copied, 189.02 s, 41.2 MB/s
>
> EC-100 eMMC (HS MMC):
> Amlogic's vendor driver @ Linux 3.10:
>   15762194432 bytes (16 GB) copied, 422.967 s, 37.3 MB/s
> This driver:
>   15762194432 bytes (16 GB, 15 GiB) copied, 9232.65 s, 1.7 MB/s
>
> 1) Amlogic's vendor driver does some magic with the divider:
>       clk_div = input_rate / clk_ios - !(input_rate%clk_ios);
>       if (!(clk_div & 0x01)) // if even number, turn it to an odd one
>          clk_div++;
>    It's not clear to me whether what the reason behind this is, what is
>    supposed to be achieved with this?
>
> 2) The hardcoded RX clock phases are taken from the vendor driver. It
>    seems that these are only valid when fclk_div3 is used as input
>    clock (however, there are four more inputs). It's not clear to me how
>    to calculate the RX clock phases in set_ios based on the input clock
>    and the ios rate.
>
> 3) The hardware supports a timeout IRQ but the max_busy_timeout is not
>    documented anywhere.
>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>

Martin, overall this looks good to me. Once you moved from RFC to a
formal patch I will check again, of course.

There are a couple of calls to readl_poll_timeout(), for different
reasons, that I have some questions about, but we can discuss those in
the next step.

[...]

Kind regards
Uffe
Martin Blumenstingl Aug. 24, 2019, 9:34 p.m. UTC | #3
Hi Ulf,

On Thu, Aug 22, 2019 at 3:53 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 8 Jul 2019 at 19:33, Martin Blumenstingl
> <martin.blumenstingl@googlemail.com> wrote:
> >
> > WiP - only partially working - see performance numbers.
> >
> > Odroid-C1 eMMC (HS-200):
> > Amlogic's vendor driver @ Linux 3.10:
> >   7781351936 bytes (7.8 GB) copied, 134.714 s, 57.8 MB/s
> > This driver:
> >   7781351936 bytes (7.8 GB, 7.2 GiB) copied, 189.02 s, 41.2 MB/s
> >
> > EC-100 eMMC (HS MMC):
> > Amlogic's vendor driver @ Linux 3.10:
> >   15762194432 bytes (16 GB) copied, 422.967 s, 37.3 MB/s
> > This driver:
> >   15762194432 bytes (16 GB, 15 GiB) copied, 9232.65 s, 1.7 MB/s
> >
> > 1) Amlogic's vendor driver does some magic with the divider:
> >       clk_div = input_rate / clk_ios - !(input_rate%clk_ios);
> >       if (!(clk_div & 0x01)) // if even number, turn it to an odd one
> >          clk_div++;
> >    It's not clear to me whether what the reason behind this is, what is
> >    supposed to be achieved with this?
> >
> > 2) The hardcoded RX clock phases are taken from the vendor driver. It
> >    seems that these are only valid when fclk_div3 is used as input
> >    clock (however, there are four more inputs). It's not clear to me how
> >    to calculate the RX clock phases in set_ios based on the input clock
> >    and the ios rate.
> >
> > 3) The hardware supports a timeout IRQ but the max_busy_timeout is not
> >    documented anywhere.
> >
> > Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
>
> Martin, overall this looks good to me. Once you moved from RFC to a
> formal patch I will check again, of course.
OK, great

in the meantime I got answers to my questions (off-list) from Jianxin.

also someone asked me (just this week) for the .dts patches so he
could test on his own board (I have them ready but didn't send them
yet)
unfortunately he ran into some data corruption on writing
I can reproduce it but I didn't have time to debug this yet

I'll send an updated version once I have resolved that - as non-RFC

> There are a couple of calls to readl_poll_timeout(), for different
> reasons, that I have some questions about, but we can discuss those in
> the next step.
sure.
feel free to ask now since I still have to debug that data corruption
problem as stated above


Martin