Message ID | 20201221165501.717101-1-u.kleine-koenig@pengutronix.de |
---|---|
State | Changes Requested |
Headers | show |
Series | pwm: bcm2835: Improve period and duty cycle calculation | expand |
On 12/21/2020 8:55 AM, Uwe Kleine-König wrote: > With an input clk rate bigger than 2000000000, scaler would have been > zero which then would have resulted in a division by zero. > > Also the originally implemented algorithm divided by the result of a > division. This nearly always looses precision. Consider a requested period > of 1000000 ns. With an input clock frequency of 32786885 Hz the hardware > was configured with an actual period of 983869.007 ns (PERIOD = 32258) > while the hardware can provide 1000003.508 ns (PERIOD = 32787). > And note if the input clock frequency was 32786886 Hz instead, the hardware > was configured to 1016656.477 ns (PERIOD = 33333) while the optimal > setting results in 1000003.477 ns (PERIOD = 32787). > > This patch implements proper range checking and only divides once for > the calculation of period (and similar for duty_cycle). > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > --- > Hello, > > during review of the bcm2835 driver I noticed this double division. > > I think the practical relevance is low however because the clock rate is > fixed to 10 MHz on this platform which doesn't result in these > deviations. (Is this right, what is the actual rate?) Currently this is correct but the PWM input clock can be configured from the divider of a PLL that runs at 500MHz so this change is potentially useful in that regard.
Hello Florian, On Mon, Dec 21, 2020 at 03:04:25PM -0800, Florian Fainelli wrote: > On 12/21/2020 8:55 AM, Uwe Kleine-König wrote: > > With an input clk rate bigger than 2000000000, scaler would have been > > zero which then would have resulted in a division by zero. > > > > Also the originally implemented algorithm divided by the result of a > > division. This nearly always looses precision. Consider a requested period > > of 1000000 ns. With an input clock frequency of 32786885 Hz the hardware > > was configured with an actual period of 983869.007 ns (PERIOD = 32258) > > while the hardware can provide 1000003.508 ns (PERIOD = 32787). > > And note if the input clock frequency was 32786886 Hz instead, the hardware > > was configured to 1016656.477 ns (PERIOD = 33333) while the optimal > > setting results in 1000003.477 ns (PERIOD = 32787). > > > > This patch implements proper range checking and only divides once for > > the calculation of period (and similar for duty_cycle). > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > > --- > > Hello, > > > > during review of the bcm2835 driver I noticed this double division. > > > > I think the practical relevance is low however because the clock rate is > > fixed to 10 MHz on this platform which doesn't result in these > > deviations. (Is this right, what is the actual rate?) > > Currently this is correct but the PWM input clock can be configured from > the divider of a PLL that runs at 500MHz so this change is potentially > useful in that regard. Thanks for your feedback; is this an Ack? Best regards Uwe
Hi, On 21.12.20 at 17:55, Uwe Kleine-König wrote: > With an input clk rate bigger than 2000000000, scaler would have been > zero which then would have resulted in a division by zero. > > Also the originally implemented algorithm divided by the result of a > division. This nearly always looses precision. Consider a requested period > of 1000000 ns. With an input clock frequency of 32786885 Hz the hardware > was configured with an actual period of 983869.007 ns (PERIOD = 32258) > while the hardware can provide 1000003.508 ns (PERIOD = 32787). > And note if the input clock frequency was 32786886 Hz instead, the hardware > was configured to 1016656.477 ns (PERIOD = 33333) while the optimal > setting results in 1000003.477 ns (PERIOD = 32787). > > This patch implements proper range checking and only divides once for > the calculation of period (and similar for duty_cycle). > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > --- > Hello, > > during review of the bcm2835 driver I noticed this double division. > > I think the practical relevance is low however because the clock rate is > fixed to 10 MHz on this platform which doesn't result in these > deviations. (Is this right, what is the actual rate?) clk_get_rate() on my Raspberry Pi 4 returns 9999999. > > Having said that this patch is only compile tested. I tested this on the RP and everything still works so far, so FWIW Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> > + > + if (state->period > max_period) > + period = max_period; > + else > + period = state->period; > + We return EINVAL if the value for period is too small, so should we not do the same for the case that the value is too high? Regards, Lino
Hello Lino, On Tue, Dec 22, 2020 at 05:34:00PM +0100, Lino Sanfilippo wrote: > On 21.12.20 at 17:55, Uwe Kleine-König wrote: > > With an input clk rate bigger than 2000000000, scaler would have been > > zero which then would have resulted in a division by zero. > > > > Also the originally implemented algorithm divided by the result of a > > division. This nearly always looses precision. Consider a requested period > > of 1000000 ns. With an input clock frequency of 32786885 Hz the hardware > > was configured with an actual period of 983869.007 ns (PERIOD = 32258) > > while the hardware can provide 1000003.508 ns (PERIOD = 32787). > > And note if the input clock frequency was 32786886 Hz instead, the hardware > > was configured to 1016656.477 ns (PERIOD = 33333) while the optimal > > setting results in 1000003.477 ns (PERIOD = 32787). > > > > This patch implements proper range checking and only divides once for > > the calculation of period (and similar for duty_cycle). > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > > --- > > Hello, > > > > during review of the bcm2835 driver I noticed this double division. > > > > I think the practical relevance is low however because the clock rate is > > fixed to 10 MHz on this platform which doesn't result in these > > deviations. (Is this right, what is the actual rate?) > > clk_get_rate() on my Raspberry Pi 4 returns 9999999. oh, then my patch is a real improvement, but the problems are not as bad as my examples above. So when a period of 350ns was requested, the old algorithm provided 400.000040000004 ns and the improved yields 300.000030000003 ns. > I tested this on the RP and everything still works so far, so FWIW > Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Thanks. > > + if (state->period > max_period) > > + period = max_period; > > + else > > + period = state->period; > > + > > We return EINVAL if the value for period is too small, so should we not do the > same for the case that the value is too high? Ah, I didn't notice I changed the behaviour here, but indeed this is what I think is the right behaviour. (i.e. configure the biggest possible period not bigger than the requested period.) But this change doesn't belong into this patch, so I will fixup and resend. Best regards Uwe
diff --git a/drivers/pwm/pwm-bcm2835.c b/drivers/pwm/pwm-bcm2835.c index 6ff5f04b3e07..f937db32fa54 100644 --- a/drivers/pwm/pwm-bcm2835.c +++ b/drivers/pwm/pwm-bcm2835.c @@ -64,8 +64,9 @@ static int bcm2835_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, struct bcm2835_pwm *pc = to_bcm2835_pwm(chip); unsigned long rate = clk_get_rate(pc->clk); - unsigned long long period; - unsigned long scaler; + unsigned long long period_cycles; + u64 max_period, period, duty_cycle; + u32 val; if (!rate) { @@ -73,18 +74,34 @@ static int bcm2835_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, return -EINVAL; } - scaler = DIV_ROUND_CLOSEST(NSEC_PER_SEC, rate); + /* + * period_cycles must be a 32 bit value, so period * rate / NSEC_PER_SEC + * must be <= U32_MAX. As U32_MAX * NSEC_PER_SEC < U64_MAX the + * multiplication period * rate doesn't overflow. + */ + max_period = DIV_ROUND_DOWN_ULL((u64)U32_MAX * NSEC_PER_SEC, rate); + + if (state->period > max_period) + period = max_period; + else + period = state->period; + + if (state->duty_cycle > period) + duty_cycle = period; + else + duty_cycle = state->duty_cycle; + /* set period */ - period = DIV_ROUND_CLOSEST_ULL(state->period, scaler); + period_cycles = DIV_ROUND_CLOSEST_ULL(period * rate, NSEC_PER_SEC); - /* dont accept a period that is too small or has been truncated */ - if ((period < PERIOD_MIN) || (period > U32_MAX)) + /* don't accept a period that is too small */ + if (period_cycles < PERIOD_MIN) return -EINVAL; - writel(period, pc->base + PERIOD(pwm->hwpwm)); + writel(period_cycles, pc->base + PERIOD(pwm->hwpwm)); /* set duty cycle */ - val = DIV_ROUND_CLOSEST_ULL(state->duty_cycle, scaler); + val = DIV_ROUND_CLOSEST_ULL(duty_cycle * rate, NSEC_PER_SEC); writel(val, pc->base + DUTY(pwm->hwpwm)); /* set polarity */
With an input clk rate bigger than 2000000000, scaler would have been zero which then would have resulted in a division by zero. Also the originally implemented algorithm divided by the result of a division. This nearly always looses precision. Consider a requested period of 1000000 ns. With an input clock frequency of 32786885 Hz the hardware was configured with an actual period of 983869.007 ns (PERIOD = 32258) while the hardware can provide 1000003.508 ns (PERIOD = 32787). And note if the input clock frequency was 32786886 Hz instead, the hardware was configured to 1016656.477 ns (PERIOD = 33333) while the optimal setting results in 1000003.477 ns (PERIOD = 32787). This patch implements proper range checking and only divides once for the calculation of period (and similar for duty_cycle). Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> --- Hello, during review of the bcm2835 driver I noticed this double division. I think the practical relevance is low however because the clock rate is fixed to 10 MHz on this platform which doesn't result in these deviations. (Is this right, what is the actual rate?) Having said that this patch is only compile tested. Best regards Uwe drivers/pwm/pwm-bcm2835.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-)