diff mbox series

[v4] media: rc: pwm-ir-tx: Switch to atomic PWM API

Message ID YXlxhpZWf2mxJaMi@fedora
State Not Applicable
Headers show
Series [v4] media: rc: pwm-ir-tx: Switch to atomic PWM API | expand

Commit Message

Maíra Canal Oct. 27, 2021, 3:34 p.m. UTC
Remove legacy PWM interface (pwm_config, pwm_enable, pwm_disable) and
replace it for the atomic PWM API.

Signed-off-by: Maíra Canal <maira.canal@usp.br>
Reported-by: kernel test robot <lkp@intel.com>
---
V1 -> V2: Assign variables directly and simplify conditional statement
V2 -> V3: Fix declaration of undeclared variables
V3 -> V4: Fix DIV_ROUND_CLOSEST error with u64 variables
---
 drivers/media/rc/pwm-ir-tx.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

Comments

Uwe Kleine-König Oct. 28, 2021, 6:45 a.m. UTC | #1
On Wed, Oct 27, 2021 at 12:34:30PM -0300, Maíra Canal wrote:
> Remove legacy PWM interface (pwm_config, pwm_enable, pwm_disable) and
> replace it for the atomic PWM API.
> 
> Signed-off-by: Maíra Canal <maira.canal@usp.br>
> Reported-by: kernel test robot <lkp@intel.com>

While it's true that he kernel robot told you about a problem in an
earlier revision, adding the tag here is misleading, because in the end
you only see the tag in the commit history, and there is suggests that
the commit fixes something that was reported in the kernel tree before.

For this reason I usually only add a thanks after the tripple dash.

Also note this nitpick: Your S-o-b should always be the last thing.

> ---
> V1 -> V2: Assign variables directly and simplify conditional statement
> V2 -> V3: Fix declaration of undeclared variables
> V3 -> V4: Fix DIV_ROUND_CLOSEST error with u64 variables
> ---
>  drivers/media/rc/pwm-ir-tx.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/media/rc/pwm-ir-tx.c b/drivers/media/rc/pwm-ir-tx.c
> index 4bc28d2c9cc9..105a9c24f1e3 100644
> --- a/drivers/media/rc/pwm-ir-tx.c
> +++ b/drivers/media/rc/pwm-ir-tx.c
> @@ -53,22 +53,21 @@ static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
>  {
>  	struct pwm_ir *pwm_ir = dev->priv;
>  	struct pwm_device *pwm = pwm_ir->pwm;
> -	int i, duty, period;
> +	struct pwm_state state;
> +	int i;
>  	ktime_t edge;
>  	long delta;
>  
> -	period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, pwm_ir->carrier);
> -	duty = DIV_ROUND_CLOSEST(pwm_ir->duty_cycle * period, 100);
> +	pwm_init_state(pwm, &state);
>  
> -	pwm_config(pwm, duty, period);
> +	state.period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, pwm_ir->carrier);
> +	pwm_set_relative_duty_cycle(&state, pwm_ir->duty_cycle, 100);
>  
>  	edge = ktime_get();
>  
>  	for (i = 0; i < count; i++) {
> -		if (i % 2) // space
> -			pwm_disable(pwm);
> -		else
> -			pwm_enable(pwm);
> +		state.enabled = !(i % 2);
> +		pwm_apply_state(pwm, &state);
>  
>  		edge = ktime_add_us(edge, txbuf[i]);
>  		delta = ktime_us_delta(edge, ktime_get());
> @@ -76,7 +75,8 @@ static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
>  			usleep_range(delta, delta + 10);
>  	}
>  
> -	pwm_disable(pwm);
> +	state.enabled = false;
> +	pwm_apply_state(pwm, &state);
>  
>  	return count;
>  }

The conversion is right (I think), note this could be optimized a bit
further: state.period only depends on carrier which rarely changes, so
the calculation could be done in pwm_ir_set_carrier(). Ditto for duty
which only depends on state.period and pwm_ir->duty_cycle. (This is for
a separate commit though.)

Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

Thanks
Uwe
Sean Young Oct. 28, 2021, 9:14 a.m. UTC | #2
On Thu, Oct 28, 2021 at 08:45:13AM +0200, Uwe Kleine-König wrote:
> The conversion is right (I think),

We still have the problem that the pwm drivers calculate the period
incorrectly by rounding down (except pwm-bcm2835). So the period is not
as good as it could be in most cases, but this driver can't do anything
about that.

> note this could be optimized a bit
> further: state.period only depends on carrier which rarely changes, so
> the calculation could be done in pwm_ir_set_carrier(). Ditto for duty
> which only depends on state.period and pwm_ir->duty_cycle. (This is for
> a separate commit though.)

I'm not sure what caching this is much of a win. The calculation is a few
instructions, so you're not winning in the way of speed. On the flip side
you use more memory since pwm_state has to be kmalloc() rather than existing
just on the stack, and both ioctl handlers and the probe function need to
recalculate the period/duty cycle, so there is a slight increase in code size.

This change does not improve anything measurably and only increases code
complexity.

> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

Thanks for your review.


Sean
Uwe Kleine-König Oct. 28, 2021, 11:15 a.m. UTC | #3
Hello Sean,

On Thu, Oct 28, 2021 at 10:14:42AM +0100, Sean Young wrote:
> On Thu, Oct 28, 2021 at 08:45:13AM +0200, Uwe Kleine-König wrote:
> > The conversion is right (I think),
> 
> We still have the problem that the pwm drivers calculate the period
> incorrectly by rounding down (except pwm-bcm2835). So the period is not
> as good as it could be in most cases, but this driver can't do anything
> about that.

Yeah, some time ago I started coding a round_state function
(wip at
https://git.pengutronix.de/cgit/ukl/linux/commit/?h=pwm-wip&id=ae348eb6a55d6526f30ef4a49819197d9616391e)
but this was pushed down on my todo-list by more important stuff.

If you want to experiment with that ...

> > note this could be optimized a bit
> > further: state.period only depends on carrier which rarely changes, so
> > the calculation could be done in pwm_ir_set_carrier(). Ditto for duty
> > which only depends on state.period and pwm_ir->duty_cycle. (This is for
> > a separate commit though.)
> 
> I'm not sure what caching this is much of a win. The calculation is a few
> instructions, so you're not winning in the way of speed. On the flip side
> you use more memory since pwm_state has to be kmalloc() rather than existing

I tested a bit with this patch on top of Maíra's:

	diff --git a/drivers/media/rc/pwm-ir-tx.c b/drivers/media/rc/pwm-ir-tx.c
	index 105a9c24f1e3..7585c21775bc 100644
	--- a/drivers/media/rc/pwm-ir-tx.c
	+++ b/drivers/media/rc/pwm-ir-tx.c
	@@ -17,7 +17,7 @@
	 
	 struct pwm_ir {
		struct pwm_device *pwm;
	-	unsigned int carrier;
	+	struct pwm_state state;
		unsigned int duty_cycle;
	 };
	 
	@@ -32,6 +32,7 @@ static int pwm_ir_set_duty_cycle(struct rc_dev *dev, u32 duty_cycle)
		struct pwm_ir *pwm_ir = dev->priv;
	 
		pwm_ir->duty_cycle = duty_cycle;
	+	pwm_set_relative_duty_cycle(&pwm_ir->state, pwm_ir->duty_cycle, 100);
	 
		return 0;
	 }
	@@ -43,7 +44,8 @@ static int pwm_ir_set_carrier(struct rc_dev *dev, u32 carrier)
		if (!carrier)
			return -EINVAL;
	 
	-	pwm_ir->carrier = carrier;
	+	pwm_ir->state.period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, carrier);
	+	pwm_set_relative_duty_cycle(&pwm_ir->state, pwm_ir->duty_cycle, 100);
	 
		return 0;
	 }
	@@ -53,21 +55,15 @@ static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
	 {
		struct pwm_ir *pwm_ir = dev->priv;
		struct pwm_device *pwm = pwm_ir->pwm;
	-	struct pwm_state state;
		int i;
		ktime_t edge;
		long delta;
	 
	-	pwm_init_state(pwm, &state);
	-
	-	state.period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, pwm_ir->carrier);
	-	pwm_set_relative_duty_cycle(&state, pwm_ir->duty_cycle, 100);
	-
		edge = ktime_get();
	 
		for (i = 0; i < count; i++) {
	-		state.enabled = !(i % 2);
	-		pwm_apply_state(pwm, &state);
	+		pwm_ir->state.enabled = !(i % 2);
	+		pwm_apply_state(pwm, &pwm_ir->state);
	 
			edge = ktime_add_us(edge, txbuf[i]);
			delta = ktime_us_delta(edge, ktime_get());
	@@ -75,8 +71,8 @@ static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
				usleep_range(delta, delta + 10);
		}
	 
	-	state.enabled = false;
	-	pwm_apply_state(pwm, &state);
	+	pwm_ir->state.enabled = false;
	+	pwm_apply_state(pwm, &pwm_ir->state);
	 
		return count;
	 }
	@@ -95,8 +91,9 @@ static int pwm_ir_probe(struct platform_device *pdev)
		if (IS_ERR(pwm_ir->pwm))
			return PTR_ERR(pwm_ir->pwm);
	 
	-	pwm_ir->carrier = 38000;
	-	pwm_ir->duty_cycle = 50;
	+	pwm_ir->state.duty_cycle = 50;
	+	pwm_init_state(pwm_ir->pwm, &pwm_ir->state);
	+	pwm_ir_set_carrier(rcdev, 38000);
	 
		rcdev = devm_rc_allocate_device(&pdev->dev, RC_DRIVER_IR_RAW_TX);
		if (!rcdev)

bloat-o-meter reports (for an arm allmodconfig build)

	add/remove: 0/0 grow/shrink: 3/1 up/down: 644/-396 (248)
	Function                                     old     new   delta
	pwm_ir_probe                                 372     676    +304
	pwm_ir_set_carrier                           108     292    +184
	pwm_ir_set_duty_cycle                         68     224    +156
	pwm_ir_tx                                    908     512    -396
	Total: Before=2302, After=2550, chg +10.77%

struct pwm_ir increases from 12 bytes to 40 bytes.

The stack space required by pwm_ir_tx decreases from 60 to 36

I don't know exactly how kmalloc works internally. Maybe allocating a
structure of size 40 bytes doesn't need more memory than a structure of
size 12?

I didn't check how runtimes change, but the size decrease of pwm_ir_tx()
is nice and might save a bit of runtime.

Probably it depends on your focus if this change is good for you or not.

> just on the stack, and both ioctl handlers and the probe function need to
> recalculate the period/duty cycle, so there is a slight increase in code size.
> 
> This change does not improve anything measurably and only increases code
> complexity.

You did measure?

Best regards
Uwe
Sean Young Oct. 28, 2021, 12:26 p.m. UTC | #4
Hi Uwe,

On Thu, Oct 28, 2021 at 01:15:35PM +0200, Uwe Kleine-König wrote:
> On Thu, Oct 28, 2021 at 10:14:42AM +0100, Sean Young wrote:
> > On Thu, Oct 28, 2021 at 08:45:13AM +0200, Uwe Kleine-König wrote:
> > > The conversion is right (I think),
> > 
> > We still have the problem that the pwm drivers calculate the period
> > incorrectly by rounding down (except pwm-bcm2835). So the period is not
> > as good as it could be in most cases, but this driver can't do anything
> > about that.
> 
> Yeah, some time ago I started coding a round_state function
> (wip at
> https://git.pengutronix.de/cgit/ukl/linux/commit/?h=pwm-wip&id=ae348eb6a55d6526f30ef4a49819197d9616391e)
> but this was pushed down on my todo-list by more important stuff.

That looks great, thank you for working on that!

> If you want to experiment with that ...

I will have a look.

> > > note this could be optimized a bit
> > > further: state.period only depends on carrier which rarely changes, so
> > > the calculation could be done in pwm_ir_set_carrier(). Ditto for duty
> > > which only depends on state.period and pwm_ir->duty_cycle. (This is for
> > > a separate commit though.)
> > 
> > I'm not sure what caching this is much of a win. The calculation is a few
> > instructions, so you're not winning in the way of speed. On the flip side
> > you use more memory since pwm_state has to be kmalloc() rather than existing
> 
> I tested a bit with this patch on top of Maíra's:
> 
> 	diff --git a/drivers/media/rc/pwm-ir-tx.c b/drivers/media/rc/pwm-ir-tx.c
> 	index 105a9c24f1e3..7585c21775bc 100644
> 	--- a/drivers/media/rc/pwm-ir-tx.c
> 	+++ b/drivers/media/rc/pwm-ir-tx.c
> 	@@ -17,7 +17,7 @@
> 	 
> 	 struct pwm_ir {
> 		struct pwm_device *pwm;
> 	-	unsigned int carrier;
> 	+	struct pwm_state state;
> 		unsigned int duty_cycle;
> 	 };
> 	 
> 	@@ -32,6 +32,7 @@ static int pwm_ir_set_duty_cycle(struct rc_dev *dev, u32 duty_cycle)
> 		struct pwm_ir *pwm_ir = dev->priv;
> 	 
> 		pwm_ir->duty_cycle = duty_cycle;
> 	+	pwm_set_relative_duty_cycle(&pwm_ir->state, pwm_ir->duty_cycle, 100);
> 	 
> 		return 0;
> 	 }
> 	@@ -43,7 +44,8 @@ static int pwm_ir_set_carrier(struct rc_dev *dev, u32 carrier)
> 		if (!carrier)
> 			return -EINVAL;
> 	 
> 	-	pwm_ir->carrier = carrier;
> 	+	pwm_ir->state.period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, carrier);
> 	+	pwm_set_relative_duty_cycle(&pwm_ir->state, pwm_ir->duty_cycle, 100);
> 	 
> 		return 0;
> 	 }
> 	@@ -53,21 +55,15 @@ static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
> 	 {
> 		struct pwm_ir *pwm_ir = dev->priv;
> 		struct pwm_device *pwm = pwm_ir->pwm;
> 	-	struct pwm_state state;
> 		int i;
> 		ktime_t edge;
> 		long delta;
> 	 
> 	-	pwm_init_state(pwm, &state);
> 	-
> 	-	state.period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, pwm_ir->carrier);
> 	-	pwm_set_relative_duty_cycle(&state, pwm_ir->duty_cycle, 100);
> 	-
> 		edge = ktime_get();
> 	 
> 		for (i = 0; i < count; i++) {
> 	-		state.enabled = !(i % 2);
> 	-		pwm_apply_state(pwm, &state);
> 	+		pwm_ir->state.enabled = !(i % 2);
> 	+		pwm_apply_state(pwm, &pwm_ir->state);
> 	 
> 			edge = ktime_add_us(edge, txbuf[i]);
> 			delta = ktime_us_delta(edge, ktime_get());
> 	@@ -75,8 +71,8 @@ static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
> 				usleep_range(delta, delta + 10);
> 		}
> 	 
> 	-	state.enabled = false;
> 	-	pwm_apply_state(pwm, &state);
> 	+	pwm_ir->state.enabled = false;
> 	+	pwm_apply_state(pwm, &pwm_ir->state);
> 	 
> 		return count;
> 	 }
> 	@@ -95,8 +91,9 @@ static int pwm_ir_probe(struct platform_device *pdev)
> 		if (IS_ERR(pwm_ir->pwm))
> 			return PTR_ERR(pwm_ir->pwm);
> 	 
> 	-	pwm_ir->carrier = 38000;
> 	-	pwm_ir->duty_cycle = 50;
> 	+	pwm_ir->state.duty_cycle = 50;
> 	+	pwm_init_state(pwm_ir->pwm, &pwm_ir->state);
> 	+	pwm_ir_set_carrier(rcdev, 38000);
> 	 
> 		rcdev = devm_rc_allocate_device(&pdev->dev, RC_DRIVER_IR_RAW_TX);
> 		if (!rcdev)
> 
> bloat-o-meter reports (for an arm allmodconfig build)
> 
> 	add/remove: 0/0 grow/shrink: 3/1 up/down: 644/-396 (248)
> 	Function                                     old     new   delta
> 	pwm_ir_probe                                 372     676    +304
> 	pwm_ir_set_carrier                           108     292    +184
> 	pwm_ir_set_duty_cycle                         68     224    +156
> 	pwm_ir_tx                                    908     512    -396
> 	Total: Before=2302, After=2550, chg +10.77%

So 248 bytes more after your changes.

> struct pwm_ir increases from 12 bytes to 40 bytes.
> 
> The stack space required by pwm_ir_tx decreases from 60 to 36
> 
> I don't know exactly how kmalloc works internally. Maybe allocating a
> structure of size 40 bytes doesn't need more memory than a structure of
> size 12?
> 
> I didn't check how runtimes change, but the size decrease of pwm_ir_tx()
> is nice and might save a bit of runtime.

I'm not following, how is this decreasing runtime? 

> Probably it depends on your focus if this change is good for you or not.

Decreasing size is of course a good thing.

> > just on the stack, and both ioctl handlers and the probe function need to
> > recalculate the period/duty cycle, so there is a slight increase in code size.
> > 
> > This change does not improve anything measurably and only increases code
> > complexity.
> 
> You did measure?

Thanks for prototyping this.


Sean
Uwe Kleine-König Oct. 28, 2021, 6:05 p.m. UTC | #5
On Thu, Oct 28, 2021 at 01:26:10PM +0100, Sean Young wrote:
> Hi Uwe,
> 
> On Thu, Oct 28, 2021 at 01:15:35PM +0200, Uwe Kleine-König wrote:
> > On Thu, Oct 28, 2021 at 10:14:42AM +0100, Sean Young wrote:
> > > On Thu, Oct 28, 2021 at 08:45:13AM +0200, Uwe Kleine-König wrote:
> > > > The conversion is right (I think),
> > > 
> > > We still have the problem that the pwm drivers calculate the period
> > > incorrectly by rounding down (except pwm-bcm2835). So the period is not
> > > as good as it could be in most cases, but this driver can't do anything
> > > about that.
> > 
> > Yeah, some time ago I started coding a round_state function
> > (wip at
> > https://git.pengutronix.de/cgit/ukl/linux/commit/?h=pwm-wip&id=ae348eb6a55d6526f30ef4a49819197d9616391e)
> > but this was pushed down on my todo-list by more important stuff.
> 
> That looks great, thank you for working on that!
> 
> > If you want to experiment with that ...
> 
> I will have a look.
> 
> > > > note this could be optimized a bit
> > > > further: state.period only depends on carrier which rarely changes, so
> > > > the calculation could be done in pwm_ir_set_carrier(). Ditto for duty
> > > > which only depends on state.period and pwm_ir->duty_cycle. (This is for
> > > > a separate commit though.)
> > > 
> > > I'm not sure what caching this is much of a win. The calculation is a few
> > > instructions, so you're not winning in the way of speed. On the flip side
> > > you use more memory since pwm_state has to be kmalloc() rather than existing
> > 
> > I tested a bit with this patch on top of Maíra's:
> > 
> > 	diff --git a/drivers/media/rc/pwm-ir-tx.c b/drivers/media/rc/pwm-ir-tx.c
> > 	index 105a9c24f1e3..7585c21775bc 100644
> > 	--- a/drivers/media/rc/pwm-ir-tx.c
> > 	+++ b/drivers/media/rc/pwm-ir-tx.c
> > 	@@ -17,7 +17,7 @@
> > 	 
> > 	 struct pwm_ir {
> > 		struct pwm_device *pwm;
> > 	-	unsigned int carrier;
> > 	+	struct pwm_state state;
> > 		unsigned int duty_cycle;
> > 	 };
> > 	 
> > 	@@ -32,6 +32,7 @@ static int pwm_ir_set_duty_cycle(struct rc_dev *dev, u32 duty_cycle)
> > 		struct pwm_ir *pwm_ir = dev->priv;
> > 	 
> > 		pwm_ir->duty_cycle = duty_cycle;
> > 	+	pwm_set_relative_duty_cycle(&pwm_ir->state, pwm_ir->duty_cycle, 100);
> > 	 
> > 		return 0;
> > 	 }
> > 	@@ -43,7 +44,8 @@ static int pwm_ir_set_carrier(struct rc_dev *dev, u32 carrier)
> > 		if (!carrier)
> > 			return -EINVAL;
> > 	 
> > 	-	pwm_ir->carrier = carrier;
> > 	+	pwm_ir->state.period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, carrier);
> > 	+	pwm_set_relative_duty_cycle(&pwm_ir->state, pwm_ir->duty_cycle, 100);
> > 	 
> > 		return 0;
> > 	 }
> > 	@@ -53,21 +55,15 @@ static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
> > 	 {
> > 		struct pwm_ir *pwm_ir = dev->priv;
> > 		struct pwm_device *pwm = pwm_ir->pwm;
> > 	-	struct pwm_state state;
> > 		int i;
> > 		ktime_t edge;
> > 		long delta;
> > 	 
> > 	-	pwm_init_state(pwm, &state);
> > 	-
> > 	-	state.period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, pwm_ir->carrier);
> > 	-	pwm_set_relative_duty_cycle(&state, pwm_ir->duty_cycle, 100);
> > 	-
> > 		edge = ktime_get();
> > 	 
> > 		for (i = 0; i < count; i++) {
> > 	-		state.enabled = !(i % 2);
> > 	-		pwm_apply_state(pwm, &state);
> > 	+		pwm_ir->state.enabled = !(i % 2);
> > 	+		pwm_apply_state(pwm, &pwm_ir->state);
> > 	 
> > 			edge = ktime_add_us(edge, txbuf[i]);
> > 			delta = ktime_us_delta(edge, ktime_get());
> > 	@@ -75,8 +71,8 @@ static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
> > 				usleep_range(delta, delta + 10);
> > 		}
> > 	 
> > 	-	state.enabled = false;
> > 	-	pwm_apply_state(pwm, &state);
> > 	+	pwm_ir->state.enabled = false;
> > 	+	pwm_apply_state(pwm, &pwm_ir->state);
> > 	 
> > 		return count;
> > 	 }
> > 	@@ -95,8 +91,9 @@ static int pwm_ir_probe(struct platform_device *pdev)
> > 		if (IS_ERR(pwm_ir->pwm))
> > 			return PTR_ERR(pwm_ir->pwm);
> > 	 
> > 	-	pwm_ir->carrier = 38000;
> > 	-	pwm_ir->duty_cycle = 50;
> > 	+	pwm_ir->state.duty_cycle = 50;
> > 	+	pwm_init_state(pwm_ir->pwm, &pwm_ir->state);
> > 	+	pwm_ir_set_carrier(rcdev, 38000);
> > 	 
> > 		rcdev = devm_rc_allocate_device(&pdev->dev, RC_DRIVER_IR_RAW_TX);
> > 		if (!rcdev)
> > 
> > bloat-o-meter reports (for an arm allmodconfig build)
> > 
> > 	add/remove: 0/0 grow/shrink: 3/1 up/down: 644/-396 (248)
> > 	Function                                     old     new   delta
> > 	pwm_ir_probe                                 372     676    +304
> > 	pwm_ir_set_carrier                           108     292    +184
> > 	pwm_ir_set_duty_cycle                         68     224    +156
> > 	pwm_ir_tx                                    908     512    -396
> > 	Total: Before=2302, After=2550, chg +10.77%
> 
> So 248 bytes more after your changes.

ack. This is because the compiler inlines the division which accounts
for > 100 bytes.

> > struct pwm_ir increases from 12 bytes to 40 bytes.
> > 
> > The stack space required by pwm_ir_tx decreases from 60 to 36
> > 
> > I don't know exactly how kmalloc works internally. Maybe allocating a
> > structure of size 40 bytes doesn't need more memory than a structure of
> > size 12?
> > 
> > I didn't check how runtimes change, but the size decrease of pwm_ir_tx()
> > is nice and might save a bit of runtime.
> 
> I'm not following, how is this decreasing runtime? 

With my changes pwm_ir_tx got smaller and { pwm_ir_probe,
pwm_ir_set_carrier, pwm_ir_set_duty_cycle } got bigger. Now if for a
typical runtime pattern pwm_ir_probe and pwm_ir_set_carrier run once and
pwm_ir_set_duty_cycle 100 times and pwm_ir_tx 1000 times (no idea if
that is realistic) it might be a net win in sum.

Best regards
Uwe
Sean Young Oct. 29, 2021, 7:16 a.m. UTC | #6
On Thu, Oct 28, 2021 at 08:05:16PM +0200, Uwe Kleine-König wrote:
> On Thu, Oct 28, 2021 at 01:26:10PM +0100, Sean Young wrote:
> > > bloat-o-meter reports (for an arm allmodconfig build)
> > > 
> > > 	add/remove: 0/0 grow/shrink: 3/1 up/down: 644/-396 (248)
> > > 	Function                                     old     new   delta
> > > 	pwm_ir_probe                                 372     676    +304
> > > 	pwm_ir_set_carrier                           108     292    +184
> > > 	pwm_ir_set_duty_cycle                         68     224    +156
> > > 	pwm_ir_tx                                    908     512    -396
> > > 	Total: Before=2302, After=2550, chg +10.77%
> > 
> > So 248 bytes more after your changes.
> 
> ack. This is because the compiler inlines the division which accounts
> for > 100 bytes.

I'm surprised it's that large. This is on 32 bit?

> > > struct pwm_ir increases from 12 bytes to 40 bytes.
> > > 
> > > The stack space required by pwm_ir_tx decreases from 60 to 36
> > > 
> > > I don't know exactly how kmalloc works internally. Maybe allocating a
> > > structure of size 40 bytes doesn't need more memory than a structure of
> > > size 12?
> > > 
> > > I didn't check how runtimes change, but the size decrease of pwm_ir_tx()
> > > is nice and might save a bit of runtime.
> > 
> > I'm not following, how is this decreasing runtime? 
> 
> With my changes pwm_ir_tx got smaller and { pwm_ir_probe,
> pwm_ir_set_carrier, pwm_ir_set_duty_cycle } got bigger. Now if for a
> typical runtime pattern pwm_ir_probe and pwm_ir_set_carrier run once and
> pwm_ir_set_duty_cycle 100 times and pwm_ir_tx 1000 times (no idea if
> that is realistic) it might be a net win in sum.

The two most common programs for sending IR are

ir-ctl: https://git.linuxtv.org/v4l-utils.git/tree/utils/ir-ctl/ir-ctl.c#n1041
lircd: https://sourceforge.net/p/lirc/git/ci/master/tree/lib/transmit.c

For each transmission, the carrier is set. If the duty cyle is specified,
then that is set too. Then the transmit itself is done. Both of them
set the carrier and duty cycle (if required) for every transmission: setting
the carrier and duty cycle is a cheap operation, and it is device property
which can be overriden by another process. 

This means with your changes, if the carrier and duty cycle are both set
for each transmission, then we're doing more work. If only the carrier
is set for each transmission, then there is no net gain/loss (I think),
but the code size has increased.


Thanks for prototyping this.

Sean
Uwe Kleine-König Oct. 29, 2021, 11:06 a.m. UTC | #7
On Fri, Oct 29, 2021 at 08:16:08AM +0100, Sean Young wrote:
> On Thu, Oct 28, 2021 at 08:05:16PM +0200, Uwe Kleine-König wrote:
> > On Thu, Oct 28, 2021 at 01:26:10PM +0100, Sean Young wrote:
> > > > bloat-o-meter reports (for an arm allmodconfig build)
> > > > 
> > > > 	add/remove: 0/0 grow/shrink: 3/1 up/down: 644/-396 (248)
> > > > 	Function                                     old     new   delta
> > > > 	pwm_ir_probe                                 372     676    +304
> > > > 	pwm_ir_set_carrier                           108     292    +184
> > > > 	pwm_ir_set_duty_cycle                         68     224    +156
> > > > 	pwm_ir_tx                                    908     512    -396
> > > > 	Total: Before=2302, After=2550, chg +10.77%
> > > 
> > > So 248 bytes more after your changes.
> > 
> > ack. This is because the compiler inlines the division which accounts
> > for > 100 bytes.
> 
> I'm surprised it's that large. This is on 32 bit?

Yes, it's a 64 bit division on 32 bit ARM.

> > > > struct pwm_ir increases from 12 bytes to 40 bytes.
> > > > 
> > > > The stack space required by pwm_ir_tx decreases from 60 to 36
> > > > 
> > > > I don't know exactly how kmalloc works internally. Maybe allocating a
> > > > structure of size 40 bytes doesn't need more memory than a structure of
> > > > size 12?
> > > > 
> > > > I didn't check how runtimes change, but the size decrease of pwm_ir_tx()
> > > > is nice and might save a bit of runtime.
> > > 
> > > I'm not following, how is this decreasing runtime? 
> > 
> > With my changes pwm_ir_tx got smaller and { pwm_ir_probe,
> > pwm_ir_set_carrier, pwm_ir_set_duty_cycle } got bigger. Now if for a
> > typical runtime pattern pwm_ir_probe and pwm_ir_set_carrier run once and
> > pwm_ir_set_duty_cycle 100 times and pwm_ir_tx 1000 times (no idea if
> > that is realistic) it might be a net win in sum.
> 
> The two most common programs for sending IR are
> 
> ir-ctl: https://git.linuxtv.org/v4l-utils.git/tree/utils/ir-ctl/ir-ctl.c#n1041
> lircd: https://sourceforge.net/p/lirc/git/ci/master/tree/lib/transmit.c
> 
> For each transmission, the carrier is set. If the duty cyle is specified,
> then that is set too. Then the transmit itself is done. Both of them
> set the carrier and duty cycle (if required) for every transmission: setting
> the carrier and duty cycle is a cheap operation, and it is device property
> which can be overriden by another process. 
> 
> This means with your changes, if the carrier and duty cycle are both set
> for each transmission, then we're doing more work. If only the carrier
> is set for each transmission, then there is no net gain/loss (I think),
> but the code size has increased.

OK, then I discard my patch.

While reading that I wondered if it makes sense to have a callback that
sets both carrier and duty cycle and then remove the other two.

Best regards
Uwe
Sean Young Oct. 29, 2021, 11:54 a.m. UTC | #8
On Fri, Oct 29, 2021 at 01:06:02PM +0200, Uwe Kleine-König wrote:
> On Fri, Oct 29, 2021 at 08:16:08AM +0100, Sean Young wrote:
> > This means with your changes, if the carrier and duty cycle are both set
> > for each transmission, then we're doing more work. If only the carrier
> > is set for each transmission, then there is no net gain/loss (I think),
> > but the code size has increased.
> 
> OK, then I discard my patch.
> 
> While reading that I wondered if it makes sense to have a callback that
> sets both carrier and duty cycle and then remove the other two.

There are separate lirc ioctls to set carrier and duty cycle, that's why
there are separate callbacks.


Sean
Maíra Canal Oct. 29, 2021, 12:08 p.m. UTC | #9
I would like to thank you guys for the attention and this interesting
discussion. I'm looking for some work in the kernel and I would like
to know if you guys have any suggestions for beginner tasks in this
subsystem. I have solid knowledge in C programming, but I started in
the kernel a couple of weeks ago. Anyway, thank you for all the
feedback.

Maíra
Uwe Kleine-König Oct. 29, 2021, 3:18 p.m. UTC | #10
Hello,

On Fri, Oct 29, 2021 at 09:08:29AM -0300, Maíra Canal wrote:
> I would like to thank you guys for the attention and this interesting
> discussion. I'm looking for some work in the kernel and I would like
> to know if you guys have any suggestions for beginner tasks in this
> subsystem. I have solid knowledge in C programming, but I started in
> the kernel a couple of weeks ago. Anyway, thank you for all the
> feedback.

If you want something mathematically demanding, you can pick up the
patch I pointed out to Sean, I think I won't find the time in the near
future to work on this.

The background is that PWM drivers have to round most requests and there
is no uniformity among drivers and so if a consumer (e.g. the pwm-ir
driver) requests say 20000 ns, it will get 18000 from some drivers and
maybe 25000 from others. So the idea is to have a function
pwm_round_state that has fixed rounding rules such that a consumer can
pick the best setting for their use-case.

Something more mechanic in the PWM area is to convert drivers that still
implement .config/.enable/.disable to .apply. See
http://patchwork.ozlabs.org/project/linux-pwm/patch/20211029105617.210178-1-u.kleine-koenig@pengutronix.de/
for an example. The well-known good template is pwm_apply_legacy() after
applying
http://patchwork.ozlabs.org/project/linux-pwm/list/?series=251456 .

If you want something more global: The prototype of the remove callbacks
for platform devices returns an int:

	https://elixir.bootlin.com/linux/v5.15-rc7/source/include/linux/platform_device.h#L206

However the returned value is (nearly) ignored by the driver core:

	https://elixir.bootlin.com/linux/v5.15-rc7/source/drivers/base/platform.c#L1433

The longterm goal is to change the prototype of .remove to return void.
As a first step making all functions return 0 is a worthwile project.

The same problem exists for several other buses, one patch I sent to
work on this goal for i2c is:

	https://lore.kernel.org/r/20211021105657.72572-1-u.kleine-koenig@pengutronix.de

Best regards
Uwe
Sean Young Oct. 30, 2021, 9:21 a.m. UTC | #11
On Fri, Oct 29, 2021 at 09:08:29AM -0300, Maíra Canal wrote:
> I would like to thank you guys for the attention and this interesting
> discussion. I'm looking for some work in the kernel and I would like
> to know if you guys have any suggestions for beginner tasks in this
> subsystem. I have solid knowledge in C programming, but I started in
> the kernel a couple of weeks ago. Anyway, thank you for all the
> feedback.

Thank you for your contributions.

rc-core (drivers/media/rc) is in good shape and I don't know of any
outstanding issues.

There is a ton of work around dvb, but this requires actual dvb hardware
to test.

 - port drivers/media/usb/dvb-usb to drivers/media/usb/dvb-usb-v2
 - Re-write dvb frontend without dvb_attach()
 - Implement dma-buf for dvb

Like I said, you'll need actual hardware to test against, and this is
probably not beginner tasks.


Sean
Sean Young Oct. 31, 2021, 10:39 a.m. UTC | #12
Hi Uwe,

On Thu, Oct 28, 2021 at 01:15:35PM +0200, Uwe Kleine-König wrote:
> On Thu, Oct 28, 2021 at 10:14:42AM +0100, Sean Young wrote:
> > We still have the problem that the pwm drivers calculate the period
> > incorrectly by rounding down (except pwm-bcm2835). So the period is not
> > as good as it could be in most cases, but this driver can't do anything
> > about that.
> 
> Yeah, some time ago I started coding a round_state function
> (wip at
> https://git.pengutronix.de/cgit/ukl/linux/commit/?h=pwm-wip&id=ae348eb6a55d6526f30ef4a49819197d9616391e)
> but this was pushed down on my todo-list by more important stuff.
> 
> If you want to experiment with that ...

I was thinking about this problem this morning. 

- The pwm-ir-tx driver gets a carrier set in Hz, which it has to convert to
  a period (1e9 / carrier). There is loss of accuracy there.
- When it gets to the pwm driver, the period is converted into the format
  the pwm hardware expects. For example the pwm-bcm2835 driver converts
  it into clock cycles (1e9 / 8e8).

Both calculations involve loss of accuracy because of integer representation.

Would it make more sense for the pwm interface to use numer/denom rational
numbers?

struct rational {
	u64 numer;
	u64 denom;
};

If pwm-ir-tx would like to set the carrier, it could it like so:

	struct rational period = {
		.numer = NUSEC_PER_SEC,
		.denom = carrier,
	};

	pwm_set_period(&period);

Now pwm-bcm2835 could do it like so:

	int bcm2835_set_period(struct rational *period)
	{
		struct rational rate = {
			.numer = NUSEC_PER_SEC,
			.denum = clk_get_rate(clk),
		};

		rational_div(&rate, period);

		int step = rational_to_u64(&rate);
	}

Alternatively, since most of the pwm hardware is doing scaling based on the
clock (I think), would not make more sense for the pwm driver interface to
take a frequency rather than a period? Then the integer calculations can be
simpler: just divide the clock rate by the required frequency and you have
the period.

Just some thoughts.


Sean
Uwe Kleine-König Oct. 31, 2021, 5:40 p.m. UTC | #13
On Sun, Oct 31, 2021 at 10:39:34AM +0000, Sean Young wrote:
> Hi Uwe,
> 
> On Thu, Oct 28, 2021 at 01:15:35PM +0200, Uwe Kleine-König wrote:
> > On Thu, Oct 28, 2021 at 10:14:42AM +0100, Sean Young wrote:
> > > We still have the problem that the pwm drivers calculate the period
> > > incorrectly by rounding down (except pwm-bcm2835). So the period is not
> > > as good as it could be in most cases, but this driver can't do anything
> > > about that.
> > 
> > Yeah, some time ago I started coding a round_state function
> > (wip at
> > https://git.pengutronix.de/cgit/ukl/linux/commit/?h=pwm-wip&id=ae348eb6a55d6526f30ef4a49819197d9616391e)
> > but this was pushed down on my todo-list by more important stuff.
> > 
> > If you want to experiment with that ...
> 
> I was thinking about this problem this morning. 
> 
> - The pwm-ir-tx driver gets a carrier set in Hz, which it has to convert to
>   a period (1e9 / carrier). There is loss of accuracy there.

Ack, and the loss is known.

> - When it gets to the pwm driver, the period is converted into the format
>   the pwm hardware expects. For example the pwm-bcm2835 driver converts
>   it into clock cycles (1e9 / 8e8).
> 
> Both calculations involve loss of accuracy because of integer representation.
> 
> Would it make more sense for the pwm interface to use numer/denom rational
> numbers?

This is quite a complication because then all lowlevel driver needs to
do fractal math. I don't want to open that can of worms.

> struct rational {
> 	u64 numer;
> 	u64 denom;
> };
> 
> If pwm-ir-tx would like to set the carrier, it could it like so:
> 
> 	struct rational period = {
> 		.numer = NUSEC_PER_SEC,
> 		.denom = carrier,
> 	};
> 
> 	pwm_set_period(&period);
> 
> Now pwm-bcm2835 could do it like so:
> 
> 	int bcm2835_set_period(struct rational *period)
> 	{
> 		struct rational rate = {
> 			.numer = NUSEC_PER_SEC,
> 			.denum = clk_get_rate(clk),
> 		};
> 
> 		rational_div(&rate, period);
> 
> 		int step = rational_to_u64(&rate);
> 	}
> 
> Alternatively, since most of the pwm hardware is doing scaling based on the
> clock (I think), would not make more sense for the pwm driver interface to
> take a frequency rather than a period? Then the integer calculations can be
> simpler: just divide the clock rate by the required frequency and you have
> the period.

I think the rounding approach is easier and gives you the optimal
setting, too. With a carrier of say 455 KHz you want a period of
1e9 / (455 kHz) = 2197.802197802198 ns. Now assuming a pwm clk of 10 MHz
the pwm-bcm2835 driver can offer you 2100 ns and 2200 ns and the
pwm-ir-tx driver can pick the one it prefers. (OK, this works so nicely
because the pwm-bcm2835 driver has a nice clk rate, but also if it were
say 66666000 Hz, it would offer you 2191 ns and 2206 ns and you could
pick your favourite. (In this case there is a rounding error because the
actual periods are 2190.021900219002 ns and 2205.022050220502 ns, but I
would expect that this doesn't influence the choice and to improve here
we probably would have to change the unit of clk_get_rate, too.)

Best regards
Uwe
diff mbox series

Patch

diff --git a/drivers/media/rc/pwm-ir-tx.c b/drivers/media/rc/pwm-ir-tx.c
index 4bc28d2c9cc9..105a9c24f1e3 100644
--- a/drivers/media/rc/pwm-ir-tx.c
+++ b/drivers/media/rc/pwm-ir-tx.c
@@ -53,22 +53,21 @@  static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
 {
 	struct pwm_ir *pwm_ir = dev->priv;
 	struct pwm_device *pwm = pwm_ir->pwm;
-	int i, duty, period;
+	struct pwm_state state;
+	int i;
 	ktime_t edge;
 	long delta;
 
-	period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, pwm_ir->carrier);
-	duty = DIV_ROUND_CLOSEST(pwm_ir->duty_cycle * period, 100);
+	pwm_init_state(pwm, &state);
 
-	pwm_config(pwm, duty, period);
+	state.period = DIV_ROUND_CLOSEST(NSEC_PER_SEC, pwm_ir->carrier);
+	pwm_set_relative_duty_cycle(&state, pwm_ir->duty_cycle, 100);
 
 	edge = ktime_get();
 
 	for (i = 0; i < count; i++) {
-		if (i % 2) // space
-			pwm_disable(pwm);
-		else
-			pwm_enable(pwm);
+		state.enabled = !(i % 2);
+		pwm_apply_state(pwm, &state);
 
 		edge = ktime_add_us(edge, txbuf[i]);
 		delta = ktime_us_delta(edge, ktime_get());
@@ -76,7 +75,8 @@  static int pwm_ir_tx(struct rc_dev *dev, unsigned int *txbuf,
 			usleep_range(delta, delta + 10);
 	}
 
-	pwm_disable(pwm);
+	state.enabled = false;
+	pwm_apply_state(pwm, &state);
 
 	return count;
 }