mbox series

[PULL,REQUEST] i2c-for-6.7-rc2

Message ID ZVf/pqw5YcF7sldg@shikoro
State Accepted
Headers show
Series [PULL,REQUEST] i2c-for-6.7-rc2 | expand

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git tags/i2c-for-6.7-rc2

Message

Wolfram Sang Nov. 18, 2023, 12:04 a.m. UTC
The following changes since commit b85ea95d086471afb4ad062012a4d73cd328fa86:

  Linux 6.7-rc1 (2023-11-12 16:19:07 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git tags/i2c-for-6.7-rc2

for you to fetch changes up to 382561d16854a747e6df71034da08d20d6013dfe:

  i2c: ocores: Move system PM hooks to the NOIRQ phase (2023-11-13 12:43:42 -0500)

----------------------------------------------------------------
Revert a not-working conversion to generic recovery for PXA, use proper
IO accessors for designware, and use proper PM level for ocores to allow
accessing interrupt providers late.

----------------------------------------------------------------
Jan Bottorff (1):
      i2c: designware: Fix corrupted memory seen in the ISR

Robert Marko (1):
      Revert "i2c: pxa: move to generic GPIO recovery"

Samuel Holland (1):
      i2c: ocores: Move system PM hooks to the NOIRQ phase


with much appreciated quality assurance from
----------------------------------------------------------------
Andi Shyti (1):
      (Rev.) i2c: ocores: Move system PM hooks to the NOIRQ phase

Serge Semin (2):
      (Test) i2c: designware: Fix corrupted memory seen in the ISR
      (Rev.) i2c: designware: Fix corrupted memory seen in the ISR

 drivers/i2c/busses/i2c-designware-common.c | 16 +++----
 drivers/i2c/busses/i2c-ocores.c            |  4 +-
 drivers/i2c/busses/i2c-pxa.c               | 76 ++++++++++++++++++++++++++----
 3 files changed, 78 insertions(+), 18 deletions(-)

Comments

Linus Torvalds Nov. 18, 2023, 5:56 p.m. UTC | #1
On Fri, 17 Nov 2023 at 16:05, Wolfram Sang <wsa@kernel.org> wrote:
>
> Jan Bottorff (1):
>       i2c: designware: Fix corrupted memory seen in the ISR

I have pulled this, but honestly, looking at the patch, I really get
the feeling that there's some deeper problem going on.

Either the designware driver doesn't do the right locking, or the
relaxed IO accesses improperly are escaping the locks that do exist.

Either way, just changing "writel_relaxed()" to "writel()" seems to be wrong.

Of course, it is entirely possible that those accesses should never
have been relaxed in the first place, and that the actual access
ordering between two accesses in the same thread matters. For example,
the code did

        *val = readw_relaxed(dev->base + reg) |
                (readw_relaxed(dev->base + reg + 2) << 16);

and if the order of those two readw's mattered, then the "relaxed" was
always entirely wrong.

But the commit message seems to very much imply a multi-thread issue,
and for *that* issue, doing "writel_relaxed" -> "writel" is very much
wrong. The only thing fixing threading issues is proper locks (or
_working_ locks).

Removing the "relaxed" may *hide* the issue, but doesn't really fix it.

For the arm64 people I brought in: this is now commit f726eaa787e9
("i2c: designware: Fix corrupted memory seen in the ISR") upstream.
I've done the pull, because even if this is purely a "hide the
problem" fix, it's better than what the code did. I'm just asking that
people look at this a bit more.

                   Linus
pr-tracker-bot@kernel.org Nov. 18, 2023, 6:02 p.m. UTC | #2
The pull request you sent on Fri, 17 Nov 2023 19:04:54 -0500:

> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git tags/i2c-for-6.7-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/23dfa043f6d5ac9339607f077f2c30cd50798e12

Thank you!
Will Deacon Nov. 20, 2023, 3:05 p.m. UTC | #3
On Sat, Nov 18, 2023 at 09:56:59AM -0800, Linus Torvalds wrote:
> On Fri, 17 Nov 2023 at 16:05, Wolfram Sang <wsa@kernel.org> wrote:
> >
> > Jan Bottorff (1):
> >       i2c: designware: Fix corrupted memory seen in the ISR
> 
> I have pulled this, but honestly, looking at the patch, I really get
> the feeling that there's some deeper problem going on.
> 
> Either the designware driver doesn't do the right locking, or the
> relaxed IO accesses improperly are escaping the locks that do exist.
> 
> Either way, just changing "writel_relaxed()" to "writel()" seems to be wrong.
> 
> Of course, it is entirely possible that those accesses should never
> have been relaxed in the first place, and that the actual access
> ordering between two accesses in the same thread matters. For example,
> the code did
> 
>         *val = readw_relaxed(dev->base + reg) |
>                 (readw_relaxed(dev->base + reg + 2) << 16);
> 
> and if the order of those two readw's mattered, then the "relaxed" was
> always entirely wrong.
> 
> But the commit message seems to very much imply a multi-thread issue,
> and for *that* issue, doing "writel_relaxed" -> "writel" is very much
> wrong. The only thing fixing threading issues is proper locks (or
> _working_ locks).
> 
> Removing the "relaxed" may *hide* the issue, but doesn't really fix it.
> 
> For the arm64 people I brought in: this is now commit f726eaa787e9
> ("i2c: designware: Fix corrupted memory seen in the ISR") upstream.
> I've done the pull, because even if this is purely a "hide the
> problem" fix, it's better than what the code did. I'm just asking that
> people look at this a bit more.

Thanks for putting me on CC. The original issue was discussed quite a bit
over at:

https://lore.kernel.org/all/20230913232938.420423-1-janb@os.amperecomputing.com/

and I think the high-level problem was something like:

1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer()
   setting 'dev->msg_read_idx' to 0)
2. CPU x writes to an I/O register on this I2C controller which generates
   an IRQ (end of i2c_dw_xfer_init())
3. CPU y takes the IRQ
4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1)

(i2c folks: please chime in if I got this wrong)

the issue being that the writes in (1) are not ordered before the I/O
access in (2) if the relaxed accessor is used. Rather than upgrade only
the register writes which can trigger an interrupt, the more conservative
approach of upgrading everything to non-relaxed I/O accesses was taken
(which is probably necessary to deal with spurious IRQs properly anyway
because otherwise 'dev->msg_read_idx' could be read early in step (4)).

Your point about locking is interesting. I don't see any obvious locks
being taken in i2c_dw_isr(), so I don't think the issue is because relaxed
accesses are escaping existing locks. An alternative would be putting
steps (1) and (2) in a critical section and then taking the lock again
in the interrupt handler. Or did you have something else in mind?

Will
Wolfram Sang Nov. 20, 2023, 4:22 p.m. UTC | #4
Hi Will,

thanks a lot for this summary!

> and I think the high-level problem was something like:
> 
> 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer()
>    setting 'dev->msg_read_idx' to 0)
> 2. CPU x writes to an I/O register on this I2C controller which generates
>    an IRQ (end of i2c_dw_xfer_init())
> 3. CPU y takes the IRQ
> 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1)
> 
> (i2c folks: please chime in if I got this wrong)

I admit that I didn't dive into this specific discussion. But we had
this kind of re-ordering problem in the past in i2c, so avoiding the
relaxed_* where possible came to be a good thing in my book. So, I
recommended removing it for all writes, not only the one causing
problems here. relaxed_* should only be used when really needed. So,
this is why I applied the patch, plus I trust the people giving their
tags after the in-depth discussion. But yeah, if somebody more
experienced with this driver could double-check against the potential
locking problem, this would be good.
Linus Torvalds Nov. 20, 2023, 5:32 p.m. UTC | #5
On Mon, 20 Nov 2023 at 07:05, Will Deacon <will@kernel.org> wrote:
>
> and I think the high-level problem was something like:
>
> 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer()
>    setting 'dev->msg_read_idx' to 0)
> 2. CPU x writes to an I/O register on this I2C controller which generates
>    an IRQ (end of i2c_dw_xfer_init())
> 3. CPU y takes the IRQ
> 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1)
>
> (i2c folks: please chime in if I got this wrong)
>
> the issue being that the writes in (1) are not ordered before the I/O
> access in (2) if the relaxed accessor is used.

Ok, then removing relaxed is indeed the right thing to do. Because
yes, it's an actual ordering issue with the IO write, not some locking
issue.

Thanks for filling in the details, that patch looked iffy to me, but
it does sound like everything is good.

             Linus