Message ID | ZVf/pqw5YcF7sldg@shikoro |
---|---|
State | Accepted |
Headers | show |
Series | [PULL,REQUEST] i2c-for-6.7-rc2 | expand |
On Fri, 17 Nov 2023 at 16:05, Wolfram Sang <wsa@kernel.org> wrote: > > Jan Bottorff (1): > i2c: designware: Fix corrupted memory seen in the ISR I have pulled this, but honestly, looking at the patch, I really get the feeling that there's some deeper problem going on. Either the designware driver doesn't do the right locking, or the relaxed IO accesses improperly are escaping the locks that do exist. Either way, just changing "writel_relaxed()" to "writel()" seems to be wrong. Of course, it is entirely possible that those accesses should never have been relaxed in the first place, and that the actual access ordering between two accesses in the same thread matters. For example, the code did *val = readw_relaxed(dev->base + reg) | (readw_relaxed(dev->base + reg + 2) << 16); and if the order of those two readw's mattered, then the "relaxed" was always entirely wrong. But the commit message seems to very much imply a multi-thread issue, and for *that* issue, doing "writel_relaxed" -> "writel" is very much wrong. The only thing fixing threading issues is proper locks (or _working_ locks). Removing the "relaxed" may *hide* the issue, but doesn't really fix it. For the arm64 people I brought in: this is now commit f726eaa787e9 ("i2c: designware: Fix corrupted memory seen in the ISR") upstream. I've done the pull, because even if this is purely a "hide the problem" fix, it's better than what the code did. I'm just asking that people look at this a bit more. Linus
The pull request you sent on Fri, 17 Nov 2023 19:04:54 -0500:
> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git tags/i2c-for-6.7-rc2
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/23dfa043f6d5ac9339607f077f2c30cd50798e12
Thank you!
On Sat, Nov 18, 2023 at 09:56:59AM -0800, Linus Torvalds wrote: > On Fri, 17 Nov 2023 at 16:05, Wolfram Sang <wsa@kernel.org> wrote: > > > > Jan Bottorff (1): > > i2c: designware: Fix corrupted memory seen in the ISR > > I have pulled this, but honestly, looking at the patch, I really get > the feeling that there's some deeper problem going on. > > Either the designware driver doesn't do the right locking, or the > relaxed IO accesses improperly are escaping the locks that do exist. > > Either way, just changing "writel_relaxed()" to "writel()" seems to be wrong. > > Of course, it is entirely possible that those accesses should never > have been relaxed in the first place, and that the actual access > ordering between two accesses in the same thread matters. For example, > the code did > > *val = readw_relaxed(dev->base + reg) | > (readw_relaxed(dev->base + reg + 2) << 16); > > and if the order of those two readw's mattered, then the "relaxed" was > always entirely wrong. > > But the commit message seems to very much imply a multi-thread issue, > and for *that* issue, doing "writel_relaxed" -> "writel" is very much > wrong. The only thing fixing threading issues is proper locks (or > _working_ locks). > > Removing the "relaxed" may *hide* the issue, but doesn't really fix it. > > For the arm64 people I brought in: this is now commit f726eaa787e9 > ("i2c: designware: Fix corrupted memory seen in the ISR") upstream. > I've done the pull, because even if this is purely a "hide the > problem" fix, it's better than what the code did. I'm just asking that > people look at this a bit more. Thanks for putting me on CC. The original issue was discussed quite a bit over at: https://lore.kernel.org/all/20230913232938.420423-1-janb@os.amperecomputing.com/ and I think the high-level problem was something like: 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer() setting 'dev->msg_read_idx' to 0) 2. CPU x writes to an I/O register on this I2C controller which generates an IRQ (end of i2c_dw_xfer_init()) 3. CPU y takes the IRQ 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1) (i2c folks: please chime in if I got this wrong) the issue being that the writes in (1) are not ordered before the I/O access in (2) if the relaxed accessor is used. Rather than upgrade only the register writes which can trigger an interrupt, the more conservative approach of upgrading everything to non-relaxed I/O accesses was taken (which is probably necessary to deal with spurious IRQs properly anyway because otherwise 'dev->msg_read_idx' could be read early in step (4)). Your point about locking is interesting. I don't see any obvious locks being taken in i2c_dw_isr(), so I don't think the issue is because relaxed accesses are escaping existing locks. An alternative would be putting steps (1) and (2) in a critical section and then taking the lock again in the interrupt handler. Or did you have something else in mind? Will
Hi Will, thanks a lot for this summary! > and I think the high-level problem was something like: > > 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer() > setting 'dev->msg_read_idx' to 0) > 2. CPU x writes to an I/O register on this I2C controller which generates > an IRQ (end of i2c_dw_xfer_init()) > 3. CPU y takes the IRQ > 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1) > > (i2c folks: please chime in if I got this wrong) I admit that I didn't dive into this specific discussion. But we had this kind of re-ordering problem in the past in i2c, so avoiding the relaxed_* where possible came to be a good thing in my book. So, I recommended removing it for all writes, not only the one causing problems here. relaxed_* should only be used when really needed. So, this is why I applied the patch, plus I trust the people giving their tags after the in-depth discussion. But yeah, if somebody more experienced with this driver could double-check against the potential locking problem, this would be good.
On Mon, 20 Nov 2023 at 07:05, Will Deacon <will@kernel.org> wrote: > > and I think the high-level problem was something like: > > 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer() > setting 'dev->msg_read_idx' to 0) > 2. CPU x writes to an I/O register on this I2C controller which generates > an IRQ (end of i2c_dw_xfer_init()) > 3. CPU y takes the IRQ > 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1) > > (i2c folks: please chime in if I got this wrong) > > the issue being that the writes in (1) are not ordered before the I/O > access in (2) if the relaxed accessor is used. Ok, then removing relaxed is indeed the right thing to do. Because yes, it's an actual ordering issue with the IO write, not some locking issue. Thanks for filling in the details, that patch looked iffy to me, but it does sound like everything is good. Linus