[1/2] Pad iommu with an empty slot (necessary for SunOS 4.1.4)

Message ID 1273327815-21408-1-git-send-email-atar4qemu@gmail.com
State New

Commit Message

Artyom Tarasenko May 8, 2010, 2:10 p.m. UTC
On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
Software shouldn't use aliased addresses, nor should it crash
when it does (on the real hardware it wouldn't). Using empty_slot
instead of aliasing can help with debugging such accesses.
---
 hw/sun4m.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)
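
For context, the shape of the change is roughly this (a sketch, not
the patch itself: empty_slot_init() is the helper from
hw/empty_slot.c, and the iommu_pad_* hwdef fields are illustrative
names, not necessarily what the patch uses):

/* Sketch: in sun4m.c, cover the address range behind the IOMMU
 * registers with an empty slot instead of an alias, so stray
 * accesses read back zero and can be traced, rather than faulting.
 * The iommu_pad_* hwdef fields are illustrative names. */
if (hwdef->iommu_pad_base) {
    empty_slot_init(hwdef->iommu_pad_base, hwdef->iommu_pad_len);
}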

Comments

Blue Swirl May 9, 2010, 7:30 a.m. UTC | #1
On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>  Software shouldn't use aliased addresses, neither should it crash
>  when it uses (on the real hardware it wouldn't). Using empty_slot
>  instead of aliasing can help with debugging such accesses.

The TurboSPARC Microprocessor User's Manual shows that there are
additional pages after the main IOMMU for the AFX registers. So this is
not board-specific, but depends on the CPU/IOMMU version.

One approach would be to increase IOMMU_NREGS to cover these
registers (with a bump in the savevm version field) and have
iommu_init1() check the version field to see how much MMIO to
provide.

But in order to avoid the savevm version change, iommu_init1() could
just install dummy MMIO (in the TurboSPARC case), if OBP does not care
whether the read-back data matches what was written earlier. Because
from OBP's point of view this is identical to what your patch results
in, I suppose this approach would also work.
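
A rough sketch of the version-dependent variant in 2010-era QEMU
terms (TURBOSPARC_IOMMU_VERSION, IOMMU_VERSION_MASK and
TURBOSPARC_IOMMU_MMIO_SIZE are made-up names; the dummy handling of
the extra range is assumed to live in iommu_mem_read()/iommu_mem_write()):

static int iommu_init1(SysBusDevice *dev)
{
    IOMMUState *s = FROM_SYSBUS(IOMMUState, dev);
    target_phys_addr_t size = IOMMU_NREGS * 4;
    int io;

    /* Sketch only: size the MMIO window by IOMMU version; accesses
       beyond the regs[] array would be treated as dummy MMIO. */
    if ((s->version & IOMMU_VERSION_MASK) == TURBOSPARC_IOMMU_VERSION) {
        size = TURBOSPARC_IOMMU_MMIO_SIZE;  /* also cover the AFX pages */
    }
    io = cpu_register_io_memory(iommu_mem_read, iommu_mem_write, s);
    sysbus_init_mmio(dev, size, io);
    return 0;
}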
Artyom Tarasenko May 9, 2010, 8:29 a.m. UTC | #2
2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
> On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>  Software shouldn't use aliased addresses, neither should it crash
>>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>  instead of aliasing can help with debugging such accesses.
>
> TurboSPARC Microprocessor User's Manual shows that there are
> additional pages after the main IOMMU for AFX registers. So this is
> not board specific, but depends on CPU/IOMMU versions.

I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
SS-20 doesn't have any aliasing.

At what address are the additional AFX registers located?

> One approach would be that IOMMU_NREGS would be increased to cover
> these registers (with the bump in savevm version field) and
> iommu_init1() should check the version field to see how much MMIO to
> provide.

The problem I see here is that we already have too many registers: we
emulate the SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
0x20 registers which are aliased all the way.

> But in order to avoid the savevm version change, iommu_init1() could
> just install dummy MMIO (in the TurboSPARC case), if OBP does not care
> if the read back data matches what has been written earlier. Because
> from OBP point of view this is identical to what your patch results
> in, I'd suppose this approach would also work.

OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
SunOS 4.1.4 kernel that does. The "MUNIX" kernel is the only kernel available
during the installation, so it is currently not possible to install 4.1.4.
Surprisingly, the "GENERIC" kernel which is on the disk after the
installation doesn't try to access these address ranges either, so a disk
image taken from a live system works.

Actually, access to the non-connected/aliased addresses may also be a
consequence of the phys_page_find bug I mentioned before. When I run the
install with -m 64 and -m 256 it tries to access different non-connected
addresses. It may also be a SunOS bug, of course; 256m used to be a lot
back then.
Blue Swirl May 9, 2010, 8:48 a.m. UTC | #3
On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>
> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>  >>  Software shouldn't use aliased addresses, neither should it crash
>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>  >>  instead of aliasing can help with debugging such accesses.
>  >
>  > TurboSPARC Microprocessor User's Manual shows that there are
>  > additional pages after the main IOMMU for AFX registers. So this is
>  > not board specific, but depends on CPU/IOMMU versions.
>
>
> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>  SS-20 doesn't have any aliasing.

But are your machines equipped with TurboSPARC or some other CPU?

>  At what address the additional AFX registers are located?

Here's complete TurboSPARC IOMMU address map:
 PA[30:0]          Register          Access
1000_0000       IOMMU Control         R/W
1000_0004    IOMMU Base Address       R/W
1000_0014   Flush All IOTLB Entries    W
1000_0018        Address Flush         W
1000_1000  Asynchronous Fault Status  R/W
1000_1004 Asynchronous Fault Address  R/W
1000_1010  SBus Slot Configuration 0   R/W
1000_1014  SBus Slot Configuration 1   R/W
1000_1018  SBus Slot Configuration 2   R/W
1000_101C  SBus Slot Configuration 3   R/W
1000_1020  SBus Slot Configuration 4   R/W
1000_1050     Memory Fault Status     R/W
1000_1054    Memory Fault Address     R/W
1000_2000     Module Identification    R/W
1000_3018      Mask Identification      R
1000_4000      AFX Queue Level         W
1000_6000      AFX Queue Level         R
1000_7000      AFX Queue Status        R
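
Transcribed as offsets from the 0x1000_0000 base (the macro names are
made up for illustration, not existing QEMU symbols):

/* TurboSPARC IOMMU register offsets, per the table above. */
#define TSPARC_IOMMU_CTRL        0x0000  /* IOMMU Control, R/W */
#define TSPARC_IOMMU_BASE        0x0004  /* IOMMU Base Address, R/W */
#define TSPARC_IOTLB_FLUSH_ALL   0x0014  /* Flush All IOTLB Entries, W */
#define TSPARC_IOTLB_FLUSH_ADDR  0x0018  /* Address Flush, W */
#define TSPARC_AFSR              0x1000  /* Async Fault Status, R/W */
#define TSPARC_AFAR              0x1004  /* Async Fault Address, R/W */
#define TSPARC_SBUS_SLOT_CFG(n)  (0x1010 + 4 * (n))  /* slots 0-4, R/W */
#define TSPARC_MFSR              0x1050  /* Memory Fault Status, R/W */
#define TSPARC_MFAR              0x1054  /* Memory Fault Address, R/W */
#define TSPARC_MODULE_ID         0x2000  /* Module Identification, R/W */
#define TSPARC_MASK_ID           0x3018  /* Mask Identification, R */
#define TSPARC_AFX_QLEVEL_W      0x4000  /* AFX Queue Level, W */
#define TSPARC_AFX_QLEVEL_R      0x6000  /* AFX Queue Level, R */
#define TSPARC_AFX_QSTATUS       0x7000  /* AFX Queue Status, R */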

>  > One approach would be that IOMMU_NREGS would be increased to cover
>  > these registers (with the bump in savevm version field) and
>  > iommu_init1() should check the version field to see how much MMIO to
>  > provide.
>
>
> The problem I see here is that we already have too much registers: we
>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>  0x20 registers which are aliased all the way.
>
>
>  > But in order to avoid the savevm version change, iommu_init1() could
>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>  > if the read back data matches what has been written earlier. Because
>  > from OBP point of view this is identical to what your patch results
>  > in, I'd suppose this approach would also work.
>
>
> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>  during the installation, so it is currently not possible to install 4.1.4.
>  Surprisingly "GENERIC" kernel which is on the disk after the
>  installation doesn't
>  try to access these address ranges either, so a disk image taken from a live
>  system works.
>
>  Actually access to the non-connected/aliased addresses may also be a
>  consequence of phys_page_find bug I mentioned before. When I run
>  install with -m 64 and -m 256 it tries to access different
>  non-connected addresses. May also be a SunOS bug of course. 256m used
>  to be a lot back then.

Perhaps with 256MB, memory probing advances blindly from memory into the
IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
results :-). If this is true, 64M, 128M and 192M should show identical
results, and the accesses should happen only with sizes close or equal
to 256M.
Artyom Tarasenko May 9, 2010, 10:32 p.m. UTC | #4
2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
> On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>
>> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>  >>  Software shouldn't use aliased addresses, neither should it crash
>>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>  >>  instead of aliasing can help with debugging such accesses.
>>  >
>>  > TurboSPARC Microprocessor User's Manual shows that there are
>>  > additional pages after the main IOMMU for AFX registers. So this is
>>  > not board specific, but depends on CPU/IOMMU versions.
>>
>>
>> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>  SS-20 doesn't have any aliasing.
>
> But are your machines equipped with TurboSPARC or some other CPU?

Good point; I must confess I missed the word "Turbo" in your first
answer. The LX and SS-20 don't have one, but the SS-5 must have a
TurboSPARC CPU:

ok cd /FMI,MB86904
ok .attributes
context-table            00 00 00 00 03 ff f0 00 00 00 10 00
psr-implementation       00000000
psr-version              00000004
implementation           00000000
version                  00000004
cache-line-size          00000020
cache-nlines             00000200
page-size                00001000
dcache-line-size         00000010
dcache-nlines            00000200
dcache-associativity     00000001
icache-line-size         00000020
icache-nlines            00000200
icache-associativity     00000001
ncaches                  00000002
mmu-nctx                 00000100
sparc-version            00000008
mask_rev                 00000026
device_type              cpu
name                     FMI,MB86904

and still it behaves the same as the TI,TMS390S10 from the LX. This is done on the SS-5:

ok 10000000 20 spacel@ .
4000009
ok 14000000 20 spacel@ .
4000009
ok 14000004 20 spacel@ .
23000
ok 1f000004 20 spacel@ .
23000
ok 10000008 20 spacel@ .
4000009
ok 14000028 20 spacel@ .
4000009
ok 1000000c 20 spacel@ .
23000
ok 10000010 20 spacel@ .
4000009


The LX is the same, except for the IOMMU version:

ok 10000000 20 spacel@ .
4000005
ok 14000000 20 spacel@ .
4000005
ok 18000000 20 spacel@ .
4000005
ok 1f000000 20 spacel@ .
4000005
ok 1ff00000 20 spacel@ .
4000005
ok 1fff0004 20 spacel@ .
1fe000
ok 10000004 20 spacel@ .
1fe000
ok 10000108 20 spacel@ .
41000005
ok 10000040 20 spacel@ .
41000005
ok 1fff0040 20 spacel@ .
41000005
ok 1fff0044 20 spacel@ .
1fe000
ok 1fff0024 20 spacel@ .
1fe000
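
If the hardware really decodes only the low offset bits, the
emulation-side equivalent would be something like this (a sketch;
ALIAS_MASK is a guess based on the listings above, and the real
decode may well differ per board):

static uint32_t iommu_aliased_readl(IOMMUState *s, target_phys_addr_t addr)
{
    /* Sketch: decode only the low offset bits, so every small window
       in the region aliases the same register file.  ALIAS_MASK = 0x1f
       would match "0x20 registers aliased all the way". */
    return s->regs[(addr & ALIAS_MASK) >> 2];
}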

>>  At what address the additional AFX registers are located?
>
> Here's complete TurboSPARC IOMMU address map:
>  PA[30:0]          Register          Access
> 1000_0000       IOMMU Control         R/W
> 1000_0004    IOMMU Base Address       R/W
> 1000_0014   Flush All IOTLB Entries    W
> 1000_0018        Address Flush         W
> 1000_1000  Asynchronous Fault Status  R/W
> 1000_1004 Asynchronous Fault Address  R/W
> 1000_1010  SBus Slot Configuration 0   R/W
> 1000_1014  SBus Slot Configuration 1   R/W
> 1000_1018  SBus Slot Configuration 2   R/W
> 1000_101C  SBus Slot Configuration 3   R/W
> 1000_1020  SBus Slot Configuration 4   R/W
> 1000_1050     Memory Fault Status     R/W
> 1000_1054    Memory Fault Address     R/W
> 1000_2000     Module Identification    R/W
> 1000_3018      Mask Identification      R
> 1000_4000      AFX Queue Level         W
> 1000_6000      AFX Queue Level         R
> 1000_7000      AFX Queue Status        R


But if I read it correctly, 0x12fff294 (which makes SunOS crash with
-m 32) is well above this limit.

>>  > One approach would be that IOMMU_NREGS would be increased to cover
>>  > these registers (with the bump in savevm version field) and
>>  > iommu_init1() should check the version field to see how much MMIO to
>>  > provide.
>>
>>
>> The problem I see here is that we already have too much registers: we
>>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>>  0x20 registers which are aliased all the way.
>>
>>
>>  > But in order to avoid the savevm version change, iommu_init1() could
>>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>  > if the read back data matches what has been written earlier. Because
>>  > from OBP point of view this is identical to what your patch results
>>  > in, I'd suppose this approach would also work.
>>
>>
>> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>>  during the installation, so it is currently not possible to install 4.1.4.
>>  Surprisingly "GENERIC" kernel which is on the disk after the
>>  installation doesn't
>>  try to access these address ranges either, so a disk image taken from a live
>>  system works.
>>
>>  Actually access to the non-connected/aliased addresses may also be a
>>  consequence of phys_page_find bug I mentioned before. When I run
>>  install with -m 64 and -m 256 it tries to access different
>>  non-connected addresses. May also be a SunOS bug of course. 256m used
>>  to be a lot back then.
>
> Perhaps with 256MB, memory probing advances blindly from memory to
> IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
> results :-). If this is true, 64M, 128M and 192M should show identical
> results and only with close or equal to 256M the accesses happen.

32m: 0x12fff294
64m: 0x14fff294
192m:0x1cfff294
256m:0x20fff294

Memory probing? It would be strange for the OS to do that itself; it
could just ask OBP how much memory it has. Here is the listing where it
happens:

_swift_vac_rgnflush:            rd      %psr, %g2
_swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5  ! clear PSR.ET (traps enable)
_swift_vac_rgnflush+8:          mov     %g5, %psr       ! write PSR, traps now off
_swift_vac_rgnflush+0xc:        nop
_swift_vac_rgnflush+0x10:       nop
_swift_vac_rgnflush+0x14:       mov     0x100, %g5      ! SRMMU context table pointer reg
_swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5  ! read it via ASI 0x04 (MMU regs)
_swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1   ! region index -> table offset
_swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5   ! CTP -> physical table address
_swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
_swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5 ! physical bypass load, ASI 0x20

_swift_vac_rgnflush+0x28: is the fatal one.

kadb> $c
_swift_vac_rgnflush(?)
_vac_rgnflush() + 4
_hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
_startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
_main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14

Unfortunately (but not surprisingly) kadb doesn't allow debugging
cache-flush code, so I can't check what is in
[%g5] (aka sfar) on the real machine when this happens.

But the bug in phys_page_find would explain these accesses: sfar gets
the wrong address, and then the secondary access happens on this wrong
address instead of the original one.

FWIW the routine is called only once on the real hardware. That sort of
speaks for your hypothesis about memory probing, although it may not
necessarily be probing for memory...
Blue Swirl May 10, 2010, 6:27 p.m. UTC | #5
On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>  > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>  >>
>  >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>  >>  >>  instead of aliasing can help with debugging such accesses.
>  >>  >
>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>  >>  > not board specific, but depends on CPU/IOMMU versions.
>  >>
>  >>
>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>  >>  SS-20 doesn't have any aliasing.
>  >
>  > But are your machines equipped with TurboSPARC or some other CPU?
>
>
> Good point, I must confess, I missed the word "Turbo" in your first
>  answer. LX and SS-20 don't.
>  But SS-5 must have a TurboSPARC CPU:
>
>  ok cd /FMI,MB86904
>  ok .attributes
>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>  psr-implementation       00000000
>  psr-version              00000004
>  implementation           00000000
>  version                  00000004
>  cache-line-size          00000020
>  cache-nlines             00000200
>  page-size                00001000
>  dcache-line-size         00000010
>  dcache-nlines            00000200
>  dcache-associativity     00000001
>  icache-line-size         00000020
>  icache-nlines            00000200
>  icache-associativity     00000001
>  ncaches                  00000002
>  mmu-nctx                 00000100
>  sparc-version            00000008
>  mask_rev                 00000026
>  device_type              cpu
>  name                     FMI,MB86904
>
>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>
>  ok 10000000 20 spacel@ .
>  4000009
>  ok 14000000 20 spacel@ .
>  4000009
>  ok 14000004 20 spacel@ .
>  23000
>  ok 1f000004 20 spacel@ .
>  23000
>  ok 10000008 20 spacel@ .
>  4000009
>  ok 14000028 20 spacel@ .
>  4000009
>  ok 1000000c 20 spacel@ .
>  23000
>  ok 10000010 20 spacel@ .
>  4000009
>
>
>  LX is the same except for the IOMMU-version:
>
>  ok 10000000 20 spacel@ .
>  4000005
>  ok 14000000 20 spacel@ .
>  4000005
>  ok 18000000 20 spacel@ .
>  4000005
>  ok 1f000000 20 spacel@ .
>  4000005
>  ok 1ff00000 20 spacel@ .
>  4000005
>  ok 1fff0004 20 spacel@ .
>  1fe000
>  ok 10000004 20 spacel@ .
>  1fe000
>  ok 10000108 20 spacel@ .
>  41000005
>  ok 10000040 20 spacel@ .
>  41000005
>  ok 1fff0040 20 spacel@ .
>  41000005
>  ok 1fff0044 20 spacel@ .
>  1fe000
>  ok 1fff0024 20 spacel@ .
>  1fe000
>
>
>  >>  At what address the additional AFX registers are located?
>  >
>  > Here's complete TurboSPARC IOMMU address map:
>  >  PA[30:0]          Register          Access
>  > 1000_0000       IOMMU Control         R/W
>  > 1000_0004    IOMMU Base Address       R/W
>  > 1000_0014   Flush All IOTLB Entries    W
>  > 1000_0018        Address Flush         W
>  > 1000_1000  Asynchronous Fault Status  R/W
>  > 1000_1004 Asynchronous Fault Address  R/W
>  > 1000_1010  SBus Slot Configuration 0   R/W
>  > 1000_1014  SBus Slot Configuration 1   R/W
>  > 1000_1018  SBus Slot Configuration 2   R/W
>  > 1000_101C  SBus Slot Configuration 3   R/W
>  > 1000_1020  SBus Slot Configuration 4   R/W
>  > 1000_1050     Memory Fault Status     R/W
>  > 1000_1054    Memory Fault Address     R/W
>  > 1000_2000     Module Identification    R/W
>  > 1000_3018      Mask Identification      R
>  > 1000_4000      AFX Queue Level         W
>  > 1000_6000      AFX Queue Level         R
>  > 1000_7000      AFX Queue Status        R
>
>
>
> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>  well above this limit.

Oh, so I also misread something. You are not talking about the adjacent
pages, but about 16MB increments.

Earlier I sent a patch for a generic address alias device; would it be
useful for this?

Maybe we have a general design problem: perhaps unassigned access
faults should only be triggered inside SBus slots and ignored
elsewhere. If this is true, the generic Sparc32 unassigned access
handler should just ignore the access, and special fault-generating
slots should be installed for the empty SBus address ranges.
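
A sketch of what such a fault-generating slot could look like (all
names here are illustrative; the actual fault delivery would reuse
the existing Sparc32 unassigned-access machinery):

typedef struct SBusFaultSlot {
    target_phys_addr_t base;    /* start of the empty SBus range */
} SBusFaultSlot;

static uint32_t sbus_fault_readl(void *opaque, target_phys_addr_t addr)
{
    SBusFaultSlot *slot = opaque;

    /* Latch the full faulting address so sfar reads back correctly,
       then raise a data access error.  sbus_fault_raise() is an
       illustrative stand-in for the real trap delivery. */
    sbus_fault_raise(slot->base + addr);
    return 0;
}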

>  >>  > One approach would be that IOMMU_NREGS would be increased to cover
>  >>  > these registers (with the bump in savevm version field) and
>  >>  > iommu_init1() should check the version field to see how much MMIO to
>  >>  > provide.
>  >>
>  >>
>  >> The problem I see here is that we already have too much registers: we
>  >>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>  >>  0x20 registers which are aliased all the way.
>  >>
>  >>
>  >>  > But in order to avoid the savevm version change, iommu_init1() could
>  >>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>  >>  > if the read back data matches what has been written earlier. Because
>  >>  > from OBP point of view this is identical to what your patch results
>  >>  > in, I'd suppose this approach would also work.
>  >>
>  >>
>  >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>  >>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>  >>  during the installation, so it is currently not possible to install 4.1.4.
>  >>  Surprisingly "GENERIC" kernel which is on the disk after the
>  >>  installation doesn't
>  >>  try to access these address ranges either, so a disk image taken from a live
>  >>  system works.
>  >>
>  >>  Actually access to the non-connected/aliased addresses may also be a
>  >>  consequence of phys_page_find bug I mentioned before. When I run
>  >>  install with -m 64 and -m 256 it tries to access different
>  >>  non-connected addresses. May also be a SunOS bug of course. 256m used
>  >>  to be a lot back then.
>  >
>  > Perhaps with 256MB, memory probing advances blindly from memory to
>  > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>  > results :-). If this is true, 64M, 128M and 192M should show identical
>  > results and only with close or equal to 256M the accesses happen.
>
>
> 32m: 0x12fff294
>  64m: 0x14fff294
>  192m:0x1cfff294
>  256m:0x20fff294
>
>  Memory probing? It would be strange that OS would do it itself. The OS
>  could just
>  ask OBP how much does it have. Here is the listing where it happens:
>
>  _swift_vac_rgnflush:            rd      %psr, %g2
>  _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>  _swift_vac_rgnflush+8:          mov     %g5, %psr
>  _swift_vac_rgnflush+0xc:        nop
>  _swift_vac_rgnflush+0x10:       nop
>  _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>  _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>  _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>  _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>  _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>  _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>
>  _swift_vac_rgnflush+0x28: is the fatal one.
>
>  kadb> $c
>  _swift_vac_rgnflush(?)
>  _vac_rgnflush() + 4
>  _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>  _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>  _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>
>  Unfortunately (but not surprisingly) kadb doesn't allow debugging
>  cache-flush code, so I can't check what is in
>  [%g5] (aka sfar) on the real machine when this happens.

Linux code for Swift/TurboSPARC VAC flush should be similar.

>  But the bug in phys_page_find would explain this accesses: sfar gets
>  the wrong address, and then the secondary access happens on this wrong
>  address instead of the original one.

I doubt phys_page_find can be buggy, it is so vital for all architectures.

>  fwiw the routine is called only once on the real hardware. It sort of
>  speaks for your hypothesis about the memory probing. Although it may
>  not necessarily probe for memory...
>
>
>  --
>  Regards,
>  Artyom Tarasenko
>
>  solaris/sparc under qemu blog: http://tyom.blogspot.com/
>
Artyom Tarasenko May 10, 2010, 8:51 p.m. UTC | #6
2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
> On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>  > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>  >>
>>  >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>  >>  >>  instead of aliasing can help with debugging such accesses.
>>  >>  >
>>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>>  >>  > not board specific, but depends on CPU/IOMMU versions.
>>  >>
>>  >>
>>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>  >>  SS-20 doesn't have any aliasing.
>>  >
>>  > But are your machines equipped with TurboSPARC or some other CPU?
>>
>>
>> Good point, I must confess, I missed the word "Turbo" in your first
>>  answer. LX and SS-20 don't.
>>  But SS-5 must have a TurboSPARC CPU:
>>
>>  ok cd /FMI,MB86904
>>  ok .attributes
>>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>>  psr-implementation       00000000
>>  psr-version              00000004
>>  implementation           00000000
>>  version                  00000004
>>  cache-line-size          00000020
>>  cache-nlines             00000200
>>  page-size                00001000
>>  dcache-line-size         00000010
>>  dcache-nlines            00000200
>>  dcache-associativity     00000001
>>  icache-line-size         00000020
>>  icache-nlines            00000200
>>  icache-associativity     00000001
>>  ncaches                  00000002
>>  mmu-nctx                 00000100
>>  sparc-version            00000008
>>  mask_rev                 00000026
>>  device_type              cpu
>>  name                     FMI,MB86904
>>
>>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>
>>  ok 10000000 20 spacel@ .
>>  4000009
>>  ok 14000000 20 spacel@ .
>>  4000009
>>  ok 14000004 20 spacel@ .
>>  23000
>>  ok 1f000004 20 spacel@ .
>>  23000
>>  ok 10000008 20 spacel@ .
>>  4000009
>>  ok 14000028 20 spacel@ .
>>  4000009
>>  ok 1000000c 20 spacel@ .
>>  23000
>>  ok 10000010 20 spacel@ .
>>  4000009
>>
>>
>>  LX is the same except for the IOMMU-version:
>>
>>  ok 10000000 20 spacel@ .
>>  4000005
>>  ok 14000000 20 spacel@ .
>>  4000005
>>  ok 18000000 20 spacel@ .
>>  4000005
>>  ok 1f000000 20 spacel@ .
>>  4000005
>>  ok 1ff00000 20 spacel@ .
>>  4000005
>>  ok 1fff0004 20 spacel@ .
>>  1fe000
>>  ok 10000004 20 spacel@ .
>>  1fe000
>>  ok 10000108 20 spacel@ .
>>  41000005
>>  ok 10000040 20 spacel@ .
>>  41000005
>>  ok 1fff0040 20 spacel@ .
>>  41000005
>>  ok 1fff0044 20 spacel@ .
>>  1fe000
>>  ok 1fff0024 20 spacel@ .
>>  1fe000
>>
>>
>>  >>  At what address the additional AFX registers are located?
>>  >
>>  > Here's complete TurboSPARC IOMMU address map:
>>  >  PA[30:0]          Register          Access
>>  > 1000_0000       IOMMU Control         R/W
>>  > 1000_0004    IOMMU Base Address       R/W
>>  > 1000_0014   Flush All IOTLB Entries    W
>>  > 1000_0018        Address Flush         W
>>  > 1000_1000  Asynchronous Fault Status  R/W
>>  > 1000_1004 Asynchronous Fault Address  R/W
>>  > 1000_1010  SBus Slot Configuration 0   R/W
>>  > 1000_1014  SBus Slot Configuration 1   R/W
>>  > 1000_1018  SBus Slot Configuration 2   R/W
>>  > 1000_101C  SBus Slot Configuration 3   R/W
>>  > 1000_1020  SBus Slot Configuration 4   R/W
>>  > 1000_1050     Memory Fault Status     R/W
>>  > 1000_1054    Memory Fault Address     R/W
>>  > 1000_2000     Module Identification    R/W
>>  > 1000_3018      Mask Identification      R
>>  > 1000_4000      AFX Queue Level         W
>>  > 1000_6000      AFX Queue Level         R
>>  > 1000_7000      AFX Queue Status        R
>>
>>
>>
>> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>>  well above this limit.
>
> Oh, so I also misread something. You are not talking about the
> adjacent pages, but 16MB increments.
>
> Earlier I sent a patch for a generic address alias device, would it be
> useful for this?

It should do as well, but I thought empty_slot has less overhead and is
easier to debug.

> Maybe we have a general design problem, perhaps unassigned access
> faults should only be triggered inside SBus slots and ignored
> elsewhere. If this is true, generic Sparc32 unassigned access handler
> should just ignore the access and special fault generating slots
> should be installed for empty SBus address ranges.

My impression was that SS-5 and SS-20 do unassigned accesses a bit differently.
The current IOMMU implementation fits SS-20, which has no aliasing.

>>  >>  > One approach would be that IOMMU_NREGS would be increased to cover
>>  >>  > these registers (with the bump in savevm version field) and
>>  >>  > iommu_init1() should check the version field to see how much MMIO to
>>  >>  > provide.
>>  >>
>>  >>
>>  >> The problem I see here is that we already have too much registers: we
>>  >>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>>  >>  0x20 registers which are aliased all the way.
>>  >>
>>  >>
>>  >>  > But in order to avoid the savevm version change, iommu_init1() could
>>  >>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>  >>  > if the read back data matches what has been written earlier. Because
>>  >>  > from OBP point of view this is identical to what your patch results
>>  >>  > in, I'd suppose this approach would also work.
>>  >>
>>  >>
>>  >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>>  >>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>>  >>  during the installation, so it is currently not possible to install 4.1.4.
>>  >>  Surprisingly "GENERIC" kernel which is on the disk after the
>>  >>  installation doesn't
>>  >>  try to access these address ranges either, so a disk image taken from a live
>>  >>  system works.
>>  >>
>>  >>  Actually access to the non-connected/aliased addresses may also be a
>>  >>  consequence of phys_page_find bug I mentioned before. When I run
>>  >>  install with -m 64 and -m 256 it tries to access different
>>  >>  non-connected addresses. May also be a SunOS bug of course. 256m used
>>  >>  to be a lot back then.
>>  >
>>  > Perhaps with 256MB, memory probing advances blindly from memory to
>>  > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>>  > results :-). If this is true, 64M, 128M and 192M should show identical
>>  > results and only with close or equal to 256M the accesses happen.
>>
>>
>> 32m: 0x12fff294
>>  64m: 0x14fff294
>>  192m:0x1cfff294
>>  256m:0x20fff294
>>
>>  Memory probing? It would be strange that OS would do it itself. The OS
>>  could just
>>  ask OBP how much does it have. Here is the listing where it happens:
>>
>>  _swift_vac_rgnflush:            rd      %psr, %g2
>>  _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>>  _swift_vac_rgnflush+8:          mov     %g5, %psr
>>  _swift_vac_rgnflush+0xc:        nop
>>  _swift_vac_rgnflush+0x10:       nop
>>  _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>>  _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>>  _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>>  _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>>  _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>>  _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>>
>>  _swift_vac_rgnflush+0x28: is the fatal one.
>>
>>  kadb> $c
>>  _swift_vac_rgnflush(?)
>>  _vac_rgnflush() + 4
>>  _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>  _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>  _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>
>>  Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>  cache-flush code, so I can't check what is in
>>  [%g5] (aka sfar) on the real machine when this happens.
>
> Linux code for Swift/TurboSPARC VAC flush should be similar.
>
>>  But the bug in phys_page_find would explain this accesses: sfar gets
>>  the wrong address, and then the secondary access happens on this wrong
>>  address instead of the original one.
>
> I doubt phys_page_find can be buggy, it is so vital for all architecture.

But you've seen the example of buggy behaviour I posted last Friday,
right? If it's not phys_page_find, it's either cpu_physical_memory_rw
(which is also pretty generic) or the way SS-20 registers devices. Can
it be that all the pages must be registered in the proper order?

I think it's a pretty rare use case where you have a memory fault (not
a translation fault) on an unknown address. You may have such a fault
during device probing, but in that case you know what address you are
probing, so you don't care about the sync fault address register.

Besides, do all architectures have a sync fault address register?

>>  fwiw the routine is called only once on the real hardware. It sort of
>>  speaks for your hypothesis about the memory probing. Although it may
>>  not necessarily probe for memory...
>>
>>
Blue Swirl May 10, 2010, 9:05 p.m. UTC | #7
On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>
> > On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>  >>  > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>  >>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>  >>  >>
>  >>  >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>  >>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>  >>  >>  >>  instead of aliasing can help with debugging such accesses.
>  >>  >>  >
>  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>  >>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
>  >>  >>
>  >>  >>
>  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>  >>  >>  SS-20 doesn't have any aliasing.
>  >>  >
>  >>  > But are your machines equipped with TurboSPARC or some other CPU?
>  >>
>  >>
>  >> Good point, I must confess, I missed the word "Turbo" in your first
>  >>  answer. LX and SS-20 don't.
>  >>  But SS-5 must have a TurboSPARC CPU:
>  >>
>  >>  ok cd /FMI,MB86904
>  >>  ok .attributes
>  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>  >>  psr-implementation       00000000
>  >>  psr-version              00000004
>  >>  implementation           00000000
>  >>  version                  00000004
>  >>  cache-line-size          00000020
>  >>  cache-nlines             00000200
>  >>  page-size                00001000
>  >>  dcache-line-size         00000010
>  >>  dcache-nlines            00000200
>  >>  dcache-associativity     00000001
>  >>  icache-line-size         00000020
>  >>  icache-nlines            00000200
>  >>  icache-associativity     00000001
>  >>  ncaches                  00000002
>  >>  mmu-nctx                 00000100
>  >>  sparc-version            00000008
>  >>  mask_rev                 00000026
>  >>  device_type              cpu
>  >>  name                     FMI,MB86904
>  >>
>  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>  >>
>  >>  ok 10000000 20 spacel@ .
>  >>  4000009
>  >>  ok 14000000 20 spacel@ .
>  >>  4000009
>  >>  ok 14000004 20 spacel@ .
>  >>  23000
>  >>  ok 1f000004 20 spacel@ .
>  >>  23000
>  >>  ok 10000008 20 spacel@ .
>  >>  4000009
>  >>  ok 14000028 20 spacel@ .
>  >>  4000009
>  >>  ok 1000000c 20 spacel@ .
>  >>  23000
>  >>  ok 10000010 20 spacel@ .
>  >>  4000009
>  >>
>  >>
>  >>  LX is the same except for the IOMMU-version:
>  >>
>  >>  ok 10000000 20 spacel@ .
>  >>  4000005
>  >>  ok 14000000 20 spacel@ .
>  >>  4000005
>  >>  ok 18000000 20 spacel@ .
>  >>  4000005
>  >>  ok 1f000000 20 spacel@ .
>  >>  4000005
>  >>  ok 1ff00000 20 spacel@ .
>  >>  4000005
>  >>  ok 1fff0004 20 spacel@ .
>  >>  1fe000
>  >>  ok 10000004 20 spacel@ .
>  >>  1fe000
>  >>  ok 10000108 20 spacel@ .
>  >>  41000005
>  >>  ok 10000040 20 spacel@ .
>  >>  41000005
>  >>  ok 1fff0040 20 spacel@ .
>  >>  41000005
>  >>  ok 1fff0044 20 spacel@ .
>  >>  1fe000
>  >>  ok 1fff0024 20 spacel@ .
>  >>  1fe000
>  >>
>  >>
>  >>  >>  At what address the additional AFX registers are located?
>  >>  >
>  >>  > Here's complete TurboSPARC IOMMU address map:
>  >>  >  PA[30:0]          Register          Access
>  >>  > 1000_0000       IOMMU Control         R/W
>  >>  > 1000_0004    IOMMU Base Address       R/W
>  >>  > 1000_0014   Flush All IOTLB Entries    W
>  >>  > 1000_0018        Address Flush         W
>  >>  > 1000_1000  Asynchronous Fault Status  R/W
>  >>  > 1000_1004 Asynchronous Fault Address  R/W
>  >>  > 1000_1010  SBus Slot Configuration 0   R/W
>  >>  > 1000_1014  SBus Slot Configuration 1   R/W
>  >>  > 1000_1018  SBus Slot Configuration 2   R/W
>  >>  > 1000_101C  SBus Slot Configuration 3   R/W
>  >>  > 1000_1020  SBus Slot Configuration 4   R/W
>  >>  > 1000_1050     Memory Fault Status     R/W
>  >>  > 1000_1054    Memory Fault Address     R/W
>  >>  > 1000_2000     Module Identification    R/W
>  >>  > 1000_3018      Mask Identification      R
>  >>  > 1000_4000      AFX Queue Level         W
>  >>  > 1000_6000      AFX Queue Level         R
>  >>  > 1000_7000      AFX Queue Status        R
>  >>
>  >>
>  >>
>  >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>  >>  well above this limit.
>  >
>  > Oh, so I also misread something. You are not talking about the
>  > adjacent pages, but 16MB increments.
>  >
>  > Earlier I sent a patch for a generic address alias device, would it be
>  > useful for this?
>
>
> Should do as well. But I thought empty_slot is less overhead and
>  easier to debug.
>
>
>  > Maybe we have a general design problem, perhaps unassigned access
>  > faults should only be triggered inside SBus slots and ignored
>  > elsewhere. If this is true, generic Sparc32 unassigned access handler
>  > should just ignore the access and special fault generating slots
>  > should be installed for empty SBus address ranges.
>
>
> My impression was that SS-5 and SS-20 do unassigned accesses a bit differently.
>  The current IOMMU implementation fits SS-20, which has no aliasing.

It's probably the board design rather than just the IOMMU.

>  >>  >>  > One approach would be that IOMMU_NREGS would be increased to cover
>  >>  >>  > these registers (with the bump in savevm version field) and
>  >>  >>  > iommu_init1() should check the version field to see how much MMIO to
>  >>  >>  > provide.
>  >>  >>
>  >>  >>
>  >>  >> The problem I see here is that we already have too much registers: we
>  >>  >>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>  >>  >>  0x20 registers which are aliased all the way.
>  >>  >>
>  >>  >>
>  >>  >>  > But in order to avoid the savevm version change, iommu_init1() could
>  >>  >>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>  >>  >>  > if the read back data matches what has been written earlier. Because
>  >>  >>  > from OBP point of view this is identical to what your patch results
>  >>  >>  > in, I'd suppose this approach would also work.
>  >>  >>
>  >>  >>
>  >>  >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>  >>  >>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>  >>  >>  during the installation, so it is currently not possible to install 4.1.4.
>  >>  >>  Surprisingly "GENERIC" kernel which is on the disk after the
>  >>  >>  installation doesn't
>  >>  >>  try to access these address ranges either, so a disk image taken from a live
>  >>  >>  system works.
>  >>  >>
>  >>  >>  Actually access to the non-connected/aliased addresses may also be a
>  >>  >>  consequence of phys_page_find bug I mentioned before. When I run
>  >>  >>  install with -m 64 and -m 256 it tries to access different
>  >>  >>  non-connected addresses. May also be a SunOS bug of course. 256m used
>  >>  >>  to be a lot back then.
>  >>  >
>  >>  > Perhaps with 256MB, memory probing advances blindly from memory to
>  >>  > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>  >>  > results :-). If this is true, 64M, 128M and 192M should show identical
>  >>  > results and only with close or equal to 256M the accesses happen.
>  >>
>  >>
>  >> 32m: 0x12fff294
>  >>  64m: 0x14fff294
>  >>  192m:0x1cfff294
>  >>  256m:0x20fff294
>  >>
>  >>  Memory probing? It would be strange that OS would do it itself. The OS
>  >>  could just
>  >>  ask OBP how much does it have. Here is the listing where it happens:
>  >>
>  >>  _swift_vac_rgnflush:            rd      %psr, %g2
>  >>  _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>  >>  _swift_vac_rgnflush+8:          mov     %g5, %psr
>  >>  _swift_vac_rgnflush+0xc:        nop
>  >>  _swift_vac_rgnflush+0x10:       nop
>  >>  _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>  >>  _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>  >>  _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>  >>  _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>  >>  _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>  >>  _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>  >>
>  >>  _swift_vac_rgnflush+0x28: is the fatal one.
>  >>
>  >>  kadb> $c
>  >>  _swift_vac_rgnflush(?)
>  >>  _vac_rgnflush() + 4
>  >>  _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>  >>  _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>  >>  _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>  >>
>  >>  Unfortunately (but not surprisingly) kadb doesn't allow debugging
>  >>  cache-flush code, so I can't check what is in
>  >>  [%g5] (aka sfar) on the real machine when this happens.
>  >
>  > Linux code for Swift/TurboSPARC VAC flush should be similar.
>  >
>  >>  But the bug in phys_page_find would explain this accesses: sfar gets
>  >>  the wrong address, and then the secondary access happens on this wrong
>  >>  address instead of the original one.
>  >
>  > I doubt phys_page_find can be buggy, it is so vital for all architecture.
>
>
> But you've seen the example of buggy behaviour I posted last Friday, right?
>  If it's not phys_page_find, it's either cpu_physical_memory_rw (which
>  is also pretty generic), or
>  the way SS-20 registers devices. Can it be that all the pages must be
>  registered in the proper order?

How about the unassigned access handler, could that be the culprit?

>  I think it's a pretty rare use case where you have a memory fault (not
>  a translation fault) on an unknown address. You may have such fault
>  during device probing, but in such case you know what address you are
>  probing, so you don't care about the sync fault address register.
>
>  Besides, do all architectures have sync fault address register?

No, I think system level checks like that and IOMMU-like controls on
most architectures are very poor compared to Sparc32. Server and
mainframe systems may be a bit better.

>  >>  fwiw the routine is called only once on the real hardware. It sort of
>  >>  speaks for your hypothesis about the memory probing. Although it may
>  >>  not necessarily probe for memory...
>  >>
>  >>
>
>  --
>  Regards,
>  Artyom Tarasenko
>
>  solaris/sparc under qemu blog: http://tyom.blogspot.com/
>
Artyom Tarasenko May 21, 2010, 5:23 p.m. UTC | #8
2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
> On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>
>> > On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>  >>  > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>  >>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>  >>  >>
>>  >>  >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>  >>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>>  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>  >>  >>  >>  instead of aliasing can help with debugging such accesses.
>>  >>  >>  >
>>  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>>  >>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>>  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
>>  >>  >>
>>  >>  >>
>>  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>  >>  >>  SS-20 doesn't have any aliasing.
>>  >>  >
>>  >>  > But are your machines equipped with TurboSPARC or some other CPU?
>>  >>
>>  >>
>>  >> Good point, I must confess, I missed the word "Turbo" in your first
>>  >>  answer. LX and SS-20 don't.
>>  >>  But SS-5 must have a TurboSPARC CPU:
>>  >>
>>  >>  ok cd /FMI,MB86904
>>  >>  ok .attributes
>>  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>>  >>  psr-implementation       00000000
>>  >>  psr-version              00000004
>>  >>  implementation           00000000
>>  >>  version                  00000004
>>  >>  cache-line-size          00000020
>>  >>  cache-nlines             00000200
>>  >>  page-size                00001000
>>  >>  dcache-line-size         00000010
>>  >>  dcache-nlines            00000200
>>  >>  dcache-associativity     00000001
>>  >>  icache-line-size         00000020
>>  >>  icache-nlines            00000200
>>  >>  icache-associativity     00000001
>>  >>  ncaches                  00000002
>>  >>  mmu-nctx                 00000100
>>  >>  sparc-version            00000008
>>  >>  mask_rev                 00000026
>>  >>  device_type              cpu
>>  >>  name                     FMI,MB86904
>>  >>
>>  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>  >>
>>  >>  ok 10000000 20 spacel@ .
>>  >>  4000009
>>  >>  ok 14000000 20 spacel@ .
>>  >>  4000009
>>  >>  ok 14000004 20 spacel@ .
>>  >>  23000
>>  >>  ok 1f000004 20 spacel@ .
>>  >>  23000
>>  >>  ok 10000008 20 spacel@ .
>>  >>  4000009
>>  >>  ok 14000028 20 spacel@ .
>>  >>  4000009
>>  >>  ok 1000000c 20 spacel@ .
>>  >>  23000
>>  >>  ok 10000010 20 spacel@ .
>>  >>  4000009
>>  >>
>>  >>
>>  >>  LX is the same except for the IOMMU-version:
>>  >>
>>  >>  ok 10000000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 14000000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 18000000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 1f000000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 1ff00000 20 spacel@ .
>>  >>  4000005
>>  >>  ok 1fff0004 20 spacel@ .
>>  >>  1fe000
>>  >>  ok 10000004 20 spacel@ .
>>  >>  1fe000
>>  >>  ok 10000108 20 spacel@ .
>>  >>  41000005
>>  >>  ok 10000040 20 spacel@ .
>>  >>  41000005
>>  >>  ok 1fff0040 20 spacel@ .
>>  >>  41000005
>>  >>  ok 1fff0044 20 spacel@ .
>>  >>  1fe000
>>  >>  ok 1fff0024 20 spacel@ .
>>  >>  1fe000
>>  >>
>>  >>
>>  >>  >>  At what address the additional AFX registers are located?
>>  >>  >
>>  >>  > Here's complete TurboSPARC IOMMU address map:
>>  >>  >  PA[30:0]          Register          Access
>>  >>  > 1000_0000       IOMMU Control         R/W
>>  >>  > 1000_0004    IOMMU Base Address       R/W
>>  >>  > 1000_0014   Flush All IOTLB Entries    W
>>  >>  > 1000_0018        Address Flush         W
>>  >>  > 1000_1000  Asynchronous Fault Status  R/W
>>  >>  > 1000_1004 Asynchronous Fault Address  R/W
>>  >>  > 1000_1010  SBus Slot Configuration 0   R/W
>>  >>  > 1000_1014  SBus Slot Configuration 1   R/W
>>  >>  > 1000_1018  SBus Slot Configuration 2   R/W
>>  >>  > 1000_101C  SBus Slot Configuration 3   R/W
>>  >>  > 1000_1020  SBus Slot Configuration 4   R/W
>>  >>  > 1000_1050     Memory Fault Status     R/W
>>  >>  > 1000_1054    Memory Fault Address     R/W
>>  >>  > 1000_2000     Module Identification    R/W
>>  >>  > 1000_3018      Mask Identification      R
>>  >>  > 1000_4000      AFX Queue Level         W
>>  >>  > 1000_6000      AFX Queue Level         R
>>  >>  > 1000_7000      AFX Queue Status        R
>>  >>
>>  >>
>>  >>
>>  >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>>  >>  well above this limit.
>>  >
>>  > Oh, so I also misread something. You are not talking about the
>>  > adjacent pages, but 16MB increments.
>>  >
>>  > Earlier I sent a patch for a generic address alias device, would it be
>>  > useful for this?
>>
>>
>> Should do as well. But I thought empty_slot is less overhead and
>>  easier to debug.
>>

Also the aliasing patch would require one more parameter: the size of
the area which has to be aliased. Unless we implement stubs for all the
missing devices and do aliasing of the connected port ranges. And then
again, the SS-20 doesn't have aliasing in this area at all.

What do you think about this (empty_slot) solution (except that I
missed the SoB line)? Meanwhile it has been tested with SunOS 4.1.3U1 too.

>>> Maybe we have a general design problem, perhaps unassigned access
>>> faults should only be triggered inside SBus slots and ignored
>>> elsewhere. If this is true, generic Sparc32 unassigned access handler
>>> should just ignore the access and special fault generating slots
>>> should be installed for empty SBus address ranges.

Agreed that they should be special for SBus, because the SS-20 OBP is
not happy with the fault we are currently generating. But otherwise I
think qemu does it correctly. On the SS-5:

ok f7ff0000 2f spacel@ .
Data Access Error
ok sfar@ .
f7ff0000
ok 20000000 2f spacel@ .
Data Access Error
ok sfar@ .
20000000
ok 40000000 20 spacel@ .
Data Access Error
ok sfar@ .
40000000

Neither f7ff0000 nor 20000000 nor 40000000 is in the SBus range, right?

>> My impression was that SS-5 and SS-20 do unassigned accesses a bit differently.
>>  The current IOMMU implementation fits SS-20, which has no aliasing.
>
> It's probably rather the board design than just IOMMU.

Agreed. That's why I bound the patch to the machine hwdef and not to the iommu.

>>  >>  >>  > One approach would be that IOMMU_NREGS would be increased to cover
>>  >>  >>  > these registers (with the bump in savevm version field) and
>>  >>  >>  > iommu_init1() should check the version field to see how much MMIO to
>>  >>  >>  > provide.
>>  >>  >>
>>  >>  >>
>>  >>  >> The problem I see here is that we already have too much registers: we
>>  >>  >>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>>  >>  >>  0x20 registers which are aliased all the way.
>>  >>  >>
>>  >>  >>
>>  >>  >>  > But in order to avoid the savevm version change, iommu_init1() could
>>  >>  >>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>  >>  >>  > if the read back data matches what has been written earlier. Because
>>  >>  >>  > from OBP point of view this is identical to what your patch results
>>  >>  >>  > in, I'd suppose this approach would also work.
>>  >>  >>
>>  >>  >>
>>  >>  >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>>  >>  >>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>>  >>  >>  during the installation, so it is currently not possible to install 4.1.4.
>>  >>  >>  Surprisingly "GENERIC" kernel which is on the disk after the
>>  >>  >>  installation doesn't
>>  >>  >>  try to access these address ranges either, so a disk image taken from a live
>>  >>  >>  system works.
>>  >>  >>
>>  >>  >>  Actually access to the non-connected/aliased addresses may also be a
>>  >>  >>  consequence of phys_page_find bug I mentioned before. When I run
>>  >>  >>  install with -m 64 and -m 256 it tries to access different
>>  >>  >>  non-connected addresses. May also be a SunOS bug of course. 256m used
>>  >>  >>  to be a lot back then.
>>  >>  >
>>  >>  > Perhaps with 256MB, memory probing advances blindly from memory to
>>  >>  > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>>  >>  > results :-). If this is true, 64M, 128M and 192M should show identical
>>  >>  > results and only with close or equal to 256M the accesses happen.
>>  >>
>>  >>
>>  >> 32m: 0x12fff294
>>  >>  64m: 0x14fff294
>>  >>  192m:0x1cfff294
>>  >>  256m:0x20fff294
>>  >>
>>  >>  Memory probing? It would be strange that OS would do it itself. The OS
>>  >>  could just
>>  >>  ask OBP how much does it have. Here is the listing where it happens:
>>  >>
>>  >>  _swift_vac_rgnflush:            rd      %psr, %g2
>>  >>  _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>>  >>  _swift_vac_rgnflush+8:          mov     %g5, %psr
>>  >>  _swift_vac_rgnflush+0xc:        nop
>>  >>  _swift_vac_rgnflush+0x10:       nop
>>  >>  _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>>  >>  _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>>  >>  _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>>  >>  _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>>  >>  _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>>  >>  _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>>  >>
>>  >>  _swift_vac_rgnflush+0x28: is the fatal one.
>>  >>
>>  >>  kadb> $c
>>  >>  _swift_vac_rgnflush(?)
>>  >>  _vac_rgnflush() + 4
>>  >>  _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>  >>  _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>  >>  _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>  >>
>>  >>  Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>  >>  cache-flush code, so I can't check what is in
>>  >>  [%g5] (aka sfar) on the real machine when this happens.
>>  >
>>  > Linux code for Swift/TurboSPARC VAC flush should be similar.

Do you have an idea why anyone would try reading a value referenced in
sfar, especially during flushing? I can't imagine a case where it
wouldn't produce a fault.

>>  >>  But the bug in phys_page_find would explain this accesses: sfar gets
>>  >>  the wrong address, and then the secondary access happens on this wrong
>>  >>  address instead of the original one.
>>  >
>>  > I doubt phys_page_find can be buggy, it is so vital for all architecture.
>>
>>
>> But you've seen the example of buggy behaviour I posted last Friday, right?
>>  If it's not phys_page_find, it's either cpu_physical_memory_rw (which
>>  is also pretty generic), or
>>  the way SS-20 registers devices. Can it be that all the pages must be
>>  registered in the proper order?
>
> How about unassigned access handler, could it be suspected?

Doesn't look like it: it gets a physical address as a parameter. How
would it know the address is wrong?

>>  I think it's a pretty rare use case where you have a memory fault (not
>>  a translation fault) on an unknown address. You may have such fault
>>  during device probing, but in such case you know what address you are
>>  probing, so you don't care about the sync fault address register.
>>
>>  Besides, do all architectures have sync fault address register?
>
> No, I think system level checks like that and IOMMU-like controls on
> most architectures are very poor compared to Sparc32. Server and
> mainframe systems may be a bit better.

And do we have any mainframe emulated well enough to have a user base
and hence bug reports?

>>  >>  fwiw the routine is called only once on the real hardware. It sort of
>>  >>  speaks for your hypothesis about the memory probing. Although it may
>>  >>  not necessarily probe for memory...
>>  >>
>>  >>
Blue Swirl May 21, 2010, 9:12 p.m. UTC | #9
On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko
<atar4qemu@googlemail.com> wrote:
> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>> On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>>
>>> > On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>  >>  > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>  >>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>  >>  >>
>>>  >>  >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>>  >>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>>>  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>>  >>  >>  >>  instead of aliasing can help with debugging such accesses.
>>>  >>  >>  >
>>>  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>>>  >>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>>>  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
>>>  >>  >>
>>>  >>  >>
>>>  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>>  >>  >>  SS-20 doesn't have any aliasing.
>>>  >>  >
>>>  >>  > But are your machines equipped with TurboSPARC or some other CPU?
>>>  >>
>>>  >>
>>>  >> Good point, I must confess, I missed the word "Turbo" in your first
>>>  >>  answer. LX and SS-20 don't.
>>>  >>  But SS-5 must have a TurboSPARC CPU:
>>>  >>
>>>  >>  ok cd /FMI,MB86904
>>>  >>  ok .attributes
>>>  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>>>  >>  psr-implementation       00000000
>>>  >>  psr-version              00000004
>>>  >>  implementation           00000000
>>>  >>  version                  00000004
>>>  >>  cache-line-size          00000020
>>>  >>  cache-nlines             00000200
>>>  >>  page-size                00001000
>>>  >>  dcache-line-size         00000010
>>>  >>  dcache-nlines            00000200
>>>  >>  dcache-associativity     00000001
>>>  >>  icache-line-size         00000020
>>>  >>  icache-nlines            00000200
>>>  >>  icache-associativity     00000001
>>>  >>  ncaches                  00000002
>>>  >>  mmu-nctx                 00000100
>>>  >>  sparc-version            00000008
>>>  >>  mask_rev                 00000026
>>>  >>  device_type              cpu
>>>  >>  name                     FMI,MB86904
>>>  >>
>>>  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>>  >>
>>>  >>  ok 10000000 20 spacel@ .
>>>  >>  4000009
>>>  >>  ok 14000000 20 spacel@ .
>>>  >>  4000009
>>>  >>  ok 14000004 20 spacel@ .
>>>  >>  23000
>>>  >>  ok 1f000004 20 spacel@ .
>>>  >>  23000
>>>  >>  ok 10000008 20 spacel@ .
>>>  >>  4000009
>>>  >>  ok 14000028 20 spacel@ .
>>>  >>  4000009
>>>  >>  ok 1000000c 20 spacel@ .
>>>  >>  23000
>>>  >>  ok 10000010 20 spacel@ .
>>>  >>  4000009
>>>  >>
>>>  >>
>>>  >>  LX is the same except for the IOMMU-version:
>>>  >>
>>>  >>  ok 10000000 20 spacel@ .
>>>  >>  4000005
>>>  >>  ok 14000000 20 spacel@ .
>>>  >>  4000005
>>>  >>  ok 18000000 20 spacel@ .
>>>  >>  4000005
>>>  >>  ok 1f000000 20 spacel@ .
>>>  >>  4000005
>>>  >>  ok 1ff00000 20 spacel@ .
>>>  >>  4000005
>>>  >>  ok 1fff0004 20 spacel@ .
>>>  >>  1fe000
>>>  >>  ok 10000004 20 spacel@ .
>>>  >>  1fe000
>>>  >>  ok 10000108 20 spacel@ .
>>>  >>  41000005
>>>  >>  ok 10000040 20 spacel@ .
>>>  >>  41000005
>>>  >>  ok 1fff0040 20 spacel@ .
>>>  >>  41000005
>>>  >>  ok 1fff0044 20 spacel@ .
>>>  >>  1fe000
>>>  >>  ok 1fff0024 20 spacel@ .
>>>  >>  1fe000
>>>  >>
>>>  >>
>>>  >>  >>  At what address the additional AFX registers are located?
>>>  >>  >
>>>  >>  > Here's complete TurboSPARC IOMMU address map:
>>>  >>  >  PA[30:0]          Register          Access
>>>  >>  > 1000_0000       IOMMU Control         R/W
>>>  >>  > 1000_0004    IOMMU Base Address       R/W
>>>  >>  > 1000_0014   Flush All IOTLB Entries    W
>>>  >>  > 1000_0018        Address Flush         W
>>>  >>  > 1000_1000  Asynchronous Fault Status  R/W
>>>  >>  > 1000_1004 Asynchronous Fault Address  R/W
>>>  >>  > 1000_1010  SBus Slot Configuration 0   R/W
>>>  >>  > 1000_1014  SBus Slot Configuration 1   R/W
>>>  >>  > 1000_1018  SBus Slot Configuration 2   R/W
>>>  >>  > 1000_101C  SBus Slot Configuration 3   R/W
>>>  >>  > 1000_1020  SBus Slot Configuration 4   R/W
>>>  >>  > 1000_1050     Memory Fault Status     R/W
>>>  >>  > 1000_1054    Memory Fault Address     R/W
>>>  >>  > 1000_2000     Module Identification    R/W
>>>  >>  > 1000_3018      Mask Identification      R
>>>  >>  > 1000_4000      AFX Queue Level         W
>>>  >>  > 1000_6000      AFX Queue Level         R
>>>  >>  > 1000_7000      AFX Queue Status        R
>>>  >>
>>>  >>
>>>  >>
>>>  >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>>>  >>  well above this limit.
>>>  >
>>>  > Oh, so I also misread something. You are not talking about the
>>>  > adjacent pages, but 16MB increments.
>>>  >
>>>  > Earlier I sent a patch for a generic address alias device, would it be
>>>  > useful for this?
>>>
>>>
>>> Should do as well. But I thought empty_slot is less overhead and
>>>  easier to debug.
>>>
>
> Also the aliasing patch would require one more parameter: the size of
> area which has to be aliased. Unless we implement stubs for all
> missing devices and do aliasing of the connected port ranges. And
> then again, SS-20 doesn't have aliasing in this area at all.
>
> What do you think about this (empty_slot) solution (except that I
> missed the SoB line)? Meanwhile it's tested with SunOS 4.1.3U1 too.

I'm slightly against it; of course it would help for this case, but I
think we may be missing a bigger problem.

>>>> Maybe we have a general design problem, perhaps unassigned access
>>>> faults should only be triggered inside SBus slots and ignored
>>>> elsewhere. If this is true, generic Sparc32 unassigned access handler
>>>> should just ignore the access and special fault generating slots
>>>> should be installed for empty SBus address ranges.
>
> Agreed that they should be special for SBus, because SS-20 OBP is
> not happy with the fault we are currently generating. But otherwise I think qemu
> does it correctly. On SS-5:
>
> ok f7ff0000 2f spacel@ .
> Data Access Error
> ok sfar@ .
> f7ff0000
> ok 20000000 2f spacel@ .
> Data Access Error
> ok sfar@ .
> 20000000
> ok 40000000 20 spacel@ .
> Data Access Error
> ok sfar@ .
> 40000000
>
> Neither f7ff0000, nor 20000000, nor 40000000 is in the SBus range, right?

40000000 is on SS-5. So is the SBus Control Space in 0x10000000 to
0x1fffffff the only area besides DRAM where the accesses won't trap?

>>> My impression was that SS-5 and SS-20 do unassigned accesses a bit differently.
>>>  The current IOMMU implementation fits SS-20, which has no aliasing.
>>
>> It's probably rather the board design than just IOMMU.
>
> Agreed. That's why I bound the patch to machine hwdef  and not to iommu.
>
>>>  >>  >>  > One approach would be that IOMMU_NREGS would be increased to cover
>>>  >>  >>  > these registers (with the bump in savevm version field) and
>>>  >>  >>  > iommu_init1() should check the version field to see how much MMIO to
>>>  >>  >>  > provide.
>>>  >>  >>
>>>  >>  >>
>>>  >>  >> The problem I see here is that we already have too much registers: we
>>>  >>  >>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>>>  >>  >>  0x20 registers which are aliased all the way.
>>>  >>  >>
>>>  >>  >>
>>>  >>  >>  > But in order to avoid the savevm version change, iommu_init1() could
>>>  >>  >>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>>  >>  >>  > if the read back data matches what has been written earlier. Because
>>>  >>  >>  > from OBP point of view this is identical to what your patch results
>>>  >>  >>  > in, I'd suppose this approach would also work.
>>>  >>  >>
>>>  >>  >>
>>>  >>  >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>>>  >>  >>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>>>  >>  >>  during the installation, so it is currently not possible to install 4.1.4.
>>>  >>  >>  Surprisingly "GENERIC" kernel which is on the disk after the
>>>  >>  >>  installation doesn't
>>>  >>  >>  try to access these address ranges either, so a disk image taken from a live
>>>  >>  >>  system works.
>>>  >>  >>
>>>  >>  >>  Actually access to the non-connected/aliased addresses may also be a
>>>  >>  >>  consequence of phys_page_find bug I mentioned before. When I run
>>>  >>  >>  install with -m 64 and -m 256 it tries to access different
>>>  >>  >>  non-connected addresses. May also be a SunOS bug of course. 256m used
>>>  >>  >>  to be a lot back then.
>>>  >>  >
>>>  >>  > Perhaps with 256MB, memory probing advances blindly from memory to
>>>  >>  > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>>>  >>  > results :-). If this is true, 64M, 128M and 192M should show identical
>>>  >>  > results, and the accesses should happen only at or close to 256M.
>>>  >>
>>>  >>
>>>  >> 32m: 0x12fff294
>>>  >>  64m: 0x14fff294
>>>  >>  192m:0x1cfff294
>>>  >>  256m:0x20fff294
>>>  >>
>>>  >>  Memory probing? It would be strange for the OS to do it itself. The OS
>>>  >>  could just ask OBP how much it has. Here is the listing where it
>>>  >>  happens:
>>>  >>
>>>  >>  _swift_vac_rgnflush:            rd      %psr, %g2
>>>  >>  _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>>>  >>  _swift_vac_rgnflush+8:          mov     %g5, %psr
>>>  >>  _swift_vac_rgnflush+0xc:        nop
>>>  >>  _swift_vac_rgnflush+0x10:       nop
>>>  >>  _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>>>  >>  _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>>>  >>  _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>>>  >>  _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>>>  >>  _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>>>  >>  _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>>>  >>
>>>  >>  _swift_vac_rgnflush+0x28: is the fatal one.
>>>  >>
>>>  >>  kadb> $c
>>>  >>  _swift_vac_rgnflush(?)
>>>  >>  _vac_rgnflush() + 4
>>>  >>  _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>>  >>  _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>>  >>  _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>>  >>
>>>  >>  Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>>  >>  cache-flush code, so I can't check what is in
>>>  >>  [%g5] (aka sfar) on the real machine when this happens.
>>>  >
>>>  > Linux code for Swift/TurboSPARC VAC flush should be similar.
>
> Do you have an idea why anyone would try reading a value referenced in sfar?
> Especially during flushing? I can't imagine a case where it wouldn't
> produce a fault.

No idea, the fault should be inevitable. An explanation of how VAC
(Virtually Addressed Cache?) works could help.

>>>  >>  But the bug in phys_page_find would explain these accesses: sfar gets
>>>  >>  the wrong address, and then the secondary access happens on this wrong
>>>  >>  address instead of the original one.
>>>  >
>>>  > I doubt phys_page_find can be buggy, it is so vital for all architectures.
>>>
>>>
>>> But you've seen the example of buggy behaviour I posted last Friday, right?
>>>  If it's not phys_page_find, it's either cpu_physical_memory_rw (which
>>>  is also pretty generic), or
>>>  the way SS-20 registers devices. Can it be that all the pages must be
>>>  registered in the proper order?
>>
>> How about unassigned access handler, could it be suspected?
>
> Doesn't look like it: it gets a physical address as a parameter. How
> would it know the address is wrong?

It wouldn't, but IIRC Paul claimed earlier that the unassigned memory
handling in QEMU could have problems.

>>>  I think it's a pretty rare use case where you have a memory fault (not
>>>  a translation fault) on an unknown address. You may have such fault
>>>  during device probing, but in such case you know what address you are
>>>  probing, so you don't care about the sync fault address register.
>>>
>>>  Besides, do all architectures have sync fault address register?
>>
>> No, I think system level checks like that and IOMMU-like controls on
>> most architectures are very poor compared to Sparc32. Server and
>> mainframe systems may be a bit better.
>
> And do we have any mainframe emulated well enough to have a user base
> and hence bug reports?

The only IOMMU implemented so far is the Sparc32 one. I don't know about
the S390x architecture; that should definitely be mainframe class. AMD
IOMMU may be in QEMU one day.

About bugs, IIRC NetBSD 3.x crash could be related to IOMMU.

>>>  >>  fwiw the routine is called only once on the real hardware. It sort of
>>>  >>  speaks for your hypothesis about the memory probing. Although it may
>>>  >>  not necessarily probe for memory...
>>>  >>
>>>  >>
>
>
> --
> Regards,
> Artyom Tarasenko
>
> solaris/sparc under qemu blog: http://tyom.blogspot.com/
>
Artyom Tarasenko May 25, 2010, 5 p.m. UTC | #10
2010/5/21 Blue Swirl <blauwirbel@gmail.com>:
> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko
> <atar4qemu@googlemail.com> wrote:
>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>> On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>>>
>>>> > On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>>  >>  > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>  >>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>>  >>  >>
>>>>  >>  >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>>>  >>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>>>>  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>>>  >>  >>  >>  instead of aliasing can help with debugging such accesses.
>>>>  >>  >>  >
>>>>  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>>>>  >>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>>>>  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
>>>>  >>  >>
>>>>  >>  >>
>>>>  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>>>  >>  >>  SS-20 doesn't have any aliasing.
>>>>  >>  >
>>>>  >>  > But are your machines equipped with TurboSPARC or some other CPU?
>>>>  >>
>>>>  >>
>>>>  >> Good point, I must confess, I missed the word "Turbo" in your first
>>>>  >>  answer. LX and SS-20 don't.
>>>>  >>  But SS-5 must have a TurboSPARC CPU:
>>>>  >>
>>>>  >>  ok cd /FMI,MB86904
>>>>  >>  ok .attributes
>>>>  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>>>>  >>  psr-implementation       00000000
>>>>  >>  psr-version              00000004
>>>>  >>  implementation           00000000
>>>>  >>  version                  00000004
>>>>  >>  cache-line-size          00000020
>>>>  >>  cache-nlines             00000200
>>>>  >>  page-size                00001000
>>>>  >>  dcache-line-size         00000010
>>>>  >>  dcache-nlines            00000200
>>>>  >>  dcache-associativity     00000001
>>>>  >>  icache-line-size         00000020
>>>>  >>  icache-nlines            00000200
>>>>  >>  icache-associativity     00000001
>>>>  >>  ncaches                  00000002
>>>>  >>  mmu-nctx                 00000100
>>>>  >>  sparc-version            00000008
>>>>  >>  mask_rev                 00000026
>>>>  >>  device_type              cpu
>>>>  >>  name                     FMI,MB86904
>>>>  >>
>>>>  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>>>  >>
>>>>  >>  ok 10000000 20 spacel@ .
>>>>  >>  4000009
>>>>  >>  ok 14000000 20 spacel@ .
>>>>  >>  4000009
>>>>  >>  ok 14000004 20 spacel@ .
>>>>  >>  23000
>>>>  >>  ok 1f000004 20 spacel@ .
>>>>  >>  23000
>>>>  >>  ok 10000008 20 spacel@ .
>>>>  >>  4000009
>>>>  >>  ok 14000028 20 spacel@ .
>>>>  >>  4000009
>>>>  >>  ok 1000000c 20 spacel@ .
>>>>  >>  23000
>>>>  >>  ok 10000010 20 spacel@ .
>>>>  >>  4000009
>>>>  >>
>>>>  >>
>>>>  >>  LX is the same except for the IOMMU-version:
>>>>  >>
>>>>  >>  ok 10000000 20 spacel@ .
>>>>  >>  4000005
>>>>  >>  ok 14000000 20 spacel@ .
>>>>  >>  4000005
>>>>  >>  ok 18000000 20 spacel@ .
>>>>  >>  4000005
>>>>  >>  ok 1f000000 20 spacel@ .
>>>>  >>  4000005
>>>>  >>  ok 1ff00000 20 spacel@ .
>>>>  >>  4000005
>>>>  >>  ok 1fff0004 20 spacel@ .
>>>>  >>  1fe000
>>>>  >>  ok 10000004 20 spacel@ .
>>>>  >>  1fe000
>>>>  >>  ok 10000108 20 spacel@ .
>>>>  >>  41000005
>>>>  >>  ok 10000040 20 spacel@ .
>>>>  >>  41000005
>>>>  >>  ok 1fff0040 20 spacel@ .
>>>>  >>  41000005
>>>>  >>  ok 1fff0044 20 spacel@ .
>>>>  >>  1fe000
>>>>  >>  ok 1fff0024 20 spacel@ .
>>>>  >>  1fe000
>>>>  >>
>>>>  >>
>>>>  >>  >>  At what address the additional AFX registers are located?
>>>>  >>  >
>>>>  >>  > Here's complete TurboSPARC IOMMU address map:
>>>>  >>  >  PA[30:0]          Register          Access
>>>>  >>  > 1000_0000       IOMMU Control         R/W
>>>>  >>  > 1000_0004    IOMMU Base Address       R/W
>>>>  >>  > 1000_0014   Flush All IOTLB Entries    W
>>>>  >>  > 1000_0018        Address Flush         W
>>>>  >>  > 1000_1000  Asynchronous Fault Status  R/W
>>>>  >>  > 1000_1004 Asynchronous Fault Address  R/W
>>>>  >>  > 1000_1010  SBus Slot Configuration 0   R/W
>>>>  >>  > 1000_1014  SBus Slot Configuration 1   R/W
>>>>  >>  > 1000_1018  SBus Slot Configuration 2   R/W
>>>>  >>  > 1000_101C  SBus Slot Configuration 3   R/W
>>>>  >>  > 1000_1020  SBus Slot Configuration 4   R/W
>>>>  >>  > 1000_1050     Memory Fault Status     R/W
>>>>  >>  > 1000_1054    Memory Fault Address     R/W
>>>>  >>  > 1000_2000     Module Identification    R/W
>>>>  >>  > 1000_3018      Mask Identification      R
>>>>  >>  > 1000_4000      AFX Queue Level         W
>>>>  >>  > 1000_6000      AFX Queue Level         R
>>>>  >>  > 1000_7000      AFX Queue Status        R
>>>>  >>
>>>>  >>
>>>>  >>
>>>>  >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>>>>  >>  well above this limit.
>>>>  >
>>>>  > Oh, so I also misread something. You are not talking about the
>>>>  > adjacent pages, but 16MB increments.
>>>>  >
>>>>  > Earlier I sent a patch for a generic address alias device, would it be
>>>>  > useful for this?
>>>>
>>>>
>>>> Should do as well. But I thought empty_slot is less overhead and
>>>>  easier to debug.
>>>>
>>
>> Also the aliasing patch would require one more parameter: the size of
>> area which has to be aliased. Unless we implement stubs for all
>> missing devices and do aliasing of the connected port ranges. And
>> then again, SS-20 doesn't have aliasing in this area at all.
>>
>> What do you think about this (empty_slot) solution (except that I
>> missed the SoB line)? Meanwhile it's tested with SunOS 4.1.3U1 too.
>
> I'm slightly against it; of course it would help for this case, but I
> think we may be missing a bigger problem.
>
>>>>> Maybe we have a general design problem, perhaps unassigned access
>>>>> faults should only be triggered inside SBus slots and ignored
>>>>> elsewhere. If this is true, generic Sparc32 unassigned access handler
>>>>> should just ignore the access and special fault generating slots
>>>>> should be installed for empty SBus address ranges.
>>
>> Agreed that they should be special for SBus, because SS-20 OBP is
>> not happy with the fault we are currently generating. But otherwise I think qemu
>> does it correctly. On SS-5:
>>
>> ok f7ff0000 2f spacel@ .
>> Data Access Error
>> ok sfar@ .
>> f7ff0000
>> ok 20000000 2f spacel@ .
>> Data Access Error
>> ok sfar@ .
>> 20000000
>> ok 40000000 20 spacel@ .
>> Data Access Error
>> ok sfar@ .
>> 40000000
>>
>> Neither f7ff0000, nor 20000000, nor 40000000 is in the SBus range, right?
>
> 40000000 is on SS-5.

Ah. I was only aware of the control space. What ranges does SBus take?

> So is the SBus Control Space in 0x10000000 to
> 0x1fffffff the only area besides DRAM where the accesses won't trap?

At least some area after the ROM is aliased too. Also, on an SS-10 with
a non-active frame buffer, writing to the SX registers has no visible
effect, and reading from them produces no fault but an NMI.

>>>> My impression was that SS-5 and SS-20 do unassigned accesses a bit differently.
>>>>  The current IOMMU implementation fits SS-20, which has no aliasing.
>>>
>>> It's probably rather the board design than just IOMMU.
>>
>> Agreed. That's why I bound the patch to machine hwdef  and not to iommu.
>>
>>>>  >>  >>  > One approach would be that IOMMU_NREGS would be increased to cover
>>>>  >>  >>  > these registers (with the bump in savevm version field) and
>>>>  >>  >>  > iommu_init1() should check the version field to see how much MMIO to
>>>>  >>  >>  > provide.
>>>>  >>  >>
>>>>  >>  >>
>>>>  >>  >> The problem I see here is that we already have too much registers: we
>>>>  >>  >>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>>>>  >>  >>  0x20 registers which are aliased all the way.
>>>>  >>  >>
>>>>  >>  >>
>>>>  >>  >>  > But in order to avoid the savevm version change, iommu_init1() could
>>>>  >>  >>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>>>  >>  >>  > if the read back data matches what has been written earlier. Because
>>>>  >>  >>  > from OBP point of view this is identical to what your patch results
>>>>  >>  >>  > in, I'd suppose this approach would also work.
>>>>  >>  >>
>>>>  >>  >>
>>>>  >>  >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>>>>  >>  >>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>>>>  >>  >>  during the installation, so it is currently not possible to install 4.1.4.
>>>>  >>  >>  Surprisingly "GENERIC" kernel which is on the disk after the
>>>>  >>  >>  installation doesn't
>>>>  >>  >>  try to access these address ranges either, so a disk image taken from a live
>>>>  >>  >>  system works.
>>>>  >>  >>
>>>>  >>  >>  Actually access to the non-connected/aliased addresses may also be a
>>>>  >>  >>  consequence of phys_page_find bug I mentioned before. When I run
>>>>  >>  >>  install with -m 64 and -m 256 it tries to access different
>>>>  >>  >>  non-connected addresses. May also be a SunOS bug of course. 256m used
>>>>  >>  >>  to be a lot back then.
>>>>  >>  >
>>>>  >>  > Perhaps with 256MB, memory probing advances blindly from memory to
>>>>  >>  > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>>>>  >>  > results :-). If this is true, 64M, 128M and 192M should show identical
>>>>  >>  > results, and the accesses should happen only at or close to 256M.
>>>>  >>
>>>>  >>
>>>>  >> 32m: 0x12fff294
>>>>  >>  64m: 0x14fff294
>>>>  >>  192m:0x1cfff294
>>>>  >>  256m:0x20fff294
>>>>  >>
>>>>  >>  Memory probing? It would be strange for the OS to do it itself. The OS
>>>>  >>  could just ask OBP how much it has. Here is the listing where it
>>>>  >>  happens:
>>>>  >>
>>>>  >>  _swift_vac_rgnflush:            rd      %psr, %g2
>>>>  >>  _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>>>>  >>  _swift_vac_rgnflush+8:          mov     %g5, %psr
>>>>  >>  _swift_vac_rgnflush+0xc:        nop
>>>>  >>  _swift_vac_rgnflush+0x10:       nop
>>>>  >>  _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>>>>  >>  _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>>>>  >>  _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>>>>  >>  _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>>>>  >>  _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>>>>  >>  _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>>>>  >>
>>>>  >>  _swift_vac_rgnflush+0x28: is the fatal one.
>>>>  >>
>>>>  >>  kadb> $c
>>>>  >>  _swift_vac_rgnflush(?)
>>>>  >>  _vac_rgnflush() + 4
>>>>  >>  _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>>>  >>  _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>>>  >>  _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>>>  >>
>>>>  >>  Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>>>  >>  cache-flush code, so I can't check what is in
>>>>  >>  [%g5] (aka sfar) on the real machine when this happens.
>>>>  >
>>>>  > Linux code for Swift/TurboSPARC VAC flush should be similar.
>>
>> Do you have an idea why anyone would try reading a value referenced in sfar?
>> Especially during flushing? I can't imagine a case where it wouldn't
>> produce a fault.
>
> No idea, the fault should be inevitable. An explanation of how VAC
> (Virtually Addressed Cache?) works could help.

Is it available somewhere? An explanation of how PAC works would be
interesting too, because when emulating SS-20, the Solaris boot hangs
where it normally says that PAC is initialized.

>>>>  >>  But the bug in phys_page_find would explain these accesses: sfar gets
>>>>  >>  the wrong address, and then the secondary access happens on this wrong
>>>>  >>  address instead of the original one.
>>>>  >
>>>>  > I doubt phys_page_find can be buggy, it is so vital for all architectures.
>>>>
>>>>
>>>> But you've seen the example of buggy behaviour I posted last Friday, right?
>>>>  If it's not phys_page_find, it's either cpu_physical_memory_rw (which
>>>>  is also pretty generic), or
>>>>  the way SS-20 registers devices. Can it be that all the pages must be
>>>>  registered in the proper order?
>>>
>>> How about unassigned access handler, could it be suspected?
>>
>> Doesn't look like it: it gets a physical address as a parameter. How
>> would it know the address is wrong?
>
> It wouldn't, but IIRC Paul claimed earlier that the unassigned memory
> handling in QEMU could have problems.

But I thought Paul also fixed the problems? There was a patch from him.

>>>>  I think it's a pretty rare use case where you have a memory fault (not
>>>>  a translation fault) on an unknown address. You may have such fault
>>>>  during device probing, but in such case you know what address you are
>>>>  probing, so you don't care about the sync fault address register.
>>>>
>>>>  Besides, do all architectures have sync fault address register?
>>>
>>> No, I think system level checks like that and IOMMU-like controls on
>>> most architectures are very poor compared to Sparc32. Server and
>>> mainframe systems may be a bit better.
>>
>> And do we have any mainframe emulated well enough to have a user base
>> and hence bug reports?
>
> The only IOMMU implemented so far is the Sparc32 one. I don't know about
> the S390x architecture; that should definitely be mainframe class. AMD
> IOMMU may be in QEMU one day.
>
> About bugs, IIRC NetBSD 3.x crash could be related to IOMMU.

What indicates that? It happens where the disk sizes are normally
reported, so it could be a scsi/dma/irq/fpu issue as well.

>>>>  >>  fwiw the routine is called only once on the real hardware. It sort of
>>>>  >>  speaks for your hypothesis about the memory probing. Although it may
>>>>  >>  not necessarily probe for memory...
>>>>  >>
Blue Swirl May 25, 2010, 7:56 p.m. UTC | #11
On Tue, May 25, 2010 at 5:00 PM, Artyom Tarasenko
<atar4qemu@googlemail.com> wrote:
> 2010/5/21 Blue Swirl <blauwirbel@gmail.com>:
>> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko
>> <atar4qemu@googlemail.com> wrote:
>>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>>> On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>>>>
>>>>> > On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>>>  >>  > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>>  >>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>>>  >>  >>
>>>>>  >>  >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>>  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>>>>  >>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>>>>>  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>>>>  >>  >>  >>  instead of aliasing can help with debugging such accesses.
>>>>>  >>  >>  >
>>>>>  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>>>>>  >>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>>>>>  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
>>>>>  >>  >>
>>>>>  >>  >>
>>>>>  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>>>>  >>  >>  SS-20 doesn't have any aliasing.
>>>>>  >>  >
>>>>>  >>  > But are your machines equipped with TurboSPARC or some other CPU?
>>>>>  >>
>>>>>  >>
>>>>>  >> Good point, I must confess, I missed the word "Turbo" in your first
>>>>>  >>  answer. LX and SS-20 don't.
>>>>>  >>  But SS-5 must have a TurboSPARC CPU:
>>>>>  >>
>>>>>  >>  ok cd /FMI,MB86904
>>>>>  >>  ok .attributes
>>>>>  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>>>>>  >>  psr-implementation       00000000
>>>>>  >>  psr-version              00000004
>>>>>  >>  implementation           00000000
>>>>>  >>  version                  00000004
>>>>>  >>  cache-line-size          00000020
>>>>>  >>  cache-nlines             00000200
>>>>>  >>  page-size                00001000
>>>>>  >>  dcache-line-size         00000010
>>>>>  >>  dcache-nlines            00000200
>>>>>  >>  dcache-associativity     00000001
>>>>>  >>  icache-line-size         00000020
>>>>>  >>  icache-nlines            00000200
>>>>>  >>  icache-associativity     00000001
>>>>>  >>  ncaches                  00000002
>>>>>  >>  mmu-nctx                 00000100
>>>>>  >>  sparc-version            00000008
>>>>>  >>  mask_rev                 00000026
>>>>>  >>  device_type              cpu
>>>>>  >>  name                     FMI,MB86904
>>>>>  >>
>>>>>  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>>>>  >>
>>>>>  >>  ok 10000000 20 spacel@ .
>>>>>  >>  4000009
>>>>>  >>  ok 14000000 20 spacel@ .
>>>>>  >>  4000009
>>>>>  >>  ok 14000004 20 spacel@ .
>>>>>  >>  23000
>>>>>  >>  ok 1f000004 20 spacel@ .
>>>>>  >>  23000
>>>>>  >>  ok 10000008 20 spacel@ .
>>>>>  >>  4000009
>>>>>  >>  ok 14000028 20 spacel@ .
>>>>>  >>  4000009
>>>>>  >>  ok 1000000c 20 spacel@ .
>>>>>  >>  23000
>>>>>  >>  ok 10000010 20 spacel@ .
>>>>>  >>  4000009
>>>>>  >>
>>>>>  >>
>>>>>  >>  LX is the same except for the IOMMU-version:
>>>>>  >>
>>>>>  >>  ok 10000000 20 spacel@ .
>>>>>  >>  4000005
>>>>>  >>  ok 14000000 20 spacel@ .
>>>>>  >>  4000005
>>>>>  >>  ok 18000000 20 spacel@ .
>>>>>  >>  4000005
>>>>>  >>  ok 1f000000 20 spacel@ .
>>>>>  >>  4000005
>>>>>  >>  ok 1ff00000 20 spacel@ .
>>>>>  >>  4000005
>>>>>  >>  ok 1fff0004 20 spacel@ .
>>>>>  >>  1fe000
>>>>>  >>  ok 10000004 20 spacel@ .
>>>>>  >>  1fe000
>>>>>  >>  ok 10000108 20 spacel@ .
>>>>>  >>  41000005
>>>>>  >>  ok 10000040 20 spacel@ .
>>>>>  >>  41000005
>>>>>  >>  ok 1fff0040 20 spacel@ .
>>>>>  >>  41000005
>>>>>  >>  ok 1fff0044 20 spacel@ .
>>>>>  >>  1fe000
>>>>>  >>  ok 1fff0024 20 spacel@ .
>>>>>  >>  1fe000
>>>>>  >>
>>>>>  >>
>>>>>  >>  >>  At what address the additional AFX registers are located?
>>>>>  >>  >
>>>>>  >>  > Here's complete TurboSPARC IOMMU address map:
>>>>>  >>  >  PA[30:0]          Register          Access
>>>>>  >>  > 1000_0000       IOMMU Control         R/W
>>>>>  >>  > 1000_0004    IOMMU Base Address       R/W
>>>>>  >>  > 1000_0014   Flush All IOTLB Entries    W
>>>>>  >>  > 1000_0018        Address Flush         W
>>>>>  >>  > 1000_1000  Asynchronous Fault Status  R/W
>>>>>  >>  > 1000_1004 Asynchronous Fault Address  R/W
>>>>>  >>  > 1000_1010  SBus Slot Configuration 0   R/W
>>>>>  >>  > 1000_1014  SBus Slot Configuration 1   R/W
>>>>>  >>  > 1000_1018  SBus Slot Configuration 2   R/W
>>>>>  >>  > 1000_101C  SBus Slot Configuration 3   R/W
>>>>>  >>  > 1000_1020  SBus Slot Configuration 4   R/W
>>>>>  >>  > 1000_1050     Memory Fault Status     R/W
>>>>>  >>  > 1000_1054    Memory Fault Address     R/W
>>>>>  >>  > 1000_2000     Module Identification    R/W
>>>>>  >>  > 1000_3018      Mask Identification      R
>>>>>  >>  > 1000_4000      AFX Queue Level         W
>>>>>  >>  > 1000_6000      AFX Queue Level         R
>>>>>  >>  > 1000_7000      AFX Queue Status        R
>>>>>  >>
>>>>>  >>
>>>>>  >>
>>>>>  >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>>>>>  >>  well above this limit.
>>>>>  >
>>>>>  > Oh, so I also misread something. You are not talking about the
>>>>>  > adjacent pages, but 16MB increments.
>>>>>  >
>>>>>  > Earlier I sent a patch for a generic address alias device, would it be
>>>>>  > useful for this?
>>>>>
>>>>>
>>>>> Should do as well. But I thought empty_slot is less overhead and
>>>>>  easier to debug.
>>>>>
>>>
>>> Also the aliasing patch would require one more parameter: the size of
>>> area which has to be aliased. Unless we implement stubs for all
>>> missing devices and do aliasing of the connected port ranges. And
>>> then again, SS-20 doesn't have aliasing in this area at all.
>>>
>>> What do you think about this (empty_slot) solution (except that I
>>> missed the SoB line)? Meanwhile it's tested with SunOS 4.1.3U1 too.
>>
>> I'm slightly against it; of course it would help for this case, but I
>> think we may be missing a bigger problem.
>>
>>>>>> Maybe we have a general design problem, perhaps unassigned access
>>>>>> faults should only be triggered inside SBus slots and ignored
>>>>>> elsewhere. If this is true, generic Sparc32 unassigned access handler
>>>>>> should just ignore the access and special fault generating slots
>>>>>> should be installed for empty SBus address ranges.
>>>
>>> Agreed that they should be special for SBus, because SS-20 OBP is
>>> not happy with the fault we are currently generating. But otherwise I think qemu
>>> does it correctly. On SS-5:
>>>
>>> ok f7ff0000 2f spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> f7ff0000
>>> ok 20000000 2f spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> 20000000
>>> ok 40000000 20 spacel@ .
>>> Data Access Error
>>> ok sfar@ .
>>> 40000000
>>>
>>> Neither f7ff0000, nor 20000000, nor 40000000 is in the SBus range, right?
>>
>> 40000000 is on SS-5.
>
> Ah. I was only aware of the control space. What ranges does SBus take?

On SS-5, 30000000 to 7fffffff, each slot taking 10000000. There's the
AFX bus at 20000000.

The OBP property '/iommu/sbus/ranges' shows these (and other ranges).
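
In other words, roughly (a sketch; whether the window index below
matches the OBP slot numbering from show-sbus is an assumption):

#include <stdio.h>
#include <stdint.h>

/* SS-5 SBus windows as described above: 0x30000000..0x7fffffff,
   one 0x10000000-sized window per slot. */
static int ss5_sbus_window(uint32_t pa)
{
    if (pa < 0x30000000 || pa > 0x7fffffffu)
        return -1;                      /* not in the SBus slot area */
    return (int)((pa - 0x30000000) >> 28);
}

int main(void)
{
    printf("%d\n", ss5_sbus_window(0x40000000)); /* window 1 */
    printf("%d\n", ss5_sbus_window(0x20000000)); /* -1: AFX, not SBus */
    return 0;
}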

>
>> So is the SBus Control Space in 0x10000000 to
>> 0x1fffffff the only area besides DRAM where the accesses won't trap?
>
> At least some area after the ROM is aliased too. Also, on an SS-10
> with a non-active frame buffer, writing to the SX registers has no
> visible effect, and reading from them produces no fault but an NMI.

Then we should cover the whole area after the IOMMU with an empty slot
device. The ROM probably doesn't matter.
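
The empty_slot device itself would be trivial; something like this
sketch of the idea (not the exact hw/empty_slot.c, whose logging
macros and names may differ):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

typedef uint64_t target_phys_addr_t;   /* stand-in for the QEMU typedef */
#define DPRINTF(fmt, ...) printf("empty_slot: " fmt, ##__VA_ARGS__)

/* An empty slot reads as zero and swallows writes; the trace output is
   the whole point -- stray accesses become visible in the debug log. */
static uint32_t empty_slot_readl(void *opaque, target_phys_addr_t addr)
{
    (void)opaque;
    DPRINTF("read from 0x%" PRIx64 "\n", (uint64_t)addr);
    return 0;
}

static void empty_slot_writel(void *opaque, target_phys_addr_t addr,
                              uint32_t val)
{
    (void)opaque;
    DPRINTF("write 0x%x to 0x%" PRIx64 "\n", val, (uint64_t)addr);
}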

>>>>> My impression was that SS-5 and SS-20 do unassigned accesses a bit differently.
>>>>>  The current IOMMU implementation fits SS-20, which has no aliasing.
>>>>
>>>> It's probably rather the board design than just IOMMU.
>>>
>>> Agreed. That's why I bound the patch to machine hwdef  and not to iommu.
>>>
>>>>>  >>  >>  > One approach would be that IOMMU_NREGS would be increased to cover
>>>>>  >>  >>  > these registers (with the bump in savevm version field) and
>>>>>  >>  >>  > iommu_init1() should check the version field to see how much MMIO to
>>>>>  >>  >>  > provide.
>>>>>  >>  >>
>>>>>  >>  >>
>>>>>  >>  >> The problem I see here is that we already have too much registers: we
>>>>>  >>  >>  emulate SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>>>>>  >>  >>  0x20 registers which are aliased all the way.
>>>>>  >>  >>
>>>>>  >>  >>
>>>>>  >>  >>  > But in order to avoid the savevm version change, iommu_init1() could
>>>>>  >>  >>  > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>>>>  >>  >>  > if the read back data matches what has been written earlier. Because
>>>>>  >>  >>  > from OBP point of view this is identical to what your patch results
>>>>>  >>  >>  > in, I'd suppose this approach would also work.
>>>>>  >>  >>
>>>>>  >>  >>
>>>>>  >>  >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>>>>>  >>  >>  SunOS 4.1.4 kernel who does. The "MUNIX" kernel is the only kernel available
>>>>>  >>  >>  during the installation, so it is currently not possible to install 4.1.4.
>>>>>  >>  >>  Surprisingly "GENERIC" kernel which is on the disk after the
>>>>>  >>  >>  installation doesn't
>>>>>  >>  >>  try to access these address ranges either, so a disk image taken from a live
>>>>>  >>  >>  system works.
>>>>>  >>  >>
>>>>>  >>  >>  Actually access to the non-connected/aliased addresses may also be a
>>>>>  >>  >>  consequence of phys_page_find bug I mentioned before. When I run
>>>>>  >>  >>  install with -m 64 and -m 256 it tries to access different
>>>>>  >>  >>  non-connected addresses. May also be a SunOS bug of course. 256m used
>>>>>  >>  >>  to be a lot back then.
>>>>>  >>  >
>>>>>  >>  > Perhaps with 256MB, memory probing advances blindly from memory to
>>>>>  >>  > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>>>>>  >>  > results :-). If this is true, 64M, 128M and 192M should show identical
>>>>>  >>  > results, and the accesses should happen only at or close to 256M.
>>>>>  >>
>>>>>  >>
>>>>>  >> 32m: 0x12fff294
>>>>>  >>  64m: 0x14fff294
>>>>>  >>  192m:0x1cfff294
>>>>>  >>  256m:0x20fff294
>>>>>  >>
>>>>>  >>  Memory probing? It would be strange for the OS to do it itself. The OS
>>>>>  >>  could just ask OBP how much it has. Here is the listing where it
>>>>>  >>  happens:
>>>>>  >>
>>>>>  >>  _swift_vac_rgnflush:            rd      %psr, %g2
>>>>>  >>  _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>>>>>  >>  _swift_vac_rgnflush+8:          mov     %g5, %psr
>>>>>  >>  _swift_vac_rgnflush+0xc:        nop
>>>>>  >>  _swift_vac_rgnflush+0x10:       nop
>>>>>  >>  _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>>>>>  >>  _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>>>>>  >>  _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>>>>>  >>  _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>>>>>  >>  _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>>>>>  >>  _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>>>>>  >>
>>>>>  >>  _swift_vac_rgnflush+0x28: is the fatal one.
>>>>>  >>
>>>>>  >>  kadb> $c
>>>>>  >>  _swift_vac_rgnflush(?)
>>>>>  >>  _vac_rgnflush() + 4
>>>>>  >>  _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>>>>  >>  _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>>>>  >>  _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>>>>  >>
>>>>>  >>  Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>>>>  >>  cache-flush code, so I can't check what is in
>>>>>  >>  [%g5] (aka sfar) on the real machine when this happens.
>>>>>  >
>>>>>  > Linux code for Swift/TurboSPARC VAC flush should be similar.
>>>
>>> Do you have an idea why anyone would try reading a value referenced in sfar?
>>> Especially during flushing? I can't imagine a case where it wouldn't
>>> produce a fault.
>>
>> No idea, the fault should be inevitable. An explanation of how VAC
>> (Virtually Addressed Cache?) works could help.
>
> Is it available somewhere? An explanation of how PAC works would be
> interesting too, because when emulating SS-20, the Solaris boot hangs
> where it normally says that PAC is initialized.
>
>>>>>  >>  But the bug in phys_page_find would explain these accesses: sfar gets
>>>>>  >>  the wrong address, and then the secondary access happens on this wrong
>>>>>  >>  address instead of the original one.
>>>>>  >
>>>>>  > I doubt phys_page_find can be buggy, it is so vital for all architectures.
>>>>>
>>>>>
>>>>> But you've seen the example of buggy behaviour I posted last Friday, right?
>>>>>  If it's not phys_page_find, it's either cpu_physical_memory_rw (which
>>>>>  is also pretty generic), or
>>>>>  the way SS-20 registers devices. Can it be that all the pages must be
>>>>>  registered in the proper order?
>>>>
>>>> How about unassigned access handler, could it be suspected?
>>>
>>> Doesn't look like it: it gets a physical address as a parameter. How
>>> would it know the address is wrong?
>>
>> It wouldn't, but IIRC Paul claimed earlier that the unassigned memory
>> handling in QEMU could have problems.
>
> But I thought Paul also fixed the problems? There was a patch from him.
>
>>>>>  I think it's a pretty rare use case where you have a memory fault (not
>>>>>  a translation fault) on an unknown address. You may have such fault
>>>>>  during device probing, but in such case you know what address you are
>>>>>  probing, so you don't care about the sync fault address register.
>>>>>
>>>>>  Besides, do all architectures have sync fault address register?
>>>>
>>>> No, I think system level checks like that and IOMMU-like controls on
>>>> most architectures are very poor compared to Sparc32. Server and
>>>> mainframe systems may be a bit better.
>>>
>>> And do we have any mainframe emulated well enough to have a user base
>>> and hence bug reports?
>>
>> The only IOMMU implemented so far is the Sparc32 one. I don't know about
>> the S390x architecture; that should definitely be mainframe class. AMD
>> IOMMU may be in QEMU one day.
>>
>> About bugs, IIRC NetBSD 3.x crash could be related to IOMMU.
>
> What indicates that? It happens where the disk sizes are normally
> reported, so it could be a scsi/dma/irq/fpu issue as well.

IIRC the DVMA address was 0xfc004000, but the mapped entries were for
0xfc000000 to 0xfc003fff.
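
If so, the failure follows directly from how the DVMA lookup works;
schematically (a sketch: the 4K page size, the valid bit and the PPN
placement are assumptions, not the exact iommu.c layout):

#include <stdint.h>

#define IOMMU_PAGE_SHIFT 12      /* 4K DVMA pages (assumption) */
#define IOPTE_VALID      0x2     /* IOPTE valid bit (assumption) */

/* Schematic sun4m DVMA lookup: the IOPTE index comes straight from the
   DVMA page number, so with four pages mapped at 0xfc000000..0xfc003fff
   an access at 0xfc004000 selects index 4 -- one past the mapped
   entries -- and hits an invalid IOPTE. */
static int dvma_translate(const uint32_t *ioptes, uint32_t dvma_base,
                          uint32_t addr, uint32_t *pa)
{
    uint32_t index = (addr - dvma_base) >> IOMMU_PAGE_SHIFT;
    uint32_t iopte = ioptes[index];

    if (!(iopte & IOPTE_VALID))
        return -1;               /* unmapped: DMA error */
    *pa = ((iopte >> 8) << IOMMU_PAGE_SHIFT)        /* PPN (assumed) */
        | (addr & ((1u << IOMMU_PAGE_SHIFT) - 1));
    return 0;
}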

>
>>>>>  >>  fwiw the routine is called only once on the real hardware. It sort of
>>>>>  >>  speaks for your hypothesis about the memory probing. Although it may
>>>>>  >>  not necessarily probe for memory...
>>>>>  >>
>
>
> --
> Regards,
> Artyom Tarasenko
>
> solaris/sparc under qemu blog: http://tyom.blogspot.com/
>
Artyom Tarasenko May 26, 2010, 7:04 p.m. UTC | #12
2010/5/25 Blue Swirl <blauwirbel@gmail.com>:
> On Tue, May 25, 2010 at 5:00 PM, Artyom Tarasenko
> <atar4qemu@googlemail.com> wrote:
>> 2010/5/21 Blue Swirl <blauwirbel@gmail.com>:
>>> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko
>>> <atar4qemu@googlemail.com> wrote:
>>>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>>>> On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>>> 2010/5/10 Blue Swirl <blauwirbel@gmail.com>:
>>>>>>
>>>>>> > On 5/10/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>>>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>>>>  >>  > On 5/9/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>>>  >>  >> 2010/5/9 Blue Swirl <blauwirbel@gmail.com>:
>>>>>>  >>  >>
>>>>>>  >>  >> > On 5/8/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
>>>>>>  >>  >>  >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>>>>>  >>  >>  >>  Software shouldn't use aliased addresses, neither should it crash
>>>>>>  >>  >>  >>  when it uses (on the real hardware it wouldn't). Using empty_slot
>>>>>>  >>  >>  >>  instead of aliasing can help with debugging such accesses.
>>>>>>  >>  >>  >
>>>>>>  >>  >>  > TurboSPARC Microprocessor User's Manual shows that there are
>>>>>>  >>  >>  > additional pages after the main IOMMU for AFX registers. So this is
>>>>>>  >>  >>  > not board specific, but depends on CPU/IOMMU versions.
>>>>>>  >>  >>
>>>>>>  >>  >>
>>>>>>  >>  >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>>>>>  >>  >>  SS-20 doesn't have any aliasing.
>>>>>>  >>  >
>>>>>>  >>  > But are your machines equipped with TurboSPARC or some other CPU?
>>>>>>  >>
>>>>>>  >>
>>>>>>  >> Good point, I must confess, I missed the word "Turbo" in your first
>>>>>>  >>  answer. LX and SS-20 don't.
>>>>>>  >>  But SS-5 must have a TurboSPARC CPU:
>>>>>>  >>
>>>>>>  >>  ok cd /FMI,MB86904
>>>>>>  >>  ok .attributes
>>>>>>  >>  context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>>>>>>  >>  psr-implementation       00000000
>>>>>>  >>  psr-version              00000004
>>>>>>  >>  implementation           00000000
>>>>>>  >>  version                  00000004
>>>>>>  >>  cache-line-size          00000020
>>>>>>  >>  cache-nlines             00000200
>>>>>>  >>  page-size                00001000
>>>>>>  >>  dcache-line-size         00000010
>>>>>>  >>  dcache-nlines            00000200
>>>>>>  >>  dcache-associativity     00000001
>>>>>>  >>  icache-line-size         00000020
>>>>>>  >>  icache-nlines            00000200
>>>>>>  >>  icache-associativity     00000001
>>>>>>  >>  ncaches                  00000002
>>>>>>  >>  mmu-nctx                 00000100
>>>>>>  >>  sparc-version            00000008
>>>>>>  >>  mask_rev                 00000026
>>>>>>  >>  device_type              cpu
>>>>>>  >>  name                     FMI,MB86904
>>>>>>  >>
>>>>>>  >>  and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>>>>>  >>
>>>>>>  >>  ok 10000000 20 spacel@ .
>>>>>>  >>  4000009
>>>>>>  >>  ok 14000000 20 spacel@ .
>>>>>>  >>  4000009
>>>>>>  >>  ok 14000004 20 spacel@ .
>>>>>>  >>  23000
>>>>>>  >>  ok 1f000004 20 spacel@ .
>>>>>>  >>  23000
>>>>>>  >>  ok 10000008 20 spacel@ .
>>>>>>  >>  4000009
>>>>>>  >>  ok 14000028 20 spacel@ .
>>>>>>  >>  4000009
>>>>>>  >>  ok 1000000c 20 spacel@ .
>>>>>>  >>  23000
>>>>>>  >>  ok 10000010 20 spacel@ .
>>>>>>  >>  4000009
>>>>>>  >>
>>>>>>  >>
>>>>>>  >>  LX is the same except for the IOMMU-version:
>>>>>>  >>
>>>>>>  >>  ok 10000000 20 spacel@ .
>>>>>>  >>  4000005
>>>>>>  >>  ok 14000000 20 spacel@ .
>>>>>>  >>  4000005
>>>>>>  >>  ok 18000000 20 spacel@ .
>>>>>>  >>  4000005
>>>>>>  >>  ok 1f000000 20 spacel@ .
>>>>>>  >>  4000005
>>>>>>  >>  ok 1ff00000 20 spacel@ .
>>>>>>  >>  4000005
>>>>>>  >>  ok 1fff0004 20 spacel@ .
>>>>>>  >>  1fe000
>>>>>>  >>  ok 10000004 20 spacel@ .
>>>>>>  >>  1fe000
>>>>>>  >>  ok 10000108 20 spacel@ .
>>>>>>  >>  41000005
>>>>>>  >>  ok 10000040 20 spacel@ .
>>>>>>  >>  41000005
>>>>>>  >>  ok 1fff0040 20 spacel@ .
>>>>>>  >>  41000005
>>>>>>  >>  ok 1fff0044 20 spacel@ .
>>>>>>  >>  1fe000
>>>>>>  >>  ok 1fff0024 20 spacel@ .
>>>>>>  >>  1fe000
>>>>>>  >>
>>>>>>  >>
>>>>>>  >>  >>  At what address the additional AFX registers are located?
>>>>>>  >>  >
>>>>>>  >>  > Here's complete TurboSPARC IOMMU address map:
>>>>>>  >>  >  PA[30:0]          Register          Access
>>>>>>  >>  > 1000_0000       IOMMU Control         R/W
>>>>>>  >>  > 1000_0004    IOMMU Base Address       R/W
>>>>>>  >>  > 1000_0014   Flush All IOTLB Entries    W
>>>>>>  >>  > 1000_0018        Address Flush         W
>>>>>>  >>  > 1000_1000  Asynchronous Fault Status  R/W
>>>>>>  >>  > 1000_1004 Asynchronous Fault Address  R/W
>>>>>>  >>  > 1000_1010  SBus Slot Configuration 0   R/W
>>>>>>  >>  > 1000_1014  SBus Slot Configuration 1   R/W
>>>>>>  >>  > 1000_1018  SBus Slot Configuration 2   R/W
>>>>>>  >>  > 1000_101C  SBus Slot Configuration 3   R/W
>>>>>>  >>  > 1000_1020  SBus Slot Configuration 4   R/W
>>>>>>  >>  > 1000_1050     Memory Fault Status     R/W
>>>>>>  >>  > 1000_1054    Memory Fault Address     R/W
>>>>>>  >>  > 1000_2000     Module Identification    R/W
>>>>>>  >>  > 1000_3018      Mask Identification      R
>>>>>>  >>  > 1000_4000      AFX Queue Level         W
>>>>>>  >>  > 1000_6000      AFX Queue Level         R
>>>>>>  >>  > 1000_7000      AFX Queue Status        R
>>>>>>  >>
>>>>>>  >>
>>>>>>  >>
>>>>>>  >> But if I read it correctly 0x12fff294 (which makes SunOS crash with -m 32) is
>>>>>>  >>  well above this limit.
>>>>>>  >
>>>>>>  > Oh, so I also misread something. You are not talking about the
>>>>>>  > adjacent pages, but 16MB increments.
>>>>>>  >
>>>>>>  > Earlier I sent a patch for a generic address alias device, would it be
>>>>>>  > useful for this?
>>>>>>
>>>>>>
>>>>>> Should do as well. But I thought empty_slot is less overhead and
>>>>>>  easier to debug.
>>>>>>
>>>>
>>>> Also the aliasing patch would require one more parameter: the size of
>>>> area which has to be aliased. Unless we implement stubs for all
>>>> missing devices and do aliasing of the connected port ranges. And
>>>> then again, SS-20 doesn't have aliasing in this area at all.
>>>>
>>>> What do you think about this (empty_slot) solution (except that I
>>>> missed the SoB line)? Meanwhile it's tested with SunOS 4.1.3U1 too.
>>>
>>> I'm slightly against it; of course it would help for this case, but I
>>> think we may be missing a bigger problem.
>>>
>>>>>>> Maybe we have a general design problem, perhaps unassigned access
>>>>>>> faults should only be triggered inside SBus slots and ignored
>>>>>>> elsewhere. If this is true, generic Sparc32 unassigned access handler
>>>>>>> should just ignore the access and special fault generating slots
>>>>>>> should be installed for empty SBus address ranges.
>>>>
>>>> Agreed that they should be special for SBus, because SS-20 OBP is
>>>> not happy with the fault we are currently generating. But otherwise I think qemu
>>>> does it correctly. On SS-5:
>>>>
>>>> ok f7ff0000 2f spacel@ .
>>>> Data Access Error
>>>> ok sfar@ .
>>>> f7ff0000
>>>> ok 20000000 2f spacel@ .
>>>> Data Access Error
>>>> ok sfar@ .
>>>> 20000000
>>>> ok 40000000 20 spacel@ .
>>>> Data Access Error
>>>> ok sfar@ .
>>>> 40000000
>>>>
>>>> Neither f7ff0000, nor 20000000, nor 40000000 is in the SBus range, right?
>>>
>>> 40000000 is on SS-5.
>>
>> Ah. I was only aware of the control space. What ranges does SBus take?
>
> On SS-5, 30000000 to 7fffffff, each slot taking 10000000. There's the
> AFX bus at 20000000.
>
> The OBP property '/iommu/sbus/ranges' shows these (and other ranges).
>
>>
>>> So is the SBus Control Space in 0x10000000 to
>>> 0x1fffffff the only area besides DRAM where the accesses won't trap?
>>
>> At least some area after the ROM is aliased too. Also, on an SS-10
>> with a non-active frame buffer, writing to the SX registers has no
>> visible effect, and reading from them produces no fault but an NMI.
>
> Then we should cover the whole area after the IOMMU with an empty slot
> device. The ROM probably doesn't matter.

You mean up to the SBus (the example above showed that 40000000
produced a fault)? But that's what the patch is actually doing. More
examples on SS-5:

ok 1ffffffc 20 spacel@ .
3fe000
ok 20000000 20 spacel@ .
c8840140
ok 20010000 20 spacel@ .
c8840140
ok 2f010000 20 spacel@ .
c8840140
ok 30000000 20 spacel@ .
Data Access Error
ok sfar@ . sfsr@ .
30000000 836
ok 40000000 20 spacel@ .
Data Access Error
ok sfar@ . sfsr@ .
40000000 836
ok 50000000 20 spacel@ .
fd03774a
ok 60000000 20 spacel@ .
Data Access Error
ok 70000000 20 spacel@ .
10802f66

ok show-sbus
SBus slot 5 ledma le SUNW,bpp espdma esp
SBus slot 4 power-management SUNW,CS4231
SBus slot 1
SBus slot 2
SBus slot 3 cgsix
SBus slot 0

The ranges 20000000, 50000000 and 70000000 don't produce a fault
because there are devices attached. The areas 30000000, 40000000 and
60000000 do because there is nothing there.
So qemu does more or less the right thing. It's even more right in the
case of SS-20, because the iommu there is not aliased and accessing the
addresses after the iommu also produces faults.

I was going to put some more empty slots into SS-10/20 (VSIMMs, SX)
after we are done with SS-5 (due to technical limitations I can only
switch access from one real SS model to another once every few days).
Bob Breuer May 27, 2010, 4:34 p.m. UTC | #13
Artyom Tarasenko wrote:
> I was going to put some more empty slots into SS-10/20 (VSIMMs, SX)
> after we are done with SS-5 (due to technical limitations I can only
> switch access from one real SS model to another once every few days).
>   
I have a partial implementation of the SS-20 VSIMM (cg14) that I've
been working on. With the Sun firmware, I have a working text console,
a color boot logo, and programmable video resolutions up to 1600x1280.

Bob
Artyom Tarasenko May 28, 2010, 9:53 p.m. UTC | #14
> 32m: 0x12fff394
> 64m: 0x14fff394
> 192m:0x1cfff394
> 256m:0x20fff394
>
> Memory probing? It would be strange that OS would do it itself. The OS
> could just
> ask OBP how much does it have. Here is the listing where it happens:
>
> _swift_vac_rgnflush:            rd      %psr, %g2
> _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
> _swift_vac_rgnflush+8:          mov     %g5, %psr
> _swift_vac_rgnflush+0xc:        nop
> _swift_vac_rgnflush+0x10:       nop
> _swift_vac_rgnflush+0x14:       mov     0x100, %g5
> _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
> _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
> _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
> _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
> _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>
> _swift_vac_rgnflush+0x28: is the fatal one.
>
> kadb> $c
> _swift_vac_rgnflush(?)
> _vac_rgnflush() + 4
> _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
> _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
> _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>
> Unfortunately (but not surprisingly) kadb doesn't allow debugging
> cache-flush code, so I can't check what is in
> [%g5] (aka sfar) on the real machine when this happens.

I was telling fairy tales here and no one stopped me. [%g5] is not
sfar, it's the context pointer, so the code makes much more sense!
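
With that reading, +0x14..+0x28 boil down to this (a C-like sketch;
lda_asi() is a hypothetical stand-in for the lda [reg] asi instruction,
and ASI 0x04 = SRMMU registers / 0x20 = physical bypass is how I read
the listing):

#include <stdint.h>

/* Hypothetical stand-in for the "lda [addr] asi" instruction. */
extern uint32_t lda_asi(int asi, uint32_t addr);

/* _swift_vac_rgnflush+0x14..0x28 annotated; the values on the right
   come from the kadb/OBP session below (%o2 = 44000e5, reg 0x100 =
   3fff00). */
uint32_t rgnflush_probe(uint32_t rgn /* %o2 */)
{
    uint32_t ctxptr = lda_asi(0x04, 0x100); /* context table ptr reg       */
    uint32_t entry  = (ctxptr << 4)         /* table phys addr: 0x03fff000 */
                    + (rgn << 2);           /* + (%o2 << 2)  =  0x11000394 */
    return lda_asi(0x20, entry);            /* phys read of     0x14fff394 */
}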

And I guess, SunOS 4.1.4 is buggy. I've managed to reproduce the
complete case on the real machine. The trick is to set the breakpoint
before the interrupts are switched off:

kadb> _swift_vac_rgnflush:b
kadb> :c
breakpoint      _swift_vac_rgnflush:            rd      %psr, %g2
kadb> <o2=X
                44000e5
kadb> $q
Type  'go' to resume
Type  help  for more information
ok 100 4 spacel@ .
3fff00

So at _swift_vac_rgnflush+0x28 it would access (44000e5<<2) + (3fff00
<< 4) = 14fff394, which is outside of the IOMMU.

ok 14fff394 20 spacel@ .
3fe000

This seems to be an alias to

ok 14000004 20 spacel@ .
3fe000

So, it seems to be safe to pad iommu with an empty slot. I guess we
are not missing anything more serious. Alternatively we can use your
aliasing patch.

What do you say?

P.S. What is also interesting about SunOS 4.1.4 is that only the
single-cpu kernel (which is used during the installation) calls
_swift_vac_rgnflush on initialization. The smp kernel just doesn't
have this call in _hat_setup_kas. Maybe they have noticed the bug and
corrected it?
Blue Swirl May 29, 2010, 8:23 a.m. UTC | #15
On Fri, May 28, 2010 at 9:53 PM, Artyom Tarasenko
<atar4qemu@googlemail.com> wrote:
>> 32m: 0x12fff394
>> 64m: 0x14fff394
>> 192m:0x1cfff394
>> 256m:0x20fff394
>>
>> Memory probing? It would be strange that OS would do it itself. The OS
>> could just
>> ask OBP how much does it have. Here is the listing where it happens:
>>
>> _swift_vac_rgnflush:            rd      %psr, %g2
>> _swift_vac_rgnflush+4:          andn    %g2, 0x20, %g5
>> _swift_vac_rgnflush+8:          mov     %g5, %psr
>> _swift_vac_rgnflush+0xc:        nop
>> _swift_vac_rgnflush+0x10:       nop
>> _swift_vac_rgnflush+0x14:       mov     0x100, %g5
>> _swift_vac_rgnflush+0x18:       lda     [%g5] 0x4, %g5
>> _swift_vac_rgnflush+0x1c:       sll     %o2, 0x2, %g1
>> _swift_vac_rgnflush+0x20:       sll     %g5, 0x4, %g5
>> _swift_vac_rgnflush+0x24:       add     %g5, %g1, %g5
>> _swift_vac_rgnflush+0x28:       lda     [%g5] 0x20, %g5
>>
>> _swift_vac_rgnflush+0x28: is the fatal one.
>>
>> kadb> $c
>> _swift_vac_rgnflush(?)
>> _vac_rgnflush() + 4
>> _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>> _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>> _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>
>> Unfortunately (but not surprisingly) kadb doesn't allow debugging
>> cache-flush code, so I can't check what is in
>> [%g5] (aka sfar) on the real machine when this happens.
>
> I was telling fairy tales here and no one stopped me. [%g5] is not
> sfar, it's the context pointer,
> so the code makes much more sense!
>
> And I guess, SunOS 4.1.4 is buggy. I've managed to reproduce the
> complete case on the real machine. The trick is to set the breakpoint
> before the interrupts are switched off:
>
> kadb> _swift_vac_rgnflush:b
> kadb> :c
> breakpoint      _swift_vac_rgnflush:            rd      %psr, %g2
> kadb> <o2=X
>                44000e5
> kadb> $q
> Type  'go' to resume
> Type  help  for more information
> ok 100 4 spacel@ .
> 3fff00
>
> So at _swift_vac_rgnflush+0x28 it would access (44000e5<<2) + (3fff00
> << 4) = 14fff394, which is outside of the IOMMU.
>
> ok 14fff394 20 spacel@ .
> 3fe000
>
> This seems to be an alias to
>
> ok 14000004 20 spacel@ .
> 3fe000
>
> So, it seems to be safe to pad iommu with an empty slot. I guess we
> are not missing anything more serious. Alternatively we can use your
> aliasing patch.
>
> What do you say?

Thanks, applied.

> P.S. What is also interesting about SunOS 4.1.4 is that only the
> single-cpu kernel (which is used during the installation) calls
> _swift_vac_rgnflush on initialization. The smp kernel just doesn't
> have this call in _hat_setup_kas. Maybe they have noticed the bug and
> corrected it?
>
> --
> Regards,
> Artyom Tarasenko
>
> solaris/sparc under qemu blog: http://tyom.blogspot.com/
>
diff mbox

Patch

diff --git a/hw/sun4m.c b/hw/sun4m.c
index 9a79120..e31d039 100644
--- a/hw/sun4m.c
+++ b/hw/sun4m.c
@@ -93,7 +93,7 @@ 
 #define ESCC_CLOCK 4915200
 
 struct sun4m_hwdef {
-    target_phys_addr_t iommu_base, slavio_base;
+    target_phys_addr_t iommu_base, iommu_pad_base, iommu_pad_len, slavio_base;
     target_phys_addr_t intctl_base, counter_base, nvram_base, ms_kb_base;
     target_phys_addr_t serial_base, fd_base;
     target_phys_addr_t afx_base, idreg_base, dma_base, esp_base, le_base;
@@ -850,6 +850,14 @@  static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, ram_addr_t RAM_size,
     iommu = iommu_init(hwdef->iommu_base, hwdef->iommu_version,
                        slavio_irq[30]);
 
+    if (hwdef->iommu_pad_base) {
+        /* On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
+           Software shouldn't use aliased addresses, nor should it crash
+           when it does. Using empty_slot instead of aliasing can help with
+           debugging such accesses */
+        empty_slot_init(hwdef->iommu_pad_base, hwdef->iommu_pad_len);
+    }
+
     espdma = sparc32_dma_init(hwdef->dma_base, slavio_irq[18],
                               iommu, &espdma_irq);
 
@@ -961,6 +969,8 @@  static const struct sun4m_hwdef sun4m_hwdefs[] = {
     /* SS-5 */
     {
         .iommu_base   = 0x10000000,
+        .iommu_pad_base = 0x10004000,
+        .iommu_pad_len  = 0x0fffb000,
         .tcx_base     = 0x50000000,
         .cs_base      = 0x6c000000,
         .slavio_base  = 0x70000000,
@@ -1087,6 +1097,8 @@  static const struct sun4m_hwdef sun4m_hwdefs[] = {
     /* LX */
     {
         .iommu_base   = 0x10000000,
+        .iommu_pad_base = 0x10004000,
+        .iommu_pad_len  = 0x0fffb000,
         .tcx_base     = 0x50000000,
         .slavio_base  = 0x70000000,
         .ms_kb_base   = 0x71000000,