
qemu-ppc can't run static uClibc binaries.

Message ID 201002140236.28953.rob@landley.net
State New

Commit Message

Rob Landley Feb. 14, 2010, 8:36 a.m. UTC
On Thursday 11 February 2010 06:32:12 Alexander Graf wrote:
> Rob Landley wrote:
> > Static binaries that run under the Linux kernel don't run under qemu-ppc.
> >  For example, the prebuilt busybox binaries here:
> >
> >   http://busybox.net/downloads/binaries/1.16.0/busybox-powerpc
> >
> > don't run under qemu-ppc, but run just fine under qemu-system-ppc with
> > the image at:
> >
> >   http://impactlinux.com/fwl/downloads/binaries/system-image-powerpc.tar.bz2
> >
> > The reason is that the "powerpc spec" that qemu was written to is for
> > AIX, not for Linux, and thus the register layout qemu application
> > emulation provides for powerpc doesn't match what the kernel is actually
> > doing.
> >
> > For dynamically linked executables, the dynamic linker reorganizes the
> > register contents to match the AIX spec from IBM, but statically linked
> > binaries get what the kernel provides directly.  Thus binaries statically
> > linked against uClibc won't run under qemu-ppc, but run under
> > qemu-system-ppc just fine.
> >
> > I tracked down this problem in 2007:
> >
> >   http://landley.net/notes-2007.html#28-03-2007
> >
> > And reported it on the list at the time:
> >
> >   http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00713.html
> >   http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html
> >   http://lists.gnu.org/archive/html/qemu-devel/2007-04/msg00315.html
> >
> > However, the then-maintainer of powerpc believed nobody else ever had the
> > right to touch "her code":
> >
> >   http://lists.gnu.org/archive/html/qemu-devel/2007-04/msg00198.html
> >
> > And I was unable to convince her that insisting reality change to match a
> > spec which wasn't even for the right platform was not a useful approach. 
> > Thus the binary in the first link still won't run under qemu-ppc three
> > years later, despite running fine under a real Linux kernel.
>
> Patches are always welcome. The only thing you might want to make sure
> is that dynamically linked binaries also still continue to work :-).

Attached.

This may help explain the issue:

  http://sources.redhat.com/ml/libc-alpha/2003-03/msg00272.html

It's not a question of dynamically linked Linux binaries.  They work just fine 
with either register layout.  The dynamic linker converts the Linux layout to 
the AIX layout, and is reentrant so it won't do it a second time if it's 
already been converted.
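A minimal C sketch of the conversion described above, under stated assumptions: the register struct, the function name linux_to_abi_layout(), and the use of word indices into a fake stack array (instead of real guest addresses) are all hypothetical, and this is not the actual glibc or uClibc dynamic linker code. Like the thread, it assumes argc is never 0, so a nonzero r3 is taken to mean the registers were already converted. That check is what makes a second call a no-op.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 32-bit PowerPC GPRs plus a guest stack held as an
 * array of 32-bit words. "Pointers" are word indices into that array. */
typedef struct {
    uint32_t gpr[32];
} ppc_regs;

/* Convert the Linux entry layout (r1 -> argc, argv[], NULL, envp[], NULL,
 * auxv; r3 == 0 because execve() succeeded) into the ABI register layout
 * (r3 = argc, r4 = argv, r5 = envp, r6 = auxv). */
static void linux_to_abi_layout(ppc_regs *r, const uint32_t *stack)
{
    assert(r && stack);
    if (r->gpr[3] != 0)
        return;                        /* already converted: do nothing */

    uint32_t sp = r->gpr[1];
    uint32_t argc = stack[sp];
    uint32_t argv = sp + 1;            /* argv[] follows argc */
    uint32_t envp = argv + argc + 1;   /* skip argv[] and its NULL */
    uint32_t auxv = envp;
    while (stack[auxv] != 0)           /* walk envp[] up to its NULL */
        auxv++;
    auxv++;                            /* auxv starts after that NULL */

    r->gpr[3] = argc;
    r->gpr[4] = argv;
    r->gpr[5] = envp;
    r->gpr[6] = auxv;
}
```

Calling it twice leaves r3 through r6 unchanged, which is the property that lets a dynamically linked binary cope with either entry layout.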

The problem is that BSD wants the AIX layout, and hence this comment in
linux-user/elfload.c function init_thread():

    /* Note that isn't exactly what regular kernel does
     * but this is what the ABI wants and is needed to allow
     * execution of PPC BSD programs.
     */

I.E. whoever wrote this already knows it's not what the Linux kernel is 
actually doing, and they're not doing it for Linux, they're doing it for BSD.

The fix is probably to add #ifdef CONFIG_BSD around the appropriate chunk of 
code.  Attached is a patch to do that (plus tweaks to make the "you have an 
unused variable, break the build!" logic shut up about it).
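A hedged sketch of the shape of the proposed change, assuming a simplified register struct and a hypothetical init_thread_sketch() rather than QEMU's actual init_thread(): the Linux path sets only r1 and the entry point, leaving r3 at zero, while the AIX-style register preloading is compiled only when CONFIG_BSD is defined. The (void) cast stands in for the "unused variable" tweak mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the target register
 * struct; not the real linux-user/elfload.c definitions. */
typedef struct {
    uint32_t gpr[32];
    uint32_t nip;
} sketch_pt_regs;

/* stack: guest stack as 32-bit words; sp: index of argc within it. */
static void init_thread_sketch(sketch_pt_regs *regs, uint32_t entry,
                               const uint32_t *stack, uint32_t sp)
{
    assert(regs && stack);
    memset(regs, 0, sizeof(*regs));  /* model the zeroed starting state */
    regs->gpr[1] = sp;               /* r1 -> argc, as the kernel does */
    regs->nip = entry;
#ifdef CONFIG_BSD
    /* AIX/BSD-style layout: argc and argv preloaded into r3/r4
     * (envp and auxv would follow the same pattern). */
    regs->gpr[3] = stack[sp];
    regs->gpr[4] = sp + 1;
#else
    (void)stack;                     /* unused here: keep -Werror quiet */
#endif
}
```

Whether CONFIG_BSD is the right macro to test is questioned later in the thread, since it describes the build host rather than the guest binary.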

(Yes, I tested that a dynamically linked hello world still worked for me.)

Rob

Comments

Alexander Graf Feb. 14, 2010, 2:41 p.m. UTC | #1
On Sun 14 Feb 2010 09:36:27 AM CET, Rob Landley <rob@landley.net> wrote:

> Attached.
>
> This may help explain the issue:
>
>   http://sources.redhat.com/ml/libc-alpha/2003-03/msg00272.html
>
> It's not a question of dynamically linked Linux binaries.  They work just fine
> with either register layout.  The dynamic linker converts the Linux layout to
> the AIX layout, and is reentrant so it won't do it a second time if it's
> already been converted.
>
> The problem is that BSD wants the AIX layout, and hence this comment in
> linux-user/elfload.c function init_thread():
>
>     /* Note that isn't exactly what regular kernel does
>      * but this is what the ABI wants and is needed to allow
>      * execution of PPC BSD programs.
>      */
>
> I.E. whoever wrote this already knows it's not what the Linux kernel is
> actually doing, and they're not doing it for Linux, they're doing it for BSD.
>
> The fix is probably to add #ifdef CONFIG_BSD around the appropriate chunk of
> code.  Attached is a patch to do that (plus tweaks to make the "you have an
> unused variable, break the build!" logic shut up about it).
>
> (Yes, I tested that a dynamically linked hello world still worked for me.)

I don't see why it would fail. The link above states that for  
statically linked binaries, r1 points to all the variables. For  
dynamically linked ones, you also get pointers in some regs.

So the only case I can imagine that this breaks anything is that  
uClibc requires register state to be 0.


Alex
Rob Landley Feb. 15, 2010, 11:10 a.m. UTC | #2
On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
> On Sun 14 Feb 2010 09:36:27 AM CET, Rob Landley <rob@landley.net> wrote:
> I don't see why it would fail. The link above states that for
> statically linked binaries, r1 points to all the variables. For
> dynamically linked ones, you also get pointers in some regs.
>
> So the only case I can imagine that this breaks anything is that
> uClibc requires register state to be 0.

Yes, r3 (which is the exit code from the "exec" syscall, and thus 0 if it 
worked).  In the BSD layout, it's argc (which can never be 0).

  http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html

Rob
Alexander Graf Feb. 15, 2010, 11:19 a.m. UTC | #3
On 15.02.2010, at 12:10, Rob Landley wrote:

> On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
>> On Sun 14 Feb 2010 09:36:27 AM CET, Rob Landley <rob@landley.net> wrote:
>> I don't see why it would fail. The link above states that for
>> statically linked binaries, r1 points to all the variables. For
>> dynamically linked ones, you also get pointers in some regs.
>> 
>> So the only case I can imagine that this breaks anything is that
>> uClibc requires register state to be 0.
> 
> Yes, r3 (which is the exit code from the "exec" syscall, and thus 0 if it 
> worked).  In the BSD layout, it's argc (which can never be 0).
> 
>  http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html

So what you really want is something like

#ifdef CONFIG_LINUX_USER
/* exec return value is always 0 */
env->gpr[3] = 0;
#endif

just after the #endif in your patch. If you had inlined your patch I could've commented it there.


Alex
Rob Landley Feb. 15, 2010, 12:58 p.m. UTC | #4
On Monday 15 February 2010 05:19:24 Alexander Graf wrote:
> On 15.02.2010, at 12:10, Rob Landley wrote:
> > On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
> >> So the only case I can imagine that this breaks anything is that
> >> uClibc requires register state to be 0.
> >
> > Yes, r3 (which is the exit code from the "exec" syscall, and thus 0 if it
> > worked).  In the BSD layout, it's argc (which can never be 0).
> >
> >  http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html
>
> So what you really want is something like
>
> #ifdef CONFIG_LINUX_USER
> /* exec return value is always 0 */
> env->gpr[3] = 0;
> #endif
>
> just after the #endif in your patch. If you had inlined your patch I
> could've commented it there.

Unfortunately kmail plays fast and loose with whitespace when I inline stuff.  
(Not always, but I can't tell by inspection when it's decided it was hungry 
for tabs or wanted to throw in that horrible UTF8 escaped whitespace.)

I didn't explicitly set it because they're initialized to zero in function 
main() on line 2654 of linux-user/main.c.  (Any regs we don't explicitly set 
to some other value start out zeroed in qemu.)

If you prefer to make the requirements explicit, that works too, but a comment 
might do just as well.  (I tend to prefer removing unnecessary work Linux 
doesn't need done, rather than adding extra code to undo the unnecessary work 
afterwards.  Force of habit from years on busybox and such.)

Rob
Alexander Graf Feb. 15, 2010, 1:01 p.m. UTC | #5
On 15.02.2010, at 13:58, Rob Landley wrote:

> On Monday 15 February 2010 05:19:24 Alexander Graf wrote:
>> On 15.02.2010, at 12:10, Rob Landley wrote:
>>> On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
>>>> So the only case I can imagine that this breaks anything is that
>>>> uClibc requires register state to be 0.
>>> 
>>> Yes, r3 (which is the exit code from the "exec" syscall, and thus 0 if it
>>> worked).  In the BSD layout, it's argc (which can never be 0).
>>> 
>>> http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html
>> 
>> So what you really want is something like
>> 
>> #ifdef CONFIG_LINUX_USER
>> /* exec return value is always 0 */
>> env->gpr[3] = 0;
>> #endif
>> 
>> just after the #endif in your patch. If you had inlined your patch I
>> could've commented it there.
> 
> Unfortunately kmail plays fast and loose with whitespace when I inline stuff.  
> (Not always, but I can't tell by inspection when it's decided it was hungry 
> for tabs or wanted to throw in that horrible UTF8 escaped whitespace.)

git-send-mail is your friend :-).

> I didn't explicitly set it because they're initialized to zero in function 
> main() on line 2654 of linux-user/main.c.  (Any regs we don't explicitly set 
> to some other value start out zeroed in qemu.)

So it should work already?

> If you prefer to make the requirements explicit, that works too, but a comment 
> might do just as well.  (I tend to prefer removing unnecessary work Linux 
> doesn't need done, rather than adding extra code to undo the unnecessary work 
> afterwards.  Force of habit from years on busybox and such.)

Well, I personally prefer to always use the same code paths whenever possible. That makes the code less prone to failure in odd configurations. And we have a lot of different combinations of those in Qemu.

But this is Riku's call. He's the linux-user maintainer.


Alex
Michael S. Tsirkin Feb. 15, 2010, 1:08 p.m. UTC | #6
On Mon, Feb 15, 2010 at 06:58:33AM -0600, Rob Landley wrote:
> On Monday 15 February 2010 05:19:24 Alexander Graf wrote:
> > On 15.02.2010, at 12:10, Rob Landley wrote:
> > > On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
> > >> So the only case I can imagine that this breaks anything is that
> > >> uClibc requires register state to be 0.
> > >
> > > Yes, r3 (which is the exit code from the "exec" syscall, and thus 0 if it
> > > worked).  In the BSD layout, it's argc (which can never be 0).
> > >
> > >  http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html
> >
> > So what you really want is something like
> >
> > #ifdef CONFIG_LINUX_USER
> > /* exec return value is always 0 */
> > env->gpr[3] = 0;
> > #endif
> >
> > just after the #endif in your patch. If you had inlined your patch I
> > could've commented it there.
> 
> Unfortunately kmail plays fast and loose with whitespace when I inline stuff.  
> (Not always, but I can't tell by inspection when it's decided it was hungry 
> for tabs or wanted to throw in that horrible UTF8 escaped whitespace.)

See Documentation/email-clients.txt under linux source tree.
Rob Landley Feb. 16, 2010, 12:52 a.m. UTC | #7
On Monday 15 February 2010 07:08:33 Michael S. Tsirkin wrote:
> On Mon, Feb 15, 2010 at 06:58:33AM -0600, Rob Landley wrote:
> > On Monday 15 February 2010 05:19:24 Alexander Graf wrote:
> > > On 15.02.2010, at 12:10, Rob Landley wrote:
> > > > On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
> > > >> So the only case I can imagine that this breaks anything is that
> > > >> uClibc requires register state to be 0.
> > > >
> > > > Yes, r3 (which is the exit code from the "exec" syscall, and thus 0
> > > > if it worked).  In the BSD layout, it's argc (which can never be 0).
> > > >
> > > >  http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html
> > >
> > > So what you really want is something like
> > >
> > > #ifdef CONFIG_LINUX_USER
> > > /* exec return value is always 0 */
> > > env->gpr[3] = 0;
> > > #endif
> > >
> > > just after the #endif in your patch. If you had inlined your patch I
> > > could've commented it there.
> >
> > Unfortunately kmail plays fast and loose with whitespace when I inline
> > stuff. (Not always, but I can't tell by inspection when it's decided it
> > was hungry for tabs or wanted to throw in that horrible UTF8 escaped
> > whitespace.)
>
> See Documentation/email-clients.txt under linux source tree.

I did.  That doesn't cover the different bugs in different versions, what 
happens when you use kmail under xfce, and so on.  (It also doesn't mention 
that you have to disable wordwrap for the entire message and hit return by 
hand at the end of each line to get kmail not to wrap inline includes.  Or 
that some versions of kmail embed NUL bytes into inline includes, for some 
reason.)

I could make it work, just didn't know this list had a tropism for inline.  
(Varies per list and I wander through a bunch of 'em.)  Over on the -hda sets 
/dev/hdc topic I posted a patch inline which was ignored because the behavior 
the Linux kernel has consistently had for the past decade or more isn't 
considered especially important.  Thus I didn't think you were likely 
following lkml tropes.

*shrug*  Now I know...

Rob
Stuart Brady Feb. 16, 2010, 8:21 a.m. UTC | #8
On Mon, Feb 15, 2010 at 12:19:24PM +0100, Alexander Graf wrote:
> So what you really want is something like
> 
> #ifdef CONFIG_LINUX_USER
> /* exec return value is always 0 */
> env->gpr[3] = 0;
> #endif
> 
> just after the #endif in your patch. If you had inlined your patch I could've commented it there.

I've clearly misunderstood something, but isn't CONFIG_LINUX_USER always
going to be defined when building linux-user/elfload.c, and doesn't 
CONFIG_BSD relate to the host that you're building for, not the target?

I can't remember whether Jocelyn was interested in running BSD binaries
under Linux or under BSD.  The former seems reasonable, although even if
that did work for PPC at one point, I doubt that's still the case...

Cheers,
Alexander Graf Feb. 16, 2010, 9:31 a.m. UTC | #9
On 16.02.2010, at 01:52, Rob Landley wrote:

> On Monday 15 February 2010 07:08:33 Michael S. Tsirkin wrote:
>> On Mon, Feb 15, 2010 at 06:58:33AM -0600, Rob Landley wrote:
>>> On Monday 15 February 2010 05:19:24 Alexander Graf wrote:
>>>> On 15.02.2010, at 12:10, Rob Landley wrote:
>>>>> On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
>>>>>> So the only case I can imagine that this breaks anything is that
>>>>>> uClibc requires register state to be 0.
>>>>> 
>>>>> Yes, r3 (which is the exit code from the "exec" syscall, and thus 0
>>>>> if it worked).  In the BSD layout, it's argc (which can never be 0).
>>>>> 
>>>>> http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html
>>>> 
>>>> So what you really want is something like
>>>> 
>>>> #ifdef CONFIG_LINUX_USER
>>>> /* exec return value is always 0 */
>>>> env->gpr[3] = 0;
>>>> #endif
>>>> 
>>>> just after the #endif in your patch. If you had inlined your patch I
>>>> could've commented it there.
>>> 
>>> Unfortunately kmail plays fast and loose with whitespace when I inline
>>> stuff. (Not always, but I can't tell by inspection when it's decided it
>>> was hungry for tabs or wanted to throw in that horrible UTF8 escaped
>>> whitespace.)
>> 
>> See Documentation/email-clients.txt under linux source tree.
> 
> I did.  That doesn't cover the different bugs in different versions, what 
> happens when you use kmail under xfce, and so on.  (It also doesn't mention 
> that you have to disable wordwrap for the entire message and hit return by 
> hand at the end of each line to get kmail not to wrap inline includes.  Or 
> that some versions of kmail embed NUL bytes into inline includes, for some 
> reason.)
> 
> I could make it work, just didn't know this list had a tropism for inline.  
> (Varies per list and I wander through a bunch of 'em.)  Over on the -hda sets 
> /dev/hdc topic I posted a patch inline which was ignored because the behavior 
> the Linux kernel has consistently had for the past decade or more isn't 
> considered especially important.  Thus I didn't think you were likely 
> following lkml tropes.

If swapping the parameter was the right solution I would've submitted a patch long ago :-). Unfortunately it's not as easy.
But the inlining is really only about simple commenting. It's a lot nicer to have context when you say "this doesn't make sense" or so :-).

Either way - it's good to see someone interested in the topic actually sending patches. Reviewing and commenting doesn't mean I don't like what you're doing. In this case it just means I'm pretty sure it doesn't solve the problem, but only the symptoms.


Alex
Rob Landley Feb. 16, 2010, 6:14 p.m. UTC | #10
On Tuesday 16 February 2010 03:31:16 Alexander Graf wrote:
> On 16.02.2010, at 01:52, Rob Landley wrote:
> If swapping the parameter was the right solution I would've submitted a
> patch long ago :-). Unfortunately it's not as easy.

I agree that making a single controller handle four drives is a _better_ fix.  
(Somebody said that current Linux kernels notice the DMA failure and fall back 
to PIO-ing the drive, or some such.  I take it that MacOS doesn't?)

I just want it fixed, and if that's the direction qemu would prefer to go on 
that issue, I'd like to encourage that in any way I can.  I just don't know 
how...

> But the inlining is
> really only about simple commenting. It's a lot nicer to have context when
> you say "this doesn't make sense" or so :-).

Understood.  I can do that here in future.

> Either way - it's good to see someone interested in the topic actually
> sending patches. Reviewing and commenting doesn't mean I don't like what
> you're doing. In this case it just means I'm pretty sure it doesn't solve
> the problem, but only the symptoms.

Thanks.  I'm interested, but overwhelmed.

My FWL project is an attempt to make as many different targets as possible work 
the same way, generally under QEMU.  This lets me regression test Linux and 
uClibc and busybox and such across all of 'em.  (Eventually from a nightly 
cron job rebuilding everything from scratch on an 8-way server, with automatic 
"git bisect" telling me what commit broke it.)

So far I've got arm, mips, powerpc, and x86/x86-64 building little native 
development environments, which can then natively build dropbear and strace 
inside qemu (optionally calling out to the cross compiler via distcc).  Each 
of those has a working CPU emulation (with mmu) on a board with a network 
card, three disks, at least 256 megs of memory, serial console, and clock 
chip.

I've also got a bunch of "sort of working, but not well enough to run builds 
natively under" targets on top of that (arm big endian, sh4, sparc...)  I 
occasionally make puppy eyes at the m68k guys to see if something beyond 
coldfire is likely to go in, I've even poked at alpha a couple times (but gcc 
dying with an internal compiler error on a hardware platform end-of-lifed a 
decade ago isn't high on my todo list), I should re-check cris to see if it's 
grown any non-toy boards yet, getting S-360 working is on my todo list...

Unfortunately, what this means is I haven't got the bandwidth to become an 
expert on each of these targets.  I'm reasonably good at flailing about wildly, 
but I'm totally out of my depth most of the time and I know it.

Often I have to come up with The Wrong Fix(tm):

  http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-December/078700.html

And then the people who know what's going on do a proper fix:

  http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-January/079436.html

Or in the case of some maintainers, decline to do so because they honestly 
don't care if anybody else but their employer ever actually uses the thing, 
thus I'm stuck carrying The Wrong Fix as a patch:

  http://www.spinics.net/lists/linux-sh/msg04146.html

I hate it when that happens...

Rob
Rob Landley Feb. 16, 2010, 6:31 p.m. UTC | #11
On Monday 15 February 2010 07:01:02 Alexander Graf wrote:
> On 15.02.2010, at 13:58, Rob Landley wrote:
> > On Monday 15 February 2010 05:19:24 Alexander Graf wrote:
> >> On 15.02.2010, at 12:10, Rob Landley wrote:
> >>> On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
> >>>> So the only case I can imagine that this breaks anything is that
> >>>> uClibc requires register state to be 0.
> >>>
> >>> Yes, r3 (which is the exit code from the "exec" syscall, and thus 0 if
> >>> it worked).  In the BSD layout, it's argc (which can never be 0).
> >>>
> >>> http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html
> >>
> >> So what you really want is something like
> >>
> >> #ifdef CONFIG_LINUX_USER
> >> /* exec return value is always 0 */
> >> env->gpr[3] = 0;
> >> #endif
> >>
> >> just after the #endif in your patch. If you had inlined your patch I
> >> could've commented it there.
> >
> > Unfortunately kmail plays fast and loose with whitespace when I inline
> > stuff. (Not always, but I can't tell by inspection when it's decided it
> > was hungry for tabs or wanted to throw in that horrible UTF8 escaped
> > whitespace.)
>
> git-send-mail is your friend :-).

No git command is my friend.  The user interface of that monstrosity should be 
<strike>condemned</strike> banished with full bell book and dribbly candles.

And I dunno how it would interface with kmail.  (No other program on my laptop 
has the ssh tunnel to my mail server set up so it can send anything.)  But out 
of curiosity...

  landley@driftwood:~$ git send-mail
  git: 'send-mail' is not a git-command. See 'git --help'.
  landley@driftwood:~$ git-send-mail
  bash: git-send-mail: command not found
  landley@driftwood:~$ git send mail
  git: 'send' is not a git-command. See 'git --help'.

Since it's unlikely that this could send a patch I haven't checked into git 
(and I generally treat git as a read-only resource), learning more about it 
goes on the todo list I expect.

> > I didn't explicitly set it because they're initialized to zero in
> > function main() on line 2654 of linux-user/main.c.  (Any regs we don't
> > explicitly set to some other value start out zeroed in qemu.)
>
> So it should work already?

*shrug*  It doesn't.

Let's see, one of the lines I #ifdefed out (line 535-ish of
linux-user/elfload.c) is:

    get_user_ual(_regs->gpr[3], pos);

Rummage, rummage... get_user_ual() is a wrapper for get_user() which is a 
wrapper for __get_user() which assigns to its first argument.  So yeah, that's 
setting _regs->gpr[3] to a nonzero value.
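The wrapper chain can be sketched with simplified, hypothetical macros (the sk_ names are made up; QEMU's real get_user() family also handles guest address translation and byte order, and its exact definitions are not reproduced here). The point is only that the innermost macro assigns through its first argument, so a register that started out zeroed ends up holding argc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: each layer forwards to the next, and the
 * innermost one assigns through its first argument. */
#define sk__get_user(x, hptr)     do { (x) = *(hptr); } while (0)
#define sk_get_user(x, hptr)      sk__get_user(x, hptr)
#define sk_get_user_ual(x, hptr)  sk_get_user(x, (const uint32_t *)(hptr))

typedef struct {
    uint32_t gpr[32];
} sk_regs;

/* Models the BSD-oriented line: load argc from the stack into r3. */
static void load_argc_into_r3(sk_regs *regs, const uint32_t *stack,
                              uint32_t sp)
{
    assert(regs && stack);
    sk_get_user_ual(regs->gpr[3], &stack[sp]);
}
```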

I don't know if that's the _only_ problem the block of code I #ifdefed out was 
causing, but in general that whole section is specifically for BSD, and makes 
the behavior diverge from that of the Linux kernel.

> > If you prefer to make the requirements explicit, that works too, but a
> > comment might do just as well.  (I tend to prefer removing unnecessary
> > work Linux doesn't need done, rather than adding extra code to undo the
> > unnecessary work afterwards.  Force of habit from years on busybox and
> > such.)
>
> Well, I personally prefer to always use the same code paths whenever
> possible. That makes the code less prone to failure in odd configurations.
> And we have a lot of different combinations of those in Qemu.

Not all Linux binaries are going to run on BSD.  This is a case where the 
behavior of the two honestly diverges, and existing code depends on the actual 
linux-kernel behavior.

> But this is Riku's call. He's the linux-user maintainer.

Hi Riku!

/me makes puppy eyes at Riku.

> Alex

Rob
Alexander Graf Feb. 16, 2010, 6:36 p.m. UTC | #12
On 16.02.2010, at 19:31, Rob Landley wrote:

> On Monday 15 February 2010 07:01:02 Alexander Graf wrote:
>> On 15.02.2010, at 13:58, Rob Landley wrote:
>>> On Monday 15 February 2010 05:19:24 Alexander Graf wrote:
>>>> On 15.02.2010, at 12:10, Rob Landley wrote:
>>>>> On Sunday 14 February 2010 08:41:00 Alexander Graf wrote:
>>>>>> So the only case I can imagine that this breaks anything is that
>>>>>> uClibc requires register state to be 0.
>>>>> 
>>>>> Yes, r3 (which is the exit code from the "exec" syscall, and thus 0 if
>>>>> it worked).  In the BSD layout, it's argc (which can never be 0).
>>>>> 
>>>>> http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00720.html
>>>> 
>>>> So what you really want is something like
>>>> 
>>>> #ifdef CONFIG_LINUX_USER
>>>> /* exec return value is always 0 */
>>>> env->gpr[3] = 0;
>>>> #endif
>>>> 
>>>> just after the #endif in your patch. If you had inlined your patch I
>>>> could've commented it there.
>>> 
>>> Unfortunately kmail plays fast and loose with whitespace when I inline
>>> stuff. (Not always, but I can't tell by inspection when it's decided it
>>> was hungry for tabs or wanted to throw in that horrible UTF8 escaped
>>> whitespace.)
>> 
>> git-send-mail is your friend :-).
> 
> No git command is my friend.  The user interface of that monstrosity should be 
> <strike>condemned</strike> banished with full bell book and dribbly candles.
> 
> And I dunno how it would interface with kmail.  (No other program on my laptop 
> has the ssh tunnel to my mail server set up so it can send anything.)  But out 
> of curiosity...
> 
>  landley@driftwood:~$ git send-mail
>  git: 'send-mail' is not a git-command. See 'git --help'.
>  landley@driftwood:~$ git-send-mail
>  bash: git-send-mail: command not found
>  landley@driftwood:~$ git send mail
>  git: 'send' is not a git-command. See 'git --help'.
> 
> Since it's unlikely that this could send a patch I haven't checked into git 
> (and I generally treat git as a read-only resource), learning more about it 
> goes on the todo list I expect.
> 
>>> I didn't explicitly set it because they're initialized to zero in
>>> function main() on line 2654 of linux-user/main.c.  (Any regs we don't
>>> explicitly set to some other value start out zeroed in qemu.)
>> 
>> So it should work already?
> 
> *shrug*  It doesn't.
> 
> Let's see, one of the lines I #ifdefed out (line 535-ish of
> linux-user/elfload.c) is:
> 
>    get_user_ual(_regs->gpr[3], pos);
> 
> Rummage, rummage... get_user_ual() is a wrapper for get_user() which is a 
> wrapper for __get_user() which assigns to its first argument.  So yeah, that's 
> setting _regs->gpr[3] to a nonzero value.

Well, I was wondering about the order of execution. If main() already sets the GPRs to 0, r3 should be 0. I assume the ELF reading code comes after that? If so, your patch looks correct.


Alex
Rob Landley Feb. 16, 2010, 7:14 p.m. UTC | #13
On Tuesday 16 February 2010 12:36:15 Alexander Graf wrote:
> On 16.02.2010, at 19:31, Rob Landley wrote:
> > Let's see, one of the lines I #ifdefed out (line 535-ish of
> > linux-user/elfload.c) is:
> >
> >    get_user_ual(_regs->gpr[3], pos);
> >
> > Rummage, rummage... get_user_ual() is a wrapper for get_user() which is a
> > wrapper for __get_user() which assigns to its first argument.  So yeah,
> > that's setting _regs->gpr[3] to a nonzero value.
>
> Well, I was wondering about the order of execution. If main() already sets the
> GPRs to 0 it should be 0. I assume the elf reading code comes after that?
> If so, your patch looks correct.

The main() code memsets all the registers to zero when the array is allocated, 
then passes the register array as the first argument to the target-specific 
init_thread(), which can initialize them to other values.

So yeah, main() calls the ELF reading code after the memset.

Rob
Artyom Tarasenko Feb. 17, 2010, 9:24 a.m. UTC | #14
2010/2/16 Rob Landley <rob@landley.net>:
> On Tuesday 16 February 2010 03:31:16 Alexander Graf wrote:
>> On 16.02.2010, at 01:52, Rob Landley wrote:
>> If swapping the parameter was the right solution I would've submitted a
>> patch long ago :-). Unfortunately it's not as easy.
>
> I agree that making a single controller handle four drives is a _better_ fix.
> (Somebody said that current Linux kernels notice the DMA failure and fall back
> to PIO-ing the drive, or some such.  I take it that MacOS doesn't?)
>
> I just want it fixed, and if that's the direction qemu would prefer to go on
> that issue, I'd like to encourage that in any way I can.  I just don' t know
> how...
>
>> But the inlining is
>> really only about simple commenting. It's a lot nicer to have context when
>> you say "this doesn't make sense" or so :-).
>
> Understood.  I can do that here in future.
>
>> Either way - it's good to see someone interested in the topic actually
>> sending patches. Reviewing and commenting doesn't mean I don't like what
>> you're doing. In this case it just means I'm pretty sure it doesn't solve
>> the problem, but only the symptoms.
>
> Thanks.  I'm interested, but overwhelmed.
>
> My FWL project is an attempt to make as many different targets as possible work
> the same way, generally under QEMU.  This lets me regression test Linux and
> uClibc and busybox and such across all of 'em.  (Eventually from a nightly
> cron job rebuilding everything from scratch on an 8-way server, with automatic
> "git bisect" telling me what commit broke it.)
>
> So far I've got arm, mips, powerpc, and x86/x86-64 building little native
> development environments, which can then natively build dropbear and strace
> inside qemu (optionally calling out to the cross compiler via distcc).  Each
> of those has a working CPU emulation (with mmu) on a board with a network
> card, three disks, at least 256 megs of memory, serial console, and clock
> chip.
>
> I've also got a bunch of "sort of working, but not well enough to run builds
> natively under" targets on top of that (arm big endian, sh4, sparc...)

What's not well enough on sparc?
Paolo Bonzini Feb. 17, 2010, 3:45 p.m. UTC | #15
On 02/17/2010 10:24 AM, Artyom Tarasenko wrote:
>> I've also got a bunch of "sort of working, but not well enough
>> to run builds natively under" targets on top of that (arm big
>> endian, sh4, sparc...)
> What's not well enough on sparc?

 From http://permalink.gmane.org/gmane.comp.emulators.qemu/63610:

On Linux, sparc-softmmu can boot Linux (sparc-test) image, but QEMU
crashes just before command line. On OpenBSD, the same test reaches
command prompt.
Rob Landley Feb. 17, 2010, 4:36 p.m. UTC | #16
On Wednesday 17 February 2010 03:24:58 Artyom Tarasenko wrote:
> > I've also got a bunch of "sort of working, but not well enough to run
> > builds natively under" targets on top of that (arm big endian, sh4,
> > sparc...)
>
> What's not well enough on sparc?

More than one thing, unfortunately.  (Symptoms I can give you, causes I'm iffy 
on.)

1) the uClibc 0.9.32.2 dynamic linker isn't working on 32-bit sparc.  (I 
believe it's a uClibc issue, not specifically a qemu problem.  But then since I 
don't own real sparc hardware, I dunno.)

2) Not all of the system calls work (again, probably a uClibc issue).

3) I'm not sure my toolchain configuration is quite matching the instruction 
set qemu-system-sparc is emulating by default, I get occasional illegal 
instruction errors (but not reliably).  (Are #1 and #2 related to this?  
Dunno.)

If I statically link everything I can at least get to a command prompt, but 
only with init=/bin/sh.  (If I try to run it through the boot script, it dies 
with various errors from "cannot allocate memory" to "Illegal instruction".)

If you're curious you can play around with the prebuilt binary at 
http://impactlinux.com/fwl/downloads/binaries/system-image-sparc.tar.bz2 but 
you'll have to boot it like this to bypass the boot script:

  KERNEL_EXTRA="init=/bin/ash" ./run-emulator.sh

Then to see it misbehave:

  # mount -t tmpfs /tmp /tmp
  # cd /tmp
  /tmp # ls
  ls: can't open '.': Cannot allocate memory
  /tmp # mount -t proc /proc /proc
  /tmp # ls -l /proc
  Illegal instruction

The toolchain is configured with "sparc-unknown-linux" which you'd think would 
be generic sparc, but apparently not...

If you prefer to build it from source, you can download 
http://impactlinux.com/fwl/downloads/firmware-0.9.10.tar.bz2 and run 
"./build.sh sparc", then look in the "build" directory when it's done.  I'd 
happily explain how the build scripts work and what config options they're 
passing to the toolchain and kernel and such, but this isn't the list for 
that.  (http://impactlinux.com/fwl has a link to the FWL mailing list if 
you're interested.)

I'd be thrilled to get some help on it actually, not a sparc expert...

Thanks,

Rob
Rob Landley Feb. 17, 2010, 6:55 p.m. UTC | #17
On Wednesday 17 February 2010 09:45:48 Paolo Bonzini wrote:
> On 02/17/2010 10:24 AM, Artyom Tarasenko wrote:
> >> I've also got a bunch of "sort of working, but not well enough
> >> to run builds natively under" targets on top of that (arm big
> >> endian, sh4, sparc...)
> >
> > What's not well enough on sparc?
>
>  From http://permalink.gmane.org/gmane.comp.emulators.qemu/63610:
>
> On Linux, sparc-softmmu can boot Linux (sparc-test) image, but QEMU
> crashes just before command line. On OpenBSD, the same test reaches
> command prompt.

Actually the sparc-test image from
http://wiki.qemu.org/download/sparc-test-0.2.tar.gz boots and gets me a command
line just fine, and I've never had it die with strange errors that look like
mismatched system calls and such.  (Under Ubuntu 8.04, using qemu-git from a
week or so back, but this behavior's been consistent since I first tried it.)

That image is A) built with an unknown compiler, B) running glibc (not 
uClibc), C) a crippled toy image.  (It's a read-only root filesystem that 
hasn't got a mount point for /proc.  Obviously never meant to actually be used
for anything but very simple smoke testing.)

But it does imply that qemu is capable of decently running _something_ on 
sparc, so the problems I'm seeing are more likely to be uClibc or toolchain 
issues.

Alas the image has no hint how to reproduce it.  Doesn't say what toolchain it 
was built with, what kernel .config was used, and so on.  (The arm one at least 
had /proc/config.gz...)

Well, actually if you "mount -t proc proc lost+found" and then cat 
lost+found/version it says gcc version 2.95.4 20010319 (prerelease).  So it 
was built with a random cvs snapshot of egcs from 2001, configured who knows 
how, and it's running a 2.6.11 kernel from 5 years ago (again with who knows 
what .config).  So my problem could be that I'm using a kernel 22 versions 
newer, or I'm using gcc 4.2 toolchain, or that either is configured differently.

But I'm still guessing uClibc is the most likely culprit...

Rob
Blue Swirl Feb. 17, 2010, 8:46 p.m. UTC | #18
On Wed, Feb 17, 2010 at 8:55 PM, Rob Landley <rob@landley.net> wrote:
> On Wednesday 17 February 2010 09:45:48 Paolo Bonzini wrote:
>> On 02/17/2010 10:24 AM, Artyom Tarasenko wrote:
>> >> I've also got a bunch of "sort of working, but not well enough
>> >> to run builds natively under" targets on top of that (arm big
>> >> endian, sh4, sparc...)
>> >
>> > What's not well enough on sparc?
>>
>>  From http://permalink.gmane.org/gmane.comp.emulators.qemu/63610:
>>
>> On Linux, sparc-softmmu can boot Linux (sparc-test) image, but QEMU
>> crashes just before command line. On OpenBSD, the same test reaches
>> command prompt.

That's status for sparc host. On x86 host, everything should work fine
except for a few known issues.

> Actually the sparc-test image from
> http://wiki.qemu.org/download/sparc-test-0.2.tar.gz boots and gets me a command
> line just fine, and I've never had it die with strange errors that look like
> mismatched system calls and such.  (Under Ubuntu 8.04, using qemu-git from a
> week or so back, but this behavior's been consistent since I first tried it.)
>
> That image is A) built with an unknown compiler, B) running glibc (not
> uClibc), C) a crippled toy image.  (It's a read-only root filesystem that
> hasn't got a mount point for /proc.  Obviously never meant to actually be used
> for anything but very simple smoke testing.)
>
> But it does imply that qemu is capable of decently running _something_ on
> sparc, so the problems I'm seeing are more likely to be uClibc or toolchain
> issues.
>
> Alas the image has no hint how to reproduce it.  Doesn't say what toolchain it
> was built with, what kernel .config was used, and so on.  (The arm one at least
> had /proc/config.gz...)
>
> Well, actually if you "mount -t proc proc lost+found" and then cat
> lost+found/version it says gcc version 2.95.4 20010319 (prerelease).  So it
> was built with a random cvs snapshot of egcs from 2001, configured who knows
> how, and it's running a 2.6.11 kernel from 5 years ago (again with who knows
> what .config).  So my problem could be that I'm using a kernel 22 versions
> newer, or I'm using gcc 4.2 toolchain, or that either is configured differently.

The compiler was probably Debian gcc 2.95 package as distributed that
time, not some random cvs snapshot of egcs. I can't find the original
kernel config because I have edited it since, but the attached version
should not be too far from it. The kernel itself is straight 2.6.11
plus this patch to fix TCX display. I think the ramdisk contents are
from the user emulator test set, I didn't build those.

Perhaps we should build a new set of test suites for all architectures
from a single known stack of tools and sources.
Artyom Tarasenko Feb. 18, 2010, 11:21 a.m. UTC | #19
2010/2/17 Rob Landley <rob@landley.net>:
> On Wednesday 17 February 2010 09:45:48 Paolo Bonzini wrote:
>> On 02/17/2010 10:24 AM, Artyom Tarasenko wrote:
>> >> I've also got a bunch of "sort of working, but not well enough
>> >> to run builds natively under" targets on top of that (arm big
>> >> endian, sh4, sparc...)
>> >
>> > What's not well enough on sparc?
>>
>>  From http://permalink.gmane.org/gmane.comp.emulators.qemu/63610:
>>
>> On Linux, sparc-softmmu can boot Linux (sparc-test) image, but QEMU
>> crashes just before command line. On OpenBSD, the same test reaches
>> command prompt.
>
> Actually the sparc-test image from
> http://wiki.qemu.org/download/sparc-test-0.2.tar.gz boots and gets me a command
> line just fine, and I've never had it die with strange errors that look like
> mismatched system calls and such.  (Under Ubuntu 8.04, using qemu-git from a
> week or so back, but this behavior's been consistent since I first tried it.)
>
> That image is A) built with an unknown compiler, B) running glibc (not
> uClibc), C) a crippled toy image.  (It's a read-only root filesystem that
> hasn't got a mount point for /proc.  Obviously never meant to actually be used
> for anything but very simple smoke testing.)
>
> But it does imply that qemu is capable of decently running _something_ on
> sparc, so the problems I'm seeing are more likely to be uClibc or toolchain
> issues.

qemu-sparc can decently run debian-40r8: gcc and all the other stuff
seem to work.

Most versions of NetBSD boot. Some require the original OBP
though. The only version I know of that definitely doesn't boot is
3.0.2.

Also since the last dma fix Solaris 2.4-2.5.1 seems to be also fully
functional. Don't have a suitable compiler to check whether it's
working under Solaris though.

Debian-40r8 should have all the necessary stuff to build the uClibc
toolchain, right?
Artyom Tarasenko Feb. 18, 2010, 11:38 a.m. UTC | #20
2010/2/17 Blue Swirl <blauwirbel@gmail.com>:
> On Wed, Feb 17, 2010 at 8:55 PM, Rob Landley <rob@landley.net> wrote:
>> On Wednesday 17 February 2010 09:45:48 Paolo Bonzini wrote:
>>> On 02/17/2010 10:24 AM, Artyom Tarasenko wrote:
>>> >> I've also got a bunch of "sort of working, but not well enough
>>> >> to run builds natively under" targets on top of that (arm big
>>> >> endian, sh4, sparc...)
>>> >
>>> > What's not well enough on sparc?
>>>
>>>  From http://permalink.gmane.org/gmane.comp.emulators.qemu/63610:
>>>
>>> On Linux, sparc-softmmu can boot Linux (sparc-test) image, but QEMU
>>> crashes just before command line. On OpenBSD, the same test reaches
>>> command prompt.
>
> That's status for sparc host. On x86 host, everything should work fine
> except for a few known issues.
>
>> Actually the sparc-test image from
>> http://wiki.qemu.org/download/sparc-test-0.2.tar.gz boots and gets me a command
>> line just fine, and I've never had it die with strange errors that look like
>> mismatched system calls and such.  (Under Ubuntu 8.04, using qemu-git from a
>> week or so back, but this behavior's been consistent since I first tried it.)
>>
>> That image is A) built with an unknown compiler, B) running glibc (not
>> uClibc), C) a crippled toy image.  (It's a read-only root filesystem that
>> hasn't got a mount point for /proc.  Obviously never meant to actually be used
>> for anything but very simple smoke testing.)
>>
>> But it does imply that qemu is capable of decently running _something_ on
>> sparc, so the problems I'm seeing are more likely to be uClibc or toolchain
>> issues.
>>
>> Alas the image has no hint how to reproduce it.  Doesn't say what toolchain it
>> was built with, what kernel .config was used, and so on.  (The arm one at least
>> had /proc/config.gz...)
>>
>> Well, actually if you "mount -t proc proc lost+found" and then cat
>> lost+found/version it says gcc version 2.95.4 20010319 (prerelease).  So it
>> was built with a random cvs snapshot of egcs from 2001, configured who knows
>> how, and it's running a 2.6.11 kernel from 5 years ago (again with who knows
>> what .config).  So my problem could be that I'm using a kernel 22 versions
>> newer, or I'm using gcc 4.2 toolchain, or that either is configured differently.
>
> The compiler was probably Debian gcc 2.95 package as distributed that
> time, not some random cvs snapshot of egcs. I can't find the original
> kernel config because I have edited it since, but the attached version
> should not be too far from it. The kernel itself is straight 2.6.11
> plus this patch to fix TCX display. I think the ramdisk contents are
> from the user emulator test set, I didn't build those.
>
> Perhaps we should build a new set of test suites for all architectures
> from a single known stack of tools and sources.

And preferably still based on a kernel version old enough that it wasn't
qemu-aware. Comments in the kernel source like "this could be a qemu bug", from
Rob's "proper fix" mail
(http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-January/079436.html)
scare me.
Rob Landley Feb. 18, 2010, 1:05 p.m. UTC | #21
On Wednesday 17 February 2010 14:46:15 Blue Swirl wrote:
> > Alas the image has no hint how to reproduce it.  Doesn't say what
> > toolchain it was built with, what kernel .config was used, and so on.
> >  (The arm one at least had /proc/config.gz...)
> >
> > Well, actually if you "mount -t proc proc lost+found" and then cat
> > lost+found/version it says gcc version 2.95.4 20010319 (prerelease).  So
> > it was built with a random cvs snapshot of egcs from 2001, configured who
> > knows how, and it's running a 2.6.11 kernel from 5 years ago (again with
> > who knows what .config).  So my problem could be that I'm using a kernel
> > 22 versions newer, or I'm using gcc 4.2 toolchain, or that either is
> > configured differently.
>
> The compiler was probably Debian gcc 2.95 package as distributed that
> time, not some random cvs snapshot of egcs.

Ok.  It was the word "prerelease" that made me think snapshot.

I don't suppose you know what the --target tuple that compiler was configured 
with was?  (Should be in the output of "gcc -v"?)

> I can't find the original
> kernel config because I have edited it since, but the attached version
> should not be too far from it. The kernel itself is straight 2.6.11
> plus this patch to fix TCX display. I think the ramdisk contents are
> from the user emulator test set, I didn't build those.

Cool.  The point is, what you've got works under qemu, so the problems I'm 
seeing are less likely to be qemu and more likely to be uClibc or toolchain 
config.  I'll compare your kernel .config with mine.

I'm using serial console, not graphics, so the display patch shouldn't affect 
me directly.

Ben Taylor has offered to try my root filesystem on some real Sparc hardware if 
he can find a Linux for sparc boot cd that works on what he's got.  (The uClibc 
guys say that sparc works for them on real hardware, or did at one point...  
Always fun having 4 or more possible reasons for behavior to be wonky.  
Toolchain, libc, kernel, emulator...  Wheee...)

> Perhaps we should build a new set of test suites for all architectures
> from a single known stack of tools and sources.

That's pretty much exactly what I'm trying to do with my project.  If you look 
in download.sh the URLs for all the source packages I'm downloading are in one 
place, and all the per-target information is in the sources/targets 
directories.  The build scripts themselves are completely target-agnostic.

My release goal for the 1.0 version of my project is "have at least one 
working image for every qemu-system-blah that's actually capable of booting 
Linux".

I've got about half of 'em so far.  I need to tackle the 64-bit ones sooner or 
later.  And then break down and deal with nommu...

Rob
Rob Landley Feb. 18, 2010, 1:14 p.m. UTC | #22
On Thursday 18 February 2010 05:21:16 Artyom Tarasenko wrote:
> 2010/2/17 Rob Landley <rob@landley.net>:
> qemu-sparc can decently run debian-40r8: gcc and all the other stuff
> seem to work.
>
> Most versions of NetBSD boot. Some require the original OBP
> though. The only version I know of that definitely doesn't boot is
> 3.0.2.
>
> Also since the last dma fix Solaris 2.4-2.5.1 seems to be also fully
> functional. Don't have a suitable compiler to check whether it's
> working under Solaris though.
>
> Debian-40r8 should have all the necessary stuff to build the uClibc
> toolchain, right?

I'm happy to boot it under the emulator and poke around.  Do you have a URL to 
the install ISO you tried?

A quick google found:

  http://mirror.leaseweb.com/debian-cdimage/archive/4.0_r8/sparc/iso-cd/

Which has a 19-part iso image install.  *boggle*  (Reminds me of installing 
OS/2 from floppies...)

Possibly the "netinst" one...?  Right, I'll give it a whirl.

Thanks,

Rob
Rob Landley Feb. 18, 2010, 1:17 p.m. UTC | #23
On Thursday 18 February 2010 05:38:01 Artyom Tarasenko wrote:
> 2010/2/17 Blue Swirl <blauwirbel@gmail.com>:
> > On Wed, Feb 17, 2010 at 8:55 PM, Rob Landley <rob@landley.net> wrote:
> >> On Wednesday 17 February 2010 09:45:48 Paolo Bonzini wrote:
> >>> On 02/17/2010 10:24 AM, Artyom Tarasenko wrote:
> >>> >> I've also got a bunch of "sort of working, but not well enough
> >>> >> to run builds natively under" targets on top of that (arm big
> >>> >> endian, sh4, sparc...)
> >>> >
> >>> > What's not well enough on sparc?
> >>>
> >>>  From http://permalink.gmane.org/gmane.comp.emulators.qemu/63610:
> >>>
> >>> On Linux, sparc-softmmu can boot Linux (sparc-test) image, but QEMU
> >>> crashes just before command line. On OpenBSD, the same test reaches
> >>> command prompt.
> >
> > That's status for sparc host. On x86 host, everything should work fine
> > except for a few known issues.
> >
> >> Actually the sparc-test image from
> >> http://wiki.qemu.org/download/sparc-test-0.2.tar.gz boots and gets me a
> >> command line just fine, and I've never had it die with strange errors
> >> that look like mismatched system calls and such. (Under Ubuntu 8.04,
> >> using qemu-git from a week or so back, but this behavior's been
> >> consistent since I first tried it.)
> >>
> >> That image is A) built with an unknown compiler, B) running glibc (not
> >> uClibc), C) a crippled toy image.  (It's a read-only root filesystem
> >> that hasn't got a mount point for /proc.  Obviously never meant to
> >> actually be used for anything but very simple smoke testing.)
> >>
> >> But it does imply that qemu is capable of decently running _something_
> >> on sparc, so the problems I'm seeing are more likely to be uClibc or
> >> toolchain issues.
> >>
> >> Alas the image has no hint how to reproduce it.  Doesn't say what
> >> toolchain it was built with, what kernel .config was used, and so on.
> >>  (The arm one at least had /proc/config.gz...)
> >>
> >> Well, actually if you "mount -t proc proc lost+found" and then cat
> >> lost+found/version it says gcc version 2.95.4 20010319 (prerelease).  So
> >> it was built with a random cvs snapshot of egcs from 2001, configured
> >> who knows how, and it's running a 2.6.11 kernel from 5 years ago (again
> >> with who knows what .config).  So my problem could be that I'm using a
> >> kernel 22 versions newer, or I'm using gcc 4.2 toolchain, or that either
> >> is configured differently.
> >
> > The compiler was probably Debian gcc 2.95 package as distributed that
> > time, not some random cvs snapshot of egcs. I can't find the original
> > kernel config because I have edited it since, but the attached version
> > should not be too far from it. The kernel itself is straight 2.6.11
> > plus this patch to fix TCX display. I think the ramdisk contents are
> > from the user emulator test set, I didn't build those.
> >
> > Perhaps we should build a new set of test suites for all architectures
> > from a single known stack of tools and sources.
>
> And preferably still based on a kernel version old enough that it wasn't
> qemu-aware. Comments in the kernel source like "this could be a qemu bug",
> from Rob's "proper fix" mail
> (http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-January/079436.html)
> scare me.

Unless sparc is also using the zilog serial chip (the driver for which has 
"pmac" in its name), that was a power macintosh issue. :)

And yeah, qemu's behavior was apparently a bit iffy with regard to what the 
hardware was actually doing, but not beyond what the datasheets said could 
happen, and the kernel guys put in a workaround...

Rob
Artyom Tarasenko Feb. 18, 2010, 2:10 p.m. UTC | #24
2010/2/18 Rob Landley <rob@landley.net>:
> On Thursday 18 February 2010 05:38:01 Artyom Tarasenko wrote:
>> 2010/2/17 Blue Swirl <blauwirbel@gmail.com>:
>> > On Wed, Feb 17, 2010 at 8:55 PM, Rob Landley <rob@landley.net> wrote:
>> >> On Wednesday 17 February 2010 09:45:48 Paolo Bonzini wrote:
>> >>> On 02/17/2010 10:24 AM, Artyom Tarasenko wrote:
>> >>> >> I've also got a bunch of "sort of working, but not well enough
>> >>> >> to run builds natively under" targets on top of that (arm big
>> >>> >> endian, sh4, sparc...)
>> >>> >
>> >>> > What's not well enough on sparc?
>> >>>
>> >>>  From http://permalink.gmane.org/gmane.comp.emulators.qemu/63610:
>> >>>
>> >>> On Linux, sparc-softmmu can boot Linux (sparc-test) image, but QEMU
>> >>> crashes just before command line. On OpenBSD, the same test reaches
>> >>> command prompt.
>> >
>> > That's status for sparc host. On x86 host, everything should work fine
>> > except for a few known issues.
>> >
>> >> Actually the sparc-test image from
>> >> http://wiki.qemu.org/download/sparc-test-0.2.tar.gz boots and gets me a
>> >> command line just fine, and I've never had it die with strange errors
>> >> that look like mismatched system calls and such. (Under Ubuntu 8.04,
>> >> using qemu-git from a week or so back, but this behavior's been
>> >> consistent since I first tried it.)
>> >>
>> >> That image is A) built with an unknown compiler, B) running glibc (not
>> >> uClibc), C) a crippled toy image.  (It's a read-only root filesystem
>> >> that hasn't got a mount point for /proc.  Obviously never meant to
>> >> actually be used for anything but very simple smoke testing.)
>> >>
>> >> But it does imply that qemu is capable of decently running _something_
>> >> on sparc, so the problems I'm seeing are more likely to be uClibc or
>> >> toolchain issues.
>> >>
>> >> Alas the image has no hint how to reproduce it.  Doesn't say what
>> >> toolchain it was built with, what kernel .config was used, and so on.
>> >>  (The arm one at least had /proc/config.gz...)
>> >>
>> >> Well, actually if you "mount -t proc proc lost+found" and then cat
>> >> lost+found/version it says gcc version 2.95.4 20010319 (prerelease).  So
>> >> it was built with a random cvs snapshot of egcs from 2001, configured
>> >> who knows how, and it's running a 2.6.11 kernel from 5 years ago (again
>> >> with who knows what .config).  So my problem could be that I'm using a
>> >> kernel 22 versions newer, or I'm using gcc 4.2 toolchain, or that either
>> >> is configured differently.
>> >
>> > The compiler was probably Debian gcc 2.95 package as distributed that
>> > time, not some random cvs snapshot of egcs. I can't find the original
>> > kernel config because I have edited it since, but the attached version
>> > should not be too far from it. The kernel itself is straight 2.6.11
>> > plus this patch to fix TCX display. I think the ramdisk contents are
>> > from the user emulator test set, I didn't build those.
>> >
>> > Perhaps we should build a new set of test suites for all architectures
>> > from a single known stack of tools and sources.
>>
>> And preferably still based on a kernel version old enough that it wasn't
>> qemu-aware. Comments in the kernel source like "this could be a qemu bug",
>> from Rob's "proper fix" mail
>> (http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-January/079436.html)
>> scare me.
>
> Unless sparc is also using the zilog serial chip (the driver for which has
> "pmac" in its name), that was a power macintosh issue. :)

Sparc does use the Zilog serial chip, although according to hw/escc.c it is a bit
different from the pmac chip. There seem to be issues with its emulation;
for instance, Solaris 1.1.2 hangs after putting 2 bytes (which I
believe is the hw buffer size):

http://tyom.blogspot.com/2009/09/wh-sunos-414-under-qemu-system-sparc.html

> And yeah, qemu's behavior was apparently a bit iffy with regard to what the
> hardware was actually doing, but not beyond what the datasheets said could
> happen, and the kernel guys put in a workaround...

Good for the kernel, bad for qemu: lots of chips have bad
documentation. Nevertheless, they have some definite behavior, which
qemu should emulate. Now the Linux kernel has this workaround, which makes
it more robust but less useful for regression testing.
Artyom Tarasenko Feb. 18, 2010, 2:19 p.m. UTC | #25
2010/2/18 Rob Landley <rob@landley.net>:
> On Thursday 18 February 2010 05:21:16 Artyom Tarasenko wrote:
>> 2010/2/17 Rob Landley <rob@landley.net>:
>> qemu-sparc can decently run debian-40r8: gcc and all the other stuff
>> seem to work.
>>
>> Most versions of NetBSD boot. Some require the original OBP
>> though. The only version I know of that definitely doesn't boot is
>> 3.0.2.
>>
>> Also since the last dma fix Solaris 2.4-2.5.1 seems to be also fully
>> functional. Don't have a suitable compiler to check whether it's
>> working under Solaris though.
>>
>> Debian-40r8 should have all the necessary stuff to build the uClibc
>> toolchain, right?
>
> I'm happy to boot it under the emulator and poke around.  Do you have a URL to
> the install ISO you tried?

http://cdimage.debian.org/cdimage/archive/4.0_r8/sparc/iso-dvd/

It's just a link from the official Etch documentation:
http://www.debian.org/releases/etch/debian-installer/

>
> A quick google found:
>
>  http://mirror.leaseweb.com/debian-cdimage/archive/4.0_r8/sparc/iso-cd/
>
> Which has a 19-part iso image install.  *boggle*  (Reminds me of installing
> OS/2 from floppies...)

Yeah. While installing OS/2 2.x, its multitasking was handy: only the first ~3
floppies were actually needed; all the other ones could be
written from the images in a process running in parallel with the installation.

>
> Possibly the "netinst" one...?  Right, I'll give it a whirl.

Check here: http://www.debian.org/releases/etch/debian-installer/
Rob Landley Feb. 20, 2010, 5:17 p.m. UTC | #26
On Thursday 18 February 2010 05:21:16 Artyom Tarasenko wrote:
> 2010/2/17 Rob Landley <rob@landley.net>:
> > But it does imply that qemu is capable of decently running _something_ on
> > sparc, so the problems I'm seeing are more likely to be uClibc or
> > toolchain issues.
>
> qemu-sparc can decently run debian-40r8: gcc and all the other stuff
> seem to work.
>
> Most versions of the NetBSD boot. Some require the original OBP
> though. The only known to me version which definitely doesn't boot is
> 3.0.2.
>
> Also since the last dma fix Solaris 2.4-2.5.1 seems to be also fully
> functional. Don't have a suitable compiler to check whether it's
> working under Solaris though.
>
> Debian-40r8 should have all the necessary stuff to build the uClibc
> toolchain, right?

So I did a network install of that Debian image into a 4 gig disk image, and 
made some progress.

First a quick bug report: qemu-system-sparc tries to set the video window to 
900 pixels vertical, but my laptop's display is only 800 pixels tall, and the 
window manager trims it a bit more than that for the toolbar.  The kernel 
booting up seems to think the graphics window is still its original size and 
renders text off the bottom of it.  But for some reason I can grab the window 
and resize it, and when I do this the emulated kernel's frame buffer gets the 
update and resizes its console to show the correct number of lines of text for 
the new size!  (So my question is, why didn't it get the size right when the 
window manager first resized it before I manually resized it again?)

Anyway: yay emulated sparc debian, I installed it, got a reasonable 
environment going, extracted my root filesystem image under there and chrooted 
into it... and everything worked fine.  (Well, trying to run a dynamically 
linked "hello world" still died with a bus error, but using the static busybox 
I could mount a tmpfs and list its contents, which I never could before.)

My plan had been to use sparc-debian's copy of gdb to track down why the 
binaries were going funky... but in that environment, they were behaving 
themselves.  Same binaries, built with the same toolchain, same 
qemu-system-sparc, same -M and -cpu and so on...

So I think "A-ha!  Booting a different kernel!  That's gotta be it!"

The debian-sparc image is using a 2.6.18 kernel (and I'm using a 2.6.32 
kernel), but it installed the relevant .config in /boot, so I copied that out 
with scp, did a "make oldconfig" up to 2.6.32 (holding down the enter key until 
it shut up), stripped out all the modules and disabled module support, put 
back in CONFIG_SERIAL_SUNZILOG_CONSOLE=y and friends, procfs, sysfs, and tmpfs 
(strange things to have as modules?), and CONFIG_SQUASHFS (that's my default 
root filesystem format).

I booted the result up with init=/bin/ash, did a "mount -t tmpfs /tmp /tmp", 
and then:

  / # ls -l /tmp
  Illegal instruction

It's still misbehaving.  Huh.

This is as close as I can get to the debian kernel config without adding module 
support to my images (which is unnecessary complication for what they do).  I 
can try an ext2 root filesystem image but I don't see how that would cause 
this.

The part I don't understand is that same busybox binary, built with the same 
toolchain, worked just fine under the Debian kernel.  I'd blame my toolchain, 
but in a slightly different context THE BINARIES WORKED...

I don't understand what's going wrong here.  Did the kernel break on sparc 
sometime between 2.6.18 and 2.6.32 and nobody noticed?  Is sparc using 
software emulated floating point at the kernel level and that's configured as a 
module?  (Except I don't think busybox ls uses floating point...)

Do any sparc people understand what's going on here?  My next step is to grab 
a 2.6.18 kernel and try to get _that_ to work with the tweaked debian config 
(and an ext2 root filesystem since squashfs wasn't merged back then and had a 
format change when it was merged).  But I'm mostly flailing around blind 
here...

Thanks,

Rob
Blue Swirl Feb. 20, 2010, 5:34 p.m. UTC | #27
On 2/20/10, Rob Landley <rob@landley.net> wrote:
> On Thursday 18 February 2010 05:21:16 Artyom Tarasenko wrote:
>  > 2010/2/17 Rob Landley <rob@landley.net>:
>  > > But it does imply that qemu is capable of decently running _something_ on
>  > > sparc, so the problems I'm seeing are more likely to be uClibc or
>  > > toolchain issues.
>  >
>  > qemu-sparc can decently run debian-40r8: gcc and all the other stuff
>  > seem to work.
>  >
>  > Most versions of the NetBSD boot. Some require the original OBP
>  > though. The only known to me version which definitely doesn't boot is
>  > 3.0.2.
>  >
>  > Also since the last dma fix Solaris 2.4-2.5.1 seems to be also fully
>  > functional. Don't have a suitable compiler to check whether it's
>  > working under Solaris though.
>  >
>  > Debian-40r8 should have all the necessary stuff to build the uClibc
>  > toolchain, right?
>
>  So I did a network install of that Debian image into a 4 gig disk image, and
>  made some progress.
>
>  First a quick bug report: qemu-system-sparc tries to set the video window to
>  900 pixels vertical, but my laptop's display is only 800 pixels tall, and the
>  window manager trims it a bit more than that for the toolbar.  The kernel
>  booting up seems to think the graphics window is still its original size
>  renders text off the bottom of it.  But for some reason I can grab the window
>  and resize it, and when I do this the emulated kernel's frame buffer gets the
>  update and resizes its console to show the correct number of lines of text for
>  the new size!  (So my question is, why didn't it get the size right when the
>  window manager first resized it before I manually resized it again?)
>
>  Anyway: yay emulated sparc debian, I installed it, got a reasonable
>  environment going, extracted my root filesystem image under there and chrooted
>  into it... and everything worked fine.  (Well, trying to run a dynamically
>  linked "hello world" still died with a bus error, but using the static busybox
>  I could mount a tmpfs and list its contents, which I never could before.)
>
>  My plan had been to use sparc-debian's copy of gdb to track down why the
>  binaries were going funky... but in that environment, they were behaving
>  themselves.  Same binaries, built with the same toolchain, same qemu-system-
>  sparc, same -M and -cpu and so on...
>
>  So I think "A-ha!  Booting a different kernel!  That's gotta be it!"
>
>  The debian-sparc image is using a 2.6.18 kernel (and I'm using a 2.6.32
>  kernel), but it installed the relevant .config in /boot, so I copied that out
>  with scp, did a "make oldconfig" up to 2.6.32 (holding down the enter key until
>  it shut up), stripped out all the modules and disabled module support, put
>  back in CONFIG_SERIAL_SUNZILOG_CONSOLE=y and friends, procfs, sysfs, and tmpfs
>  (strange things to have as modules?), and CONFIG_SQUASHFS (that's my default
>  root filesystem format).
>
>  I booted the result up with init=/bin/ash, did a "mount -t tmpfs /tmp /tmp",
>  and then:
>
>   / # ls -l /tmp
>   Illegal instruction
>
>  It's still misbehaving.  Huh.
>
>  This is as close as I can get to the debian kernel config without adding module
>  support to my images (which is unnecessary complication for what they do).  I
>  can try an ext2 root filesystem image but I don't see how that would cause
>  this.
>
>  The part I don't understand is that same busybox binary, built with the same
>  toolchain, worked just fine under the Debian kernel.  I'd blame my toolchain,
>  but in a slightly different context THE BINARIES WORKED...
>
>  I don't understand what's going wrong here.  Did the kernel break on sparc
>  sometime between 2.6.18 and 2.6.32 and nobody noticed?  Is sparc using
>  software emulated floating point at the kernel level and that's configured as a
>  module?  (Except I don't think busybox ls uses floating point...)

Sparc32 is no longer maintained, so maybe it broke at some point.
There was some discussion a few years ago.

>  Do any sparc people understand what's going on here?  My next step is to grab
>  a 2.6.18 kernel and try to get _that_ to work with the tweaked debian config
>  (and an ext2 root filesystem since squashfs wasn't merged back then and had a
>  format change when it was merged).  But I'm mostly flailing around blind
>  here...

I'm also trying different kernels using my .config. But even 2.6.12
hangs in the ESP probe.
Rob Landley Feb. 20, 2010, 6:38 p.m. UTC | #28
On Saturday 20 February 2010 11:34:44 Blue Swirl wrote:
> On 2/20/10, Rob Landley <rob@landley.net> wrote:
> >  I don't understand what's going wrong here.  Did the kernel break on
> > sparc sometime between 2.6.18 and 2.6.32 and nobody noticed?  Is sparc
> > using software emulated floating point at the kernel level and that's
> > configured as a module?  (Except I don't think busybox ls uses floating
> > point...)
>
> Sparc32 is not maintained anymore so maybe it broke at some point.
> There was some discussion a few years ago.

Not maintained on the Linux kernel side, or not maintained under qemu?  It 
seems to be working under debian, but the 2.6.18 kernel is from 2006.

> >  Do any sparc people understand what's going on here?  My next step is to
> > grab a 2.6.18 kernel and try to get _that_ to work with the tweaked
> > debian config (and an ext2 root filesystem since squashfs wasn't merged
> > back then and had a format change when it was merged).  But I'm mostly
> > flailing around blind here...
>
> I'm also trying different kernels using my .config. But already 2.6.12
> hangs in ESP probe.

I've got 2.6.32 booting to a command prompt (albeit with a serial console and 
an intentionally restricted set of hardware).  But then it misbehaves.

I'll try getting 2.6.18 to build with a known .config, and then bisect forward 
if that seems to work...

Thanks,

Rob
Artyom Tarasenko Feb. 20, 2010, 9:39 p.m. UTC | #29
2010/2/20 Blue Swirl <blauwirbel@gmail.com>:
> On 2/20/10, Rob Landley <rob@landley.net> wrote:
>> On Thursday 18 February 2010 05:21:16 Artyom Tarasenko wrote:
>>  > 2010/2/17 Rob Landley <rob@landley.net>:
>>  > > But it does imply that qemu is capable of decently running _something_ on
>>  > > sparc, so the problems I'm seeing are more likely to be uClibc or
>>  > > toolchain issues.
>>  >
>>  > qemu-sparc can decently run debian-40r8: gcc and all the other stuff
>>  > seem to work.
>>  >
>>  > Most versions of the NetBSD boot. Some require the original OBP
>>  > though. The only known to me version which definitely doesn't boot is
>>  > 3.0.2.
>>  >
>>  > Also since the last dma fix Solaris 2.4-2.5.1 seems to be also fully
>>  > functional. Don't have a suitable compiler to check whether it's
>>  > working under Solaris though.
>>  >
>>  > Debian-40r8 should have all the necessary stuff to build the uClibc
>>  > toolchain, right?
>>
>>  So I did a network install of that Debian image into a 4 gig disk image, and
>>  made some progress.
>>
>>  First a quick bug report: qemu-system-sparc tries to set the video window to
>>  900 pixels vertical, but my laptop's display is only 800 pixels tall, and the
>>  window manager trims it a bit more than that for the toolbar.  The kernel
>>  booting up seems to think the graphics window is still its original size
>>  renders text off the bottom of it.  But for some reason I can grab the window
>>  and resize it, and when I do this the emulated kernel's frame buffer gets the
>>  update and resizes its console to show the correct number of lines of text for
>>  the new size!  (So my question is, why didn't it get the size right when the
>>  window manager first resized it before I manually resized it again?)
>>
>>  Anyway: yay emulated sparc debian, I installed it, got a reasonable
>>  environment going, extracted my root filesystem image under there and chrooted
>>  into it... and everything worked fine.  (Well, trying to run a dynamically
>>  linked "hello world" still died with a bus error, but using the static busybox
>>  I could mount a tmpfs and list its contents, which I never could before.)
>>
>>  My plan had been to use sparc-debian's copy of gdb to track down why the
>>  binaries were going funky... but in that environment, they were behaving
>>  themselves.  Same binaries, built with the same toolchain, same qemu-system-
>>  sparc, same -M and -cpu and so on...
>>
>>  So I think "A-ha!  Booting a different kernel!  That's gotta be it!"
>>
>>  The debian-sparc image is using a 2.6.18 kernel (and I'm using a 2.6.32
>>  kernel), but it installed the relevant .config in /boot, so I copied that out
>>  with scp, did a "make oldconfig" up to 2.6.32 (holding down the enter key until
>>  it shut up), stripped out all the modules and disabled module support, put
>>  back in CONFIG_SERIAL_SUNZILOG_CONSOLE=y and friends, procfs, sysfs, and tmpfs
>>  (strange things to have as modules?), and CONFIG_SQUASHFS (that's my default
>>  root filesystem format).
>>
>>  I booted the result up with init=/bin/ash, did a "mount -t tmpfs /tmp /tmp",
>>  and then:
>>
>>   / # ls -l /tmp
>>   Illegal instruction
>>
>>  It's still misbehaving.  Huh.
>>
>>  This is as close as I can get to the debian kernel config without adding module
>>  support to my images (which is unnecessary complication for what they do).  I
>>  can try an ext2 root filesystem image but I don't see how that would cause
>>  this.
>>
>>  The part I don't understand is that same busybox binary, built with the same
>>  toolchain, worked just fine under the Debian kernel.  I'd blame my toolchain,
>>  but in a slightly different context THE BINARIES WORKED...
>>
>>  I don't understand what's going wrong here.  Did the kernel break on sparc
>>  sometime between 2.6.18 and 2.6.32 and nobody noticed?  Is sparc using
>>  software emulated floating point at the kernel level and that's configured as a
>>  module?  (Except I don't think busybox ls uses floating point...)
>
> Sparc32 is not maintained anymore so maybe it broke at some point.
> There was some discussion a few years ago.
>
>>  Do any sparc people understand what's going on here?  My next step is to grab
>>  a 2.6.18 kernel and try to get _that_ to work with the tweaked debian config
>>  (and an ext2 root filesystem since squashfs wasn't merged back then and had a
>>  format change when it was merged).  But I'm mostly flailing around blind
>>  here...
>
> I'm also trying different kernels using my .config. But already 2.6.12
> hangs in ESP probe.

Does it work on real hardware? 2.6.18 definitely does.
We still have bug(s) in ESP, though: Solaris also hangs in the ESP probe
after a soft reset in OBP.
Blue Swirl Feb. 20, 2010, 9:59 p.m. UTC | #30
On 2/20/10, Rob Landley <rob@landley.net> wrote:
> On Saturday 20 February 2010 11:34:44 Blue Swirl wrote:
>  > On 2/20/10, Rob Landley <rob@landley.net> wrote:
>
> > >  I don't understand what's going wrong here.  Did the kernel break on
>  > > sparc sometime between 2.6.18 and 2.6.32 and nobody noticed?  Is sparc
>  > > using software emulated floating point at the kernel level and that's
>  > > configured as a module?  (Except I don't think busybox ls uses floating
>  > > point...)
>  >
>  > Sparc32 is not maintained anymore so maybe it broke at some point.
>  > There was some discussion a few years ago.
>
>
> Not maintained on the Linux kernel side, or not maintained under qemu?  It
>  seems to be working under debian, but the 2.6.18 kernel is from 2006.

On the kernel side. I try to maintain the QEMU side.

>  > >  Do any sparc people understand what's going on here?  My next step is to
>  > > grab a 2.6.18 kernel and try to get _that_ to work with the tweaked
>  > > debian config (and an ext2 root filesystem since squashfs wasn't merged
>  > > back then and had a format change when it was merged).  But I'm mostly
>  > > flailing around blind here...
>  >
>  > I'm also trying different kernels using my .config. But already 2.6.12
>  > hangs in ESP probe.
>
>
> I've got 2.6.32 booting to a command prompt (albeit with serial console and
>  intentionally restricted set of hardware).  But then it misbehaves.
>
>  I'll try getting 2.6.18 to build with a known .config, and then bisect forward
>  if that seems to work...

Good plan. Bisecting backwards could be interesting too, to find out
which releases are actually working out of the box.
Artyom Tarasenko Feb. 20, 2010, 9:59 p.m. UTC | #31
2010/2/20 Rob Landley <rob@landley.net>:
> On Saturday 20 February 2010 11:34:44 Blue Swirl wrote:
>> On 2/20/10, Rob Landley <rob@landley.net> wrote:
>> >  I don't understand what's going wrong here.  Did the kernel break on
>> > sparc sometime between 2.6.18 and 2.6.32 and nobody noticed?  Is sparc
>> > using software emulated floating point at the kernel level and that's
>> > configured as a module?  (Except I don't think busybox ls uses floating
>> > point...)
>>
>> Sparc32 is not maintained anymore so maybe it broke at some point.
>> There was some discussion a few years ago.
>
> Not maintained on the Linux kernel side, or not maintained under qemu?  It
> seems to be working under debian, but the 2.6.18 kernel is from 2006.
>
>> >  Do any sparc people understand what's going on here?  My next step is to
>> > grab a 2.6.18 kernel and try to get _that_ to work with the tweaked
>> > debian config (and an ext2 root filesystem since squashfs wasn't merged
>> > back then and had a format change when it was merged).  But I'm mostly
>> > flailing around blind here...
>>
>> I'm also trying different kernels using my .config. But already 2.6.12
>> hangs in ESP probe.
>
> I've got 2.6.32 booting to a command prompt (albeit with serial console and
> intentionally restricted set of hardware).  But then it misbehaves.
>
> I'll try getting 2.6.18 to build with a known .config, and then bisect forward
> if that seems to work...

You can also try Aurora Linux. They had a somewhat newer kernel. I don't know
how stable it is on real hardware, though.
Blue Swirl Feb. 20, 2010, 10:03 p.m. UTC | #32
On 2/20/10, Artyom Tarasenko <atar4qemu@googlemail.com> wrote:
> 2010/2/20 Blue Swirl <blauwirbel@gmail.com>:
>
> > On 2/20/10, Rob Landley <rob@landley.net> wrote:
>  >> On Thursday 18 February 2010 05:21:16 Artyom Tarasenko wrote:
>  >>  > 2010/2/17 Rob Landley <rob@landley.net>:
>  >>  > > But it does imply that qemu is capable of decently running _something_ on
>  >>  > > sparc, so the problems I'm seeing are more likely to be uClibc or
>  >>  > > toolchain issues.
>  >>  >
>  >>  > qemu-sparc can decently run debian-40r8: gcc and all the other stuff
>  >>  > seem to work.
>  >>  >
>  >>  > Most versions of the NetBSD boot. Some require the original OBP
>  >>  > though. The only known to me version which definitely doesn't boot is
>  >>  > 3.0.2.
>  >>  >
>  >>  > Also since the last dma fix Solaris 2.4-2.5.1 seems to be also fully
>  >>  > functional. Don't have a suitable compiler to check whether it's
>  >>  > working under Solaris though.
>  >>  >
>  >>  > Debian-40r8 should have all the necessary stuff to build the uClibc
>  >>  > toolchain, right?
>  >>
>  >>  So I did a network install of that Debian image into a 4 gig disk image, and
>  >>  made some progress.
>  >>
>  >>  First a quick bug report: qemu-system-sparc tries to set the video window to
>  >>  900 pixels vertical, but my laptop's display is only 800 pixels tall, and the
>  >>  window manager trims it a bit more than that for the toolbar.  The kernel
>  >>  booting up seems to think the graphics window is still its original size
>  >>  renders text off the bottom of it.  But for some reason I can grab the window
>  >>  and resize it, and when I do this the emulated kernel's frame buffer gets the
>  >>  update and resizes its console to show the correct number of lines of text for
>  >>  the new size!  (So my question is, why didn't it get the size right when the
>  >>  window manager first resized it before I manually resized it again?)
>  >>
>  >>  Anyway: yay emulated sparc debian, I installed it, got a reasonable
>  >>  environment going, extracted my root filesystem image under there and chrooted
>  >>  into it... and everything worked fine.  (Well, trying to run a dynamically
>  >>  linked "hello world" still died with a bus error, but using the static busybox
>  >>  I could mount a tmpfs and list its contents, which I never could before.)
>  >>
>  >>  My plan had been to use sparc-debian's copy of gdb to track down why the
>  >>  binaries were going funky... but in that environment, they were behaving
>  >>  themselves.  Same binaries, built with the same toolchain, same qemu-system-
>  >>  sparc, same -M and -cpu and so on...
>  >>
>  >>  So I think "A-ha!  Booting a different kernel!  That's gotta be it!"
>  >>
>  >>  The debian-sparc image is using a 2.6.18 kernel (and I'm using a 2.6.32
>  >>  kernel), but it installed the relevant .config in /boot, so I copied that out
>  >>  with scp, did a "make oldconfig" up to 2.6.32 (holding down the enter key until
>  >>  it shut up), stripped out all the modules and disabled module support, put
>  >>  back in CONFIG_SERIAL_SUNZILOG_CONSOLE=y and friends, procfs, sysfs, and tmpfs
>  >>  (strange things to have as modules?), and CONFIG_SQUASHFS (that's my default
>  >>  root filesystem format).
>  >>
>  >>  I booted the result up with init=/bin/ash, did a "mount -t tmpfs /tmp /tmp",
>  >>  and then:
>  >>
>  >>   / # ls -l /tmp
>  >>   Illegal instruction
>  >>
>  >>  It's still misbehaving.  Huh.
>  >>
>  >>  This is as close as I can get to the debian kernel config without adding module
>  >>  support to my images (which is unnecessary complication for what they do).  I
>  >>  can try an ext2 root filesystem image but I don't see how that would cause
>  >>  this.
>  >>
>  >>  The part I don't understand is that same busybox binary, built with the same
>  >>  toolchain, worked just fine under the Debian kernel.  I'd blame my toolchain,
>  >>  but in a slightly different context THE BINARIES WORKED...
>  >>
>  >>  I don't understand what's going wrong here.  Did the kernel break on sparc
>  >>  sometime between 2.6.18 and 2.6.32 and nobody noticed?  Is sparc using
>  >>  software emulated floating point at the kernel level and that's configured as a
>  >>  module?  (Except I don't think busybox ls uses floating point...)
>  >
>  > Sparc32 is not maintained anymore so maybe it broke at some point.
>  > There was some discussion a few years ago.
>  >
>  >>  Do any sparc people understand what's going on here?  My next step is to grab
>  >>  a 2.6.18 kernel and try to get _that_ to work with the tweaked debian config
>  >>  (and an ext2 root filesystem since squashfs wasn't merged back then and had a
>  >>  format change when it was merged).  But I'm mostly flailing around blind
>  >>  here...
>  >
>  > I'm also trying different kernels using my .config. But already 2.6.12
>  > hangs in ESP probe.
>
>
> Does it work on a real hw? 2.6.18 definitely does.
>  We still have bug(s) in ESP though: Solaris also hangs in ESP probe
>  after a soft reset in OBP.

Haven't tested. ESP actually seems to work on 2.6.13; at least the CD-ROM
is identified without a hang. There is some problem with the initrd, though.
Rob Landley Feb. 20, 2010, 11:12 p.m. UTC | #33
On Saturday 20 February 2010 15:59:31 Blue Swirl wrote:
> > I've got 2.6.32 booting to a command prompt (albeit with serial console
> > and intentionally restricted set of hardware).  But then it misbehaves.
> >
> >  I'll try getting 2.6.18 to build with a known .config, and then bisect
> > forward if that seems to work...
>
> Good plan. Bisecting backwards could be interesting too, to find out
> which releases are actually working out of the box.

I started by iterating through the release versions.  It's working up through 
2.6.28, then 2.6.29 has the out of memory error in my init script.

Bisecting now...

Rob
Rob Landley Feb. 21, 2010, 4:25 p.m. UTC | #34
On Saturday 20 February 2010 17:12:22 Rob Landley wrote:
> On Saturday 20 February 2010 15:59:31 Blue Swirl wrote:
> > > I've got 2.6.32 booting to a command prompt (albeit with serial console
> > > and intentionally restricted set of hardware).  But then it misbehaves.
> > >
> > >  I'll try getting 2.6.18 to build with a known .config, and then bisect
> > > forward if that seems to work...
> >
> > Good plan. Bisecting backwards could be interesting too, to find out
> > which releases are actually working out of the box.
>
> I started by iterating through the release versions.  It's working up
> through 2.6.28, then 2.6.29 has the out of memory error in my init script.
>
> Bisecting now...
>
> Rob

And the commit that broke it bisects to:

085219f79cad89291699bd2bfb21c9fdabafe65f is first bad commit
commit 085219f79cad89291699bd2bfb21c9fdabafe65f
Author: Sam Ravnborg <sam@ravnborg.org>
Date:   Fri Jan 2 18:47:34 2009 -0800

    sparc32: use proper types in struct stat
    
    Like sparc64 use proper types in struct stat
    
    Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

This commit breaks stat and makes sparc32 essentially unusable.  It changes 
the size of the various types in stat.h, and means that if you "mount -t tmpfs 
/tmp /tmp" and then try to ls /tmp, ls dies with a memory allocation error.

I've confirmed that reverting it fixes the problem.

Looking at the actual diff, here's the hunk that causes problems:

--- a/arch/sparc/include/asm/stat_32.h
+++ b/arch/sparc/include/asm/stat_32.h
        short           st_nlink;
-       unsigned short  st_uid;
-       unsigned short  st_gid;
+       uid_t           st_uid;
+       gid_t           st_gid;

The symptom (in my uClibc+busybox root filesystem) is:

/ # mount -t tmpfs /tmp /tmp
/ # ls -l /tmp
ls: can't open '/tmp': Cannot allocate memory
total 0

The problem is that both uid_t and gid_t are "int" instead of "short".  This 
patch changes the size of those types.  (I note that this is apparently a 
known issue; there's __compat_uid_t and friends in the sparc asm directory...)

Rob
David Miller Feb. 21, 2010, 11:57 p.m. UTC | #35
From: Rob Landley <rob@landley.net>
Date: Sun, 21 Feb 2010 10:25:09 -0600

> 085219f79cad89291699bd2bfb21c9fdabafe65f is first bad commit
> commit 085219f79cad89291699bd2bfb21c9fdabafe65f
> Author: Sam Ravnborg <sam@ravnborg.org>
> Date:   Fri Jan 2 18:47:34 2009 -0800
> 
>     sparc32: use proper types in struct stat
>     
>     Like sparc64 use proper types in struct stat
>     
>     Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> This commit breaks stat and makes sparc32 essentially unusable.  It changes 
> the size of the various types in stat.h, and means that if you "mount -t tmpfs 
> /tmp /tmp" and then try to ls /tmp, ls dies with a memory allocation error.
> 
> I've confirmed that reverting it fixes the problem.

Thanks for tracking this down Rob, I'll work on a fix and
push it around.
Bartlomiej Zolnierkiewicz Feb. 22, 2010, 12:28 a.m. UTC | #36
On Monday 22 February 2010 12:57:19 am David Miller wrote:
> From: Rob Landley <rob@landley.net>
> Date: Sun, 21 Feb 2010 10:25:09 -0600
> 
> > 085219f79cad89291699bd2bfb21c9fdabafe65f is first bad commit
> > commit 085219f79cad89291699bd2bfb21c9fdabafe65f
> > Author: Sam Ravnborg <sam@ravnborg.org>
> > Date:   Fri Jan 2 18:47:34 2009 -0800
> > 
> >     sparc32: use proper types in struct stat
> >     
> >     Like sparc64 use proper types in struct stat
> >     
> >     Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > This commit breaks stat and makes sparc32 essentially unusable.  It changes 
> > the size of the various types in stat.h, and means that if you "mount -t tmpfs 
> > /tmp /tmp" and then try to ls /tmp, ls dies with a memory allocation error.
> > 
> > I've confirmed that reverting it fixes the problem.
> 
> Thanks for tracking this down Rob, I'll work on a fix and
> push it around.

Seeing that all of sparc32 has apparently been broken for over a year now
because of a pure cleanup patch, I wonder whether it would be appropriate to
make sparc32 'legacy only' and provide 'a stability promise' for it?

Just an idea... ;)

--
Bartlomiej Zolnierkiewicz
Rob Landley Feb. 22, 2010, 2:03 a.m. UTC | #37
On Sunday 21 February 2010 18:28:20 Bartlomiej Zolnierkiewicz wrote:
> On Monday 22 February 2010 12:57:19 am David Miller wrote:
> > From: Rob Landley <rob@landley.net>
> > Date: Sun, 21 Feb 2010 10:25:09 -0600
> >
> > > 085219f79cad89291699bd2bfb21c9fdabafe65f is first bad commit
> > > commit 085219f79cad89291699bd2bfb21c9fdabafe65f
> > > Author: Sam Ravnborg <sam@ravnborg.org>
> > > Date:   Fri Jan 2 18:47:34 2009 -0800
> > >
> > >     sparc32: use proper types in struct stat
> > >
> > >     Like sparc64 use proper types in struct stat
> > >
> > >     Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
> > >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > >
> > > This commit breaks stat and makes sparc32 essentially unusable.  It
> > > changes the size of the various types in stat.h, and means that if you
> > > "mount -t tmpfs /tmp /tmp" and then try to ls /tmp, ls dies with a
> > > memory allocation error.
> > >
> > > I've confirmed that reverting it fixes the problem.
> >
> > Thanks for tracking this down Rob, I'll work on a fix and
> > push it around.
>
> Looking at how whole sparc32 has been apparently broken for over a year now
> because of a purely cleanup patch I wonder if it would be appropriate to
> make sparc32 into 'legacy only' and provide 'a stability promise' for it?
>
> Just an idea.. ;)

Actually, the problem is that lots of people seem to expect current kernels to 
be broken on non-x86 targets, so they keep using old versions.  (In the case 
of the debian release everybody kept pointing me to on "but it works fine!" 
grounds, a 2.6.18 kernel.)  Lots of them only upgrade once idiots like me have 
gone across the minefield and made it safe. :)

"Current is always broken so nobody uses current" != "nobody uses this 
platform".  More "sparc people use distros rather than building their own 
systems from source, and tend not to be aggressive about upgrading".

Back in 2007, arm was broken for me for two or three releases (according to my 
blog, it broke in 2.6.20, and the patch that fixed it, 
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=4454/1 , was 
not yet in 2.6.22-rc7).  That doesn't mean arm isn't widely used, just that 
nobody with that hardware was seriously trying to use the current version of 
the kernel.

My Firmware Linux project is working on implementing automated regression 
testing under QEMU.  Once I've got a platform working (which sparc wasn't 
until now), I can provide much more prompt breakage reports in the future, at 
least for basic stuff like this...

Rob
Aurelien Jarno Feb. 28, 2010, 9:05 p.m. UTC | #38
On Tue, Feb 16, 2010 at 08:21:45AM +0000, Stuart Brady wrote:
> On Mon, Feb 15, 2010 at 12:19:24PM +0100, Alexander Graf wrote:
> > So what you really want is something like
> > 
> > #ifdef CONFIG_LINUX_USER
> > /* exec return value is always 0 */
> > env->gpr[3] = 0;
> > #endif
> > 
> > just after the #endif in your patch. If you had inlined your patch I could've commented it there.
> 
> I've clearly misunderstood something, but isn't CONFIG_LINUX_USER always
> going to be defined when building linux-user/elfload.c, and doesn't 
> CONFIG_BSD relate to the host that you're building for, not the target?

Yes, CONFIG_LINUX_USER will always be defined in linux-user/elfload.c,
while CONFIG_BSD_USER will always be defined in bsd-user/elfload.c. In the
same way, using CONFIG_BSD in linux-user/elfload.c doesn't make sense,
as this code will never be compiled.

> I can't remember whether Jocelyn was interested in running BSD binaries
> under Linux or under BSD.  The former seems reasonable, although even if
> that did work for PPC at one point, I doubt that's still the case...
> 

That sounds strange; if you do that, I think you will need to use the
bsd-user code, not the linux-user code.

I have to say I don't really understand why this BSD code is
there.

Patch

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 1d5f651..eaabdac 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -513,12 +513,11 @@  do {                                                                    \
 static inline void init_thread(struct target_pt_regs *_regs, struct image_info *infop)
 {
     abi_ulong pos = infop->start_stack;
-    abi_ulong tmp;
 #if defined(TARGET_PPC64) && !defined(TARGET_ABI32)
     abi_ulong entry, toc;
 #endif
 
-    _regs->gpr[1] = infop->start_stack;
+    _regs->gpr[1] = pos;
 #if defined(TARGET_PPC64) && !defined(TARGET_ABI32)
     entry = ldq_raw(infop->entry) + infop->load_addr;
     toc = ldq_raw(infop->entry + 8) + infop->load_addr;
@@ -526,6 +525,8 @@  static inline void init_thread(struct target_pt_regs *_regs, struct image_info *
     infop->entry = entry;
 #endif
     _regs->nip = infop->entry;
+
+#if defined(CONFIG_BSD)
     /* Note that isn't exactly what regular kernel does
      * but this is what the ABI wants and is needed to allow
      * execution of PPC BSD programs.
@@ -534,9 +535,13 @@  static inline void init_thread(struct target_pt_regs *_regs, struct image_info *
     get_user_ual(_regs->gpr[3], pos);
     pos += sizeof(abi_ulong);
     _regs->gpr[4] = pos;
-    for (tmp = 1; tmp != 0; pos += sizeof(abi_ulong))
-        tmp = ldl(pos);
+    for (;;) {
+        abi_ulong tmp = pos;
+        pos += sizeof(abi_ulong);
+        if (!ldl(tmp)) break;
+    }
     _regs->gpr[5] = pos;
+#endif
 }
 
 /* See linux kernel: arch/powerpc/include/asm/elf.h.  */