Message ID | 1403559614-4096-1-git-send-email-paul@archlinuxmips.org |
---|---|
State | New |
Headers | show |
On 23 June 2014 22:40, Paul Burton <paul@archlinuxmips.org> wrote: > The ptr argument to the ipc syscall was incorrectly being used as the > value of the argument union for the SEMCTL call. It is actually, as its > name would suggest, a pointer to that union. Have you checked this on other architectures than MIPS? I have a vague recollection that there are between-arch differences regarding handling of the semctl argument... Also, VERIFY_READ doesn't seem right for some of the semctl operations which will modify the target_semun. thanks -- PMM
On Mon, Jun 23, 2014 at 11:12:42PM +0100, Peter Maydell wrote: > On 23 June 2014 22:40, Paul Burton <paul@archlinuxmips.org> wrote: > > The ptr argument to the ipc syscall was incorrectly being used as the > > value of the argument union for the SEMCTL call. It is actually, as its > > name would suggest, a pointer to that union. > > Have you checked this on other architectures than MIPS? > I have a vague recollection that there are between-arch > differences regarding handling of the semctl argument... I haven't tried running code for any other targets, but the pointer is dereferenced from generic code in Linux, see ipc/syscall.c: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/ipc/syscall.c#n39 > Also, VERIFY_READ doesn't seem right for some of the > semctl operations which will modify the target_semun. > > thanks > -- PMM That part I think you're right about, I'll switch to VERIFY_WRITE. Thanks, Paul
On 23 June 2014 23:18, Paul Burton <paul@archlinuxmips.org> wrote: > On Mon, Jun 23, 2014 at 11:12:42PM +0100, Peter Maydell wrote: >> On 23 June 2014 22:40, Paul Burton <paul@archlinuxmips.org> wrote: >> > The ptr argument to the ipc syscall was incorrectly being used as the >> > value of the argument union for the SEMCTL call. It is actually, as its >> > name would suggest, a pointer to that union. >> >> Have you checked this on other architectures than MIPS? >> I have a vague recollection that there are between-arch >> differences regarding handling of the semctl argument... > > I haven't tried running code for any other targets, but the pointer is > dereferenced from generic code in Linux, see ipc/syscall.c: > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/ipc/syscall.c#n39 I see that code has a NULL pointer check which your patch doesn't... >> Also, VERIFY_READ doesn't seem right for some of the >> semctl operations which will modify the target_semun. > That part I think you're right about, I'll switch to VERIFY_WRITE. On the other hand that doesn't line up with the kernel code, which just does a get_user() of a single target_ulong and passes that to the sys_semctl function (which then might interpret it as a user pointer to a thing that needs locking and might be written to). That would suggest that you should be using get_user_ual() here, passing an abi_ulong into do_semctl() and probably fixing up that function and its callers. Basically I think the semctl code is probably buggy in a number of ways and so I'm dubious about a patch that's trying to make a very small change to it without looking at the larger picture and testing and fixing bugs on more than one architecture. (http://patchwork.ozlabs.org/patch/217249/ is a previous attempt at fixing up semctl; on reflection I may have been wrong about the relevance of compat_sys_semctl, though.) thanks -- PMM
On Mon, Jun 23, 2014 at 11:18:25PM +0100, Paul Burton wrote: > > Also, VERIFY_READ doesn't seem right for some of the > > semctl operations which will modify the target_semun. > > > > thanks > > -- PMM > > That part I think you're right about, I'll switch to VERIFY_WRITE. Actually no, I don't think you're right about that afterall. The argument union itself is never modified. I imagine if it were then it would be painful in the case of the semctl syscall where the union is passed directly as an argument, rather than as a pointer as it is for the ipc syscall. What may be modified is the data pointed to by the pointers within union semun. That is already handled by do_semctl & the translate functions it calls. /me is not fond of this API... So anyway, I believe the patch is good as-is. Thanks, Paul
On 23 June 2014 23:36, Paul Burton <paul@archlinuxmips.org> wrote: > Actually no, I don't think you're right about that afterall. The > argument union itself is never modified. I imagine if it were then it > would be painful in the case of the semctl syscall where the union is > passed directly as an argument, rather than as a pointer as it is for > the ipc syscall. > > What may be modified is the data pointed to by the pointers within union > semun. That is already handled by do_semctl & the translate functions it > calls. Except if you look at do_semctl you see code like: case GETVAL: case SETVAL: arg.val = tswap32(target_su.val); ret = get_errno(semctl(semid, semnum, cmd, arg)); target_su.val = tswap32(arg.val); break; which clearly is just modifying fields in the target_semun union. So something's wrong (probably that code)... thanks -- PMM
On Mon, Jun 23, 2014 at 11:35:17PM +0100, Peter Maydell wrote: > >> Have you checked this on other architectures than MIPS? > >> I have a vague recollection that there are between-arch > >> differences regarding handling of the semctl argument... > > > > I haven't tried running code for any other targets, but the pointer is > > dereferenced from generic code in Linux, see ipc/syscall.c: > > > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/ipc/syscall.c#n39 > > I see that code has a NULL pointer check which your patch doesn't... True, a NULL pointer would lead to EFAULT with my patch where the kernel will give EINVAL. I'll fix it. > >> Also, VERIFY_READ doesn't seem right for some of the > >> semctl operations which will modify the target_semun. > > > That part I think you're right about, I'll switch to VERIFY_WRITE. > > On the other hand that doesn't line up with the kernel code, > which just does a get_user() of a single target_ulong and > passes that to the sys_semctl function (which then might > interpret it as a user pointer to a thing that needs locking > and might be written to). We've crossed emails, I just noted the same thing :) > That would suggest that you > should be using get_user_ual() here, passing an abi_ulong > into do_semctl() and probably fixing up that function and its > callers. Well as far as I can tell the union will always be the size of a long anyway, so I see no harm in using lock_user(_struct) as I did. I'll switch if you insist, but the result will be the same. > Basically I think the semctl code is probably buggy in a > number of ways Perhaps, I suspect you know better than I. > and so I'm dubious about a patch that's > trying to make a very small change to it Isn't that precisely how good bisectable bug fixes should be made? > without looking > at the larger picture and testing and fixing bugs on more > than one architecture. I pointed you to the kernel code which dereferences the pointer & it's quite clearly architecture neutral, so I'm not sure what you're getting at with the architecture comment. There's quite clearly a bug in QEMU here, and this patch fixes it. I hope you're not saying you don't want it merged because it doesn't fix some other hypothetical bugs in the same area. > (http://patchwork.ozlabs.org/patch/217249/ is a previous > attempt at fixing up semctl; on reflection I may have been > wrong about the relevance of compat_sys_semctl, though.) In terms of the compat_ wrappers, note that compat_sys_ipc also dereferences the argument as a pointer: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/ipc/compat.c#n350 And that compat_sys_semctl does not. As far as I can see they both match the behaviour of the non-compat versions with regards to this, so that seems irrelevant. I do agree that the patch you link to is wrong though, I'm guessing the author had confused semctl(...) and ipc(SEMCTL, ...). Thanks, Paul
On Mon, Jun 23, 2014 at 11:42:14PM +0100, Peter Maydell wrote: > On 23 June 2014 23:36, Paul Burton <paul@archlinuxmips.org> wrote: > > Actually no, I don't think you're right about that afterall. The > > argument union itself is never modified. I imagine if it were then it > > would be painful in the case of the semctl syscall where the union is > > passed directly as an argument, rather than as a pointer as it is for > > the ipc syscall. > > > > What may be modified is the data pointed to by the pointers within union > > semun. That is already handled by do_semctl & the translate functions it > > calls. > > Except if you look at do_semctl you see code like: > case GETVAL: > case SETVAL: > arg.val = tswap32(target_su.val); > ret = get_errno(semctl(semid, semnum, cmd, arg)); > target_su.val = tswap32(arg.val); > break; > > which clearly is just modifying fields in the target_semun union. > So something's wrong (probably that code)... Yes, both Linux & man semctl agree that GETVAL returns the value of the semaphore as the return value of the syscall. So I believe the assignment to target_su.val there is (functionally harmless) garbage. Thanks, Paul
On 24 June 2014 00:06, Paul Burton <paul@archlinuxmips.org> wrote: > On Mon, Jun 23, 2014 at 11:35:17PM +0100, Peter Maydell wrote: >> and so I'm dubious about a patch that's >> trying to make a very small change to it > > Isn't that precisely how good bisectable bug fixes should be made? The key is in the second half of this sentence: >> without looking >> at the larger picture and testing and fixing bugs on more >> than one architecture. > > I pointed you to the kernel code which dereferences the pointer & it's > quite clearly architecture neutral, so I'm not sure what you're getting > at with the architecture comment. > > There's quite clearly a bug in QEMU here, and this patch fixes it. I > hope you're not saying you don't want it merged because it doesn't fix > some other hypothetical bugs in the same area. What I mean is that I would expect that any attempt to fix behaviour in this area ought to result in a series of three or four patches which fix various bugs (of which this would just be one). When an area of code is pretty clearly bogus like this one, then in my experience an attempt to make a small fix to it without just going ahead and overhauling it is likely to result in accidentally breaking existing working uses which happened to work more or less by fluke. This is particularly true if you only test one guest architecture; you can reasonably make that level of limited testing in areas where the codebase is sane, but not where it is clearly dubious. So yes, I would prefer this not to be merged unless either (a) it comes as part of a series that cleans up the code for handling semctl so it's not obviously broken (b) it has been tested to confirm that it doesn't obviously regress any guest architecture (or at least not any of the more important ones), and ideally both. I don't think this is an enormous amount of work (maybe a couple of days?); I'm really just recommending the approach and amount of cleanup that I would do if I found I needed to make a fix in this area myself. thanks -- PMM
On Tue, Jun 24, 2014 at 12:21:42AM +0100, Peter Maydell wrote: > On 24 June 2014 00:06, Paul Burton <paul@archlinuxmips.org> wrote: > > On Mon, Jun 23, 2014 at 11:35:17PM +0100, Peter Maydell wrote: > >> and so I'm dubious about a patch that's > >> trying to make a very small change to it > > > > Isn't that precisely how good bisectable bug fixes should be made? > > The key is in the second half of this sentence: > > >> without looking > >> at the larger picture and testing and fixing bugs on more > >> than one architecture. > > > > I pointed you to the kernel code which dereferences the pointer & it's > > quite clearly architecture neutral, so I'm not sure what you're getting > > at with the architecture comment. > > > > There's quite clearly a bug in QEMU here, and this patch fixes it. I > > hope you're not saying you don't want it merged because it doesn't fix > > some other hypothetical bugs in the same area. > > What I mean is that I would expect that any attempt to fix > behaviour in this area ought to result in a series of three > or four patches which fix various bugs (of which this > would just be one). When an area of code is pretty > clearly bogus like this one, then in my experience an > attempt to make a small fix to it without just going ahead > and overhauling it is likely to result in accidentally > breaking existing working uses which happened to work > more or less by fluke. This is particularly true if you only > test one guest architecture; you can reasonably make that > level of limited testing in areas where the codebase is > sane, but not where it is clearly dubious. > > So yes, I would prefer this not to be merged unless either > (a) it comes as part of a series that cleans up the code for > handling semctl so it's not obviously broken > (b) it has been tested to confirm that it doesn't obviously > regress any guest architecture (or at least not any of the > more important ones), > and ideally both. > > I don't think this is an enormous amount of work (maybe a > couple of days?); I'm really just recommending the approach > and amount of cleanup that I would do if I found I needed > to make a fix in this area myself. Well I disagree with your logic, but perhaps that's primarily because of your claim that the semctl code is "clearly bogus" and "obviously broken". Could you back that up? I know there's the one bogus line in the GETVAL/SETVAL case that was mentioned in another email, but is there anything else you consider broken? I see your point on testing, but frankly this code is generic for all architectures in Linux. I don't have the time to test each architecture and I don't have the time to test all software that may have previously been working by fluke. So what's the bar you'd like to set here? I traced the issue with fakeroot then compared the code & behaviour of QEMU with that of Linux. I found a difference. I fixed it. I checked that the kernel code for this is architecture neutral. So as far as I'm aware this patch fixes a bug and QEMU would be better with this patch than it is without it. But anyway, please do enlighten me: where are the bugs of which you speak? I'd like to see them fixed too :) Thanks, Paul
On 24 June 2014 00:53, Paul Burton <paul@archlinuxmips.org> wrote: > Well I disagree with your logic, but perhaps that's primarily because of > your claim that the semctl code is "clearly bogus" and "obviously > broken". Could you back that up? I know there's the one bogus line in > the GETVAL/SETVAL case that was mentioned in another email, but is there > anything else you consider broken? The type of the parameter we pass doesn't match what the kernel does, there's been at least one previous attempt to fix stuff in this area, and as you say the getval/setval stuff is broken. That's three things, which means (to my mind) that the first thing that needs to be done is examine the whole routine and determine how it ought to work, independent of what it happens to be doing now. Then you can implement the necessary fixes. I *don't* think this is a big job, or even that the code needs to be completely rewritten. > I see your point on testing, but frankly this code is generic for all > architectures in Linux. I don't have the time to test each architecture > and I don't have the time to test all software that may have previously > been working by fluke. So what's the bar you'd like to set here? Riku's the submaintainer here, so it's his decision in the end (and I think he has a set of tests he runs on patches as well), but one of the bars I have for reviewing patches is that I should be reasonably confident it won't regress existing behaviour for anybody. A simple patch to existing clear and correct code gives me that confidence; a set of patches that take a holistic approach to an obvious trouble spot do too; a pile of testing ditto. A tiny point fix added to something we know to be dubious doesn't give any confidence that it's going to interact correctly with the dubious code. > But anyway, please do enlighten me: where are the bugs of which you > speak? I'd like to see them fixed too :) You're essentially asking me to do the work for you. This is a recipe for me to say "sorry, I don't think we should accept your patch" -- it's you that has a reason to get this patch accepted, not me. thanks -- PMM
On Tue, Jun 24, 2014 at 09:19:45AM +0100, Peter Maydell wrote: > On 24 June 2014 00:53, Paul Burton <paul@archlinuxmips.org> wrote: > > Well I disagree with your logic, but perhaps that's primarily because of > > your claim that the semctl code is "clearly bogus" and "obviously > > broken". Could you back that up? I know there's the one bogus line in > > the GETVAL/SETVAL case that was mentioned in another email, but is there > > anything else you consider broken? > > The type of the parameter we pass doesn't match what the > kernel does, there's been at least one previous attempt to > fix stuff in this area, and as you say the getval/setval stuff > is broken. Thanks, that's useful. I'll change the code to use ulong instead of union target_semun if that'll make you happier, but that is simply moving casting down a level into do_semctl and won't change the way this runs. I'm not aware of GETVAL & SETVAL being broken in the sense of not working when used by target programs. There's one bogus line in the code which as far as I can see is harmless with the exception of making the code unclear to readers. I'm happy to submit a patch to remove that line. I can see that the fact that someone previously attempted, incorrectly, to fix a bug in this code may make you doubt its correctness and be wary of further patches to the code. What I disagree with is the notion that a bug fix shouldn't be merged until the code has been thoroughly examined for further bugs. > That's three things, which means (to my mind) that > the first thing that needs to be done is examine the whole > routine and determine how it ought to work, independent of > what it happens to be doing now. Then you can implement the > necessary fixes. I *don't* think this is a big job, or even that the > code needs to be completely rewritten. I agree it's a good idea for that to be done, and I'll do it when I get the time. I disagree that it means this bug fix shouldn't be merged, but there's no sense arguing about it so I'll leave it there and Riku can do as he wishes. > > I see your point on testing, but frankly this code is generic for all > > architectures in Linux. I don't have the time to test each architecture > > and I don't have the time to test all software that may have previously > > been working by fluke. So what's the bar you'd like to set here? > > Riku's the submaintainer here, so it's his decision in the end > (and I think he has a set of tests he runs on patches as well), > but one of the bars I have for reviewing patches is that I > should be reasonably confident it won't regress existing > behaviour for anybody. A simple patch to existing clear and > correct code gives me that confidence; a set of patches that > take a holistic approach to an obvious trouble spot do too; > a pile of testing ditto. A tiny point fix added to something we > know to be dubious doesn't give any confidence that it's > going to interact correctly with the dubious code. That's the thing, you say the code is known to be dubious (previously "bogus" and "broken"), but then when I ask you to back that up I get this: > > But anyway, please do enlighten me: where are the bugs of which you > > speak? I'd like to see them fixed too :) > > You're essentially asking me to do the work for you. That's not what I intend at all. I simply wanted to know why you consider the code bogus & broken. Hopefully the reasons are addressed above. > This is a recipe for me to say "sorry, I don't think we should > accept your patch" -- it's you that has a reason to get > this patch accepted, not me. The only reason I'd like it merged is that I want QEMU to work better, for the good of all who use it. I'd hope that's true of everyone working on it and in that sense I have no more investment in this patch being merged than anyone else. I'm not being paid to do this, and my build of QEMU works better than it did previously. Anyway things seem to be getting a little heated here which is probably not a productive path forwards, so I'll leave this alone until I write the GETVAL/SETVAL patch & look through the rest of the code. Paul
diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 92be371..c70d9d0 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -3272,8 +3272,16 @@ static abi_long do_ipc(unsigned int call, int first, ret = get_errno(semget(first, second, third)); break; - case IPCOP_semctl: - ret = do_semctl(first, second, third, (union target_semun)(abi_ulong) ptr); + case IPCOP_semctl: { + union target_semun *arg; + + if (!lock_user_struct(VERIFY_READ, arg, ptr, 1)) { + return -TARGET_EFAULT; + } + + ret = do_semctl(first, second, third, *arg); + unlock_user_struct(arg, ptr, 0); + } break; case IPCOP_msgget:
The ptr argument to the ipc syscall was incorrectly being used as the value of the argument union for the SEMCTL call. It is actually, as its name would suggest, a pointer to that union. Fix by dereferencing the pointer to obtain the target argument union. This fixes fakeroot, or at least version 1.20 for the MIPS target. Previously it would hang waiting on a semaphore which was not being initialised to the correct value. Signed-off-by: Paul Burton <paul@archlinuxmips.org> --- linux-user/syscall.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)