diff mbox series

[RFCv1,01/72] e2fsck: Fix unbalanced mutex unlock for BOUNCE_MTX

Message ID febbbd17b3cf4201aaae24e4adb61e4f8a80e9c9.1667822611.git.ritesh.list@gmail.com
State Under Review
Delegated to: Theodore Ts'o
Headers show
Series e2fsprogs: Parallel fsck support | expand

Commit Message

Ritesh Harjani (IBM) Nov. 7, 2022, 12:20 p.m. UTC
f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced
mutex unlock in below path.

This patch fixes it.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 lib/ext2fs/unix_io.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Theodore Ts'o Nov. 17, 2022, 4:02 p.m. UTC | #1
On Mon, Nov 07, 2022 at 05:50:49PM +0530, Ritesh Harjani (IBM) wrote:
> f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced
> mutex unlock in below path.
> 
> This patch fixes it.
> 
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

This has been fixed in upstream (for a while, actually); what commit
is your patches based upon?

						- Ted
Ritesh Harjani (IBM) Nov. 17, 2022, 6:45 p.m. UTC | #2
On 22/11/17 11:02AM, Theodore Ts'o wrote:
> On Mon, Nov 07, 2022 at 05:50:49PM +0530, Ritesh Harjani (IBM) wrote:
> > f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced
> > mutex unlock in below path.
> > 
> > This patch fixes it.
> > 
> > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> 
> This has been fixed in upstream (for a while, actually); 

Hello Ted, 

I think you are speaking about this patch here [1], which prevents the race
of using the fd by multiple threads at a time, by using BOUNCE_MTX lock.

The current patch on the other hand fixes the unbalanced mutex_unlock(), which 
can be reproduced with f_crashdisk test with UNIX_IO_FORCE_BOUNCE=yes.
(when tested with --enable-threadsan)

Below is the threadsan warning that I see.

WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (pid=135059)
    #0 pthread_mutex_unlock <null> (libtsan.so.0+0x3b64a)
    #1 mutex_unlock ../../../lib/ext2fs/unix_io.c:161 (e2fsck+0x5b392b)
    #2 raw_read_blk ../../../lib/ext2fs/unix_io.c:331 (e2fsck+0x5b392b)
    #3 unix_read_blk64 ../../../lib/ext2fs/unix_io.c:1001 (e2fsck+0x5b4981)
    #4 unix_read_blk ../../../lib/ext2fs/unix_io.c:1053 (e2fsck+0x5b5171)
    #5 ext2fs_open2 ../../../lib/ext2fs/openfs.c:228 (e2fsck+0x579e5c)
    #6 try_open_fs ../../e2fsck/unix.c:1242 (e2fsck+0x424e63)
    #7 main ../../e2fsck/unix.c:1604 (e2fsck+0x41a1d2)

  Location is heap block of size 376 at 0x7b4800000000 allocated by main thread:
    #0 malloc <null> (libtsan.so.0+0x32919)
    #1 ext2fs_get_mem ../../../lib/ext2fs/ext2fs.h:1944 (e2fsck+0x5aec30)
    #2 unix_open_channel ../../../lib/ext2fs/unix_io.c:698 (e2fsck+0x5aec30)
    #3 unix_open ../../../lib/ext2fs/unix_io.c:910 (e2fsck+0x5afd67)
    #4 ext2fs_open2 ../../../lib/ext2fs/openfs.c:175 (e2fsck+0x579918)
    #5 try_open_fs ../../e2fsck/unix.c:1242 (e2fsck+0x424e63)
    #6 main ../../e2fsck/unix.c:1604 (e2fsck+0x41a1d2)

  Mutex M22 (0x7b4800000128) created at:
    #0 pthread_mutex_init <null> (libtsan.so.0+0x49603)
    #1 unix_open_channel ../../../lib/ext2fs/unix_io.c:827 (e2fsck+0x5af5ef)
    #2 unix_open ../../../lib/ext2fs/unix_io.c:910 (e2fsck+0x5afd67)
    #3 ext2fs_open2 ../../../lib/ext2fs/openfs.c:175 (e2fsck+0x579918)
    #4 try_open_fs ../../e2fsck/unix.c:1242 (e2fsck+0x424e63)
    #5 main ../../e2fsck/unix.c:1604 (e2fsck+0x41a1d2)

SUMMARY: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (/lib64/libtsan.so.0+0x3b64a) in pthread_mutex_unlock

[1]: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=d557b9659ba97e093f842dcc7e2cfe9a7022c674

> what commit is your patches based upon?

1. This series is rebased on the latest e2fsprogs master branch.
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/log/

Cover letter might provide some more details on how the patch series is broken
into. The github link of the series can be found at 
https://github.com/riteshharjani/e2fsprogs/commits/pfsck-RFCv1

Note: I have provided some details in the cover letter about Patch-073 i.e.
("tests: Add multi-thread tests framework"). But putting it once again here... 
You can ignore this ^^^ last patch in the github link.
This was added to tests all existing e2fsck test with pfsck mode.
But later I realized that it will require more work then how it is in the current
form. Since I didn't want to hold the patch series any longer and I wanted this
series to go through the initial round of review, before it becomes any
bigger and difficult to review, hence I decided to drop the last patch from
sending it to mailing list.

Please let me know if you have any queries on any patch/series.

-ritesh
Andreas Dilger Nov. 18, 2022, 10:34 a.m. UTC | #3
On Nov 7, 2022, at 06:22, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> 
> f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced
> mutex unlock in below path.
> 
> This patch fixes it.
> 
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
> lib/ext2fs/unix_io.c | 1 -
> 1 file changed, 1 deletion(-)
> 
> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
> index e53db333..5b894826 100644
> --- a/lib/ext2fs/unix_io.c
> +++ b/lib/ext2fs/unix_io.c
> @@ -305,7 +305,6 @@ bounce_read:
>    while (size > 0) {
>        actual = read(data->dev, data->bounce, align_size);
>        if (actual != align_size) {
> -            mutex_unlock(data, BOUNCE_MTX);

This patch doesn't show enough context, but AFAIK this is jumping before mutex_down()
is called, so this *should* be correct as is?

Cheers, Andreas


>            actual = really_read;
>            buf -= really_read;
>            size += really_read;
> -- 
> 2.37.3
>
Ritesh Harjani (IBM) Nov. 18, 2022, 11:37 a.m. UTC | #4
On 22/11/18 04:34AM, Andreas Dilger wrote:
> On Nov 7, 2022, at 06:22, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> > 
> > f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced
> > mutex unlock in below path.
> > 
> > This patch fixes it.
> > 
> > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> > ---
> > lib/ext2fs/unix_io.c | 1 -
> > 1 file changed, 1 deletion(-)
> > 
> > diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
> > index e53db333..5b894826 100644
> > --- a/lib/ext2fs/unix_io.c
> > +++ b/lib/ext2fs/unix_io.c
> > @@ -305,7 +305,6 @@ bounce_read:
> >    while (size > 0) {
> >        actual = read(data->dev, data->bounce, align_size);
> >        if (actual != align_size) {
> > -            mutex_unlock(data, BOUNCE_MTX);
> 
> This patch doesn't show enough context, but AFAIK this is jumping before mutex_down()
> is called, so this *should* be correct as is?

Thanks for the review, Andreas.

Yeah, the patch diff above is not sufficient since it doesn't share enuf
context.
But essentially when "actual" is not equal to "align_size", then in this if
condition it goes to label "short_read:", which always goto error_unlock,
where we anyways call mutex_unlock()

Looking at a lot of labels in this function, this definitely looks like 
something which can be cleaned up ("raw_read_blk()"). 
I will add that to my list of todos. 

I have also shared the threadsan warning which detects the unbalanced mutex 
unlock here [1]

[1]: https://lore.kernel.org/linux-ext4/cover.1667822611.git.ritesh.list@gmail.com/T/#md75b3ccb146e4433653bc2d7dd01329a9757ea26

-ritesh
Andreas Dilger Nov. 18, 2022, 1:20 p.m. UTC | #5
On Nov 18, 2022, at 05:37, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> 
> On 22/11/18 04:34AM, Andreas Dilger wrote:
>>> On Nov 7, 2022, at 06:22, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
>>> 
>>> f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced
>>> mutex unlock in below path.
>>> 
>>> This patch fixes it.
>>> 
>>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>>> ---
>>> lib/ext2fs/unix_io.c | 1 -
>>> 1 file changed, 1 deletion(-)
>>> 
>>> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
>>> index e53db333..5b894826 100644
>>> --- a/lib/ext2fs/unix_io.c
>>> +++ b/lib/ext2fs/unix_io.c
>>> @@ -305,7 +305,6 @@ bounce_read:
>>>   while (size > 0) {
>>>       actual = read(data->dev, data->bounce, align_size);
>>>       if (actual != align_size) {
>>> -            mutex_unlock(data, BOUNCE_MTX);
>> 
>> This patch doesn't show enough context, but AFAIK this is jumping before mutex_down()
>> is called, so this *should* be correct as is?
> 
> Thanks for the review, Andreas.
> 
> Yeah, the patch diff above is not sufficient since it doesn't share enuf
> context.
> But essentially when "actual" is not equal to "align_size", then in this if
> condition it goes to label "short_read:", which always goto error_unlock,
> where we anyways call mutex_unlock()
> 
> Looking at a lot of labels in this function, this definitely looks like 
> something which can be cleaned up ("raw_read_blk()"). 
> I will add that to my list of todos. 

You are correct, and it means this code is just not very clear to the reader. I think it
would make more sense to move the "short_read:" label to the end of the code:

                  actual = read(...);
                  if (actual != size)
                          goto error_short_read;
                  goto success_unlock;
        :
                 actual = read(...);
                 if (actual != align_size) {
                           actual = really_read;
                           buf -= really_read;
                           size += really_read;
                           goto error_short_read;
                 }
        :
success_unlock:
        mutex_unlock(...);
        return 0;

error_short_read:
        if (actual < 0) {
                 retval = errno;
                 actual = 0;
        } else {
                 retval = EXT2_ET_SHORT_READ;
        }
error_unlock:
        mutex_unlock(...);

That way the code follows the normal error handling convention and is less likely to be
surprising to the reader. 

Cheers, Andreas
Ritesh Harjani (IBM) Nov. 19, 2022, 3:46 a.m. UTC | #6
On 22/11/18 07:20AM, Andreas Dilger wrote:
> On Nov 18, 2022, at 05:37, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> > 
> > On 22/11/18 04:34AM, Andreas Dilger wrote:
> >>> On Nov 7, 2022, at 06:22, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> >>> 
> >>> f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced
> >>> mutex unlock in below path.
> >>> 
> >>> This patch fixes it.
> >>> 
> >>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> >>> ---
> >>> lib/ext2fs/unix_io.c | 1 -
> >>> 1 file changed, 1 deletion(-)
> >>> 
> >>> diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
> >>> index e53db333..5b894826 100644
> >>> --- a/lib/ext2fs/unix_io.c
> >>> +++ b/lib/ext2fs/unix_io.c
> >>> @@ -305,7 +305,6 @@ bounce_read:
> >>>   while (size > 0) {
> >>>       actual = read(data->dev, data->bounce, align_size);
> >>>       if (actual != align_size) {
> >>> -            mutex_unlock(data, BOUNCE_MTX);
> >> 
> >> This patch doesn't show enough context, but AFAIK this is jumping before mutex_down()
> >> is called, so this *should* be correct as is?
> > 
> > Thanks for the review, Andreas.
> > 
> > Yeah, the patch diff above is not sufficient since it doesn't share enuf
> > context.
> > But essentially when "actual" is not equal to "align_size", then in this if
> > condition it goes to label "short_read:", which always goto error_unlock,
> > where we anyways call mutex_unlock()
> > 
> > Looking at a lot of labels in this function, this definitely looks like 
> > something which can be cleaned up ("raw_read_blk()"). 
> > I will add that to my list of todos. 
> 
> You are correct, and it means this code is just not very clear to the reader. I think it
> would make more sense to move the "short_read:" label to the end of the code:
> 
>                   actual = read(...);
>                   if (actual != size)
>                           goto error_short_read;
>                   goto success_unlock;
>         :
>                  actual = read(...);
>                  if (actual != align_size) {
>                            actual = really_read;
>                            buf -= really_read;
>                            size += really_read;
>                            goto error_short_read;
>                  }
>         :
> success_unlock:
>         mutex_unlock(...);
>         return 0;
> 
> error_short_read:
>         if (actual < 0) {
>                  retval = errno;
>                  actual = 0;
>         } else {
>                  retval = EXT2_ET_SHORT_READ;
>         }
> error_unlock:
>         mutex_unlock(...);
> 
> That way the code follows the normal error handling convention and is less likely to be
> surprising to the reader. 

Yes, you are right. I will do the change in the next rev.

-ritesh
Theodore Ts'o Jan. 24, 2023, 4:40 p.m. UTC | #7
On Mon, Nov 07, 2022 at 05:50:49PM +0530, Ritesh Harjani (IBM) wrote:
> f_crashdisk test failed with UNIX_IO_FORCE_BOUNCE=yes due to unbalanced
> mutex unlock in below path.
> 
> This patch fixes it.
> 
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

Applied, thanks.

						- Ted
diff mbox series

Patch

diff --git a/lib/ext2fs/unix_io.c b/lib/ext2fs/unix_io.c
index e53db333..5b894826 100644
--- a/lib/ext2fs/unix_io.c
+++ b/lib/ext2fs/unix_io.c
@@ -305,7 +305,6 @@  bounce_read:
 	while (size > 0) {
 		actual = read(data->dev, data->bounce, align_size);
 		if (actual != align_size) {
-			mutex_unlock(data, BOUNCE_MTX);
 			actual = really_read;
 			buf -= really_read;
 			size += really_read;