diff mbox

[3/4] UBI: Fastmap: Care about the protection queue

Message ID 1412029248-22454-4-git-send-email-richard@nod.at
State Changes Requested
Headers show

Commit Message

Richard Weinberger Sept. 29, 2014, 10:20 p.m. UTC
Fastmap can miss a PEB if it is in the protection queue
and not yet in the used tree.
Treat every protected PEB as used.

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/mtd/ubi/fastmap.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Tatyana Brokhman Oct. 2, 2014, 1:28 p.m. UTC | #1
Hi Richard

On 9/30/2014 1:20 AM, Richard Weinberger wrote:
> Fastmap can miss a PEB if it is in the protection queue
> and not yet in the used tree.
> Treat every protected PEB as used.
>
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>   drivers/mtd/ubi/fastmap.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
> index 2b0d8d6..2853a69 100644
> --- a/drivers/mtd/ubi/fastmap.c
> +++ b/drivers/mtd/ubi/fastmap.c
> @@ -1195,6 +1195,19 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
>   		fm_pos += sizeof(*fec);
>   		ubi_assert(fm_pos <= ubi->fm_size);
>   	}
> +
> +	for (i = 0; i < UBI_PROT_QUEUE_LEN; i++) {
> +		list_for_each_entry(wl_e, &ubi->pq[i], u.list) {

why not list_for_each_entry_safe?

> +			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
> +
> +			fec->pnum = cpu_to_be32(wl_e->pnum);
> +			fec->ec = cpu_to_be32(wl_e->ec);
> +
> +			used_peb_count++;
> +			fm_pos += sizeof(*fec);
> +			ubi_assert(fm_pos <= ubi->fm_size);

Is fm_size ok with this addition or does it need updating as well?

> +		}
> +	}
>   	fmh->used_peb_count = cpu_to_be32(used_peb_count);
>
>   	for (node = rb_first(&ubi->scrub); node; node = rb_next(node)) {
>

Thanks
Tanya Brokhman
Richard Weinberger Oct. 2, 2014, 1:32 p.m. UTC | #2
Am 02.10.2014 15:28, schrieb Tanya Brokhman:
> Hi Richard
> 
> On 9/30/2014 1:20 AM, Richard Weinberger wrote:
>> Fastmap can miss a PEB if it is in the protection queue
>> and not yet in the used tree.
>> Treat every protected PEB as used.
>>
>> Signed-off-by: Richard Weinberger <richard@nod.at>
>> ---
>>   drivers/mtd/ubi/fastmap.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
>> index 2b0d8d6..2853a69 100644
>> --- a/drivers/mtd/ubi/fastmap.c
>> +++ b/drivers/mtd/ubi/fastmap.c
>> @@ -1195,6 +1195,19 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
>>           fm_pos += sizeof(*fec);
>>           ubi_assert(fm_pos <= ubi->fm_size);
>>       }
>> +
>> +    for (i = 0; i < UBI_PROT_QUEUE_LEN; i++) {
>> +        list_for_each_entry(wl_e, &ubi->pq[i], u.list) {
> 
> why not list_for_each_entry_safe?

Because we don't delete elements from this list while iterating over it.

>> +            fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
>> +
>> +            fec->pnum = cpu_to_be32(wl_e->pnum);
>> +            fec->ec = cpu_to_be32(wl_e->ec);
>> +
>> +            used_peb_count++;
>> +            fm_pos += sizeof(*fec);
>> +            ubi_assert(fm_pos <= ubi->fm_size);
> 
> Is fm_size ok with this addition or does it need updating as well?

It is okay. The fastmap size calculation reserves enough space for all possible
PEBs.

Thanks,
//richard
Tatyana Brokhman Oct. 2, 2014, 2:14 p.m. UTC | #3
On 10/2/2014 4:32 PM, Richard Weinberger wrote:
> Am 02.10.2014 15:28, schrieb Tanya Brokhman:
>> Hi Richard
>>
>> On 9/30/2014 1:20 AM, Richard Weinberger wrote:
>>> Fastmap can miss a PEB if it is in the protection queue
>>> and not yet in the used tree.
>>> Treat every protected PEB as used.
>>>
>>> Signed-off-by: Richard Weinberger <richard@nod.at>
>>> ---
>>>    drivers/mtd/ubi/fastmap.c | 13 +++++++++++++
>>>    1 file changed, 13 insertions(+)
>>>
>>> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
>>> index 2b0d8d6..2853a69 100644
>>> --- a/drivers/mtd/ubi/fastmap.c
>>> +++ b/drivers/mtd/ubi/fastmap.c
>>> @@ -1195,6 +1195,19 @@ static int ubi_write_fastmap(struct ubi_device *ubi,
>>>            fm_pos += sizeof(*fec);
>>>            ubi_assert(fm_pos <= ubi->fm_size);
>>>        }
>>> +
>>> +    for (i = 0; i < UBI_PROT_QUEUE_LEN; i++) {
>>> +        list_for_each_entry(wl_e, &ubi->pq[i], u.list) {
>>
>> why not list_for_each_entry_safe?
>
> Because we don't delete elements from this list while iterating over it.
>
>>> +            fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
>>> +
>>> +            fec->pnum = cpu_to_be32(wl_e->pnum);
>>> +            fec->ec = cpu_to_be32(wl_e->ec);
>>> +
>>> +            used_peb_count++;
>>> +            fm_pos += sizeof(*fec);
>>> +            ubi_assert(fm_pos <= ubi->fm_size);
>>
>> Is fm_size ok with this addition or does it need updating as well?
>
> It is okay. The fastmap size calculation reserves enough space for all possible
> PEBs.
>
> Thanks,
> //richard
>
>

Reviewed-by: Tanya Brokhman <tlinder@codeaurora.org>
Artem Bityutskiy Oct. 3, 2014, 2:31 p.m. UTC | #4
On Tue, 2014-09-30 at 00:20 +0200, Richard Weinberger wrote:
> +
> +	for (i = 0; i < UBI_PROT_QUEUE_LEN; i++) {
> +		list_for_each_entry(wl_e, &ubi->pq[i], u.list) {
> +			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);

Similarly to my other posts, is it possible to not touch the WL-internal
things like 'ubi->pq[]' from fastmap.c? Can we come up with a function
at wl.c which fastmap.c could call instead?
Richard Weinberger Oct. 3, 2014, 7:06 p.m. UTC | #5
Am 03.10.2014 16:31, schrieb Artem Bityutskiy:
> On Tue, 2014-09-30 at 00:20 +0200, Richard Weinberger wrote:
>> +
>> +	for (i = 0; i < UBI_PROT_QUEUE_LEN; i++) {
>> +		list_for_each_entry(wl_e, &ubi->pq[i], u.list) {
>> +			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
> 
> Similarly to my other posts, is it possible to not touch the WL-internal
> things like 'ubi->pq[]' from fastmap.c? Can we come up with a function
> at wl.c which fastmap.c could call instead?

Fastmap basically needs access to all internal state of UBI, which lives mostly
within wl.c.
It needs to iterate over the used, free, erase, scrub RB-trees, the protection queue, etc. to
collect the exact state of all PEBs.

In C++ or Java I would pass an iterator to fastmap.c using a function in wl.c but I'm not sure
how to do such a thing in a sane way in C.

What about having a wl-fastmap.c file which acts as glue layer between fastmap.c and wl.c?

Thanks,
//richard
Artem Bityutskiy Oct. 13, 2014, 1:17 p.m. UTC | #6
On Fri, 2014-10-03 at 21:06 +0200, Richard Weinberger wrote:
> Fastmap basically needs access to all internal state of UBI, which
> lives mostly
> within wl.c

Sounds like a very strong assertion, smells a bit fishy, need the
details.

> It needs to iterate over the used, free, erase, scrub RB-trees, the
> protection queue, etc. to
> collect the exact state of all PEBs.

When? In 'ubi_write_fastmap()' when forming the FM pool data structures?

I think you can just add a function like this to wl.c:

int ubi_wl_get_peb_info(int pnum, struct ubi_wl_peb_info *pebinfo)

You will give it the PEB number you are interested in, and it will return
the information you need in 'pebinfo'. The information will contain
the EC and the state (used, free, etc.).

If you need just the EC, then you do not even need the ubi_wl_peb_info
data structure.

Then you just iterate over all PEBs and fill the FM pool data
structures.
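
Roughly, I am thinking about something like this (just a sketch; the struct, the
state enum, the extra 'ubi' argument and the wl_lookup_state() helper are made up
here for illustration):

enum ubi_wl_peb_state {
	UBI_WL_PEB_FREE,
	UBI_WL_PEB_USED,
	UBI_WL_PEB_SCRUB,
	UBI_WL_PEB_ERASE,
};

struct ubi_wl_peb_info {
	int ec;				/* erase counter of the PEB */
	enum ubi_wl_peb_state state;	/* which WL data structure it lives in */
};

int ubi_wl_get_peb_info(struct ubi_device *ubi, int pnum,
			struct ubi_wl_peb_info *pebinfo)
{
	struct ubi_wl_entry *e;

	spin_lock(&ubi->wl_lock);
	e = ubi->lookuptbl[pnum];
	if (!e) {
		spin_unlock(&ubi->wl_lock);
		return -ENOENT;
	}

	pebinfo->ec = e->ec;
	/* wl.c would have to figure out which tree/queue 'e' sits in */
	pebinfo->state = wl_lookup_state(ubi, e);
	spin_unlock(&ubi->wl_lock);

	return 0;
}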

Would something like this work?

Artem.
Richard Weinberger Oct. 13, 2014, 2:30 p.m. UTC | #7
Am 13.10.2014 um 15:17 schrieb Artem Bityutskiy:
> On Fri, 2014-10-03 at 21:06 +0200, Richard Weinberger wrote:
>> Fastmap basically needs access to all internal state of UBI, which
>> lives mostly
>> within wl.c
> 
> Sounds like a very strong assertion, smells a bit fishy, need the
> details.

Fastmap needs to know the exact state of each PEB.
I.e. is it free, used, scheduled for erase, etc...

>> It needs to iterate over the used, free, erase, scrub RB-trees, the
>> protection queue, etc. to
>> collect the exact state of all PEBs.
> 
> When? In 'ubi_write_fastmap()' when forming the FM pool data structures?

Yes.

> I think you can just add a function like this to wl.c:
> 
> int ubi_wl_get_peb_info(int pnum, struct ubi_wl_peb_info *pebinfo)
> 
> You will give it the PEB number you are interested in, and it will return
> the information you need in 'pebinfo'. The information will contain
> the EC and the state (used, free, etc.).
> 
> If you need just the EC, then you do not even need the ubi_wl_peb_info
> data structure.
> 
> Then you just iterate over all PEBs and fill the FM pool data
> structures.
> 
> Would something like this work?

The interface would work but some work in wl.c is needed.
For example, if I want to find out in which state PEB 1 is, wl.c would have to
look in the free tree, used tree, protection queue, etc. to tell me the state. This is slow.

But we could add the state information to struct ubi_wl_entry by adding a single integer attribute called "state" or "flags".
Then wl.c can set the state every time the PEB moves between internal data structures.
I.e. when it moves from the used to the free RB tree, the flag UBI_STATE_USED will be cleared and UBI_STATE_FREE set.
This will also give us a very efficient way to ensure a sane state.
In fact, I have already written such a feature. I needed it to hunt down a Fastmap bug which led to a state where a PEB existed
in both the used tree and the protection queue. Using the ->state attribute in ubi_wl_entry I was able to add various
cheap asserts to the code and find the incorrect transition.

The cost of this is bloating the ubi_wl_entry struct by sizeof(int) bytes.
But we'll get very efficient variants of self_check_in_wl_tree() and friends.
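
To illustrate the idea (this is not the actual code I have, just a rough sketch;
the flag names and the new ->state member are invented here):

/* new member in struct ubi_wl_entry: int state; */
#define UBI_WLE_STATE_FREE	0x01
#define UBI_WLE_STATE_USED	0x02
#define UBI_WLE_STATE_SCRUB	0x04
#define UBI_WLE_STATE_ERASE	0x08
#define UBI_WLE_STATE_PROTECT	0x10

/* example transition in wl.c: a used PEB becomes free */
static void used_to_free(struct ubi_device *ubi, struct ubi_wl_entry *e)
{
	/* cheap replacement for self_check_in_wl_tree() */
	ubi_assert(e->state == UBI_WLE_STATE_USED);

	rb_erase(&e->u.rb, &ubi->used);
	wl_tree_add(e, &ubi->free);
	e->state = UBI_WLE_STATE_FREE;
}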

Also the implementation of ubi_wl_get_peb_info() would be easy.
It will be mostly like:
int ubi_wl_get_peb_info(int pnum, struct ubi_wl_peb_info *pebinfo)
{
...
e = ubi->lookuptbl[pnum];
fill_pebinfo(pebinfo, e->state);
...
}

Would this make you happy? :)

Thanks,
//richard
Artem Bityutskiy Oct. 13, 2014, 3:23 p.m. UTC | #8
On Mon, 2014-10-13 at 16:30 +0200, Richard Weinberger wrote:
> Am 13.10.2014 um 15:17 schrieb Artem Bityutskiy:
> > On Fri, 2014-10-03 at 21:06 +0200, Richard Weinberger wrote:
> >> Fastmap basically needs access to all internal state of UBI, which
> >> lives mostly
> >> within wl.c
> > 
> > Sounds like a very strong assertion, smells a bit fishy, need the
> > details.
> 
> Fastmap needs to know the exact state of each PEB.
> I.e. is it free, used, scheduled for erase, etc...
> 
> >> It needs to iterate over the used, free, erase, scrub RB-trees, the
> >> protection queue, etc. to
> >> collect the exact state of all PEBs.
> > 
> > When? In 'ubi_write_fastmap()' when forming the FM pool data structures?
> 
> Yes.
> 
> > I think you can just add a function like this to wl.c:
> > 
> > int ubi_wl_get_peb_info(int pnum, struct ubi_wl_peb_info *pebinfo)
> > 
> > You will give it the PEB number you are interested in, and it will return
> > the information you need in 'pebinfo'. The information will contain
> > the EC and the state (used, free, etc.).
> > 
> > If you need just the EC, then you do not even need the ubi_wl_peb_info
> > data structure.
> > 
> > Then you just iterate over all PEBs and fill the FM pool data
> > structures.
> > 
> > Would something like this work?
> 
> The interface would work but some work in wl.c is needed.
> For example, if I want to find out in which state PEB 1 is, wl.c would have to
> look in the free tree, used tree, protection queue, etc. to tell me the state. This is slow.

Well, used and free are RB-trees, looking them up is slow.

If what you need is to go through all used and free PEBs, then you can
introduce some kind of

struct ubi_wl_entry *ubi_wl_get_next_used(struct ubi_wl_entry *prev)

function, and similar functions for the free.

It would return you the next entry (or NULL would indicate the end), and
it would take the previous entry as the input, or NULL for the first
call.
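
Roughly like this (only a sketch, no such function exists today):

struct ubi_wl_entry *ubi_wl_get_next_used(struct ubi_device *ubi,
					  struct ubi_wl_entry *prev)
{
	struct rb_node *node;

	/* NULL means "start from the beginning of the used tree" */
	node = prev ? rb_next(&prev->u.rb) : rb_first(&ubi->used);
	if (!node)
		return NULL;

	return rb_entry(node, struct ubi_wl_entry, u.rb);
}

Fastmap would then walk the used PEBs with something like:

	for (e = ubi_wl_get_next_used(ubi, NULL); e;
	     e = ubi_wl_get_next_used(ubi, e))
		/* fill one ubi_fm_ec entry for 'e' */;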

We'd need to take the locks before calling this function. This is
cleaner than what we do now, right?

> But we could add the state information to struct ubi_wl_entry by adding a single integer attribute called "state" or "flags".

But there is a price - memory consumption. We do not want to pay it just
for making the inter-subsystems boundaries better, there ought to be a
better reason.

Say, for an (imaginary) 8GiB NAND chip with 128KiB PEB size this would
cost 256KiB of RAM.

Squeezing the state into the last 2 bits of a memory reference would be
another possibility, BTW. Not elegant, though...

> Would this make you happy? :)

Not very, I'd save this for the last resort solution.
Bityutskiy, Artem Oct. 13, 2014, 3:28 p.m. UTC | #9
On Mon, 2014-10-13 at 18:23 +0300, Artem Bityutskiy wrote:
> > The interface would work but some work in wl.c is needed.
> > For example, if I want to find out in which state PEB 1 is, wl.c would have to
> > look in the free tree, used tree, protection queue, etc. to tell me the state. This is slow.
> 
> Well, used and free are RB-trees, looking them up is slow.

I wanted to say "not slow" here, sorry.
Richard Weinberger Oct. 13, 2014, 9:04 p.m. UTC | #10
Am 13.10.2014 um 17:23 schrieb Artem Bityutskiy:
> Well, used and free are RB-trees, looking them up is slow.

This is true but we'd have to look it up in multiple trees and the protection queue...

> If what you need is to go through all used and free PEBs, then you can
> introduce some kind of
> 
> struct ubi_wl_entry *ubi_wl_get_next_used(struct ubi_wl_entry *prev)
> 
> function, and similar functions for the free.
> 
> It would return you the next entry (or NULL would indicate the end), and
> it would take the previous entry as the input, or NULL for the first
> call.
> 
> We'd need to take the locks before calling this function. This is
> cleaner than what we do now, right?

ubi_update_fastmap() takes ubi->wl_lock anyway to block any changes in the free, used, etc. trees
to make sure that the to be taken state snapshot is consistent.

>> But we could add the state information to struct ubi_wl_entry by adding a single integer attribute called "state" or "flags".
> 
> But there is a price - memory consumption. We do not want to pay it just
> for making the inter-subsystems boundaries better, there ought to be a
> better reason.
> 
> Say, for an (imaginary) 8GiB NAND chip with 128KiB PEB size this would
> cost 256KiB of RAM.

Is 128KiB PEB size still realistic on modern NANDs?
Even if it is, 256KiB is not much, and the kernel consumes this much additionally with
every new release.
But I can understand your concerns.

> Squeezing the state into the last 2 bits of a memory reference would be
> another possibility, BTW. Not elegant, though...
> 
>> Would this make you happy? :)
> 
> Not very, I'd save this for the last resort solution.

Okay, I'll try harder to make you happy.

Thanks,
//richard
Artem Bityutskiy Oct. 14, 2014, 10:23 a.m. UTC | #11
On Mon, 2014-10-13 at 23:04 +0200, Richard Weinberger wrote:
> Am 13.10.2014 um 17:23 schrieb Artem Bityutskiy:
> > Well, used and free are RB-trees, looking them up is slow.
> 
> This is true but we'd have to look it up in multiple trees and the protection queue...

Right. 2 RB-trees, and one list. The list is empty most of the time, or
contains one element. 

So we'd look-up 2 RB-trees most of the time. Very rarely we'd need to
look at the list containing very few elements.

Not that bad, I think.

> ubi_update_fastmap() takes ubi->wl_lock anyway to block any changes in the free, used, etc. trees
> to make sure that the to be taken state snapshot is consistent.

I think this is fine.

> > But there is a price - memory consumption. We do not want to pay it just
> > for making the inter-subsystems boundaries better, there ought to be a
> > better reason.
> > 
> > Say, for an (imaginary) 8GiB NAND chip with 128KiB PEB size this would
> > cost 256KiB of RAM.
> 
> Is 128KiB PEB size still realistic on modern NANDs?
>> Even if it is, 256KiB is not much, and the kernel consumes this much additionally with
> every new release.

Right, but the point is that bigger systems use eMMC and wouldn't bother
with raw flash. Raw flash is trending down to the smaller systems, where a
hundred kilobytes matters.

> But I can understand your concerns.

Thanks, yes, my point is to be careful about the RAM we consume, and try
to avoid growing this data structure, and only do it if we have to.

> Okay, I'll try harder to make you happy.

Well, making me happy is not the point of course :-)

Thanks!

--
Artem
Tatyana Brokhman Oct. 14, 2014, 12:21 p.m. UTC | #12
On 10/14/2014 1:23 PM, Artem Bityutskiy wrote:
> On Mon, 2014-10-13 at 23:04 +0200, Richard Weinberger wrote:
>> Am 13.10.2014 um 17:23 schrieb Artem Bityutskiy:
>>> Well, used and free are RB-trees, looking them up is slow.
>>
>> This is true but we'd have to look it up in multiple trees and the protection queue...
>
> Right. 2 RB-trees, and one list. The list is empty most of the time, or
> contains one element.
>
> So we'd look-up 2 RB-trees most of the time. Very rarely we'd need to
> look at the list containing very few elements.
>
> Not that bad, I think.
>
>> ubi_update_fastmap() takes ubi->wl_lock anyway to block any changes in the free, used, etc. trees
>> to make sure that the to be taken state snapshot is consistent.
>
> I think this is fine.
>
>>> But there is a price - memory consumption. We do not want to pay it just
>>> for making the inter-subsystems boundaries better, there ought to be a
>>> better reason.
>>>
>>> Say, for an (imaginary) 8GiB NAND chip with 128KiB PEB size this would
>>> cost 256KiB of RAM.
>>
>> Is 128KiB PEB size still realistic on modern NANDs?
>> Even if it is, 256KiB is not much, and the kernel consumes this much additionally with
>> every new release.
>
> Right, but the point is that bigger systems use eMMC and wouldn't bother
> with raw flash. Raw flash is trending down to the smaller systems, where a
> hundred kilobytes matters.
>
>> But I can understand your concerns.
>
> Thanks, yes, my point is to be careful about the RAM we consume, and try
> to avoid growing this data structure, and only do it if we have to.
>
>> Okay, I'll try harder to make you happy.
>
> Well, making me happy is not the point of course :-)
>
> Thanks!
>
> --
> Artem
>
>
>

Hi Artem/Richard

I think your discussion here stopped being relevant to this specific 
patch but went on to the fastmap feature design in general :)
This patch fixes a real bug in the current implementation of the 
feature. What you're discussing requires a re-writing and re-design of 
the feature. Perhaps this one can be merged and will be "fixed" later on 
when you agree on how you would like FM to access WL data structures in 
general?

Thanks,
Tanya Brokhman
Artem Bityutskiy Oct. 14, 2014, 1:02 p.m. UTC | #13
On Tue, 2014-10-14 at 15:21 +0300, Tanya Brokhman wrote:
> Hi Artem/Richard
> 
> I think your discussion here stopped being relevant to this specific 
> patch but went on to the fastmap feature design in general :)
> This patch fixes a real bug in the current implementation of the 
> feature. What you're discussing requires a re-writing and re-design of 
> the feature. Perhaps this one can be merged and will be "fixed" later on 
> when you agree on how you would like FM to access WL data structures in 
> general?

First of all, "re-writing and re-design of the feature" is an
overstatement. So far this is on the "cleaning things up" side of the
spectrum, closer to the "re-factoring" area.

WRT "merge the fix now and improve later" - this is a good argument for
an "inside a company" discussion, where the primary TTM is the driving
factor.

For the community TTM is a good thing, but quality comes first.

Now, if this was about a regression, one could apply time pressure on
the maintainer. But we are talking about a problem which was there from
day 0.

It is completely normal for the maintainer to push back various
hot-fixes for the problem and request some reasonable re-factoring
first. This is what I do. This is a very, very usual thing in the Linux
community.

So far I did not ask anything huge and unreasonable, I think. Just
cleaner inter-subsystem APIs, less of the "fastmap uses the other
subsystems' internals" kind of things.

--
Artem.
Tatyana Brokhman Oct. 14, 2014, 1:35 p.m. UTC | #14
On 10/14/2014 4:02 PM, Artem Bityutskiy wrote:
> On Tue, 2014-10-14 at 15:21 +0300, Tanya Brokhman wrote:
>> Hi Artem/Richard
>>
>> I think your discussion here stopped being relevant to this specific
>> patch but went on to the fastmap feature design in general :)
>> This patch fixes a real bug in the current implementation of the
>> feature. What you're discussing requires a re-writing and re-design of
>> the feature. Perhaps this one can be merged and will be "fixed" later on
>> when you agree on how you would like FM to access WL data structures in
>> general?
>
> First of all, "re-writing and re-design of the feature" is an
> overstatement. So far this is on the "cleaning things up" side of the
> spectrum, closer to the "re-factoring" area.
>
> WRT "merge the fix now and improve later" - this is a good argument for
> an "inside a company" discussion, where the primary TTM is the driving
> factor.
>
> For the community TTM is a good thing, but quality comes first.
>
> Now, if this was about a regression, one could apply time pressure on
> the maintainer. But we are talking about a problem which was there from
> day 0.
>
> It is completely normal for the maintainer to push back various
> hot-fixes for the problem and request some reasonable re-factoring
> first. This is what I do. This is a very, very usual thing in the Linux
> community.
>
> So far I did not ask anything huge and unreasonable, I think. Just
> cleaner inter-subsystem APIs, less of the "fastmap uses the other
> subsystems' internals" kind of things.
>
> --
> Artem.
>

Ok, accepted. It was just a suggestion. I'm all for quality coming 
first, even if you were asking for something "huge".

Thanks,
Tanya Brokhman
Richard Weinberger Oct. 16, 2014, 10:06 a.m. UTC | #15
Am 14.10.2014 um 12:23 schrieb Artem Bityutskiy:
> On Mon, 2014-10-13 at 23:04 +0200, Richard Weinberger wrote:
>> Am 13.10.2014 um 17:23 schrieb Artem Bityutskiy:
>>> Well, used and free are RB-trees, looking them up is slow.
>>
>> This is true but we'd have to look it up in multiple trees and the protection queue...
> 
> Right. 2 RB-trees, and one list. The list is empty most of the time, or
> contains one element. 

Actually it is the free, used and scrub tree plus the protection queue.
Then there is another place where PEBs can hide, the erase work queue.
This brings me to an issue I've recently identified and fixed in my local queue.

ubi_wl_put_peb() does the following:
                if (in_wl_tree(e, &ubi->used)) {
                        self_check_in_wl_tree(ubi, e, &ubi->used);
                        rb_erase(&e->u.rb, &ubi->used);
                } else if (in_wl_tree(e, &ubi->scrub)) {
                        self_check_in_wl_tree(ubi, e, &ubi->scrub);
                        rb_erase(&e->u.rb, &ubi->scrub);
                } else if (in_wl_tree(e, &ubi->erroneous)) {
                        self_check_in_wl_tree(ubi, e, &ubi->erroneous);
                        rb_erase(&e->u.rb, &ubi->erroneous);
                        ubi->erroneous_peb_count -= 1;
                        ubi_assert(ubi->erroneous_peb_count >= 0);
                        /* Erroneous PEBs should be tortured */
                        torture = 1;
                } else {
                        err = prot_queue_del(ubi, e->pnum);
                        if (err) {
                                ubi_err("PEB %d not found", pnum);
                                ubi_ro_mode(ubi);
                                spin_unlock(&ubi->wl_lock);
                                return err;
                        }
                }
        }
        spin_unlock(&ubi->wl_lock);

        err = schedule_erase(ubi, e, vol_id, lnum, torture);

If it happens that a fastmap write is needed between spin_unlock() and schedule_erase(), fastmap
will miss this PEB, because it already got deleted from one of the RB trees or the protection queue but is not
yet queued on the work list. Yes, I hit this bug in the real world.

My solution is marking the ubi_wl_entry's ->state attribute with a flag like UBI_WLE_STATE_ERASE.
This way I was also able to get rid of the ubi_is_erase_work() wart.
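
In essence the change looks like this (a simplified sketch, not the actual patch;
the ->state member and the flag do not exist in mainline):

	spin_lock(&ubi->wl_lock);
	/*
	 * ... remove 'e' from the used/scrub/erroneous tree or the
	 * protection queue, exactly as ubi_wl_put_peb() does today ...
	 */

	/* mark it before dropping the lock, so a fastmap written in the
	 * window below still accounts for this PEB */
	e->state = UBI_WLE_STATE_ERASE;
	spin_unlock(&ubi->wl_lock);

	err = schedule_erase(ubi, e, vol_id, lnum, torture);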

> So we'd look-up 2 RB-trees most of the time. Very rarely we'd need to
> look at the list containing very few elements.
> 
> Not that bad, I think.
> 
>> ubi_update_fastmap() takes ubi->wl_lock anyway to block any changes in the free, used, etc. trees
>> to make sure that the to be taken state snapshot is consistent.
> 
> I think this is fine.
> 
>>> But there is a price - memory consumption. We do not want to pay it just
>>> for making the inter-subsystems boundaries better, there ought to be a
>>> better reason.
>>>
>>> Say, for an (imaginary) 8GiB NAND chip with 128KiB PEB size this would
>>> cost 256KiB of RAM.
>>
>> Is 128KiB PEB size still realistic on modern NANDs?
>> Even if it is, 256KiB is not much, and the kernel consumes this much additionally with
>> every new release.
> 
> Right, but the point is that bigger systems use eMMC and wouldn't bother
> with raw flash. Raw flash is trending down to the smaller systems, where a
> hundred kilobytes matters.
> 
>> But I can understand your concerns.
> 
> Thanks, yes, my point is to be careful about the RAM we consume, and try
> to avoid growing this data structure, and only do it if we have to.

Currently I'm using an integer variable. But a char would also do it. I need only
a few flags.

What I'm trying to say is, state tracking would solve the "internal state accessing" problem in a clean and
sane way.

I gave the ubi_get_peb_state() idea a try. Unfortunately it won't work as well as expected.
The UBI Fastmap on-flash layout has a section which contains the state and EC counter of each PEB.
But the PEBs are not stored on flash starting with 0 to ubi->peb_count, they are stored by their state.
First free, then used, then scrub and finally to-be-erased PEBs. I've chosen this layout to avoid an
additional on-flash state attribute and to keep the overall structure small.

So, writing a fastmap like:
for (i = 0; i < ubi->peb_count; i++)
  ubi_get_peb_state(i);

...won't work. I need to iterate first over the free RB tree, then over the used tree, and so on.
This way I'd have to allocate memory and store the state somewhere or iterate multiple times over the same RB trees.

What about this:
Let's create a file like fastmap-wl.c, it will contain all Fastmap functions which deal with internal
wl.c stuff. In wl.c we can add:
#ifdef CONFIG_MTD_UBI_FASTMAP
#include "fastmap-wl.c"
#endif

This way we can get rid of almost all #ifdefs in wl.c and don't have to export a lot of new fastmap specific helper functions in wl.c
to global scope.
Including a C file is common practice in Linux.
A prominent example is USB. See drivers/usb/host/ehci-hcd.c.
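
A helper living in such a fastmap-wl.c could then look roughly like this (the name
and the callback interface are made up, just to show the idea):

#ifdef CONFIG_MTD_UBI_FASTMAP
/*
 * Call @fn for every PEB currently sitting in the protection queue,
 * so that fastmap.c never has to touch ubi->pq[] directly.
 */
static int ubi_wl_for_each_protected_peb(struct ubi_device *ubi,
					 int (*fn)(struct ubi_device *ubi,
						   struct ubi_wl_entry *e,
						   void *priv),
					 void *priv)
{
	struct ubi_wl_entry *e;
	int i, err;

	for (i = 0; i < UBI_PROT_QUEUE_LEN; i++) {
		list_for_each_entry(e, &ubi->pq[i], u.list) {
			err = fn(ubi, e, priv);
			if (err)
				return err;
		}
	}

	return 0;
}
#endif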

What do you think?

Thanks,
//richard
Artem Bityutskiy Oct. 16, 2014, 10:15 a.m. UTC | #16
On Thu, 2014-10-16 at 12:06 +0200, Richard Weinberger wrote:
> What I'm trying to say is, state tracking would solve the "internal state accessing" problem in a clean and
> sane way.

Can you squeeze the state into the lookup table? It contains pointers, so
the last 2 bits could be used for the state.
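
Something like this (a sketch; it relies on the entries being at least 4-byte
aligned, which kmalloc() guarantees):

#define UBI_LOOKUPTBL_STATE_MASK 0x3UL

/* strip the state bits to get the real pointer */
static inline struct ubi_wl_entry *wl_entry_ptr(struct ubi_wl_entry *e)
{
	return (struct ubi_wl_entry *)
	       ((unsigned long)e & ~UBI_LOOKUPTBL_STATE_MASK);
}

/* read the 2 state bits hidden in the low bits of the pointer */
static inline int wl_entry_state(struct ubi_wl_entry *e)
{
	return (unsigned long)e & UBI_LOOKUPTBL_STATE_MASK;
}

/* store: ubi->lookuptbl[pnum] = wl_entry_tag(e, state); */
static inline struct ubi_wl_entry *wl_entry_tag(struct ubi_wl_entry *e,
						int state)
{
	return (struct ubi_wl_entry *)
	       ((unsigned long)wl_entry_ptr(e) | state);
}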
Richard Weinberger Oct. 16, 2014, 11:07 a.m. UTC | #17
Am 16.10.2014 um 12:15 schrieb Artem Bityutskiy:
> On Thu, 2014-10-16 at 12:06 +0200, Richard Weinberger wrote:
>> What I'm trying to say is, state tracking would solve the "internal state accessing" problem in a clean and
>> sane way.
> 
> Can you squeeze the state into the lookup table? It contains pointers, so
> the last 2 bits could be used for the state.

IMHO this is beyond horrible. :)
The goal is the make things clean in terms of not accessing internal state
from fastmap.c. If the price for that is abusing pointers in magic ways
I'd say no.

Rather consider the fastmap-wl.c idea...

Thanks,
//richard
Artem Bityutskiy Oct. 20, 2014, 2:46 p.m. UTC | #18
On Thu, 2014-10-16 at 12:06 +0200, Richard Weinberger wrote:
> Am 14.10.2014 um 12:23 schrieb Artem Bityutskiy:
> > On Mon, 2014-10-13 at 23:04 +0200, Richard Weinberger wrote:
> >> Am 13.10.2014 um 17:23 schrieb Artem Bityutskiy:
> >>> Well, used and free are RB-trees, looking them up is slow.
> >>
> >> This is true but we'd have to look it up in multiple trees and the protection queue...
> > 
> > Right. 2 RB-trees, and one list. The list is empty most of the time, or
> > contains one element. 
> 
> Actually it is the free, used and scrub tree plus the protection queue.

OK, still does not change my point: 3 trees and a list which is mostly
empty or short. Not that bad.

Besides, the used tree is the large one, contains most of the stuff. The
scrub tree is usually small, and mostly empty. The erroneous tree also
mostly empty.

So in most of the cases, this will be about looking up a single
tree.

And again, all you need is:

1. all used
2. all scrub
3. all erroneous

> Then there is another place where PEBs can hide, the erase work queue.

That's just fastmap code not doing the right thing. We should not touch
the work queue directly at all. What we _should_ do instead is to make
it empty by asking the subsystem which manages it to flush it.

1. Lock the work queue to prevent anyone from submitting new jobs while
we are in process of writing the fastmap.
2. Flush all works
3. do all the fastmap write stuff
4. Unlock

> This brings me to an issue I've recently identified and fixed in my local queue.
> 
> ubi_wl_put_peb() does the following:
>                 if (in_wl_tree(e, &ubi->used)) {
>                         self_check_in_wl_tree(ubi, e, &ubi->used);
>                         rb_erase(&e->u.rb, &ubi->used);
>                 } else if (in_wl_tree(e, &ubi->scrub)) {
>                         self_check_in_wl_tree(ubi, e, &ubi->scrub);
>                         rb_erase(&e->u.rb, &ubi->scrub);
>                 } else if (in_wl_tree(e, &ubi->erroneous)) {
>                         self_check_in_wl_tree(ubi, e, &ubi->erroneous);
>                         rb_erase(&e->u.rb, &ubi->erroneous);
>                         ubi->erroneous_peb_count -= 1;
>                         ubi_assert(ubi->erroneous_peb_count >= 0);
>                         /* Erroneous PEBs should be tortured */
>                         torture = 1;
>                 } else {
>                         err = prot_queue_del(ubi, e->pnum);
>                         if (err) {
>                                 ubi_err("PEB %d not found", pnum);
>                                 ubi_ro_mode(ubi);
>                                 spin_unlock(&ubi->wl_lock);
>                                 return err;
>                         }
>                 }
>         }
>         spin_unlock(&ubi->wl_lock);
> 
>         err = schedule_erase(ubi, e, vol_id, lnum, torture);
> 
> If it happens that a fastmap write is needed between spin_unlock() and schedule_erase(), fastmap
> will miss this PEB.

What you are saying is that LEBs change while you are writing the fastmap. Of course
they do. We should have a locking mechanism in place which would prevent
this.

Maybe wl_lock needs to become a mutex, or maybe there should be a
separate mutex.

Or maybe 'ubi_wl_put_peb()' should go through the fastmap subsystem,
which should be able to block it if it is writing a fastmap.

Ideally, you should be able to "freeze" the state (of course for a short
time), prepare fastmap data structures in memory, unfreeze, let users do
further I/O, and write the fastmap.

This is roughly what UBIFS does when committing. It freezes all writers,
builds the commit list, unfreezes the writers, and continues the commit
process.

Now, to be that advanced, you need a journal. Without a journal, you do
freeze, prepare and write, unfreeze. The end result is that writers are
blocked for a longer time, but this may be good enough.

> What do you think?

I think that UBI's memory consumption grows linearly with the flash
size. Mount time was linear too; fastmap is supposed to improve this.

I know that there are people, who also reported their issues in this
mailing list, who are very concerned about UBI memory consumption.

And I think that fastmap should try really hard to not only improve the
mount time, but also not to make the memory consumption problem worse.

So yes, I understand the code niceness argument. I really do. But this
should not make the UBI memory problem even worse.

Please, let's try much harder to go without increasing the memory
footprint.

Thanks!
Richard Weinberger Oct. 20, 2014, 3:17 p.m. UTC | #19
Am 20.10.2014 um 16:46 schrieb Artem Bityutskiy:
> On Thu, 2014-10-16 at 12:06 +0200, Richard Weinberger wrote:
>> Am 14.10.2014 um 12:23 schrieb Artem Bityutskiy:
>>> On Mon, 2014-10-13 at 23:04 +0200, Richard Weinberger wrote:
>>>> Am 13.10.2014 um 17:23 schrieb Artem Bityutskiy:
>>>>> Well, used and free are RB-trees, looking them up is slow.
>>>>
>>>> This is true but we'd have to look it up in multiple trees and the protection queue...
>>>
>>> Right. 2 RB-trees, and one list. The list is empty most of the time, or
>>> contains one element. 
>>
>> Actually it is the free, used and scrub tree plus the protection queue.
> 
> OK, still does not change my point: 3 trees and a list which is mostly
> empty or short. Not that bad.
> 
> Besides, the used tree is the large one, contains most of the stuff. The
> scrub tree is usually small, and mostly empty. The erroneous tree also
> mostly empty.
> 
> So in most of the cases, this will be about looking up a single
> tree.
> 
> And again, all you need is:
> 
> 1. all used
> 2. all scrub
> 3. all erroneous

Agreed.

>> Then there is another place where PEBs can hide, the erase work queue.
> 
> That's just fastmap code not doing the right thing. We should not touch
> the work queue directly at all. What we _should_ do instead is to make
> it empty by asking the subsystem which manages it to flush it.
> 
> 1. Lock the work queue to prevent anyone from submitting new jobs while
> we are in process of writing the fastmap.
> 2. Flush all works
> 3. do all the fastmap write stuff
> 4. Unlock

Are you sure? Flushing all works before every fastmap write will slow UBI down.
The fastmap write is not cheap. The goal should be to make it faster.

>> This brings me to an issue I've recently identified and fixed in my local queue.
>>
>> ubi_wl_put_peb() does the following:
>>                 if (in_wl_tree(e, &ubi->used)) {
>>                         self_check_in_wl_tree(ubi, e, &ubi->used);
>>                         rb_erase(&e->u.rb, &ubi->used);
>>                 } else if (in_wl_tree(e, &ubi->scrub)) {
>>                         self_check_in_wl_tree(ubi, e, &ubi->scrub);
>>                         rb_erase(&e->u.rb, &ubi->scrub);
>>                 } else if (in_wl_tree(e, &ubi->erroneous)) {
>>                         self_check_in_wl_tree(ubi, e, &ubi->erroneous);
>>                         rb_erase(&e->u.rb, &ubi->erroneous);
>>                         ubi->erroneous_peb_count -= 1;
>>                         ubi_assert(ubi->erroneous_peb_count >= 0);
>>                         /* Erroneous PEBs should be tortured */
>>                         torture = 1;
>>                 } else {
>>                         err = prot_queue_del(ubi, e->pnum);
>>                         if (err) {
>>                                 ubi_err("PEB %d not found", pnum);
>>                                 ubi_ro_mode(ubi);
>>                                 spin_unlock(&ubi->wl_lock);
>>                                 return err;
>>                         }
>>                 }
>>         }
>>         spin_unlock(&ubi->wl_lock);
>>
>>         err = schedule_erase(ubi, e, vol_id, lnum, torture);
>>
>> If it happens that a fastmap write is needed between spin_unlock() and schedule_erase(), fastmap
>> will miss this PEB.
> 
> What you are saying is that LEBs change while you are writing the fastmap. Of course
> they do. We should have a locking mechanism in place which would prevent
> this.

Not really. As fastmap takes the work semaphore, no work is processed while
writing the fastmap. In this case the PEB is in flight between the RB trees and the
work queue. Fastmap can miss it, but it won't change.

> Maybe wl_lock needs to become a mutex, or maybe there should be a
> separate mutex.
> 
> Or maybe 'ubi_wl_put_peb()' should go through the fastmap subsystem,
> which should be able to block it if it is writing a fastmap.
> 
> Ideally, you should be able to "freeze" the state (of course for a short
> time), prepare fastmap data structures in memory, unfreeze, let users do
> further I/O, and write the fastmap.

Fastmap does this already. It takes various locks to freeze the state.

> This is roughly what UBIFS does when committing. It freezes all writers,
> builds the commit list, unfreezes the writers, and continues the commit
> process.
> 
> Now, to be that advanced, you need a journal. Without a journal, you do
> freeze, prepare and write, unfreeze. The end result is that writers are
> blocked for a longer time, but this may be good enough.

I think we're becoming too scientific now.

>> What do you think?
> 
> I think that UBI's memory consumption grows linearly with the flash
> size. Mount time was linear too; fastmap is supposed to improve this.

With fastmap you have a higher memory consumption anyway.
To begin with, there is the vmalloc()'ed ubi->fm_buf which holds the complete fastmap
in memory while reading/writing it.

> I know that there are people, who also reported their issues in this
> mailing list, who are very concerned about UBI memory consumption.
> 
> And I think that fastmap should try really hard to not only improve the
> mount time, but also not to make the memory consumption problem worse.
> 
> So yes, I understand the code niceness argument. I really do. But this
> should not make the UBI memory problem even worse.

You came up with the niceness arguments, and now you're suddenly focusing
heavily on the memory footprint.

> Please, let's try much harder to go without increasing the memory
> footprint.

I need to go back to the drawing board. But first I have to fix/analyze
various bugs. Then I'll maybe have time again for memory/design nicefications.
Don't expect any new patches for fastmap within the next few weeks.

Thanks,
//richard
Artem Bityutskiy Oct. 20, 2014, 3:40 p.m. UTC | #20
On Mon, 2014-10-20 at 17:17 +0200, Richard Weinberger wrote:
> > That's just fastmap code not doing the right thing. We should not touch
> > the work queue directly at all. What we _should_ do instead is to make
> > it empty by asking the subsystem which manages it to flush it.
> > 
> > 1. Lock the work queue to prevent anyone from submitting new jobs while
> > we are in process of writing the fastmap.
> > 2. Flush all works
> > 3. do all the fastmap write stuff
> > 4. Unlock
> 
> Are you sure? Flushing all works before every fastmap write will slow UBI down.
> The fastmap write is not cheap. The goal should be to make it faster.

Yes, you are right. So instead, we want to "freeze" it, and save the PEBs from
the jobs in the fastmap too.

Why does 'ubi_write_fastmap()' only care about the erase jobs, and save the
PEBs from those jobs as "must be erased"?

What about the "move" jobs?

Also, say, PEB X is in the work queue waiting for erasure. Fastmap comes
along and saves it as "must be erased" in the fastmap. Fastmap finishes
its job, PEB X gets erased, and I write my data there, so PEB X is
referred to by LEB Y. Now I have power cut. Then I attach the flash
again. Surely it is not that fastmap just erases PEB X and I lose the
contents of LEB Y?
Richard Weinberger Oct. 20, 2014, 3:59 p.m. UTC | #21
Am 20.10.2014 um 17:40 schrieb Artem Bityutskiy:
> On Mon, 2014-10-20 at 17:17 +0200, Richard Weinberger wrote:
>>> That's just fastmap code not doing the right thing. We should not touch
>>> the work queue directly at all. What we _should_ do instead is to make
>>> it empty by asking the subsystem which manages it to flush it.
>>>
>>> 1. Lock the work queue to prevent anyone from submitting new jobs while
>>> we are in process of writing the fastmap.
>>> 2. Flush all works
>>> 3. do all the fastmap write stuff
>>> 4. Unlock
>>
>> Are you sure? Flushing all works before every fastmap write will slow UBI down.
>> The fastmap write is not cheap. The goal should be to make it faster.
> 
> Yes, you are right. So instead, we want to "freeze" it, and save the PEBs from
> the jobs in the fastmap too.
> 
> Why does 'ubi_write_fastmap()' only care about the erase jobs, and save the
> PEBs from those jobs as "must be erased"?
> 
> What about the "move" jobs?

There are no move jobs. Do you mean ubi->move_from/to used in wear_leveling_worker()?
How could it happen that a fastmap is written between these? IIRC everything is done
under wl_lock. And the PEBs used have to go through the pool.

> Also, say, PEB X is in the work queue waiting for erasure. Fastmap comes
> along and saves it as "must be erased" in the fastmap. Fastmap finishes
> its job, PEB X gets erased, and I write my data there, so PEB X is
> referred to by LEB Y. Now I have power cut. Then I attach the flash
> again. Surely it is not that fastmap just erases PEB X and I lose the
> contents of LEB Y?

This cannot happen. If X is erased you cannot write data to it. It must first go
through the pool, and the pool is scanned while attaching.

Thanks,
//richard
Artem Bityutskiy Oct. 20, 2014, 4:09 p.m. UTC | #22
On Mon, 2014-10-20 at 17:59 +0200, Richard Weinberger wrote:
> > Also, say, PEB X is in the work queue waiting for erasure. Fastmap comes
> > along and saves it as "must be erased" in the fastmap. Fastmap finishes
> > its job, PEB X gets erased, and I write my data there, so PEB X is
> > referred to by LEB Y. Now I have power cut. Then I attach the flash
> > again. Surely it is not that fastmap just erases PEB X and I lose the
> > contents of LEB Y?
> 
> This cannot happen. If X is erased you cannot write data to it. It must first go
> through the pool, and the pool is scanned while attaching.

Just noticed from the code that we first add PEBs to the erase list, and
then we go and scan the pools.
Richard Weinberger Oct. 20, 2014, 4:17 p.m. UTC | #23
Am 20.10.2014 um 18:09 schrieb Artem Bityutskiy:
> On Mon, 2014-10-20 at 17:59 +0200, Richard Weinberger wrote:
>>> Also, say, PEB X is in the work queue waiting for erasure. Fastmap comes
>>> along and saves it as "must be erased" in the fastmap. Fastmap finishes
>>> its job, PEB X gets erased, and I write my data there, so PEB X is
>>> referred to by LEB Y. Now I have power cut. Then I attach the flash
>>> again. Surely it is not that fastmap just erases PEB X and I lose the
>>> contents of LEB Y?
>>
>> This cannot happen. If X is erased you cannot write data to it. It must first go
>> through the pool, and the pool is scanned while attaching.
> 
> Just noticed from the code that we first add PEBs to the erase list, and
> then we go and scan the pools.

In the attach code, yes. But this does not matter.
It cannot happen that a fastmap is written with a PEB being both in a pool
and in the erase job list.

If you figure out such a state, please tell me. I'll fix it. :-)

Thanks,
//richard
Richard Weinberger Oct. 20, 2014, 8:46 p.m. UTC | #24
Am 20.10.2014 um 17:40 schrieb Artem Bityutskiy:
> Also, say, PEB X is in the work queue waiting for erasure. Fastmap comes
> along and saves it as "must be erased" in the fastmap. Fastmap finishes
> its job, PEB X gets erased, and I write my data there, so PEB X is
> referred to by LEB Y. Now I have power cut. Then I attach the flash
> again. Surely it is not that fastmap just erases PEB X and I lose the
> contents of LEB Y?

After reading my mail again, I think my answer was not clear.
Please let me explain in detail.

PEB X waits for erasure. In this state it can get erased and will be moved into
the free RB tree.
If a new PEB is requested from UBI it will not get served from the free RB tree;
instead it will come from the pool. I.e. there is no way that the freshly erased PEB
will be used immediately. If a fastmap was written while the PEB was queued for erasure
and a power cut happens after the erasure, it will simply be erased again. This does no harm.
But it can never happen that the same PEB is both in a pool (and therefore maybe used)
and scheduled for erase. A freshly erased PEB can only be used after wandering into the pool,
but this would require a pool refill and a write of the fastmap first; the said PEB
cannot be listed as erase work in that fastmap then.

Thanks,
//richard

Patch

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 2b0d8d6..2853a69 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -1195,6 +1195,19 @@  static int ubi_write_fastmap(struct ubi_device *ubi,
 		fm_pos += sizeof(*fec);
 		ubi_assert(fm_pos <= ubi->fm_size);
 	}
+
+	for (i = 0; i < UBI_PROT_QUEUE_LEN; i++) {
+		list_for_each_entry(wl_e, &ubi->pq[i], u.list) {
+			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
+
+			fec->pnum = cpu_to_be32(wl_e->pnum);
+			fec->ec = cpu_to_be32(wl_e->ec);
+
+			used_peb_count++;
+			fm_pos += sizeof(*fec);
+			ubi_assert(fm_pos <= ubi->fm_size);
+		}
+	}
 	fmh->used_peb_count = cpu_to_be32(used_peb_count);
 
 	for (node = rb_first(&ubi->scrub); node; node = rb_next(node)) {