[0/5] Add flight recorder to MTDRAM

Message ID 20171206085039.27164-1-dirk.behme@de.bosch.com

Message

Behme Dirk (CM/ESO2) Dec. 6, 2017, 8:50 a.m. UTC
From: Manfred Spraul <manfred@colorfullife.com>

Hi,

The series adds a flight recorder to MTDRAM.
This allows very efficient power fail testing:
From the flight recorder output, it is possible to recreate every image
that might have existed between the start of the recording and the end.

Obviously, a user space tool is required; it is attached as the last
mail in the series.

Patches:

0001-mtdram-expose-write-size-and-writebuf-size-as-module:
	An initial cleanup: write_size and writebuf_size are
	hardcoded in the source code.
	Convert that to module parameters.

0002-mtdram-Add-flight-recorder.patch:
	Initial flight recorder

0003-mtdram-Allow-to-enable-disable-flight-recorder-mode-.patch:
	For the preparation step, or for evaluating dumps, the
	flight recorder doesn't make sense. Thus allow
	to disable it at runtime by writing to debugfs/mtdram

0004-mtdram-Convert-the-flight-recorder-to-a-ring-buffer.patch:
	The initial flight recorder is very simple, cleanup 1:
	Convert the kernel buffer to a proper ringbuffer.

0005-mtdram-flight-recorder-Add-checksums.patch:
	When using a tool to simulate something, there is always the
	risk that the issue is in the tool and not in the production
	code. Thus add checksums to detect tool issues.
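
A minimal sketch of the idea behind the checksums, assuming a hypothetical record layout (opcode, offset, payload length, payload, CRC32). The layout, the names `pack_record`/`unpack_record` and the choice of CRC32 are illustrative assumptions, not the format used by the patches:

```python
import struct
import zlib

# Hypothetical record layout (NOT the format used by the patches):
# 1-byte opcode, 8-byte offset, 4-byte payload length, payload, 4-byte CRC32.
HEADER = struct.Struct("<BQI")
OP_ERASE, OP_PROGRAM = 1, 2


def pack_record(op, offset, payload=b""):
    """Serialize one operation and append a CRC32 over header + payload."""
    body = HEADER.pack(op, offset, len(payload)) + payload
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_record(buf):
    """Parse one record and verify its checksum before trusting it."""
    op, offset, length = HEADER.unpack_from(buf, 0)
    end = HEADER.size + length
    body = buf[:end]
    (stored,) = struct.unpack_from("<I", buf, end)
    if zlib.crc32(body) != stored:
        raise ValueError("checksum mismatch: recording or tool is broken")
    return op, offset, body[HEADER.size:]
```

With a check like this, a replay tool can refuse to continue on a damaged or misparsed record instead of silently producing an image that never existed.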

--
	Manfred

Comments

Richard Weinberger Dec. 6, 2017, 10:41 a.m. UTC | #1
Dirk, Manfred,

On Wednesday, 6 December 2017, 09:50:34 CET, Dirk Behme wrote:
> From: Manfred Spraul <manfred@colorfullife.com>
> 
> Hi,
> 
> The series adds a flight recorder to MTDRAM.

Thanks a lot for sharing your tool, this is highly appreciated.

> This allows very efficient power fail testing:
> From the flight recorder output, it is possible to recreate every image
> that might have existed between the start of the recording and the end.
> 
> Obviously, a user space tool is required, it is attached as the last
> mail in the series.

So, to understand this approach better I need to recap.
The "flight recorder" logs every single MTD operation (READ, ERASE, PROGRAM) 
to a file while the MTD is under load, right?

Then you take the log, replay it to a _file_ but instead of replaying
all N MTD operations only N - X operations are replayed?
The output file is later written back to MTDRAM to check how much UBIFS likes 
it?

While having such a tool would be awesome, we have to be very sure that it 
behaves correctly.
Yesterday I spent almost the whole night staring at some of Manfred's 
images and I'm not sure whether what I see makes sense or can actually happen
on a real NAND or NOR flash. But I'm still investigating.

> Patches:
> 
> 0001-mtdram-expose-write-size-and-writebuf-size-as-module:
> 	An initial cleanup: write_size and writebuf_size are
> 	hardcoded in the source code.
> 	Convert that to module parameters.

MTDRAM is a special purpose MTD simulator, I'm not so sure whether it is a 
good idea to turn it into a NAND-alike zombie.

As I have said before, Boris and I are in general unhappy with 
the current simulator zoo in MTD.
But fixing this is not your job; we have to do that. :-)

Thanks,
//richard

Manfred Spraul Dec. 6, 2017, 7:44 p.m. UTC | #2
Hi Richard,

On 12/06/2017 11:41 AM, Richard Weinberger wrote:
> Dirk, Manfred,
>
> On Wednesday, 6 December 2017, 09:50:34 CET, Dirk Behme wrote:
>> From: Manfred Spraul <manfred@colorfullife.com>
>>
>> Hi,
>>
>> The series adds a flight recorder to MTDRAM.
> Thanks a lot for sharing your tool, this is highly appreciated.
>
>> This allows very efficient power fail testing:
>>  From the flight recorder output, it is possible to recreate every image
>> that might have existed between the start of the recording and the end.
>>
>> Obviously, a user space tool is required, it is attached as the last
>> mail in the series.
> So, to understand this approach better I need to recap.
> The "flight recorder" logs every single MTD operation (READ, ERASE, PROGRAM)
> to a file while the MTD is under load, right?
Only ERASE and PROGRAM. READ is not logged.
Would it help if READ is logged as well? (memory is cheap, ...)

What would be fairly simple is to add a backtrace for every ERASE and 
PROGRAM. I'll try to add that.

> Then you take the log, replay it to a _file_ but instead of replaying
> all N MTD operations only N - X operations are replayed?
Exactly.  Instead of replaying all N operations, only X operations are 
replayed.

image-168167.bin is after replaying 168167 operations.
image-168168.bin is after replaying one additional operation.
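
A minimal sketch of this prefix replay, assuming the recording has already been parsed into a list of tuples. The tuple layout and the names `ERASE`, `PROGRAM` and `replay` are hypothetical and are not taken from replay.cpp; PROGRAM is modelled as a plain overwrite and ERASE as filling with 0xFF:

```python
ERASE, PROGRAM = "erase", "program"


def replay(initial, ops, count):
    """Return the image after applying only the first `count` operations."""
    image = bytearray(initial)
    for op, offset, data in ops[:count]:
        if op == ERASE:
            # For ERASE, `data` is the number of bytes to erase;
            # erased flash is assumed to read back as 0xFF.
            image[offset:offset + data] = b"\xff" * data
        elif op == PROGRAM:
            # For PROGRAM, `data` is the payload that was written.
            image[offset:offset + len(data)] = data
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return bytes(image)
```

Calling `replay(initial, ops, x)` and `replay(initial, ops, x + 1)` then yields two images that differ by exactly one ERASE or PROGRAM.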
> The output file is later written back to MTDRAM to check how much UBIFS likes
> it?
Exactly.
> While having such a tool would be awesome, we have to be very sure that it
> behaves correctly.
> Yesterday I spent almost the whole night with staring at some of Manfred's
> images and I'm not sure whether what I see makes sense or can actually happen
> on a real NAND or NOR flash. But I'm still investigating.
From my understanding, the tool result is exactly identical to a power 
failure immediately after a PROGRAM.
What differs from realistic embedded systems is obviously performance:
a RAM disk plus a 2-core i3 is probably much faster, and much more parallelism happens.

I have uploaded the initial image, the final image and the flight recording.

https://sourceforge.net/projects/calculix-rpm/files/ubifs/xattr/

--
     Manfred

Richard Weinberger Dec. 6, 2017, 7:59 p.m. UTC | #3
Manfred,

On Wednesday, 6 December 2017, 20:44:55 CET, Manfred Spraul wrote:
> Hi Richard,
> 
> On 12/06/2017 11:41 AM, Richard Weinberger wrote:
> > Dirk, Manfred,
> > 
> > On Wednesday, 6 December 2017, 09:50:34 CET, Dirk Behme wrote:
> >> From: Manfred Spraul <manfred@colorfullife.com>
> >> 
> >> Hi,
> >> 
> >> The series adds a flight recorder to MTDRAM.
> > 
> > Thanks a lot for sharing your tool, this is highly appreciated.
> > 
> >> This allows very efficient power fail testing:
> >>  From the flight recorder output, it is possible to recreate every image
> >> 
> >> that might have existed between the start of the recording and the end.
> >> 
> >> Obviously, a user space tool is required, it is attached as the last
> >> mail in the series.
> > 
> > So, to understand this approach better I need to recap.
> > The "flight recorder" logs every single MTD operation (READ, ERASE,
> > PROGRAM) to a file while the MTD is under load, right?
> 
> Only ERASE and PROGRAM. READ is not logged.
> Would it help if READ is logged as well? (memory is cheap, ...)
> 
> What would be fairly simple is to add a backtrace for every ERASE and
> PROGRAM. I'll try to add that.

On second thought, READ is not important for power-cut testing.
So no need to hurry.

> > Then you take the log, replay it to a _file_ but instead of replaying
> > all N MTD operations only N - X operations are replayed?
> 
> Exactly.  Instead of replaying all N operations, only X operations are
> replayed.
> 
> image-168167.bin is after replaying 168167 operations.
> image-168168.bin is after replaying one additional operation.

This is what I thought.
It worries me a bit that image-168167.bin shows a corrupted LPT (LEB property 
tree). The current logical operation of UBIFS is writing the index tree.

> > The output file is later written back to MTDRAM to check how much UBIFS
> > likes it?
> 
> Exactly.
> 
> > While having such a tool would be awesome, we have to be very sure that it
> > behaves correctly.
> > Yesterday I spent almost the whole night with staring at some of Manfred's
> > images and I'm not sure whether what I see makes sense or can actually
> > happen on a real NAND or NOR flash. But I'm still investigating.
> 
>  From my understand, the tool result is exactly identical to a powerfail
> immediately after PROGRAM.

Yes.

> What differs from realistic embedded systems is obviously performance:
> RAM disk+2-core I3 is probably much faster & much more parallelism happens.

Yep. I have also found some UBI and UBIFS bugs on my x86 system with nandsim and 
power cuts over the last years.

> I have uploaded the initial image, the final image and the flight recording.
> 
> https://sourceforge.net/projects/calculix-rpm/files/ubifs/xattr/

Cool! I'll give it a try.

Thanks,
//richard

Richard Weinberger Dec. 6, 2017, 8:57 p.m. UTC | #4
Manfred,

On Wednesday, 6 December 2017, 20:59:43 CET, Richard Weinberger wrote:
> > I have uploaded the initial image, the final image and the flight
> > recording.
> > 
> > https://sourceforge.net/projects/calculix-rpm/files/ubifs/xattr/
> 
> Cool! I'll give it a try.

Hmm, I did:
./replay -e 131072 -w 2048 dump-before.bin record.bin 168167
 
This applies 168167 operations to dump-before.bin, right?
So I expect the sha1sum of dump-before.bin to match that of image-168167.bin.
But it doesn't.

Thanks,
//richard

Manfred Spraul Dec. 7, 2017, 4:06 p.m. UTC | #5
Hi Richard,

On 12/06/2017 09:57 PM, Richard Weinberger wrote:
> Manfred,
>
> On Wednesday, 6 December 2017, 20:59:43 CET, Richard Weinberger wrote:
>>> I have uploaded the initial image, the final image and the flight
>>> recording.
>>>
>>> https://sourceforge.net/projects/calculix-rpm/files/ubifs/xattr/
>> Cool! I'll give it a try.
> Hmm, I did:
> ./replay -e 131072 -w 2048 dump-before.bin record.bin 168167
>   
> This applies 168167 to dump-before.bin, right?
> So I expect dump-before.bin to match the sha1sum of image-168167.bin.
> But it doesn't.
./replay dump-before.bin record.bin 168167

If you specify "-w 2048", then replay.cpp splits large writes into 
multiple operations, so the operation count no longer lines up with the 
image numbers.
For example, there are some 126 kB writes.
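
A rough sketch of why the count shifts, assuming the split is a straightforward chunking by write size and that "126 kB" means 126 * 1024 bytes. The helper `split_write` is hypothetical and not taken from replay.cpp:

```python
def split_write(offset, data, write_size):
    """Split one recorded write into chunks of at most `write_size` bytes.

    Returns a list of (offset, chunk) pairs, one per resulting operation.
    """
    return [
        (offset + i, data[i:i + write_size])
        for i in range(0, len(data), write_size)
    ]


# One 126 KiB write turns into many 2048-byte operations.
chunks = split_write(0, bytes(126 * 1024), 2048)
print(len(chunks))
```

Under these assumptions a single recorded write counts as 63 operations after splitting, so operation number 168167 with "-w 2048" is a different point in the recording than without it.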
> Thanks,
> //richard