Message ID | 20191003055342.7868-2-andrew@aj.id.au |
---|---|
State | Accepted |
Headers | show |
Series | [1/9] gard: Fix data corruption when clearing single records | expand |
On 10/3/19 11:23 AM, Andrew Jeffery wrote: > Attempting to clear a specific gard record leads to corruption of the target > record rather than the expected removal: > > $ ./opal-gard -f romulus.pnor list > No GARD entries to display > $ ./opal-gard -f romulus.pnor create /sys0/node0/proc1 > $ ./opal-gard -f romulus.pnor list > ID | Error | Type | Path > --------------------------------------------------------- > 00000001 | 00000000 | Manual | /Sys0/Node0/Proc1 > ========================================================= > $ ./opal-gard -f romulus.pnor clear 00000001 > Clearing gard record 0x00000001...done > $ ./opal-gard -f romulus.pnor list > ID | Error | Type | Path > --------------------------------------------------------- > 00000001 | 00000000 | Unknown | /Sys0/Node0/Proc1 > ========================================================= > > The GUARD partition needs to be compacted when clearing records as the > end of the list is a sentinel represented by the erased-flash state. The > compaction strategy is to read the trailing records and write them to > the offset of the record to be removed, followed by writing the sentinel > record at the offset of what was previously the last valid record. > > The corruption occurs due to incorrect calculation of the offset at which the > trailing records will be written. > > Cc: Skiboot Stable <skiboot-stable@lists.ozlabs.org> > Fixes: 5616c42d900a ("libflash/blocklevel: Make read/write be ECC agnostic for callers") > Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Patch looks good to me. Thanks for the fix. -Vasant
diff --git a/external/gard/gard.c b/external/gard/gard.c index f8c10324b7a8..6f75932bcd07 100644 --- a/external/gard/gard.c +++ b/external/gard/gard.c @@ -570,7 +570,7 @@ static int do_clear_i(struct gard_ctx *ctx, int pos, struct gard_record *gard, v return rc; } - rc = blocklevel_smart_write(ctx->bl, buf_pos - sizeof(gard), buf, buf_len); + rc = blocklevel_smart_write(ctx->bl, buf_pos - sizeof(*gard), buf, buf_len); free(buf); if (rc) { fprintf(stderr, "Couldn't write to flash at 0x%08x for len 0x%08x\n", diff --git a/external/gard/test/results/10-clear-single.err b/external/gard/test/results/10-clear-single.err new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/external/gard/test/results/10-clear-single.out b/external/gard/test/results/10-clear-single.out new file mode 100644 index 000000000000..904e61a560c8 --- /dev/null +++ b/external/gard/test/results/10-clear-single.out @@ -0,0 +1,7 @@ +Clearing the entire gard partition...done + ID | Error | Type | Path +--------------------------------------------------------- + 00000001 | 00000000 | Manual | /Sys0/Node0/Proc1 +========================================================= +Clearing gard record 0x00000001...done +No GARD entries to display diff --git a/external/gard/test/tests/10-clear-single b/external/gard/test/tests/10-clear-single new file mode 100644 index 000000000000..3a5f962ec16b --- /dev/null +++ b/external/gard/test/tests/10-clear-single @@ -0,0 +1,23 @@ +#!/bin/sh + +set -e + +DATA=$(mktemp) + +cleanup() { + rm -f $DATA +} + +trap cleanup EXIT + +dd if=/dev/zero of=$DATA bs=$((0x1000)) count=5 2>/dev/null + +run_binary "./opal-gard" "-p -e -f $DATA clear all" +run_binary "./opal-gard" "-p -e -f $DATA create /sys0/node0/proc1" +run_binary "./opal-gard" "-p -e -f $DATA list" +run_binary "./opal-gard" "-p -e -f $DATA clear 00000001" +run_binary "./opal-gard" "-p -e -f $DATA list" + +diff_with_result + +pass_test
Attempting to clear a specific gard record leads to corruption of the target record rather than the expected removal: $ ./opal-gard -f romulus.pnor list No GARD entries to display $ ./opal-gard -f romulus.pnor create /sys0/node0/proc1 $ ./opal-gard -f romulus.pnor list ID | Error | Type | Path --------------------------------------------------------- 00000001 | 00000000 | Manual | /Sys0/Node0/Proc1 ========================================================= $ ./opal-gard -f romulus.pnor clear 00000001 Clearing gard record 0x00000001...done $ ./opal-gard -f romulus.pnor list ID | Error | Type | Path --------------------------------------------------------- 00000001 | 00000000 | Unknown | /Sys0/Node0/Proc1 ========================================================= The GUARD partition needs to be compacted when clearing records as the end of the list is a sentinel represented by the erased-flash state. The compaction strategy is to read the trailing records and write them to the offset of the record to be removed, followed by writing the sentinel record at the offset of what was previously the last valid record. The corruption occurs due to incorrect calculation of the offset at which the trailing records will be written. Cc: Skiboot Stable <skiboot-stable@lists.ozlabs.org> Fixes: 5616c42d900a ("libflash/blocklevel: Make read/write be ECC agnostic for callers") Signed-off-by: Andrew Jeffery <andrew@aj.id.au> --- external/gard/gard.c | 2 +- .../gard/test/results/10-clear-single.err | 0 .../gard/test/results/10-clear-single.out | 7 ++++++ external/gard/test/tests/10-clear-single | 23 +++++++++++++++++++ 4 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 external/gard/test/results/10-clear-single.err create mode 100644 external/gard/test/results/10-clear-single.out create mode 100644 external/gard/test/tests/10-clear-single