diff mbox series

[1/9] gard: Fix data corruption when clearing single records

Message ID 20191003055342.7868-2-andrew@aj.id.au
State Accepted
Headers show
Series [1/9] gard: Fix data corruption when clearing single records | expand

Commit Message

Andrew Jeffery Oct. 3, 2019, 5:53 a.m. UTC
Attempting to clear a specific gard record leads to corruption of the target
record rather than the expected removal:

    $ ./opal-gard -f romulus.pnor list
    No GARD entries to display
    $ ./opal-gard -f romulus.pnor create /sys0/node0/proc1
    $ ./opal-gard -f romulus.pnor list
     ID       | Error    | Type       | Path
    ---------------------------------------------------------
     00000001 | 00000000 | Manual     | /Sys0/Node0/Proc1
    =========================================================
    $ ./opal-gard -f romulus.pnor clear 00000001
    Clearing gard record 0x00000001...done
    $ ./opal-gard -f romulus.pnor list
     ID       | Error    | Type       | Path
    ---------------------------------------------------------
     00000001 | 00000000 | Unknown    | /Sys0/Node0/Proc1
    =========================================================

The GUARD partition needs to be compacted when clearing records as the
end of the list is a sentinel represented by the erased-flash state. The
compaction strategy is to read the trailing records and write them to
the offset of the record to be removed, followed by writing the sentinel
record at the offset of what was previously the last valid record.

The corruption occurs due to incorrect calculation of the offset at which the
trailing records will be written.

Cc: Skiboot Stable <skiboot-stable@lists.ozlabs.org>
Fixes: 5616c42d900a ("libflash/blocklevel: Make read/write be ECC agnostic for callers")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
 external/gard/gard.c                          |  2 +-
 .../gard/test/results/10-clear-single.err     |  0
 .../gard/test/results/10-clear-single.out     |  7 ++++++
 external/gard/test/tests/10-clear-single      | 23 +++++++++++++++++++
 4 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 external/gard/test/results/10-clear-single.err
 create mode 100644 external/gard/test/results/10-clear-single.out
 create mode 100644 external/gard/test/tests/10-clear-single

Comments

Vasant Hegde Oct. 7, 2019, 9 a.m. UTC | #1
On 10/3/19 11:23 AM, Andrew Jeffery wrote:
> Attempting to clear a specific gard record leads to corruption of the target
> record rather than the expected removal:
> 
>      $ ./opal-gard -f romulus.pnor list
>      No GARD entries to display
>      $ ./opal-gard -f romulus.pnor create /sys0/node0/proc1
>      $ ./opal-gard -f romulus.pnor list
>       ID       | Error    | Type       | Path
>      ---------------------------------------------------------
>       00000001 | 00000000 | Manual     | /Sys0/Node0/Proc1
>      =========================================================
>      $ ./opal-gard -f romulus.pnor clear 00000001
>      Clearing gard record 0x00000001...done
>      $ ./opal-gard -f romulus.pnor list
>       ID       | Error    | Type       | Path
>      ---------------------------------------------------------
>       00000001 | 00000000 | Unknown    | /Sys0/Node0/Proc1
>      =========================================================
> 
> The GUARD partition needs to be compacted when clearing records as the
> end of the list is a sentinel represented by the erased-flash state. The
> compaction strategy is to read the trailing records and write them to
> the offset of the record to be removed, followed by writing the sentinel
> record at the offset of what was previously the last valid record.
> 
> The corruption occurs due to incorrect calculation of the offset at which the
> trailing records will be written.
> 
> Cc: Skiboot Stable <skiboot-stable@lists.ozlabs.org>
> Fixes: 5616c42d900a ("libflash/blocklevel: Make read/write be ECC agnostic for callers")
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>

Patch looks good to me. Thanks for the fix.


-Vasant
diff mbox series

Patch

diff --git a/external/gard/gard.c b/external/gard/gard.c
index f8c10324b7a8..6f75932bcd07 100644
--- a/external/gard/gard.c
+++ b/external/gard/gard.c
@@ -570,7 +570,7 @@  static int do_clear_i(struct gard_ctx *ctx, int pos, struct gard_record *gard, v
 			return rc;
 		}
 
-		rc = blocklevel_smart_write(ctx->bl, buf_pos - sizeof(gard), buf, buf_len);
+		rc = blocklevel_smart_write(ctx->bl, buf_pos - sizeof(*gard), buf, buf_len);
 		free(buf);
 		if (rc) {
 			fprintf(stderr, "Couldn't write to flash at 0x%08x for len 0x%08x\n",
diff --git a/external/gard/test/results/10-clear-single.err b/external/gard/test/results/10-clear-single.err
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/external/gard/test/results/10-clear-single.out b/external/gard/test/results/10-clear-single.out
new file mode 100644
index 000000000000..904e61a560c8
--- /dev/null
+++ b/external/gard/test/results/10-clear-single.out
@@ -0,0 +1,7 @@ 
+Clearing the entire gard partition...done
+ ID       | Error    | Type       | Path
+---------------------------------------------------------
+ 00000001 | 00000000 | Manual     | /Sys0/Node0/Proc1
+=========================================================
+Clearing gard record 0x00000001...done
+No GARD entries to display
diff --git a/external/gard/test/tests/10-clear-single b/external/gard/test/tests/10-clear-single
new file mode 100644
index 000000000000..3a5f962ec16b
--- /dev/null
+++ b/external/gard/test/tests/10-clear-single
@@ -0,0 +1,23 @@ 
+#!/bin/sh
+
+set -e
+
+DATA=$(mktemp)
+
+cleanup() {
+	rm -f $DATA
+}
+
+trap cleanup EXIT
+
+dd if=/dev/zero of=$DATA bs=$((0x1000)) count=5 2>/dev/null
+
+run_binary "./opal-gard" "-p -e -f $DATA clear all"
+run_binary "./opal-gard" "-p -e -f $DATA create /sys0/node0/proc1"
+run_binary "./opal-gard" "-p -e -f $DATA list"
+run_binary "./opal-gard" "-p -e -f $DATA clear 00000001"
+run_binary "./opal-gard" "-p -e -f $DATA list"
+
+diff_with_result
+
+pass_test