diff mbox

[U-Boot] fsl: board EEPROM has the CRC in the wrong location

Message ID 1342129594-7861-1-git-send-email-timur@freescale.com
State Rejected, archived
Delegated to: Andy Fleming
Headers show

Commit Message

Timur Tabi July 12, 2012, 9:46 p.m. UTC
The NXID v1 EEPROM format has the CRC at offset 0xFC, but for some reason it
was placed at address 0xCC instead.  To retain compatibility with existing
boards, we check the CRC in the old location if necessary.

Signed-off-by: Timur Tabi <timur@freescale.com>
---
 board/freescale/common/sys_eeprom.c |   28 ++++++++++++++++++++++++++--
 1 files changed, 26 insertions(+), 2 deletions(-)

Comments

sun york-R58495 July 12, 2012, 10:03 p.m. UTC | #1
Timur,

That patch itself is OK. But the comment is incorrect. We keep adding more mac addresses to this data structure. The CRC was at the end. The offset 0xCC was correct.

York


On Jul 12, 2012, at 2:46 PM, Timur Tabi wrote:

> The NXID v1 EEPROM format has the CRC at offset 0xFC, but for some reason it
> was placed at address 0xCC instead.  To retain compatibility with existing
> boards, we check the CRC in the old location if necessary.
> 
> Signed-off-by: Timur Tabi <timur@freescale.com>
> ---
> board/freescale/common/sys_eeprom.c |   28 ++++++++++++++++++++++++++--
> 1 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/board/freescale/common/sys_eeprom.c b/board/freescale/common/sys_eeprom.c
> index d2ed036..2541dd2 100644
> --- a/board/freescale/common/sys_eeprom.c
> +++ b/board/freescale/common/sys_eeprom.c
> @@ -34,8 +34,16 @@
> #endif
> 
> #ifdef CONFIG_SYS_I2C_EEPROM_NXID
> -#define MAX_NUM_PORTS	23
> +#define MAX_NUM_PORTS	31
> #define NXID_VERSION	1
> +
> +/*
> + * Older versions of this code incorrectly placed the CRC at offset 0xCC,
> + * when it should have been at 0xFC.  To maintain compatibility with boards
> + * that have the CRC at 0xCC, we check for the CRC at 0xCC if it's not in
> + * 0xFC.
> + */
> +#define BROKEN_CRC_OFFSET	0xCC
> #endif
> 
> /**
> @@ -71,7 +79,7 @@ static struct __attribute__ ((__packed__)) eeprom {
> 	u8 mac_count;     /* 0x40        Number of MAC addresses */
> 	u8 mac_flag;      /* 0x41        MAC table flags */
> 	u8 mac[MAX_NUM_PORTS][6];     /* 0x42 - x MAC addresses */
> -	u32 crc;          /* x+1         CRC32 checksum */
> +	u32 crc;          /* 0xFC        CRC32 checksum */
> #endif
> } e;
> 
> @@ -457,6 +465,22 @@ int mac_read_from_eeprom(void)
> 
> 	crc = crc32(0, (void *)&e, crc_offset);
> 	crcp = (void *)&e + crc_offset;
> +#ifdef BROKEN_CRC_OFFSET
> +	/*
> +	 * If the CRC is wrong, then check the old location.  If it contains a
> +	 * valid CRC, then assume that this is an older EEPROM.  We update the
> +	 * real CRC so that the EEPROM looks valid.
> +	 */
> +	if ((e.version == NXID_VERSION) && (crc != be32_to_cpup(crcp))) {
> +		u32 crc2 = crc32(0, (void *)&e, BROKEN_CRC_OFFSET);
> +		void *crcp2 = (void *)&e + BROKEN_CRC_OFFSET;
> +
> +		if (crc2 == be32_to_cpup(crcp2)) {
> +			debug("Broken NXID v1 CRC found and corrected\n");
> +			update_crc();
> +		}
> +	}
> +#endif
> 	if (crc != be32_to_cpu(*crcp)) {
> 		printf("CRC mismatch (%08x != %08x)\n", crc, be32_to_cpu(e.crc));
> 		return -1;
> -- 
> 1.7.3.4
>
Scott Wood July 12, 2012, 10:37 p.m. UTC | #2
On 07/12/2012 05:03 PM, sun york-R58495 wrote:
> Timur,
> 
> That patch itself is OK. But the comment is incorrect. We keep adding more mac addresses to this data structure. The CRC was at the end. The offset 0xCC was correct.

Is there anything in the data structure to indicate that this growth has
happened?

-Scott
Tabi Timur-B04825 July 12, 2012, 10:44 p.m. UTC | #3
Scott Wood wrote:
>> > That patch itself is OK. But the comment is incorrect. We keep adding more mac addresses to this data structure. The CRC was at the end. The offset 0xCC was correct.

> Is there anything in the data structure to indicate that this growth has
> happened?

The version number indicates whether it's 8 addresses (v0) or more than 8
(v1).

The problem is that I think I just made a math error when I calculated the
number of MAC addresses that would fit in the EEPROM.  I was supposed to
do (0xFC - 0x72) / 6 == 31, but instead I ended up with 23, and never
validated it.
sun york-R58495 July 12, 2012, 10:46 p.m. UTC | #4
On Jul 12, 2012, at 3:37 PM, Scott Wood wrote:

> On 07/12/2012 05:03 PM, sun york-R58495 wrote:
>> Timur,
>> 
>> That patch itself is OK. But the comment is incorrect. We keep adding more mac addresses to this data structure. The CRC was at the end. The offset 0xCC was correct.
> 
> Is there anything in the data structure to indicate that this growth has
> happened?
> 
Discussed with Timur. The MAX_NUM_PORT was chosen as 23, probably because of a wrong math. There is no reason to use that number. Now changing to 31 will use up all the EEPROM space and push the CRC to the end, offset 0xFC.

York
Scott Wood July 12, 2012, 10:49 p.m. UTC | #5
On 07/12/2012 05:46 PM, sun york-R58495 wrote:
> 
> On Jul 12, 2012, at 3:37 PM, Scott Wood wrote:
> 
>> On 07/12/2012 05:03 PM, sun york-R58495 wrote:
>>> Timur,
>>>
>>> That patch itself is OK. But the comment is incorrect. We keep adding more mac addresses to this data structure. The CRC was at the end. The offset 0xCC was correct.
>>
>> Is there anything in the data structure to indicate that this growth has
>> happened?
>>
> Discussed with Timur. The MAX_NUM_PORT was chosen as 23, probably because of a wrong math. There is no reason to use that number. Now changing to 31 will use up all the EEPROM space and push the CRC to the end, offset 0xFC.

If the 0xCC version is already in real use, then this change should bump
the version number.

-Scott
Tabi Timur-B04825 July 12, 2012, 10:52 p.m. UTC | #6
Scott Wood wrote:
> If the 0xCC version is already in real use, then this change should bump
> the version number.

I can't, because I don't control the spec.  Like I said, it was just wrong
before.  No one has more than 23 MAC addresses anyway.

My patch provides transparent updates to handle it.  It will read broken
EEPROMs and verify the CRC in the old location, and if you have re-save
the EEPROM, it will put the CRC in the right place.
Wolfgang Denk July 13, 2012, 4:30 a.m. UTC | #7
Dear York,

In message <9F5356FB-8CA2-44DE-9089-64ABD82CA733@freescale.com> you wrote:
> 
> That patch itself is OK. But the comment is incorrect. We keep
> adding more mac addresses to this data structure. The CRC was at the
> end. The offset 0xCC was correct.

This is a totally broken design then, when you have a growing data
structure where vital information fields get shifted.  In such case,
the CRC should have been at the beginning, so it never changes
location. Or even better, you should not have used a binary data
structure at all (guess why the environment in U-Boot has been
implemented the way it is).

Best regards,

Wolfgang Denk
Wolfgang Denk July 13, 2012, 4:32 a.m. UTC | #8
Dear Timur Tabi,

In message <4FFF5524.2080704@freescale.com> you wrote:
>
> My patch provides transparent updates to handle it.  It will read broken
> EEPROMs and verify the CRC in the old location, and if you have re-save
> the EEPROM, it will put the CRC in the right place.

It will work by chance, accessing random data.  This is crap.

Best regards,

Wolfgang Denk
sun york-R58495 July 13, 2012, 4:38 a.m. UTC | #9
On Jul 12, 2012, at 9:30 PM, Wolfgang Denk wrote:

> Dear York,
> 
> In message <9F5356FB-8CA2-44DE-9089-64ABD82CA733@freescale.com> you wrote:
>> 
>> That patch itself is OK. But the comment is incorrect. We keep
>> adding more mac addresses to this data structure. The CRC was at the
>> end. The offset 0xCC was correct.
> 
> This is a totally broken design then, when you have a growing data
> structure where vital information fields get shifted.  In such case,
> the CRC should have been at the beginning, so it never changes
> location. Or even better, you should not have used a binary data
> structure at all (guess why the environment in U-Boot has been
> implemented the way it is).

I agree it was a broken design. Now we are using all available space and put CRC to the very end. It is not perfect but should work.

York
Tabi Timur-B04825 July 13, 2012, 12:10 p.m. UTC | #10
Wolfgang Denk wrote:
> This is a totally broken design then, when you have a growing data
> structure where vital information fields get shifted.  In such case,
> the CRC should have been at the beginning, so it never changes
> location. Or even better, you should not have used a binary data
> structure at all (guess why the environment in U-Boot has been
> implemented the way it is).

York is mistaken.  The CRC was always at location 0xFC, but for some 
reason, when I wrote the code, I put it at 0xCC.  Now I'm fixing it, and 
providing some backwards compatibility to avoid causing problems for 
people who upgrade U-Boot on existing boards.  I don't see how this is 
controversial in any way.
Tabi Timur-B04825 July 13, 2012, 12:11 p.m. UTC | #11
Wolfgang Denk wrote:
>> >My patch provides transparent updates to handle it.  It will read broken
>> >EEPROMs and verify the CRC in the old location, and if you have re-save
>> >the EEPROM, it will put the CRC in the right place.

> It will work by chance, accessing random data.  This is crap.

It is not crap, and it will not work by chance.  It is not accessing 
random data, it is accessing the CRC in the old location, just like the 
current code does today.
Tabi Timur-B04825 July 13, 2012, 12:12 p.m. UTC | #12
sun york-R58495 wrote:

> I agree it was a broken design. Now we are using all available space
> and put CRC to the very end. It is not perfect but should work.

*sigh*

The design was never broken, the code was just wrong.  The CRC has always 
supposed to have been at the end.
York Sun July 13, 2012, 9:22 p.m. UTC | #13
On Fri, 2012-07-13 at 05:12 -0700, Tabi Timur-B04825 wrote:
> sun york-R58495 wrote:
> 
> > I agree it was a broken design. Now we are using all available space
> > and put CRC to the very end. It is not perfect but should work.
> 
> *sigh*
> 
> The design was never broken, the code was just wrong.  The CRC has always 
> supposed to have been at the end.
> 
Sorry, I meant the design was broken when I implemented it. I didn't put
the CRC to a fixed location.

York
Wolfgang Denk July 13, 2012, 9:25 p.m. UTC | #14
Dear Tabi Timur-B04825,

In message <50001038.50303@freescale.com> you wrote:
> >
> York is mistaken.  The CRC was always at location 0xFC, but for some 
> reason, when I wrote the code, I put it at 0xCC.  Now I'm fixing it, and 
> providing some backwards compatibility to avoid causing problems for 
> people who upgrade U-Boot on existing boards.  I don't see how this is 
> controversial in any way.

In case you have an EEPROM with correct layout (CRC at 0xFC) but
incorrect CRC, you will access random data and interpret this as CRC.
This is provoking undefined behaviour.

Yes, it is inlikely that you happen to find a matching CRC in such a
case, but it is possible.

Undefined behaviour is something you must avoid.


If you want, then rather provide an update tool that theuser can use
(manually!) to update, but this should be done once, and with explicit
confirmation from the user, never automagically.

Best regards,

Wolfgang Denk
Wolfgang Denk July 13, 2012, 9:26 p.m. UTC | #15
Dear Tabi Timur-B04825,

In message <5000106B.5090007@freescale.com> you wrote:
>
> > It will work by chance, accessing random data.  This is crap.
> 
> It is not crap, and it will not work by chance.  It is not accessing 
> random data, it is accessing the CRC in the old location, just like the 
> current code does today.

If you have the CRC at 0xFC, and the CRC is incorrect, then it _will_
access random data, and result inundefined behaviour. Yes, this _is_ a
crappy design.

Best regards,

Wolfgang Denk
Tabi Timur-B04825 July 13, 2012, 9:29 p.m. UTC | #16
Wolfgang Denk wrote:

> In case you have an EEPROM with correct layout (CRC at 0xFC) but
> incorrect CRC, you will access random data and interpret this as CRC.
> This is provoking undefined behaviour.

True, but it doesn't matter.  The EEPROM is not that important, and the
odds of screwing this up is one in four billion.

> If you want, then rather provide an update tool that theuser can use
> (manually!) to update, but this should be done once, and with explicit
> confirmation from the user, never automagically.

Considering how unimportant the EEPROM really is, I don't see the point in
making it so complicated.  We already automagically upgrade the board from
NXID v0 to NXID v1.  Now we automagically fix boards that have the CRC in
the wrong place.

Anyway, I don't see why it's so controversial.  This code is only used on
a small number of Freescale reference boards.
Wolfgang Denk July 13, 2012, 9:32 p.m. UTC | #17
Dear Timur Tabi,

In message <50009349.9000609@freescale.com> you wrote:
> 
> > In case you have an EEPROM with correct layout (CRC at 0xFC) but
> > incorrect CRC, you will access random data and interpret this as CRC.
> > This is provoking undefined behaviour.
> 
> True, but it doesn't matter.  The EEPROM is not that important, and the
> odds of screwing this up is one in four billion.
> 
> > If you want, then rather provide an update tool that theuser can use
> > (manually!) to update, but this should be done once, and with explicit
> > confirmation from the user, never automagically.
> 
> Considering how unimportant the EEPROM really is, I don't see the point in
> making it so complicated.  We already automagically upgrade the board from
> NXID v0 to NXID v1.  Now we automagically fix boards that have the CRC in
> the wrong place.
> 
> Anyway, I don't see why it's so controversial.  This code is only used on
> a small number of Freescale reference boards.

Well, if it's really so unimportant and used in only a small number
of boards, then just omit this broken code that provokes the
undefined behaviour.

Best regards,

Wolfgang Denk
Tabi Timur-B04825 July 13, 2012, 9:39 p.m. UTC | #18
Wolfgang Denk wrote:
> Well, if it's really so unimportant and used in only a small number
> of boards, then just omit this broken code that provokes the
> undefined behaviour.

As I said before, we need to support situations where people upgrade their
U-Boot.  When the EEPROM is read, the CRC is checked in both locations.
If it's valid in either, then we assume the data is valid and continue.

When the user wants to write back the EEPROM (via the "mac save" command),
the CRC is written only at the proper location (0xFC).  This "fixes" the
EEPROM, and the code will never read the CRC from the wrong location (0xCC).
Scott Wood July 13, 2012, 9:45 p.m. UTC | #19
On 07/13/2012 04:25 PM, Wolfgang Denk wrote:
> Dear Tabi Timur-B04825,
> 
> In message <50001038.50303@freescale.com> you wrote:
>>>
>> York is mistaken.  The CRC was always at location 0xFC, but for some 
>> reason, when I wrote the code, I put it at 0xCC.  Now I'm fixing it, and 
>> providing some backwards compatibility to avoid causing problems for 
>> people who upgrade U-Boot on existing boards.  I don't see how this is 
>> controversial in any way.
> 
> In case you have an EEPROM with correct layout (CRC at 0xFC) but
> incorrect CRC, you will access random data and interpret this as CRC.
> This is provoking undefined behaviour.
> 
> Yes, it is inlikely that you happen to find a matching CRC in such a
> case, but it is possible.
> 
> Undefined behaviour is something you must avoid.

Is it any more likely that this will happen, than that you'll see
arbitrary data accidentally match a magic number that identifies a data
format?

> If you want, then rather provide an update tool that theuser can use
> (manually!) to update, but this should be done once, and with explicit
> confirmation from the user, never automagically.

In the real world, this will result in more problems than Timur's
approach (even if the "problems" are just increased support burden from
users asking why ethernet isn't working any more).  The odds of a user
screwing up (or simply not being aware that this update of U-Boot
requires special migration steps) is much more than one in four billion.
 Perhaps this could be limited to boards that are known to have had the
bug in the past?

Timur, I know you said you don't control the format, but could you ask
for a version number bump so that going forward there's a way to
unambiguously mark the contents as "good" (the spec wouldn't change, but
there would be no known implementations of v2 with this bug)?

If not, and Wolfgang still refuses to accept this, what about checking
the old location on a CRC fail, and if the old CRC passes, don't
automatically use it but print a message telling the user that they
probably need to run the migration command?

-Scott
Tabi Timur-B04825 July 13, 2012, 9:53 p.m. UTC | #20
Scott Wood wrote:

> Timur, I know you said you don't control the format, but could you ask
> for a version number bump so that going forward there's a way to
> unambiguously mark the contents as "good" (the spec wouldn't change, but
> there would be no known implementations of v2 with this bug)?

I'm not sure what you mean.  The specification for v1 has always said that
the CRC is at address 0xFC.  I just wrote the code wrong.  I was always
under the impression that I was writing the CRC at 0xFC, until York
pointed that out to me last year.  As far as the specification is
concerned, nothing has changed.

> If not, and Wolfgang still refuses to accept this, what about checking
> the old location on a CRC fail, and if the old CRC passes, don't
> automatically use it but print a message telling the user that they
> probably need to run the migration command?

I honestly don't see what's wrong with checking the CRC in the old
location, and using it if it's valid.  Like I said, we already
automagically update EEPROMs from version 0 to version 1.  The existing
code already checks a version number to determine where the CRC is:

/*
 * If we've read an NXID v0 EEPROM, then we need to set the CRC offset
 * to where it is in v0.
 */
if (e.version == 0)
	crc_offset = 0x72;

So here we're reading the 'version' field before we validate the data,
because we need to check the version to know where the CRC is.
Scott Wood July 13, 2012, 10:19 p.m. UTC | #21
On 07/13/2012 04:53 PM, Timur Tabi wrote:
> Scott Wood wrote:
> 
>> Timur, I know you said you don't control the format, but could you ask
>> for a version number bump so that going forward there's a way to
>> unambiguously mark the contents as "good" (the spec wouldn't change, but
>> there would be no known implementations of v2 with this bug)?
> 
> I'm not sure what you mean.  The specification for v1 has always said that
> the CRC is at address 0xFC.  I just wrote the code wrong.  I was always
> under the impression that I was writing the CRC at 0xFC, until York
> pointed that out to me last year.  As far as the specification is
> concerned, nothing has changed.

I know the spec wouldn't change, except the version number.  But as I
said above, there would be no known v2 implementations with the bug.
You would only check the bad CRC location if you see v1 data, because
there are known buggy v1 implementations.

>> If not, and Wolfgang still refuses to accept this, what about checking
>> the old location on a CRC fail, and if the old CRC passes, don't
>> automatically use it but print a message telling the user that they
>> probably need to run the migration command?
> 
> I honestly don't see what's wrong with checking the CRC in the old
> location, and using it if it's valid.

Neither do I; I was just suggesting a compromise should Wolfgang
maintain his opposition.

-Scott
Tabi Timur-B04825 July 13, 2012, 10:22 p.m. UTC | #22
Scott Wood wrote:

> I know the spec wouldn't change, except the version number.  But as I
> said above, there would be no known v2 implementations with the bug.
> You would only check the bad CRC location if you see v1 data, because
> there are known buggy v1 implementations.

I already have that:

	if ((e.version == NXID_VERSION) && (crc != be32_to_cpup(crcp))) {

NXID_VERSION is equal to 1, so we only do the check for the old CRC if we
have a v1 EEPROM.
Scott Wood July 13, 2012, 10:41 p.m. UTC | #23
On 07/13/2012 05:22 PM, Timur Tabi wrote:
> Scott Wood wrote:
> 
>> I know the spec wouldn't change, except the version number.  But as I
>> said above, there would be no known v2 implementations with the bug.
>> You would only check the bad CRC location if you see v1 data, because
>> there are known buggy v1 implementations.
> 
> I already have that:
> 
> 	if ((e.version == NXID_VERSION) && (crc != be32_to_cpup(crcp))) {
> 
> NXID_VERSION is equal to 1, so we only do the check for the old CRC if we
> have a v1 EEPROM.

But you continue to generate v1 EEPROMs.  If we get the people who
define the format to accept v2, then we can generate v2 after the fix is
applied, and the (very small) risk of a real CRC failure combined with a
spurious CRC success in the old location would only apply on EEPROMs
which haven't been saved since the update.

-Scott
Tabi Timur-B04825 July 13, 2012, 10:46 p.m. UTC | #24
Scott Wood wrote:
> But you continue to generate v1 EEPROMs.  If we get the people who
> define the format to accept v2, then we can generate v2 after the fix is
> applied, and the (very small) risk of a real CRC failure combined with a
> spurious CRC success in the old location would only apply on EEPROMs
> which haven't been saved since the update.

Again, I'm confused.  Why would we bump the version to v2?  What is
different in V2 compared to V1?

It's highly unlikely that there will ever be a V2.  The only reason we
went from V0 to V1 is to allow for more than 8 MAC addresses.  Now the
entire EEPROM is filled with MAC addresses, and so we've reached the hard
limit of 31.  We would need to switch to a new EEPROM device in order to
handle more, and none of the other information in the EEPROM is used by
U-Boot.
Scott Wood July 13, 2012, 10:54 p.m. UTC | #25
On 07/13/2012 05:46 PM, Timur Tabi wrote:
> Scott Wood wrote:
>> But you continue to generate v1 EEPROMs.  If we get the people who
>> define the format to accept v2, then we can generate v2 after the fix is
>> applied, and the (very small) risk of a real CRC failure combined with a
>> spurious CRC success in the old location would only apply on EEPROMs
>> which haven't been saved since the update.
> 
> Again, I'm confused.  Why would we bump the version to v2?  What is
> different in V2 compared to V1?

I'm pretty sure I've explained it adequately a couple times over.

Nothing is different in the spec between v1 and v2 except the version
number.  However, if you see v2 you can assume it didn't come from an
implementation with this bug.

> It's highly unlikely that there will ever be a V2.  The only reason we
> went from V0 to V1 is to allow for more than 8 MAC addresses.  Now the
> entire EEPROM is filled with MAC addresses, and so we've reached the hard
> limit of 31.  We would need to switch to a new EEPROM device in order to
> handle more, and none of the other information in the EEPROM is used by
> U-Boot.

I wasn't suggesting that v2 be used to expand the number of addresses.

-Scott
Wolfgang Denk July 13, 2012, 10:59 p.m. UTC | #26
Dear Scott Wood,

In message <500096E0.9010406@freescale.com> you wrote:
>
> If not, and Wolfgang still refuses to accept this, what about checking
> the old location on a CRC fail, and if the old CRC passes, don't
> automatically use it but print a message telling the user that they
> probably need to run the migration command?

Yes, that appears to be a good idea.


Best regards,

Wolfgang Denk
Wolfgang Denk July 13, 2012, 11:01 p.m. UTC | #27
Dear Timur Tabi,

In message <500098F6.8050702@freescale.com> you wrote:
> 
> I honestly don't see what's wrong with checking the CRC in the old
> location, and using it if it's valid.  Like I said, we already

You are interpreting something which can be random data.

> if (e.version == 0)
> 	crc_offset = 0x72;
> 
> So here we're reading the 'version' field before we validate the data,
> because we need to check the version to know where the CRC is.

Argh.  More crap ...

Best regards,

Wolfgang Denk
Scott Wood July 13, 2012, 11:12 p.m. UTC | #28
On 07/13/2012 06:01 PM, Wolfgang Denk wrote:
> Dear Timur Tabi,
> 
> In message <500098F6.8050702@freescale.com> you wrote:
>>
>> I honestly don't see what's wrong with checking the CRC in the old
>> location, and using it if it's valid.  Like I said, we already
> 
> You are interpreting something which can be random data.
> 
>> if (e.version == 0)
>> 	crc_offset = 0x72;
>>
>> So here we're reading the 'version' field before we validate the data,
>> because we need to check the version to know where the CRC is.
> 
> Argh.  More crap ...

And how would you do it?  You have to look at *something* first, and
whatever that is could be a coincidence if you think people are going to
stuff arbitrary data into the EEPROM.

-Scott
Tabi Timur-B04825 July 15, 2012, 12:52 p.m. UTC | #29
Or Yoram-R56270 wrote:
> 1. Correct the v1 documentation to specify 23 MAC addresses and placing the CRC at 0xCC.
> 2. Create a new code and documentation for v2, specifying 31 MAC addresses and placing the CRC at 0xFC.

Both of these are unnecessary.  My patch fixes the code to match the spec 
*and* handles the CRC problem transparently.  It also doesn't do anything 
controversial that the current code doesn't already do.  Really, this is a 
non-issue.
Wolfgang Denk July 16, 2012, 7:57 p.m. UTC | #30
Dear Scott,

In message <5000AB43.6090406@freescale.com> you wrote:
>
> > You are interpreting something which can be random data.
> > 
> >> if (e.version == 0)
> >> 	crc_offset = 0x72;
> >>
> >> So here we're reading the 'version' field before we validate the data,
> >> because we need to check the version to know where the CRC is.
> > 
> > Argh.  More crap ...
> 
> And how would you do it?  You have to look at *something* first, and
> whatever that is could be a coincidence if you think people are going to
> stuff arbitrary data into the EEPROM.

If you cannot avoid using binary data structures you must make sure
the design allows extensions which do not break the design.  This was
attempted here (CRC at fixed position 0xFC), which is supposed to be
"at the end" - unless the EEPROM size changes one day.

Accessing _any_ data fields in the binary structure must always be
done only _after_ the CRC has been verified.  It makes zero sense to
try to interpret a version field without knowing if it is valid at
all.

Such code shows undefined behaviour.  You may argument that the
likelyhood of a false match is small, but this doesn't matter: it's
still undefined behaviour.


With all the previous explanations already given (only very fewsystems
affected) it is best to remove all this crap, and provide a manual
recovery tool (to be run under close supervision of the user).

Best regards,

Wolfgang Denk
Scott Wood July 16, 2012, 10:32 p.m. UTC | #31
On 07/16/2012 02:57 PM, Wolfgang Denk wrote:
> Dear Scott,
> 
> In message <5000AB43.6090406@freescale.com> you wrote:
>>
>>> You are interpreting something which can be random data.
>>>
>>>> if (e.version == 0)
>>>> 	crc_offset = 0x72;
>>>>
>>>> So here we're reading the 'version' field before we validate the data,
>>>> because we need to check the version to know where the CRC is.
>>>
>>> Argh.  More crap ...
>>
>> And how would you do it?  You have to look at *something* first, and
>> whatever that is could be a coincidence if you think people are going to
>> stuff arbitrary data into the EEPROM.
> 
> If you cannot avoid using binary data structures you must make sure
> the design allows extensions which do not break the design.

That's what the version field is for.  Would it be nice if, from the
beginning, there were a size and CRC field up front?  Sure.  But that's
not what happened, and this is not a data format that we (the team that
Timur and I are on) control.  It's specified by the part of Freescale
that programs EEPROMs on the boards before sending them to customers,
and they're not going to change just because we tell them you think it's
"crap".  And we are not going to break compatibility with existing
EEPROM contents.  Sorry.

> This was
> attempted here (CRC at fixed position 0xFC), which is supposed to be
> "at the end" - unless the EEPROM size changes one day.
> 
> Accessing _any_ data fields in the binary structure must always be
> done only _after_ the CRC has been verified.  It makes zero sense to
> try to interpret a version field without knowing if it is valid at
> all.

And you don't know where the CRC is if you don't know what sort of data
structure it is.  If the version field gets corrupted, and the result is
interpreted as a valid version, then you rely on that version's CRC to
reject the EEPROM contents.

To use your logic, it makes zero sense to interpret a value as a CRC
without first knowing that it's supposed to be one.  Do you reject as
"crap" every single system of format autodetection in the world, even
when a good magic number is used (I'm not claiming that these version
numbers are good magic numbers)?

> Such code shows undefined behaviour.  You may argument that the
> likelyhood of a false match is small, but this doesn't matter: 

In the real world, yes, it does.  It also matters that you need bad data
to start with (a failed CRC in the proper location) to even begin
rolling the dice on whether there's a spurious "good" CRC in the old
location.

The U-Boot source code is managed by a version control system that is
constantly gambling on not getting a chance collision.  Such a collision
is so extremely rare that it will likely never happen to any git user
until humans go extinct, but it's theoretically possible.  The odds here
are larger than that, but still very small, and it's a chance that's
much more rarely taken to begin with.  The odds of there being a problem
if we do what you want us to do are greater.

> it's still undefined behaviour.
> 
> 
> With all the previous explanations already given (only very fewsystems
> affected) it is best to remove all this crap, and provide a manual
> recovery tool

I wonder if "only very few systems affected" was thrown in to prevent a
counterargument that it is best to remove the code for the old U-Boot
environment data structure, replace it with something that is
size-extensible (not to mention the assumption that
CONFIG_SYS_RENDUNDAND_ENVIRONMENT (sic) hasn't changed on upgrade), and
provide a manual recovery tool. :-)

They are systems that we care about, regardless of whether they are
"very few" as a percentage of all boards that U-Boot supports.

> (to be run under close supervision of the user).

Often it's the users that need supervision. :-P

-Scott
diff mbox

Patch

diff --git a/board/freescale/common/sys_eeprom.c b/board/freescale/common/sys_eeprom.c
index d2ed036..2541dd2 100644
--- a/board/freescale/common/sys_eeprom.c
+++ b/board/freescale/common/sys_eeprom.c
@@ -34,8 +34,16 @@ 
 #endif
 
 #ifdef CONFIG_SYS_I2C_EEPROM_NXID
-#define MAX_NUM_PORTS	23
+#define MAX_NUM_PORTS	31
 #define NXID_VERSION	1
+
+/*
+ * Older versions of this code incorrectly placed the CRC at offset 0xCC,
+ * when it should have been at 0xFC.  To maintain compatibility with boards
+ * that have the CRC at 0xCC, we check for the CRC at 0xCC if it's not in
+ * 0xFC.
+ */
+#define BROKEN_CRC_OFFSET	0xCC
 #endif
 
 /**
@@ -71,7 +79,7 @@  static struct __attribute__ ((__packed__)) eeprom {
 	u8 mac_count;     /* 0x40        Number of MAC addresses */
 	u8 mac_flag;      /* 0x41        MAC table flags */
 	u8 mac[MAX_NUM_PORTS][6];     /* 0x42 - x MAC addresses */
-	u32 crc;          /* x+1         CRC32 checksum */
+	u32 crc;          /* 0xFC        CRC32 checksum */
 #endif
 } e;
 
@@ -457,6 +465,22 @@  int mac_read_from_eeprom(void)
 
 	crc = crc32(0, (void *)&e, crc_offset);
 	crcp = (void *)&e + crc_offset;
+#ifdef BROKEN_CRC_OFFSET
+	/*
+	 * If the CRC is wrong, then check the old location.  If it contains a
+	 * valid CRC, then assume that this is an older EEPROM.  We update the
+	 * real CRC so that the EEPROM looks valid.
+	 */
+	if ((e.version == NXID_VERSION) && (crc != be32_to_cpup(crcp))) {
+		u32 crc2 = crc32(0, (void *)&e, BROKEN_CRC_OFFSET);
+		void *crcp2 = (void *)&e + BROKEN_CRC_OFFSET;
+
+		if (crc2 == be32_to_cpup(crcp2)) {
+			debug("Broken NXID v1 CRC found and corrected\n");
+			update_crc();
+		}
+	}
+#endif
 	if (crc != be32_to_cpu(*crcp)) {
 		printf("CRC mismatch (%08x != %08x)\n", crc, be32_to_cpu(e.crc));
 		return -1;