[v02] powerpc/mobility: Fix node detach/rename problem

Message ID b3f658d9-efc5-e532-e8d4-162494b88194@linux.vnet.ibm.com (mailing list archive)
State Changes Requested

Checks

Context                          Check    Description
snowpatch_ozlabs/apply_patch     success  next/apply_patch Successfully applied
snowpatch_ozlabs/checkpatch      fail     Test checkpatch on branch next
snowpatch_ozlabs/build-ppc64le   fail     Test build-ppc64le on branch next
snowpatch_ozlabs/build-ppc64be   fail     Test build-ppc64be on branch next
snowpatch_ozlabs/build-ppc64e    success  Test build-ppc64e on branch next
snowpatch_ozlabs/build-ppc32     success  Test build-ppc32 on branch next

Commit Message

Michael Bringmann Aug. 6, 2018, 2:21 p.m. UTC
The PPC mobility code receives RTAS requests to delete nodes with
platform-/hardware-specific attributes when restarting the kernel
after a migration.  My example is for migration between a P8 Alpine
and a P8 Brazos.  Nodes to be deleted include 'ibm,random-v1',
'ibm,platform-facilities', 'ibm,sym-encryption-v1', and
'ibm,compression-v1'.

The mobility.c code calls 'of_detach_node' for the nodes and their
children.  This makes calls to detach the properties and to remove
the associated sysfs/kernfs files.

New copies of the same nodes are then provided by the PHYP, local
copies are built, and a pointer to each 'struct device_node' is
passed to of_attach_node.  When the data structure is allocated,
before the call to of_attach_node, its phandle is initialized to 0.
of_attach_node then calls __of_attach_node, which pulls the actual
name and phandle from the just-created sub-properties named
something like 'name' and 'ibm,phandle'.

This is all fine for the first migration.  The problem occurs with
the second and subsequent migrations when the PHYP on the new system
wants to replace the same set of nodes again, referenced with the
same names and phandle values.

On the second and subsequent migrations, the PHYP tells the system
to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
nodes by its known set of phandle values -- the same handles used
by the PHYP on the source system are known on the target system.
The mobility.c code calls of_find_node_by_phandle() with these values
and ends up locating the first instance of each node that was added
during the original boot, instead of the second instance of each node
created after the first migration.  The detach during the second
migration fails with errors like:

[ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
[ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W         4.18.0-rc1-wi107836-v05-120+ #201
[ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170
[ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W          (4.18.0-rc1-wi107836-v05-120+)
[ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22288822  XER: 0000000a
[ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
[ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68
[ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff
[ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843
[ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000
[ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff
[ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001
[ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930
[ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
[ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
[ 4565.030811] Call Trace:
[ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable)
[ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150
[ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80
[ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
[ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0
[ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130
[ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60
[ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0
[ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240
[ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220
[ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240
[ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100
[ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70
[ 4565.030884] Instruction dump:
[ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8
[ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040
[ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---

The mobility.c code continues on during the second migration, accepts
the definitions of the new nodes from the PHYP, and ends up renaming
the new properties, e.g.:

[ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"

There is no check like 'of_node_check_flag(np, OF_DETACHED)' within
of_find_node_by_phandle to skip nodes that are detached but still
present due to caching or use count considerations.  Note also that
of_find_node_by_phandle uses a 'phandle_cache', and nothing appears
to invalidate or update that cache when of_detach_node() removes a
node.
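
The stale lookup described here can be modelled in user space.  The
sketch below is not kernel code: 'struct node', 'FLAG_DETACHED',
'cache_insert', 'detach_node', 'lookup_unchecked' and 'lookup_checked'
are hypothetical stand-ins for 'struct device_node', 'OF_DETACHED' and
of_find_node_by_phandle, and the direct-mapped cache is only an
assumption about the general shape of a phandle cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS   8      /* power of two, so a mask selects the slot */
#define FLAG_DETACHED 0x1u   /* hypothetical stand-in for OF_DETACHED */

/* Hypothetical, heavily reduced model of a device tree node. */
struct node {
	const char *name;
	uint32_t phandle;
	unsigned int flags;
};

static struct node *cache[CACHE_SLOTS];

/* Record a node in the cache slot selected by its phandle. */
static void cache_insert(struct node *np)
{
	assert(np->phandle != 0);   /* 0 means "no phandle assigned yet" */
	cache[np->phandle & (CACHE_SLOTS - 1)] = np;
}

/* Detach marks the node but, as in the report, leaves the cache alone. */
static void detach_node(struct node *np)
{
	np->flags |= FLAG_DETACHED;
}

/* Lookup without any detached check: returns whatever the cache holds. */
static struct node *lookup_unchecked(uint32_t phandle)
{
	struct node *np = cache[phandle & (CACHE_SLOTS - 1)];

	return (np && np->phandle == phandle) ? np : NULL;
}

/* Lookup that skips detached nodes, the kind of check noted as missing. */
static struct node *lookup_checked(uint32_t phandle)
{
	struct node *np = lookup_unchecked(phandle);

	return (np && !(np->flags & FLAG_DETACHED)) ? np : NULL;
}
```

In this model, inserting a node, detaching it, and then calling
lookup_unchecked() with the same phandle still returns the detached
node, while lookup_checked() returns NULL.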

The right solution may be for __of_detach_node() to invalidate
phandle_cache for the node being detached.  Alternatively, we can
manually invalidate / rebuild the phandle_cache at the point of
LPAR migration.  The latter solution is presented here.
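
The two options can be compared in a small user-space model.  This is
a sketch under stated assumptions, not the kernel implementation:
every name below ('struct node', 'tree', 'attach_node',
'detach_node_plain', 'detach_node_invalidating', 'cache_free',
'cache_populate', 'lookup') is hypothetical, and the array-based tree
and direct-mapped cache only approximate the real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 8    /* power of two, so a mask selects the slot */
#define MAX_NODES   16

/* Hypothetical, heavily reduced model of a device tree node. */
struct node {
	const char *name;
	uint32_t phandle;
	int attached;
};

static struct node *tree[MAX_NODES];   /* every node ever attached */
static size_t tree_len;
static struct node *cache[CACHE_SLOTS];

static size_t slot(uint32_t phandle)
{
	return phandle & (CACHE_SLOTS - 1);
}

static void attach_node(struct node *np)
{
	assert(tree_len < MAX_NODES);
	np->attached = 1;
	tree[tree_len++] = np;
}

/* Detach as in the report: the cache is left untouched. */
static void detach_node_plain(struct node *np)
{
	np->attached = 0;
}

/* First option: detach drops the cache entry of the node itself. */
static void detach_node_invalidating(struct node *np)
{
	np->attached = 0;
	if (np->phandle && cache[slot(np->phandle)] == np)
		cache[slot(np->phandle)] = NULL;
}

/* Second option: throw the whole cache away ... */
static void cache_free(void)
{
	memset(cache, 0, sizeof(cache));
}

/* ... and rebuild it from the nodes that are still attached. */
static void cache_populate(void)
{
	for (size_t i = 0; i < tree_len; i++)
		if (tree[i]->attached && tree[i]->phandle)
			cache[slot(tree[i]->phandle)] = tree[i];
}

/* Cache first; on a miss, walk the attached nodes and fill the slot. */
static struct node *lookup(uint32_t phandle)
{
	struct node *np = cache[slot(phandle)];

	if (np && np->phandle == phandle)
		return np;
	for (size_t i = 0; i < tree_len; i++) {
		np = tree[i];
		if (np->attached && np->phandle == phandle) {
			cache[slot(phandle)] = np;
			return np;
		}
	}
	return NULL;
}
```

In this model, a plain detach followed by attaching a replacement node
with the same phandle leaves lookup() returning the old node until
cache_free() and cache_populate() are called; the invalidating detach
avoids the stale entry without a full rebuild.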
---
 arch/powerpc/platforms/pseries/mobility.c |    7 +++++++
 1 file changed, 7 insertions(+)

Comments

Michael Ellerman Aug. 8, 2018, 2:02 p.m. UTC | #1
Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index e245a88..efc9442 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -22,6 +22,9 @@
>  #include <asm/rtas.h>
>  #include "pseries.h"
>  
> +extern int of_free_phandle_cache(void);
> +extern void of_populate_phandle_cache(void);

We don't do that, they should be in a header.

But that's a minor problem given that the patch doesn't compile, because
both those functions are static.

Presumably you have a hack in your tree to make them non-static?
Please try and compile your patches in a clean tree before sending.

cheers
Michael Bringmann Aug. 8, 2018, 3:37 p.m. UTC | #2
On 08/08/2018 09:02 AM, Michael Ellerman wrote:
> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index e245a88..efc9442 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -22,6 +22,9 @@
>>  #include <asm/rtas.h>
>>  #include "pseries.h"
>>  
>> +extern int of_free_phandle_cache(void);
>> +extern void of_populate_phandle_cache(void);
> 
> We don't do that, they should be in a header.
> 
> But that's a minor problem given that the patch doesn't compile, because
> both those functions are static.

I am building against the latest 'linux-ppc' kernel.  It includes patch

Commit b9952b5218added5577e4a3443969bc20884cea9 Mon Sep 17 00:00:00 2001
From: Frank Rowand <frank.rowand@sony.com>
Date: Thu, 12 Jul 2018 14:00:07 -0700
Subject: of: overlay: update phandle cache on overlay apply and remove

which makes the functions static.  I will rebuild and test with an
earlier version if you will specify which one.

> 
> Presumably you have a hack in your tree to make them non-static?
> Please try and compile your patches in a clean tree before sending.
> 
> cheers

Regards,
Michael
Michael Bringmann Aug. 8, 2018, 3:39 p.m. UTC | #3
I will update the header files 'of_private.h' and 'of.h' and repost.

Michael

Michael Ellerman Aug. 10, 2018, 1:46 a.m. UTC | #4
Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
> On 08/08/2018 09:02 AM, Michael Ellerman wrote:
>> Michael Bringmann <mwb@linux.vnet.ibm.com> writes:
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>> index e245a88..efc9442 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -22,6 +22,9 @@
>>>  #include <asm/rtas.h>
>>>  #include "pseries.h"
>>>  
>>> +extern int of_free_phandle_cache(void);
>>> +extern void of_populate_phandle_cache(void);
>> 
>> We don't do that, they should be in a header.
>> 
>> But that's a minor problem given that the patch doesn't compile, because
>> both those functions are static.
>
> I am building against the latest 'linux-ppc' kernel.  It includes patch

OK you must be using the master branch.

> Commit b9952b5218added5577e4a3443969bc20884cea9 Mon Sep 17 00:00:00 2001
> From: Frank Rowand <frank.rowand@sony.com>
> Date: Thu, 12 Jul 2018 14:00:07 -0700
> Subject: of: overlay: update phandle cache on overlay apply and remove

That only landed in v4.18-rc6, so it's not in my next branch which is
where patches like this targeted for the next release are applied.

> which makes the functions static.  I will rebuild and test with an
> earlier version if you will specify which one.

No that's fine it will just have to wait until next and master are
merged before it can go in.

cheers
Patch

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e245a88..efc9442 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -22,6 +22,9 @@ 
 #include <asm/rtas.h>
 #include "pseries.h"
 
+extern int of_free_phandle_cache(void);
+extern void of_populate_phandle_cache(void);
+
 static struct kobject *mobility_kobj;
 
 struct update_props_workarea {
@@ -343,6 +346,8 @@  void post_mobility_fixup(void)
 		rc = rtas_call(activate_fw_token, 0, 1, NULL);
 	} while (rtas_busy_delay(rc));
 
+	of_free_phandle_cache();
+
 	if (rc)
 		printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
 
@@ -354,6 +359,8 @@  void post_mobility_fixup(void)
 	/* Possibly switch to a new RFI flush type */
 	pseries_setup_rfi_flush();
 
+	of_populate_phandle_cache();
+
 	return;
 }