Message ID | c2fba52a-baac-25fb-c26b-c84b25c3178c@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
Series | powerpc/mobility: Fix node detach/rename problem | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | next/apply_patch Successfully applied |
snowpatch_ozlabs/checkpatch | success | Test checkpatch on branch next |
snowpatch_ozlabs/build-ppc64le | warning | Test build-ppc64le on branch next |
snowpatch_ozlabs/build-ppc64be | warning | Test build-ppc64be on branch next |
snowpatch_ozlabs/build-ppc64e | success | Test build-ppc64e on branch next |
snowpatch_ozlabs/build-ppc32 | success | Test build-ppc32 on branch next |
Hi Michael, Thank you for the patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on v4.18-rc6 next-20180727] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-mobility-Fix-node-detach-rename-problem/20180729-213517 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-defconfig (attached as .config) compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=powerpc All errors (new ones prefixed by >>): arch/powerpc/platforms/pseries/dlpar.c: In function 'dlpar_detach_node': >> arch/powerpc/platforms/pseries/dlpar.c:276:9: error: passing argument 1 of 'memset' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] memset(dn->name, 0, strlen(dn->name)); ^~ In file included from include/linux/string.h:20:0, from arch/powerpc/include/asm/paca.h:19, from arch/powerpc/include/asm/current.h:16, from include/linux/mutex.h:14, from include/linux/notifier.h:14, from arch/powerpc/platforms/pseries/dlpar.c:16: arch/powerpc/include/asm/string.h:23:15: note: expected 'void *' but argument is of type 'const char *' extern void * memset(void *,int,__kernel_size_t); ^~~~~~ cc1: all warnings being treated as errors vim +276 arch/powerpc/platforms/pseries/dlpar.c 259 260 int dlpar_detach_node(struct device_node *dn) 261 { 262 struct device_node *child; 263 int rc; 264 265 child = of_get_next_child(dn, NULL); 266 while (child) { 267 dlpar_detach_node(child); 268 child = of_get_next_child(dn, child); 269 } 270 271 rc = of_detach_node(dn); 272 if (rc) 273 return rc; 274 275 dn->phandle = 0; > 276 memset(dn->name, 0, strlen(dn->name)); 277 278 return 0; 279 } 280 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > During LPAR migration, the content of the device tree/sysfs may > be updated including deletion and replacement of nodes in the > tree. When nodes are added to the internal node structures, they > are appended in FIFO order to a list of nodes maintained by the > OF code APIs. That hasn't been true for several years. The data structure is an n-ary tree. What kernel version are you working on? > When nodes are removed from the device tree, they > are marked OF_DETACHED, but not actually deleted from the system > to allow for pointers cached elsewhere in the kernel. The order > and content of the entries in the list of nodes is not altered, > though. Something is going wrong if this is actually happening. When the node is detached it should be *detached* from the tree of all nodes, so it should not be discoverable other than by having an existing pointer to it. That's what __of_detach_node() does: parent = np->parent; if (WARN_ON(!parent)) return; if (parent->child == np) parent->child = np->sibling; else { struct device_node *prevsib; for (prevsib = np->parent->child; prevsib->sibling != np; prevsib = prevsib->sibling) ; prevsib->sibling = np->sibling; } ie. the node must already have a NULL parent, and then it is spliced out of its parent's child list. Please give us more info so we can work out what's actually happening. cheers
See below. On 07/30/2018 01:31 AM, Michael Ellerman wrote: > Michael Bringmann <mwb@linux.vnet.ibm.com> writes: > >> During LPAR migration, the content of the device tree/sysfs may >> be updated including deletion and replacement of nodes in the >> tree. When nodes are added to the internal node structures, they >> are appended in FIFO order to a list of nodes maintained by the >> OF code APIs. > > That hasn't been true for several years. The data structure is an n-ary > tree. What kernel version are you working on? Sorry for an error in my description. I oversimplified based on the name of a search iterator. Let me try to provide a better explanation of the problem, here. This is the problem. The PPC mobility code receives RTAS requests to delete nodes with platform-/hardware-specific attributes when restarting the kernel after a migration. My example is for migration between a P8 Alpine and a P8 Brazos. Nodes to be deleted may include 'ibm,random-v1', 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', or others. The mobility.c code calls 'of_detach_node' for the nodes and their children. This makes calls to detach the properties and to try to remove the associated sysfs/kernfs files. Then new copies of the same nodes are next provided by the PHYP, local copies are built, and a pointer to the 'struct device_node' is passed to of_attach_node. Before the call to of_attach_node, the phandle is initialized to 0 when the data structure is alloced. During the call to of_attach_node, it calls __of_attach_node which pulls the actual name and phandle from just created sub-properties named something like 'name' and 'ibm,phandle'. This is all fine for the first migration. The problem occurs with the second and subsequent migrations when the PHYP on the new system wants to replace the same set of nodes again, referenced with the same names and phandle values. > >> When nodes are removed from the device tree, they >> are marked OF_DETACHED, but not actually deleted from the system >> to allow for pointers cached elsewhere in the kernel. The order >> and content of the entries in the list of nodes is not altered, >> though. > > Something is going wrong if this is actually happening. > > When the node is detached it should be *detached* from the tree of all > nodes, so it should not be discoverable other than by having an existing > pointer to it. On the second and subsequent migrations, the PHYP tells the system to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these nodes by its known set of phandle values -- the same handles used by the PHYP on the source system are known on the target system. The mobility.c code calls of_find_node_by_phandle() with these values and ends up locating the first instance of each node that was added during the original boot, instead of the second instance of each node created after the first migration. The detach during the second migration fails with errors like, [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G W 4.18.0-rc1-wi107836-v05-120+ #201 [ 4565.030737] NIP: c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170 [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700 Tainted: G W (4.18.0-rc1-wi107836-v05-120+) [ 4565.030745] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22288822 XER: 0000000a [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1 [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68 [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843 [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000 [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001 [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930 [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0 [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0 [ 4565.030811] Call Trace: [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable) [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150 [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80 [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0 [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0 [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130 [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60 [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0 [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240 [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220 [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240 [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100 [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70 [ 4565.030884] Instruction dump: [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040 [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- The mobility.c code continues on during the second migration, accepts the definitions of the new nodes from the PHYP and ends up renaming the new properties e.g. [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1" I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within of_find_node_by_phandle to skip nodes that are detached, but still present due to caching or use count considerations. Another possibility to consider is that of_find_node_by_phandle also uses something called 'phandle_cache' which may have outdated data as of_detach_node() does not have access to that cache for the 'OF_DETACHED' nodes. > > That's what __of_detach_node() does: > > parent = np->parent; > if (WARN_ON(!parent)) > return; > > if (parent->child == np) > parent->child = np->sibling; > else { > struct device_node *prevsib; > for (prevsib = np->parent->child; > prevsib->sibling != np; > prevsib = prevsib->sibling) > ; > prevsib->sibling = np->sibling; > } > > ie. the node must already have a NULL parent, and then it is spliced out > of its parent's child list. > > > Please give us more info so we can work out what's actually happening. I duplicated the problem in the 4.18 raw kernel on the second and subsequent legs of migration of an LPAR back and forth between an Alpine to a Brazos. Again, entries like 'ibm,random-v1', 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', yield errors as described above. My patch was attempting to resolve the issue wholly within the powerpc code, at this time. > > cheers > Michael
On 07/29/2018 06:11 AM, Michael Bringmann wrote: > During LPAR migration, the content of the device tree/sysfs may > be updated including deletion and replacement of nodes in the > tree. When nodes are added to the internal node structures, they > are appended in FIFO order to a list of nodes maintained by the > OF code APIs. When nodes are removed from the device tree, they > are marked OF_DETACHED, but not actually deleted from the system > to allow for pointers cached elsewhere in the kernel. The order > and content of the entries in the list of nodes is not altered, > though. > > During LPAR migration some common nodes are deleted and re-added > e.g. "ibm,platform-facilities". If a node is re-added to the OF > node lists, the of_attach_node function checks to make sure that > the name + ibm,phandle of the to-be-added data is unique. As the > previous copy of a re-added node is not modified beyond the addition > of a bit flag, the code (1) finds the old copy, (2) prints a WARNING > notice to the console, (3) renames the to-be-added node to avoid > filename collisions within a directory, and (3) adds entries to > the sysfs/kernfs. So, this patch actually just band aids over the real problem. This is a long standing problem with several PFO drivers leaking references. The issue here is that, during the device tree update that follows a migration. the update of the ibm,platform-facilities node and friends below are always deleted and re-added on the destination lpar and subsequently the leaked references prevent the devices nodes from every actually being properly cleaned up after detach. Thus, leading to the issue you are observing. As and example a quick look at nx-842-pseries.c reveals several issues. static int __init nx842_pseries_init(void) { struct nx842_devdata *new_devdata; int ret; if (!of_find_compatible_node(NULL, NULL, "ibm,compression")) return -ENODEV; This call to of_find_compatible_node() results in a node returned with refcount incremented and therefore immediately leaked. Further, the reconfig notifier logic makes the assumption that it only needs to deal with node updates, but as I outlined above the post migration device tree update always results in PFO nodes and properties being deleted and re-added. /** * nx842_OF_notifier - Process updates to OF properties for the device * * @np: notifier block * @action: notifier action * @update: struct pSeries_reconfig_prop_update pointer if action is * PSERIES_UPDATE_PROPERTY * * Returns: * NOTIFY_OK on success * NOTIFY_BAD encoded with error number on failure, use * notifier_to_errno() to decode this value */ static int nx842_OF_notifier(struct notifier_block *np, unsigned long action, void *data) { struct of_reconfig_data *upd = data; struct nx842_devdata *local_devdata; struct device_node *node = NULL; rcu_read_lock(); local_devdata = rcu_dereference(devdata); if (local_devdata) node = local_devdata->dev->of_node; if (local_devdata && action == OF_RECONFIG_UPDATE_PROPERTY && !strcmp(upd->dn->name, node->name)) { rcu_read_unlock(); nx842_OF_upd(upd->prop); } else rcu_read_unlock(); return NOTIFY_OK; } I expect to find the same problems in pseries-rng.c and nx.c. -Tyrel > > This patch fixes the 'migration add' problem by changing the > stored 'phandle' of the OF_DETACHed node to 0 (reserved value for > of_find_node_by_phandle), so that subsequent re-add operations, > such as those during migration, do not find the detached node, > do not observe duplicate names, do not rename them, and the > extra WARNING notices are removed from the console output. > > In addition, it erases the 'name' field of the OF_DETACHED node, > to prevent any future calls to of_find_node_by_name() or > of_find_node_by_path() from matching this node. > > Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> > --- > arch/powerpc/platforms/pseries/dlpar.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c > index 2de0f0d..9d82c28 100644 > --- a/arch/powerpc/platforms/pseries/dlpar.c > +++ b/arch/powerpc/platforms/pseries/dlpar.c > @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn) > if (rc) > return rc; > > + dn->phandle = 0; > + memset(dn->name, 0, strlen(dn->name)); > + > return 0; > } > >
On 07/30/2018 05:59 PM, Tyrel Datwyler wrote: > On 07/29/2018 06:11 AM, Michael Bringmann wrote: >> During LPAR migration, the content of the device tree/sysfs may >> be updated including deletion and replacement of nodes in the >> tree. When nodes are added to the internal node structures, they >> are appended in FIFO order to a list of nodes maintained by the >> OF code APIs. When nodes are removed from the device tree, they >> are marked OF_DETACHED, but not actually deleted from the system >> to allow for pointers cached elsewhere in the kernel. The order >> and content of the entries in the list of nodes is not altered, >> though. >> >> During LPAR migration some common nodes are deleted and re-added >> e.g. "ibm,platform-facilities". If a node is re-added to the OF >> node lists, the of_attach_node function checks to make sure that >> the name + ibm,phandle of the to-be-added data is unique. As the >> previous copy of a re-added node is not modified beyond the addition >> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING >> notice to the console, (3) renames the to-be-added node to avoid >> filename collisions within a directory, and (3) adds entries to >> the sysfs/kernfs. > > So, this patch actually just band aids over the real problem. This is a long standing problem with several PFO drivers leaking references. The issue here is that, during the device tree update that follows a migration. the update of the ibm,platform-facilities node and friends below are always deleted and re-added on the destination lpar and subsequently the leaked references prevent the devices nodes from every actually being properly cleaned up after detach. Thus, leading to the issue you are observing. > > As and example a quick look at nx-842-pseries.c reveals several issues. > > static int __init nx842_pseries_init(void) > { > struct nx842_devdata *new_devdata; > int ret; > > if (!of_find_compatible_node(NULL, NULL, "ibm,compression")) > return -ENODEV; > > This call to of_find_compatible_node() results in a node returned with refcount incremented and therefore immediately leaked. > > Further, the reconfig notifier logic makes the assumption that it only needs to deal with node updates, but as I outlined above the post migration device tree update always results in PFO nodes and properties being deleted and re-added. > > /** > * nx842_OF_notifier - Process updates to OF properties for the device > * > * @np: notifier block > * @action: notifier action > * @update: struct pSeries_reconfig_prop_update pointer if action is > * PSERIES_UPDATE_PROPERTY > * > * Returns: > * NOTIFY_OK on success > * NOTIFY_BAD encoded with error number on failure, use > * notifier_to_errno() to decode this value > */ > static int nx842_OF_notifier(struct notifier_block *np, unsigned long action, > void *data) > { > struct of_reconfig_data *upd = data; > struct nx842_devdata *local_devdata; > struct device_node *node = NULL; > > rcu_read_lock(); > local_devdata = rcu_dereference(devdata); > if (local_devdata) > node = local_devdata->dev->of_node; > > if (local_devdata && > action == OF_RECONFIG_UPDATE_PROPERTY && > !strcmp(upd->dn->name, node->name)) { > rcu_read_unlock(); > nx842_OF_upd(upd->prop); > } else > rcu_read_unlock(); > > return NOTIFY_OK; > } > > I expect to find the same problems in pseries-rng.c and nx.c. So, in actuality the main root of the problem is really in the vio core. A node reference for each PFO device under "ibm,platform-facilities" is taken during vio_register_device_node(). We need a reconfig notifier to call vio_unregister_device() for each PFO device on detach, and vio_bus_scan_register_devices("ibm,platform-facilities") on attach. This will make sure the PFO vio devices are released such that vio_dev_release() gets called to put the node reference that was taken at original registration time. /* vio_dev refcount hit 0 */ static void vio_dev_release(struct device *dev) { struct iommu_table *tbl = get_iommu_table_base(dev); if (tbl) iommu_tce_table_put(tbl); of_node_put(dev->of_node); kfree(to_vio_dev(dev)); } -Tyrel > > -Tyrel > >> >> This patch fixes the 'migration add' problem by changing the >> stored 'phandle' of the OF_DETACHed node to 0 (reserved value for >> of_find_node_by_phandle), so that subsequent re-add operations, >> such as those during migration, do not find the detached node, >> do not observe duplicate names, do not rename them, and the >> extra WARNING notices are removed from the console output. >> >> In addition, it erases the 'name' field of the OF_DETACHED node, >> to prevent any future calls to of_find_node_by_name() or >> of_find_node_by_path() from matching this node. >> >> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> >> --- >> arch/powerpc/platforms/pseries/dlpar.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c >> index 2de0f0d..9d82c28 100644 >> --- a/arch/powerpc/platforms/pseries/dlpar.c >> +++ b/arch/powerpc/platforms/pseries/dlpar.c >> @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn) >> if (rc) >> return rc; >> >> + dn->phandle = 0; >> + memset(dn->name, 0, strlen(dn->name)); >> + >> return 0; >> } >> >> >
Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes: > On 07/29/2018 06:11 AM, Michael Bringmann wrote: >> During LPAR migration, the content of the device tree/sysfs may >> be updated including deletion and replacement of nodes in the >> tree. When nodes are added to the internal node structures, they >> are appended in FIFO order to a list of nodes maintained by the >> OF code APIs. When nodes are removed from the device tree, they >> are marked OF_DETACHED, but not actually deleted from the system >> to allow for pointers cached elsewhere in the kernel. The order >> and content of the entries in the list of nodes is not altered, >> though. >> >> During LPAR migration some common nodes are deleted and re-added >> e.g. "ibm,platform-facilities". If a node is re-added to the OF >> node lists, the of_attach_node function checks to make sure that >> the name + ibm,phandle of the to-be-added data is unique. As the >> previous copy of a re-added node is not modified beyond the addition >> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING >> notice to the console, (3) renames the to-be-added node to avoid >> filename collisions within a directory, and (3) adds entries to >> the sysfs/kernfs. > > So, this patch actually just band aids over the real problem. This is > a long standing problem with several PFO drivers leaking references. > The issue here is that, during the device tree update that follows a > migration. the update of the ibm,platform-facilities node and friends > below are always deleted and re-added on the destination lpar and > subsequently the leaked references prevent the devices nodes from > every actually being properly cleaned up after detach. Thus, leading > to the issue you are observing. Leaking references shouldn't affect the node being detached from the tree though. See of_detach_node() calling __of_detach_node(), none of that depends on the refcount. It's only the actual freeing of the node, in of_node_release() that is prevented by leaked reference counts. So I agree we need to do a better job with the reference counting, but I don't see how it is causing the problem here. cheers
On 07/30/2018 11:42 PM, Michael Ellerman wrote: > Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes: >> On 07/29/2018 06:11 AM, Michael Bringmann wrote: >>> During LPAR migration, the content of the device tree/sysfs may >>> be updated including deletion and replacement of nodes in the >>> tree. When nodes are added to the internal node structures, they >>> are appended in FIFO order to a list of nodes maintained by the >>> OF code APIs. When nodes are removed from the device tree, they >>> are marked OF_DETACHED, but not actually deleted from the system >>> to allow for pointers cached elsewhere in the kernel. The order >>> and content of the entries in the list of nodes is not altered, >>> though. >>> >>> During LPAR migration some common nodes are deleted and re-added >>> e.g. "ibm,platform-facilities". If a node is re-added to the OF >>> node lists, the of_attach_node function checks to make sure that >>> the name + ibm,phandle of the to-be-added data is unique. As the >>> previous copy of a re-added node is not modified beyond the addition >>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING >>> notice to the console, (3) renames the to-be-added node to avoid >>> filename collisions within a directory, and (3) adds entries to >>> the sysfs/kernfs. >> >> So, this patch actually just band aids over the real problem. This is >> a long standing problem with several PFO drivers leaking references. >> The issue here is that, during the device tree update that follows a >> migration. the update of the ibm,platform-facilities node and friends >> below are always deleted and re-added on the destination lpar and >> subsequently the leaked references prevent the devices nodes from >> every actually being properly cleaned up after detach. Thus, leading >> to the issue you are observing. So, it was the end of the day, and I kind of glossed over the issue Michael was trying to address with an issue that I remembered that had been around for awhile. > > Leaking references shouldn't affect the node being detached from the > tree though. No, but it does prevent it from being freed from sysfs which leads to the sysfs entry renaming that happens when another node with the same name is attached. > > See of_detach_node() calling __of_detach_node(), none of that depends on > the refcount. > > It's only the actual freeing of the node, in of_node_release() that is > prevented by leaked reference counts. Right, but if we did refcount correctly there is the new problem that the node is freed and now the phandle cache points at freed memory in the case were no invalidation is done. > > So I agree we need to do a better job with the reference counting, but I > don't see how it is causing the problem here Now in regards to the phandle caching somehow I missed that code going into OF earlier this year. That should have had at least some discussion from our side based on the fact that it is built on dtc compiler assumption that there are a set number of phandles that are allocated starting at 1..n such that they are monotonically increasing. That has a nice fixed size with O(1) lookup time. Phyp doesn't guarantee any sort of niceness with nicely ordered phandles. From what I've seen there are a subset of phandles that decrease from (-1) monotonically, and then there are a bunch of random values for cpus and IOAs. Thinking on it might not be that big of a deal as we just end up in the case where we have a phandle collision one makes it into the cache and is optimized while the other doesn't. On another note, they clearly hit a similar issue during overlay attach and remove, and as Rob pointed out their solution to handle it is a full cache invalidation followed by rescanning the whole tree to rebuild it. Seeing as our dynamic lifecycle is node by node this definitely adds some overhead. -Tyrel > > cheers >
Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes: > On 07/30/2018 11:42 PM, Michael Ellerman wrote: >> Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes: >>> On 07/29/2018 06:11 AM, Michael Bringmann wrote: >>>> During LPAR migration, the content of the device tree/sysfs may >>>> be updated including deletion and replacement of nodes in the >>>> tree. When nodes are added to the internal node structures, they >>>> are appended in FIFO order to a list of nodes maintained by the >>>> OF code APIs. When nodes are removed from the device tree, they >>>> are marked OF_DETACHED, but not actually deleted from the system >>>> to allow for pointers cached elsewhere in the kernel. The order >>>> and content of the entries in the list of nodes is not altered, >>>> though. >>>> >>>> During LPAR migration some common nodes are deleted and re-added >>>> e.g. "ibm,platform-facilities". If a node is re-added to the OF >>>> node lists, the of_attach_node function checks to make sure that >>>> the name + ibm,phandle of the to-be-added data is unique. As the >>>> previous copy of a re-added node is not modified beyond the addition >>>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING >>>> notice to the console, (3) renames the to-be-added node to avoid >>>> filename collisions within a directory, and (3) adds entries to >>>> the sysfs/kernfs. >>> >>> So, this patch actually just band aids over the real problem. This is >>> a long standing problem with several PFO drivers leaking references. >>> The issue here is that, during the device tree update that follows a >>> migration. the update of the ibm,platform-facilities node and friends >>> below are always deleted and re-added on the destination lpar and >>> subsequently the leaked references prevent the devices nodes from >>> every actually being properly cleaned up after detach. Thus, leading >>> to the issue you are observing. > > So, it was the end of the day, and I kind of glossed over the issue > Michael was trying to address with an issue that I remembered that had > been around for awhile. Sure, I know that feeling :) >> Leaking references shouldn't affect the node being detached from the >> tree though. > > No, but it does prevent it from being freed from sysfs which leads to > the sysfs entry renaming that happens when another node with the same > name is attached. OK. >> See of_detach_node() calling __of_detach_node(), none of that depends on >> the refcount. >> >> It's only the actual freeing of the node, in of_node_release() that is >> prevented by leaked reference counts. > > Right, but if we did refcount correctly there is the new problem that > the node is freed and now the phandle cache points at freed memory in > the case were no invalidation is done. Yeah that's bad. I guess no one is supposed to lookup that phandle anymore, but it's super fragile. >> So I agree we need to do a better job with the reference counting, but I >> don't see how it is causing the problem here > > Now in regards to the phandle caching somehow I missed that code going > into OF earlier this year. That should have had at least some > discussion from our side based on the fact that it is built on dtc > compiler assumption that there are a set number of phandles that are > allocated starting at 1..n such that they are monotonically > increasing. That has a nice fixed size with O(1) lookup time. Phyp > doesn't guarantee any sort of niceness with nicely ordered phandles. > From what I've seen there are a subset of phandles that decrease from > (-1) monotonically, and then there are a bunch of random values for > cpus and IOAs. Thinking on it might not be that big of a deal as we > just end up in the case where we have a phandle collision one makes it > into the cache and is optimized while the other doesn't. On another > note, they clearly hit a similar issue during overlay attach and > remove, and as Rob pointed out their solution to handle it is a full > cache invalidation followed by rescanning the whole tree to rebuild > it. Seeing as our dynamic lifecycle is node by node this definitely > adds some overhead. Yeah I also should have noticed it going in, but despite being subscribed to the devicetree list I didn't spot it in the flood. AIUI the 1..n assumption is not about correctness but if our phandles violate that we may not be getting any benefit. Anyway yeah lots of things to look at tomorrow :) cheers
diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c index 2de0f0d..9d82c28 100644 --- a/arch/powerpc/platforms/pseries/dlpar.c +++ b/arch/powerpc/platforms/pseries/dlpar.c @@ -274,6 +274,9 @@ int dlpar_detach_node(struct device_node *dn) if (rc) return rc; + dn->phandle = 0; + memset(dn->name, 0, strlen(dn->name)); + return 0; }
During LPAR migration, the content of the device tree/sysfs may be updated including deletion and replacement of nodes in the tree. When nodes are added to the internal node structures, they are appended in FIFO order to a list of nodes maintained by the OF code APIs. When nodes are removed from the device tree, they are marked OF_DETACHED, but not actually deleted from the system to allow for pointers cached elsewhere in the kernel. The order and content of the entries in the list of nodes is not altered, though. During LPAR migration some common nodes are deleted and re-added e.g. "ibm,platform-facilities". If a node is re-added to the OF node lists, the of_attach_node function checks to make sure that the name + ibm,phandle of the to-be-added data is unique. As the previous copy of a re-added node is not modified beyond the addition of a bit flag, the code (1) finds the old copy, (2) prints a WARNING notice to the console, (3) renames the to-be-added node to avoid filename collisions within a directory, and (3) adds entries to the sysfs/kernfs. This patch fixes the 'migration add' problem by changing the stored 'phandle' of the OF_DETACHed node to 0 (reserved value for of_find_node_by_phandle), so that subsequent re-add operations, such as those during migration, do not find the detached node, do not observe duplicate names, do not rename them, and the extra WARNING notices are removed from the console output. In addition, it erases the 'name' field of the OF_DETACHED node, to prevent any future calls to of_find_node_by_name() or of_find_node_by_path() from matching this node. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> --- arch/powerpc/platforms/pseries/dlpar.c | 3 +++ 1 file changed, 3 insertions(+)