Patchwork PCI: Separate stop and remove devices in pciehp

login
register
mail settings
Submitter Bjorn Helgaas
Date July 23, 2013, 3:56 p.m.
Message ID <20130723155645.GA18422@google.com>
Download mbox | patch
Permalink /patch/261119/
State Not Applicable
Headers show

Comments

Bjorn Helgaas - July 23, 2013, 3:56 p.m.
On Mon, Jul 22, 2013 at 07:32:06PM -0700, Yinghai Lu wrote:
> On Mon, Jul 22, 2013 at 2:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > Evidently this is really part of a series, but you didn't label it
> > that way.  It looks like this applies on top of your "PCI: Fix hotplug
> > remove with sriov again" patch?
> 
> Yes.
> 
> that one need be back ported to 3.9 and later.
> 
> this one need to be for 3.11 final, as Jiang's patch is in 3.11-rc0.
> 
> >
> > On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> >> (PCI: Simplify IOV implementation and fix reference count races)
> >> VF need to be removed via virtfn_remove to make sure ref to PF
> >> is put back.
> >
> > I'm sure this makes sense, but it needs background.  I haven't figured
> > out exactly what the problem is.  You're describing the lowest-level
> > details, but not the overall picture that would make it understandable
> > to the rest of us.
> 
> Overall, after we reversely loop the bus_devices, VF get stop and removed,
> but fail to release ref to PF, and prevent PF to be removed and freed.
> 
> >
> >> So we can not call stop_and_remove for VF before PF, that will
> >> make VF get removed directly before PF's driver try to use
> >> virtfn_remove to do it.
> >>
> >> Solution is separating stop and remove in two iterations.
> >>
> >> In first iteration, we stop VF driver at first with iterating devices
> >> reversely, and later during stop PF driver, disable_sriov will use
> >> virtfn_remove to remove VFs.
> >>
> >> Also some driver (like mlx4_core) need VF's driver get stopped before PF.
> >
> > If this is relevant, please give a pointer to the mlx4_core code that
> > requires VFs to be stopped before the PF.
> 
> that is add-on benefits.
> 
> drivers/net/ethernet/mellanox/mlx4/main.c::
> static void mlx4_remove_one(struct pci_dev *pdev)
> {
>         struct mlx4_dev  *dev  = pci_get_drvdata(pdev);
>         struct mlx4_priv *priv = mlx4_priv(dev);
>         int p;
> 
>         if (dev) {
>                 /* in SRIOV it is not allowed to unload the pf's
>                  * driver while there are alive vf's */
>                 if (mlx4_is_master(dev)) {
>                         if (mlx4_how_many_lives_vf(dev))
>                                 printk(KERN_ERR "Removing PF when
> there are assigned VF's !!!\n");
>                 }
> 
> mlx4_how_many_lives_vf() is checking how VFs have driver loaded.
> 
> >
> >> To make it simple, separate VGA checking out and do that at first,
> >> if there is VGA in the chain, do even try to stop or remove any device
> >> under that bus.
> >
> > This part seems like it could be a separate patch.
> 
> ok, will separate it out in next rev.
> 
> >
> >> Need this one for v3.11.
> >
> > OK, but why?  We need a better explanation of what problem this fixes.
> >  It sounds like it fixes a reference counting problem in v3.11-rc1,
> > but I don't know what the effect of that problem is.  Maybe it means a
> > device isn't freed when it should be?  Maybe it means we can't add a
> > new device after hot-removing an SR-IOV device?
> 
> Yes.

Can you include an example that shows the failure, like you did for
the "Fix hotplug remove" patch?

> ...
> >> Index: linux-2.6/drivers/pci/remove.c
> >> ===================================================================
> >> --- linux-2.6.orig/drivers/pci/remove.c
> >> +++ linux-2.6/drivers/pci/remove.c
> >> @@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
> >>  }
> >>  EXPORT_SYMBOL(pci_remove_bus);
> >>
> >> -static void pci_stop_bus_device(struct pci_dev *dev)
> >> +void pci_stop_bus_device(struct pci_dev *dev)
> >>  {
> >>         struct pci_bus *bus = dev->subordinate;
> >>         struct pci_dev *child, *tmp;
> >> @@ -75,8 +75,9 @@ static void pci_stop_bus_device(struct p
> >>
> >>         pci_stop_dev(dev);
> >>  }
> >> +EXPORT_SYMBOL(pci_stop_bus_device);
> >>
> >> -static void pci_remove_bus_device(struct pci_dev *dev)
> >> +void pci_remove_bus_device(struct pci_dev *dev)
> >>  {
> >>         struct pci_bus *bus = dev->subordinate;
> >>         struct pci_dev *child, *tmp;
> >> @@ -92,6 +93,7 @@ static void pci_remove_bus_device(struct
> >>
> >>         pci_destroy_dev(dev);
> >>  }
> >> +EXPORT_SYMBOL(pci_remove_bus_device);
> >
> > I really don't want to export these two symbols unless we have to.
> > Obviously this is the heart of your patch, so we probably *will* have
> > to.
> 
> Agree.
> 
> >
> > But maybe they can at least be confined to drivers/pci code, and not
> > exported to modules?  I don't think there's any reason pciehp needs to
> > be a module.  I was planning to merge a patch restricting it to be
> > built-in this cycle anyway.
> 
> sure, you can apply that patch (make pciehp to be built-in) at first.

Below is the patch I intend to apply.  We can easily do this for v3.12.
But your patch needs to be in v3.11, so we'll have to figure out how
to handle that.  Maybe we can put the Kconfig change in v3.11, too.
It should be low-risk because it doesn't add any new code paths that
weren't in v3.10.

Bjorn


commit 04216ce0f2381090572142ebab28f63bae157d8d
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Thu Jun 27 10:16:19 2013 -0600

    PCI: pciehp: Convert pciehp to be builtin only, not modular
    
    Convert pciehp to be builtin only, with no module option.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - July 23, 2013, 10:44 p.m.
On Tue, Jul 23, 2013 at 8:56 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Can you include an example that shows the failure, like you did for
> the "Fix hotplug remove" patch?

Rescue:~ # echo -n 0 > /sys/bus/pci/slots/2/power
[  342.236825] pci_hotplug: power_write_file: power = 0
[  342.237149] pciehp 0000:00:03.0:pcie04: disable_slot: physical_slot = 2
[  342.247370] pciehp 0000:00:03.0:pcie04: pciehp_get_power_status:
SLOTCTRL a8 value read 1f9
[  342.247922] pciehp 0000:00:03.0:pcie04: pciehp_unconfigure_device:
domain:bus:dev = 0000:02:00
[  342.467075] mlx4_core 0000:02:00.0: Received reset from slave:63
[  342.467909]   free irq_desc for 1422
[  342.468154]   free irq_desc for 1423
[  342.468181]   free irq_desc for 1424
[  342.468208]   free irq_desc for 1425
[  342.469192] pci 0000:02:07.7: freeing pci_dev info
[  342.708175] mlx4_core 0000:02:00.0: Received reset from slave:62
[  342.708990]   free irq_desc for 1418
[  342.709223]   free irq_desc for 1419
[  342.727400]   free irq_desc for 1420
[  342.727656]   free irq_desc for 1421
[  342.728072] pci 0000:02:07.6: freeing pci_dev info
[  342.946429] mlx4_core 0000:02:00.0: Received reset from slave:61
[  342.947283]   free irq_desc for 1414
[  342.947511]   free irq_desc for 1415
[  342.957275]   free irq_desc for 1416
[  342.957301]   free irq_desc for 1417
[  342.957722] pci 0000:02:07.5: freeing pci_dev info
[  343.189692] mlx4_core 0000:02:00.0: Received reset from slave:60
[  343.190545]   free irq_desc for 1410
[  343.190814]   free irq_desc for 1411
[  343.207337]   free irq_desc for 1412
[  343.207576]   free irq_desc for 1413
[  343.207985] pci 0000:02:07.4: freeing pci_dev info
[  343.410663] mlx4_core 0000:02:00.0: Received reset from slave:59
[  343.411639]   free irq_desc for 1406
[  343.411880]   free irq_desc for 1407
[  343.427370]   free irq_desc for 1408
[  343.427412]   free irq_desc for 1409
[  343.427741] pci 0000:02:07.3: freeing pci_dev info
[  343.630818] mlx4_core 0000:02:00.0: Received reset from slave:58
[  343.631652]   free irq_desc for 1402
[  343.631842]   free irq_desc for 1403
[  343.647404]   free irq_desc for 1404
[  343.647433]   free irq_desc for 1405
[  343.647769] pci 0000:02:07.2: freeing pci_dev info
[  343.877043] mlx4_core 0000:02:00.0: Received reset from slave:57
[  343.877884]   free irq_desc for 1398
[  343.878128]   free irq_desc for 1399
[  343.887382]   free irq_desc for 1400
[  343.887409]   free irq_desc for 1401
[  343.887827] pci 0000:02:07.1: freeing pci_dev info
[  344.117975] mlx4_core 0000:02:00.0: Received reset from slave:56
[  344.118777]   free irq_desc for 1394
[  344.118978]   free irq_desc for 1395
[  344.137653]   free irq_desc for 1396
[  344.137683]   free irq_desc for 1397
[  344.138013] pci 0000:02:07.0: freeing pci_dev info
[  344.353527] mlx4_core 0000:02:00.0: Received reset from slave:55
[  344.354368]   free irq_desc for 1390
[  344.354616]   free irq_desc for 1391
[  344.367628]   free irq_desc for 1392
[  344.367654]   free irq_desc for 1393
[  344.368080] pci 0000:02:06.7: freeing pci_dev info
[  344.578265] mlx4_core 0000:02:00.0: Received reset from slave:54
[  344.579170]   free irq_desc for 1386
[  344.579393]   free irq_desc for 1387
[  344.597591]   free irq_desc for 1388
[  344.597829]   free irq_desc for 1389
[  344.598375] pci 0000:02:06.6: freeing pci_dev info
[  344.823697] mlx4_core 0000:02:00.0: Received reset from slave:53
[  344.824566]   free irq_desc for 1382
[  344.824796]   free irq_desc for 1383
[  344.837932]   free irq_desc for 1384
[  344.837958]   free irq_desc for 1385
[  344.838295] pci 0000:02:06.5: freeing pci_dev info
[  345.080025] mlx4_core 0000:02:00.0: Received reset from slave:52
[  345.080991]   free irq_desc for 1378
[  345.081017]   free irq_desc for 1379
[  345.081044]   free irq_desc for 1380
[  345.081072]   free irq_desc for 1381
[  345.081673] pci 0000:02:06.4: freeing pci_dev info
[  345.324705] mlx4_core 0000:02:00.0: Received reset from slave:51
[  345.325533]   free irq_desc for 1374
[  345.325776]   free irq_desc for 1375
[  345.337761]   free irq_desc for 1376
[  345.338003]   free irq_desc for 1377
[  345.338344] pci 0000:02:06.3: freeing pci_dev info
[  345.561761] mlx4_core 0000:02:00.0: Received reset from slave:50
[  345.562572]   free irq_desc for 1370
[  345.562854]   free irq_desc for 1371
[  345.577696]   free irq_desc for 1372
[  345.577900]   free irq_desc for 1373
[  345.578428] pci 0000:02:06.2: freeing pci_dev info
[  345.817280] mlx4_core 0000:02:00.0: Received reset from slave:49
[  345.818069]   free irq_desc for 1366
[  345.818334]   free irq_desc for 1367
[  345.827792]   free irq_desc for 1368
[  345.828041]   free irq_desc for 1369
[  345.828479] pci 0000:02:06.1: freeing pci_dev info
[  346.043593] mlx4_core 0000:02:00.0: Received reset from slave:48
[  346.044435]   free irq_desc for 1362
[  346.044696]   free irq_desc for 1363
[  346.057780]   free irq_desc for 1364
[  346.058026]   free irq_desc for 1365
[  346.058468] pci 0000:02:06.0: freeing pci_dev info
[  346.285626] mlx4_core 0000:02:00.0: Received reset from slave:47
[  346.286447]   free irq_desc for 1358
[  346.286654]   free irq_desc for 1359
[  346.298437]   free irq_desc for 1360
[  346.298468]   free irq_desc for 1361
[  346.298883] pci 0000:02:05.7: freeing pci_dev info
[  346.521972] mlx4_core 0000:02:00.0: Received reset from slave:46
[  346.522804]   free irq_desc for 1354
[  346.523021]   free irq_desc for 1355
[  346.537805]   free irq_desc for 1356
[  346.538014]   free irq_desc for 1357
[  346.538345] pci 0000:02:05.6: freeing pci_dev info
[  346.758293] mlx4_core 0000:02:00.0: Received reset from slave:45
[  346.759116]   free irq_desc for 1350
[  346.759326]   free irq_desc for 1351
[  346.777970]   free irq_desc for 1352
[  346.778225]   free irq_desc for 1353
[  346.778594] pci 0000:02:05.5: freeing pci_dev info
[  347.002991] mlx4_core 0000:02:00.0: Received reset from slave:44
[  347.003844]   free irq_desc for 1346
[  347.004075]   free irq_desc for 1347
[  347.017996]   free irq_desc for 1348
[  347.018022]   free irq_desc for 1349
[  347.018428] pci 0000:02:05.4: freeing pci_dev info
[  347.246432] mlx4_core 0000:02:00.0: Received reset from slave:43
[  347.247341]   free irq_desc for 1342
[  347.247584]   free irq_desc for 1343
[  347.258171]   free irq_desc for 1344
[  347.258198]   free irq_desc for 1345
[  347.258711] pci 0000:02:05.3: freeing pci_dev info
[  347.479329] mlx4_core 0000:02:00.0: Received reset from slave:42
[  347.480199]   free irq_desc for 1338
[  347.480428]   free irq_desc for 1339
[  347.498712]   free irq_desc for 1340
[  347.498740]   free irq_desc for 1341
[  347.499261] pci 0000:02:05.2: freeing pci_dev info
[  347.722833] mlx4_core 0000:02:00.0: Received reset from slave:41
[  347.723694]   free irq_desc for 1334
[  347.723909]   free irq_desc for 1335
[  347.738117]   free irq_desc for 1336
[  347.738145]   free irq_desc for 1337
[  347.738865] pci 0000:02:05.1: freeing pci_dev info
[  347.966906] mlx4_core 0000:02:00.0: Received reset from slave:40
[  347.967833]   free irq_desc for 1330
[  347.968129]   free irq_desc for 1331
[  347.978209]   free irq_desc for 1332
[  347.978235]   free irq_desc for 1333
[  347.978779] pci 0000:02:05.0: freeing pci_dev info
[  348.202972] mlx4_core 0000:02:00.0: Received reset from slave:39
[  348.203801]   free irq_desc for 1326
[  348.204106]   free irq_desc for 1327
[  348.218210]   free irq_desc for 1328
[  348.218239]   free irq_desc for 1329
[  348.218707] pci 0000:02:04.7: freeing pci_dev info
[  348.440833] mlx4_core 0000:02:00.0: Received reset from slave:38
[  348.441566]   free irq_desc for 1322
[  348.441592]   free irq_desc for 1323
[  348.441618]   free irq_desc for 1324
[  348.441645]   free irq_desc for 1325
[  348.442210] pci 0000:02:04.6: freeing pci_dev info
[  348.674736] mlx4_core 0000:02:00.0: Received reset from slave:37
[  348.675620]   free irq_desc for 1318
[  348.675836]   free irq_desc for 1319
[  348.688249]   free irq_desc for 1320
[  348.688279]   free irq_desc for 1321
[  348.688913] pci 0000:02:04.5: freeing pci_dev info
[  348.915084] mlx4_core 0000:02:00.0: Received reset from slave:36
[  348.915924]   free irq_desc for 1314
[  348.916166]   free irq_desc for 1315
[  348.928316]   free irq_desc for 1316
[  348.928342]   free irq_desc for 1317
[  348.928891] pci 0000:02:04.4: freeing pci_dev info
[  349.163498] mlx4_core 0000:02:00.0: Received reset from slave:35
[  349.164328]   free irq_desc for 1310
[  349.164557]   free irq_desc for 1311
[  349.178429]   free irq_desc for 1312
[  349.178457]   free irq_desc for 1313
[  349.178925] pci 0000:02:04.3: freeing pci_dev info
[  349.408023] mlx4_core 0000:02:00.0: Received reset from slave:34
[  349.408968]   free irq_desc for 1306
[  349.409262]   free irq_desc for 1307
[  349.428433]   free irq_desc for 1308
[  349.428458]   free irq_desc for 1309
[  349.429123] pci 0000:02:04.2: freeing pci_dev info
[  349.654194] mlx4_core 0000:02:00.0: Received reset from slave:33
[  349.655147]   free irq_desc for 1302
[  349.655371]   free irq_desc for 1303
[  349.668461]   free irq_desc for 1304
[  349.668493]   free irq_desc for 1305
[  349.669050] pci 0000:02:04.1: freeing pci_dev info
[  349.891204] mlx4_core 0000:02:00.0: Received reset from slave:32
[  349.892084]   free irq_desc for 1298
[  349.892379]   free irq_desc for 1299
[  349.910554]   free irq_desc for 1300
[  349.910580]   free irq_desc for 1301
[  349.910977] pci 0000:02:04.0: freeing pci_dev info
[  350.135234] mlx4_core 0000:02:00.0: Received reset from slave:31
[  350.136159]   free irq_desc for 1294
[  350.136402]   free irq_desc for 1295
[  350.148679]   free irq_desc for 1296
[  350.148706]   free irq_desc for 1297
[  350.149284] pci 0000:02:03.7: freeing pci_dev info
[  350.375403] mlx4_core 0000:02:00.0: Received reset from slave:30
[  350.376317]   free irq_desc for 1290
[  350.376562]   free irq_desc for 1291
[  350.388723]   free irq_desc for 1292
[  350.388752]   free irq_desc for 1293
[  350.389227] pci 0000:02:03.6: freeing pci_dev info
[  350.623964] mlx4_core 0000:02:00.0: Received reset from slave:29
[  350.624553]   free irq_desc for 1286
[  350.624797]   free irq_desc for 1287
[  350.639613]   free irq_desc for 1288
[  350.639639]   free irq_desc for 1289
[  350.640102] pci 0000:02:03.5: freeing pci_dev info
[  350.859688] mlx4_core 0000:02:00.0: Received reset from slave:28
[  350.860599]   free irq_desc for 1282
[  350.860867]   free irq_desc for 1283
[  350.878685]   free irq_desc for 1284
[  350.878713]   free irq_desc for 1285
[  350.879210] pci 0000:02:03.4: freeing pci_dev info
[  351.120632] mlx4_core 0000:02:00.0: Received reset from slave:27
[  351.121627]   free irq_desc for 1278
[  351.121876]   free irq_desc for 1279
[  351.138726]   free irq_desc for 1280
[  351.138754]   free irq_desc for 1281
[  351.139239] pci 0000:02:03.3: freeing pci_dev info
[  351.371352] mlx4_core 0000:02:00.0: Received reset from slave:26
[  351.372238]   free irq_desc for 1274
[  351.372446]   free irq_desc for 1275
[  351.388802]   free irq_desc for 1276
[  351.388830]   free irq_desc for 1277
[  351.389344] pci 0000:02:03.2: freeing pci_dev info
[  351.607354] mlx4_core 0000:02:00.0: Received reset from slave:25
[  351.608232]   free irq_desc for 1270
[  351.608542]   free irq_desc for 1271
[  351.618836]   free irq_desc for 1272
[  351.618863]   free irq_desc for 1273
[  351.619517] pci 0000:02:03.1: freeing pci_dev info
[  351.847766] mlx4_core 0000:02:00.0: Received reset from slave:24
[  351.848654]   free irq_desc for 1266
[  351.848939]   free irq_desc for 1267
[  351.859021]   free irq_desc for 1268
[  351.859050]   free irq_desc for 1269
[  351.859523] pci 0000:02:03.0: freeing pci_dev info
[  352.088053] mlx4_core 0000:02:00.0: Received reset from slave:23
[  352.088968]   free irq_desc for 1262
[  352.089296]   free irq_desc for 1263
[  352.099156]   free irq_desc for 1264
[  352.099183]   free irq_desc for 1265
[  352.099819] pci 0000:02:02.7: freeing pci_dev info
[  352.315173] mlx4_core 0000:02:00.0: Received reset from slave:22
[  352.316039]   free irq_desc for 1258
[  352.316318]   free irq_desc for 1259
[  352.329593]   free irq_desc for 1260
[  352.329621]   free irq_desc for 1261
[  352.330101] pci 0000:02:02.6: freeing pci_dev info
[  352.552492] mlx4_core 0000:02:00.0: Received reset from slave:21
[  352.553419]   free irq_desc for 1254
[  352.553618]   free irq_desc for 1255
[  352.568953]   free irq_desc for 1256
[  352.569255]   free irq_desc for 1257
[  352.569765] pci 0000:02:02.5: freeing pci_dev info
[  352.795614] mlx4_core 0000:02:00.0: Received reset from slave:20
[  352.796516]   free irq_desc for 1250
[  352.796782]   free irq_desc for 1251
[  352.810942]   free irq_desc for 1252
[  352.810970]   free irq_desc for 1253
[  352.811667] pci 0000:02:02.4: freeing pci_dev info
[  353.036489] mlx4_core 0000:02:00.0: Received reset from slave:19
[  353.037399]   free irq_desc for 1246
[  353.037618]   free irq_desc for 1247
[  353.049243]   free irq_desc for 1248
[  353.049269]   free irq_desc for 1249
[  353.049919] pci 0000:02:02.3: freeing pci_dev info
[  353.268173] mlx4_core 0000:02:00.0: Received reset from slave:18
[  353.269072]   free irq_desc for 1242
[  353.269338]   free irq_desc for 1243
[  353.279047]   free irq_desc for 1244
[  353.279075]   free irq_desc for 1245
[  353.279578] pci 0000:02:02.2: freeing pci_dev info
[  353.495897] mlx4_core 0000:02:00.0: Received reset from slave:17
[  353.496792]   free irq_desc for 1238
[  353.497020]   free irq_desc for 1239
[  353.509130]   free irq_desc for 1240
[  353.509155]   free irq_desc for 1241
[  353.509591] pci 0000:02:02.1: freeing pci_dev info
[  353.727846] mlx4_core 0000:02:00.0: Received reset from slave:16
[  353.728760]   free irq_desc for 1234
[  353.729031]   free irq_desc for 1235
[  353.739199]   free irq_desc for 1236
[  353.739226]   free irq_desc for 1237
[  353.739710] pci 0000:02:02.0: freeing pci_dev info
[  353.964528] mlx4_core 0000:02:00.0: Received reset from slave:15
[  353.965411]   free irq_desc for 1230
[  353.965702]   free irq_desc for 1231
[  353.979235]   free irq_desc for 1232
[  353.979264]   free irq_desc for 1233
[  353.979754] pci 0000:02:01.7: freeing pci_dev info
[  354.192662] mlx4_core 0000:02:00.0: Received reset from slave:14
[  354.193571]   free irq_desc for 1226
[  354.193818]   free irq_desc for 1227
[  354.211294]   free irq_desc for 1228
[  354.211323]   free irq_desc for 1229
[  354.211725] pci 0000:02:01.6: freeing pci_dev info
[  354.430697] mlx4_core 0000:02:00.0: Received reset from slave:13
[  354.431291]   free irq_desc for 1222
[  354.431596]   free irq_desc for 1223
[  354.449506]   free irq_desc for 1224
[  354.449868]   free irq_desc for 1225
[  354.450335] pci 0000:02:01.5: freeing pci_dev info
[  354.671942] mlx4_core 0000:02:00.0: Received reset from slave:12
[  354.672829]   free irq_desc for 1218
[  354.673044]   free irq_desc for 1219
[  354.689367]   free irq_desc for 1220
[  354.689710]   free irq_desc for 1221
[  354.690078] pci 0000:02:01.4: freeing pci_dev info
[  354.905203] mlx4_core 0000:02:00.0: Received reset from slave:11
[  354.906106]   free irq_desc for 1214
[  354.906403]   free irq_desc for 1215
[  354.919409]   free irq_desc for 1216
[  354.919441]   free irq_desc for 1217
[  354.919868] pci 0000:02:01.3: freeing pci_dev info
[  355.136846] mlx4_core 0000:02:00.0: Received reset from slave:10
[  355.137760]   free irq_desc for 1210
[  355.138031]   free irq_desc for 1211
[  355.149449]   free irq_desc for 1212
[  355.149474]   free irq_desc for 1213
[  355.150103] pci 0000:02:01.2: freeing pci_dev info
[  355.376467] mlx4_core 0000:02:00.0: Received reset from slave:9
[  355.377307]   free irq_desc for 1206
[  355.377613]   free irq_desc for 1207
[  355.389630]   free irq_desc for 1208
[  355.389658]   free irq_desc for 1209
[  355.390091] pci 0000:02:01.1: freeing pci_dev info
[  355.614027] mlx4_core 0000:02:00.0: Received reset from slave:8
[  355.614871]   free irq_desc for 1202
[  355.615125]   free irq_desc for 1203
[  355.615151]   free irq_desc for 1204
[  355.615177]   free irq_desc for 1205
[  355.615608] pci 0000:02:01.0: freeing pci_dev info
[  355.845129] mlx4_core 0000:02:00.0: Received reset from slave:7
[  355.846105]   free irq_desc for 1198
[  355.846343]   free irq_desc for 1199
[  355.859707]   free irq_desc for 1200
[  355.859738]   free irq_desc for 1201
[  355.860170] pci 0000:02:00.7: freeing pci_dev info
[  356.080643] mlx4_core 0000:02:00.0: Received reset from slave:6
[  356.081540]   free irq_desc for 1194
[  356.081821]   free irq_desc for 1195
[  356.099602]   free irq_desc for 1196
[  356.099628]   free irq_desc for 1197
[  356.100048] pci 0000:02:00.6: freeing pci_dev info
[  356.317310] mlx4_core 0000:02:00.0: Received reset from slave:5
[  356.318150]   free irq_desc for 1190
[  356.318414]   free irq_desc for 1191
[  356.329658]   free irq_desc for 1192
[  356.329687]   free irq_desc for 1193
[  356.330131] pci 0000:02:00.5: freeing pci_dev info
[  356.560616] mlx4_core 0000:02:00.0: Received reset from slave:4
[  356.561558]   free irq_desc for 1186
[  356.561796]   free irq_desc for 1187
[  356.579690]   free irq_desc for 1188
[  356.579716]   free irq_desc for 1189
[  356.580123] pci 0000:02:00.4: freeing pci_dev info
[  356.805358] mlx4_core 0000:02:00.0: Received reset from slave:3
[  356.806231]   free irq_desc for 1182
[  356.806512]   free irq_desc for 1183
[  356.819726]   free irq_desc for 1184
[  356.819755]   free irq_desc for 1185
[  356.820244] pci 0000:02:00.3: freeing pci_dev info
[  357.037258] mlx4_core 0000:02:00.0: Received reset from slave:2
[  357.038125]   free irq_desc for 1178
[  357.038350]   free irq_desc for 1179
[  357.049932]   free irq_desc for 1180
[  357.049956]   free irq_desc for 1181
[  357.050484] pci 0000:02:00.2: freeing pci_dev info
[  357.280942] mlx4_core 0000:02:00.0: Received reset from slave:1
[  357.281875]   free irq_desc for 1174
[  357.282105]   free irq_desc for 1175
[  357.299910]   free irq_desc for 1176
[  357.299936]   free irq_desc for 1177
[  357.300380] pci 0000:02:00.1: freeing pci_dev info
[  357.666540]   free irq_desc for 1170
[  357.666769]   free irq_desc for 1171
[  357.666969]   free irq_desc for 1172
[  357.667331]   free irq_desc for 1173
[  357.679938] mlx4_core 0000:02:00.0: Disabling SR-IOV
[  359.692517] pcieport 0000:00:03.0: pcie_link_disable_set: lnk_ctrl = 50
[  359.693860] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  359.710375] pciehp 0000:00:03.0:pcie04: pciehp_power_off_slot:
SLOTCTRL a8 write cmd 400
[  360.713164] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  360.713568] pciehp 0000:00:03.0:pcie04: pciehp_green_led_off:
SLOTCTRL a8 write cmd 300
Rescue:~ #
Rescue:~ # echo -n 1 > /sys/bus/pci/slots/2/power
[  386.009999] pci_hotplug: power_write_file: power = 1
[  386.010279] pciehp 0000:00:03.0:pcie04: enable_slot: physical_slot = 2
[  386.014703] pciehp 0000:00:03.0:pcie04: pciehp_get_power_status:
SLOTCTRL a8 value read 7f9
[  386.016722] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  386.034851] pciehp 0000:00:03.0:pcie04: pciehp_power_on_slot:
SLOTCTRL a8 write cmd 0
[  386.054793] pcieport 0000:00:03.0: pcie_link_disable_set: lnk_ctrl = 40
[  386.055342] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  386.075495] pciehp 0000:00:03.0:pcie04: pciehp_green_led_blink:
SLOTCTRL a8 write cmd 200
[  386.345818] pciehp 0000:00:03.0:pcie04: check_link_active: lnk_status = f082
[  386.449877] pciehp 0000:00:03.0:pcie04: pciehp_check_link_status:
lnk_status = f082
[  386.450350] pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already
exists at 0000:02:00, cannot hot-add
[  386.464941] pciehp 0000:00:03.0:pcie04: Cannot add device at 0000:02:00
[  386.485177] pcieport 0000:00:03.0: pcie_link_disable_set: lnk_ctrl = 50
[  386.485180] pciehp 0000:00:03.0:pcie04: check_link_active: lnk_status = f882
[  386.499216] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  386.499236] pciehp 0000:00:03.0:pcie04: pciehp_power_off_slot:
SLOTCTRL a8 write cmd 400
[  387.503092] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  387.503425] pciehp 0000:00:03.0:pcie04: pciehp_green_led_off:
SLOTCTRL a8 write cmd 300
[  387.515045] pciehp 0000:00:03.0:pcie04:
pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
[  387.536437] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
-bash: echo: write error: Invalid argument
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - July 23, 2013, 11:15 p.m.
On Tue, Jul 23, 2013 at 8:56 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jul 22, 2013 at 07:32:06PM -0700, Yinghai Lu wrote:
>> On Mon, Jul 22, 2013 at 2:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> > Evidently this is really part of a series, but you didn't label it
>> > that way.  It looks like this applies on top of your "PCI: Fix hotplug
>> > remove with sriov again" patch?
>>
>> Yes.
>>
>> that one need be back ported to 3.9 and later.
>>
>> this one need to be for 3.11 final, as Jiang's patch is in 3.11-rc0.
>>
>> >
>> > On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> >> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> >> (PCI: Simplify IOV implementation and fix reference count races)
>> >> VF need to be removed via virtfn_remove to make sure ref to PF
>> >> is put back.
>> >
>> > I'm sure this makes sense, but it needs background.  I haven't figured
>> > out exactly what the problem is.  You're describing the lowest-level
>> > details, but not the overall picture that would make it understandable
>> > to the rest of us.
>>
>> Overall, after we reversely loop the bus_devices, VF get stop and removed,
>> but fail to release ref to PF, and prevent PF to be removed and freed.
>>
>> >
>> >> So we can not call stop_and_remove for VF before PF, that will
>> >> make VF get removed directly before PF's driver try to use
>> >> virtfn_remove to do it.
>> >>
>> >> Solution is separating stop and remove in two iterations.
>> >>
>> >> In first iteration, we stop VF driver at first with iterating devices
>> >> reversely, and later during stop PF driver, disable_sriov will use
>> >> virtfn_remove to remove VFs.
>> >>
>> >> Also some driver (like mlx4_core) need VF's driver get stopped before PF.
>> >
>> > If this is relevant, please give a pointer to the mlx4_core code that
>> > requires VFs to be stopped before the PF.
>>
>> that is add-on benefits.
>>
>> drivers/net/ethernet/mellanox/mlx4/main.c::
>> static void mlx4_remove_one(struct pci_dev *pdev)
>> {
>>         struct mlx4_dev  *dev  = pci_get_drvdata(pdev);
>>         struct mlx4_priv *priv = mlx4_priv(dev);
>>         int p;
>>
>>         if (dev) {
>>                 /* in SRIOV it is not allowed to unload the pf's
>>                  * driver while there are alive vf's */
>>                 if (mlx4_is_master(dev)) {
>>                         if (mlx4_how_many_lives_vf(dev))
>>                                 printk(KERN_ERR "Removing PF when
>> there are assigned VF's !!!\n");
>>                 }
>>
>> mlx4_how_many_lives_vf() is checking how VFs have driver loaded.
>>
>> >
>> >> To make it simple, separate VGA checking out and do that at first,
>> >> if there is VGA in the chain, do even try to stop or remove any device
>> >> under that bus.
>> >
>> > This part seems like it could be a separate patch.
>>
>> ok, will separate it out in next rev.
>>
>> >
>> >> Need this one for v3.11.
>> >
>> > OK, but why?  We need a better explanation of what problem this fixes.
>> >  It sounds like it fixes a reference counting problem in v3.11-rc1,
>> > but I don't know what the effect of that problem is.  Maybe it means a
>> > device isn't freed when it should be?  Maybe it means we can't add a
>> > new device after hot-removing an SR-IOV device?
>>
>> Yes.
>
> Can you include an example that shows the failure, like you did for
> the "Fix hotplug remove" patch?
>
>> ...
>> >> Index: linux-2.6/drivers/pci/remove.c
>> >> ===================================================================
>> >> --- linux-2.6.orig/drivers/pci/remove.c
>> >> +++ linux-2.6/drivers/pci/remove.c
>> >> @@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
>> >>  }
>> >>  EXPORT_SYMBOL(pci_remove_bus);
>> >>
>> >> -static void pci_stop_bus_device(struct pci_dev *dev)
>> >> +void pci_stop_bus_device(struct pci_dev *dev)
>> >>  {
>> >>         struct pci_bus *bus = dev->subordinate;
>> >>         struct pci_dev *child, *tmp;
>> >> @@ -75,8 +75,9 @@ static void pci_stop_bus_device(struct p
>> >>
>> >>         pci_stop_dev(dev);
>> >>  }
>> >> +EXPORT_SYMBOL(pci_stop_bus_device);
>> >>
>> >> -static void pci_remove_bus_device(struct pci_dev *dev)
>> >> +void pci_remove_bus_device(struct pci_dev *dev)
>> >>  {
>> >>         struct pci_bus *bus = dev->subordinate;
>> >>         struct pci_dev *child, *tmp;
>> >> @@ -92,6 +93,7 @@ static void pci_remove_bus_device(struct
>> >>
>> >>         pci_destroy_dev(dev);
>> >>  }
>> >> +EXPORT_SYMBOL(pci_remove_bus_device);
>> >
>> > I really don't want to export these two symbols unless we have to.
>> > Obviously this is the heart of your patch, so we probably *will* have
>> > to.
>>
>> Agree.
>>
>> >
>> > But maybe they can at least be confined to drivers/pci code, and not
>> > exported to modules?  I don't think there's any reason pciehp needs to
>> > be a module.  I was planning to merge a patch restricting it to be
>> > built-in this cycle anyway.
>>
>> sure, you can apply that patch (make pciehp to be built-in) at first.
>
> Below is the patch I intend to apply.  We can easily do this for v3.12.
> But your patch needs to be in v3.11, so we'll have to figure out how
> to handle that.  Maybe we can put the Kconfig change in v3.11, too.
> It should be low-risk because it doesn't add any new code paths that
> weren't in v3.10.
>
> Bjorn
>
>
> commit 04216ce0f2381090572142ebab28f63bae157d8d
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu Jun 27 10:16:19 2013 -0600
>
>     PCI: pciehp: Convert pciehp to be builtin only, not modular
>
>     Convert pciehp to be builtin only, with no module option.
>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index a82e70a..7958e59 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -14,15 +14,12 @@ config PCIEPORTBUS
>  # Include service Kconfig here
>  #
>  config HOTPLUG_PCI_PCIE
> -       tristate "PCI Express Hotplug driver"
> +       bool "PCI Express Hotplug driver"
>         depends on HOTPLUG_PCI && PCIEPORTBUS
>         help
>           Say Y here if you have a motherboard that supports PCI Express Native
>           Hotplug
>
> -         To compile this driver as a module, choose M here: the
> -         module will be called pciehp.
> -
>           When in doubt, say N.
>
>  source "drivers/pci/pcie/aer/Kconfig"

on top of for-linus, please check three updated patches.

Thanks

Yinghai

Patch

diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index a82e70a..7958e59 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -14,15 +14,12 @@  config PCIEPORTBUS
 # Include service Kconfig here
 #
 config HOTPLUG_PCI_PCIE
-	tristate "PCI Express Hotplug driver"
+	bool "PCI Express Hotplug driver"
 	depends on HOTPLUG_PCI && PCIEPORTBUS
 	help
 	  Say Y here if you have a motherboard that supports PCI Express Native
 	  Hotplug
 
-	  To compile this driver as a module, choose M here: the
-	  module will be called pciehp.
-
 	  When in doubt, say N.
 
 source "drivers/pci/pcie/aer/Kconfig"