[RESEND,v2,7/7] PCI/hotplug: PowerPC PowerNV PCI hotplug driver

Message ID 1424157203-691-8-git-send-email-gwshan@linux.vnet.ibm.com (mailing list archive)
State Changes Requested
Delegated to: Benjamin Herrenschmidt

Commit Message

Gavin Shan Feb. 17, 2015, 7:13 a.m. UTC
This patch adds a standalone driver to support PCI hotplug on the
PowerPC PowerNV platform, which runs on top of skiboot firmware.
The firmware identifies hotpluggable slots and marks their device
tree nodes with the "ibm,slot-pluggable" and "ibm,reset-by-firmware"
properties. The driver simply scans the device tree and creates and
registers PCI hotplug slots accordingly.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node shouldn't have the "ibm,reset-by-firmware" property. In
that case, no valid PCI slots will be detected from the device tree.
The skiboot firmware doesn't yet export the capability to access
attention LEDs; that remains to be done.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 drivers/pci/hotplug/Kconfig            |  12 ++
 drivers/pci/hotplug/Makefile           |   4 +
 drivers/pci/hotplug/powernv_php.c      | 126 +++++++++++
 drivers/pci/hotplug/powernv_php.h      |  70 ++++++
 drivers/pci/hotplug/powernv_php_slot.c | 382 +++++++++++++++++++++++++++++++++
 5 files changed, 594 insertions(+)
 create mode 100644 drivers/pci/hotplug/powernv_php.c
 create mode 100644 drivers/pci/hotplug/powernv_php.h
 create mode 100644 drivers/pci/hotplug/powernv_php_slot.c
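
The detection rule in the commit message (a node becomes a hotplug slot
only when both properties are present and non-zero, and the scan recurses
through the children of each PHB node) can be sketched in plain userspace
C. This is a minimal model under stated assumptions, not the driver's
code: struct toy_node and the toy_* functions are hypothetical stand-ins
for the kernel's device-tree types and helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device-tree node (not struct device_node). */
struct toy_node {
	uint32_t slot_pluggable;        /* models "ibm,slot-pluggable", 0 if absent */
	uint32_t reset_by_firmware;     /* models "ibm,reset-by-firmware", 0 if absent */
	const struct toy_node *child;   /* first child */
	const struct toy_node *sibling; /* next sibling */
};

/* A node qualifies only when both properties are present and non-zero. */
static bool toy_is_hotplug_slot(const struct toy_node *n)
{
	assert(n != NULL);
	return n->slot_pluggable != 0 && n->reset_by_firmware != 0;
}

/*
 * Count qualifying slots among all descendants of root. The root itself
 * (the PHB node in this model) is not checked, only its subtree.
 */
static int toy_count_slots(const struct toy_node *root)
{
	int count = 0;

	for (const struct toy_node *c = root->child; c; c = c->sibling) {
		if (toy_is_hotplug_slot(c))
			count++;
		count += toy_count_slots(c);
	}

	return count;
}
```

The sketch only counts matches; the posted driver additionally allocates
and registers a slot for each match and stops scanning siblings when a
registration fails.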

Comments

Bjorn Helgaas Feb. 17, 2015, 10:09 p.m. UTC | #1
On Tue, Feb 17, 2015 at 06:13:23PM +1100, Gavin Shan wrote:
> The patch intends to add standalone driver to support PCI hotplug
> for PowerPC PowerNV platform, which runs on top of skiboot firmware.
> The firmware identified hotpluggable slots and marked their device
> tree node with proper "ibm,slot-pluggable" and "ibm,reset-by-firmware".
> The driver simply scans device-tree to create/register PCI hotplug slot
> accordingly.
> 
> If the skiboot firmware doesn't support slot status retrieval, the PCI
> slot device node shouldn't have property "ibm,reset-by-firmware". In
> that case, none of valid PCI slots will be detected from device tree.
> The skiboot firmware doesn't export the capability to access attention
> LEDs yet and it's something for TBD.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ...

> +static int disable_slot(struct hotplug_slot *php_slot)
> +{
> +	struct powernv_php_slot *slot = php_slot->private;
> +
> +	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
> +		return 0;
> +
> +	pci_lock_rescan_remove();
> +	pcibios_remove_pci_devices(slot->bus);
> +	pci_unlock_rescan_remove();
> +	vm_unmap_aliases();

What is vm_unmap_aliases() for?  I see this is probably copied from
rpaphp_core.c, where it was added by b4a26be9f6f8 ("powerpc/pseries: Flush
lazy kernel mappings after unplug operations").

But I don't know whether:

  - this is something specific to powerpc,
  - the lack of vm_unmap_aliases() in other hotplug paths is a bug,
  - the fact that we only do this on powerpc is covering up a
    powerpc bug somewhere

> +
> +	/* Detach the child hotpluggable slots */
> +	powernv_php_unregister(slot->dn);
> +
> +	/* Update slot state */
> +	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
> +	return 0;
> +}
>
Gavin Shan Feb. 18, 2015, 12:16 a.m. UTC | #2
On Tue, Feb 17, 2015 at 04:09:16PM -0600, Bjorn Helgaas wrote:
>On Tue, Feb 17, 2015 at 06:13:23PM +1100, Gavin Shan wrote:
>> The patch intends to add standalone driver to support PCI hotplug
>> for PowerPC PowerNV platform, which runs on top of skiboot firmware.
>> The firmware identified hotpluggable slots and marked their device
>> tree node with proper "ibm,slot-pluggable" and "ibm,reset-by-firmware".
>> The driver simply scans device-tree to create/register PCI hotplug slot
>> accordingly.
>> 
>> If the skiboot firmware doesn't support slot status retrieval, the PCI
>> slot device node shouldn't have property "ibm,reset-by-firmware". In
>> that case, none of valid PCI slots will be detected from device tree.
>> The skiboot firmware doesn't export the capability to access attention
>> LEDs yet and it's something for TBD.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ...
>
>> +static int disable_slot(struct hotplug_slot *php_slot)
>> +{
>> +	struct powernv_php_slot *slot = php_slot->private;
>> +
>> +	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
>> +		return 0;
>> +
>> +	pci_lock_rescan_remove();
>> +	pcibios_remove_pci_devices(slot->bus);
>> +	pci_unlock_rescan_remove();
>> +	vm_unmap_aliases();
>
>What is vm_unmap_aliases() for?  I see this is probably copied from
>rpaphp_core.c, where it was added by b4a26be9f6f8 ("powerpc/pseries: Flush
>lazy kernel mappings after unplug operations").
>
>But I don't know whether:
>
>  - this is something specific to powerpc,
>  - the lack of vm_unmap_aliases() in other hotplug paths is a bug,
>  - the fact that we only do this on powerpc is covering up a
>    powerpc bug somewhere
>

Yes, I copied this piece of code from rpaphp_core.c. I think Ben might
be able to answer the questions as he added the patch. I had a very quick
look at mm/vmalloc.c and it seems reasonable to have vm_unmap_aliases()
here to flush TLB entries for ioremap() regions that were previously
unmapped, if I'm correct. I don't think it's powerpc specific.

Thanks,
Gavin

>> +
>> +	/* Detach the child hotpluggable slots */
>> +	powernv_php_unregister(slot->dn);
>> +
>> +	/* Update slot state */
>> +	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
>> +	return 0;
>> +}
>> 
>
Benjamin Herrenschmidt Feb. 18, 2015, 12:30 a.m. UTC | #3
On Wed, 2015-02-18 at 11:16 +1100, Gavin Shan wrote:
> >What is vm_unmap_aliases() for?  I see this is probably copied from
> >rpaphp_core.c, where it was added by b4a26be9f6f8 ("powerpc/pseries:
> Flush
> >lazy kernel mappings after unplug operations").
> >
> >But I don't know whether:
> >
> >  - this is something specific to powerpc,
> >  - the lack of vm_unmap_aliases() in other hotplug paths is a bug,
> >  - the fact that we only do this on powerpc is covering up a
> >    powerpc bug somewhere
> >
> 
> Yes, I copied this piece of code from rpaphp_core.c. I think Ben might
> help to answer the questions as he added the patch. I had very quick
> check on mm/vmalloc.c and it's reasonable to have vm_unmap_aliases()
> here to flush TLB entries for ioremap() regions, which were unmapped
> previously. if I'm correct. I don't think it's powerpc specific.

It's specific to running under the PowerVM hypervisor, and thus doesn't
affect PowerNV, just don't copy it over.

It comes from the fact that the generic ioremap code nowadays delays
TLB flushing on unmap. The TLB flushing code is what, on powerpc,
ensures that we remove the translations from the MMU hash table (the
hash table is essentially treated as an extended in-memory TLB), which
on pseries turns into hypervisor calls.

When running under that hypervisor, the HV ensures that no translation
still exists in the hash before allowing a device to be removed from
a partition. If translations still exist, the removal fails.

So we need to force the generic ioremap code to perform all the TLB
flushes for iounmap'ed regions before we "complete" the unplug operation
from a kernel perspective so that the device can be re-assigned to
another partition.

This is thus useless on platforms like powernv which do not run under
such a hypervisor.

Cheers,
Ben.
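
The mechanism described in the message above (iounmap leaves the
translation in place until a deferred flush runs, and the hypervisor
refuses device removal while any translation remains) can be modelled in
a few lines of userspace C. This is a toy sketch under stated
assumptions: struct toy_mmu and every toy_* function are hypothetical and
are not kernel or hypervisor interfaces; toy_flush_lazy() only plays the
role that vm_unmap_aliases() plays in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MAPPINGS 8

/* Hypothetical model of a translation table with deferred invalidation. */
struct toy_mmu {
	bool live[TOY_MAX_MAPPINGS]; /* translation still present in hash/TLB */
	bool lazy[TOY_MAX_MAPPINGS]; /* unmapped by the driver, flush pending */
};

/* Models ioremap(): install a translation in entry idx. */
static void toy_map(struct toy_mmu *m, size_t idx)
{
	assert(idx < TOY_MAX_MAPPINGS);
	m->live[idx] = true;
	m->lazy[idx] = false;
}

/* Models iounmap() with lazy flushing: the translation stays for now. */
static void toy_unmap_lazy(struct toy_mmu *m, size_t idx)
{
	assert(idx < TOY_MAX_MAPPINGS);
	if (m->live[idx])
		m->lazy[idx] = true;
}

/* Models the forced flush: drop every translation with a pending unmap. */
static void toy_flush_lazy(struct toy_mmu *m)
{
	for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
		if (m->lazy[i]) {
			m->live[i] = false;
			m->lazy[i] = false;
		}
	}
}

/* Models the hypervisor check: removal fails while a translation remains. */
static bool toy_hv_remove_device(const struct toy_mmu *m)
{
	for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
		if (m->live[i])
			return false;
	}
	return true;
}
```

In this model, calling toy_hv_remove_device() after toy_unmap_lazy() but
before toy_flush_lazy() fails, which mirrors why the pseries path forces
the flush before completing the unplug.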
Bjorn Helgaas Feb. 18, 2015, 2:30 p.m. UTC | #4
[+cc linux-mm, linux-kernel]

For context, the start of this discussion was here:
http://lkml.kernel.org/r/1424157203-691-8-git-send-email-gwshan@linux.vnet.ibm.com
where Gavin is adding a new PCI hotplug driver for PowerNV.  That new
driver calls vm_unmap_aliases() the same way we do in the existing RPA
hotplug driver here:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/hotplug/rpaphp_core.c#n432

I'm trying to figure out whether it's correct to use
vm_unmap_aliases() here, but I'm not an mm person so all I have is my
gut feeling that something doesn't smell right.

On Tue, Feb 17, 2015 at 6:30 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Wed, 2015-02-18 at 11:16 +1100, Gavin Shan wrote:
>> >What is vm_unmap_aliases() for?  I see this is probably copied from
>> >rpaphp_core.c, where it was added by b4a26be9f6f8 ("powerpc/pseries:
>> Flush
>> >lazy kernel mappings after unplug operations").
>> >
>> >But I don't know whether:
>> >
>> >  - this is something specific to powerpc,
>> >  - the lack of vm_unmap_aliases() in other hotplug paths is a bug,
>> >  - the fact that we only do this on powerpc is covering up a
>> >    powerpc bug somewhere
>>
>> Yes, I copied this piece of code from rpaphp_core.c. I think Ben might
>> help to answer the questions as he added the patch. I had very quick
>> check on mm/vmalloc.c and it's reasonable to have vm_unmap_aliases()
>> here to flush TLB entries for ioremap() regions, which were unmapped
>> previously. if I'm correct. I don't think it's powerpc specific.
>
> It's specific to running under the PowerVM hypervisor, and thus doesn't
> affect PowerNV, just don't copy it over.
>
> It comes from the fact that the generic ioremap code nowadays delays
> TLB flushing on unmap. The TLB flushing code is what, on powerpc,
> ensures that we remove the translations from the MMU hash table (the
> hash table is essentially treated as an extended in-memory TLB), which
> on pseries turns into hypervisor calls.
>
> When running under that hypervisor, the HV ensures that no translation
> still exists in the hash before allowing a device to be removed from
> a partition. If translations still exist, the removal fails.
>
> So we need to force the generic ioremap code to perform all the TLB
> flushes for iounmap'ed regions before we "complete" the unplug operation
> from a kernel perspective so that the device can be re-assigned to
> another partition.
>
> This is thus useless on platforms like powernv which do not run under
> such a hypervisor.

So the hypervisor call that removes the device from the partition will
fail if there are any translations that reference the memory of the
device.

Let me go through this in excruciating detail to see if I understand
what's going on:

  - PCI core enumerates device D1
  - PCI core sets device D1 BAR 0 = 0x1000
  - driver claims D1
  - driver ioremaps 0x1000 at virtual address V
  - translation V -> 0x1000 is in TLB
  - driver iounmaps V (but V -> 0x1000 translation may remain in TLB)
  - driver releases D1
  - hot-remove D1 (without vm_unmap_aliases(), hypervisor would fail this)
  - it would be a bug to reference V here, but if we did, the
virt-to-phys translation would succeed and we'd have a Master Abort or
Unsupported Request on PCI/PCIe
  - hot-add D2
  - PCI core enumerates device D2
  - PCI core sets device D2 BAR 0 = 0x1000
  - it would be a bug to reference V here (before ioremapping), but if
we did, the reference would reach D2

I don't see anything hypervisor-specific here except for the fact that
the hypervisor checks for existing translations and most other
platforms don't.  But it seems like the unexpected PCI aborts could
happen on any platform.

Are we saying that those PCI aborts are OK, since it's a bug to make
those references in the first place?  Or would we rather take a TLB
miss fault instead so the references never make it to PCI?

I would think there would be similar issues when unmapping and
re-mapping plain old physical memory.  But I don't see
vm_unmap_aliases() calls there, so those issues must be handled
differently.  Should we handle this PCI hotplug issue the same way we
handle RAM?

Bjorn
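
The D1/D2 walk-through above can be condensed into a small userspace
model with one cached translation and one BAR window. It is a sketch
only, assuming a single-entry TLB; struct toy_system, enum toy_result and
toy_access() are hypothetical names and do not correspond to any kernel
interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_result { TOY_FAULT, TOY_ABORT, TOY_REACHED_DEVICE };

/* Hypothetical single-entry model: one translation, one BAR window. */
struct toy_system {
	int      tlb_valid;   /* cached translation V -> tlb_phys present */
	uint64_t tlb_phys;    /* physical address the translation points at */
	int      dev_present; /* a device currently decodes dev_bar */
	uint64_t dev_bar;     /* BAR 0 address of that device */
	int      dev_id;      /* 1 = D1, 2 = D2 */
};

/* What happens when the CPU dereferences virtual address V. */
static enum toy_result toy_access(const struct toy_system *s, int *reached_id)
{
	assert(s != NULL && reached_id != NULL);

	if (!s->tlb_valid)
		return TOY_FAULT;  /* TLB miss: the access never reaches PCI */
	if (!s->dev_present || s->dev_bar != s->tlb_phys)
		return TOY_ABORT;  /* nothing decodes the address */

	*reached_id = s->dev_id;
	return TOY_REACHED_DEVICE;
}
```

With tlb_valid left set after D1 is removed, an access returns TOY_ABORT;
once D2 is assigned the same BAR it returns TOY_REACHED_DEVICE with
dev_id 2. Clearing tlb_valid (the effect of flushing) turns both cases
into TOY_FAULT.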
Benjamin Herrenschmidt Feb. 18, 2015, 9:03 p.m. UTC | #5
On Wed, 2015-02-18 at 08:30 -0600, Bjorn Helgaas wrote:
> 
> So the hypervisor call that removes the device from the partition will
> fail if there are any translations that reference the memory of the
> device.
> 
> Let me go through this in excruciating detail to see if I understand
> what's going on:
> 
>   - PCI core enumerates device D1
>   - PCI core sets device D1 BAR 0 = 0x1000
>   - driver claims D1
>   - driver ioremaps 0x1000 at virtual address V
>   - translation V -> 0x1000 is in TLB
>   - driver iounmaps V (but V -> 0x1000 translation may remain in TLB)
>   - driver releases D1
>   - hot-remove D1 (without vm_unmap_aliases(), hypervisor would fail
> this)
>   - it would be a bug to reference V here, but if we did, the
> virt-to-phys translation would succeed and we'd have a Master Abort or
> Unsupported Request on PCI/PCIe
>   - hot-add D2
>   - PCI core enumerates device D2
>   - PCI core sets device D2 BAR 0 = 0x1000
>   - it would be a bug to reference V here (before ioremapping), but if
> we did, the reference would reach D2
> 
> I don't see anything hypervisor-specific here except for the fact that
> the hypervisor checks for existing translations and most other
> platforms don't.  But it seems like the unexpected PCI aborts could
> happen on any platform.

Well, only if we incorrectly dereferenced an ioremap'ed address for
which the driver who owns it is long gone so fairly unlikely. I'm not
saying you shouldn't put the vm_unmap_aliases() in the generic unplug
code, I wouldn't mind that, but I don't think we have a nasty bug to
squash here :)

> Are we saying that those PCI aborts are OK, since it's a bug to make
> those references in the first place?  Or would we rather take a TLB
> miss fault instead so the references never make it to PCI?

I think a miss fault which is basically a page fault -> oops is
preferable for debugging (after all that MMIO might have been reassigned
to another device, so that abort might actually instead turn into
writing to the wrong device... bad).

However I also think the scenario is very unlikely.

> I would think there would be similar issues when unmapping and
> re-mapping plain old physical memory.  But I don't see
> vm_unmap_aliases() calls there, so those issues must be handled
> differently.  Should we handle this PCI hotplug issue the same way we
> handle RAM?

If we don't have a vm_unmap_aliases() in the memory unplug path we
probably have a bug on those HVs too :-)

Cheers,
Ben.
Patch

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..ef55dae 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@  config HOTPLUG_PCI_SHPC
 
 	  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+	tristate "PowerPC PowerNV PCI Hotplug driver"
+	depends on PPC_POWERNV && EEH
+	help
+	  Say Y here if you run a PowerPC PowerNV platform that supports
+	  PCI Hotplug.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called powernv-php.
+
+	  When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
 	tristate "RPA PCI Hotplug driver"
 	depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index 4a9aa08..a69665e 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@  obj-$(CONFIG_HOTPLUG_PCI_PCIE)		+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)	+= cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)	+= cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC)		+= shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)	+= powernv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)		+= rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)	+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)		+= sgi_hotplug.o
@@ -50,6 +51,9 @@  ibmphp-objs		:=	ibmphp_core.o	\
 acpiphp-objs		:=	acpiphp_core.o	\
 				acpiphp_glue.o
 
+powernv-php-objs	:=	powernv_php.o	\
+				powernv_php_slot.o
+
 rpaphp-objs		:=	rpaphp_core.o	\
 				rpaphp_pci.o	\
 				rpaphp_slot.o
diff --git a/drivers/pci/hotplug/powernv_php.c b/drivers/pci/hotplug/powernv_php.c
new file mode 100644
index 0000000..e36eaf1
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.c
@@ -0,0 +1,126 @@ 
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+
+#include "powernv_php.h"
+
+#define DRIVER_VERSION	"0.1"
+#define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
+#define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
+
+static int powernv_php_register_one(struct device_node *dn)
+{
+	struct powernv_php_slot *slot;
+	const __be32 *prop32;
+	int ret;
+
+	/* Check if it's hotpluggable slot */
+	prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return 0;
+
+	prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
+	if (!prop32 || !of_read_number(prop32, 1))
+		return 0;
+
+	/* Allocate slot */
+	slot = powernv_php_slot_alloc(dn);
+	if (!slot)
+		return -ENODEV;
+
+	/* Register it */
+	ret = powernv_php_slot_register(slot);
+	if (ret) {
+		powernv_php_slot_put(slot);
+		return ret;
+	}
+
+	return powernv_php_slot_enable(slot->php_slot, false);
+}
+
+int powernv_php_register(struct device_node *dn)
+{
+	struct device_node *child;
+	int ret = 0;
+
+	for_each_child_of_node(dn, child) {
+		ret = powernv_php_register_one(child);
+		if (ret)
+			break;
+
+		powernv_php_register(child);
+	}
+
+	return ret;
+}
+
+static void powernv_php_unregister_one(struct device_node *dn)
+{
+	struct powernv_php_slot *slot;
+
+	slot = powernv_php_slot_find(dn);
+	if (!slot)
+		return;
+
+	pci_hp_deregister(slot->php_slot);
+}
+
+void powernv_php_unregister(struct device_node *dn)
+{
+	struct device_node *child;
+
+	for_each_child_of_node(dn, child) {
+		powernv_php_unregister_one(child);
+		powernv_php_unregister(child);
+	}
+}
+
+static int __init powernv_php_init(void)
+{
+	struct device_node *dn;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+	/* Scan PHB nodes and their children */
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		powernv_php_register(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		powernv_php_register(dn);
+
+	return 0;
+}
+
+static void __exit powernv_php_exit(void)
+{
+	struct device_node *dn;
+
+	for_each_compatible_node(dn, NULL, "ibm,ioda-phb")
+		powernv_php_unregister(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
+		powernv_php_unregister(dn);
+}
+
+module_init(powernv_php_init);
+module_exit(powernv_php_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/pci/hotplug/powernv_php.h b/drivers/pci/hotplug/powernv_php.h
new file mode 100644
index 0000000..1c2b6f6
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.h
@@ -0,0 +1,70 @@ 
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _POWERNV_PHP_H
+#define _POWERNV_PHP_H
+
+/* Slot power status */
+#define POWERNV_PHP_SLOT_POWER_OFF	0
+#define POWERNV_PHP_SLOT_POWER_ON	1
+
+/* Slot presence status */
+#define POWERNV_PHP_SLOT_EMPTY		0
+#define POWERNV_PHP_SLOT_PRESENT	1
+
+/* Slot attention status */
+#define POWERNV_PHP_SLOT_ATTEN_OFF	0
+#define POWERNV_PHP_SLOT_ATTEN_ON	1
+#define POWERNV_PHP_SLOT_ATTEN_IND	2
+#define POWERNV_PHP_SLOT_ATTEN_ACT	3
+
+struct powernv_php_slot {
+	struct kref		kref;
+	int			state;
+#define POWERNV_PHP_SLOT_STATE_INIT		0x0
+#define POWERNV_PHP_SLOT_STATE_REGISTER		0x1
+#define POWERNV_PHP_SLOT_STATE_POPULATED	0x2
+	char			*name;
+	struct device_node	*dn;
+	struct pci_bus		*bus;
+	uint64_t		id;
+	int			slot_no;
+	struct hotplug_slot	*php_slot;
+	struct powernv_php_slot	*parent;
+	void (*release)(struct kref *kref);
+	struct list_head	children;
+	struct list_head	link;
+};
+
+#define to_powernv_php_slot(kref) container_of(kref, struct powernv_php_slot, kref)
+
+static inline void powernv_php_slot_get(struct powernv_php_slot *slot)
+{
+	if (slot)
+		kref_get(&slot->kref);
+}
+
+static inline int powernv_php_slot_put(struct powernv_php_slot *slot)
+{
+	if (slot)
+		return kref_put(&slot->kref, slot->release);
+
+	return 0;
+}
+
+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn);
+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn);
+int powernv_php_slot_register(struct powernv_php_slot *slot);
+int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan);
+int powernv_php_register(struct device_node *dn);
+void powernv_php_unregister(struct device_node *dn);
+
+#endif /* !_POWERNV_PHP_H */
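
The NULL-safe get/put helpers in the header above follow the usual
reference-counting pattern: the creator holds the first reference, each
get adds one, and the release callback runs when the last put drops the
count to zero. A minimal userspace sketch of that pattern follows. It is
an illustration only, with hypothetical toy_* names; it uses a plain int
counter and is not thread-safe, unlike the kernel's kref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object; stands in for a kref-managed slot. */
struct toy_slot {
	int refcount;
	int released;                          /* set by the release callback */
	void (*release)(struct toy_slot *slot);
};

static void toy_slot_release(struct toy_slot *slot)
{
	slot->released = 1;
}

/* Like kref_init(): the creator holds the first reference. */
static void toy_slot_init(struct toy_slot *slot)
{
	slot->refcount = 1;
	slot->released = 0;
	slot->release = toy_slot_release;
}

/* NULL-safe get, mirroring the shape of powernv_php_slot_get(). */
static void toy_slot_get(struct toy_slot *slot)
{
	if (slot)
		slot->refcount++;
}

/* NULL-safe put; returns 1 if this call dropped the last reference. */
static int toy_slot_put(struct toy_slot *slot)
{
	if (!slot)
		return 0;

	assert(slot->refcount > 0);
	if (--slot->refcount == 0) {
		slot->release(slot);
		return 1;
	}
	return 0;
}
```

The NULL checks matter for callers that drop a reference on an optional
pointer, such as a slot with no parent.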
diff --git a/drivers/pci/hotplug/powernv_php_slot.c b/drivers/pci/hotplug/powernv_php_slot.c
new file mode 100644
index 0000000..84c5c6f
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php_slot.c
@@ -0,0 +1,382 @@ 
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+
+#include "powernv_php.h"
+
+static LIST_HEAD(php_slot_list);
+static DEFINE_MUTEX(php_slot_mutex);
+
+static int get_power_status(struct hotplug_slot *php_slot, u8 *val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t state;
+	int ret;
+
+	/* By default, the power is on */
+	*val = POWERNV_PHP_SLOT_POWER_ON;
+
+	/* Retrieve power status from firmware */
+	ret = pnv_pci_get_power_status(slot->id, &state);
+	if (!ret) {
+		*val = state ? POWERNV_PHP_SLOT_POWER_ON :
+			       POWERNV_PHP_SLOT_POWER_OFF;
+		php_slot->info->power_status = *val;
+	}
+
+	return 0;
+}
+
+static int get_adapter_status(struct hotplug_slot *php_slot, u8 *val)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t state;
+	int ret;
+
+	/* By default, the slot is empty */
+	*val = 0;
+
+	/* Retrieve presence status from firmware */
+	ret = pnv_pci_get_presence_status(slot->id, &state);
+	if (ret >= 0) {
+		*val = state ? POWERNV_PHP_SLOT_PRESENT :
+			       POWERNV_PHP_SLOT_EMPTY;
+		php_slot->info->adapter_status = *val;
+	}
+
+	return 0;
+}
+
+static int set_attention_status(struct hotplug_slot *php_slot, u8 val)
+{
+	/*
+	 * The default operation is to turn on
+	 * the attention
+	 */
+	switch (val) {
+	case POWERNV_PHP_SLOT_ATTEN_OFF:
+	case POWERNV_PHP_SLOT_ATTEN_ON:
+	case POWERNV_PHP_SLOT_ATTEN_IND:
+	case POWERNV_PHP_SLOT_ATTEN_ACT:
+		break;
+	default:
+		val = POWERNV_PHP_SLOT_ATTEN_ON;
+	}
+
+	/* FIXME: Make it real once firmware supports it */
+	php_slot->info->attention_status = val;
+
+	return 0;
+}
+
+int powernv_php_slot_enable(struct hotplug_slot *php_slot, bool rescan)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+	uint8_t presence;
+	int ret;
+
+	/* Check if the slot has been configured */
+	if (slot->state != POWERNV_PHP_SLOT_STATE_REGISTER)
+		return 0;
+
+	/* Retrieve slot presence status */
+	ret = php_slot->ops->get_adapter_status(php_slot, &presence);
+	if (ret)
+		return ret;
+
+	switch (presence) {
+	case POWERNV_PHP_SLOT_PRESENT:
+		pci_lock_rescan_remove();
+		pcibios_add_pci_devices(slot->bus);
+		pci_unlock_rescan_remove();
+		slot->state = POWERNV_PHP_SLOT_STATE_POPULATED;
+
+		/* Rescan for child hotpluggable slots */
+		if (rescan)
+			powernv_php_register(slot->dn);
+		break;
+	case POWERNV_PHP_SLOT_EMPTY:
+		break;
+	default:
+		pr_warn("%s: Invalid presence status %d on slot %s\n",
+			__func__, presence, slot->name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int enable_slot(struct hotplug_slot *php_slot)
+{
+	return powernv_php_slot_enable(php_slot, true);
+}
+
+static int disable_slot(struct hotplug_slot *php_slot)
+{
+	struct powernv_php_slot *slot = php_slot->private;
+
+	if (slot->state != POWERNV_PHP_SLOT_STATE_POPULATED)
+		return 0;
+
+	pci_lock_rescan_remove();
+	pcibios_remove_pci_devices(slot->bus);
+	pci_unlock_rescan_remove();
+	vm_unmap_aliases();
+
+	/* Detach the child hotpluggable slots */
+	powernv_php_unregister(slot->dn);
+
+	/* Update slot state */
+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
+	return 0;
+}
+
+static struct hotplug_slot_ops php_slot_ops = {
+	.get_power_status	= get_power_status,
+	.get_adapter_status	= get_adapter_status,
+	.set_attention_status	= set_attention_status,
+	.enable_slot		= enable_slot,
+	.disable_slot		= disable_slot,
+};
+
+static struct powernv_php_slot *php_slot_match(struct device_node *dn,
+					       struct powernv_php_slot *slot)
+{
+	struct powernv_php_slot *target, *tmp;
+
+	if (slot->dn == dn)
+		return slot;
+
+	list_for_each_entry(tmp, &slot->children, link) {
+		target = php_slot_match(dn, tmp);
+		if (target)
+			return target;
+	}
+
+	return NULL;
+}
+
+struct powernv_php_slot *powernv_php_slot_find(struct device_node *dn)
+{
+	struct powernv_php_slot *slot, *tmp;
+
+	mutex_lock(&php_slot_mutex);
+	list_for_each_entry(tmp, &php_slot_list, link) {
+		slot = php_slot_match(dn, tmp);
+		if (slot) {
+			mutex_unlock(&php_slot_mutex);
+			return slot;
+		}
+	}
+	mutex_unlock(&php_slot_mutex);
+
+	return NULL;
+}
+
+static void php_slot_free(struct kref *kref)
+{
+	struct powernv_php_slot *slot = to_powernv_php_slot(kref);
+
+	WARN_ON(!list_empty(&slot->children));
+	kfree(slot->name);
+	kfree(slot);
+}
+
+static void php_slot_release(struct hotplug_slot *hp_slot)
+{
+	struct powernv_php_slot *slot = hp_slot->private;
+
+	/* Remove from global or child list */
+	mutex_lock(&php_slot_mutex);
+	list_del(&slot->link);
+	mutex_unlock(&php_slot_mutex);
+
+	/* Detach from parent */
+	powernv_php_slot_put(slot);
+	powernv_php_slot_put(slot->parent);
+}
+
+static bool php_slot_get_id(struct device_node *dn,
+			    uint64_t *id)
+{
+	struct device_node *parent = dn;
+	const __be64 *prop64;
+	const __be32 *prop32;
+
+	/*
+	 * The hotpluggable slot always has a compound Id, which
+	 * consists of 16-bits PHB Id, 16 bits bus/slot/function
+	 * number, and compound indicator
+	 */
+	*id = (0x1ul << 63);
+
+	/* Bus/Slot/Function number */
+	prop32 = of_get_property(dn, "reg", NULL);
+	if (!prop32)
+		return false;
+	*id |= ((of_read_number(prop32, 1) & 0x00ffff00) << 16);
+
+	/* PHB Id */
+	while ((parent = of_get_parent(parent))) {
+		if (!PCI_DN(parent)) {
+			of_node_put(parent);
+			break;
+		}
+
+		if (!of_device_is_compatible(parent, "ibm,ioda2-phb") &&
+		    !of_device_is_compatible(parent, "ibm,ioda-phb")) {
+			of_node_put(parent);
+			continue;
+		}
+
+		prop64 = of_get_property(parent, "ibm,opal-phbid", NULL);
+		if (!prop64) {
+			of_node_put(parent);
+			return false;
+		}
+
+		*id |= be64_to_cpup(prop64);
+		of_node_put(parent);
+		return true;
+	}
+
+	return false;
+}
+
+struct powernv_php_slot *powernv_php_slot_alloc(struct device_node *dn)
+{
+	struct pci_bus *bus;
+	struct powernv_php_slot *slot;
+	const char *label;
+	uint64_t id;
+	int slot_no;
+	size_t size;
+	void *pmem;
+
+	/* Slot name */
+	label = of_get_property(dn, "ibm,slot-label", NULL);
+	if (!label)
+		return NULL;
+
+	/* Slot identifier */
+	if (!php_slot_get_id(dn, &id))
+		return NULL;
+
+	/* PCI bus */
+	bus = pcibios_find_pci_bus(dn);
+	if (!bus)
+		return NULL;
+
+	/* Slot number */
+	if (dn->child && PCI_DN(dn->child))
+		slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
+	else
+		slot_no = -1;
+
+	/* Allocate slot */
+	size = sizeof(struct powernv_php_slot) +
+	       sizeof(struct hotplug_slot) +
+	       sizeof(struct hotplug_slot_info);
+	pmem = kzalloc(size, GFP_KERNEL);
+	if (!pmem) {
+		pr_warn("%s: Cannot allocate slot for node %s\n",
+			__func__, dn->full_name);
+		return NULL;
+	}
+
+	/* Assign memory blocks */
+	slot = pmem;
+	slot->php_slot = pmem + sizeof(struct powernv_php_slot);
+	slot->php_slot->info = pmem + sizeof(struct powernv_php_slot) +
+			      sizeof(struct hotplug_slot);
+	slot->name = kstrdup(label, GFP_KERNEL);
+	if (!slot->name) {
+		pr_warn("%s: Cannot populate name for node %s\n",
+			__func__, dn->full_name);
+		kfree(pmem);
+		return NULL;
+	}
+
+	/* Initialize slot */
+	kref_init(&slot->kref);
+	slot->state = POWERNV_PHP_SLOT_STATE_INIT;
+	slot->dn = dn;
+	slot->bus = bus;
+	slot->id = id;
+	slot->slot_no = slot_no;
+	slot->release = php_slot_free;
+	slot->php_slot->ops = &php_slot_ops;
+	slot->php_slot->release = php_slot_release;
+	slot->php_slot->private = slot;
+	INIT_LIST_HEAD(&slot->children);
+	INIT_LIST_HEAD(&slot->link);
+
+	return slot;
+}
+
+int powernv_php_slot_register(struct powernv_php_slot *slot)
+{
+	struct powernv_php_slot *parent = NULL;
+	struct device_node *dn = slot->dn;
+	int ret;
+
+	/* Avoid registering the same slot twice */
+	if (powernv_php_slot_find(slot->dn))
+		return -EEXIST;
+
+	/* Register slot */
+	ret = pci_hp_register(slot->php_slot, slot->bus,
+			      slot->slot_no, slot->name);
+	if (ret) {
+		pr_warn("%s: Cannot register slot %s (%d)\n",
+			__func__, slot->name, ret);
+		return ret;
+	}
+
+	/* Put into global or parent list */
+	while ((dn = of_get_parent(dn))) {
+		if (!PCI_DN(dn)) {
+			of_node_put(dn);
+			break;
+		}
+
+		parent = powernv_php_slot_find(dn);
+		if (parent) {
+			of_node_put(dn);
+			break;
+		}
+	}
+
+	mutex_lock(&php_slot_mutex);
+	if (parent) {
+		powernv_php_slot_get(parent);
+		slot->parent = parent;
+		list_add_tail(&slot->link, &parent->children);
+	} else {
+		list_add_tail(&slot->link, &php_slot_list);
+	}
+	mutex_unlock(&php_slot_mutex);
+
+	/* Update slot state */
+	slot->state = POWERNV_PHP_SLOT_STATE_REGISTER;
+	return 0;
+}
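
The compound slot identifier built by php_slot_get_id() in the patch
above can be checked with a small standalone calculation. This is a
sketch that assumes the same layout as the posted code: bit 63 as the
compound indicator, the bus/slot/function bits of the first "reg" cell
(mask 0x00ffff00) shifted left by 16, and the "ibm,opal-phbid" value
OR-ed in. The function name toy_compound_slot_id() and its parameters are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * reg_hi: first cell of the node's "reg" property (assumed input).
 * phb_id: value of the parent PHB's "ibm,opal-phbid" (assumed input).
 */
static uint64_t toy_compound_slot_id(uint32_t reg_hi, uint64_t phb_id)
{
	uint64_t id = UINT64_C(1) << 63;                 /* compound indicator */

	id |= ((uint64_t)(reg_hi & 0x00ffff00u)) << 16;  /* bus/slot/function */
	id |= phb_id;                                    /* PHB Id */

	return id;
}
```

For example, reg_hi = 0x00010800 (bus 1, devfn 0x08) with phb_id = 2
gives 0x8000000108000002.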