diff mbox

[4/4] phylib: Allow reading and writing a mii bus from atomic context.

Message ID a06cf09ed5096bf8f9434d90828adfa70586e936.1278307573.git.richard.cochran@omicron.at
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Richard Cochran July 5, 2010, 5:33 a.m. UTC
In order to support hardware time stamping from a PHY, it is necessary to
read from the PHY while running in_interrupt(). This patch allows a mii
bus to operate in an atomic context. An mii_bus driver may declare itself
capable for this mode. Drivers which do not do this will remain with the
default that bus operations may sleep.

Before commit 35b5f6b1a82b5c586e0b24c711dc6ba944e88ef1 mii bus
operations were protected with spin locks. That commit replaced the
locks with mutexs in order to accommodate i2c buses that need to
sleep. Thus, this patch restores the original behavior as a run time
option.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 drivers/net/fsl_pq_mdio.c  |    4 +-
 drivers/net/phy/mdio_bus.c |   45 +++++++++++++++++++++++++++++++++++++------
 include/linux/phy.h        |   15 ++++++++++++-
 3 files changed, 53 insertions(+), 11 deletions(-)

Comments

David Miller July 6, 2010, 2:05 a.m. UTC | #1
From: Richard Cochran <richardcochran@gmail.com>
Date: Mon, 5 Jul 2010 07:33:14 +0200

> In order to support hardware time stamping from a PHY, it is necessary to
> read from the PHY while running in_interrupt(). This patch allows a mii
> bus to operate in an atomic context. An mii_bus driver may declare itself
> capable for this mode. Drivers which do not do this will remain with the
> default that bus operations may sleep.
> 
> Before commit 35b5f6b1a82b5c586e0b24c711dc6ba944e88ef1 mii bus
> operations were protected with spin locks. That commit replaced the
> locks with mutexs in order to accommodate i2c buses that need to
> sleep. Thus, this patch restores the original behavior as a run time
> option.
> 
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>

Ok, I'm not to happy about this one, to be honest.

Conditional locking is always a problem waiting to happen.

Question: What context can you call mdiobus_lock() in?

Answer: You absolutely cannot know.

Therefore every piece of code, except those with special knowledge
of what kind of device is behind the mdiobus object, have to assume
the worst which is that the interface can only be called in sleepable
contexts.

We need to find another way to accomodate this, I really don't want to
see this kind of situation where the locking facility type is
conditional upon the device sitting behind the interface.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Fleming July 6, 2010, 5:09 p.m. UTC | #2
On Mon, Jul 5, 2010 at 12:33 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> In order to support hardware time stamping from a PHY, it is necessary to
> read from the PHY while running in_interrupt(). This patch allows a mii
> bus to operate in an atomic context. An mii_bus driver may declare itself
> capable for this mode. Drivers which do not do this will remain with the
> default that bus operations may sleep.
>
> Before commit 35b5f6b1a82b5c586e0b24c711dc6ba944e88ef1 mii bus
> operations were protected with spin locks. That commit replaced the
> locks with mutexs in order to accommodate i2c buses that need to
> sleep. Thus, this patch restores the original behavior as a run time
> option.


The reason we moved to mutexes was because we didn't want PHY
operations to be done in interrupt context.  They are too slow.  Also,
some MII busses will trigger an interrupt when the operation is done,
which means their driver has the option of sleeping.  As such, it was
an original principle of the PHY lib that MII transactions were not
allowed in interrupt context.  *Certainly*, once you *do* allow MII
transactions in interrupt context, you *cannot* use spin_lock().  You
at least have to use spin_lock_irq[save].

Also, I agree with David's comments, and Grant's.  There's got to be
another way to do this.

Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Richard Cochran July 7, 2010, 7:18 a.m. UTC | #3
On Tue, Jul 06, 2010 at 12:09:10PM -0500, Andy Fleming wrote:
> allowed in interrupt context.  *Certainly*, once you *do* allow MII
> transactions in interrupt context, you *cannot* use spin_lock().  You
> at least have to use spin_lock_irq[save].

Okay, I see. Before the change to mutexes the code used spin_lock_bh.

> Also, I agree with David's comments, and Grant's.  There's got to be
> another way to do this.

I understand what troubles you about the proposed changes. However, I
cannot see a better way to get a PHY time stamp other than to read it
out during the napi poll and hard_xmit functions. It is a chicken and
egg problem.

Consider the receive path:

1. PTP Packet passed through PHY. PHY recognizes it and stores a time
   stamp along with some UID from the packet.

2. Napi calls the MAC driver's poll function.

3. MAC driver acquires packet. At this point, if we want to have a
   hardware time stamp, we must read it out over the mdio bus, before
   handing the packet over to the stack via netif_receive_skb().

If we decide to defer the packet delivery (in step 3), we have to know
whether the PHY will have a time stamp for this packet. The only way
to do this is to compare the UIDs, but that requires reading it over
the mdio bus.

One could simply defer *all* packets, but that would really stink.

Possibly, we could defer all likely packets, for example all PTP
packets. Changing all the MAC driver to call a time stamping hook that
conditionally consumes the skbuffs would also stink. Could the packet
be deferred by the stack, early in netif_receive_skb? If so, a work
queue could read the PHY and then deliver the deferred packets at some
later time.

IMHO, it better just to check for a PHY time stamp immediately, and
live with the performance hit. I believe that PTP users will accept
this. Naturally, it requires allowing reading the mdio bus in two
non-sleeping contexts.

There are perhaps other ways to provide PHY time stamps, but I am
doubtful that they would work better, in general. The National
Semiconductor PHYTER can provide the time stamps as status frames, and
it can also insert time stamps directly into the incoming and
outgoing frames.

* Using status frames is an attractive idea, because it obviates the
  need to read over the mdio bus. However, you still have the problem
  of matching the status frames to the PTP packets.

* Inserting a time stamp into outgoing frames presents no problems
  (this is the so called "PTP one step" operation). However, for
  incoming frames the PHYTER inserts the time stamp into the PTP
  message at a programmable offset. This invalidates the UDP checksum
  for PTPv1 packets (PTPv2 checksums will be automatically corrected.)

Even if we implement one of these alternatives, I am not sure that
other PHYs will also offer the same capabilities as the PHYTER.

Well, there you have it. I welcome any ideas on how to go about
offering PHY time stamping.

Thanks,
Richard





--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Richard Cochran July 16, 2010, 11:25 a.m. UTC | #4
On Wed, Jul 07, 2010 at 09:18:32AM +0200, Richard Cochran wrote:
> Consider the receive path:
> 
> 1. PTP Packet passed through PHY. PHY recognizes it and stores a time
>    stamp along with some UID from the packet.
> 
> 2. Napi calls the MAC driver's poll function.
> 
> 3. MAC driver acquires packet. At this point, if we want to have a
>    hardware time stamp, we must read it out over the mdio bus, before
>    handing the packet over to the stack via netif_receive_skb().
> 
> If we decide to defer the packet delivery (in step 3), we have to know
> whether the PHY will have a time stamp for this packet. The only way
> to do this is to compare the UIDs, but that requires reading it over
> the mdio bus.

(Forwarding Andy's message to me that was missing a CC to the list)

On Thu, Jul 08, 2010 at 03:40:41PM -0500, Andy Fleming wrote:

Wait, is the intent for this mdio read to be done for *every* packet?
MDIO is spec'ed out to go up to 2.5MHz.  Each transaction takes 64
cycles.  And I see that reading the timestamp from the PHY you
submitted support for takes at *least* 2 transactions.  The fastest
you can process packets would then be under 20,000 packets per second.
Even on a 100Mb link with full-sized packets, you would be 40% done
receiving the next packet before you had passed the packet on to the
stack.  Each MDIO transaction takes 25,600 cycles on a gigahertz
processor.  It's just too long, IMO, and it looks like this code will
end up doing up to...10?

I wasn't able to find an example of how you were going to use the
time-stamp functions you provided.  Could you please go into a little
more detail about how you intended this to work?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Richard Cochran July 16, 2010, 11:49 a.m. UTC | #5
On Thu, Jul 08, 2010 at 03:40:41PM -0500, Andy Fleming wrote:
> 
> Wait, is the intent for this mdio read to be done for *every* packet?
> MDIO is spec'ed out to go up to 2.5MHz.  Each transaction takes 64
> cycles.  And I see that reading the timestamp from the PHY you
> submitted support for takes at *least* 2 transactions.  The fastest
> you can process packets would then be under 20,000 packets per second.
> Even on a 100Mb link with full-sized packets, you would be 40% done
> receiving the next packet before you had passed the packet on to the
> stack.  Each MDIO transaction takes 25,600 cycles on a gigahertz
> processor.  It's just too long, IMO, and it looks like this code will
> end up doing up to...10?

Well, it is actually worse than that. From my measurements, 64 cycles
at 2.5 MHz (25.6 usec) is too optimistic. On one Xscale IXP platform,
I get 35 to 40 usec per read. Also, the PHYTER needs six reads for a
Rx and four for a Tx time stamp.

So you are right, it is a very long time to leave interrupts off.

However, not every packet needs a time stamp. The PHY devices that I
know of all are selective. That is, they recognize PTP packets in
hardware and only provide time stamps for those special packets. The
packet rate is not that hight. For example, the "sync" message comes
once every two seconds in PTPv1 and up to ten times per second in
PTPv2.

I have been working on an alternative, which I will post soon. It goes
like this... (comments, please, before I get too far into it!)

* General
   - phy registers itself with net_device { opaque pointer }

   - ts_candidate(skb):
     function to detect likely packets, returns true if
       1. phy_device pointer exists in net_device
       2. data look like PTP, using a BPF

   - work queue:
     1. read out time stamps
     2. match times with queued skbs
     3. deliver skbs with time stamps (or without on timeout)

* Rx path
   - netif_receive_skb:
     if ts_candidate(skb), defer(skb).

* Tx path
   - hook in each MAC driver
   - if ts_candidate(skb), clone and defer(skb).


> I wasn't able to find an example of how you were going to use the
> time-stamp functions you provided.  Could you please go into a little
> more detail about how you intended this to work?

My last PTP series included the PHY stuff, too. There you can see how
it all fits together.

   http://marc.info/?l=linux-netdev&m=127661796627120&w=3


Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/fsl_pq_mdio.c b/drivers/net/fsl_pq_mdio.c
index b4c41d7..0b34f50 100644
--- a/drivers/net/fsl_pq_mdio.c
+++ b/drivers/net/fsl_pq_mdio.c
@@ -145,7 +145,7 @@  static int fsl_pq_mdio_reset(struct mii_bus *bus)
 	struct fsl_pq_mdio __iomem *regs = fsl_pq_mdio_get_regs(bus);
 	int timeout = PHY_INIT_TIMEOUT;
 
-	mutex_lock(&bus->mdio_lock);
+	mdiobus_lock(bus);
 
 	/* Reset the management interface */
 	out_be32(&regs->miimcfg, MIIMCFG_RESET);
@@ -157,7 +157,7 @@  static int fsl_pq_mdio_reset(struct mii_bus *bus)
 	while ((in_be32(&regs->miimind) & MIIMIND_BUSY) && timeout--)
 		cpu_relax();
 
-	mutex_unlock(&bus->mdio_lock);
+	mdiobus_unlock(bus);
 
 	if (timeout < 0) {
 		printk(KERN_ERR "%s: The MII Bus is stuck!\n",
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 6a6b819..ad6bed8 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -37,6 +37,32 @@ 
 #include <asm/uaccess.h>
 
 /**
+ * mdiobus_lock - locks a given bus for read or write operations.
+ * @bus: target mii_bus
+ */
+void mdiobus_lock(struct mii_bus *bus)
+{
+	if (MDIOBUS_SLEEPS_RW == bus->locktype)
+		mutex_lock(&bus->lock.m);
+	else
+		spin_lock(&bus->lock.s);
+}
+EXPORT_SYMBOL(mdiobus_lock);
+
+/**
+ * mdiobus_unlock - unlocks a given bus for read or write operations.
+ * @bus: target mii_bus
+ */
+void mdiobus_unlock(struct mii_bus *bus)
+{
+	if (MDIOBUS_SLEEPS_RW == bus->locktype)
+		mutex_unlock(&bus->lock.m);
+	else
+		spin_unlock(&bus->lock.s);
+}
+EXPORT_SYMBOL(mdiobus_unlock);
+
+/**
  * mdiobus_alloc - allocate a mii_bus structure
  *
  * Description: called by a bus driver to allocate an mii_bus
@@ -107,7 +133,10 @@  int mdiobus_register(struct mii_bus *bus)
 		return -EINVAL;
 	}
 
-	mutex_init(&bus->mdio_lock);
+	if (MDIOBUS_SLEEPS_RW == bus->locktype)
+		mutex_init(&bus->lock.m);
+	else
+		spin_lock_init(&bus->lock.s);
 
 	if (bus->reset)
 		bus->reset(bus);
@@ -212,11 +241,12 @@  int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum)
 {
 	int retval;
 
-	BUG_ON(in_interrupt());
+	if (MDIOBUS_SLEEPS_RW == bus->locktype)
+		BUG_ON(in_interrupt());
 
-	mutex_lock(&bus->mdio_lock);
+	mdiobus_lock(bus);
 	retval = bus->read(bus, addr, regnum);
-	mutex_unlock(&bus->mdio_lock);
+	mdiobus_unlock(bus);
 
 	return retval;
 }
@@ -237,11 +267,12 @@  int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val)
 {
 	int err;
 
-	BUG_ON(in_interrupt());
+	if (MDIOBUS_SLEEPS_RW == bus->locktype)
+		BUG_ON(in_interrupt());
 
-	mutex_lock(&bus->mdio_lock);
+	mdiobus_lock(bus);
 	err = bus->write(bus, addr, regnum, val);
-	mutex_unlock(&bus->mdio_lock);
+	mdiobus_unlock(bus);
 
 	return err;
 }
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7a8caac..93ea55f 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -98,11 +98,20 @@  struct mii_bus {
 	int (*write)(struct mii_bus *bus, int phy_id, int regnum, u16 val);
 	int (*reset)(struct mii_bus *bus);
 
+	/* Indicates whether bus may be used from an atomic context. */
+	enum {
+		MDIOBUS_SLEEPS_RW,
+		MDIOBUS_ATOMIC_RW
+	} locktype;
+
 	/*
-	 * A lock to ensure that only one thing can read/write
+	 * A lock or mutex to ensure that only one thing can read/write
 	 * the MDIO bus at a time
 	 */
-	struct mutex mdio_lock;
+	union {
+		struct mutex m;
+		spinlock_t s;
+	} lock;
 
 	struct device *parent;
 	enum {
@@ -127,6 +136,8 @@  struct mii_bus {
 };
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
 
+void mdiobus_lock(struct mii_bus *bus);
+void mdiobus_unlock(struct mii_bus *bus);
 struct mii_bus *mdiobus_alloc(void);
 int mdiobus_register(struct mii_bus *bus);
 void mdiobus_unregister(struct mii_bus *bus);