diff mbox

[v8,09/11] block: add a new interface to block events

Message ID 1351501298-3716-10-git-send-email-aaron.lu@intel.com
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Aaron Lu Oct. 29, 2012, 9:01 a.m. UTC
A new interface to block disk events is added, this interface is
meant to eliminate a race between PM runtime callback and disk events
checking.

Suppose the following device tree:
device_sata_port  (the parent)
  device_ODD      (the child)

When ODD is runtime suspended, sata port will have a chance to enter
runtime suspended state. And in sata port's runtime suspend callback,
it will check if it is OK to omit the power of the ODD. And if yes, the
periodically running events checking work will be stopped, as the ODD
will be waken up by that check and cancel it can make the ODD stay in
zero power state much longer(no worry about how the ODD gets media
change event in ZPODD's case).

I used disk_block_events to do the events blocking, but there is a race
and can lead to a deadlock: when I call disk_block_events in sata port's
runtime suspend callback, the events checking work may already be running
and it will resume the ODD synchronously, and PM core will try to resume
its parent, the sata port first. The PM core makes sure that runtime
resume callback does not run concurrently with runtime suspend callback,
and if the runtime suspend callback is running, the runtime resume
callback will wait for it done. So the events checking work will wait
for sata port's runtime suspend callback, while the sata port's runtime
suspend callback is waiting for the disk events work to finish. Deadlock.

ODD_suspend                        disk_events_workfn
  ata_port_suspend                   check_events
    disk_block_events                  resume ODD
      cancel_delayed_work_sync           resume parent
      (waiting for disk_events_workfn)   (waiting for suspend callback)

So a new function disk_try_block_events is added, it will try to
cancel the delayed work if the work is currently pending, which means
it has been queued but the timer is not expired yet. If the work is not
pending, we return false, as we do not know if the task is already
running or not queued at all. And if the queued work is successfully
cancelled, disk_block_events will be called and we are done. But if we
failed to cancel the work, false will be returned, and the calling
thread knows that it's not safe to block events at the moment and should
act accordingly. In ZPODD's case, poweroff will not be attempted.

Note:
ODD means: Optical Disc Drive.
ZPODD means: Zero Power ODD. It's a mechnism to place the ODD into zero
power state without user's awareness when the system is at S0 system
state.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 block/genhd.c         | 26 ++++++++++++++++++++++++++
 include/linux/genhd.h |  1 +
 2 files changed, 27 insertions(+)

Comments

Tejun Heo Oct. 29, 2012, 3:35 p.m. UTC | #1
Hello,

On Mon, Oct 29, 2012 at 05:01:36PM +0800, Aaron Lu wrote:
> ODD_suspend                        disk_events_workfn
>   ata_port_suspend                   check_events
>     disk_block_events                  resume ODD
>       cancel_delayed_work_sync           resume parent
>       (waiting for disk_events_workfn)   (waiting for suspend callback)

I don't understand why solving it needs to be this elaborate.
check_event() can retry.  Just add a per-sr mutex which is try-locked
by sr_block_check_events() and grab it when entering zero power.

> +/*
> + * Under some circumstances, there is a race between the calling thread
> + * of disk_block_events and the events checking function. To avoid such a race,
> + * this function will check if the delayed work is pending. If not, it means
> + * the work is either not queued or is already running, false is returned.
> + * And if yes, try to cancel the delayed work. If succedded, disk_block_events
> + * will be called and there is no worry that cancel_delayed_work_sync will
> + * deadlock the events checking function. And if failed, false is returned.
> + */
> +bool disk_try_block_events(struct gendisk *disk)
> +{
> +	struct disk_events *ev = disk->ev;
> +
> +	if (!ev)
> +		return false;
> +
> +	if (delayed_work_pending(&ev->dwork)) {

And please don't use delayed_work_pending() like this.  It doesn't add
anything.  cancel_delayed_work() already needs to perform all the
necessary tests.

> +		if (cancel_delayed_work(&disk->ev->dwork)) {
> +			disk_block_events(disk);
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}

Thanks.
Aaron Lu Oct. 30, 2012, 7:04 a.m. UTC | #2
On 10/29/2012 11:35 PM, Tejun Heo wrote:
> Hello,
> 
> On Mon, Oct 29, 2012 at 05:01:36PM +0800, Aaron Lu wrote:
>> ODD_suspend                        disk_events_workfn
>>   ata_port_suspend                   check_events
>>     disk_block_events                  resume ODD
>>       cancel_delayed_work_sync           resume parent
>>       (waiting for disk_events_workfn)   (waiting for suspend callback)
> 
> I don't understand why solving it needs to be this elaborate.
							:-)

> check_event() can retry.  Just add a per-sr mutex which is try-locked
> by sr_block_check_events() and grab it when entering zero power.

Good suggestion. I didn't think about solving it this way.

Many people suggest me that ZPODD is pure SATA/ACPI stuff, and should
not pollute sr driver, so I was trying hard not to touch sr while
preparing these patches, unless there is no other choice(like the
blocking event interface).

So I'm not sure if your suggestion is the way to go.

James, what do you think? Is it OK if I add a mutex into the scsi_cd
structure to do this? Of course I'll define this only under
CONFIG_SATA_ZPODD.

> 
>> +/*
>> + * Under some circumstances, there is a race between the calling thread
>> + * of disk_block_events and the events checking function. To avoid such a race,
>> + * this function will check if the delayed work is pending. If not, it means
>> + * the work is either not queued or is already running, false is returned.
>> + * And if yes, try to cancel the delayed work. If succedded, disk_block_events
>> + * will be called and there is no worry that cancel_delayed_work_sync will
>> + * deadlock the events checking function. And if failed, false is returned.
>> + */
>> +bool disk_try_block_events(struct gendisk *disk)
>> +{
>> +	struct disk_events *ev = disk->ev;
>> +
>> +	if (!ev)
>> +		return false;
>> +
>> +	if (delayed_work_pending(&ev->dwork)) {
> 
> And please don't use delayed_work_pending() like this.  It doesn't add
> anything.  cancel_delayed_work() already needs to perform all the
> necessary tests.

OK, thanks for the suggestion.

-Aaron

> 
>> +		if (cancel_delayed_work(&disk->ev->dwork)) {
>> +			disk_block_events(disk);
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
> 
> Thanks.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo Oct. 31, 2012, 9:51 p.m. UTC | #3
Hello,

On Tue, Oct 30, 2012 at 03:04:17PM +0800, Aaron Lu wrote:
> > check_event() can retry.  Just add a per-sr mutex which is try-locked
> > by sr_block_check_events() and grab it when entering zero power.
> 
> Good suggestion. I didn't think about solving it this way.
> 
> Many people suggest me that ZPODD is pure SATA/ACPI stuff, and should
> not pollute sr driver, so I was trying hard not to touch sr while
> preparing these patches, unless there is no other choice(like the
> blocking event interface).
> 
> So I'm not sure if your suggestion is the way to go.
> 
> James, what do you think? Is it OK if I add a mutex into the scsi_cd
> structure to do this? Of course I'll define this only under
> CONFIG_SATA_ZPODD.

I don't think what James' and my suggestions are that different.  Just
silence check_event() while zpodd is kicked in somehow.  There's no
reason to synchronize across multiple subsystems.

Thanks.
Aaron Lu Nov. 1, 2012, 6:30 a.m. UTC | #4
Hi,

On 11/01/2012 05:51 AM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Oct 30, 2012 at 03:04:17PM +0800, Aaron Lu wrote:
>>> check_event() can retry.  Just add a per-sr mutex which is try-locked
>>> by sr_block_check_events() and grab it when entering zero power.
>>
>> Good suggestion. I didn't think about solving it this way.
>>
>> Many people suggest me that ZPODD is pure SATA/ACPI stuff, and should
>> not pollute sr driver, so I was trying hard not to touch sr while
>> preparing these patches, unless there is no other choice(like the
>> blocking event interface).
>>
>> So I'm not sure if your suggestion is the way to go.
>>
>> James, what do you think? Is it OK if I add a mutex into the scsi_cd
>> structure to do this? Of course I'll define this only under
>> CONFIG_SATA_ZPODD.
> 
> I don't think what James' and my suggestions are that different.  Just
> silence check_event() while zpodd is kicked in somehow.  There's no

Well, since sr is not supposed to know anything about ZPODD, I don't see
another way to silence check_event() in ATA layer.

What I hope to achieve is when zero power ready status is sensed in ATA
layer, no more sr_check_events should be called, as that function will
runtime resume the ODD.

Thanks,
Aaron

> reason to synchronize across multiple subsystems.
> 
> Thanks.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/block/genhd.c b/block/genhd.c
index cac7366..c6af755 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1469,6 +1469,32 @@  void disk_block_events(struct gendisk *disk)
 	mutex_unlock(&ev->block_mutex);
 }
 
+/*
+ * Under some circumstances, there is a race between the calling thread
+ * of disk_block_events and the events checking function. To avoid such a race,
+ * this function will check if the delayed work is pending. If not, it means
+ * the work is either not queued or is already running, false is returned.
+ * And if yes, try to cancel the delayed work. If succedded, disk_block_events
+ * will be called and there is no worry that cancel_delayed_work_sync will
+ * deadlock the events checking function. And if failed, false is returned.
+ */
+bool disk_try_block_events(struct gendisk *disk)
+{
+	struct disk_events *ev = disk->ev;
+
+	if (!ev)
+		return false;
+
+	if (delayed_work_pending(&ev->dwork)) {
+		if (cancel_delayed_work(&disk->ev->dwork)) {
+			disk_block_events(disk);
+			return true;
+		}
+	}
+
+	return false;
+}
+
 static void __disk_unblock_events(struct gendisk *disk, bool check_now)
 {
 	struct disk_events *ev = disk->ev;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 4f440b3..b67247f 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -420,6 +420,7 @@  static inline int get_disk_ro(struct gendisk *disk)
 }
 
 extern void disk_block_events(struct gendisk *disk);
+extern bool disk_try_block_events(struct gendisk *disk);
 extern void disk_unblock_events(struct gendisk *disk);
 extern void disk_flush_events(struct gendisk *disk, unsigned int mask);
 extern unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask);