diff mbox

[-v12,02/15] resources: Add probe_resource()

Message ID 1340736849-14875-3-git-send-email-yinghai@kernel.org
State Rejected
Headers show

Commit Message

Yinghai Lu June 26, 2012, 6:53 p.m. UTC
It is changed from busn_res only version, because Bjorn found that version
was not holding resource_lock.
Even it may be ok for busn_res not holding resource_lock.
It would be better to have it to be generic and use lock and we may
use it for other resources.

probe_resource() will try to find specified size or more in parent bus.
If can not find current parent resource, and it will try to expand parents
top.
If still can not find that specified on top, it will try to reduce target size
until find one.

It will return 0, if it find any resource that could be used.

Returned resource is registered in the tree.
So caller may need to use replace_resource to put real resource in tree.

-v3: remove two parameters that is for debug purpose.
-v4: fix stop_flags checking.
-v5: adjust stop_flags checking position to avoid not needed calling
	into allocate_resource().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/ioport.h |    7 ++
 kernel/resource.c      |  149 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 156 insertions(+), 0 deletions(-)

Comments

Yinghai Lu June 26, 2012, 10:07 p.m. UTC | #1
On Tue, Jun 26, 2012 at 11:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> It is changed from busn_res only version, because Bjorn found that version
> was not holding resource_lock.
> Even it may be ok for busn_res not holding resource_lock.
> It would be better to have it to be generic and use lock and we may
> use it for other resources.
>
> probe_resource() will try to find specified size or more in parent bus.
> If can not find current parent resource, and it will try to expand parents
> top.
> If still can not find that specified on top, it will try to reduce target size
> until find one.
>
> It will return 0, if it find any resource that could be used.
>
> Returned resource is registered in the tree.
> So caller may need to use replace_resource to put real resource in tree.
>
> -v3: remove two parameters that is for debug purpose.
> -v4: fix stop_flags checking.
> -v5: adjust stop_flags checking position to avoid not needed calling
>        into allocate_resource().

please check attached one that is updated after first patch with
__allocate_resource changes.
except this one and first one are changed. left ones are not needed to
be updated.
So i'm not going to resend them.

the git branch does get updated.

Thanks

Yinghai
Yinghai Lu Aug. 28, 2012, 4:09 p.m. UTC | #2
On Tue, Jun 26, 2012 at 3:07 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Jun 26, 2012 at 11:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> It is changed from busn_res only version, because Bjorn found that version
>> was not holding resource_lock.
>> Even it may be ok for busn_res not holding resource_lock.
>> It would be better to have it to be generic and use lock and we may
>> use it for other resources.
>>
>> probe_resource() will try to find specified size or more in parent bus.
>> If can not find current parent resource, and it will try to expand parents
>> top.
>> If still can not find that specified on top, it will try to reduce target size
>> until find one.
>>
>> It will return 0, if it find any resource that could be used.
>>
>> Returned resource is registered in the tree.
>> So caller may need to use replace_resource to put real resource in tree.
>>
>> -v3: remove two parameters that is for debug purpose.
>> -v4: fix stop_flags checking.
>> -v5: adjust stop_flags checking position to avoid not needed calling
>>        into allocate_resource().
>
> please check attached one that is updated after first patch with
> __allocate_resource changes.
> except this one and first one are changed. left ones are not needed to
> be updated.
> So i'm not going to resend them.

please check update one. -v7

Thanks

Yinghai
Linus Torvalds Aug. 28, 2012, 5:05 p.m. UTC | #3
On Tue, Aug 28, 2012 at 9:09 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> please check update one. -v7

Ugh. Ok, looking closer at this, I think I agree with Bjorn. This is too ugly.

The whole "reduce size/needed_size by 1" double loop is O(n**2). And
it does ugly "insert fake resource to check, and then remove it". Ugh.

Maybe this works in practice for the special case of some PCI bus
numbers that are small integers, but the code is nasty nasty horrible
and not a generic resource allocation thing.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Aug. 29, 2012, 12:10 a.m. UTC | #4
On Tue, Aug 28, 2012 at 10:05 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Ugh. Ok, looking closer at this,

Btw, looking at that code, I also found what looks like a potential
locking bug in allocate_resource().

The code does

    if (new->parent)
       .. reallocate ..

to check whether a resource was already allocated. HOWEVER, it does so
without actually holding the resource lock. Which means that
"new->parent" might in theory change.

I don't really know if we care. Anybody who does a
"allocate_resource()" on an existing resource that might already be in
the resource tree hopefully does *not* do that in parallel with
another user trying to change the resource allocation, so maybe the
right thing to do is to just say "whatever, if there is a race with
two threads reallocating the same resource at the same time, the bug
is a much more serious one at a higher level".

So this may well be a non-issue. It was introduced some time ago, in
commit 23c570a67448e ("resource: ability to resize an allocated
resource"), since before that we never even allowed re-allocation of
an already allocated resource in the first place, and everything
happened under the lock.

I dunno. Here's a (UNTESTED!) patch that should fix it. Because it's
extremely doubtful whether this is a real problem, I'm certainly not
going to commit it now, much less mark it for stable. But I'm throwing
it out as an RFC. Technically, if people rely on any races being
handled by the resource subsystem, this *could* trigger. But I'm
hoping that the PCI layer has some higher-level locking for
"reallocate the resources of this PCI device". I did *not* check the
callers.

Btw, reallocate_resource() isn't actually used anywhere in the tree
that I can see, so maybe we should remove it and the export, and just
have the __reallocate_resource() that is static to resource.c and is
to be called only with the lock held.

                         Linus
Yinghai Lu Aug. 29, 2012, 3:57 p.m. UTC | #5
On Tue, Aug 28, 2012 at 5:10 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Aug 28, 2012 at 10:05 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Ugh. Ok, looking closer at this,
>
> Btw, looking at that code, I also found what looks like a potential
> locking bug in allocate_resource().
>
> The code does
>
>     if (new->parent)
>        .. reallocate ..
>
> to check whether a resource was already allocated. HOWEVER, it does so
> without actually holding the resource lock. Which means that
> "new->parent" might in theory change.
>
> I don't really know if we care. Anybody who does a
> "allocate_resource()" on an existing resource that might already be in
> the resource tree hopefully does *not* do that in parallel with
> another user trying to change the resource allocation, so maybe the
> right thing to do is to just say "whatever, if there is a race with
> two threads reallocating the same resource at the same time, the bug
> is a much more serious one at a higher level".

yes, another patch that that split __allocate_resource out have the similar fix.

http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=5d52b21303b7271ba4c5302b387766628f074ae2

but the changelog does not that mention the reason.

also have another version for probe_resource, please check attached version -v8.

Thanks

Yinghai
Yinghai Lu Aug. 29, 2012, 5:36 p.m. UTC | #6
On Wed, Aug 29, 2012 at 8:57 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> also have another version for probe_resource, please check attached version -v8.
>

sorry, v8 forget removing two lines.

please -v9 instead.

-v8: Linus said: allocation/return is not right, and -1 step tricks make it
	not work as generic resource probe.
     So try to remove the needed_size tricks, and also use __adjust_resource
	for probing instead.
-v9: remove two lines that is supposed to be removed after converting to use
	__adjust_resource

Thanks

Yinghai
Bjorn Helgaas Aug. 31, 2012, 12:40 a.m. UTC | #7
On Wed, Aug 29, 2012 at 10:36 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Aug 29, 2012 at 8:57 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> also have another version for probe_resource, please check attached version -v8.
>>
>
> sorry, v8 forget removing two lines.
>
> please -v9 instead.
>
> -v8: Linus said: allocation/return is not right, and -1 step tricks make it
>         not work as generic resource probe.
>      So try to remove the needed_size tricks, and also use __adjust_resource
>         for probing instead.
> -v9: remove two lines that is supposed to be removed after converting to use
>         __adjust_resource

These tweaks might be slight improvements, but they completely miss
the point of my objection.  I just don't think the probe_resource()
interface is a reasonable addition to kernel/resource.c.  I think it's
too hard to describe what it does, and it seems like it's too specific
to what PCI needs in this particular case.  We should be able to look
at the prototype and get a pretty good idea of what the function does,
but I can't do that with this:

+int probe_resource(struct resource *b_res,
+			 struct resource *busn_res,
+			 resource_size_t needed_size, struct resource **p,
+			 int skip_nr, int limit, int stop_flags)

We already have adjust_resource(), which grows or shrinks a resource
while maintaining the invariants that the adjusted resource (1)
doesn't overlap any of its siblings and (2) still contains all its
children.

adjust_resource() seems like a fairly generic, generally useful
interface.  What you're trying to do with probe_resource() is quite
similar, except that probe_resource() adds the idea of walking up the
tree.

I think you should consider something like an "expand_resource()" that
just balloons a resource at both ends until it abuts its siblings,
i.e., it grows the resource as much as possible.  Then you know the
largest possible size, and you can use adjust_resource() to shrink it
again if you don't need that much.  You can walk up the tree in the
caller when you need to.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 589e0e7..8292e8b 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -156,6 +156,13 @@  extern int allocate_resource(struct resource *root, struct resource *new,
 						       resource_size_t,
 						       resource_size_t),
 			     void *alignf_data);
+void resource_shrink_parents_top(struct resource *b_res,
+				 long size, struct resource *parent_res);
+struct device;
+int probe_resource(struct resource *b_res,
+			struct resource *busn_res,
+			resource_size_t needed_size, struct resource **p,
+			int skip_nr, int limit, int flags);
 struct resource *lookup_resource(struct resource *root, resource_size_t start);
 int adjust_resource(struct resource *res, resource_size_t start,
 		    resource_size_t size);
diff --git a/kernel/resource.c b/kernel/resource.c
index 3f6e522..d5d9aef 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -980,6 +980,155 @@  void __release_region(struct resource *parent, resource_size_t start,
 }
 EXPORT_SYMBOL(__release_region);
 
+static void __resource_extend_parents_top(struct resource *b_res,
+		 long size, struct resource *parent_res)
+{
+	struct resource *res = b_res;
+
+	if (!size)
+		return;
+
+	while (res && res != parent_res) {
+		res->end += size;
+		res = res->parent;
+	}
+}
+
+void resource_shrink_parents_top(struct resource *b_res,
+		 long size, struct resource *parent_res)
+{
+	write_lock(&resource_lock);
+	__resource_extend_parents_top(b_res, -size, parent_res);
+	write_unlock(&resource_lock);
+}
+
+static resource_size_t __find_res_top_free_size(struct resource *res,
+						 int skip_nr)
+{
+	resource_size_t n_size;
+	struct resource tmp_res;
+
+	/*
+	 *   find out free number below res->end that we can use.
+	 *	res->start to res->start + skip_nr - 1 can not be used.
+	 */
+	n_size = resource_size(res);
+	if (n_size <= skip_nr)
+		return 0;
+
+	n_size -= skip_nr;
+	memset(&tmp_res, 0, sizeof(struct resource));
+	while (n_size > 0) {
+		int ret;
+
+		ret = __allocate_resource(res, &tmp_res, n_size,
+			res->end - n_size + skip_nr, res->end,
+			1, NULL, NULL, false, false);
+		if (ret == 0) {
+			__release_resource(&tmp_res);
+			break;
+		}
+		n_size--;
+	}
+
+	return n_size;
+}
+
+/**
+ * probe_resource - Probe resource in parent resource.
+ * @b_res: parent resource descriptor
+ * @busn_res: return probed resource
+ * @needed_size: target size
+ * @p: pointer to farest parent that we extend the top
+ * @skip_nr: number in b_res start that we need to skip.
+ * @limit: local boundary
+ * @stop_flags: flags for stopping extend parent res
+ *
+ * will try to allocate resource in b_res, if can not find the range
+ *  will try to extend parent resources' top.
+ * if still can not make it, will reduce needed_size.
+ */
+int probe_resource(struct resource *b_res,
+			 struct resource *busn_res,
+			 resource_size_t needed_size, struct resource **p,
+			 int skip_nr, int limit, int stop_flags)
+{
+	int ret = -ENOMEM;
+	resource_size_t n_size;
+	struct resource *parent_res = NULL;
+	resource_size_t tmp = b_res->end + 1;
+
+again:
+	/*
+	 * We first try to allocate biggest range in b_res that
+	 *  we can use in b_res directly.
+	 *  we also need to skip skip_nr from start of b_res.
+	 */
+	n_size = resource_size(b_res);
+	if (n_size > skip_nr)
+		n_size -= skip_nr;
+	else
+		n_size = 0;
+	memset(busn_res, 0, sizeof(struct resource));
+	while (n_size >= needed_size) {
+		ret = allocate_resource(b_res, busn_res, n_size,
+				b_res->start + skip_nr, b_res->end,
+				1, NULL, NULL);
+		if (!ret)
+			return ret;
+		n_size--;
+	}
+
+	/* Try to extend the top of parent resources to meet needed_size */
+	write_lock(&resource_lock);
+
+	/* b_res could be root bus resource and can not be extended */
+	if (b_res->flags & stop_flags)
+		goto reduce_needed_size;
+
+	/* find out free range under top at first */
+	n_size = __find_res_top_free_size(b_res, skip_nr);
+	/* can not extend cross local boundary */
+	if ((limit - b_res->end) < (needed_size - n_size))
+		goto reduce_needed_size;
+
+	/* Probe extended range above top */
+	memset(busn_res, 0, sizeof(struct resource));
+	parent_res = b_res->parent;
+	while (parent_res && !(parent_res->flags & stop_flags)) {
+		ret = __allocate_resource(parent_res, busn_res,
+			 needed_size - n_size,
+			 tmp, tmp + needed_size - n_size - 1,
+			 1, NULL, NULL, false, false);
+		if (!ret) {
+			/* save parent_res, we need it as stopper later */
+			*p = parent_res;
+
+			/* prepare busn_res for return */
+			__release_resource(busn_res);
+			busn_res->start -= n_size;
+
+			/* extend parent resources top */
+			__resource_extend_parents_top(b_res,
+					 needed_size - n_size, parent_res);
+			__request_resource(b_res, busn_res);
+
+			write_unlock(&resource_lock);
+			return ret;
+		}
+		parent_res = parent_res->parent;
+	}
+
+reduce_needed_size:
+	write_unlock(&resource_lock);
+	/* ret must not be 0 here */
+	needed_size--;
+	if (needed_size)
+		goto again;
+
+	return ret;
+}
+
 /*
  * Managed region resource
  */