Patchwork [02/11] PCI: Try to allocate mem64 above 4G at first

login
register
mail settings
Submitter Bjorn Helgaas
Date May 25, 2012, 7:37 p.m.
Message ID <20120525193716.GA8817@google.com>
Download mbox | patch
Permalink /patch/161403/
State Not Applicable
Headers show

Comments

Bjorn Helgaas - May 25, 2012, 7:37 p.m.
On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
> >> overflowing to zero -- that means the reader has to know what the
> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
> >> ways if we changed it.
> >>
> 
> please check if attached one is more clear.
> 
> make max and bottom is only related to _MEM and not default one.
> 
> -       if (!(res->flags & IORESOURCE_MEM_64))
> -               max = PCIBIOS_MAX_MEM_32;
> +       if (res->flags & IORESOURCE_MEM) {
> +               if (!(res->flags & IORESOURCE_MEM_64))
> +                       max = PCIBIOS_MAX_MEM_32;
> +               else if (PCIBIOS_MAX_MEM_32 != -1)
> +                       bottom = (resource_size_t)(1ULL<<32);
> +       }
> 
> will still not affect to other arches.

That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
that there's risk in doing this, but if it's a good idea for x86_64, it
should also be a good idea for other 64-bit architectures.

And testing for "is this x86_32 without PAE?" with
"PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
important bit of arch-specific behavior.

Tangential question about allocate_resource():  Is its "max" argument
really necessary?  We'll obviously only allocate from inside the root
resource, so "max" is just a way to artificially avoid the end of
that resource.  Is there really a case where that's required?

"min" makes sense because in a case like this, it's valid to allocate from
anywhere in the root resource, but we want to try to allocate from the >4GB
part first, then fall back to allocating from the whole resource.  I'm not
sure there's a corresponding case for "max."

Getting back to this patch, I don't think we should need to adjust "max" at
all.  For example, this:

commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Thu May 24 22:15:26 2012 -0600

    PCI: try to allocate 64-bit mem resources above 4GB
    
    If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
    that fails, we'll fall back to allocating space below 4GB.

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin - May 25, 2012, 8:18 p.m.
On 05/25/2012 12:37 PM, Bjorn Helgaas wrote:
> 
> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
> that there's risk in doing this, but if it's a good idea for x86_64, it
> should also be a good idea for other 64-bit architectures.
> 

And so it is (assuming, of course, that we can address that memory.)

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - May 25, 2012, 8:19 p.m.
On Fri, May 25, 2012 at 12:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
>> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>> >> overflowing to zero -- that means the reader has to know what the
>> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>> >> ways if we changed it.
>> >>
>>
>> please check if attached one is more clear.
>>
>> make max and bottom is only related to _MEM and not default one.
>>
>> -       if (!(res->flags & IORESOURCE_MEM_64))
>> -               max = PCIBIOS_MAX_MEM_32;
>> +       if (res->flags & IORESOURCE_MEM) {
>> +               if (!(res->flags & IORESOURCE_MEM_64))
>> +                       max = PCIBIOS_MAX_MEM_32;
>> +               else if (PCIBIOS_MAX_MEM_32 != -1)
>> +                       bottom = (resource_size_t)(1ULL<<32);
>> +       }
>>
>> will still not affect to other arches.
>
> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
> that there's risk in doing this, but if it's a good idea for x86_64, it
> should also be a good idea for other 64-bit architectures.
>
> And testing for "is this x86_32 without PAE?" with
> "PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
> important bit of arch-specific behavior.
>
> Tangential question about allocate_resource():  Is its "max" argument
> really necessary?  We'll obviously only allocate from inside the root
> resource, so "max" is just a way to artificially avoid the end of
> that resource.  Is there really a case where that's required?
>
> "min" makes sense because in a case like this, it's valid to allocate from
> anywhere in the root resource, but we want to try to allocate from the >4GB
> part first, then fall back to allocating from the whole resource.  I'm not
> sure there's a corresponding case for "max."
>
> Getting back to this patch, I don't think we should need to adjust "max" at
> all.  For example, this:
>
> commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu May 24 22:15:26 2012 -0600
>
>    PCI: try to allocate 64-bit mem resources above 4GB
>
>    If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
>    that fails, we'll fall back to allocating space below 4GB.
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 4ce5ef2..075e5b1 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>  {
>        int i, ret = -ENOMEM;
>        struct resource *r;
> -       resource_size_t max = -1;
> +       resource_size_t start = 0;
> +       resource_size_t end = MAX_RESOURCE;

yeah, MAX_RESOURCE is better than -1.

>
>        type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>
> -       /* don't allocate too high if the pref mem doesn't support 64bit*/
> -       if (!(res->flags & IORESOURCE_MEM_64))
> -               max = PCIBIOS_MAX_MEM_32;

can not remove this one.
otherwise will could allocate above 4g range to non MEM64 resource.


> +       /* If this is a 64-bit mem resource, try above 4GB first */
> +       if (res->flags & IORESOURCE_MEM_64)
> +               start = (resource_size_t) (1ULL << 32);

could affect other arches. let's see if other arches is ok.

please check merged version.

also we have

include/linux/range.h:#define MAX_RESOURCE ((resource_size_t)~0)
arch/x86/kernel/e820.c:#define MAX_RESOURCE_SIZE ((resource_size_t)-1)

we should merge them later?

Thanks

Yinghai
Bjorn Helgaas - May 25, 2012, 9:55 p.m.
On Fri, May 25, 2012 at 2:19 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, May 25, 2012 at 12:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
>>> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>>> >> overflowing to zero -- that means the reader has to know what the
>>> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>>> >> ways if we changed it.
>>> >>
>>>
>>> please check if attached one is more clear.
>>>
>>> make max and bottom is only related to _MEM and not default one.
>>>
>>> -       if (!(res->flags & IORESOURCE_MEM_64))
>>> -               max = PCIBIOS_MAX_MEM_32;
>>> +       if (res->flags & IORESOURCE_MEM) {
>>> +               if (!(res->flags & IORESOURCE_MEM_64))
>>> +                       max = PCIBIOS_MAX_MEM_32;
>>> +               else if (PCIBIOS_MAX_MEM_32 != -1)
>>> +                       bottom = (resource_size_t)(1ULL<<32);
>>> +       }
>>>
>>> will still not affect to other arches.
>>
>> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
>> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
>> that there's risk in doing this, but if it's a good idea for x86_64, it
>> should also be a good idea for other 64-bit architectures.
>>
>> And testing for "is this x86_32 without PAE?" with
>> "PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
>> important bit of arch-specific behavior.
>>
>> Tangential question about allocate_resource():  Is its "max" argument
>> really necessary?  We'll obviously only allocate from inside the root
>> resource, so "max" is just a way to artificially avoid the end of
>> that resource.  Is there really a case where that's required?
>>
>> "min" makes sense because in a case like this, it's valid to allocate from
>> anywhere in the root resource, but we want to try to allocate from the >4GB
>> part first, then fall back to allocating from the whole resource.  I'm not
>> sure there's a corresponding case for "max."
>>
>> Getting back to this patch, I don't think we should need to adjust "max" at
>> all.  For example, this:
>>
>> commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
>> Author: Bjorn Helgaas <bhelgaas@google.com>
>> Date:   Thu May 24 22:15:26 2012 -0600
>>
>>    PCI: try to allocate 64-bit mem resources above 4GB
>>
>>    If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
>>    that fails, we'll fall back to allocating space below 4GB.
>>
>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>> index 4ce5ef2..075e5b1 100644
>> --- a/drivers/pci/bus.c
>> +++ b/drivers/pci/bus.c
>> @@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>>  {
>>        int i, ret = -ENOMEM;
>>        struct resource *r;
>> -       resource_size_t max = -1;
>> +       resource_size_t start = 0;
>> +       resource_size_t end = MAX_RESOURCE;
>
> yeah, MAX_RESOURCE is better than -1.
>
>>
>>        type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>>
>> -       /* don't allocate too high if the pref mem doesn't support 64bit*/
>> -       if (!(res->flags & IORESOURCE_MEM_64))
>> -               max = PCIBIOS_MAX_MEM_32;
>
> can not remove this one.
> otherwise will could allocate above 4g range to non MEM64 resource.

Yeah, I convince myself of the dumbest things sometimes.  It occurred
to me while driving home that we need this, but you beat me to it :)

I think we actually have a separate bug here.  On 64-bit non-x86
architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
attempt to avoid putting a 32-bit BAR above 4G only works on x86,
where PCIBIOS_MAX_MEM_32 is 0xffffffff.

        /* don't allocate too high if the pref mem doesn't support 64bit*/
        if (!(res->flags & IORESOURCE_MEM_64))
                max = PCIBIOS_MAX_MEM_32;

I think we should fix this with a separate patch that removes
PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
0xffffffff (or some other "max 32-bit value" symbol).  I don't think
there's anything arch-specific about this.

So I'd like to see two patches here:
  1) Avoid allocating 64-bit regions for 32-bit BARs
  2) Try to allocate regions above 4GB for 64-bit BARs

> also we have
>
> include/linux/range.h:#define MAX_RESOURCE ((resource_size_t)~0)
> arch/x86/kernel/e820.c:#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
>
> we should merge them later?

I would support that.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin - May 25, 2012, 9:58 p.m.
On 05/25/2012 02:55 PM, Bjorn Helgaas wrote:
> 
> I think we actually have a separate bug here.  On 64-bit non-x86
> architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
> attempt to avoid putting a 32-bit BAR above 4G only works on x86,
> where PCIBIOS_MAX_MEM_32 is 0xffffffff.
> 
>         /* don't allocate too high if the pref mem doesn't support 64bit*/
>         if (!(res->flags & IORESOURCE_MEM_64))
>                 max = PCIBIOS_MAX_MEM_32;
> 
> I think we should fix this with a separate patch that removes
> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
> there's anything arch-specific about this.
> 
> So I'd like to see two patches here:
>   1) Avoid allocating 64-bit regions for 32-bit BARs
>   2) Try to allocate regions above 4GB for 64-bit BARs
> 

Do we also need to track the maximum address available to the CPU?

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - May 25, 2012, 10:14 p.m.
On Fri, May 25, 2012 at 3:58 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/25/2012 02:55 PM, Bjorn Helgaas wrote:
>>
>> I think we actually have a separate bug here.  On 64-bit non-x86
>> architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
>> attempt to avoid putting a 32-bit BAR above 4G only works on x86,
>> where PCIBIOS_MAX_MEM_32 is 0xffffffff.
>>
>>         /* don't allocate too high if the pref mem doesn't support 64bit*/
>>         if (!(res->flags & IORESOURCE_MEM_64))
>>                 max = PCIBIOS_MAX_MEM_32;
>>
>> I think we should fix this with a separate patch that removes
>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>> there's anything arch-specific about this.
>>
>> So I'd like to see two patches here:
>>   1) Avoid allocating 64-bit regions for 32-bit BARs
>>   2) Try to allocate regions above 4GB for 64-bit BARs
>
> Do we also need to track the maximum address available to the CPU?

We are allocating from the resources available on this bus, which
means the host bridge apertures or a subset.  If the arch supplies
those, that should be enough.  If it doesn't, we default to
iomem_resource.  On x86, we trim that down to only the supported
address space:

        iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;

But I'm sure some other architectures don't do that yet.  I guess
that's one of the risks -- if an arch is doesn't tell us the apertures
and doesn't trim iomem_resource, we could allocate a non-addressable
region.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - May 25, 2012, 11:10 p.m.
On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I think we should fix this with a separate patch that removes
> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
> there's anything arch-specific about this.
>
> So I'd like to see two patches here:
>  1) Avoid allocating 64-bit regions for 32-bit BARs
>  2) Try to allocate regions above 4GB for 64-bit BARs

Sure. please check updated two patches.

Thanks

Yinghai
Bjorn Helgaas - May 26, 2012, 12:12 a.m.
On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> I think we should fix this with a separate patch that removes
>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>> there's anything arch-specific about this.
>>
>> So I'd like to see two patches here:
>>  1) Avoid allocating 64-bit regions for 32-bit BARs
>>  2) Try to allocate regions above 4GB for 64-bit BARs
>
> Sure. please check updated two patches.

I think the first one looks good.

I'm curious about the second.  Why did you add the IORESOURCE_MEM
test?  That's doesn't affect the "start =" piece because
IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.

But it does affect the "end =" part.  Previously we limited all I/O
and 32-bit mem BARs to the low 4GB.  This patch makes it so we limit
32-bit mem BARs to the low 4GB, but we don't limit I/O BARs.  But I/O
BARs can only have 32 bits of address, so it seems like we should
limit them the same way as 32-bit mem BARs.  So I expected something
like this:

    if (res->flags & IORESOURCE_MEM_64) {
        start = (resource_size_t) (1ULL << 32);
        end = PCI_MAX_RESOURCE;
    } else {
        start = 0;
        end = PCI_MAX_RESOURCE_32;
    }
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - May 26, 2012, 3:01 p.m.
On Fri, May 25, 2012 at 6:12 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> I think we should fix this with a separate patch that removes
>>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>>> there's anything arch-specific about this.
>>>
>>> So I'd like to see two patches here:
>>>  1) Avoid allocating 64-bit regions for 32-bit BARs
>>>  2) Try to allocate regions above 4GB for 64-bit BARs
>>
>> Sure. please check updated two patches.
>
> I think the first one looks good.
>
> I'm curious about the second.  Why did you add the IORESOURCE_MEM
> test?  That's doesn't affect the "start =" piece because
> IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.
>
> But it does affect the "end =" part.  Previously we limited all I/O
> and 32-bit mem BARs to the low 4GB.  This patch makes it so we limit
> 32-bit mem BARs to the low 4GB, but we don't limit I/O BARs.  But I/O
> BARs can only have 32 bits of address, so it seems like we should
> limit them the same way as 32-bit mem BARs.  So I expected something
> like this:
>
>    if (res->flags & IORESOURCE_MEM_64) {
>        start = (resource_size_t) (1ULL << 32);
>        end = PCI_MAX_RESOURCE;
>    } else {
>        start = 0;
>        end = PCI_MAX_RESOURCE_32;
>    }

Another bug here: we're trying to restrict the *bus* addresses we
allocate, but we're applying the limits to *CPU* addresses.
Therefore, this only works as intended when CPU addresses are the same
as bus addresses, i.e., when the host bridge applies no address
translation.  That happens to be the case for x86, but is not the case
in general.

I think we need a third patch to fix this problem.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - May 29, 2012, 5:55 p.m.
On Fri, May 25, 2012 at 5:12 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> I think we should fix this with a separate patch that removes
>>> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
>>> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
>>> there's anything arch-specific about this.
>>>
>>> So I'd like to see two patches here:
>>>  1) Avoid allocating 64-bit regions for 32-bit BARs
>>>  2) Try to allocate regions above 4GB for 64-bit BARs
>>
>> Sure. please check updated two patches.
>
> I think the first one looks good.
>
> I'm curious about the second.  Why did you add the IORESOURCE_MEM
> test?  That's doesn't affect the "start =" piece because
> IORESOURCE_MEM is always set if IORESOURCE_MEM_64 is set.
>
> But it does affect the "end =" part.  Previously we limited all I/O
> and 32-bit mem BARs to the low 4GB.  This patch makes it so we limit
> 32-bit mem BARs to the low 4GB, but we don't limit I/O BARs.  But I/O
> BARs can only have 32 bits of address, so it seems like we should
> limit them the same way as 32-bit mem BARs.  So I expected something
> like this:
>
>    if (res->flags & IORESOURCE_MEM_64) {
>        start = (resource_size_t) (1ULL << 32);
>        end = PCI_MAX_RESOURCE;
>    } else {
>        start = 0;
>        end = PCI_MAX_RESOURCE_32;
>    }

x86 are using 16bits.

some others use 32 bits.
#define IO_SPACE_LIMIT 0xffffffff

ia64 and sparc are using 64bits.
#define IO_SPACE_LIMIT               0xffffffffffffffffUL

but pci only support 16bits and 32bits.

maybe later we can add
PCI_MAX_RESOURCE_16

to handle 16bits and 32bit io ports.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - May 29, 2012, 5:56 p.m.
On Sat, May 26, 2012 at 8:01 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Another bug here: we're trying to restrict the *bus* addresses we
> allocate, but we're applying the limits to *CPU* addresses.
> Therefore, this only works as intended when CPU addresses are the same
> as bus addresses, i.e., when the host bridge applies no address
> translation.  That happens to be the case for x86, but is not the case
> in general.
>
> I think we need a third patch to fix this problem.

yes.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin - May 29, 2012, 5:57 p.m.
On 05/29/2012 10:55 AM, Yinghai Lu wrote:
> 
> x86 are using 16bits.
> 
> some others use 32 bits.
> #define IO_SPACE_LIMIT 0xffffffff
> 
> ia64 and sparc are using 64bits.
> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
> 
> but pci only support 16bits and 32bits.
> 
> maybe later we can add
> PCI_MAX_RESOURCE_16
> 
> to handle 16bits and 32bit io ports.
> 

Shouldn't this be dealt by root port apertures?

	-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - May 29, 2012, 6:17 p.m.
On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>
>> x86 are using 16bits.
>>
>> some others use 32 bits.
>> #define IO_SPACE_LIMIT 0xffffffff
>>
>> ia64 and sparc are using 64bits.
>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>
>> but pci only support 16bits and 32bits.
>>
>> maybe later we can add
>> PCI_MAX_RESOURCE_16
>>
>> to handle 16bits and 32bit io ports.
>>
>
> Shouldn't this be dealt by root port apertures?
>

pci bridge could support 16bits and 32bits io port.
but we did not record if 32bits is supported.

so during allocating, could have allocated above 64k address to non
32bit bridge.

but  x86 is ok, because ioport.end always set to 0xffff.
other arches with IO_SPACE_LIMIT with 0xffffffff or
0xffffffffffffffffUL may have problem.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin - May 29, 2012, 7:03 p.m.
On 05/29/2012 11:17 AM, Yinghai Lu wrote:
> 
> pci bridge could support 16bits and 32bits io port.
> but we did not record if 32bits is supported.
> 

Okay, so this is the actual problem, no?

> so during allocating, could have allocated above 64k address to non
> 32bit bridge.
> 
> but  x86 is ok, because ioport.end always set to 0xffff.
> other arches with IO_SPACE_LIMIT with 0xffffffff or
> 0xffffffffffffffffUL may have problem.

The latter is nonsense, the PCI-side address space is only 32 bits wide.

	-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - May 29, 2012, 7:23 p.m.
On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>
>>> x86 are using 16bits.
>>>
>>> some others use 32 bits.
>>> #define IO_SPACE_LIMIT 0xffffffff
>>>
>>> ia64 and sparc are using 64bits.
>>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>>
>>> but pci only support 16bits and 32bits.
>>>
>>> maybe later we can add
>>> PCI_MAX_RESOURCE_16
>>>
>>> to handle 16bits and 32bit io ports.
>>>
>>
>> Shouldn't this be dealt by root port apertures?
>>
>
> pci bridge could support 16bits and 32bits io port.
> but we did not record if 32bits is supported.
>
> so during allocating, could have allocated above 64k address to non
> 32bit bridge.
>
> but  x86 is ok, because ioport.end always set to 0xffff.
> other arches with IO_SPACE_LIMIT with 0xffffffff or
> 0xffffffffffffffffUL may have problem.

I think current IO_SPACE_LIMIT usage is a little confused.  The
"ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
a CPU-side address, not a bus address.  Other uses, e.g., in
__pci_read_base(), apply it to bus addresses from BARs, which is
wrong.  Host bridges apply I/O port offsets just like they apply
memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
there's no restriction on CPU-side I/O port addresses, but any given
host bridge will translate its I/O port aperture to bus addresses that
fit in 32 bits.

None of this is really relevant to the question I asked, namely, "why
Yinghai's patch doesn't limit I/O BAR values to 32 bits?"  That
constraint is clearly a requirement because I/O BARs are only 32 bits
wide, but I don't think it needs to be enforced in the code here.  The
host bridge or upstream P2P bridge apertures should already take care
of that automatically.  I don't think the 16- or 32-bitness of P2P
bridge apertures is relevant here, because the I/O resources available
on the secondary bus already reflect that.

After all that discussion, I think my objection here boils down to
"you shouldn't change the I/O BAR constraints in a patch that claims
to allocate 64-bit *memory* BARs above 4GB."

I think the code below is still the clearest way to set the constraints:

   if (res->flags & IORESOURCE_MEM_64) {
       start = (resource_size_t) (1ULL << 32);
       end = PCI_MAX_RESOURCE;
   } else {
       start = 0;
       end = PCI_MAX_RESOURCE_32;
   }

It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
because host bridge apertures should already enforce that, but I think
the code above just makes it clearer.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - May 29, 2012, 8:40 p.m.
On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>
>>>> x86 are using 16bits.
>>>>
>>>> some others use 32 bits.
>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>
>>>> ia64 and sparc are using 64bits.
>>>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>>>
>>>> but pci only support 16bits and 32bits.
>>>>
>>>> maybe later we can add
>>>> PCI_MAX_RESOURCE_16
>>>>
>>>> to handle 16bits and 32bit io ports.
>>>>
>>>
>>> Shouldn't this be dealt by root port apertures?
>>>
>>
>> pci bridge could support 16bits and 32bits io port.
>> but we did not record if 32bits is supported.
>>
>> so during allocating, could have allocated above 64k address to non
>> 32bit bridge.
>>
>> but  x86 is ok, because ioport.end always set to 0xffff.
>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>> 0xffffffffffffffffUL may have problem.
>
> I think current IO_SPACE_LIMIT usage is a little confused.  The
> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
> a CPU-side address, not a bus address.  Other uses, e.g., in
> __pci_read_base(), apply it to bus addresses from BARs, which is
> wrong.  Host bridges apply I/O port offsets just like they apply
> memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
> there's no restriction on CPU-side I/O port addresses, but any given
> host bridge will translate its I/O port aperture to bus addresses that
> fit in 32 bits.
>
> None of this is really relevant to the question I asked, namely, "why
> Yinghai's patch doesn't limit I/O BAR values to 32 bits?"  That
> constraint is clearly a requirement because I/O BARs are only 32 bits
> wide, but I don't think it needs to be enforced in the code here.  The
> host bridge or upstream P2P bridge apertures should already take care
> of that automatically.  I don't think the 16- or 32-bitness of P2P
> bridge apertures is relevant here, because the I/O resources available
> on the secondary bus already reflect that.
>
> After all that discussion, I think my objection here boils down to
> "you shouldn't change the I/O BAR constraints in a patch that claims
> to allocate 64-bit *memory* BARs above 4GB."
>
> I think the code below is still the clearest way to set the constraints:
>
>   if (res->flags & IORESOURCE_MEM_64) {
>       start = (resource_size_t) (1ULL << 32);
>       end = PCI_MAX_RESOURCE;
>   } else {
>       start = 0;
>       end = PCI_MAX_RESOURCE_32;
>   }
>
> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
> because host bridge apertures should already enforce that, but I think
> the code above just makes it clearer.


ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.

also RFC to limit for 16 bit ioport handling.  only help other arches
that does support 32bit ioports but have bridges only support 16bit io
ports.

Thanks

Yinghai
Yinghai Lu - May 29, 2012, 8:46 p.m.
On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>
>> pci bridge could support 16bits and 32bits io port.
>> but we did not record if 32bits is supported.
>>
>
> Okay, so this is the actual problem, no?

their fw could not need kernel help to allocate io ports, or they are
only use device that support 32bit ioport.

>
>> so during allocating, could have allocated above 64k address to non
>> 32bit bridge.
>>
>> but  x86 is ok, because ioport.end always set to 0xffff.
>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>> 0xffffffffffffffffUL may have problem.
>
> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>
maybe they have unified io include ioport and mem io?

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin - May 29, 2012, 8:50 p.m.
On 05/29/2012 01:46 PM, Yinghai Lu wrote:
> On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>
>> Okay, so this is the actual problem, no?
> 
> their fw could not need kernel help to allocate io ports, or they are
> only use device that support 32bit ioport.
> 

That barely parses, never mind makes sense.

>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but  x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>>
> maybe they have unified io include ioport and mem io?
> 

The bus-side address space should not be more than 32 bits no matter
what.  As Bjorn indicates, you seem to be mixing up bus and cpu
addresses all over the place.

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - May 29, 2012, 8:53 p.m.
From: Yinghai Lu <yinghai@kernel.org>
Date: Tue, 29 May 2012 13:46:03 -0700

> On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but  x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>>
> maybe they have unified io include ioport and mem io?

The resource values stored are the actual physical I/O addresses on
some platforms.  Sparc, for example, falls into this category.

Therefore ~0UL is the only feasible limit for them.

We have to distinguish explicitly when we are operating upon actual
PCI bus view addresses vs. the values that end up in the resources.
They are entirely different, and have completely different address
ranges.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - May 29, 2012, 11:24 p.m.
On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>>
>>>>> x86 are using 16bits.
>>>>>
>>>>> some others use 32 bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>>
>>>>> ia64 and sparc are using 64bits.
>>>>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>>>>
>>>>> but pci only support 16bits and 32bits.
>>>>>
>>>>> maybe later we can add
>>>>> PCI_MAX_RESOURCE_16
>>>>>
>>>>> to handle 16bits and 32bit io ports.
>>>>>
>>>>
>>>> Shouldn't this be dealt by root port apertures?
>>>>
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but  x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> I think current IO_SPACE_LIMIT usage is a little confused.  The
>> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
>> a CPU-side address, not a bus address.  Other uses, e.g., in
>> __pci_read_base(), apply it to bus addresses from BARs, which is
>> wrong.  Host bridges apply I/O port offsets just like they apply
>> memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
>> there's no restriction on CPU-side I/O port addresses, but any given
>> host bridge will translate its I/O port aperture to bus addresses that
>> fit in 32 bits.
>>
>> None of this is really relevant to the question I asked, namely, "why
>> Yinghai's patch doesn't limit I/O BAR values to 32 bits?"  That
>> constraint is clearly a requirement because I/O BARs are only 32 bits
>> wide, but I don't think it needs to be enforced in the code here.  The
>> host bridge or upstream P2P bridge apertures should already take care
>> of that automatically.  I don't think the 16- or 32-bitness of P2P
>> bridge apertures is relevant here, because the I/O resources available
>> on the secondary bus already reflect that.
>>
>> After all that discussion, I think my objection here boils down to
>> "you shouldn't change the I/O BAR constraints in a patch that claims
>> to allocate 64-bit *memory* BARs above 4GB."
>>
>> I think the code below is still the clearest way to set the constraints:
>>
>>   if (res->flags & IORESOURCE_MEM_64) {
>>       start = (resource_size_t) (1ULL << 32);
>>       end = PCI_MAX_RESOURCE;
>>   } else {
>>       start = 0;
>>       end = PCI_MAX_RESOURCE_32;
>>   }
>>
>> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
>> because host bridge apertures should already enforce that, but I think
>> the code above just makes it clearer.
>
>
> ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.

I like the fact that this patch no longer changes anything for I/O
resources.  I assume this is part of fixing some bug (Steven's?)  I'd
like to have a pointer in the changelog to a bugzilla or discussion
about the bug.

The effect of this patch is similar to what I did earlier with
b126b4703afa4 and e7f8567db9a7 (allocate space from top down), though
this one is more limited and it won't change quite as much.  We ran
into problems (BIOS defects, I think) and had to revert my patches, so
it's quite possible that we'll run into similar problems here.

I'm a little nervous because this is a fundamental area that explores
new areas of the address space minefield.  I think we're generally
safer if we  follow a path similar to where Windows has been.  I think
Windows also prefers space above 4GB for 64-bit BARs, but I suspect
that's just a natural consequence of allocating from the top down.  So
we'll place things just above 4GB, and Windows will place things as
high as possible.

I don't know the best solution here.  This patch ("bottom-up above
4GB") is one possibility.  Another is to allocate only 64-bit BARs
top-down.  Or maybe allocate everything top-down on machines newer
than some date.  They all seem ugly.  What makes me uneasy is that
your patch strikes out on a new path that is different from what we've
done before *and* different from what Windows does.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - May 29, 2012, 11:27 p.m.
On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>>>>>
>>>>> x86 are using 16bits.
>>>>>
>>>>> some others use 32 bits.
>>>>> #define IO_SPACE_LIMIT 0xffffffff
>>>>>
>>>>> ia64 and sparc are using 64bits.
>>>>> #define IO_SPACE_LIMIT               0xffffffffffffffffUL
>>>>>
>>>>> but pci only support 16bits and 32bits.
>>>>>
>>>>> maybe later we can add
>>>>> PCI_MAX_RESOURCE_16
>>>>>
>>>>> to handle 16bits and 32bit io ports.
>>>>>
>>>>
>>>> Shouldn't this be dealt by root port apertures?
>>>>
>>>
>>> pci bridge could support 16bits and 32bits io port.
>>> but we did not record if 32bits is supported.
>>>
>>> so during allocating, could have allocated above 64k address to non
>>> 32bit bridge.
>>>
>>> but  x86 is ok, because ioport.end always set to 0xffff.
>>> other arches with IO_SPACE_LIMIT with 0xffffffff or
>>> 0xffffffffffffffffUL may have problem.
>>
>> I think current IO_SPACE_LIMIT usage is a little confused.  The
>> "ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
>> a CPU-side address, not a bus address.  Other uses, e.g., in
>> __pci_read_base(), apply it to bus addresses from BARs, which is
>> wrong.  Host bridges apply I/O port offsets just like they apply
>> memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
>> there's no restriction on CPU-side I/O port addresses, but any given
>> host bridge will translate its I/O port aperture to bus addresses that
>> fit in 32 bits.
>>
>> None of this is really relevant to the question I asked, namely, "why
>> Yinghai's patch doesn't limit I/O BAR values to 32 bits?"  That
>> constraint is clearly a requirement because I/O BARs are only 32 bits
>> wide, but I don't think it needs to be enforced in the code here.  The
>> host bridge or upstream P2P bridge apertures should already take care
>> of that automatically.  I don't think the 16- or 32-bitness of P2P
>> bridge apertures is relevant here, because the I/O resources available
>> on the secondary bus already reflect that.
>>
>> After all that discussion, I think my objection here boils down to
>> "you shouldn't change the I/O BAR constraints in a patch that claims
>> to allocate 64-bit *memory* BARs above 4GB."
>>
>> I think the code below is still the clearest way to set the constraints:
>>
>>   if (res->flags & IORESOURCE_MEM_64) {
>>       start = (resource_size_t) (1ULL << 32);
>>       end = PCI_MAX_RESOURCE;
>>   } else {
>>       start = 0;
>>       end = PCI_MAX_RESOURCE_32;
>>   }
>>
>> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
>> because host bridge apertures should already enforce that, but I think
>> the code above just makes it clearer.
>
>
> ok, please check the version, that put back PCI_MAX_RESOURCE_32 for io ports.
>
> also RFC to limit for 16 bit ioport handling.  only help other arches
> that does support 32bit ioports but have bridges only support 16bit io
> ports.

I don't understand this one at all.  It looks like you mashed together
at least two changes: (1) prefer I/O space above 64K if available, and
(2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
that to limit allocations.

I don't see the justification for (1).  What problem does this solve?

I don't see the need for (2).  Even without the start/end constraints
in pci_bus_alloc_resource(), we only allocate from the resources
available on the bus.  Apparently you think this list might be
incorrect, i.e., it might include I/O space above 64K when it
shouldn't.  I don't think that's the case, but if it is, we should fix
the list of bus resources, not add a hack to avoid parts of it.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - May 29, 2012, 11:33 p.m.
On Tue, May 29, 2012 at 4:27 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I don't understand this one at all.  It looks like you mashed together
> at least two changes: (1) prefer I/O space above 64K if available, and
> (2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
> bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
> that to limit allocations.

let drop the one about IORESOURCE_IO_32.

that should only fix one reallocation bug in theory out of x86.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - May 29, 2012, 11:47 p.m.
On Tue, May 29, 2012 at 5:33 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 4:27 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> I don't understand this one at all.  It looks like you mashed together
>> at least two changes: (1) prefer I/O space above 64K if available, and
>> (2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
>> bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
>> that to limit allocations.
>
> let drop the one about IORESOURCE_IO_32.
>
> that should only fix one reallocation bug in theory out of x86.

If there's a bug, we should try to fix it.  I just don't understand
what the bug *is*.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Newbury - May 30, 2012, 7:40 a.m.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 30/05/12 00:27, Bjorn Helgaas wrote:
> On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org>
> wrote:
>> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas
>> <bhelgaas@google.com> wrote:
>>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu
>>> <yinghai@kernel.org> wrote:
>>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin
>>>> <hpa@zytor.com> wrote:
>>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:

*SNIP*

>> 
>> 
>> ok, please check the version, that put back PCI_MAX_RESOURCE_32
>> for io ports.
>> 
>> also RFC to limit for 16 bit ioport handling.  only help other
>> arches that does support 32bit ioports but have bridges only
>> support 16bit io ports.
> 
> I don't understand this one at all.  It looks like you mashed
> together at least two changes: (1) prefer I/O space above 64K if
> available, and (2) mark secondary bus resources with
> IORESOURCE_IO_32 when the P2P bridge I/O window address decode type
> is PCI_IO_RANGE_TYPE_32 and use that to limit allocations.
> 
> I don't see the justification for (1).  What problem does this
> solve?
> 
I can potentially see this helping with hotplug, where a new device
has only 16 bit io ports, but if there's no space remaining under 64k
allocation would fail.  Preferring above 64k means preserving that
limited resource.  This is exactly equivalent to my original
motivation for preferring 64 bit PCI mem in order to preserve 32 bit
address space for hotplug devices without 64 bit BARs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk/FzvEACgkQGcb56gMuC61lYwCfchbsyzN5KLCWTuyQiMcJR0DH
l4gAoJAY1D0HN6m5JnQLu705hvm85p5a
=whd3
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - May 30, 2012, 4:27 p.m.
On Wed, May 30, 2012 at 1:40 AM, Steven Newbury <steve@snewbury.org.uk> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 30/05/12 00:27, Bjorn Helgaas wrote:
>> On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org>
>> wrote:
>>> On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas
>>> <bhelgaas@google.com> wrote:
>>>> On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu
>>>> <yinghai@kernel.org> wrote:
>>>>> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin
>>>>> <hpa@zytor.com> wrote:
>>>>>> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>
> *SNIP*
>
>>>
>>>
>>> ok, please check the version, that put back PCI_MAX_RESOURCE_32
>>> for io ports.
>>>
>>> also RFC to limit for 16 bit ioport handling.  only help other
>>> arches that does support 32bit ioports but have bridges only
>>> support 16bit io ports.
>>
>> I don't understand this one at all.  It looks like you mashed
>> together at least two changes: (1) prefer I/O space above 64K if
>> available, and (2) mark secondary bus resources with
>> IORESOURCE_IO_32 when the P2P bridge I/O window address decode type
>> is PCI_IO_RANGE_TYPE_32 and use that to limit allocations.
>>
>> I don't see the justification for (1).  What problem does this
>> solve?
>>
> I can potentially see this helping with hotplug, where a new device
> has only 16 bit io ports, but if there's no space remaining under 64k
> allocation would fail.  Preferring above 64k means preserving that
> limited resource.  This is exactly equivalent to my original
> motivation for preferring 64 bit PCI mem in order to preserve 32 bit
> address space for hotplug devices without 64 bit BARs.

You're right, the spec does allow the upper 16 bits of I/O BARs to be
hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
some sense.  I don't think it applies to x86, since I don't think
there's a way to generate an I/O access to anything above 64K, but it
could help other arches.

I'm inclined to be conservative and wait until we find a problem where
a patch like this would help.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin - May 30, 2012, 4:30 p.m.
On 05/30/2012 09:27 AM, Bjorn Helgaas wrote:
> 
> You're right, the spec does allow the upper 16 bits of I/O BARs to be
> hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
> some sense.  I don't think it applies to x86, since I don't think
> there's a way to generate an I/O access to anything above 64K, but it
> could help other arches.
> 
> I'm inclined to be conservative and wait until we find a problem where
> a patch like this would help.
> 

The really conservative thing is to just use 16-bit addresses for I/O on
all platforms.  I think a lot of non-x86 platforms does that.

	-hpa
Linus Torvalds - May 30, 2012, 4:33 p.m.
On Wed, May 30, 2012 at 9:27 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> You're right, the spec does allow the upper 16 bits of I/O BARs to be
> hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
> some sense.  I don't think it applies to x86, since I don't think
> there's a way to generate an I/O access to anything above 64K, but it
> could help other arches.
>
> I'm inclined to be conservative and wait until we find a problem where
> a patch like this would help.

I agree. I would be *very* suspicious of "it might help with other
architectures".

Sure, other architectures might be using non-16-bit IO addresses for
their motherboard resources (just because they can). However, any
hotplug device is most likely (by far) to have been tested and
designed for x86, which means that even if it might look like it has
more than 16 bits of IO space, it will never have been tested that
way.

So rather than make it likely to help other architectures, I'd say
that it would make it likely to break things.

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - June 1, 2012, 11:30 p.m.
On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> The bus-side address space should not be more than 32 bits no matter
> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
> addresses all over the place.

please check update patches that is using converted pci bus address
for boundary checking.

Thanks

Yinghai Lu
Bjorn Helgaas - June 4, 2012, 1:05 a.m.
On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> The bus-side address space should not be more than 32 bits no matter
>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>> addresses all over the place.
>
> please check update patches that is using converted pci bus address
> for boundary checking.

What problem does this fix?  There's significant risk that this
allocation change  will make us trip over something, so it must fix
something to make it worth considering.

Steve's problem doesn't count because that's a "pci=nocrs" case that
will always require special handling.  A general solution is not
possible without a BIOS change (to describe >4GB apertures) or a
native host bridge driver (to discover >4GB apertures from the
hardware).  These patches only make Steve's machine work by accident
-- they make us put the video device above 4GB, and we're just lucky
that the host bridge claims that region.

One possibility is some sort of boot-time option to force a PCI device
to a specified address.  That would be useful for debugging as well as
for Steve's machine.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - June 5, 2012, 2:37 a.m.
On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>
>>> The bus-side address space should not be more than 32 bits no matter
>>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>>> addresses all over the place.
>>
>> please check update patches that is using converted pci bus address
>> for boundary checking.
>
> What problem does this fix?  There's significant risk that this
> allocation change  will make us trip over something, so it must fix
> something to make it worth considering.

If we do not enable that, we would not find the problem.

On one my test setup that _CRS does state 64bit resource range,
but when I clear some device resource manually and let kernel allocate
high, just then find out those devices does not work with drivers.
It turns out _CRS have more big range than what the chipset setting states.
with fixing in BIOS, allocate high is working now on that platform.

>
> Steve's problem doesn't count because that's a "pci=nocrs" case that
> will always require special handling.

but pci=nocrs is still supported, even some systems does not work with
pci=use_crs

> A general solution is not
> possible without a BIOS change (to describe >4GB apertures) or a
> native host bridge driver (to discover >4GB apertures from the
> hardware).  These patches only make Steve's machine work by accident
> -- they make us put the video device above 4GB, and we're just lucky
> that the host bridge claims that region.

Some bios looks enabling the non-stated range default to legacy chain.
Some bios does not do that. only stated range count.
So with pci=nocrs we still have some chance to get allocate high working.

>
> One possibility is some sort of boot-time option to force a PCI device
> to a specified address.  That would be useful for debugging as well as
> for Steve's machine.

yeah, how about

pci=alloc_high

and default to disabled ?

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - June 5, 2012, 4:50 a.m.
On Mon, Jun 4, 2012 at 7:37 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>
>>>> The bus-side address space should not be more than 32 bits no matter
>>>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>>>> addresses all over the place.
>>>
>>> please check update patches that is using converted pci bus address
>>> for boundary checking.
>>
>> What problem does this fix?  There's significant risk that this
>> allocation change  will make us trip over something, so it must fix
>> something to make it worth considering.
>
> If we do not enable that, we would not find the problem.

Sorry, that didn't make any sense to me.  I'm hoping you will point us
to a bug report that is fixed by this patch.

> On one my test setup that _CRS does state 64bit resource range,
> but when I clear some device resource manually and let kernel allocate
> high, just then find out those devices does not work with drivers.
> It turns out _CRS have more big range than what the chipset setting states.
> with fixing in BIOS, allocate high is working now on that platform.

I didn't understand this either, sorry.  Are you saying that this
patch helps us work around a BIOS defect?

>> Steve's problem doesn't count because that's a "pci=nocrs" case that
>> will always require special handling.
>
> but pci=nocrs is still supported, even some systems does not work with
> pci=use_crs
>
>> A general solution is not
>> possible without a BIOS change (to describe >4GB apertures) or a
>> native host bridge driver (to discover >4GB apertures from the
>> hardware).  These patches only make Steve's machine work by accident
>> -- they make us put the video device above 4GB, and we're just lucky
>> that the host bridge claims that region.
>
> Some bios looks enabling the non-stated range default to legacy chain.
> Some bios does not do that. only stated range count.
> So with pci=nocrs we still have some chance to get allocate high working.

The patch as proposed changes behavior for all systems, whether we're
using _CRS or not (in fact, it even changes the behavior for non-x86
systems).  The only case we know of where it fixes something is
Steve's system, where he already has to use "pci=nocrs" in order for
it to help.  My point is that it would be safer to leave things as
they are for everybody, and merely ask Steve to use "pci=nocrs
pci=alloc_high" or something similar.

>> One possibility is some sort of boot-time option to force a PCI device
>> to a specified address.  That would be useful for debugging as well as
>> for Steve's machine.
>
> yeah, how about
>
> pci=alloc_high
>
> and default to disabled ?

I was actually thinking of something more specific, e.g., a way to
place one device at an exact address.  I've implemented that a couple
times already for testing various things.  But maybe a more general
option like "pci=alloc_high" would make sense, too.

Linux has a long history of allocating bottom-up.  Windows has a long
history of allocating top-down.  You're proposing a third alternative,
allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
this area, I would prefer something that follows Windows because I
think it will be closer to what's been tested by Windows.  Do you
think your alternative is better?
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - June 5, 2012, 5:04 a.m.
On Mon, Jun 4, 2012 at 9:50 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jun 4, 2012 at 7:37 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>>> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>>>>
>>>>> The bus-side address space should not be more than 32 bits no matter
>>>>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>>>>> addresses all over the place.
>>>>
>>>> please check update patches that is using converted pci bus address
>>>> for boundary checking.
>>>
>>> What problem does this fix?  There's significant risk that this
>>> allocation change  will make us trip over something, so it must fix
>>> something to make it worth considering.
>>
>> If we do not enable that, we would not find the problem.
>
> Sorry, that didn't make any sense to me.  I'm hoping you will point us
> to a bug report that is fixed by this patch.

current it only help Steve's test case.

>
>> On one my test setup that _CRS does state 64bit resource range,
>> but when I clear some device resource manually and let kernel allocate
>> high, just then find out those devices does not work with drivers.
>> It turns out _CRS have more big range than what the chipset setting states.
>> with fixing in BIOS, allocate high is working now on that platform.
>
> I didn't understand this either, sorry.  Are you saying that this
> patch helps us work around a BIOS defect?

Help us find out one BIOS defect.

>
>> yeah, how about
>>
>> pci=alloc_high
>>
>> and default to disabled ?
>
> I was actually thinking of something more specific, e.g., a way to
> place one device at an exact address.  I've implemented that a couple
> times already for testing various things.  But maybe a more general
> option like "pci=alloc_high" would make sense, too.

yeah.

>
....
> Linux has a long history of allocating bottom-up.  Windows has a long
> history of allocating top-down.  You're proposing a third alternative,
> allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
> this area, I would prefer something that follows Windows because I
> think it will be closer to what's been tested by Windows.  Do you
> think your alternative is better?

hope we can figure out how windows is making it work.

Steve, Can you check if Windows is working with your test case ?

If it works, we may try do the same thing from Linux, so you will not need to
append "pci=nocrs pci=alloc_high"...

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Newbury - June 6, 2012, 9:44 a.m.
On Tue,   5 Jun 2012, 06:04:57 BST, Yinghai Lu <yinghai@kernel.org> wrote:
> > Linux has a long history of allocating bottom-up.  Windows has a long
> > history of allocating top-down.  You're proposing a third alternative,
> > allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
> > this area, I would prefer something that follows Windows because I
> > think it will be closer to what's been tested by Windows.  Do you
> > think your alternative is better?
> 
> hope we can figure out how windows is making it work.
> 
> Steve, Can you check if Windows is working with your test case ?
> 
> If it works, we may try do the same thing from Linux, so you will not
> need to append "pci=nocrs pci=alloc_high"...
> 
Unfortunately I don't have a 64 bit version of Windows to test with.  Vista(32 bit) fails to even boot when docked, hot-plugging fails to allocate resources, but at least doesn't crash.

From what I've read about the (64 bit) Windows allocation stragegy it's closer to Yinghai's method than the Linux default, preferring 64 bit resources (>4G) when possible.  I'll try to find the specification document again.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - June 6, 2012, 4:18 p.m.
On Wed, Jun 6, 2012 at 2:44 AM, Steven Newbury <steve@snewbury.org.uk> wrote:
> On Tue,   5 Jun 2012, 06:04:57 BST, Yinghai Lu <yinghai@kernel.org> wrote:
>> > Linux has a long history of allocating bottom-up.  Windows has a long
>> > history of allocating top-down.  You're proposing a third alternative,
>> > allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
>> > this area, I would prefer something that follows Windows because I
>> > think it will be closer to what's been tested by Windows.  Do you
>> > think your alternative is better?
>>
>> hope we can figure out how windows is making it work.
>>
>> Steve, Can you check if Windows is working with your test case ?
>>
>> If it works, we may try do the same thing from Linux, so you will not
>> need to append "pci=nocrs pci=alloc_high"...
>>
> Unfortunately I don't have a 64 bit version of Windows to test with.  Vista(32 bit) fails to even boot when docked, hot-plugging fails to allocate resources, but at least doesn't crash.
>
> From what I've read about the (64 bit) Windows allocation stragegy it's closer to Yinghai's method than the Linux default, preferring 64 bit resources (>4G) when possible.  I'll try to find the specification document again.

Here's the host bridge info from the BIOS (from
https://bugzilla.kernel.org/show_bug.cgi?id=10461 attachment
https://bugzilla.kernel.org/attachment.cgi?id=72869):

ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xf7ffffff]
pci_root PNP0A03:00: host bridge window [mem 0xfc000000-0xfebfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed1c000-0xfed1ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed90000-0xfed9ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
pci_root PNP0A03:00: host bridge window [mem 0xfeda7000-0xfedfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]

There's no aperture above 4GB.  So I don't think any version of
Windows will ever assign a BAR above 4GB.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
joeyli - July 4, 2012, 3 a.m.
於 三,2012-07-04 於 10:56 +0800,lee joey 提到:
> 
> 
> ---------- Forwarded message ----------
> From: Bjorn Helgaas <bhelgaas@google.com>
> Date: 2012/6/7
> Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at
> first
> To: Steven Newbury <steve@snewbury.org.uk>
> Cc: Yinghai Lu <yinghai@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
> David Miller <davem@davemloft.net>, Tony Luck <tony.luck@intel.com>,
> Linus Torvalds <torvalds@linux-foundation.org>, Andrew Morton
> <akpm@linux-foundation.org>, linux-pci@vger.kernel.org,
> linux-kernel@vger.kernel.org
> 
> 
> On Wed, Jun 6, 2012 at 2:44 AM, Steven Newbury <steve@snewbury.org.uk>
> wrote:
> > On Tue,   5 Jun 2012, 06:04:57 BST, Yinghai Lu <yinghai@kernel.org>
> wrote:
> >> > Linux has a long history of allocating bottom-up.  Windows has a
> long
> >> > history of allocating top-down.  You're proposing a third
> alternative,
> >> > allocating bottom-up starting at 4GB for 64-bit BARs.  If we
> change
> >> > this area, I would prefer something that follows Windows because
> I
> >> > think it will be closer to what's been tested by Windows.  Do you
> >> > think your alternative is better?
> >>
> >> hope we can figure out how windows is making it work.
> >>
> >> Steve, Can you check if Windows is working with your test case ?
> >>
> >> If it works, we may try do the same thing from Linux, so you will
> not
> >> need to append "pci=nocrs pci=alloc_high"...
> >>
> > Unfortunately I don't have a 64 bit version of Windows to test
> with.  Vista(32 bit) fails to even boot when docked, hot-plugging
> fails to allocate resources, but at least doesn't crash.
> >
> > From what I've read about the (64 bit) Windows allocation stragegy
> it's closer to Yinghai's method than the Linux default, preferring 64
> bit resources (>4G) when possible.  I'll try to find the specification
> document again.
> 
> 
> Here's the host bridge info from the BIOS (from
> https://bugzilla.kernel.org/show_bug.cgi?id=10461 attachment
> https://bugzilla.kernel.org/attachment.cgi?id=72869):
> 
> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
> pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
> pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
> pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
> pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xf7ffffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfc000000-0xfebfffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed1c000-0xfed1ffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed90000-0xfed9ffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
> pci_root PNP0A03:00: host bridge window [mem 0xfeda7000-0xfedfffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
> pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]
> 
> There's no aperture above 4GB.  So I don't think any version of
> Windows will ever assign a BAR above 4GB.
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

Hope have any help...

Here have a document from MSDN talk about the pci allocate strategy on
Windows server 2003, XP and vista:
	http://msdn.microsoft.com/en-us/library/windows/hardware/gg462986.aspx


Per page 4, looks Microsoft have different strategy on different Windows
version

On XP and server 2003: First, they ignored BIOS's boot configuration and
allocate below 4G. If fail, then try to allocate above 4GB.

On Vista: it always respects the boot configuration of devices above 4
GB.

But, this document didn't cover the behavior on Windows 7, not sure it's
the same with Vista.


Thanks

Joey Lee



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 4ce5ef2..075e5b1 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -121,14 +121,16 @@  pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
+	resource_size_t start = 0;
+	resource_size_t end = MAX_RESOURCE;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	/* If this is a 64-bit mem resource, try above 4GB first */
+	if (res->flags & IORESOURCE_MEM_64)
+		start = (resource_size_t) (1ULL << 32);
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +147,18 @@  pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			break;
+			return 0;
+	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
 	}
+
 	return ret;
 }