
[RFC] sparc64: Meaning of /sys/**/core_siblings on newer platforms.

Message ID 029d528e-27bf-b55c-5dfb-335d202dc1ce@oracle.com
State Superseded
Delegated to: David Miller

Commit Message

chris hyser June 6, 2016, 10:23 p.m. UTC
Before the SPARC T7, the notion of core_siblings covered both the set of CPUs
that share a common highest-level cache and the set of CPUs within a
particular socket (i.e. sharing the same package_id). This was also true on
older x86 CPUs, and perhaps recent ones as well, though my knowledge of x86 is
dated.

The idea of same package_id is stated in Documentation/cputopology.txt, and
programs such as lscpu have relied upon it to find the number of sockets by
counting the number of unique core_siblings_list entries. I suspect the
reliance on that algorithm predates the ability to read package IDs directly,
which is simpler, more straightforward, and preserves the platform-assigned
package ID rather than an ID that is just an incremented index based on order
of discovery.
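
For illustration, here is a minimal user-space sketch of both counting
approaches (this is not lscpu's actual code; it only reads the standard sysfs
topology attributes and assumes CPUs are numbered contiguously from 0):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 4096

static int read_topology(int cpu, const char *attr, char *buf, size_t len)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

int main(void)
{
	/* illustration-sized buffers; a real tool would size these dynamically */
	static char seen[MAX_CPUS][64];	/* unique core_siblings_list strings */
	static long pkg_seen[MAX_CPUS];	/* unique physical_package_id values */
	int nsib = 0, npkg = 0;
	char buf[64];
	int cpu;

	for (cpu = 0; cpu < MAX_CPUS; cpu++) {
		int i;

		/* old style: count distinct core_siblings_list entries */
		if (read_topology(cpu, "core_siblings_list", buf, sizeof(buf)))
			break;			/* no more CPUs */
		for (i = 0; i < nsib; i++)
			if (!strcmp(seen[i], buf))
				break;
		if (i == nsib)
			strcpy(seen[nsib++], buf);

		/* direct style: count distinct physical_package_id values */
		if (!read_topology(cpu, "physical_package_id", buf, sizeof(buf))) {
			long id = strtol(buf, NULL, 0);

			for (i = 0; i < npkg; i++)
				if (pkg_seen[i] == id)
					break;
			if (i == npkg)
				pkg_seen[npkg++] = id;
		}
	}

	printf("sockets by core_siblings_list:  %d\n", nsib);
	printf("sockets by physical_package_id: %d\n", npkg);
	return 0;
}

With the patch below applied on a T7-class machine, the first count would
effectively report highest-shared-cache groups rather than sockets, while the
second would still report sockets.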

The idea that it needs to represent the shared common highest-level cache
comes from irqbalance, an important run-time performance-enhancing daemon.

irqbalance uses the following hierarchy of locality goodness (a small sketch
of checking it follows the list):

          - shared common core (thread_siblings)
          - shared common cache (core_siblings)
          - shared common socket (CPUs with same physical_package_id)
          - shared common node (CPUs in same node)
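
For reference, a minimal sketch (not irqbalance's actual code) of walking
that hierarchy for two CPUs via the standard sysfs topology attributes; the
CPU numbers are arbitrary and NUMA node membership is left as a note:

#include <stdio.h>
#include <string.h>

static int read_attr(int cpu, const char *attr, char *buf, size_t len)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, attr);
	f = fopen(path, "r");
	if (!f || !fgets(buf, len, f)) {
		if (f)
			fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* returns 1 when both CPUs report the same value for @attr */
static int same(int a, int b, const char *attr)
{
	char va[256], vb[256];

	if (read_attr(a, attr, va, sizeof(va)) ||
	    read_attr(b, attr, vb, sizeof(vb)))
		return 0;
	return !strcmp(va, vb);
}

int main(void)
{
	int a = 0, b = 1;	/* any two CPUs to compare */

	if (same(a, b, "thread_siblings_list"))
		puts("same core");
	else if (same(a, b, "core_siblings_list"))
		puts("same shared cache");
	else if (same(a, b, "physical_package_id"))
		puts("same socket");
	else
		puts("at best same node (compare the NUMA node directories)");
	return 0;
}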

This layout describes the T7 exactly, and it interestingly suggests that one
or more other architectures have reached the point where enough cores can be
packed into the same package that a shared highest-level cache is either not
desirable or not worth the die area/effort. Said differently, "socket" will
likely become less synonymous with shared cache in the future and more
synonymous with node. I'm still digging to find out whether that is so and
which architectures those are.

The issue is that on newer SPARC HW both definitions can no longer be true at
the same time, and choosing one over the other will break different sets of
code. The choice can be illustrated as one between an unmodified lscpu
printing nonsensical answers (although it can already do that for unrelated
reasons) and an unmodified irqbalance making cache-thrashing placement
decisions. The number of important programs in each class is unknown, but
either way some things will have to be fixed. As I believe the whole point of
large SPARC servers is performance, and the goal of the people on the SPARC
mailing list is to maximize SPARC Linux performance, I would argue for not
breaking what I would call the performance class of programs rather than the
topology-description class.

Rationale:

- Performance-class breakage is harder to diagnose: it shows up as lost
performance, and tracing that back to a root cause is incredibly difficult.
Topology-description programs, on the other hand, print easily identified
nonsense and can be modified in a manner that is actually more
straightforward than the current algorithm, while preserving architecturally
neutral functional correctness (i.e. not hacks/workarounds).

Attached is a working sparc64 patch redefining core_siblings in favor of
"shared highest-level cache" (not intended in its current form for actual
upstream submission, but to clarify the proposal and allow actual testing).
I'm seeking feedback on how to proceed here, to prevent wasted effort fixing
the wrong set of userland programs and related in-progress patches for SPARC
sysfs.

Example results of patch:

Before:
          [root@ca-sparc30 topology]# cat core_siblings_list
          32-63,128-223

After:
          [root@ca-sparc30 topology]# cat core_siblings_list
          32-63


Comments

Julian Calaby June 7, 2016, 12:14 a.m. UTC | #1
Hi Chris,

On Tue, Jun 7, 2016 at 8:23 AM, chris hyser <chris.hyser@oracle.com> wrote:
> Before SPARC T7, the notion of core_siblings was both those CPUs that share
> a
> common highest level cache and the set of CPUs within a particular socket
> (share same package_id). This was also true on older x86 CPUs and perhaps
> most
> recent though my knowledge of x86 is dated.
>
> The idea of same package_id is stated in Documentation/cputopology.txt and
> programs such as lscpu have relied upon this to find the number of sockets
> by
> counting the number of unique core_siblings_list entries. I suspect the
> reliance
> on that algorithm predates the ability to read package IDs directly which is
> simpler, more straightforward and preserves the platform assigned package ID
> versus an ID that is just an incremented index based on order of discovery.
>
> The idea that it needs to represent shared common highest level cache comes
> from irqbalance, an important run-time performance enhancing daemon.
>
> irqbalance uses the following hierarchy of locality goodness:
>
>          - shared common core (thread_siblings)
>          - shared common cache (core_siblings)
>          - shared common socket (CPUs with same physical_package_id)
>          - shared common node (CPUS in same node)
>
> This layout perfectly describes the T7 and interestingly suggests that there
> are
> one or more other architectures that have reached the point where enough
> cores
> can be jammed into the same package that a shared high level cache is either
> not
> desirable or not worth the real estate/effort. Said differently, socket in
> the
> future will likely become less synonymous with shared cache and instead more
> synonymous with node. I'm still digging to see if and what those
> architectures
> are.
>
> The issue is that on newer SPARC HW both definitions can no longer be true
> and
> that choosing one versus the other will break differing sets of code. This
> can
> be illustrated as a choice between an unmodified lscpu spitting out
> nonsensical
> answers (although it currently can do that for different unrelated reasons)
> or
> an unmodified irqbalance incorrectly making cache-thrashing decisions. The
> number of important programs in each class is unknown, but either way some
> things will have to be fixed. As I believe the whole point of large SPARC
> servers is performance and the goal of the people on the SPARC mailing list
> is
> to maximize SPARC linux performance, I would argue for not breaking what I
> would call the performance class of programs versus the topology description
> class.
>
> Rationale:
>
> - performance class breakage is harder to diagnose as it results in lost
> performance and tracing back to root cause is incredibly difficult. Topology
> description programs on the other hand spit out easily identified nonsense
> and can be modified in a manner that is actually more straight forward than
> the current algorithm while preserving architecturally neutral functional
> correctness (i.e. not hacks/workarounds)
>
> Attached is a working sparc64 patch for redefinition in favor of "shared
> highest level cache" (not intended in its current form for actual upstream
> submission but to clarify the proposal and allow actual testing). I'm
> seeking
> feedback on how to proceed here to prevent wasted effort fixing the wrong
> set
> of user land programs and related in-progress patches for SPARC sysfs.
>
> Example results of patch:
>
> Before:
>          [root@ca-sparc30 topology]# cat core_siblings_list
>          32-63,128-223
>
> After:
>          [root@ca-sparc30 topology]# cat core_siblings_list
>          32-63
>
> diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
> index 1122886..e1b3893 100644
> --- a/arch/sparc/kernel/mdesc.c
> +++ b/arch/sparc/kernel/mdesc.c
>   @@ -597,20 +598,21 @@ static void fill_in_one_cache(cpuinfo_sparc *c,
> struct mdesc_handle *hp, u64 mp)
>                 c->ecache_line_size = *line_size;
>                 break;
>   +     case 3:

Is your patch mangled?

> +               c->l3_cache_size = *size;
> +               c->l3_cache_line_size = *line_size;
> +               break;
> +
>         default:
>                 break;
>         }
>   -     if (*level == 1) {
> -               u64 a;
> -
> -               mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
> -                       u64 target = mdesc_arc_target(hp, a);
> -                       const char *name = mdesc_node_name(hp, target);
> +       mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
> +               u64 target = mdesc_arc_target(hp, a);
> +               const char *name = mdesc_node_name(hp, target);
>   -                     if (!strcmp(name, "cache"))
> -                               fill_in_one_cache(c, hp, target);
> -               }
> +               if (!strcmp(name, "cache"))
> +                       fill_in_one_cache(c, hp, target);
>         }
>   }
>   @@ -645,13 +647,19 @@ static void __mark_core_id(struct mdesc_handle *hp,
> u64 node,
>                 cpu_data(*id).core_id = core_id;
>   }
>   -static void __mark_sock_id(struct mdesc_handle *hp, u64 node,
> -                          int sock_id)
> +static void __mark_max_cache_id(struct mdesc_handle *hp, u64 node,
> +                               int max_cache_id)
>   {
>         const u64 *id = mdesc_get_property(hp, node, "id", NULL);
>   -     if (*id < num_possible_cpus())
> -               cpu_data(*id).sock_id = sock_id;
> +       if (*id < num_possible_cpus()) {
> +               cpu_data(*id).max_cache_id = max_cache_id;
> +
> +               /* On systems without explicit socket descriptions, socket
> +                * is max_cache_id
> +                */
> +               cpu_data(*id).sock_id = max_cache_id;
> +       }
>   }
>    static void mark_core_ids(struct mdesc_handle *hp, u64 mp,
> @@ -660,10 +668,11 @@ static void mark_core_ids(struct mdesc_handle *hp, u64
> mp,
>         find_back_node_value(hp, mp, "cpu", __mark_core_id, core_id, 10);
>   }
>   -static void mark_sock_ids(struct mdesc_handle *hp, u64 mp,
> -                         int sock_id)
> +static void mark_max_cache_ids(struct mdesc_handle *hp, u64 mp,
> +                              int max_cache_id)
>   {
> -       find_back_node_value(hp, mp, "cpu", __mark_sock_id, sock_id, 10);
> +       find_back_node_value(hp, mp, "cpu", __mark_max_cache_id,
> +                            max_cache_id, 10);
>   }
>    static void set_core_ids(struct mdesc_handle *hp)
> @@ -694,14 +703,15 @@ static void set_core_ids(struct mdesc_handle *hp)
>         }
>   }
>   -static int set_sock_ids_by_cache(struct mdesc_handle *hp, int level)
> +static int set_max_cache_ids_by_cache(struct mdesc_handle *hp,
> +                                     int level)
>   {
>         u64 mp;
>         int idx = 1;
>         int fnd = 0;
>   -     /* Identify unique sockets by looking for cpus backpointed to by
> -        * shared level n caches.
> +       /* Identify unique highest level of shared cache by looking for cpus
> +        * backpointed to by shared level N caches.
>          */
>         mdesc_for_each_node_by_name(hp, mp, "cache") {
>                 const u64 *cur_lvl;
> @@ -710,7 +720,7 @@ static int set_sock_ids_by_cache(struct mdesc_handle
> *hp, int level)
>                 if (*cur_lvl != level)
>                         continue;
>   -             mark_sock_ids(hp, mp, idx);
> +               mark_max_cache_ids(hp, mp, idx);
>                 idx++;
>                 fnd = 1;
>         }
> @@ -745,15 +755,17 @@ static void set_sock_ids(struct mdesc_handle *hp)
>   {
>         u64 mp;
>   +     /* Find the highest level of shared cache which on pre-T7 is also
> +        * the socket.
> +        */
> +       if (!set_max_cache_ids_by_cache(hp, 3))
> +               set_max_cache_ids_by_cache(hp, 2);
> +
>         /* If machine description exposes sockets data use it.
> -        * Otherwise fallback to use shared L3 or L2 caches.
>          */
>         mp = mdesc_node_by_name(hp, MDESC_NODE_NULL, "sockets");
>         if (mp != MDESC_NODE_NULL)
> -               return set_sock_ids_by_socket(hp, mp);
> -
> -       if (!set_sock_ids_by_cache(hp, 3))
> -               set_sock_ids_by_cache(hp, 2);
> +               set_sock_ids_by_socket(hp, mp);
>   }
>    static void mark_proc_ids(struct mdesc_handle *hp, u64 mp, int proc_id)
> diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
> index 8a6151a..bbe27a4 100644
> --- a/arch/sparc/kernel/smp_64.c
> +++ b/arch/sparc/kernel/smp_64.c
> @@ -1250,8 +1250,11 @@ void smp_fill_in_sib_core_maps(void)
>         for_each_present_cpu(i)  {
>                 unsigned int j;
>   +             cpumask_clear(&cpu_core_sib_map[i]);
> +
>                 for_each_present_cpu(j)  {
> -                       if (cpu_data(i).sock_id == cpu_data(j).sock_id)
> +                       if (cpu_data(i).max_cache_id ==
> +                           cpu_data(j).max_cache_id)
>                                 cpumask_set_cpu(j, &cpu_core_sib_map[i]);
>                 }
>         }
> --
> To unsubscribe from this list: send the line "unsubscribe sparclinux" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
chris hyser June 7, 2016, 2:11 a.m. UTC | #2
On 6/6/2016 8:14 PM, Julian Calaby wrote:
> Hi Chris,
>
> On Tue, Jun 7, 2016 at 8:23 AM, chris hyser <chris.hyser@oracle.com> wrote:
>> Before SPARC T7, the notion of core_siblings was both those CPUs that share
>> a
>> common highest level cache and the set of CPUs within a particular socket
>> (share same package_id). This was also true on older x86 CPUs and perhaps
>> most
>> recent though my knowledge of x86 is dated.
>>
>> The idea of same package_id is stated in Documentation/cputopology.txt and
>> programs such as lscpu have relied upon this to find the number of sockets
>> by
>> counting the number of unique core_siblings_list entries. I suspect the
>> reliance
>> on that algorithm predates the ability to read package IDs directly which is
>> simpler, more straightforward and preserves the platform assigned package ID
>> versus an ID that is just an incremented index based on order of discovery.
>>
>> The idea that it needs to represent shared common highest level cache comes
>> from irqbalance, an important run-time performance enhancing daemon.
>>
>> irqbalance uses the following hierarchy of locality goodness:
>>
>>          - shared common core (thread_siblings)
>>          - shared common cache (core_siblings)
>>          - shared common socket (CPUs with same physical_package_id)
>>          - shared common node (CPUS in same node)
>>
>> This layout perfectly describes the T7 and interestingly suggests that there
>> are
>> one or more other architectures that have reached the point where enough
>> cores
>> can be jammed into the same package that a shared high level cache is either
>> not
>> desirable or not worth the real estate/effort. Said differently, socket in
>> the
>> future will likely become less synonymous with shared cache and instead more
>> synonymous with node. I'm still digging to see if and what those
>> architectures
>> are.
>>
>> The issue is that on newer SPARC HW both definitions can no longer be true
>> and
>> that choosing one versus the other will break differing sets of code. This
>> can
>> be illustrated as a choice between an unmodified lscpu spitting out
>> nonsensical
>> answers (although it currently can do that for different unrelated reasons)
>> or
>> an unmodified irqbalance incorrectly making cache-thrashing decisions. The
>> number of important programs in each class is unknown, but either way some
>> things will have to be fixed. As I believe the whole point of large SPARC
>> servers is performance and the goal of the people on the SPARC mailing list
>> is
>> to maximize SPARC linux performance, I would argue for not breaking what I
>> would call the performance class of programs versus the topology description
>> class.
>>
>> Rationale:
>>
>> - performance class breakage is harder to diagnose as it results in lost
>> performance and tracing back to root cause is incredibly difficult. Topology
>> description programs on the other hand spit out easily identified nonsense
>> and can be modified in a manner that is actually more straight forward than
>> the current algorithm while preserving architecturally neutral functional
>> correctness (i.e. not hacks/workarounds)
>>
>> Attached is a working sparc64 patch for redefinition in favor of "shared
>> highest level cache" (not intended in its current form for actual upstream
>> submission but to clarify the proposal and allow actual testing). I'm
>> seeking
>> feedback on how to proceed here to prevent wasted effort fixing the wrong
>> set
>> of user land programs and related in-progress patches for SPARC sysfs.
>>
>> Example results of patch:
>>
>> Before:
>>          [root@ca-sparc30 topology]# cat core_siblings_list
>>          32-63,128-223
>>
>> After:
>>          [root@ca-sparc30 topology]# cat core_siblings_list
>>          32-63
>>
>> diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
>> index 1122886..e1b3893 100644
>> --- a/arch/sparc/kernel/mdesc.c
>> +++ b/arch/sparc/kernel/mdesc.c
>>   @@ -597,20 +598,21 @@ static void fill_in_one_cache(cpuinfo_sparc *c,
>> struct mdesc_handle *hp, u64 mp)
>>                 c->ecache_line_size = *line_size;
>>                 break;
>>   +     case 3:
>
> Is your patch mangled?

Apparently. I had to do this in a weird way. I tried to be extra careful. Let me try again. Apologies.

>
>> +               c->l3_cache_size = *size;
>> +               c->l3_cache_line_size = *line_size;
>> +               break;
>> +
>>         default:
>>                 break;
>>         }
>>   -     if (*level == 1) {
>> -               u64 a;
>> -
>> -               mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
>> -                       u64 target = mdesc_arc_target(hp, a);
>> -                       const char *name = mdesc_node_name(hp, target);
>> +       mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
>> +               u64 target = mdesc_arc_target(hp, a);
>> +               const char *name = mdesc_node_name(hp, target);
>>   -                     if (!strcmp(name, "cache"))
>> -                               fill_in_one_cache(c, hp, target);
>> -               }
>> +               if (!strcmp(name, "cache"))
>> +                       fill_in_one_cache(c, hp, target);
>>         }
>>   }
>>   @@ -645,13 +647,19 @@ static void __mark_core_id(struct mdesc_handle *hp,
>> u64 node,
>>                 cpu_data(*id).core_id = core_id;
>>   }
>>   -static void __mark_sock_id(struct mdesc_handle *hp, u64 node,
>> -                          int sock_id)
>> +static void __mark_max_cache_id(struct mdesc_handle *hp, u64 node,
>> +                               int max_cache_id)
>>   {
>>         const u64 *id = mdesc_get_property(hp, node, "id", NULL);
>>   -     if (*id < num_possible_cpus())
>> -               cpu_data(*id).sock_id = sock_id;
>> +       if (*id < num_possible_cpus()) {
>> +               cpu_data(*id).max_cache_id = max_cache_id;
>> +
>> +               /* On systems without explicit socket descriptions, socket
>> +                * is max_cache_id
>> +                */
>> +               cpu_data(*id).sock_id = max_cache_id;
>> +       }
>>   }
>>    static void mark_core_ids(struct mdesc_handle *hp, u64 mp,
>> @@ -660,10 +668,11 @@ static void mark_core_ids(struct mdesc_handle *hp, u64
>> mp,
>>         find_back_node_value(hp, mp, "cpu", __mark_core_id, core_id, 10);
>>   }
>>   -static void mark_sock_ids(struct mdesc_handle *hp, u64 mp,
>> -                         int sock_id)
>> +static void mark_max_cache_ids(struct mdesc_handle *hp, u64 mp,
>> +                              int max_cache_id)
>>   {
>> -       find_back_node_value(hp, mp, "cpu", __mark_sock_id, sock_id, 10);
>> +       find_back_node_value(hp, mp, "cpu", __mark_max_cache_id,
>> +                            max_cache_id, 10);
>>   }
>>    static void set_core_ids(struct mdesc_handle *hp)
>> @@ -694,14 +703,15 @@ static void set_core_ids(struct mdesc_handle *hp)
>>         }
>>   }
>>   -static int set_sock_ids_by_cache(struct mdesc_handle *hp, int level)
>> +static int set_max_cache_ids_by_cache(struct mdesc_handle *hp,
>> +                                     int level)
>>   {
>>         u64 mp;
>>         int idx = 1;
>>         int fnd = 0;
>>   -     /* Identify unique sockets by looking for cpus backpointed to by
>> -        * shared level n caches.
>> +       /* Identify unique highest level of shared cache by looking for cpus
>> +        * backpointed to by shared level N caches.
>>          */
>>         mdesc_for_each_node_by_name(hp, mp, "cache") {
>>                 const u64 *cur_lvl;
>> @@ -710,7 +720,7 @@ static int set_sock_ids_by_cache(struct mdesc_handle
>> *hp, int level)
>>                 if (*cur_lvl != level)
>>                         continue;
>>   -             mark_sock_ids(hp, mp, idx);
>> +               mark_max_cache_ids(hp, mp, idx);
>>                 idx++;
>>                 fnd = 1;
>>         }
>> @@ -745,15 +755,17 @@ static void set_sock_ids(struct mdesc_handle *hp)
>>   {
>>         u64 mp;
>>   +     /* Find the highest level of shared cache which on pre-T7 is also
>> +        * the socket.
>> +        */
>> +       if (!set_max_cache_ids_by_cache(hp, 3))
>> +               set_max_cache_ids_by_cache(hp, 2);
>> +
>>         /* If machine description exposes sockets data use it.
>> -        * Otherwise fallback to use shared L3 or L2 caches.
>>          */
>>         mp = mdesc_node_by_name(hp, MDESC_NODE_NULL, "sockets");
>>         if (mp != MDESC_NODE_NULL)
>> -               return set_sock_ids_by_socket(hp, mp);
>> -
>> -       if (!set_sock_ids_by_cache(hp, 3))
>> -               set_sock_ids_by_cache(hp, 2);
>> +               set_sock_ids_by_socket(hp, mp);
>>   }
>>    static void mark_proc_ids(struct mdesc_handle *hp, u64 mp, int proc_id)
>> diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
>> index 8a6151a..bbe27a4 100644
>> --- a/arch/sparc/kernel/smp_64.c
>> +++ b/arch/sparc/kernel/smp_64.c
>> @@ -1250,8 +1250,11 @@ void smp_fill_in_sib_core_maps(void)
>>         for_each_present_cpu(i)  {
>>                 unsigned int j;
>>   +             cpumask_clear(&cpu_core_sib_map[i]);
>> +
>>                 for_each_present_cpu(j)  {
>> -                       if (cpu_data(i).sock_id == cpu_data(j).sock_id)
>> +                       if (cpu_data(i).max_cache_id ==
>> +                           cpu_data(j).max_cache_id)
>>                                 cpumask_set_cpu(j, &cpu_core_sib_map[i]);
>>                 }
>>         }
>> --
>> To unsubscribe from this list: send the line "unsubscribe sparclinux" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
Julian Calaby June 7, 2016, 3:52 a.m. UTC | #3
Hi All,

Resending without HTML. Thanks Gmail Android!

On Tue, Jun 7, 2016 at 1:07 PM, Julian Calaby <julian.calaby@gmail.com> wrote:
> Hi Chris,
>
> On 7 Jun 2016 12:11, "chris hyser" <chris.hyser@oracle.com> wrote:
>>
>>
>>
>> On 6/6/2016 8:14 PM, Julian Calaby wrote:
>>>
>>> Hi Chris,
>>>
>>> On Tue, Jun 7, 2016 at 8:23 AM, chris hyser <chris.hyser@oracle.com>
>>> wrote:
>>>>
>>>> Before SPARC T7, the notion of core_siblings was both those CPUs that
>>>> share
>>>> a
>>>> common highest level cache and the set of CPUs within a particular
>>>> socket
>>>> (share same package_id). This was also true on older x86 CPUs and
>>>> perhaps
>>>> most
>>>> recent though my knowledge of x86 is dated.
>>>>
>>>> The idea of same package_id is stated in Documentation/cputopology.txt
>>>> and
>>>> programs such as lscpu have relied upon this to find the number of
>>>> sockets
>>>> by
>>>> counting the number of unique core_siblings_list entries. I suspect the
>>>> reliance
>>>> on that algorithm predates the ability to read package IDs directly
>>>> which is
>>>> simpler, more straightforward and preserves the platform assigned
>>>> package ID
>>>> versus an ID that is just an incremented index based on order of
>>>> discovery.
>>>>
>>>> The idea that it needs to represent shared common highest level cache
>>>> comes
>>>> from irqbalance, an important run-time performance enhancing daemon.
>>>>
>>>> irqbalance uses the following hierarchy of locality goodness:
>>>>
>>>>          - shared common core (thread_siblings)
>>>>          - shared common cache (core_siblings)
>>>>          - shared common socket (CPUs with same physical_package_id)
>>>>          - shared common node (CPUS in same node)
>>>>
>>>> This layout perfectly describes the T7 and interestingly suggests that
>>>> there
>>>> are
>>>> one or more other architectures that have reached the point where enough
>>>> cores
>>>> can be jammed into the same package that a shared high level cache is
>>>> either
>>>> not
>>>> desirable or not worth the real estate/effort. Said differently, socket
>>>> in
>>>> the
>>>> future will likely become less synonymous with shared cache and instead
>>>> more
>>>> synonymous with node. I'm still digging to see if and what those
>>>> architectures
>>>> are.
>>>>
>>>> The issue is that on newer SPARC HW both definitions can no longer be
>>>> true
>>>> and
>>>> that choosing one versus the other will break differing sets of code.
>>>> This
>>>> can
>>>> be illustrated as a choice between an unmodified lscpu spitting out
>>>> nonsensical
>>>> answers (although it currently can do that for different unrelated
>>>> reasons)
>>>> or
>>>> an unmodified irqbalance incorrectly making cache-thrashing decisions.
>>>> The
>>>> number of important programs in each class is unknown, but either way
>>>> some
>>>> things will have to be fixed. As I believe the whole point of large
>>>> SPARC
>>>> servers is performance and the goal of the people on the SPARC mailing
>>>> list
>>>> is
>>>> to maximize SPARC linux performance, I would argue for not breaking what
>>>> I
>>>> would call the performance class of programs versus the topology
>>>> description
>>>> class.
>>>>
>>>> Rationale:
>>>>
>>>> - performance class breakage is harder to diagnose as it results in lost
>>>> performance and tracing back to root cause is incredibly difficult.
>>>> Topology
>>>> description programs on the other hand spit out easily identified
>>>> nonsense
>>>> and can be modified in a manner that is actually more straight forward
>>>> than
>>>> the current algorithm while preserving architecturally neutral
>>>> functional
>>>> correctness (i.e. not hacks/workarounds)
>>>>
>>>> Attached is a working sparc64 patch for redefinition in favor of "shared
>>>> highest level cache" (not intended in its current form for actual
>>>> upstream
>>>> submission but to clarify the proposal and allow actual testing). I'm
>>>> seeking
>>>> feedback on how to proceed here to prevent wasted effort fixing the
>>>> wrong
>>>> set
>>>> of user land programs and related in-progress patches for SPARC sysfs.
>>>>
>>>> Example results of patch:
>>>>
>>>> Before:
>>>>          [root@ca-sparc30 topology]# cat core_siblings_list
>>>>          32-63,128-223
>>>>
>>>> After:
>>>>          [root@ca-sparc30 topology]# cat core_siblings_list
>>>>          32-63
>>>>
>>>> diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
>>>> index 1122886..e1b3893 100644
>>>> --- a/arch/sparc/kernel/mdesc.c
>>>> +++ b/arch/sparc/kernel/mdesc.c
>>>>   @@ -597,20 +598,21 @@ static void fill_in_one_cache(cpuinfo_sparc *c,
>>>> struct mdesc_handle *hp, u64 mp)
>>>>                 c->ecache_line_size = *line_size;
>>>>                 break;
>>>>   +     case 3:
>>>
>>>
>>> Is your patch mangled?
>>
>>
>> Apparently. I had to do this is in a weird way. I tried to be extra
>> careful. Let me try again. Apologies.
>
> You should be using git-send-email unless there's something weird about your
> email setup.
>
> There's documentation on how to set up most common email clients in the
> Documentation directory of the kernel tree.
>
> Thanks,
>
> Julian Calaby

Patch

diff --git a/arch/sparc/include/asm/cpudata_64.h b/arch/sparc/include/asm/cpudata_64.h
index a6cfdab..2b4e384 100644
--- a/arch/sparc/include/asm/cpudata_64.h
+++ b/arch/sparc/include/asm/cpudata_64.h
@@ -19,14 +19,19 @@  typedef struct {
    	/* Dcache line 2, rarely used */
   	unsigned int	dcache_size;
-	unsigned int	dcache_line_size;
   	unsigned int	icache_size;
-	unsigned int	icache_line_size;
   	unsigned int	ecache_size;
-	unsigned int	ecache_line_size;
-	unsigned short	sock_id;
+	unsigned int	l3_cache_size;
+
+	unsigned short	icache_line_size;
+	unsigned short	dcache_line_size;
+	unsigned short	ecache_line_size;
+	unsigned short	l3_cache_line_size;
+
+	unsigned short	sock_id;	/* physical package */
   	unsigned short	core_id;
-	int		proc_id;
+	unsigned short	max_cache_id;	/* groupings of highest shared cache */
+	unsigned short	proc_id;	/* strand (aka HW thread) id */
   } cpuinfo_sparc;
    DECLARE_PER_CPU(cpuinfo_sparc, __cpu_data);
diff --git a/arch/sparc/include/asm/topology_64.h b/arch/sparc/include/asm/topology_64.h
index bec481a..6f98d4e 100644
--- a/arch/sparc/include/asm/topology_64.h
+++ b/arch/sparc/include/asm/topology_64.h
@@ -41,7 +41,7 @@  int __node_distance(int, int);
   #endif /* !(CONFIG_NUMA) */
    #ifdef CONFIG_SMP
-#define topology_physical_package_id(cpu)	(cpu_data(cpu).proc_id)
+#define topology_physical_package_id(cpu)	(cpu_data(cpu).sock_id)
   #define topology_core_id(cpu)			(cpu_data(cpu).core_id)
   #define topology_core_cpumask(cpu)		(&cpu_core_sib_map[cpu])
   #define topology_sibling_cpumask(cpu)		(&per_cpu(cpu_sibling_map, cpu))
diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 1122886..e1b3893 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -578,6 +578,7 @@  static void fill_in_one_cache(cpuinfo_sparc *c, struct mdesc_handle *hp, u64 mp)
   	const u64 *line_size = mdesc_get_property(hp, mp, "line-size", NULL);
   	const char *type;
   	int type_len;
+	u64 a;
    	type = mdesc_get_property(hp, mp, "type", &type_len);
   @@ -597,20 +598,21 @@ static void fill_in_one_cache(cpuinfo_sparc *c, struct mdesc_handle *hp, u64 mp)
   		c->ecache_line_size = *line_size;
   		break;
   +	case 3:
+		c->l3_cache_size = *size;
+		c->l3_cache_line_size = *line_size;
+		break;
+
   	default:
   		break;
   	}
   -	if (*level == 1) {
-		u64 a;
-
-		mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
-			u64 target = mdesc_arc_target(hp, a);
-			const char *name = mdesc_node_name(hp, target);
+	mdesc_for_each_arc(a, hp, mp, MDESC_ARC_TYPE_FWD) {
+		u64 target = mdesc_arc_target(hp, a);
+		const char *name = mdesc_node_name(hp, target);
   -			if (!strcmp(name, "cache"))
-				fill_in_one_cache(c, hp, target);
-		}
+		if (!strcmp(name, "cache"))
+			fill_in_one_cache(c, hp, target);
   	}
   }
   @@ -645,13 +647,19 @@ static void __mark_core_id(struct mdesc_handle *hp, u64 node,
   		cpu_data(*id).core_id = core_id;
   }
   -static void __mark_sock_id(struct mdesc_handle *hp, u64 node,
-			   int sock_id)
+static void __mark_max_cache_id(struct mdesc_handle *hp, u64 node,
+				int max_cache_id)
   {
   	const u64 *id = mdesc_get_property(hp, node, "id", NULL);
   -	if (*id < num_possible_cpus())
-		cpu_data(*id).sock_id = sock_id;
+	if (*id < num_possible_cpus()) {
+		cpu_data(*id).max_cache_id = max_cache_id;
+
+		/* On systems without explicit socket descriptions, socket
+		 * is max_cache_id
+		 */
+		cpu_data(*id).sock_id = max_cache_id;
+	}
   }
    static void mark_core_ids(struct mdesc_handle *hp, u64 mp,
@@ -660,10 +668,11 @@  static void mark_core_ids(struct mdesc_handle *hp, u64 mp,
   	find_back_node_value(hp, mp, "cpu", __mark_core_id, core_id, 10);
   }
   -static void mark_sock_ids(struct mdesc_handle *hp, u64 mp,
-			  int sock_id)
+static void mark_max_cache_ids(struct mdesc_handle *hp, u64 mp,
+			       int max_cache_id)
   {
-	find_back_node_value(hp, mp, "cpu", __mark_sock_id, sock_id, 10);
+	find_back_node_value(hp, mp, "cpu", __mark_max_cache_id,
+			     max_cache_id, 10);
   }
    static void set_core_ids(struct mdesc_handle *hp)
@@ -694,14 +703,15 @@  static void set_core_ids(struct mdesc_handle *hp)
   	}
   }
   -static int set_sock_ids_by_cache(struct mdesc_handle *hp, int level)
+static int set_max_cache_ids_by_cache(struct mdesc_handle *hp,
+				      int level)
   {
   	u64 mp;
   	int idx = 1;
   	int fnd = 0;
   -	/* Identify unique sockets by looking for cpus backpointed to by
-	 * shared level n caches.
+	/* Identify unique highest level of shared cache by looking for cpus
+	 * backpointed to by shared level N caches.
   	 */
   	mdesc_for_each_node_by_name(hp, mp, "cache") {
   		const u64 *cur_lvl;
@@ -710,7 +720,7 @@  static int set_sock_ids_by_cache(struct mdesc_handle *hp, int level)
   		if (*cur_lvl != level)
   			continue;
   -		mark_sock_ids(hp, mp, idx);
+		mark_max_cache_ids(hp, mp, idx);
   		idx++;
   		fnd = 1;
   	}
@@ -745,15 +755,17 @@  static void set_sock_ids(struct mdesc_handle *hp)
   {
   	u64 mp;
   +	/* Find the highest level of shared cache which on pre-T7 is also
+	 * the socket.
+	 */
+	if (!set_max_cache_ids_by_cache(hp, 3))
+		set_max_cache_ids_by_cache(hp, 2);
+
   	/* If machine description exposes sockets data use it.
-	 * Otherwise fallback to use shared L3 or L2 caches.
   	 */
   	mp = mdesc_node_by_name(hp, MDESC_NODE_NULL, "sockets");
   	if (mp != MDESC_NODE_NULL)
-		return set_sock_ids_by_socket(hp, mp);
-
-	if (!set_sock_ids_by_cache(hp, 3))
-		set_sock_ids_by_cache(hp, 2);
+		set_sock_ids_by_socket(hp, mp);
   }
    static void mark_proc_ids(struct mdesc_handle *hp, u64 mp, int proc_id)
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 8a6151a..bbe27a4 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1250,8 +1250,11 @@  void smp_fill_in_sib_core_maps(void)
   	for_each_present_cpu(i)  {
   		unsigned int j;
   +		cpumask_clear(&cpu_core_sib_map[i]);
+
   		for_each_present_cpu(j)  {
-			if (cpu_data(i).sock_id == cpu_data(j).sock_id)
+			if (cpu_data(i).max_cache_id ==
+			    cpu_data(j).max_cache_id)
   				cpumask_set_cpu(j, &cpu_core_sib_map[i]);
   		}
   	}