diff mbox

[U-Boot,02/10] arm64: Make full va map code more dynamic

Message ID 56CF2D7B.9090101@suse.de
State Superseded
Headers show

Commit Message

Alexander Graf Feb. 25, 2016, 4:36 p.m. UTC
On 24.02.16 19:14, Stephen Warren wrote:
> On 02/24/2016 05:11 AM, Alexander Graf wrote:
>> The idea to generate our pages tables from an array of memory ranges
>> is very sound. However, instead of hard coding the code to create up
>> to 2 levels of 64k granule page tables, we really should just create
>> normal 4k page tables that allow us to set caching attributes on 2M
>> or 4k level later on.
>>
>> So this patch moves the full_va mapping code to 4k page size and
>> makes it fully flexible to dynamically create as many levels as
>> necessary for a map (including dynamic 1G/2M pages). It also adds
>> support to dynamically split a large map into smaller ones when
>> some code wants to set dcache attributes.
>>
>> With all this in place, there is very little reason to create your
>> own page tables in board specific files.
>>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
> 
>> +/*
>> + * This is a recursively called function to count the number of
>> + * page tables we need to cover a particular PTE range. If you
>> + * call this with level = -1 you basically get the full 48 bit
>> + * coverage.
>> + */
>> +static int count_required_pts(u64 addr, int level, u64 maxaddr)
> 
> I think this looks correct now. Nits below if a respin is needed for
> other reasons.
> 
>> +{
>> +    int levelshift = level2shift(level);
>> +    u64 levelsize = 1ULL << levelshift;
>> +    u64 levelmask = levelsize - 1;
>> +    u64 levelend = addr + levelsize;
>> +    int r = 0;
>> +    int i;
>> +    bool is_level = false;
> 
> I might suggest renaming that is_level_needed. It's not obvious to me
> exactly what the name "is_level" is intended to represent; the name
> seems to represent whether something (unspecified) is a level or not.

We're basically asking the function whether a PTE for address <addr> at
level <level> would be an inval/block/level PTE. is_level marks it as a
level pte.

I could maybe rename this into pte_type and create an enum that is
either PTE_INVAL, PTE_BLOCK or PTE_LEVEL.

> 
>> +
>> +    for (i = 0; i < ARRAY_SIZE(mem_map); i++) {
>> +        struct mm_region *map = &mem_map[i];
>> +        u64 start = map->base;
>> +        u64 end = start + map->size;
>> +
>> +        /* Check if the PTE would overlap with the map */
>> +        if (max(addr, start) <= min(levelend, end)) {
>> +            start = max(addr, start);
>> +            end = min(levelend, end);
>> +
>> +            /* We need a sub-pt for this level */
>> +            if ((start & levelmask) || (end & levelmask)) {
>> +                is_level = true;
>> +                break;
>> +            }
>> +
>> +            /* Lv0 can not do block PTEs, so do levels here too */
>> +            if (level <= 0) {
>> +                is_level = true;
>> +                break;
>> +            }
>> +        }
>> +    }
>> +
>> +    /*
>> +     * Block PTEs at this level are already covered by the parent page
>> +     * table, so we only need to count sub page tables.
>> +     */
>> +    if (is_level) {
>> +        int sublevel = level + 1;
>> +        u64 sublevelsize = 1ULL << level2shift(sublevel);
>> +
>> +        /* Account for the new sub page table ... */
>> +        r = 1;
> 
> "Account for the page table at /this/ level"? This may represent the
> top-level (0/1) page table, not just sub page tables. The sub-level
> accounting is via the recursive call to count_required_pts() I believe.

I think the easiest way to visualize what's going on is if we start with
level -1.

We basically ask the function at level -1 whether a PTE at level -1 (48
bits) would fit into a block PTE at level -1 (so the PTE spans all 48
bits) or whether we need to create a new level entry plus page table for it.

Obviously for all entries this resolves into a "yes, create a level PTE
entry and a new page table to point to". So at level=-1 we account for
the root page table which really is a "sub page table" here.

Then we recursively go through the address space that we cover, checking
whether a page fits into inval/block PTEs or we need to create page
tables for it as well.

So at level=0, we really are checking for PTEs that you'd have at
level=0 (38 bits). If the <addr, level> combination at level=0 results
in "level", we need to create a level=1 page table and walk through that
one again.

I agree that the logic seems backwards, but it's the only thing that I
could come up with that has low memory footprint (required to run from
SRAM).


Alex

Does the diff below make it more obvious? It's slightly more runtime
overhead because we set the PTE_BLOCK case as well, but maintainability
of the code wins vs speed imho :).


 		struct mm_region *map = &mem_map[i];
@@ -264,15 +270,18 @@ static int count_required_pts(u64 addr, int level,
u64 maxaddr)

 			/* We need a sub-pt for this level */
 			if ((start & levelmask) || (end & levelmask)) {
-				is_level = true;
+				pte_type = PTE_LEVEL;
 				break;
 			}

 			/* Lv0 can not do block PTEs, so do levels here too */
 			if (level <= 0) {
-				is_level = true;
+				pte_type = PTE_LEVEL;
 				break;
 			}
+
+			/* PTE is active, but fits into a block */
+			pte_type = PTE_BLOCK;
 		}
 	}

@@ -280,7 +289,7 @@ static int count_required_pts(u64 addr, int level,
u64 maxaddr)
 	 * Block PTEs at this level are already covered by the parent page
 	 * table, so we only need to count sub page tables.
 	 */
-	if (is_level) {
+	if (pte_type == PTE_LEVEL) {
 		int sublevel = level + 1;
 		u64 sublevelsize = 1ULL << level2shift(sublevel);

Comments

Stephen Warren Feb. 26, 2016, 7:25 p.m. UTC | #1
On 02/25/2016 09:36 AM, Alexander Graf wrote:
>
>
> On 24.02.16 19:14, Stephen Warren wrote:
>> On 02/24/2016 05:11 AM, Alexander Graf wrote:
>>> The idea to generate our pages tables from an array of memory ranges
>>> is very sound. However, instead of hard coding the code to create up
>>> to 2 levels of 64k granule page tables, we really should just create
>>> normal 4k page tables that allow us to set caching attributes on 2M
>>> or 4k level later on.
>>>
>>> So this patch moves the full_va mapping code to 4k page size and
>>> makes it fully flexible to dynamically create as many levels as
>>> necessary for a map (including dynamic 1G/2M pages). It also adds
>>> support to dynamically split a large map into smaller ones when
>>> some code wants to set dcache attributes.
>>>
>>> With all this in place, there is very little reason to create your
>>> own page tables in board specific files.
>>>
>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>
>>> +/*
>>> + * This is a recursively called function to count the number of
>>> + * page tables we need to cover a particular PTE range. If you
>>> + * call this with level = -1 you basically get the full 48 bit
>>> + * coverage.
>>> + */
>>> +static int count_required_pts(u64 addr, int level, u64 maxaddr)
>>
>> I think this looks correct now. Nits below if a respin is needed for
>> other reasons.
>>
>>> +{
>>> +    int levelshift = level2shift(level);
>>> +    u64 levelsize = 1ULL << levelshift;
>>> +    u64 levelmask = levelsize - 1;
>>> +    u64 levelend = addr + levelsize;
>>> +    int r = 0;
>>> +    int i;
>>> +    bool is_level = false;
>>
>> I might suggest renaming that is_level_needed. It's not obvious to me
>> exactly what the name "is_level" is intended to represent; the name
>> seems to represent whether something (unspecified) is a level or not.
>
> We're basically asking the function whether a PTE for address <addr> at
> level <level> would be an inval/block/level PTE. is_level marks it as a
> level pte.
>
> I could maybe rename this into pte_type and create an enum that is
> either PTE_INVAL, PTE_BLOCK or PTE_LEVEL.
>
>>
>>> +
>>> +    for (i = 0; i < ARRAY_SIZE(mem_map); i++) {
>>> +        struct mm_region *map = &mem_map[i];
>>> +        u64 start = map->base;
>>> +        u64 end = start + map->size;
>>> +
>>> +        /* Check if the PTE would overlap with the map */
>>> +        if (max(addr, start) <= min(levelend, end)) {
>>> +            start = max(addr, start);
>>> +            end = min(levelend, end);
>>> +
>>> +            /* We need a sub-pt for this level */
>>> +            if ((start & levelmask) || (end & levelmask)) {
>>> +                is_level = true;
>>> +                break;
>>> +            }
>>> +
>>> +            /* Lv0 can not do block PTEs, so do levels here too */
>>> +            if (level <= 0) {
>>> +                is_level = true;
>>> +                break;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    /*
>>> +     * Block PTEs at this level are already covered by the parent page
>>> +     * table, so we only need to count sub page tables.
>>> +     */
>>> +    if (is_level) {
>>> +        int sublevel = level + 1;
>>> +        u64 sublevelsize = 1ULL << level2shift(sublevel);
>>> +
>>> +        /* Account for the new sub page table ... */
>>> +        r = 1;
>>
>> "Account for the page table at /this/ level"? This may represent the
>> top-level (0/1) page table, not just sub page tables. The sub-level
>> accounting is via the recursive call to count_required_pts() I believe.
>
> I think the easiest way to visualize what's going on is if we start with
> level -1.
>
> We basically ask the function at level -1 whether a PTE at level -1 (48
> bits) would fit into a block PTE at level -1 (so the PTE spans all 48
> bits) or whether we need to create a new level entry plus page table for it.

Hmm, I had overlooked that the call to count_required_pts() passed 
"start_level - 1" not "start_level". Now that I see that, your 
explanation makes more sense to me.

It'd be nice if some/all of this explanation were comments in the code 
to aid readers.

That said, I would have expected a slightly more direct calculation; why 
not:

int start_level = 0; // or 1 if small VA space
total_pts = count_required_pts(start_level);
total_pts += 4; // "random" number to account for later splits

int count_required_pts(int level, ...) {
     int npts = 1; // the table at this level
     for each pte slot at this level:
         if a mem_map entry starts/stops within the pte slot:
             npts += count_required_pts(level + 1, ...);
         else:
             // nothing; pte is either a block or inval at this level
     return npts;
}

Still, the current code appears to work find, and can be ammended later 
if we want.

> Obviously for all entries this resolves into a "yes, create a level PTE
> entry and a new page table to point to". So at level=-1 we account for
> the root page table which really is a "sub page table" here.
>
> Then we recursively go through the address space that we cover, checking
> whether a page fits into inval/block PTEs or we need to create page
> tables for it as well.
>
> So at level=0, we really are checking for PTEs that you'd have at
> level=0 (38 bits). If the <addr, level> combination at level=0 results
> in "level", we need to create a level=1 page table and walk through that
> one again.
>
> I agree that the logic seems backwards, but it's the only thing that I
> could come up with that has low memory footprint (required to run from
> SRAM).
Alexander Graf Feb. 27, 2016, 12:09 p.m. UTC | #2
On 26.02.16 20:25, Stephen Warren wrote:
> On 02/25/2016 09:36 AM, Alexander Graf wrote:
>>
>>
>> On 24.02.16 19:14, Stephen Warren wrote:
>>> On 02/24/2016 05:11 AM, Alexander Graf wrote:
>>>> The idea to generate our pages tables from an array of memory ranges
>>>> is very sound. However, instead of hard coding the code to create up
>>>> to 2 levels of 64k granule page tables, we really should just create
>>>> normal 4k page tables that allow us to set caching attributes on 2M
>>>> or 4k level later on.
>>>>
>>>> So this patch moves the full_va mapping code to 4k page size and
>>>> makes it fully flexible to dynamically create as many levels as
>>>> necessary for a map (including dynamic 1G/2M pages). It also adds
>>>> support to dynamically split a large map into smaller ones when
>>>> some code wants to set dcache attributes.
>>>>
>>>> With all this in place, there is very little reason to create your
>>>> own page tables in board specific files.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>
>>>> +/*
>>>> + * This is a recursively called function to count the number of
>>>> + * page tables we need to cover a particular PTE range. If you
>>>> + * call this with level = -1 you basically get the full 48 bit
>>>> + * coverage.
>>>> + */
>>>> +static int count_required_pts(u64 addr, int level, u64 maxaddr)
>>>
>>> I think this looks correct now. Nits below if a respin is needed for
>>> other reasons.
>>>
>>>> +{
>>>> +    int levelshift = level2shift(level);
>>>> +    u64 levelsize = 1ULL << levelshift;
>>>> +    u64 levelmask = levelsize - 1;
>>>> +    u64 levelend = addr + levelsize;
>>>> +    int r = 0;
>>>> +    int i;
>>>> +    bool is_level = false;
>>>
>>> I might suggest renaming that is_level_needed. It's not obvious to me
>>> exactly what the name "is_level" is intended to represent; the name
>>> seems to represent whether something (unspecified) is a level or not.
>>
>> We're basically asking the function whether a PTE for address <addr> at
>> level <level> would be an inval/block/level PTE. is_level marks it as a
>> level pte.
>>
>> I could maybe rename this into pte_type and create an enum that is
>> either PTE_INVAL, PTE_BLOCK or PTE_LEVEL.
>>
>>>
>>>> +
>>>> +    for (i = 0; i < ARRAY_SIZE(mem_map); i++) {
>>>> +        struct mm_region *map = &mem_map[i];
>>>> +        u64 start = map->base;
>>>> +        u64 end = start + map->size;
>>>> +
>>>> +        /* Check if the PTE would overlap with the map */
>>>> +        if (max(addr, start) <= min(levelend, end)) {
>>>> +            start = max(addr, start);
>>>> +            end = min(levelend, end);
>>>> +
>>>> +            /* We need a sub-pt for this level */
>>>> +            if ((start & levelmask) || (end & levelmask)) {
>>>> +                is_level = true;
>>>> +                break;
>>>> +            }
>>>> +
>>>> +            /* Lv0 can not do block PTEs, so do levels here too */
>>>> +            if (level <= 0) {
>>>> +                is_level = true;
>>>> +                break;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Block PTEs at this level are already covered by the parent page
>>>> +     * table, so we only need to count sub page tables.
>>>> +     */
>>>> +    if (is_level) {
>>>> +        int sublevel = level + 1;
>>>> +        u64 sublevelsize = 1ULL << level2shift(sublevel);
>>>> +
>>>> +        /* Account for the new sub page table ... */
>>>> +        r = 1;
>>>
>>> "Account for the page table at /this/ level"? This may represent the
>>> top-level (0/1) page table, not just sub page tables. The sub-level
>>> accounting is via the recursive call to count_required_pts() I believe.
>>
>> I think the easiest way to visualize what's going on is if we start with
>> level -1.
>>
>> We basically ask the function at level -1 whether a PTE at level -1 (48
>> bits) would fit into a block PTE at level -1 (so the PTE spans all 48
>> bits) or whether we need to create a new level entry plus page table
>> for it.
> 
> Hmm, I had overlooked that the call to count_required_pts() passed
> "start_level - 1" not "start_level". Now that I see that, your
> explanation makes more sense to me.
> 
> It'd be nice if some/all of this explanation were comments in the code
> to aid readers.
> 
> That said, I would have expected a slightly more direct calculation; why
> not:
> 
> int start_level = 0; // or 1 if small VA space
> total_pts = count_required_pts(start_level);

We need to account for the count twice, to have an (immutable) copy of
our page tables around when we split them.

> total_pts += 4; // "random" number to account for later splits
> 
> int count_required_pts(int level, ...) {

... includes the address, as I have it today, I guess?

>     int npts = 1; // the table at this level
>     for each pte slot at this level:
>         if a mem_map entry starts/stops within the pte slot:
>             npts += count_required_pts(level + 1, ...);

This means we would count some levels multiple times. Imagine two
entries from

	1G - 1.25G
	1.25G - 1.5G

With your pseudo-code, we would count for both cases while checking for
1G page table entries. Hence we need to jump out of the loop here and do
the counting outside.

At which point this is basically my code, right? :)


Alex

>         else:
>             // nothing; pte is either a block or inval at this level
>     return npts;
> }
> 
> Still, the current code appears to work find, and can be ammended later
> if we want.
Stephen Warren Feb. 29, 2016, 4:52 p.m. UTC | #3
On 02/27/2016 05:09 AM, Alexander Graf wrote:
>
>
> On 26.02.16 20:25, Stephen Warren wrote:
>> On 02/25/2016 09:36 AM, Alexander Graf wrote:
>>>
>>>
>>> On 24.02.16 19:14, Stephen Warren wrote:
>>>> On 02/24/2016 05:11 AM, Alexander Graf wrote:
>>>>> The idea to generate our pages tables from an array of memory ranges
>>>>> is very sound. However, instead of hard coding the code to create up
>>>>> to 2 levels of 64k granule page tables, we really should just create
>>>>> normal 4k page tables that allow us to set caching attributes on 2M
>>>>> or 4k level later on.
>>>>>
>>>>> So this patch moves the full_va mapping code to 4k page size and
>>>>> makes it fully flexible to dynamically create as many levels as
>>>>> necessary for a map (including dynamic 1G/2M pages). It also adds
>>>>> support to dynamically split a large map into smaller ones when
>>>>> some code wants to set dcache attributes.
>>>>>
>>>>> With all this in place, there is very little reason to create your
>>>>> own page tables in board specific files.
>>>>>
>>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>>
>>>>> +/*
>>>>> + * This is a recursively called function to count the number of
>>>>> + * page tables we need to cover a particular PTE range. If you
>>>>> + * call this with level = -1 you basically get the full 48 bit
>>>>> + * coverage.
>>>>> + */
>>>>> +static int count_required_pts(u64 addr, int level, u64 maxaddr)
>>>>
>>>> I think this looks correct now. Nits below if a respin is needed for
>>>> other reasons.
>>>>
>>>>> +{
>>>>> +    int levelshift = level2shift(level);
>>>>> +    u64 levelsize = 1ULL << levelshift;
>>>>> +    u64 levelmask = levelsize - 1;
>>>>> +    u64 levelend = addr + levelsize;
>>>>> +    int r = 0;
>>>>> +    int i;
>>>>> +    bool is_level = false;
>>>>
>>>> I might suggest renaming that is_level_needed. It's not obvious to me
>>>> exactly what the name "is_level" is intended to represent; the name
>>>> seems to represent whether something (unspecified) is a level or not.
>>>
>>> We're basically asking the function whether a PTE for address <addr> at
>>> level <level> would be an inval/block/level PTE. is_level marks it as a
>>> level pte.
>>>
>>> I could maybe rename this into pte_type and create an enum that is
>>> either PTE_INVAL, PTE_BLOCK or PTE_LEVEL.
>>>
>>>>
>>>>> +
>>>>> +    for (i = 0; i < ARRAY_SIZE(mem_map); i++) {
>>>>> +        struct mm_region *map = &mem_map[i];
>>>>> +        u64 start = map->base;
>>>>> +        u64 end = start + map->size;
>>>>> +
>>>>> +        /* Check if the PTE would overlap with the map */
>>>>> +        if (max(addr, start) <= min(levelend, end)) {
>>>>> +            start = max(addr, start);
>>>>> +            end = min(levelend, end);
>>>>> +
>>>>> +            /* We need a sub-pt for this level */
>>>>> +            if ((start & levelmask) || (end & levelmask)) {
>>>>> +                is_level = true;
>>>>> +                break;
>>>>> +            }
>>>>> +
>>>>> +            /* Lv0 can not do block PTEs, so do levels here too */
>>>>> +            if (level <= 0) {
>>>>> +                is_level = true;
>>>>> +                break;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    /*
>>>>> +     * Block PTEs at this level are already covered by the parent page
>>>>> +     * table, so we only need to count sub page tables.
>>>>> +     */
>>>>> +    if (is_level) {
>>>>> +        int sublevel = level + 1;
>>>>> +        u64 sublevelsize = 1ULL << level2shift(sublevel);
>>>>> +
>>>>> +        /* Account for the new sub page table ... */
>>>>> +        r = 1;
>>>>
>>>> "Account for the page table at /this/ level"? This may represent the
>>>> top-level (0/1) page table, not just sub page tables. The sub-level
>>>> accounting is via the recursive call to count_required_pts() I believe.
>>>
>>> I think the easiest way to visualize what's going on is if we start with
>>> level -1.
>>>
>>> We basically ask the function at level -1 whether a PTE at level -1 (48
>>> bits) would fit into a block PTE at level -1 (so the PTE spans all 48
>>> bits) or whether we need to create a new level entry plus page table
>>> for it.
>>
>> Hmm, I had overlooked that the call to count_required_pts() passed
>> "start_level - 1" not "start_level". Now that I see that, your
>> explanation makes more sense to me.
>>
>> It'd be nice if some/all of this explanation were comments in the code
>> to aid readers.
>>
>> That said, I would have expected a slightly more direct calculation; why
>> not:
>>
>> int start_level = 0; // or 1 if small VA space
>> total_pts = count_required_pts(start_level);
>
> We need to account for the count twice, to have an (immutable) copy of
> our page tables around when we split them.

I think that's just a hard-coded "* 2" there, unless I'm missing your point?

>> total_pts += 4; // "random" number to account for later splits
>>
>> int count_required_pts(int level, ...) {
>
> ... includes the address, as I have it today, I guess?
>
>>      int npts = 1; // the table at this level
>>      for each pte slot at this level:
>>          if a mem_map entry starts/stops within the pte slot:
>>              npts += count_required_pts(level + 1, ...);
>
> This means we would count some levels multiple times. Imagine two
> entries from
>
> 	1G - 1.25G
> 	1.25G - 1.5G
>
> With your pseudo-code, we would count for both cases while checking for
> 1G page table entries. Hence we need to jump out of the loop here and do
> the counting outside.

I don't think so; "if a mem_map entry starts/stops within the pte slot" 
is a single decision for that PT entry at the current level, not a loop 
that recurses once per memory block. Hence, you'd only ever recurse 
once, so there's no double-counting.

>
> At which point this is basically my code, right? :)
>
>
> Alex
>
>>          else:
>>              // nothing; pte is either a block or inval at this level
>>      return npts;
>> }
>>
>> Still, the current code appears to work find, and can be ammended later
>> if we want.
>
Alexander Graf Feb. 29, 2016, 9:05 p.m. UTC | #4
On 29.02.16 17:52, Stephen Warren wrote:
> On 02/27/2016 05:09 AM, Alexander Graf wrote:
>>
>>
>> On 26.02.16 20:25, Stephen Warren wrote:
>>> On 02/25/2016 09:36 AM, Alexander Graf wrote:
>>>>
>>>>
>>>> On 24.02.16 19:14, Stephen Warren wrote:
>>>>> On 02/24/2016 05:11 AM, Alexander Graf wrote:
>>>>>> The idea to generate our pages tables from an array of memory ranges
>>>>>> is very sound. However, instead of hard coding the code to create up
>>>>>> to 2 levels of 64k granule page tables, we really should just create
>>>>>> normal 4k page tables that allow us to set caching attributes on 2M
>>>>>> or 4k level later on.
>>>>>>
>>>>>> So this patch moves the full_va mapping code to 4k page size and
>>>>>> makes it fully flexible to dynamically create as many levels as
>>>>>> necessary for a map (including dynamic 1G/2M pages). It also adds
>>>>>> support to dynamically split a large map into smaller ones when
>>>>>> some code wants to set dcache attributes.
>>>>>>
>>>>>> With all this in place, there is very little reason to create your
>>>>>> own page tables in board specific files.
>>>>>>
>>>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>>>
>>>>>> +/*
>>>>>> + * This is a recursively called function to count the number of
>>>>>> + * page tables we need to cover a particular PTE range. If you
>>>>>> + * call this with level = -1 you basically get the full 48 bit
>>>>>> + * coverage.
>>>>>> + */
>>>>>> +static int count_required_pts(u64 addr, int level, u64 maxaddr)
>>>>>
>>>>> I think this looks correct now. Nits below if a respin is needed for
>>>>> other reasons.
>>>>>
>>>>>> +{
>>>>>> +    int levelshift = level2shift(level);
>>>>>> +    u64 levelsize = 1ULL << levelshift;
>>>>>> +    u64 levelmask = levelsize - 1;
>>>>>> +    u64 levelend = addr + levelsize;
>>>>>> +    int r = 0;
>>>>>> +    int i;
>>>>>> +    bool is_level = false;
>>>>>
>>>>> I might suggest renaming that is_level_needed. It's not obvious to me
>>>>> exactly what the name "is_level" is intended to represent; the name
>>>>> seems to represent whether something (unspecified) is a level or not.
>>>>
>>>> We're basically asking the function whether a PTE for address <addr> at
>>>> level <level> would be an inval/block/level PTE. is_level marks it as a
>>>> level pte.
>>>>
>>>> I could maybe rename this into pte_type and create an enum that is
>>>> either PTE_INVAL, PTE_BLOCK or PTE_LEVEL.
>>>>
>>>>>
>>>>>> +
>>>>>> +    for (i = 0; i < ARRAY_SIZE(mem_map); i++) {
>>>>>> +        struct mm_region *map = &mem_map[i];
>>>>>> +        u64 start = map->base;
>>>>>> +        u64 end = start + map->size;
>>>>>> +
>>>>>> +        /* Check if the PTE would overlap with the map */
>>>>>> +        if (max(addr, start) <= min(levelend, end)) {
>>>>>> +            start = max(addr, start);
>>>>>> +            end = min(levelend, end);
>>>>>> +
>>>>>> +            /* We need a sub-pt for this level */
>>>>>> +            if ((start & levelmask) || (end & levelmask)) {
>>>>>> +                is_level = true;
>>>>>> +                break;
>>>>>> +            }
>>>>>> +
>>>>>> +            /* Lv0 can not do block PTEs, so do levels here too */
>>>>>> +            if (level <= 0) {
>>>>>> +                is_level = true;
>>>>>> +                break;
>>>>>> +            }
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Block PTEs at this level are already covered by the parent
>>>>>> page
>>>>>> +     * table, so we only need to count sub page tables.
>>>>>> +     */
>>>>>> +    if (is_level) {
>>>>>> +        int sublevel = level + 1;
>>>>>> +        u64 sublevelsize = 1ULL << level2shift(sublevel);
>>>>>> +
>>>>>> +        /* Account for the new sub page table ... */
>>>>>> +        r = 1;
>>>>>
>>>>> "Account for the page table at /this/ level"? This may represent the
>>>>> top-level (0/1) page table, not just sub page tables. The sub-level
>>>>> accounting is via the recursive call to count_required_pts() I
>>>>> believe.
>>>>
>>>> I think the easiest way to visualize what's going on is if we start
>>>> with
>>>> level -1.
>>>>
>>>> We basically ask the function at level -1 whether a PTE at level -1 (48
>>>> bits) would fit into a block PTE at level -1 (so the PTE spans all 48
>>>> bits) or whether we need to create a new level entry plus page table
>>>> for it.
>>>
>>> Hmm, I had overlooked that the call to count_required_pts() passed
>>> "start_level - 1" not "start_level". Now that I see that, your
>>> explanation makes more sense to me.
>>>
>>> It'd be nice if some/all of this explanation were comments in the code
>>> to aid readers.
>>>
>>> That said, I would have expected a slightly more direct calculation; why
>>> not:
>>>
>>> int start_level = 0; // or 1 if small VA space
>>> total_pts = count_required_pts(start_level);
>>
>> We need to account for the count twice, to have an (immutable) copy of
>> our page tables around when we split them.
> 
> I think that's just a hard-coded "* 2" there, unless I'm missing your
> point?

Yes :).

> 
>>> total_pts += 4; // "random" number to account for later splits
>>>
>>> int count_required_pts(int level, ...) {
>>
>> ... includes the address, as I have it today, I guess?
>>
>>>      int npts = 1; // the table at this level
>>>      for each pte slot at this level:
>>>          if a mem_map entry starts/stops within the pte slot:
>>>              npts += count_required_pts(level + 1, ...);
>>
>> This means we would count some levels multiple times. Imagine two
>> entries from
>>
>>     1G - 1.25G
>>     1.25G - 1.5G
>>
>> With your pseudo-code, we would count for both cases while checking for
>> 1G page table entries. Hence we need to jump out of the loop here and do
>> the counting outside.
> 
> I don't think so; "if a mem_map entry starts/stops within the pte slot"
> is a single decision for that PT entry at the current level, not a loop
> that recurses once per memory block. Hence, you'd only ever recurse
> once, so there's no double-counting.

Ah, "if a" means another loop over all mem_maps, eventually saying "yes,
I am overlapping".

Either way, I am not fully convinced the code would end up much more
readable than it is today, but I'll be happy to get persuaded otherwise :).

And yes, that's definitely something we can still work on down the road
in future follow-up patches.


Alex
diff mbox

Patch

diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
index 7236455..e8f2632 100644
--- a/arch/arm/cpu/armv8/cache_v8.c
+++ b/arch/arm/cpu/armv8/cache_v8.c
@@ -236,6 +236,12 @@  static void split_block(u64 *pte, int level)
 	set_pte_table(pte, new_table);
 }

+enum pte_type {
+	PTE_INVAL,
+	PTE_BLOCK,
+	PTE_LEVEL,
+};
+
 /*
  * This is a recursively called function to count the number of
  * page tables we need to cover a particular PTE range. If you
@@ -250,7 +256,7 @@  static int count_required_pts(u64 addr, int level,
u64 maxaddr)
 	u64 levelend = addr + levelsize;
 	int r = 0;
 	int i;
-	bool is_level = false;
+	enum pte_type pte_type = PTE_INVAL;

 	for (i = 0; mem_map[i].size || mem_map[i].attrs; i++) {