diff mbox series

zram-swap: enable swap discard

Message ID 20200623112953.3454-1-rsalvaterra@gmail.com
State Accepted
Delegated to: Petr Štetiar
Headers show
Series zram-swap: enable swap discard | expand

Commit Message

Rui Salvaterra June 23, 2020, 11:29 a.m. UTC
Zram block devices have supported trim/discard for over six years, let's
enable it. This allows the zram device to actually free up allocated memory
when it's marked as unused in the filesystem metadata, as explained in more
detail in the original commit message [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/block/zram/zram_drv.c?h=linux-4.14.y&id=f4659d8e620d08bd1a84a8aec5d2f5294a242764

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
---
 package/system/zram-swap/files/zram.init | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Rui Salvaterra June 24, 2020, 10:39 p.m. UTC | #1
Hi, Emil,

On Wed, 24 Jun 2020 at 20:30, Emil Muratov <gpm@hotplug.ru> wrote:
>
> zram swap discard is enabled by default, no need to specify additional
> option. Pls, check for example info output in
> https://github.com/openwrt/openwrt/pull/1515

Unfortunately, that's not quite right. This has nothing to do with
zram itself, but with the way swapon syscall works. Yes, both discard
*policies* are enabled by default if, *and only if* the syscall is
invoked with *only* the SWAP_FLAG_DISCARD flag set [1].
Anyway, it's easy to see if the swap device is issuing trim/discard
commands. For example, on my router, with my patch:

root@heimdal:~# dmesg | grep swap
[   12.302928] Adding 1036284k swap on /dev/zram0.  Priority:-2
extents:1 across:1036284k SSDsc
root@heimdal:~#

According to the relevant section of the kernel source code [2], The
last letters, SSDsc mean:
SS: solid state (obviously, it's a RAM-based device)
D: supports discard
s: area-based discard is enabled
c: page-based discard is enabled

Consequently, if a system only shows SS, discard commands aren't being issued.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/swap.h?h=linux-4.14.y#n24
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/swapfile.c?h=linux-4.14.y#n3283

Cheers,
Rui
Rui Salvaterra June 25, 2020, 9:55 a.m. UTC | #2
Hi, Emil,

On Thu, 25 Jun 2020 at 10:05, Emil Muratov <gpm@hotplug.ru> wrote:
>
> Hmm... that looks weird at first,
> I was checking kernel counters at /sys/block/zram0/io_stat,
> which is zram device for swap. And if there was positive counter for
> discards for zram device while swap was not sending any discards than
> what is that counter is about?

The number you see in io_stat doesn't reflect only pages freed due to
discard/trim requests. From the zram documentation itself [1]:

File /sys/block/zram<id>/io_stat

The stat file represents device's I/O statistics not accounted by block
layer and, thus, not available in zram<id>/stat file. It consists of a
single line of text and contains the following stats separated by
whitespace:
 failed_reads     the number of failed reads
 failed_writes    the number of failed writes
 invalid_io       the number of non-page-size-aligned I/O requests
 notify_free      Depending on device usage scenario it may account
                  a) the number of pages freed because of swap slot free
                  notifications or b) the number of pages freed because of
                  REQ_OP_DISCARD requests sent by bio. The former ones are
                  sent to a swap block device when a swap slot is freed,
                  which implies that this disk is being used as a swap disk.
                  The latter ones are sent by filesystem mounted with
                  discard option, whenever some data blocks are getting
                  discarded.

What you are seeing is the number of freed pages due to swap slot free
notifications. This means the swap "filesystem" is marking the swap
slots as free (metadata), but the underlying zram device is not
returning the memory to the kernel.

> Have you checked if stressing swap with some push/pull paging load
> indeed keeps ram occupied at max level?
> Analyzing /sys/block/$zdev/mm_stat could also help there. I do not
> remember mem leaks issues when tested stats script, but who knows...
> Need to take a closer look into this.
> Thanks!

The thing is, there are no memory leaks from the kernel point of view.
The kernel thinks zram needs the memory it has allocated, the problem
is that it actually doesn't and also doesn't tell the kernel that it
can reclaim it.
So, indeed, we need to explicitly tell swapon we want to issue discard commands.

[1] https://www.kernel.org/doc/Documentation/blockdev/zram.txt

Thanks,
Rui
Rui Salvaterra June 25, 2020, 1:32 p.m. UTC | #3
Hi, Emil,

On Thu, 25 Jun 2020 at 13:57, Emil Muratov <gpm@hotplug.ru> wrote:
>
> Could you, please, describe you test case scenarios and measurements
> where you've hit memory waste for zram dev when swap shrinks?

I don't have a specific procedure to test this, but I believe I showed
you proof that it's a good idea. I'm sure the reason discard isn't the
default is because people actually use swap on SSD, and repeated
discard commands will wear the flash storage too much.
If you want to merge it, fine. If you don't, it's fine too, I'll just
keep it in my tree.

Cheers,
Rui
diff mbox series

Patch

diff --git a/package/system/zram-swap/files/zram.init b/package/system/zram-swap/files/zram.init
index 49140ad406..d33e779850 100755
--- a/package/system/zram-swap/files/zram.init
+++ b/package/system/zram-swap/files/zram.init
@@ -182,7 +182,7 @@  start()
 	zram_comp_streams "$zram_dev"
 	echo $(( $zram_size * 1024 * 1024 )) >"/sys/block/$( basename "$zram_dev" )/disksize"
 	mkswap "$zram_dev"
-	swapon $zram_priority "$zram_dev"
+	swapon -d $zram_priority "$zram_dev"
 }
 
 stop()