Message ID: 4926D022.5060008@cosmosbay.com
State: RFC, archived
Delegated to: David Miller
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Before patch, time to run 8 millions of close(socket()) calls on 8
> CPUS was :
>
> real 0m27.496s
> user 0m0.657s
> sys 3m39.092s
>
> After patch :
>
> real 0m23.997s
> user 0m0.682s
> sys 3m11.193s

cool :-)

What would it take to get it down to:

>> Cost if run one one cpu :
>>
>> real 0m1.561s
>> user 0m0.092s
>> sys 0m1.469s

i guess asking for a wall-clock cost of 1.561/8 would be too much? :)

	Ingo
Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>> Before patch, time to run 8 millions of close(socket()) calls on 8
>> CPUS was :
>>
>> real 0m27.496s
>> user 0m0.657s
>> sys 3m39.092s
>>
>> After patch :
>>
>> real 0m23.997s
>> user 0m0.682s
>> sys 3m11.193s
>
> cool :-)
>
> What would it take to get it down to:
>
>>> Cost if run one one cpu :
>>>
>>> real 0m1.561s
>>> user 0m0.092s
>>> sys 0m1.469s
>
> i guess asking for a wall-clock cost of 1.561/8 would be too much? :)

It might be possible, depending on the level of hackery I am allowed
to inject in fs/dcache.c and fs/inode.c :)

wall cost of 1.56 (each cpu runs one loop of one million iterations)
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Ingo Molnar wrote:
>> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>>
>>> Before patch, time to run 8 millions of close(socket()) calls on 8
>>> CPUS was :
>>>
>>> real 0m27.496s
>>> user 0m0.657s
>>> sys 3m39.092s
>>>
>>> After patch :
>>>
>>> real 0m23.997s
>>> user 0m0.682s
>>> sys 3m11.193s
>>
>> cool :-)
>>
>> What would it take to get it down to:
>>
>>>> Cost if run one one cpu :
>>>>
>>>> real 0m1.561s
>>>> user 0m0.092s
>>>> sys 0m1.469s
>>
>> i guess asking for a wall-clock cost of 1.561/8 would be too much? :)
>
> It might be possible, depending on the level of hackery I am allowed
> to inject in fs/dcache.c and fs/inode.c :)

I think being able to open+close sockets in a scalable way is an
undisputed prime-time workload on Linux. The numbers you showed look
horrible. Once you can show how much faster it could go via hacks, it
should only be a matter of time to achieve that safely and cleanly.

> wall cost of 1.56 (each cpu runs one loop of one million iterations)

(indeed.)

	Ingo
On Fri, Nov 21, 2008 at 04:13:38PM +0100, Eric Dumazet wrote:
> [PATCH] fs: pipe/sockets/anon dentries should not have a parent
>
> Linking pipe/sockets/anon dentries to one root 'parent' has no functional
> impact at all, but a scalability one.
>
> We can avoid touching a cache line at allocation stage (inside d_alloc(), no need
> to touch root->d_count), but also at freeing time (in d_kill, decrementing d_count)
> We avoid an expensive atomic_dec_and_lock() call on the root dentry.
>
> If we correct dnotify_parent() and inotify_d_instantiate() to take into account
> a NULL d_parent, we can call d_alloc() with a NULL parent instead of root dentry.

Sorry folks, but a NULL d_parent is a no-go from the VFS perspective;
you can, however, set d_parent to the dentry itself, which is the magic
used for root of tree dentries. They should also be marked
DCACHE_DISCONNECTED to make sure this is not unexpected.

And this kind of stuff really needs to go through -fsdevel.
Hi all
Short summary : Nice speedups for allocation/deallocation of sockets/pipes
(from 27.5 seconds to 1.6 seconds)
Long version :
To allocate a socket or a pipe we :
0) Do the usual file table manipulation (pretty scalable these days,
but it would be faster if 'struct file' were using SLAB_DESTROY_BY_RCU
and avoided the call_rcu() cache killer)
1) allocate an inode with new_inode()
This function :
- locks inode_lock,
- dirties nr_inodes counter
- dirties inode_in_use list (for sockets/pipes, this is useless)
- dirties superblock s_inodes.
- dirties last_ino counter
All these are in different cache lines, unfortunately.
2) allocate a dentry
d_alloc() takes dcache_lock,
inserts the dentry on its parent's list (dirtying sock_mnt->mnt_sb->s_root)
and dirties nr_dentry
3) d_instantiate() dentry (dcache_lock taken again)
4) init_file() -> atomic_inc() on sock_mnt->refcount
At close() time, we must undo all of this. It is even more expensive because
of the _atomic_dec_and_lock() call, which causes a lot of stress, and because
two cache lines are touched when an element is deleted from a list
(the previous and next items).
This is really bad, since sockets/pipes don't need to be visible in the dcache
or in a per-superblock inode list.
This patch series gets rid of all the contended cache lines for sockets, pipes
and anonymous fd (signalfd, timerfd, ...)
Sample program :
for (i = 0; i < 1000000; i++)
close(socket(AF_INET, SOCK_STREAM, 0));
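For reference, a self-contained single-process version of this loop could look
like the following (illustrative; the full multi-process socketallocbench
source appears later in this thread) :

/* Minimal, self-contained version of the sample loop above. */
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int i;

	for (i = 0; i < 1000000; i++)
		close(socket(AF_INET, SOCK_STREAM, 0));
	return 0;
}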
Cost if one cpu runs the program :
real 1.561s
user 0.092s
sys 1.469s
Cost if 8 processes are launched on a 8 CPU machine
(benchmark named socket8) :
real 27.496s <<<< !!!! >>>>
user 0.657s
sys 3m39.092s
Oprofile results (for the 8 process run, 3 times):
CPU: Core 2, speed 3000.03 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples cum. samples % cum. % symbol name
3347352 3347352 28.0232 28.0232 _atomic_dec_and_lock
3301428 6648780 27.6388 55.6620 d_instantiate
2971130 9619910 24.8736 80.5355 d_alloc
241318 9861228 2.0203 82.5558 init_file
146190 10007418 1.2239 83.7797 __slab_free
144149 10151567 1.2068 84.9864 inotify_d_instantiate
143971 10295538 1.2053 86.1917 inet_create
137168 10432706 1.1483 87.3401 new_inode
117549 10550255 0.9841 88.3242 add_partial
110795 10661050 0.9275 89.2517 generic_drop_inode
107137 10768187 0.8969 90.1486 kmem_cache_alloc
94029 10862216 0.7872 90.9358 tcp_close
82837 10945053 0.6935 91.6293 dput
67486 11012539 0.5650 92.1943 dentry_iput
57751 11070290 0.4835 92.6778 iput
54327 11124617 0.4548 93.1326 tcp_v4_init_sock
49921 11174538 0.4179 93.5505 sysenter_past_esp
47616 11222154 0.3986 93.9491 kmem_cache_free
30792 11252946 0.2578 94.2069 clear_inode
27540 11280486 0.2306 94.4375 copy_from_user
26509 11306995 0.2219 94.6594 init_timer
26363 11333358 0.2207 94.8801 discard_slab
25284 11358642 0.2117 95.0918 __fput
22482 11381124 0.1882 95.2800 __percpu_counter_add
20369 11401493 0.1705 95.4505 sock_alloc
18501 11419994 0.1549 95.6054 inet_csk_destroy_sock
17923 11437917 0.1500 95.7555 sys_close
This patch series avoids all the contended cache lines and makes this "bench"
pretty fast.
New cost if run on one cpu :
real 1.325s (instead of 1.561s)
user 0.091s
sys 1.234s
If run on 8 CPUS :
real 2.229s <<<< instead of 27.496s >>>
user 0.695s
sys 16.903s
Oprofile results (for the 8 process run, 3 times):
CPU: Core 2, speed 2999.74 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples cum. samples % cum. % symbol name
143791 143791 11.7849 11.7849 __slab_free
128404 272195 10.5238 22.3087 add_partial
99150 371345 8.1262 30.4349 kmem_cache_alloc
52031 423376 4.2644 34.6993 sysenter_past_esp
47752 471128 3.9137 38.6130 kmem_cache_free
47429 518557 3.8872 42.5002 tcp_close
34376 552933 2.8174 45.3176 __percpu_counter_add
29046 581979 2.3806 47.6982 copy_from_user
28249 610228 2.3152 50.0134 init_timer
26220 636448 2.1490 52.1624 __slab_alloc
23402 659850 1.9180 54.0803 discard_slab
20560 680410 1.6851 55.7654 __call_rcu
18288 698698 1.4989 57.2643 d_alloc
16425 715123 1.3462 58.6104 get_empty_filp
16237 731360 1.3308 59.9412 __fput
15729 747089 1.2891 61.2303 alloc_fd
15021 762110 1.2311 62.4614 alloc_inode
14690 776800 1.2040 63.6654 sys_close
14666 791466 1.2020 64.8674 inet_create
13638 805104 1.1178 65.9852 dput
12503 817607 1.0247 67.0099 iput_special
12231 829838 1.0024 68.0123 lock_sock_nested
12210 842048 1.0007 69.0130 fd_install
12137 854185 0.9947 70.0078 d_alloc_special
12058 866243 0.9883 70.9960 sock_init_data
11200 877443 0.9179 71.9140 release_sock
11114 888557 0.9109 72.8248 inotify_d_instantiate
The last point is that SLUB is hit hard, unless we
use slub_min_order=3 at boot, or we use Christoph Lameter's
patch (struct file RCU optimizations)
http://thread.gmane.org/gmane.linux.kernel/418615
If we boot machine with slub_min_order=3, SLUB overhead disappears.
New cost if run on one cpu :
real 1.307s
user 0.094s
sys 1.214s
If run on 8 CPUS :
real 1.625s <<<< instead of 27.496s or 2.229s >>>
user 0.771s
sys 12.061s
Oprofile results (for the 8 process run, 3 times):
CPU: Core 2, speed 3000.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples cum. samples % cum. % symbol name
108005 108005 11.0758 11.0758 kmem_cache_alloc
52023 160028 5.3349 16.4107 sysenter_past_esp
47363 207391 4.8570 21.2678 tcp_close
45430 252821 4.6588 25.9266 kmem_cache_free
36566 289387 3.7498 29.6764 __percpu_counter_add
36085 325472 3.7005 33.3769 __slab_free
29185 354657 2.9929 36.3698 copy_from_user
28210 382867 2.8929 39.2627 init_timer
25663 408530 2.6317 41.8944 d_alloc_special
22360 430890 2.2930 44.1874 cap_file_alloc_security
19237 450127 1.9727 46.1601 __call_rcu
19097 469224 1.9584 48.1185 d_alloc
16962 486186 1.7394 49.8580 alloc_fd
16315 502501 1.6731 51.5311 __fput
16102 518603 1.6512 53.1823 get_empty_filp
14954 533557 1.5335 54.7158 inet_create
14468 548025 1.4837 56.1995 alloc_inode
14198 562223 1.4560 57.6555 sys_close
13905 576128 1.4259 59.0814 dput
12262 588390 1.2575 60.3389 lock_sock_nested
12203 600593 1.2514 61.5903 sock_attach_fd
12147 612740 1.2457 62.8360 iput_special
12049 624789 1.2356 64.0716 fd_install
12033 636822 1.2340 65.3056 sock_init_data
11999 648821 1.2305 66.5361 release_sock
11231 660052 1.1517 67.6878 inotify_d_instantiate
11068 671120 1.1350 68.8228 inet_csk_destroy_sock
This patch series contains 6 patches, against the net-next-2.6 tree
(because this tree already contains network improvements on this
subject), but it should apply to other trees.
[PATCH 1/6] fs: Introduce a per_cpu nr_dentry
Adding a per_cpu nr_dentry avoids cache line ping pongs between
cpus to maintain this metric.
We centralize decrements of nr_dentry in d_free(),
and increments in d_alloc().
d_alloc() can avoid taking dcache_lock if parent is NULL
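For illustration, the per-cpu counting could be sketched like this (a
simplified sketch, not the actual patch; the helper names are made up) :

static DEFINE_PER_CPU(int, nr_dentry_cpu);

static inline void nr_dentry_inc(void)		/* called from d_alloc() */
{
	get_cpu_var(nr_dentry_cpu)++;		/* only touches this cpu's line */
	put_cpu_var(nr_dentry_cpu);
}

static inline void nr_dentry_dec(void)		/* called from d_free() */
{
	get_cpu_var(nr_dentry_cpu)--;
	put_cpu_var(nr_dentry_cpu);
}

static int nr_dentry_read(void)			/* slow path, e.g. /proc reporting */
{
	int cpu, sum = 0;

	for_each_possible_cpu(cpu)
		sum += per_cpu(nr_dentry_cpu, cpu);
	return sum;
}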
[PATCH 2/6] fs: Introduce special dentries for pipes, socket, anon fd
Sockets, pipes and anonymous fds have interesting properties.
Like other files, they use a dentry and an inode.
But dentries for these kinds of files are not hashed into the dcache,
since there is no way someone can lookup such a file in the vfs tree.
(/proc/{pid}/fd/{number} uses a different mechanism)
Still, allocating and freeing such dentries are expensive processes,
because we currently take dcache_lock inside d_alloc(), d_instantiate(),
and dput(). This lock is very contended on SMP machines.
This patch defines a new DCACHE_SPECIAL flag, to mark a dentry as
a special one (for sockets, pipes, anonymous fd), and a new
d_alloc_special(const struct qstr *name, struct inode *inode)
method, called by the three subsystems.
Internally, dput() can take a fast path to dput_special() for
special dentries.
Differences between a special dentry and a normal one are :
1) A special dentry has the DCACHE_SPECIAL flag
2) A special dentry's parent is itself
This avoids taking a reference on the 'root' dentry, which is shared
by too many dentries.
3) They are not hashed into global hash table
4) Their d_alias list is empty
Internally, dput() can avoid an expensive atomic_dec_and_lock()
for special dentries.
(socket8 bench result : from 27.5s to 25.5s)
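A rough sketch of what such a helper could look like (illustrative only, not
the code of the patch; it relies on d_alloc(NULL, ...) being allowed, as
introduced in patch 1/6) :

/* Illustrative sketch only -- not the actual patch. */
struct dentry *d_alloc_special(const struct qstr *name, struct inode *inode)
{
	struct dentry *dentry = d_alloc(NULL, name);	/* no parent : root->d_count
							 * and dcache_lock untouched */
	if (!dentry)
		return NULL;

	dentry->d_flags |= DCACHE_SPECIAL | DCACHE_UNHASHED;
	dentry->d_parent = dentry;	/* self-parented, like a root of tree dentry */
	dentry->d_inode = inode;	/* no d_instantiate() : d_alias stays empty,
					 * dcache_lock is not taken */
	return dentry;
}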
[PATCH 3/6] fs: Introduce a per_cpu last_ino allocator
new_inode() dirties a contended cache line to get inode numbers.
Solve this problem by providing each cpu with a per_cpu variable,
fed from the shared last_ino, but only once every 1024 allocations.
This reduces contention on the shared last_ino.
Note : the last_ino_get() method must be called with preemption disabled.
(socket8 bench result : 25.5s to 25s, almost no difference, but
this is because the inode_lock cost is still too heavy for the moment)
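The allocator could be sketched as follows (illustrative; the real patch may
differ in details such as batch handling) :

#define LAST_INO_BATCH 1024

static DEFINE_PER_CPU(unsigned int, cpu_last_ino);
static atomic_t shared_last_ino;

/* caller must have preemption disabled */
static unsigned int last_ino_get(void)
{
	unsigned int *p = &__get_cpu_var(cpu_last_ino);
	unsigned int res = *p;

	if (unlikely((res & (LAST_INO_BATCH - 1)) == 0)) {
		/* refill : the only place the shared counter is dirtied */
		res = (unsigned int)atomic_add_return(LAST_INO_BATCH,
						      &shared_last_ino) - LAST_INO_BATCH;
	}
	*p = ++res;
	return res;
}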
[PATCH 4/6] fs: Introduce a per_cpu nr_inodes
Avoids cache line ping pongs between cpus and prepares the next patch,
because updates of nr_inodes don't need inode_lock anymore.
(socket8 bench result : 25s to 20.5s)
[PATCH 5/6] fs: Introduce special inodes
The goal of this patch is to avoid touching inode_lock for socket/pipes/anonfd
inodes allocation/freeing.
In new_inode(), we test if the super block has the MS_SPECIAL flag set.
If yes, we don't put the inode on the "inode_in_use" list nor on the
"sb->s_inodes" list. As inode_lock was taken only to protect these lists,
we avoid it as well.
Using iput_special() from dput_special() avoids taking inode_lock
at freeing time.
This patch has a very noticeable effect, because we avoid dirtying
three contended cache lines in new_inode(), and five cache lines
in iput().
Note: Not sure if we can use MS_SPECIAL=MS_NOUSER, or if we
really need a different flag.
(socket8 bench result : from 20.5s to 2.94s)
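Roughly, the allocation side then looks like this (a hedged sketch;
new_inode_special() is an illustrative name, the patch itself tests
MS_SPECIAL inside new_inode() as described above) :

/* Sketch only : inode allocation for MS_SPECIAL superblocks
 * (sockets, pipes, anonymous fd). No inode_lock, no global lists. */
static struct inode *new_inode_special(struct super_block *sb)
{
	struct inode *inode = alloc_inode(sb);	/* existing helper in fs/inode.c */

	if (inode) {
		inode->i_state = 0;
		INIT_LIST_HEAD(&inode->i_list);		/* never on inode_in_use */
		INIT_LIST_HEAD(&inode->i_sb_list);	/* never on sb->s_inodes */
		preempt_disable();
		inode->i_ino = last_ino_get();		/* per_cpu allocator (patch 3) */
		preempt_enable();
	}
	return inode;
}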
[PATCH 6/6] fs: Introduce kern_mount_special() to mount special vfs
This function sets a flag (MNT_SPECIAL) on the vfsmount, to avoid
refcounting on permanent system mounts.
Use this function for sockets, pipes, anonymous fds.
(socket8 bench result : from 2.94s to 2.23s)
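A minimal sketch of the idea (the bodies are illustrative, not the patch
itself; kern_mount() and mnt_flags are the existing kernel facilities) :

/* Sketch : mark a permanent, kernel-internal mount so that mntget()/mntput()
 * on it never touch the shared mnt_count cache line. */
struct vfsmount *kern_mount_special(struct file_system_type *type)
{
	struct vfsmount *mnt = kern_mount(type);

	if (!IS_ERR(mnt))
		mnt->mnt_flags |= MNT_SPECIAL;
	return mnt;
}

static inline struct vfsmount *mntget(struct vfsmount *mnt)
{
	if (mnt && !(mnt->mnt_flags & MNT_SPECIAL))
		atomic_inc(&mnt->mnt_count);
	return mnt;
}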
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
Overall diffstat :
fs/anon_inodes.c | 19 +-----
fs/dcache.c | 106 ++++++++++++++++++++++++++++++++-------
fs/fs-writeback.c | 2
fs/inode.c | 101 +++++++++++++++++++++++++++++++------
fs/pipe.c | 28 +---------
fs/super.c | 9 +++
include/linux/dcache.h | 2
include/linux/fs.h | 8 ++
include/linux/mount.h | 5 +
kernel/sysctl.c | 6 +-
mm/page-writeback.c | 2
net/socket.c | 27 +--------
12 files changed, 212 insertions(+), 103 deletions(-)
On Thu, 27 Nov 2008, Eric Dumazet wrote:

> The last point is about SLUB being hit hard, unless we
> use slub_min_order=3 at boot, or we use Christoph Lameter
> patch (struct file RCU optimizations)
> http://thread.gmane.org/gmane.linux.kernel/418615
>
> If we boot machine with slub_min_order=3, SLUB overhead disappears.

I'd rather not be that drastic. Did you try increasing slub_min_objects
instead? Try 40-100. If we find the right number then we should update
the tuning to make sure that it picks the right slab page sizes.
Christoph Lameter wrote:
> On Thu, 27 Nov 2008, Eric Dumazet wrote:
>
>> The last point is about SLUB being hit hard, unless we
>> use slub_min_order=3 at boot, or we use Christoph Lameter
>> patch (struct file RCU optimizations)
>> http://thread.gmane.org/gmane.linux.kernel/418615
>>
>> If we boot machine with slub_min_order=3, SLUB overhead disappears.
>
> I'd rather not be that drastic. Did you try increasing slub_min_objects
> instead? Try 40-100. If we find the right number then we should update
> the tuning to make sure that it pickes the right slab page sizes.

4096/192 = 21

with slub_min_objects=22 :

# cat /sys/kernel/slab/filp/order
1
# time ./socket8
real 0m1.725s
user 0m0.685s
sys 0m12.955s

with slub_min_objects=45 :

# cat /sys/kernel/slab/filp/order
2
# time ./socket8
real 0m1.652s
user 0m0.694s
sys 0m12.367s

with slub_min_objects=80 :

# cat /sys/kernel/slab/filp/order
3
# time ./socket8
real 0m1.642s
user 0m0.719s
sys 0m12.315s

I would say slub_min_objects=45 is the optimal value on 32bit arches to
get acceptable performance on this workload (order=2 for filp kmem_cache)

Note : SLAB here is disastrous, but you already knew that :)

real 0m8.128s
user 0m0.748s
sys 1m3.467s
As I told you before, you absolutely must include the fsdevel list and
the VFS maintainer for a patchset like this.
On Thu, 27 Nov 2008, Eric Dumazet wrote:

> with slub_min_objects=45 :
>
> # cat /sys/kernel/slab/filp/order
> 2
> # time ./socket8
> real 0m1.652s
> user 0m0.694s
> sys 0m12.367s

That may be a good value. How many processors do you have?

Look at calculate_order() in mm/slub.c:

	if (!min_objects)
		min_objects = 4 * (fls(nr_cpu_ids) + 1);

We could increase the scaling factor there or start with a minimum of 20
objects? Try

	min_objects = 20 + 4 * (fls(nr_cpu_ids) + 1);

> I would say slub_min_objects=45 is the optimal value on 32bit arches to
> get acceptable performance on this workload (order=2 for filp kmem_cache)
>
> Note : SLAB here is disastrous, but you already knew that :)

Its good though to have examples where the queue management gets in the
way of performance.
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Hi all
>
> Short summary : Nice speedups for allocation/deallocation of sockets/pipes
> (From 27.5 seconds to 1.6 second)

Wow, that's incredibly impressive! :-)

	Ingo
On Fri, 2008-11-28 at 19:03 +0100, Ingo Molnar wrote:
> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> > Hi all
> >
> > Short summary : Nice speedups for allocation/deallocation of sockets/pipes
> > (From 27.5 seconds to 1.6 second)
>
> Wow, that's incredibly impressive! :-)

Yeah, we got a similar speedup on -rt by pushing those super-block files
list into per-cpu lists and doing crazy locking on them.

Of course avoiding them all together, like done here is a nicer option
but is sadly not a possibility for regular files (until hch gets around
to removing the need for the list).
On Fri, Nov 28, 2008 at 07:47:56PM +0100, Peter Zijlstra wrote:
> > Wow, that's incredibly impressive! :-)
>
> Yeah, we got a similar speedup on -rt by pushing those super-block files
> list into per-cpu lists and doing crazy locking on them.
>
> Of course avoiding them all together, like done here is a nicer option
> but is sadly not a possibility for regular files (until hch gets around
> to removing the need for the list).

We should have finished this long ago, thanks for the reminder.
Christoph Hellwig wrote:
> On Fri, Nov 28, 2008 at 07:47:56PM +0100, Peter Zijlstra wrote:
>>> Wow, that's incredibly impressive! :-)
>>
>> Yeah, we got a similar speedup on -rt by pushing those super-block files
>> list into per-cpu lists and doing crazy locking on them.
>>
>> Of course avoiding them all together, like done here is a nicer option
>> but is sadly not a possibility for regular files (until hch gets around
>> to removing the need for the list).
>
> We should have finished this long ago, thanks for the reminder.

inode_in_use could be percpu, at least. Or just zap it, since we never
have to scan it.
Hi all
Short summary : Nice speedups for allocation/deallocation of sockets/pipes
(From 27.5 seconds to 2.9 seconds (2.3 seconds with SLUB tweaks))
Long version :
For this second version, I removed the mntput()/mntget() optimization
since most reviewers were not convinced it is useful.
This is a four-line patch that can be reconsidered later.
I chose the name SINGLE instead of SPECIAL to name
isolated dentries (for sockets, pipes, anonymous fd) that
have no parent and no relationship in the vfs.
Thanks all
To allocate a socket or a pipe we :
0) Do the usual file table manipulation (pretty scalable these days,
but it would be faster if 'struct file' were using SLAB_DESTROY_BY_RCU
and avoided the call_rcu() cache killer)
1) allocate an inode with new_inode()
This function :
- locks inode_lock,
- dirties nr_inodes counter
- dirties inode_in_use list (for sockets/pipes, this is useless)
- dirties superblock s_inodes.
- dirties last_ino counter
All these are in different cache lines, unfortunately.
2) allocate a dentry
d_alloc() takes dcache_lock,
inserts the dentry on its parent's list (dirtying sock_mnt->mnt_sb->s_root)
and dirties nr_dentry
3) d_instantiate() dentry (dcache_lock taken again)
4) init_file() -> atomic_inc() on sock_mnt->refcount
At close() time, we must undo all of this. It is even more expensive because
of the _atomic_dec_and_lock() call, which causes a lot of stress, and because
two cache lines are touched when an element is deleted from a list
(the previous and next items).
This is really bad, since sockets/pipes don't need to be visible in the dcache
or in a per-superblock inode list.
This patch series gets rid of all but one of the contended cache lines for
sockets, pipes and anonymous fd (signalfd, timerfd, ...)
Sample program :
for (i = 0; i < 1000000; i++)
close(socket(AF_INET, SOCK_STREAM, 0));
Cost if one cpu runs the program :
real 1.561s
user 0.092s
sys 1.469s
Cost if 8 processes are launched on a 8 CPU machine
(benchmark named socket8) :
real 27.496s <<<< !!!! >>>>
user 0.657s
sys 3m39.092s
Oprofile results (for the 8 process run, 3 times):
CPU: Core 2, speed 3000.03 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples cum. samples % cum. % symbol name
3347352 3347352 28.0232 28.0232 _atomic_dec_and_lock
3301428 6648780 27.6388 55.6620 d_instantiate
2971130 9619910 24.8736 80.5355 d_alloc
241318 9861228 2.0203 82.5558 init_file
146190 10007418 1.2239 83.7797 __slab_free
144149 10151567 1.2068 84.9864 inotify_d_instantiate
143971 10295538 1.2053 86.1917 inet_create
137168 10432706 1.1483 87.3401 new_inode
117549 10550255 0.9841 88.3242 add_partial
110795 10661050 0.9275 89.2517 generic_drop_inode
107137 10768187 0.8969 90.1486 kmem_cache_alloc
94029 10862216 0.7872 90.9358 tcp_close
82837 10945053 0.6935 91.6293 dput
67486 11012539 0.5650 92.1943 dentry_iput
57751 11070290 0.4835 92.6778 iput
54327 11124617 0.4548 93.1326 tcp_v4_init_sock
49921 11174538 0.4179 93.5505 sysenter_past_esp
47616 11222154 0.3986 93.9491 kmem_cache_free
30792 11252946 0.2578 94.2069 clear_inode
27540 11280486 0.2306 94.4375 copy_from_user
26509 11306995 0.2219 94.6594 init_timer
26363 11333358 0.2207 94.8801 discard_slab
25284 11358642 0.2117 95.0918 __fput
22482 11381124 0.1882 95.2800 __percpu_counter_add
20369 11401493 0.1705 95.4505 sock_alloc
18501 11419994 0.1549 95.6054 inet_csk_destroy_sock
17923 11437917 0.1500 95.7555 sys_close
This patch series avoids all the contended cache lines and makes this "bench"
pretty fast.
New cost if run on one cpu :
real 1.325s (instead of 1.561s)
user 0.091s
sys 1.234s
If run on 8 CPUS :
real 0m2.971s
user 0m0.726s
sys 0m21.310s
CPU: Core 2, speed 3000.04 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100
000
samples cum. samples % cum. % symbol name
189772 189772 12.7205 12.7205 _atomic_dec_and_lock
140467 330239 9.4155 22.1360 __slab_free
128210 458449 8.5940 30.7300 add_partial
121578 580027 8.1494 38.8794 kmem_cache_alloc
72626 652653 4.8681 43.7475 init_file
62720 715373 4.2041 47.9517 __percpu_counter_add
51632 767005 3.4609 51.4126 sysenter_past_esp
49196 816201 3.2976 54.7102 tcp_close
47933 864134 3.2130 57.9231 kmem_cache_free
29628 893762 1.9860 59.9091 copy_from_user
28443 922205 1.9065 61.8157 init_timer
25602 947807 1.7161 63.5318 __slab_alloc
22139 969946 1.4840 65.0158 discard_slab
20428 990374 1.3693 66.3851 __call_rcu
18174 1008548 1.2182 67.6033 alloc_fd
17643 1026191 1.1826 68.7859 __fput
17374 1043565 1.1646 69.9505 d_alloc
17196 1060761 1.1527 71.1031 sys_close
17024 1077785 1.1411 72.2442 inet_create
15208 1092993 1.0194 73.2636 alloc_inode
12201 1105194 0.8178 74.0815 fd_install
12167 1117361 0.8156 74.8970 lock_sock_nested
12123 1129484 0.8126 75.7096 get_empty_filp
11648 1141132 0.7808 76.4904 release_sock
11509 1152641 0.7715 77.2619 dput
11335 1163976 0.7598 78.0216 sock_init_data
11038 1175014 0.7399 78.7615 inet_csk_destroy_sock
10880 1185894 0.7293 79.4908 drop_file_write_access
10083 1195977 0.6759 80.1667 inotify_d_instantiate
9216 1205193 0.6178 80.7844 local_bh_enable_ip
8881 1214074 0.5953 81.3797 sysenter_do_call
8759 1222833 0.5871 81.9668 setup_object
8489 1231322 0.5690 82.5359 iput_single
So we now hit mntput()/mntget() and SLUB.
The last point is that SLUB is hit hard, unless we
use slub_min_order=3 (or slub_min_objects=45) at boot,
or we use Christoph Lameter's patch (struct file RCU optimizations)
http://thread.gmane.org/gmane.linux.kernel/418615
If we boot machine with slub_min_order=3, SLUB overhead disappears.
If run on 8 CPUS :
real 0m2.315s
user 0m0.752s
sys 0m17.324s
CPU: Core 2, speed 3000.15 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples cum. samples % cum. % symbol name
199409 199409 15.6440 15.6440 _atomic_dec_and_lock (mntput())
141606 341015 11.1092 26.7532 kmem_cache_alloc
76071 417086 5.9679 32.7211 init_file
70595 487681 5.5383 38.2595 __percpu_counter_add
51595 539276 4.0477 42.3072 sysenter_past_esp
49313 588589 3.8687 46.1759 tcp_close
45503 634092 3.5698 49.7457 kmem_cache_free
41413 675505 3.2489 52.9946 __slab_free
29911 705416 2.3466 55.3412 copy_from_user
28979 734395 2.2735 57.6146 init_timer
22251 756646 1.7456 59.3602 get_empty_filp
19942 776588 1.5645 60.9247 __call_rcu
18348 794936 1.4394 62.3642 __fput
18328 813264 1.4379 63.8020 alloc_fd
17395 830659 1.3647 65.1667 sys_close
17301 847960 1.3573 66.5240 d_alloc
16570 864530 1.2999 67.8239 inet_create
15522 880052 1.2177 69.0417 alloc_inode
13185 893237 1.0344 70.0761 setup_object
12359 905596 0.9696 71.0456 fd_install
12275 917871 0.9630 72.0086 lock_sock_nested
11924 929795 0.9355 72.9441 release_sock
11790 941585 0.9249 73.8690 sock_init_data
11310 952895 0.8873 74.7563 dput
10924 963819 0.8570 75.6133 drop_file_write_access
10903 974722 0.8554 76.4687 inet_csk_destroy_sock
10184 984906 0.7990 77.2676 inotify_d_instantiate
9372 994278 0.7353 78.0029 local_bh_enable_ip
8901 1003179 0.6983 78.7012 sysenter_do_call
8569 1011748 0.6723 79.3735 iput_single
8194 1019942 0.6428 80.0163 inet_release
This patch series contains 5 patches, against the net-next-2.6 tree
(because this tree already contains network improvements on this
subject), but it should apply to other trees.
[PATCH 1/5] fs: Use a percpu_counter to track nr_dentry
Adding a percpu_counter nr_dentry avoids cache line ping pongs
between cpus to maintain this metric, and dcache_lock is
no longer needed to protect dentry_stat.nr_dentry
We centralize nr_dentry updates at the right place :
- increments in d_alloc()
- decrements in d_free()
d_alloc() can avoid taking dcache_lock if parent is NULL
(socket8 bench result : 27.5s to 25s)
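With the generic percpu_counter API the bookkeeping stays small; a hedged
sketch of where the calls go (helper names are illustrative, and the real
patch also adapts the /proc/sys/fs/dentry-state reporting) :

#include <linux/percpu_counter.h>

static struct percpu_counter nr_dentry;

static int __init nr_dentry_counter_init(void)	/* once, at dcache init */
{
	return percpu_counter_init(&nr_dentry, 0);
}

static inline void nr_dentry_inc(void)		/* from d_alloc() */
{
	percpu_counter_inc(&nr_dentry);		/* per-cpu delta, no dcache_lock */
}

static inline void nr_dentry_dec(void)		/* from d_free() */
{
	percpu_counter_dec(&nr_dentry);
}

static long nr_dentry_read(void)		/* slow path readers only */
{
	return percpu_counter_sum_positive(&nr_dentry);
}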
[PATCH 2/5] fs: Use a percpu_counter to track nr_inodes
Avoids cache line ping pongs between cpus and prepares the next patch,
because updates of nr_inodes don't need inode_lock anymore.
(socket8 bench result : no difference at this point)
[PATCH 3/5] fs: Introduce a per_cpu last_ino allocator
new_inode() dirties a contended cache line to get increasing
inode numbers.
Solve this problem by providing each cpu with a per_cpu variable,
fed from the shared last_ino, but only once every 1024 allocations.
This reduces contention on the shared last_ino, and gives the same
spreading of ino numbers as before.
(same wraparound after 2^32 allocations)
(socket8 bench result : no difference)
[PATCH 4/5] fs: Introduce SINGLE dentries for pipes, socket, anon fd
Sockets, pipes and anonymous fds have interesting properties.
Like other files, they use a dentry and an inode.
But dentries for these kinds of files are not hashed into the dcache,
since there is no way someone can lookup such a file in the vfs tree.
(/proc/{pid}/fd/{number} uses a different mechanism)
Still, allocating and freeing such dentries are expensive processes,
because we currently take dcache_lock inside d_alloc(), d_instantiate(),
and dput(). This lock is very contended on SMP machines.
This patch defines a new DCACHE_SINGLE flag, to mark a dentry as
a single one (for sockets, pipes, anonymous fd), and a new
d_alloc_single(const struct qstr *name, struct inode *inode)
method, called by the three subsystems.
Internally, dput() can take a fast path to dput_single() for
SINGLE dentries. No more atomic_dec_and_lock()
for such dentries.
Differences between a SINGLE dentry and a normal one are :
1) A SINGLE dentry has the DCACHE_SINGLE flag
2) A SINGLE dentry's parent is itself (DCACHE_DISCONNECTED)
This avoids taking a reference on the sb 'root' dentry, which is shared
by too many dentries.
3) They are not hashed into global hash table (DCACHE_UNHASHED)
4) Their d_alias list is empty
(socket8 bench result : from 25s to 19.9s)
[PATCH 5/5] fs: new_inode_single() and iput_single()
The goal of this patch is to avoid touching inode_lock for socket/pipes/anonfd
inodes allocation/freeing.
SINGLE dentries are attached to inodes that don't need to be linked
in a list of inodes, be it "inode_in_use" or "sb->s_inodes".
As inode_lock was taken only to protect these lists, we avoid
taking it as well.
Using iput_single() from dput_single() avoids taking inode_lock
at freeing time.
This patch has a very noticeable effect, because we avoid dirtying
three contended cache lines in new_inode(), and five cache lines
in iput().
(socket8 bench result : from 19.9s to 2.3s)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
Overall diffstat :
fs/anon_inodes.c | 18 ------
fs/dcache.c | 100 ++++++++++++++++++++++++++++++--------
fs/fs-writeback.c | 2
fs/inode.c | 101 +++++++++++++++++++++++++++++++--------
fs/pipe.c | 25 +--------
include/linux/dcache.h | 9 +++
include/linux/fs.h | 17 ++++++
kernel/sysctl.c | 6 +-
mm/page-writeback.c | 2
net/socket.c | 26 +---------
10 files changed, 200 insertions(+), 106 deletions(-)
Hi Andrew

v2 of this patch series got no new feedback; maybe it is time for mm
inclusion for a while ?

In this third version I added the last two patches, one initially from
Christoph Lameter, and one to avoid dirtying mnt->mnt_count on hardwired fs.

Many thanks to Christoph and Paul for the SLAB_DESTROY_BY_RCU work done
on "struct file".

Thank you

Short summary : Nice speedups for allocation/deallocation of sockets/pipes
(from 27.5 seconds to 1.62 s, on an 8 cpu machine)

Long version :

To allocate a socket or a pipe we :

0) Do the usual file table manipulation (pretty scalable these days,
but it would be faster if 'struct file' were using SLAB_DESTROY_BY_RCU
and avoided the call_rcu() cache killer). This point is addressed by
the 6th patch.

1) allocate an inode with new_inode()
This function :
- locks inode_lock,
- dirties nr_inodes counter
- dirties inode_in_use list (for sockets/pipes, this is useless)
- dirties superblock s_inodes.
- dirties last_ino counter
All these are in different cache lines, unfortunately.

2) allocate a dentry
d_alloc() takes dcache_lock,
inserts the dentry on its parent's list (dirtying sock_mnt->mnt_sb->s_root)
and dirties nr_dentry

3) d_instantiate() dentry (dcache_lock taken again)

4) init_file() -> atomic_inc() on sock_mnt->refcount

At close() time, we must undo all of this. It is even more expensive because
of the _atomic_dec_and_lock() call, which causes a lot of stress, and because
two cache lines are touched when an element is deleted from a list
(the previous and next items).

This is really bad, since sockets/pipes don't need to be visible in the
dcache or in a per-superblock inode list.

This patch series gets rid of all but one of the contended cache lines for
sockets, pipes and anonymous fd (signalfd, timerfd, ...)

socketallocbench is a very simple program (attached to this mail) that
makes a loop :

for (i = 0; i < 1000000; i++)
	close(socket(AF_INET, SOCK_STREAM, 0));

Cost if one cpu runs the program :

real 1.561s
user 0.092s
sys 1.469s

Cost if 8 processes are launched on a 8 CPU machine
(socketallocbench -n 8) :

real 27.496s <<<< !!!! >>>>
user 0.657s
sys 3m39.092s

Oprofile results (for the 8 process run, 3 times):

CPU: Core 2, speed 3000.03 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples cum. samples % cum. % symbol name
3347352 3347352 28.0232 28.0232 _atomic_dec_and_lock
3301428 6648780 27.6388 55.6620 d_instantiate
2971130 9619910 24.8736 80.5355 d_alloc
241318 9861228 2.0203 82.5558 init_file
146190 10007418 1.2239 83.7797 __slab_free
144149 10151567 1.2068 84.9864 inotify_d_instantiate
143971 10295538 1.2053 86.1917 inet_create
137168 10432706 1.1483 87.3401 new_inode
117549 10550255 0.9841 88.3242 add_partial
110795 10661050 0.9275 89.2517 generic_drop_inode
107137 10768187 0.8969 90.1486 kmem_cache_alloc
94029 10862216 0.7872 90.9358 tcp_close
82837 10945053 0.6935 91.6293 dput
67486 11012539 0.5650 92.1943 dentry_iput
57751 11070290 0.4835 92.6778 iput
54327 11124617 0.4548 93.1326 tcp_v4_init_sock
49921 11174538 0.4179 93.5505 sysenter_past_esp
47616 11222154 0.3986 93.9491 kmem_cache_free
30792 11252946 0.2578 94.2069 clear_inode
27540 11280486 0.2306 94.4375 copy_from_user
26509 11306995 0.2219 94.6594 init_timer
26363 11333358 0.2207 94.8801 discard_slab
25284 11358642 0.2117 95.0918 __fput
22482 11381124 0.1882 95.2800 __percpu_counter_add
20369 11401493 0.1705 95.4505 sock_alloc
18501 11419994 0.1549 95.6054 inet_csk_destroy_sock
17923 11437917 0.1500 95.7555 sys_close

This patch series avoids all the contended cache lines and makes this
"bench" pretty fast.

New cost if run on one cpu :

real 1.245s (instead of 1.561s)
user 0.074s
sys 1.161s

If run on 8 CPUS :

real 1.624s
user 0.580s
sys 12.296s

On oprofile, we finally can see network stuff coming to the front of the
expensive stuff (with the exception of kmem_cache_[z]alloc(), because it
has to clear 192 bytes of file structures; this takes half of the time).

CPU: Core 2, speed 3000.09 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 100000
samples cum. samples % cum. % symbol name
176586 176586 10.9376 10.9376 kmem_cache_alloc
169838 346424 10.5196 21.4572 tcp_close
105331 451755 6.5241 27.9813 tcp_v4_init_sock
105146 556901 6.5126 34.4939 tcp_v4_destroy_sock
83307 640208 5.1600 39.6539 sysenter_past_esp
80241 720449 4.9701 44.6239 inet_csk_destroy_sock
74263 794712 4.5998 49.2237 kmem_cache_free
56806 851518 3.5185 52.7422 __percpu_counter_add
48619 900137 3.0114 55.7536 copy_from_user
44803 944940 2.7751 58.5287 init_timer
28539 973479 1.7677 60.2964 d_alloc
27795 1001274 1.7216 62.0180 alloc_fd
26747 1028021 1.6567 63.6747 __fput
24312 1052333 1.5059 65.1805 sys_close
24205 1076538 1.4992 66.6798 inet_create
22409 1098947 1.3880 68.0677 alloc_inode
21359 1120306 1.3230 69.3907 release_sock
19865 1140171 1.2304 70.6211 fd_install
19472 1159643 1.2061 71.8272 lock_sock_nested
18956 1178599 1.1741 73.0013 sock_init_data
17301 1195900 1.0716 74.0729 drop_file_write_access
17113 1213013 1.0600 75.1329 inotify_d_instantiate
16384 1229397 1.0148 76.1477 dput
15173 1244570 0.9398 77.0875 local_bh_enable_ip
15017 1259587 0.9301 78.0176 local_bh_enable
13354 1272941 0.8271 78.8448 __sock_create
13139 1286080 0.8138 79.6586 inet_release
13062 1299142 0.8090 80.4676 sysenter_do_call
11935 1311077 0.7392 81.2069 iput_single

This patch series contains 7 patches, against the linux-2.6 tree,
plus one patch in mm (fs: filp_cachep can be static in fs/file_table.c)

[PATCH 1/7] fs: Use a percpu_counter to track nr_dentry

Adding a percpu_counter nr_dentry avoids cache line ping pongs
between cpus to maintain this metric, and dcache_lock is
no longer needed to protect dentry_stat.nr_dentry

We centralize nr_dentry updates at the right place :
- increments in d_alloc()
- decrements in d_free()

d_alloc() can avoid taking dcache_lock if parent is NULL

("socketallocbench -n 8" bench result : 27.5s to 25s)

[PATCH 2/7] fs: Use a percpu_counter to track nr_inodes

Avoids cache line ping pongs between cpus and prepares the next patch,
because updates of nr_inodes don't need inode_lock anymore.

("socketallocbench -n 8" bench result : no difference at this point)

[PATCH 3/7] fs: Introduce a per_cpu last_ino allocator

new_inode() dirties a contended cache line to get increasing
inode numbers.

Solve this problem by providing each cpu with a per_cpu variable,
fed from the shared last_ino, but only once every 1024 allocations.

This reduces contention on the shared last_ino, and gives the same
spreading of ino numbers as before.
(same wraparound after 2^32 allocations)

("socketallocbench -n 8" result : no difference)

[PATCH 4/7] fs: Introduce SINGLE dentries for pipes, socket, anon fd

Sockets, pipes and anonymous fds have interesting properties.

Like other files, they use a dentry and an inode.

But dentries for these kinds of files are not hashed into the dcache,
since there is no way someone can lookup such a file in the vfs tree.
(/proc/{pid}/fd/{number} uses a different mechanism)

Still, allocating and freeing such dentries are expensive processes,
because we currently take dcache_lock inside d_alloc(), d_instantiate(),
and dput(). This lock is very contended on SMP machines.

This patch defines a new DCACHE_SINGLE flag, to mark a dentry as
a single one (for sockets, pipes, anonymous fd), and a new
d_alloc_single(const struct qstr *name, struct inode *inode)
method, called by the three subsystems.

Internally, dput() can take a fast path to dput_single() for
SINGLE dentries. No more atomic_dec_and_lock() for such dentries.

Differences between a SINGLE dentry and a normal one are :
1) A SINGLE dentry has the DCACHE_SINGLE flag
2) A SINGLE dentry's parent is itself (DCACHE_DISCONNECTED)
This avoids taking a reference on the sb 'root' dentry, which is
shared by too many dentries.
3) They are not hashed into the global hash table (DCACHE_UNHASHED)
4) Their d_alias list is empty

(socket8 bench result : from 25s to 19.9s)

[PATCH 5/7] fs: new_inode_single() and iput_single()

The goal of this patch is to avoid touching inode_lock for
socket/pipes/anonfd inodes allocation/freeing.

SINGLE dentries are attached to inodes that don't need to be linked
in a list of inodes, be it "inode_in_use" or "sb->s_inodes".
As inode_lock was taken only to protect these lists, we avoid
taking it as well.

Using iput_single() from dput_single() avoids taking inode_lock
at freeing time.

This patch has a very noticeable effect, because we avoid dirtying
three contended cache lines in new_inode(), and five cache lines
in iput().

("socketallocbench -n 8" result : from 19.9s to 3.01s)

[PATCH 6/7] fs: struct file move from call_rcu() to SLAB_DESTROY_BY_RCU

From: Christoph Lameter <cl@linux-foundation.org>

Currently we schedule RCU frees for each file we free separately. That
has several drawbacks against the earlier file handling (in 2.6.5 f.e.),
which did not require RCU callbacks:

1. An excessive number of RCU callbacks can be generated, causing long
RCU queues that in turn cause long latencies. We hit SLUB page
allocation more often than necessary.

2. The cache hot object is not preserved between free and realloc. A
close followed by another open is very fast with the RCUless approach
because the last freed object is returned by the slab allocator and
is still cache hot. RCU free means that the object is not immediately
available again. The new object is cache cold and therefore open/close
performance tests show a significant degradation with the RCU
implementation.

One solution to this problem is to move the RCU freeing into the slab
allocator by specifying SLAB_DESTROY_BY_RCU as an option at slab
creation time. The slab allocator will do RCU frees only when it is
necessary to dispose of slabs of objects (rare). So with that approach
we can cut out the RCU overhead significantly.

However, the slab allocator may return the object for another use even
before the RCU period has expired under SLAB_DESTROY_BY_RCU. This means
there is the (unlikely) possibility that the object is going to be
switched under us in sections protected by rcu_read_lock() and
rcu_read_unlock(). So we need to verify that we have acquired the
correct object after establishing a stable object reference
(incrementing the refcounter does that). (A sketch of this re-check
pattern is given after the benchmark source below.)

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

("socketallocbench -n 8" result : from 3.01s to 2.20s)

[PATCH 7/7] fs: MS_NOREFCOUNT

Some fs are hardwired into the kernel, and mntput()/mntget() on them hit
a contended cache line.

We define a new superblock flag, MS_NOREFCOUNT, that is set on the
socket, pipe and anonymous fd superblocks. mntput()/mntget() become
null ops on these fs.

("socketallocbench -n 8" result : from 2.20s to 1.64s)

cat socketallocbench.c

/*
 * socketallocbench benchmark
 *
 * Usage : socket [-n procs] [-l loops]
 */
#include <sys/socket.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/wait.h>

void dowork(int loops)
{
	int i;

	for (i = 0; i < loops; i++)
		close(socket(AF_INET, SOCK_STREAM, 0));
}

int main(int argc, char *argv[])
{
	int i;
	int n = 1;
	int loops = 1000000;
	pid_t *pidtable;

	while ((i = getopt(argc, argv, "n:l:")) != EOF) {
		if (i == 'n')
			n = atoi(optarg);
		if (i == 'l')
			loops = atoi(optarg);
	}
	pidtable = malloc(n * sizeof(pid_t));
	for (i = 1; i < n; i++) {
		pidtable[i] = fork();
		if (pidtable[i] == 0) {
			dowork(loops);
			_exit(0);
		}
		if (pidtable[i] == -1) {
			perror("fork");
			n = i;
			break;
		}
	}
	dowork(loops);
	for (i = 1; i < n; i++) {
		int status;

		wait(&status);
	}
	return 0;
}
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 3662dd4..22cce87 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -92,7 +92,7 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops,
 	this.name = name;
 	this.len = strlen(name);
 	this.hash = 0;
-	dentry = d_alloc(anon_inode_mnt->mnt_sb->s_root, &this);
+	dentry = d_alloc(NULL, &this);
 	if (!dentry)
 		goto err_put_unused_fd;
diff --git a/fs/dnotify.c b/fs/dnotify.c
index 676073b..66066a3 100644
--- a/fs/dnotify.c
+++ b/fs/dnotify.c
@@ -173,7 +173,7 @@ void dnotify_parent(struct dentry *dentry, unsigned long event)
 	spin_lock(&dentry->d_lock);
 	parent = dentry->d_parent;
-	if (parent->d_inode->i_dnotify_mask & event) {
+	if (parent && parent->d_inode->i_dnotify_mask & event) {
 		dget(parent);
 		spin_unlock(&dentry->d_lock);
 		__inode_dir_notify(parent->d_inode, event);
diff --git a/fs/inotify.c b/fs/inotify.c
index 7bbed1b..9f051bb 100644
--- a/fs/inotify.c
+++ b/fs/inotify.c
@@ -270,7 +270,7 @@ void inotify_d_instantiate(struct dentry *entry, struct inode *inode)
 	spin_lock(&entry->d_lock);
 	parent = entry->d_parent;
-	if (parent->d_inode && inotify_inode_watched(parent->d_inode))
+	if (parent && parent->d_inode && inotify_inode_watched(parent->d_inode))
 		entry->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
 	spin_unlock(&entry->d_lock);
 }
diff --git a/fs/pipe.c b/fs/pipe.c
index 7aea8b8..4b961bc 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -926,7 +926,7 @@ struct file *create_write_pipe(int flags)
 		goto err;
 
 	err = -ENOMEM;
-	dentry = d_alloc(pipe_mnt->mnt_sb->s_root, &name);
+	dentry = d_alloc(NULL, &name);
 	if (!dentry)
 		goto err_inode;
diff --git a/net/socket.c b/net/socket.c
index e9d65ea..b84de7d 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -373,7 +373,7 @@ static int sock_attach_fd(struct socket *sock, struct file *file, int flags)
 	struct dentry *dentry;
 	struct qstr name = { .name = "" };
 
-	dentry = d_alloc(sock_mnt->mnt_sb->s_root, &name);
+	dentry = d_alloc(NULL, &name);
 	if (unlikely(!dentry))
 		return -ENOMEM;