[RFC,09/09] Introduce cache=rdma mounting option

Message ID: 20180518002214.5657-10-longli@linuxonhyperv.com
State: New
Series: Implement direct user I/O interfaces for RDMA

Commit Message

Long Li May 18, 2018, 12:22 a.m.
From: Long Li <longli@microsoft.com>

When cache=rdma is enabled in the mount options, CIFS does not allocate internal
data buffer pages for I/O; data is read/written directly to/from user memory via RDMA.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifs_fs_sb.h | 2 ++
 fs/cifs/cifsglob.h   | 1 +
 fs/cifs/connect.c    | 9 +++++++++
 fs/cifs/dir.c        | 5 +++++
 fs/cifs/file.c       | 4 ++++
 fs/cifs/inode.c      | 4 +++-
 6 files changed, 24 insertions(+), 1 deletion(-)
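For context, a mount using the new cache flavor would look like the following (server, share, and mount point are hypothetical; the rdma transport option from earlier in this series is assumed to be in use):

```
mount -t cifs //server/share /mnt/cifs -o vers=3.02,username=user,rdma,cache=rdma
```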

Comments

Christoph Hellwig May 18, 2018, 7:26 a.m. | #1
On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> When cache=rdma is enabled in the mount options, CIFS does not allocate internal data
> buffer pages for I/O; data is read/written directly to/from user memory via RDMA.

I don't think this should be an option.  For direct I/O without signing
or encryption CIFS should always use get_user_pages, with or without
RDMA.
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Long Li May 18, 2018, 7 p.m. | #2
> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
> 
> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > When cache=rdma is enabled in the mount options, CIFS does not allocate
> > internal data buffer pages for I/O; data is read/written directly to user
> memory via RDMA.
> 
> I don't think this should be an option.  For direct I/O without signing or
> encryption CIFS should always use get_user_pages, with or without RDMA.

Yes, this should be done for all transports. If there are no objections, I'll send patches to change this.
Steve French May 18, 2018, 8:44 p.m. | #3
On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical
<samba-technical@lists.samba.org> wrote:
>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
>>
>> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
>> > From: Long Li <longli@microsoft.com>
>> >
>> > When cache=rdma is enabled in the mount options, CIFS does not allocate
>> > internal data buffer pages for I/O; data is read/written directly to user
>> memory via RDMA.
>>
>> I don't think this should be an option.  For direct I/O without signing or
>> encryption CIFS should always use get_user_pages, with or without RDMA.
>
> Yes, this should be done for all transports. If there are no objections, I'll send patches to change this.

Would this help/change performance much?
Long Li May 18, 2018, 8:58 p.m. | #4
> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
>
> On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical <samba-
> technical@lists.samba.org> wrote:
> >> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
> >>
> >> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> >> > From: Long Li <longli@microsoft.com>
> >> >
> >> > When cache=rdma is enabled in the mount options, CIFS does not allocate
> >> > internal data buffer pages for I/O; data is read/written directly to
> >> > user memory via RDMA.
> >>
> >> I don't think this should be an option.  For direct I/O without
> >> signing or encryption CIFS should always use get_user_pages, with or
> >> without RDMA.
> >
> > Yes, this should be done for all transports. If there are no objections, I'll send
> > patches to change this.
>
> Would this help/change performance much?

On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O patterns.

But I haven't tested on TCP. Maybe it will help a little bit.

> --
> Thanks,
>
> Steve
Tom Talpey May 19, 2018, 1:20 a.m. | #5
On 5/18/2018 1:58 PM, Long Li wrote:
>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
>>
>> On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical <samba-
>> technical@lists.samba.org> wrote:
>>>> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma mounting option
>>>>
>>>> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
>>>>> From: Long Li <longli@microsoft.com>
>>>>>
>>>>> When cache=rdma is enabled in the mount options, CIFS does not allocate
>>>>> internal data buffer pages for I/O; data is read/written directly to
>>>>> user
>>>> memory via RDMA.
>>>>
>>>> I don't think this should be an option.  For direct I/O without
>>>> signing or encryption CIFS should always use get_user_pages, with or
>> without RDMA.
>>>
>>> Yes, this should be done for all transports. If there are no objections, I'll send
>> patches to change this.
>>
>> Would this help/change performance much?
> 
> On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O patterns.
> 
> But I haven't tested on TCP. Maybe it will help a little bit.

Well, when the application requests direct i/o on a TCP connection,
you definitely don't want to cache it! So even if the performance
is different, correctness would dictate doing this.

You probably don't need to pin the buffer in the TCP case, which
might be worth avoiding.

Tom.

Patch

diff --git a/fs/cifs/cifs_fs_sb.h b/fs/cifs/cifs_fs_sb.h
index 350fa55..7c28dc3 100644
--- a/fs/cifs/cifs_fs_sb.h
+++ b/fs/cifs/cifs_fs_sb.h
@@ -51,6 +51,8 @@ 
 					      */
 #define CIFS_MOUNT_UID_FROM_ACL 0x2000000 /* try to get UID via special SID */
 
+#define CIFS_MOUNT_DIRECT_RDMA	0x4000000
+
 struct cifs_sb_info {
 	struct rb_root tlink_tree;
 	spinlock_t tlink_tree_lock;
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index a51855c..3bec63f 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -518,6 +518,7 @@  struct smb_vol {
 	bool server_ino:1; /* use inode numbers from server ie UniqueId */
 	bool direct_io:1;
 	bool strict_io:1; /* strict cache behavior */
+	bool rdma_io:1;
 	bool remap:1;      /* set to remap seven reserved chars in filenames */
 	bool sfu_remap:1;  /* remap seven reserved chars ala SFU */
 	bool posix_paths:1; /* unset to not ask for posix pathnames. */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 46d0cf4..c92b3d8 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -282,6 +282,7 @@  enum {
 	Opt_cache_loose,
 	Opt_cache_strict,
 	Opt_cache_none,
+	Opt_cache_rdma,
 	Opt_cache_err
 };
 
@@ -289,6 +290,7 @@  static const match_table_t cifs_cacheflavor_tokens = {
 	{ Opt_cache_loose, "loose" },
 	{ Opt_cache_strict, "strict" },
 	{ Opt_cache_none, "none" },
+	{ Opt_cache_rdma, "rdma" },
 	{ Opt_cache_err, NULL }
 };
 
@@ -1128,6 +1130,9 @@  cifs_parse_cache_flavor(char *value, struct smb_vol *vol)
 		vol->direct_io = true;
 		vol->strict_io = false;
 		break;
+	case Opt_cache_rdma:
+		vol->rdma_io = true;
+		break;
 	default:
 		cifs_dbg(VFS, "bad cache= option: %s\n", value);
 		return 1;
@@ -3612,6 +3617,10 @@  int cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
 		cifs_dbg(FYI, "mounting share using direct i/o\n");
 		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DIRECT_IO;
 	}
+	if (pvolume_info->rdma_io) {
+		cifs_dbg(VFS, "mounting share using rdma i/o\n");
+		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DIRECT_RDMA;
+	}
 	if (pvolume_info->mfsymlinks) {
 		if (pvolume_info->sfu_emul) {
 			/*
diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 81ba6e0..ce69b1c 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -557,6 +557,11 @@  cifs_atomic_open(struct inode *inode, struct dentry *direntry,
 			file->f_op = &cifs_file_direct_ops;
 		}
 
+	if (file->f_flags & O_DIRECT &&
+		CIFS_SB(inode->i_sb)->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+			file->f_op = &cifs_file_direct_rdma_ops;
+
+
 	file_info = cifs_new_fileinfo(&fid, file, tlink, oplock);
 	if (file_info == NULL) {
 		if (server->ops->close)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0b394db..30ccf6a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -492,6 +492,10 @@  int cifs_open(struct inode *inode, struct file *file)
 			file->f_op = &cifs_file_direct_ops;
 	}
 
+	if (file->f_flags & O_DIRECT &&
+            cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+		file->f_op = &cifs_file_direct_rdma_ops;
+
 	if (server->oplocks)
 		oplock = REQ_OPLOCK;
 	else
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 3c371f7..7991298 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -44,7 +44,9 @@  static void cifs_set_ops(struct inode *inode)
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &cifs_file_inode_ops;
-		if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) {
+		if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_RDMA)
+			inode->i_fop = &cifs_file_direct_rdma_ops;
+		else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DIRECT_IO) {
 			if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NO_BRL)
 				inode->i_fop = &cifs_file_direct_nobrl_ops;
 			else