Message ID | 20180125025349.31494-1-krisman@collabora.co.uk |
---|---|
Headers | show |
Series | NLS/UTF-8 Case-Insensitive lookups for ext4 and VFS proposal | expand |
On Thu, Jan 25, 2018 at 12:53:36AM -0200, Gabriel Krisman Bertazi wrote: > The second proposal is related to the VFS layer: > > (2) Enable Insensitive lookup support on a per-mountpoint basis, > via a MS_CASEFOLD flag, with the ultimate goal of supporting a > case-insensitive bind mount of a subtree, side-by-side with a > sensitive version of the filesystem. First reaction: No. With the side of HELL NO. Your ultimate goal sounds utterly insane - dcache tree must be shared for all mounts. Moreover, "would these two names refer to the same object" can not be mount-dependent. Not going to happen. Please, post the description of what you are planning to do. Detailed. I'm not saying that it's 100% impossible to do correctly, but I'm _very_ sceptical about the feasibility. I'm certainly not going to ACK any VFS changes until you convince me that this thing can be done with sane semantics.
On Thu, Jan 25, 2018 at 03:16:50AM +0000, Al Viro wrote: > On Thu, Jan 25, 2018 at 12:53:36AM -0200, Gabriel Krisman Bertazi wrote: > > > The second proposal is related to the VFS layer: > > > > (2) Enable Insensitive lookup support on a per-mountpoint basis, > > via a MS_CASEFOLD flag, with the ultimate goal of supporting a > > case-insensitive bind mount of a subtree, side-by-side with a > > sensitive version of the filesystem. > > First reaction: No. With the side of HELL NO. > > Your ultimate goal sounds utterly insane - dcache tree must be shared > for all mounts. Moreover, "would these two names refer to the same > object" can not be mount-dependent. Not going to happen. > > Please, post the description of what you are planning to do. > Detailed. I'm not saying that it's 100% impossible to do correctly, > but I'm _very_ sceptical about the feasibility. So I can't speak authoratatively about the Collabora's customer requirements that drove this particular feature, but I can talk about why I am interested in this feature --- although I understand why it's hard to do correctly. Android is currently using a horrible wrapfs scheme to provide case insesitive lookups. We talked about this last year at LSF/MM[1]. It is using separate dentry entries for the upper and lower layers, and it is *definitely* prone to races. Running fsstress on the top and lower layers simultaneously is a pretty simple way to induce hilarity, although my understanding is that the developers working on it having been hammering on it for a while so that the races that cause kernel panicks or file system damage are rare. I don't believe they have been completely elimniated, because it's not clear to me that any wrapfs solution can ever be made race-free. [1] https://lwn.net/Articles/718640/ The problem is that both the case insensitive and case sensitive directory trees are made available to userspace, and for backwards compatibility reasons, they can't just make it disappear. Although they don't have a Linus Torvalds going all Mr. Shouty about breaking userspace, the combined whinging from the App developer ecosystem probably can be a fairly good substitute for Linus when they complain about compatibility breakages visible to the Android application programmer. :-) So my hope has been to find a way to (a) make wrapfs go away, and (b) allow the functionality of case sensitive and case insensitive bind mounts of the same underlying file system directories be *somehow* supported. If the answer is that if two files that differ only in case are created via the case sensitive mount results in it being undefined which file you get when you open, rename, or delete that file via the case insensitive mount, that's fine. I think that's what happpens in the wrapfs implementation today anyway. It would probably be fine if the answer was you get the file with an exact match if it is present, but if the two files created on the case sensitive side are "makefile" and "Makefile", and you try open/delete "MaKeFiLe" via the case insensitive mount, it's undefined which file you get --- that's fine. Anyone who does that will get everything they deserve, and they get that today anyway with the wrapfs or the previous fuse hack anyway. This may indeed be an O_PONIES style request. And in the worst case, Android will continue using an out-of-tree wrapfs solution, and hopefully I can fend off people asking me for help when the wrapfs code blows up in various not-so-entertainng ways. Which is, admittedly, a purely selfish reason for why I would like to see if we can give Android folks equivalent functionality to the wrapfs hack in a sane, safe, and consensual way. :-) Cheers, - Ted
Hi Ted, On 2018/1/26 3:32, Theodore Ts'o wrote: > On Thu, Jan 25, 2018 at 03:16:50AM +0000, Al Viro wrote: >> On Thu, Jan 25, 2018 at 12:53:36AM -0200, Gabriel Krisman Bertazi wrote: >> >>> The second proposal is related to the VFS layer: >>> >>> (2) Enable Insensitive lookup support on a per-mountpoint basis, >>> via a MS_CASEFOLD flag, with the ultimate goal of supporting a >>> case-insensitive bind mount of a subtree, side-by-side with a >>> sensitive version of the filesystem. >> >> First reaction: No. With the side of HELL NO. >> >> Your ultimate goal sounds utterly insane - dcache tree must be shared >> for all mounts. Moreover, "would these two names refer to the same >> object" can not be mount-dependent. Not going to happen. >> >> Please, post the description of what you are planning to do. >> Detailed. I'm not saying that it's 100% impossible to do correctly, >> but I'm _very_ sceptical about the feasibility. > > So I can't speak authoratatively about the Collabora's customer > requirements that drove this particular feature, but I can talk about > why I am interested in this feature --- although I understand why it's > hard to do correctly. > > Android is currently using a horrible wrapfs scheme to provide case > insesitive lookups. We talked about this last year at LSF/MM[1]. It > is using separate dentry entries for the upper and lower layers, and > it is *definitely* prone to races. Running fsstress on the top and > lower layers simultaneously is a pretty simple way to induce hilarity, > although my understanding is that the developers working on it having > been hammering on it for a while so that the races that cause kernel > panicks or file system damage are rare. I don't believe they have > been completely elimniated, because it's not clear to me that any > wrapfs solution can ever be made race-free. Sorry for bother, I mean you refer to sdcardfs, The current Samsung sdcardfs and esdfs are all fsstress broken for many issues, However we developed an another approach last year, in https://android-review.googlesource.com/#/c/kernel/common/+/574402/3/Documentation/filesystems/hwsdcardfs.txt We applied it on HUAWEI Mate 10 (pro), and the complete solution is https://android-review.googlesource.com/#/q/status:open+project:kernel/common+branch:android-4.4+topic:huawei_sdcardfs It supports stackable case-insensitive without modifing VFS smoothly, and we tested fsstress on the top and lower layers at the same time without any problem. > > [1] https://lwn.net/Articles/718640/ > > The problem is that both the case insensitive and case sensitive > directory trees are made available to userspace, and for backwards > compatibility reasons, they can't just make it disappear. Although > they don't have a Linus Torvalds going all Mr. Shouty about breaking > userspace, the combined whinging from the App developer ecosystem > probably can be a fairly good substitute for Linus when they complain > about compatibility breakages visible to the Android application > programmer. :-) > > So my hope has been to find a way to (a) make wrapfs go away, and (b) > allow the functionality of case sensitive and case insensitive bind > mounts of the same underlying file system directories be *somehow* > supported. > > If the answer is that if two files that differ only in case are > created via the case sensitive mount results in it being undefined > which file you get when you open, rename, or delete that file via the > case insensitive mount, that's fine. I think that's what happpens in > the wrapfs implementation today anyway. It would probably be fine if > the answer was you get the file with an exact match if it is present, > but if the two files created on the case sensitive side are "makefile" > and "Makefile", and you try open/delete "MaKeFiLe" via the case > insensitive mount, it's undefined which file you get --- that's fine. We have a solution to solve case-insensitive stackable but without modifing VFS. Thanks, > > Anyone who does that will get everything they deserve, and they get > that today anyway with the wrapfs or the previous fuse hack anyway. > > This may indeed be an O_PONIES style request. And in the worst case, > Android will continue using an out-of-tree wrapfs solution, and > hopefully I can fend off people asking me for help when the wrapfs > code blows up in various not-so-entertainng ways. Which is, > admittedly, a purely selfish reason for why I would like to see if we > can give Android folks equivalent functionality to the wrapfs hack in > a sane, safe, and consensual way. :-) > > Cheers, > > - Ted >
Al Viro <viro@ZenIV.linux.org.uk> writes: > On Thu, Jan 25, 2018 at 12:53:36AM -0200, Gabriel Krisman Bertazi wrote: > >> The second proposal is related to the VFS layer: >> >> (2) Enable Insensitive lookup support on a per-mountpoint basis, >> via a MS_CASEFOLD flag, with the ultimate goal of supporting a >> case-insensitive bind mount of a subtree, side-by-side with a >> sensitive version of the filesystem. > > First reaction: No. With the side of HELL NO. > > Your ultimate goal sounds utterly insane - dcache tree must be shared > for all mounts. Moreover, "would these two names refer to the same > object" can not be mount-dependent. Not going to happen. > > Please, post the description of what you are planning to do. > Detailed. I'm not saying that it's 100% impossible to do correctly, > but I'm _very_ sceptical about the feasibility. > > I'm certainly not going to ACK any VFS changes until you convince > me that this thing can be done with sane semantics. Hi Viro, Sorry for the delay. I wanted to get a better understanding of the code before getting back to you. Our customer is interested in exposing a subtree of an existing filesystem (native Linux filesystems, xfs, ext4 and others) in an case-insensitive lookup manner, without paying the cost of a userspace getdents implementation, and, preferably, without requiring the user to modify data on the disk. After learning, last year, about the similar Android issue from Ted at a conference, I started to believe a bind mount is the appropriate solution for both our use cases. Like Ted mentioned, we let the user shot herself in the foot by having two files with the exact CI name, and which one she gets will be undefined. My current solution gets the exact match if it is present, and my second approach could be adapted to do that with a performance cost only if both files are not cached. That seem to be acceptable by Ted and shouldn't be a problem for me too. Let me start by discussing what I have now, which is a functional solution, to the best of my knowledge, but which still requires rework to better benefit from the dentry cache. Then I will discuss the solutions I am working on. (1) What I have now: As I mentioned in the first email, MS_CASEFOLD triggers a MNT_CASEFOLD flag in the vfsmount structure. This flag, which gets recalculated every time a mountpoint is traversed, is used to decide if we do a casefolded ->lookup() for each component of the path. This obviously allows paths with multiple nested mountpoints to work. The ->lookup() function calls d_add_ci(), making sure that only the exact-match dentry is inserted in the cache. This means that we don't have multiple dentries, differing only by case, associated to the same file. This is an important semantic, which I am not modifying. This also means that a following CI lookup with an inexact-case match will trigger another ->lookup(), which is bad, but is manageable (for now). I address this problem later in this email. Regarding negative dentries. If we have a negative dentry in the cache, the case-sensitive algorithm will abort the lookup and return the dentry, nothing new here. If, on the other hand, That component lookup is under a MNT_CASEFOLD search, the code does a bigger effort, ignores the negative dentry and still do ->lookup(). This means that: - Case-sensitive (CS) search performance is not affected, except for the following sequence: Do a CS(foo) which returns a negative_dentry, then CI(FOO), and finally CS(foo). This will require the second CS(foo) to re-do the search and recreate the negative dentry. If only CS searches are done, the performance should be unaffected. - Case-insensitive (CI) benefits from the dentry cache as long as it is doing an exact-case name match. Otherwise, it always falls to ->lookup(). The exact-name dentry, if already in the cache, will be returned in both cases. - Negative dentries are ignored if doing CI and it always re-do the search. The prototype code is in the following branch. I can send patches if you want. https://gitlab.collabora.com/krisman/linux.git -b vfs-ms_casefold (2) What I want to do: I want to allow lookups of the inexact name to find the cached dentry, preventing it from going to the disk only to find out later it already has that dentry cached in some other cache bucket. This means that, whenever there is a MNT_CASEFOLD enabled, every mountpoint of that superblock needs to use d_hash(casefold(name)), such that every file is in a known bucket. Obviously this is a worse hash function, but the cost is only payed by filesystems using the feature. If a bind mount is done, we must rehash all existing entries of that filesystem at "bind,remount,ignorecase" time. Like the previous version, we only store exact name dentries, even though the bucket is different now. When doing a LOOKUP_CASEFOLD, we use a different ->d_compare_CI() to do the matching, such that we can find the right dentry or not, in both scenarios (CI and CS). We still can't trust negative dentries when doing a CI lookup. Consider the case where we do CS(FOO) and foo doesn't exist. Now we have a negative_dentry(FOO), but CI(FOO) would still need to find an existing 'foo' file. The first solution is to make fast_CI_lookup() walk the entire bucket and ignore previous matching negative dentries if a positive one is found. If no positive dentry is found, then fallback and do ->lookup(). A superior solution is to mark negative dentries as soft negatives and hard negatives, depending on what kind of lookup created the dentry, whether CS or CI, respectively. Then, we can allow a CI lookup to return the negative dentry only if it is a HARD negative, and try the ->lookup() otherwise. For this to work, any dentry creation on that parent requires that we invalidate hard negative children dentries, and make them soft, at insertion/rename time. With these changes I believe the dentry cache could be shared by code doing CI and CS lookups of the same dentry. My main concern is whether we could invalidate a negative dentry without harming parallel lookups using that dentry, but I believe I can work around that. I believe this covers everything I could identify, but please let me know if I missed something, obvious or not.
Hi Gabriel, On 2018/2/6 10:24, Gabriel Krisman Bertazi wrote: > Al Viro <viro@ZenIV.linux.org.uk> writes: > >> On Thu, Jan 25, 2018 at 12:53:36AM -0200, Gabriel Krisman Bertazi wrote: >> >>> The second proposal is related to the VFS layer: >>> >>> (2) Enable Insensitive lookup support on a per-mountpoint basis, >>> via a MS_CASEFOLD flag, with the ultimate goal of supporting a >>> case-insensitive bind mount of a subtree, side-by-side with a >>> sensitive version of the filesystem. >> >> First reaction: No. With the side of HELL NO. >> >> Your ultimate goal sounds utterly insane - dcache tree must be shared >> for all mounts. Moreover, "would these two names refer to the same >> object" can not be mount-dependent. Not going to happen. >> >> Please, post the description of what you are planning to do. >> Detailed. I'm not saying that it's 100% impossible to do correctly, >> but I'm _very_ sceptical about the feasibility. >> >> I'm certainly not going to ACK any VFS changes until you convince >> me that this thing can be done with sane semantics. > > Hi Viro, > > Sorry for the delay. I wanted to get a better understanding of the > code before getting back to you. > > Our customer is interested in exposing a subtree of an existing > filesystem (native Linux filesystems, xfs, ext4 and others) in an > case-insensitive lookup manner, without paying the cost of a userspace > getdents implementation, and, preferably, without requiring the user to > modify data on the disk. Could I express my opinion? I have working on case-insensitive sdcardfs for months. I think your problem is how we optimise a case-insensitive lookup on the file system with a case-sensitive dcache (I mean d_add and no d_compare and d_hash). In that case, we could not trust the negative dentry when _creating_ a case-insensitive file, for example: there exists "anDroid" on-disk, but ext4's in-memory dcache only has the negative "Android", if we lookup "Android" we will get the _negative_ dentry, but we _cannot_ create it since "anDroid" exists on disk. In the create case, an on-disk _iterate_ (or readdir) is necessary. I could give another example, if we uses case-insentive ext4 and create "Android" and "anDroid", how to deal with the case in the case-insensitive way? I mean in that case we should make both "Android" and "anDroid" can access, right? I think we need to build a special case-sensitive dcache rather than a case-insensitive dcache following the native case-insentive fs(use d_add_ci, d_compare and d_hash, eg. fat, ntfs...) Finally, I agree "let the user shot herself in the foot by having two files with the exact CI name", but I think it could not the VFS _busniess_ itself since each customer solution "case-sensitive ext4 -> case-insensitive lookup" has their _perfered_ way (for example, "android" and "Android" exist, A perfers android and B perfers Android. Finally, I think for optmization, ext4 or other fs could add some dir inode _tag_ and supports native case-insensitive for these dirs could be better.... Sorry to bother, Thanks, > > After learning, last year, about the similar Android issue from Ted at a > conference, I started to believe a bind mount is the appropriate > solution for both our use cases. > > Like Ted mentioned, we let the user shot herself in the foot by having > two files with the exact CI name, and which one she gets will be > undefined. My current solution gets the exact match if it is present, > and my second approach could be adapted to do that with a performance > cost only if both files are not cached. That seem to be acceptable by > Ted and shouldn't be a problem for me too. > > Let me start by discussing what I have now, which is a functional > solution, to the best of my knowledge, but which still requires rework > to better benefit from the dentry cache. Then I will discuss the > solutions I am working on. > > (1) What I have now: > > As I mentioned in the first email, MS_CASEFOLD triggers a MNT_CASEFOLD > flag in the vfsmount structure. This flag, which gets recalculated > every time a mountpoint is traversed, is used to decide if we do a > casefolded ->lookup() for each component of the path. This obviously > allows paths with multiple nested mountpoints to work. > > The ->lookup() function calls d_add_ci(), making sure that only the > exact-match dentry is inserted in the cache. This means that we don't > have multiple dentries, differing only by case, associated to the same > file. This is an important semantic, which I am not modifying. This > also means that a following CI lookup with an inexact-case match will > trigger another ->lookup(), which is bad, but is manageable (for now). > I address this problem later in this email. > > Regarding negative dentries. If we have a negative dentry in the cache, > the case-sensitive algorithm will abort the lookup and return the > dentry, nothing new here. If, on the other hand, That component lookup > is under a MNT_CASEFOLD search, the code does a bigger effort, ignores > the negative dentry and still do ->lookup(). > > This means that: > > - Case-sensitive (CS) search performance is not affected, except for > the following sequence: Do a CS(foo) which returns a negative_dentry, > then CI(FOO), and finally CS(foo). This will require the second > CS(foo) to re-do the search and recreate the negative dentry. If > only CS searches are done, the performance should be unaffected. > > - Case-insensitive (CI) benefits from the dentry cache as long as it is > doing an exact-case name match. Otherwise, it always falls to > ->lookup(). The exact-name dentry, if already in the cache, will be > returned in both cases. > > - Negative dentries are ignored if doing CI and it always re-do the > search. > > The prototype code is in the following branch. I can send > patches if you want. > > https://gitlab.collabora.com/krisman/linux.git -b vfs-ms_casefold > > (2) What I want to do: > > I want to allow lookups of the inexact name to find the cached dentry, > preventing it from going to the disk only to find out later it already > has that dentry cached in some other cache bucket. This means that, > whenever there is a MNT_CASEFOLD enabled, every mountpoint of that > superblock needs to use d_hash(casefold(name)), such that every file is > in a known bucket. Obviously this is a worse hash function, but the > cost is only payed by filesystems using the feature. If a bind mount is > done, we must rehash all existing entries of that filesystem at > "bind,remount,ignorecase" time. > > Like the previous version, we only store exact name dentries, even > though the bucket is different now. When doing a LOOKUP_CASEFOLD, we > use a different ->d_compare_CI() to do the matching, such that we can > find the right dentry or not, in both scenarios (CI and CS). > > We still can't trust negative dentries when doing a CI lookup. Consider > the case where we do CS(FOO) and foo doesn't exist. Now we have a > negative_dentry(FOO), but CI(FOO) would still need to find an existing > 'foo' file. The first solution is to make fast_CI_lookup() walk the > entire bucket and ignore previous matching negative dentries if a > positive one is found. If no positive dentry is found, then fallback > and do ->lookup(). A superior solution is to mark negative dentries as > soft negatives and hard negatives, depending on what kind of lookup > created the dentry, whether CS or CI, respectively. Then, we can allow > a CI lookup to return the negative dentry only if it is a HARD negative, > and try the ->lookup() otherwise. For this to work, any dentry creation > on that parent requires that we invalidate hard negative children > dentries, and make them soft, at insertion/rename time. > > With these changes I believe the dentry cache could be shared by code > doing CI and CS lookups of the same dentry. My main concern is whether > we could invalidate a negative dentry without harming parallel lookups > using that dentry, but I believe I can work around that. > > I believe this covers everything I could identify, but please let me > know if I missed something, obvious or not. >
Gao Xiang <gaoxiang25@huawei.com> writes: > Could I express my opinion? I have working on case-insensitive sdcardfs > for months. Hi Gao, Thanks for helping out with this topic. > I think your problem is how we optimise a case-insensitive lookup on the > file system with a case-sensitive dcache (I mean d_add and no d_compare > and d_hash). Are d_compare and d_hash to be considered really disruptive performance-wise? Even if they are only used when casefold/encoding support is enabled? I don't see how we could better use the dcache without at least requiring these functions to handle CI cases. > In that case, we could not trust the negative dentry when _creating_ a > case-insensitive file, for example: > there exists "anDroid" on-disk, but ext4's in-memory dcache only has > the negative "Android", if we lookup "Android" we will get the > _negative_ dentry, but we _cannot_ create it since "anDroid" exists on > disk. In the create case, an on-disk _iterate_ (or readdir) is > necessary. In my previous email, I mentioned my current implementation ignores negative dentries and forces a ->lookup(), which walks over the disk entries. (I had to add a fix to the creation path in the vfs-ms_casefold branch to exactly match that description, so you might have missed the updated version in that branch). Either way, this case is supported like this: If we have two bind-mounts of the same directory, /mnt and /mnt-ci, case-sensitive and case-insensitive, respectively, We can do: open("/mnt/anDroid", O_EXCL|O_CREAT) = 3 open("/mnt/Android", 0) = -2 No such file or directory open("/mnt-ci/Android", 0) = 4 open("/mnt-ci/Android", O_EXCL|O_CREAT) = -17 File exists open("/mnt-ci/AndROID", O_EXCL|O_CREAT) = -17 File exists The second open() is expected to create an negative_dentry of "Android", which, if it wasn't ignored by the 3th open(), the CI operation would have failed. Notice that the 3th open() operation actually opens the file that was created by the first open(). It doesn't create a new file. Following on, the 4th operation (file creation) *must fail* because there is a CI name collision with /mnt-ci/anDroid. The same is true for the final case. > I could give another example, if we uses case-insentive ext4 and create > "Android" and "anDroid", how to deal with the case in the > case-insensitive way? > I mean in that case we should make both "Android" and "anDroid" can > access, right? Not sure if I follow you here, but I'm assuming we create Android and anDroid in the sensitive mountpoint, because, otherwise the second file creation in the insensitive mountpoint would fail. This is the case where I'm hiding one of the previously (CS) created files, when in the insensitive mountpoint, and the user is shooting himself. For the sensitive case, Both stays visible to the user. > I think we need to build a special case-sensitive dcache rather than > a case-insensitive dcache following the native case-insentive fs(use > d_add_ci, d_compare and d_hash, eg. fat, ntfs...) What do you think about the second part of my proposal, where I mention dealing differently with negative dentries created by a CI lookup? We don't need to ignore them if we can invalidate them after a creation in the directory. > Finally, I agree "let the user shot herself in the foot by having two > files with the exact CI name", but I think it could not the VFS > _busniess_ itself since each customer solution "case-sensitive ext4 -> > case-insensitive lookup" has their _perfered_ way (for example, > "android" and "Android" exist, A perfers android and B perfers Android. I don't see how we could defer the decision to the filesystem, that's a pretty good problem, which I don't have a solution right now. > Finally, I think for optmization, ext4 or other fs could add some dir > inode _tag_ and supports native case-insensitive for these dirs could be > better.... Agreed. But I'm seeing this as outside the scope of my proposal, since it is specific to each filesystem. My ext4 adaptation, for instance, falls back to linear search when it can't find the exact match. Thanks,
Hi Gabriel, Thanks for your reply. On 2018/2/13 3:56, Gabriel Krisman Bertazi wrote: > Gao Xiang <gaoxiang25@huawei.com> writes: > >> Could I express my opinion? I have working on case-insensitive sdcardfs >> for months. > Hi Gao, > > Thanks for helping out with this topic. > >> I think your problem is how we optimise a case-insensitive lookup on the >> file system with a case-sensitive dcache (I mean d_add and no d_compare >> and d_hash). > Are d_compare and d_hash to be considered really disruptive > performance-wise? Even if they are only used when casefold/encoding > support is enabled? I don't see how we could better use the dcache > without at least requiring these functions to handle CI cases. I mean if both "Android" and "anDroid" exist in the same directory of on-disk ext4, I could tend to make both "Android" and "anDroid" accessed by the case-exact name lookup, otherwise do a case-insensitive lookup. eg. (my current implementation), 1. open("/mnt/anDroid", O_EXCL|O_CREAT) 2. open("/mnt/Android", O_EXCL|O_CREAT) 3. open("/mnt-ci/Android") --- exactly Android 4. open("/mnt-ci/anDroid") --- exactly anDroid 5. open("/mnt-ci/ANDROID") --- Android or anDroid, do a case-insensitive lookup. 6. unlink("/mnt-ci/Android") --- exactly Android 7. unlink("/mnt-ci/ANDROID") --- anDroid, do a case-insensitive lookup. >> In that case, we could not trust the negative dentry when _creating_ a >> case-insensitive file, for example: >> there exists "anDroid" on-disk, but ext4's in-memory dcache only has >> the negative "Android", if we lookup "Android" we will get the >> _negative_ dentry, but we _cannot_ create it since "anDroid" exists on >> disk. In the create case, an on-disk _iterate_ (or readdir) is >> necessary. > In my previous email, I mentioned my current implementation ignores > negative dentries and forces a ->lookup(), which walks over the disk > entries. (I had to add a fix to the creation path in the vfs-ms_casefold > branch to exactly match that description, so you might have missed the > updated version in that branch). I am sorry, I haven't looked into your CI patch yet. I decided to join in this topic because in the past months, I did some work on the case-insensitive sdcardfs, and I just want to express my thoughts for this topic. I don't know which level negative dentries you ignore, your case-insensitive negative dentries or the original underlayfs(eg. ext4) case-sensitive negative dentries? Actually I observed some false "negative dentries" race or deadlock if multiple case-insensitive mounts exist and do fs-ops on these mounts at the same time, but I am not sure whether your implementation has these potential issues in principle or not. In addition, quote ` Our customer is interested in exposing a subtree of an existing filesystem (native Linux filesystems, xfs, ext4 and others) in an case-insensitive lookup manner, without paying the cost of a userspace getdents implementation, and, preferably, without requiring the user to modify data on the disk.` I think the *most expensive* operation for case-insensitive lookup is to create a not-exist file rather than do a existed-file lookup. It takes much overhead and I think no straightforward way to directly reduce the cost and improve its performance. > > Either way, this case is supported like this: > > If we have two bind-mounts of the same directory, /mnt and /mnt-ci, > case-sensitive and case-insensitive, respectively, We can do: > > open("/mnt/anDroid", O_EXCL|O_CREAT) = 3 > open("/mnt/Android", 0) = -2 No such file or directory > open("/mnt-ci/Android", 0) = 4 > open("/mnt-ci/Android", O_EXCL|O_CREAT) = -17 File exists > open("/mnt-ci/AndROID", O_EXCL|O_CREAT) = -17 File exists > > The second open() is expected to create an negative_dentry of "Android", > which, if it wasn't ignored by the 3th open(), the CI operation would > have failed. Notice that the 3th open() operation actually opens the Yes, the 3th open() should invalidate the 2nd negative dentry. However, I don't know your detailed implementation behavior, for example as follows: 1. open("/mnt/anDroid", O_EXCL|O_CREAT) 2. open("/mnt/Android", 0) ---- dentry A 3. open("/mnt-ci/Android", 0) ---- CI-dentry B or the same dentry A? if 2 and 3 are the different denties but the same inode, furthermore, if the inode is a directory inode, it seems like a hard link directory, any *potential deadlock* with that? and how about? 1. open("/mnt/anDroid", O_EXCL|O_CREAT) --- inode A 2. close("/mnt/anDroid") --- still positive, referenced 3. open("/mnt/Android", 0) --- No such file or directory 4. open("/mnt-ci/Android", 0) --- positive, inode A 5. close("/mnt-ci/Android") --- still positive, referenced 6. open("/mnt/android", O_EXCL|O_CREAT) --- inode B 7. unlink("/mnt/anDroid") --- expected behavior? since we have 2) and 5) referenced, can the inode finally be evicted? --- 8. open("/mnt-ci/Android", 0) --- ??? positive and inode B? --- > file that was created by the first open(). It doesn't create a new > file. > > Following on, the 4th operation (file creation) *must fail* because > there is a CI name collision with /mnt-ci/anDroid. The same is true for > the final case. > >> I could give another example, if we uses case-insentive ext4 and create >> "Android" and "anDroid", how to deal with the case in the >> case-insensitive way? >> I mean in that case we should make both "Android" and "anDroid" can >> access, right? > Not sure if I follow you here, but I'm assuming we create Android and > anDroid in the sensitive mountpoint, because, otherwise the > second file creation in the insensitive mountpoint would fail. > > This is the case where I'm hiding one of the previously (CS) created > files, when in the insensitive mountpoint, and the user is shooting > himself. For the sensitive case, Both stays visible to the user. As I mentioned above.... 1. open("/mnt/anDroid", O_EXCL|O_CREAT) 2. open("/mnt/Android", O_EXCL|O_CREAT) 3. open("/mnt-ci/Android") --- exactly "Android" 4. open("/mnt-ci/anDroid") --- exactly "anDroid" 5. open("/mnt-ci/ANDROID") --- "Android" or "anDroid", do a case-insensitive lookup. 6. unlink("/mnt-ci/Android") --- exactly "Android" 7. unlink("/mnt-ci/ANDROID") --- "anDroid", do a case-insensitive lookup. >> I think we need to build a special case-sensitive dcache rather than >> a case-insensitive dcache following the native case-insentive fs(use >> d_add_ci, d_compare and d_hash, eg. fat, ntfs...) > What do you think about the second part of my proposal, where I mention > dealing differently with negative dentries created by a CI lookup? > We don't need to ignore them if we can invalidate them after a creation > in the directory. I feel some complex of your second part... I still need some time to look into that... and I think it is useless of negative dentries for that case intuitively..... > >> Finally, I agree "let the user shot herself in the foot by having two >> files with the exact CI name", but I think it could not the VFS >> _busniess_ itself since each customer solution "case-sensitive ext4 -> >> case-insensitive lookup" has their _perfered_ way (for example, >> "android" and "Android" exist, A perfers android and B perfers Android. > I don't see how we could defer the decision to the filesystem, that's a > pretty good problem, which I don't have a solution right now. > >> Finally, I think for optmization, ext4 or other fs could add some dir >> inode _tag_ and supports native case-insensitive for these dirs could be >> better.... > Agreed. But I'm seeing this as outside the scope of my proposal, since it > is specific to each filesystem. My ext4 adaptation, for instance, falls > back to linear search when it can't find the exact match. > > Thanks, > I could miss something important, if I recall, I will reply in the future.... Thanks,
Gao Xiang <hsiangkao@aol.com> writes: > Hi Gabriel, > > Thanks for your reply. > > On 2018/2/13 3:56, Gabriel Krisman Bertazi wrote: >> Gao Xiang <gaoxiang25@huawei.com> writes: >> >>> Could I express my opinion? I have working on case-insensitive sdcardfs >>> for months. >> Hi Gao, >> >> Thanks for helping out with this topic. >> >>> I think your problem is how we optimise a case-insensitive lookup on the >>> file system with a case-sensitive dcache (I mean d_add and no d_compare >>> and d_hash). >> Are d_compare and d_hash to be considered really disruptive >> performance-wise? Even if they are only used when casefold/encoding >> support is enabled? I don't see how we could better use the dcache >> without at least requiring these functions to handle CI cases. > I mean if both "Android" and "anDroid" exist in the same directory of > on-disk ext4, > I could tend to make both "Android" and "anDroid" accessed by > the case-exact name lookup, otherwise do a case-insensitive > lookup. eg. (my current implementation), > 1. open("/mnt/anDroid", O_EXCL|O_CREAT) > 2. open("/mnt/Android", O_EXCL|O_CREAT) > 3. open("/mnt-ci/Android") --- exactly Android > 4. open("/mnt-ci/anDroid") --- exactly anDroid > 5. open("/mnt-ci/ANDROID") --- Android or anDroid, do a case-insensitive > lookup. > 6. unlink("/mnt-ci/Android") --- exactly Android > 7. unlink("/mnt-ci/ANDROID") --- anDroid, do a case-insensitive lookup. Hmm, let me correct what I said before. My code is not fetching the exact-match in this case, it always returns the result of a CI lookup. Still, I adapted it last night to return the exact-case file, by touching the lookup hook itself, which I believe allows it to be made a filesystem decision, despite being much more expensive. Apart from that, can we revisit this design decision ? It feels a bit awkward that we can change the file returned by the lookup by doing an unlink() or creat() on the other mountpoint. $ echo foo > /mnt/bla $ cat /mnt-ci/BLA foo $ echo bar > /mnt/BLA $ cat /mnt-ci/BLA <--- Same lookup has different result now bar This seems awfully confusing. Maybe that's the nature of CI, but, wouldn't entirely hiding the duplicated file be less confusing? Is there any benefit or use case that requires this behavior? >>> In that case, we could not trust the negative dentry when _creating_ a >>> case-insensitive file, for example: >>> there exists "anDroid" on-disk, but ext4's in-memory dcache only has >>> the negative "Android", if we lookup "Android" we will get the >>> _negative_ dentry, but we _cannot_ create it since "anDroid" exists on >>> disk. In the create case, an on-disk _iterate_ (or readdir) is >>> necessary. >> In my previous email, I mentioned my current implementation ignores >> negative dentries and forces a ->lookup(), which walks over the disk >> entries. (I had to add a fix to the creation path in the vfs-ms_casefold >> branch to exactly match that description, so you might have missed the >> updated version in that branch). > I am sorry, I haven't looked into your CI patch yet. > > I decided to join in this topic because in the past months, I did some work > on the case-insensitive sdcardfs, and I just want to express my thoughts > for this topic. > > I don't know which level negative dentries you ignore, your > case-insensitive negative dentries > or the original underlayfs(eg. ext4) case-sensitive negative dentries? I'm talking about the existing case-sensitive negative dentries. I'm not adding any dentries other than the ones created by the underlayfs > Actually I observed some false "negative dentries" race or deadlock > if multiple case-insensitive mounts exist and do fs-ops on these mounts > at the same time, > but I am not sure whether your implementation has these potential issues > in principle or not. Do you have a test case? I didn't observe that, but I might not be stressing my code enough. > > In addition, quote ` > Our customer is interested in exposing a subtree of an existing > filesystem (native Linux filesystems, xfs, ext4 and others) in an > case-insensitive lookup manner, without paying the cost of a userspace > getdents implementation, and, preferably, without requiring the user to > modify data on the disk.` > > I think the *most expensive* operation for case-insensitive lookup is to > create a not-exist file rather than do a existed-file lookup. > It takes much overhead and I think no straightforward way to directly > reduce the cost > and improve its performance. > >> >> Either way, this case is supported like this: >> >> If we have two bind-mounts of the same directory, /mnt and /mnt-ci, >> case-sensitive and case-insensitive, respectively, We can do: >> >> open("/mnt/anDroid", O_EXCL|O_CREAT) = 3 >> open("/mnt/Android", 0) = -2 No such file or directory >> open("/mnt-ci/Android", 0) = 4 >> open("/mnt-ci/Android", O_EXCL|O_CREAT) = -17 File exists >> open("/mnt-ci/AndROID", O_EXCL|O_CREAT) = -17 File exists >> >> The second open() is expected to create an negative_dentry of "Android", >> which, if it wasn't ignored by the 3th open(), the CI operation would >> have failed. Notice that the 3th open() operation actually opens the > Yes, the 3th open() should invalidate the 2nd negative dentry. > However, I don't know your detailed implementation behavior, for example > as follows: > > 1. open("/mnt/anDroid", O_EXCL|O_CREAT) > 2. open("/mnt/Android", 0) ---- dentry A > 3. open("/mnt-ci/Android", 0) ---- CI-dentry B or the same > dentry A? > > if 2 and 3 are the different denties but the same inode, furthermore, > if the inode is a directory inode, it seems like a hard link directory, > any *potential deadlock* with that? 2 and 3 are the same dentry, not a duplicate or anything else. The point is to avoid dealing with multiple dentries kind of complexity. > and how about? > 1. open("/mnt/anDroid", O_EXCL|O_CREAT) --- inode A > 2. close("/mnt/anDroid") --- still positive, referenced > 3. open("/mnt/Android", 0) --- No such file or directory > 4. open("/mnt-ci/Android", 0) --- positive, inode A > 5. close("/mnt-ci/Android") --- still positive, referenced > 6. open("/mnt/android", O_EXCL|O_CREAT) --- inode B > 7. unlink("/mnt/anDroid") --- expected behavior? since we have 2) and > 5) referenced, can the inode finally be evicted? --- > 8. open("/mnt-ci/Android", 0) --- ??? positive and inode B? --- If I understand correctly, since the dentry recovered by 1 and 4 are the same, 7 should unlink /mnt/anDroid and make it negative, since there are no more references. Since /mnt/anDroid no longer exists we are sure that step 8. when accessing /mnt-ci/Android will return inode B, since that is the only one remaining. I think this is it. >> file that was created by the first open(). It doesn't create a new >> file. >> >> Following on, the 4th operation (file creation) *must fail* because >> there is a CI name collision with /mnt-ci/anDroid. The same is true for >> the final case. >> >>> I could give another example, if we uses case-insentive ext4 and create >>> "Android" and "anDroid", how to deal with the case in the >>> case-insensitive way? >>> I mean in that case we should make both "Android" and "anDroid" can >>> access, right? >> Not sure if I follow you here, but I'm assuming we create Android and >> anDroid in the sensitive mountpoint, because, otherwise the >> second file creation in the insensitive mountpoint would fail. >> >> This is the case where I'm hiding one of the previously (CS) created >> files, when in the insensitive mountpoint, and the user is shooting >> himself. For the sensitive case, Both stays visible to the user. > As I mentioned above.... > 1. open("/mnt/anDroid", O_EXCL|O_CREAT) > 2. open("/mnt/Android", O_EXCL|O_CREAT) > 3. open("/mnt-ci/Android") --- exactly "Android" > 4. open("/mnt-ci/anDroid") --- exactly "anDroid" > 5. open("/mnt-ci/ANDROID") --- "Android" or "anDroid", do a > case-insensitive lookup. > 6. unlink("/mnt-ci/Android") --- exactly "Android" > 7. unlink("/mnt-ci/ANDROID") --- "anDroid", do a case-insensitive > lookup. This depends on my question about the behavior of doing an exact name lookup. > >>> I think we need to build a special case-sensitive dcache rather than >>> a case-insensitive dcache following the native case-insentive fs(use >>> d_add_ci, d_compare and d_hash, eg. fat, ntfs...) >> What do you think about the second part of my proposal, where I mention >> dealing differently with negative dentries created by a CI lookup? >> We don't need to ignore them if we can invalidate them after a creation >> in the directory. > I feel some complex of your second part... I still need some time to > look into that... > and I think it is useless of negative dentries for that case > intuitively..... >> >>> Finally, I agree "let the user shot herself in the foot by having two >>> files with the exact CI name", but I think it could not the VFS >>> _busniess_ itself since each customer solution "case-sensitive ext4 -> >>> case-insensitive lookup" has their _perfered_ way (for example, >>> "android" and "Android" exist, A perfers android and B perfers Android. >> I don't see how we could defer the decision to the filesystem, that's a >> pretty good problem, which I don't have a solution right now. >> >>> Finally, I think for optmization, ext4 or other fs could add some dir >>> inode _tag_ and supports native case-insensitive for these dirs could be >>> better.... >> Agreed. But I'm seeing this as outside the scope of my proposal, since it >> is specific to each filesystem. My ext4 adaptation, for instance, falls >> back to linear search when it can't find the exact match. >> >> Thanks, >> > I could miss something important, if I recall, I will reply in the > future.... > > Thanks,
Hi Gabriel, On 2018/2/14 6:20, Gabriel Krisman Bertazi wrote: > Gao Xiang <hsiangkao@aol.com> writes: > >> Hi Gabriel, >> >> Thanks for your reply. >> >> On 2018/2/13 3:56, Gabriel Krisman Bertazi wrote: >>> Gao Xiang <gaoxiang25@huawei.com> writes: >>> >>>> Could I express my opinion? I have working on case-insensitive sdcardfs >>>> for months. >>> Hi Gao, >>> >>> Thanks for helping out with this topic. >>> >>>> I think your problem is how we optimise a case-insensitive lookup on the >>>> file system with a case-sensitive dcache (I mean d_add and no d_compare >>>> and d_hash). >>> Are d_compare and d_hash to be considered really disruptive >>> performance-wise? Even if they are only used when casefold/encoding >>> support is enabled? I don't see how we could better use the dcache >>> without at least requiring these functions to handle CI cases. >> I mean if both "Android" and "anDroid" exist in the same directory of >> on-disk ext4, >> I could tend to make both "Android" and "anDroid" accessed by >> the case-exact name lookup, otherwise do a case-insensitive >> lookup. eg. (my current implementation), >> 1. open("/mnt/anDroid", O_EXCL|O_CREAT) >> 2. open("/mnt/Android", O_EXCL|O_CREAT) >> 3. open("/mnt-ci/Android") --- exactly Android >> 4. open("/mnt-ci/anDroid") --- exactly anDroid >> 5. open("/mnt-ci/ANDROID") --- Android or anDroid, do a case-insensitive >> lookup. >> 6. unlink("/mnt-ci/Android") --- exactly Android >> 7. unlink("/mnt-ci/ANDROID") --- anDroid, do a case-insensitive lookup. > Hmm, let me correct what I said before. My code is not fetching the > exact-match in this case, it always returns the result of a CI lookup. > > Still, I adapted it last night to return the exact-case file, by > touching the lookup hook itself, which I believe allows it to be made a > filesystem decision, despite being much more expensive. I don't think it is a purely filesystem decision. CI selection depends on the users...(eg. some users perfer the smallest alphabet order, some users select CI file by time, such as the oldest one)... IMO, it behaves as a file system filter users could select by themselves rather than the part of VFS or filesystems. > Apart from that, can we revisit this design decision ? It feels a bit > awkward that we can change the file returned by the lookup by doing an > unlink() or creat() on the other mountpoint. > > $ echo foo > /mnt/bla > $ cat /mnt-ci/BLA > foo > $ echo bar > /mnt/BLA > $ cat /mnt-ci/BLA <--- Same lookup has different result now > bar > > This seems awfully confusing. Maybe that's the nature of CI, but, > wouldn't entirely hiding the duplicated file be less confusing? Is > there any benefit or use case that requires this behavior? It seems no confusing... bacause the user creates a new case-exact "/mnt/BLA", and the first "cat /mnt-ci/BLA" and the second "cat /mnt-ci/BLA" are not the same fd. If the exact-case file exist, I think we should *give a way* to get these files by the exact-case names on the CI mountpoint, and that is why I said "a case-insensitive lookup on the file system with a case-sensitive dcache", since it can create/rename a case-exact file on the original case-sensitive mnt. BTW, If you feel confused, how about the following? 1) echo foo > /mnt/bla 2) echo bar > /mnt/BLA 3) cat /mnt-ci/bla --- assume it is "/mnt/bla" 4) rm /mnt/bla 3) cat /mnt-ci/bla ---> it will be "/mnt/BLA" because the user operates /mnt I don't think these two have too much difference... > >>>> In that case, we could not trust the negative dentry when _creating_ a >>>> case-insensitive file, for example: >>>> there exists "anDroid" on-disk, but ext4's in-memory dcache only has >>>> the negative "Android", if we lookup "Android" we will get the >>>> _negative_ dentry, but we _cannot_ create it since "anDroid" exists on >>>> disk. In the create case, an on-disk _iterate_ (or readdir) is >>>> necessary. >>> In my previous email, I mentioned my current implementation ignores >>> negative dentries and forces a ->lookup(), which walks over the disk >>> entries. (I had to add a fix to the creation path in the vfs-ms_casefold >>> branch to exactly match that description, so you might have missed the >>> updated version in that branch). >> I am sorry, I haven't looked into your CI patch yet. >> >> I decided to join in this topic because in the past months, I did some work >> on the case-insensitive sdcardfs, and I just want to express my thoughts >> for this topic. >> >> I don't know which level negative dentries you ignore, your >> case-insensitive negative dentries >> or the original underlayfs(eg. ext4) case-sensitive negative dentries? > I'm talking about the existing case-sensitive negative dentries. I'm > not adding any dentries other than the ones created by the underlayfs OK, if you don't create duplicated dentries, I don't think it has the following two potential issues. I have no more question at the moment, and I need to look into your detailed implementation first. Thanks for your reply. Thanks, >> Actually I observed some false "negative dentries" race or deadlock >> if multiple case-insensitive mounts exist and do fs-ops on these mounts >> at the same time, >> but I am not sure whether your implementation has these potential issues >> in principle or not. > Do you have a test case? I didn't observe that, but I might not be > stressing my code enough. > >> In addition, quote ` >> Our customer is interested in exposing a subtree of an existing >> filesystem (native Linux filesystems, xfs, ext4 and others) in an >> case-insensitive lookup manner, without paying the cost of a userspace >> getdents implementation, and, preferably, without requiring the user to >> modify data on the disk.` >> >> I think the *most expensive* operation for case-insensitive lookup is to >> create a not-exist file rather than do a existed-file lookup. >> It takes much overhead and I think no straightforward way to directly >> reduce the cost >> and improve its performance. >> >>> Either way, this case is supported like this: >>> >>> If we have two bind-mounts of the same directory, /mnt and /mnt-ci, >>> case-sensitive and case-insensitive, respectively, We can do: >>> >>> open("/mnt/anDroid", O_EXCL|O_CREAT) = 3 >>> open("/mnt/Android", 0) = -2 No such file or directory >>> open("/mnt-ci/Android", 0) = 4 >>> open("/mnt-ci/Android", O_EXCL|O_CREAT) = -17 File exists >>> open("/mnt-ci/AndROID", O_EXCL|O_CREAT) = -17 File exists >>> >>> The second open() is expected to create an negative_dentry of "Android", >>> which, if it wasn't ignored by the 3th open(), the CI operation would >>> have failed. Notice that the 3th open() operation actually opens the >> Yes, the 3th open() should invalidate the 2nd negative dentry. >> However, I don't know your detailed implementation behavior, for example >> as follows: >> >> 1. open("/mnt/anDroid", O_EXCL|O_CREAT) >> 2. open("/mnt/Android", 0) ---- dentry A >> 3. open("/mnt-ci/Android", 0) ---- CI-dentry B or the same >> dentry A? >> >> if 2 and 3 are the different denties but the same inode, furthermore, >> if the inode is a directory inode, it seems like a hard link directory, >> any *potential deadlock* with that? > 2 and 3 are the same dentry, not a duplicate or anything else. The > point is to avoid dealing with multiple dentries kind of complexity. > > >> and how about? >> 1. open("/mnt/anDroid", O_EXCL|O_CREAT) --- inode A >> 2. close("/mnt/anDroid") --- still positive, referenced >> 3. open("/mnt/Android", 0) --- No such file or directory >> 4. open("/mnt-ci/Android", 0) --- positive, inode A >> 5. close("/mnt-ci/Android") --- still positive, referenced >> 6. open("/mnt/android", O_EXCL|O_CREAT) --- inode B >> 7. unlink("/mnt/anDroid") --- expected behavior? since we have 2) and >> 5) referenced, can the inode finally be evicted? --- >> 8. open("/mnt-ci/Android", 0) --- ??? positive and inode B? --- > If I understand correctly, since the dentry recovered by 1 and 4 are the > same, 7 should unlink /mnt/anDroid and make it negative, since there are > no more references. > > Since /mnt/anDroid no longer exists we are sure that step 8. when > accessing /mnt-ci/Android will return inode B, since that is the only > one remaining. I think this is it. > >>> file that was created by the first open(). It doesn't create a new >>> file. >>> >>> Following on, the 4th operation (file creation) *must fail* because >>> there is a CI name collision with /mnt-ci/anDroid. The same is true for >>> the final case. >>> >>>> I could give another example, if we uses case-insentive ext4 and create >>>> "Android" and "anDroid", how to deal with the case in the >>>> case-insensitive way? >>>> I mean in that case we should make both "Android" and "anDroid" can >>>> access, right? >>> Not sure if I follow you here, but I'm assuming we create Android and >>> anDroid in the sensitive mountpoint, because, otherwise the >>> second file creation in the insensitive mountpoint would fail. >>> >>> This is the case where I'm hiding one of the previously (CS) created >>> files, when in the insensitive mountpoint, and the user is shooting >>> himself. For the sensitive case, Both stays visible to the user. >> As I mentioned above.... >> 1. open("/mnt/anDroid", O_EXCL|O_CREAT) >> 2. open("/mnt/Android", O_EXCL|O_CREAT) >> 3. open("/mnt-ci/Android") --- exactly "Android" >> 4. open("/mnt-ci/anDroid") --- exactly "anDroid" >> 5. open("/mnt-ci/ANDROID") --- "Android" or "anDroid", do a >> case-insensitive lookup. >> 6. unlink("/mnt-ci/Android") --- exactly "Android" >> 7. unlink("/mnt-ci/ANDROID") --- "anDroid", do a case-insensitive >> lookup. > This depends on my question about the behavior of doing an exact name > lookup. > >>>> I think we need to build a special case-sensitive dcache rather than >>>> a case-insensitive dcache following the native case-insentive fs(use >>>> d_add_ci, d_compare and d_hash, eg. fat, ntfs...) >>> What do you think about the second part of my proposal, where I mention >>> dealing differently with negative dentries created by a CI lookup? >>> We don't need to ignore them if we can invalidate them after a creation >>> in the directory. >> I feel some complex of your second part... I still need some time to >> look into that... >> and I think it is useless of negative dentries for that case >> intuitively..... >>>> Finally, I agree "let the user shot herself in the foot by having two >>>> files with the exact CI name", but I think it could not the VFS >>>> _busniess_ itself since each customer solution "case-sensitive ext4 -> >>>> case-insensitive lookup" has their _perfered_ way (for example, >>>> "android" and "Android" exist, A perfers android and B perfers Android. >>> I don't see how we could defer the decision to the filesystem, that's a >>> pretty good problem, which I don't have a solution right now. >>> >>>> Finally, I think for optmization, ext4 or other fs could add some dir >>>> inode _tag_ and supports native case-insensitive for these dirs could be >>>> better.... >>> Agreed. But I'm seeing this as outside the scope of my proposal, since it >>> is specific to each filesystem. My ext4 adaptation, for instance, falls >>> back to linear search when it can't find the exact match. >>> >>> Thanks, >>> >> I could miss something important, if I recall, I will reply in the >> future.... >> >> Thanks,
On Fri, Jan 26, 2018 at 02:52:28AM +0000, Gaoxiang (OS) wrote: > > Sorry for bother, I mean you refer to sdcardfs, > The current Samsung sdcardfs and esdfs are all fsstress broken for many issues, > However we developed an another approach last year, in > > https://android-review.googlesource.com/#/c/kernel/common/+/574402/3/Documentation/filesystems/hwsdcardfs.txt > > We applied it on HUAWEI Mate 10 (pro), and the complete solution is > > https://android-review.googlesource.com/#/q/status:open+project:kernel/common+branch:android-4.4+topic:huawei_sdcardfs > > It supports stackable case-insensitive without modifing VFS smoothly, > and we tested fsstress on the top and lower layers at the same time > without any problem. Any specific reason that filesystem code has not been submitted here for inclusion upstream? It seems like all of the various sdcardfs forks/rewrites need to be standardized upstream first to allow everyone to benifit, instead of all of the duplicated efforts that are happening right now :( By getting yours, or someone's, code merged properly, that should reduce the amount of duplication here. thanks, greg k-h