Why wouldn't it to be easy to implement? You could probably make a global LD_PRELOAD library that hooks into all open() CREAT calls and tags the inode's xattr with the program that created the file. You could use an attribute like, user.created_by.
Why such a hack? This is a solved problem already. Try Linux audit API. You can set it up for tracking various actions, and you'll get a log with all the information including a process name.
I believe the question was asking for a method of storing the metadata, I was just suggesting xattrs. This will work at the expense of running a daemon.
Can audit be configured to only log file creation, not modification or access? Last time I didn't find a way to do that and I don't want to write a log file entry for every file access.
Programs can use programs like 'touch' to create the files so you'd have to get meta into a chain of process ownership or something while exempting the user's shell.
I think touch is a generic example, but not a practical one. The idea is that program A can launch program B, and you'd have to recursively search for the parent. But it's not easy to find which parent is the true owner, maybe an exception can be made for init but shells can't have that exception as they are sometimes the owner.
MacOS does something sorta like this. It uses extended attributes to mark where a file came from in some cases, such as when an executable is downloaded from a browser, it will mark it as "unsafe" and when you attempt to run it, this causes it to fail to run and you have to whitelist it via security preferences.
Adding an interface to access the tag store won't be very hard.
It would be harder to make it so that all the tag-oblivious programs, like mv or vi, to say nothing of rsync, would preserve the attributes when moving or modifying a file.
f = open("the_file.new")
write(f, new_contents);
close(f);
rename("the_file", "the_file~");
rename("the_file.new", "the_file");
So the old file is never modified, it's renamed to the backup copy, and an entirely new file is created to take its place.
This has a number of advantages, but does not play well with any extended info the old file used to have, unless they are copied explicitly. And in a tagging-oblivious program, they won't be.
f = open( "the_file.new.$$" );
write( f, new_contents );
close( f );
rename( "the_file.new.$$", "the_file.new" )
If not implemented like this, then something that attempts to read "the_file.new" may get partial contents, normally truncated along the file system block size.
Backups often take place, too, like you mentioned, depending on editor.
If the value matches, you renamed the right file. If it does not match, you renamed a wrong file. Without holding the fd, it is impossible to know if it is the right file.
Thanks for teaching me this! I wasn't aware. So, the original xattr (and the program that created it) would be deleted and replaced by vi for example. Still technically right, but I'm unsure how you'd keep a history.
Please no. This article is literally "Dotfile madness", don't make it worse!
I have to deal already with these on file shares, specifically for Apple: .DS_Store, .Trashes and .AppleDouble, or for Windows: Thumbs.db, $RECYCLE.BIN (for some reason Windows sometimes ignores the fact I've disabled the recycle bin on a share and creates this instead) and desktop.ini. Please don't drop crap around directories where there exist a multitude of tidier alternatives.
Because when you have multiple OSs accessing the same shared folder, they all create their own crap, which is then visible to the other OSs, and fills up directories with stuff that confuses normal users.
I don't run multiple OSes, but I do use USB sticks and SD cards on other people's computers and vice versa. I'm pretty sure that is "normal user" behaviour.
It gets annoying real quickly because you see the "crap" files of all the OSes you're not using. And you delete them, you only have to insert the stick and they're back. This happens in different ways for all of Mac, Windows, Android and certain Linuxes.
Sidecar files for storing file tags make querying the tag system a huge chore; it totally kills performance. I understand the conceptual appeal but it's just not the way to go.