
Why wouldn't it be easy to implement? You could probably write a global LD_PRELOAD library that hooks every open() call made with O_CREAT and tags the new inode with an xattr naming the program that created the file. You could use an attribute like user.created_by.
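The tagging step itself is small. Here is a minimal sketch in Python, assuming Linux and a filesystem that supports user.* extended attributes. The helper names tag_creator and read_creator are made up for illustration; a real LD_PRELOAD hook would have to be written in C and do the equivalent with fsetxattr() after a successful open() with O_CREAT.

```python
import os
import sys

ATTR = "user.created_by"  # attribute name suggested in the comment above


def tag_creator(path, program=None):
    """Store the creating program's name in an xattr; return True on success."""
    program = program or os.path.basename(sys.argv[0]) or "unknown"
    try:
        os.setxattr(path, ATTR, program.encode())
    except OSError:
        # For example ENOTSUP when the filesystem has no user xattr support.
        return False
    return True


def read_creator(path):
    """Return the stored creator name, or None if the file is untagged."""
    try:
        return os.getxattr(path, ATTR).decode()
    except OSError:
        return None
```

Calling tag_creator("report.txt") right after creating the file, and read_creator("report.txt") later, would be the whole round trip, provided nothing replaces the inode in between.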


Why such a hack? This is a solved problem already. Try the Linux audit API. You can set it up to track various actions, and you'll get a log with all the information, including the process name.

https://linux-audit.com/configuring-and-auditing-linux-syste...


I believe the question was asking for a method of storing the metadata; I was just suggesting xattrs. Audit will work, at the expense of running a daemon.


Can audit be configured to only log file creation, not modification or access? Last time I looked, I didn't find a way to do that, and I don't want to write a log file entry for every file access.


You can at least ignore read events, not sure about separating write from creation.


Programs can invoke helpers like 'touch' to create files, so you'd have to go meta and track a chain of process ownership or something, while exempting the user's shell.


Have you seen actual programs (not shell scripts) that do that? I've heard of code smells, but that's a programmer smell right there.


I think touch is a generic example, but not a practical one. The idea is that program A can launch program B, and you'd have to recursively search for the parent. But it's not easy to find which parent is the true owner. Maybe an exception can be made for init, but shells can't have that exception, as they are sometimes the owner.


Couldn't you just record the creating process and all ancestors?
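Recording the whole chain is cheap on Linux. A rough sketch, assuming /proc is mounted and readable; the function names are hypothetical, and the walk stops early if a process has exited or is hidden from the caller:

```python
import os


def parent_pid(pid):
    """Return the parent PID of `pid` as reported by /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("PPid:"):
                return int(line.split()[1])
    raise ValueError(f"no PPid entry for pid {pid}")


def process_name(pid):
    """Return the short command name from /proc/<pid>/comm."""
    with open(f"/proc/{pid}/comm") as comm:
        return comm.read().strip()


def ancestry(pid=None):
    """List (pid, name) pairs from `pid` up to the root of the process tree."""
    pid = os.getpid() if pid is None else pid
    chain = []
    while pid > 0:
        try:
            chain.append((pid, process_name(pid)))
            pid = parent_pid(pid)
        except OSError:
            break  # the process exited, or it is hidden from us
    return chain
```

Storing the result of ancestry() alongside the file sidesteps the question of which ancestor is the "true" owner: the decision is deferred to whoever reads the record.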


You're right. I can see anything like scripts getting difficult. It wouldn't work for all use cases.


macOS does something sorta like this. It uses extended attributes to mark where a file came from in some cases. For example, when an executable is downloaded by a browser, it gets marked as "unsafe"; when you attempt to run it, the mark causes it to fail to run, and you have to whitelist it via security preferences.


Adding an interface to access the tag store won't be very hard.

It would be harder to make it so that all the tag-oblivious programs, like mv or vi, to say nothing of rsync, would preserve the attributes when moving or modifying a file.


Sounds like you might need to hook rename() if you'd like to add that feature too, but I would still say it's straightforward.


The way most editors save a file is like this:

  f = open("the_file.new");            /* creates a brand-new inode */
  write(f, new_contents);
  close(f);
  rename("the_file", "the_file~");     /* old inode becomes the backup */
  rename("the_file.new", "the_file");  /* new inode takes over the name */

So the old file is never modified: it's renamed to the backup copy, and an entirely new file is created to take its place.

This has a number of advantages, but does not play well with any extended info the old file used to have, unless they are copied explicitly. And in a tagging-oblivious program, they won't be.
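For comparison, this is roughly what "copied explicitly" would look like in a save routine. It is a sketch, not what any particular editor does; save_with_backup and copy_xattrs are hypothetical names, and the xattr calls assume Linux:

```python
import os


def copy_xattrs(src, dst):
    """Copy every extended attribute we are allowed to read from src to dst."""
    try:
        names = os.listxattr(src)
    except OSError:
        return  # filesystem without xattr support: nothing to carry over
    for name in names:
        try:
            os.setxattr(dst, name, os.getxattr(src, name))
        except OSError:
            pass  # some namespaces need privileges; skip those attributes


def save_with_backup(path, new_contents):
    """Write new_contents via a temp file, keep a '~' backup, carry xattrs over."""
    tmp = path + ".new"
    with open(tmp, "wb") as f:
        f.write(new_contents)
        f.flush()
        os.fsync(f.fileno())
    if os.path.exists(path):
        copy_xattrs(path, tmp)       # the step a tag-oblivious program omits
        os.rename(path, path + "~")  # old inode becomes the backup
    os.rename(tmp, path)             # new inode takes over the name
```

Without the copy_xattrs() call the function behaves like the pseudocode above, and any user.created_by tag stays behind on the backup file.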


I see a lot of saves work like this:

    f = open( "the_file.new.$$" );
    write( f, new_contents );
    close( f );
    rename( "the_file.new.$$", "the_file.new" );

If not implemented like this, then something that attempts to read "the_file.new" may get partial contents, typically truncated at a file system block boundary.

Backups often take place, too, like you mentioned, depending on editor.


If you do not rename files while holding their open file descriptors there's no guarantee that you are renaming the file you just wrote to.


There is no guarantee of that if one does retain the open file descriptor.


before rename:

fstat the fd, get st_dev and st_ino.

rename

stat the new name. Compare st_dev and st_ino.

If the values match, you renamed the right file. If they do not match, you renamed a wrong file. Without holding the fd, it is impossible to know if it is the right file.
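The steps above can be sketched in a few lines. This is only an illustration with a made-up function name; os.fstat, os.rename and os.stat are the Python wrappers around the calls named in the steps:

```python
import os


def rename_checked(fd, old_path, new_path):
    """Rename old_path to new_path; report whether new_path is the file behind fd."""
    before = os.fstat(fd)          # identity of the file we actually wrote to
    os.rename(old_path, new_path)
    after = os.stat(new_path)      # identity of whatever now carries the name
    return (before.st_dev, before.st_ino) == (after.st_dev, after.st_ino)
```

A False result means some other file was sitting at old_path when rename() ran. The check detects that after the fact; it does not prevent it.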


"you renamed a wrong file" shows that "there's no guarantee that you are renaming the file you just wrote to".


In this case you know that you renamed a wrong file. In the close-before-rename case you do not know that you have renamed a wrong file.


Knowing that something bad happened after the fact is not the same as your idea that there's a guarantee that something bad will not happen.


Thanks for teaching me this! I wasn't aware. So the original xattr (and with it the record of which program created the file) would be lost when vi, for example, replaces the file. The new tag would still be technically right, but I'm unsure how you'd keep a history.


Sidecar files might work, and would also be filesystem-portable.


Please no. This article is literally "Dotfile madness", don't make it worse!

I already have to deal with these on file shares, specifically for Apple: .DS_Store, .Trashes and .AppleDouble, or for Windows: Thumbs.db, $RECYCLE.BIN (for some reason Windows sometimes ignores the fact I've disabled the recycle bin on a share and creates this instead) and desktop.ini. Please don't drop crap around directories when there exist a multitude of tidier alternatives.


How do you have to “deal with” them? What are the portable tidier alternatives?


Because when you have multiple OSs accessing the same shared folder, they all create their own crap, which is then visible to the other OSs, and fills up directories with stuff that confuses normal users.


Normal users don’t run multiple OSes.


I don't run multiple OSes, but I do use USB sticks and SD cards on other people's computers and vice versa. I'm pretty sure that is "normal user" behaviour.

It gets annoying real quickly because you see the "crap" files of all the OSes you're not using. And if you delete them, you only have to insert the stick again and they're back. This happens in different ways for all of Mac, Windows, Android and certain Linuxes.


Some users however access our file shares from their Macs, or their Windows PCs, or even some from their Linux machines. That's fairly normal.


A daemon with a CLI and a programmatic interface, backed by a SQLite store, hooked into Linux audit and perhaps an LD_PRELOAD?
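The SQLite part of that could start out as small as this. The schema and function names are guesses for illustration only; keying on (st_dev, st_ino) is one possible design choice, and the audit or LD_PRELOAD side that would call record_creation() is left out:

```python
import os
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) the tag store; the schema is an illustrative guess."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS created_by ("
        " dev INTEGER, ino INTEGER, path TEXT, program TEXT,"
        " PRIMARY KEY (dev, ino))"
    )
    return db


def record_creation(db, path, program):
    """Remember which program created the file currently at `path`."""
    st = os.stat(path)
    db.execute(
        "INSERT OR REPLACE INTO created_by VALUES (?, ?, ?, ?)",
        (st.st_dev, st.st_ino, os.path.abspath(path), program),
    )
    db.commit()


def creator_of(db, path):
    """Look a file up by inode, so a plain rename() does not lose the record."""
    st = os.stat(path)
    row = db.execute(
        "SELECT program FROM created_by WHERE dev = ? AND ino = ?",
        (st.st_dev, st.st_ino),
    ).fetchone()
    return row[0] if row else None
```

Keying on the inode survives mv within a filesystem, but it has the same weakness as xattrs when a program saves by writing a new file and renaming it into place, and inode numbers can be reused after a delete.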


For me, mostly exclude them from select all, or perhaps scroll past them to see meaningful files.


Sidecar files for storing file tags make querying the tag system a huge chore; it totally kills performance. I understand the conceptual appeal but it's just not the way to go.


This is what Mac does to track download locations, if I remember correctly (._file).


How do you sleep at night?



