[!WARNING] This is a BETA release! Breaking changes are possible.
This Emacs library provides two things:
Truenames are useful as a way to de-duplicate file lists and to cross-reference names in one list with names in another list.
But if your code simply wraps every file name it encounters in ‘(file-truename FILE)’, it becomes slow on large lists of file names. On my machine, it takes 1,000 milliseconds to process 1,000 file names.
That is unacceptable, at least in the use-case where you often scan a list of directories to see if any new files have appeared or any files were modified or deleted.
That’s the sort of thing that might be done as part of a user command. If the command is to be pleasant to use, it must take less than 100 milliseconds so it feels "instant". And you may be dealing with not 1,000 but 10,000 or even 100,000 files.
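To see whether the cost described above applies on your machine, a rough measurement along these lines may help. It is only a sketch: ‘my-files’ and ‘my-time-truenames’ are hypothetical names, and the result depends heavily on your filesystem and Emacs version.

```elisp
(require 'benchmark)

;; Hypothetical sample input: every entry in the current directory.
(defvar my-files (directory-files default-directory t))

(defun my-time-truenames (files)
  "Return the seconds spent calling `file-truename' on each of FILES."
  ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-TIME); keep ELAPSED.
  (car (benchmark-run 1
         (dolist (file files)
           (file-truename file)))))

(message "%d names took %.3f s"
         (length my-files) (my-time-truenames my-files))
```

Dividing the elapsed time by the number of names gives a per-name cost that you can compare against the 100 millisecond budget.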
Sidenote for Elisp devs: It might occur to you that you can also de-dup by filesystem inodes. See Appendix: On referring to inodes instead of truenames.
The routine ‘truename-cache-collect-files-and-attributes’ can be used to merge multiple file lists and return de-duplicated truenames.
Why? See some example file-lists in Emacs that may overlap a lot:
Even if you ‘append’ and ‘seq-uniq’ these lists, a given file may still be represented multiple times under different names.
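The following sketch illustrates the problem with built-in functions only, assuming a system that supports symbolic links. ‘my-dedup-by-truename’ and ‘my-dedup-demo’ are hypothetical helpers and not part of this library; they use the slow per-name ‘file-truename’ approach purely for demonstration.

```elisp
(require 'seq)

(defun my-dedup-by-truename (files)
  "Return the truenames of FILES with duplicates removed."
  (seq-uniq (mapcar #'file-truename files)))

(defun my-dedup-demo ()
  "Return (PLAIN-COUNT TRUENAME-COUNT) for one file known by two names."
  (let* ((dir (file-truename (make-temp-file "dedup-demo" t)))
         (real (expand-file-name "real.txt" dir))
         (link (expand-file-name "link.txt" dir)))
    (write-region "" nil real)
    (make-symbolic-link real link)
    (prog1 (list
            ;; The two strings differ, so `seq-uniq' keeps both.
            (length (seq-uniq (list real link)))
            ;; Both names resolve to the same truename.
            (length (my-dedup-by-truename (list real link))))
      (delete-directory dir t))))
```

Calling ‘(my-dedup-demo)’ should show that the plain list still holds two entries while the truename list holds one.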
To merge, pass all your file-lists in the argument ‘:infer-dirs-from’. In truth, it doesn’t operate directly on any of the files given, it just infers their parent directories and then scans each directory once. That turns out to be efficient, even if it’s likely to pull in more unique files than were mentioned by any name in the input.
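The idea of inferring directories and scanning each one once can be sketched with built-ins as below. This is an illustration of the approach as described, not the library's actual implementation; ‘my-infer-dirs’ and ‘my-scan-dirs-once’ are hypothetical names, and the sketch does not resolve truenames.

```elisp
(defun my-infer-dirs (&rest file-lists)
  "Return the unique parent directories of all names in FILE-LISTS."
  (delete-dups
   ;; `file-name-directory' returns nil for a bare name like "foo".
   (delq nil (mapcar #'file-name-directory
                     (apply #'append file-lists)))))

(defun my-scan-dirs-once (&rest file-lists)
  "List every entry in the parent directories of FILE-LISTS."
  (let (result)
    (dolist (dir (apply #'my-infer-dirs file-lists))
      (when (file-directory-p dir)
        ;; One listing per directory, however many input names it had.
        (dolist (file (directory-files
                       dir t directory-files-no-dot-files-regexp))
          (push file result))))
    (nreverse result)))
```

Because each directory is listed in full, the result can contain files that none of the input lists mentioned, which matches the behavior described above.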
While you could simply let ‘truename-cache-collect-files-and-attributes’ return a giant file list and filter it afterwards, there are two reasons to do some filtering through the arguments ‘:relative-file-deny’, ‘:relative-dir-deny’ and/or ‘:full-dir-deny’ (which take lists of regular expressions).
It can easily make the difference between a runtime of 2.00 seconds and 0.02 seconds! That is what happens inside my ‘~/.emacs.d/’ when I prevent recursion into ‘elpa/’, ‘elpaca/’ and ‘.git/’.
That’s why it provides ‘:relative-file-deny’, ‘:relative-dir-deny’. Another bottleneck dodged.
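To show why filtering during the scan beats filtering afterwards, here is a minimal sketch using the PREDICATE argument of the built-in ‘directory-files-recursively’ (available since Emacs 27). The names ‘my-dir-deny’, ‘my-descend-p’ and ‘my-list-files’ are hypothetical, and the matching rule here (last name component only) is an assumption for illustration, not necessarily how this library matches its deny-lists.

```elisp
(require 'seq)

(defvar my-dir-deny '("\\`elpa\\'" "\\`elpaca\\'" "\\`\\.git\\'")
  "Hypothetical deny-list, matched against a directory's last component.")

(defun my-descend-p (dir)
  "Return non-nil unless the last component of DIR matches `my-dir-deny'."
  (let ((name (file-name-nondirectory (directory-file-name dir))))
    (not (seq-some (lambda (re) (string-match-p re name))
                   my-dir-deny))))

(defun my-list-files (root)
  "List files under ROOT without ever descending into denied directories."
  ;; The empty regexp matches every file name; the predicate is asked
  ;; once per subdirectory whether to recurse into it.
  (directory-files-recursively root "" nil #'my-descend-p))
```

A denied directory is never opened at all, so its contents cost nothing, whereas a filter applied to the finished list would pay for the whole traversal first.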
Sometimes you do not want a true name, but a name abbreviated with ‘abbreviate-file-name’. For one thing, it’s just preferable to present such names to the user, but for another, that’s what will match the confusingly named buffer-local variable ‘buffer-file-truename’ – the actual truename will not.
(Even more confusingly, the function ‘get-truename-buffer’ needs the actual truename…)
But ‘abbreviate-file-name’ is another thing that can consume much of our aforementioned 100 millisecond budget, all by itself.
So ‘truename-cache-collect-files-and-attributes’ can pre-abbreviate names for you with the argument ‘:abbrev 'full’.
This does it slightly more efficiently (informal benchmark: 50-75% of normal runtime), and much more efficiently if you also pass ‘:local-name-handlers nil’ (informal benchmark: 20% of normal runtime).
[!TIP] For those of you who roll your own code, you can get the same effect by using a copy-pasted definition of consult--fast-abbreviate-file-name or come close by just let-binding ‘file-name-handler-alist’ to nil.
In that case, this library only sets itself apart from your solution by the fact it falls back on ‘:remote-name-handlers’ if remote names are encountered, in case that is needed for correctness.
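For the roll-your-own route, a minimal sketch of the let-binding variant could look like this. ‘my-abbreviate-local-names’ is a hypothetical helper, and it assumes that none of the names is remote, since no fallback to remote handlers is attempted.

```elisp
(defun my-abbreviate-local-names (names)
  "Abbreviate NAMES, assuming none of them is a remote (TRAMP) name."
  ;; With no handlers bound, Emacs skips file-name handler lookups
  ;; while abbreviating each name.
  (let ((file-name-handler-alist nil))
    (mapcar #'abbreviate-file-name names)))
```

For local names the result should equal what plain ‘abbreviate-file-name’ returns, only computed with less overhead.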
I have a theory that if de-dup is all you want, it would be possible by making use of the Emacs 29 function ‘file-attribute-file-identifier’.
I’ve not tried that. However, the truename-based method brings some upsides.
The assumption that all names are truenames leads to other safe assumptions, such as that an alphabetic sort automatically groups by directory.
You can use trivial string comparisons like ‘string-prefix-p’ in place of ‘file-in-directory-p’, which is much faster (‘file-in-directory-p’ is ~10,000x slower).
Example use-case: org-node--root-dirs, which takes shortcuts because it knows the input is all truenames.
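The string-comparison shortcut can be sketched as follows. ‘my-truename-in-dir-p’ is a hypothetical helper; it is only correct when both arguments are already truenames, which is exactly the assumption discussed above.

```elisp
(defun my-truename-in-dir-p (file dir)
  "Return non-nil if FILE is inside DIR.
Both arguments must already be truenames."
  ;; The trailing slash keeps "/a/foo" from matching "/a/foobar/x.txt".
  (string-prefix-p (file-name-as-directory dir) file))
```

Unlike ‘file-in-directory-p’, this never touches the filesystem, which is where the speed difference comes from.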