Hacker News

According to GitHub, the totals are:

backendA: 11 files, 1 directory, 799 lines (676 sloc), 23.56KB

backendB: 23 files, 5 directories, 1578 lines (1306 sloc), 42.26KB

It's approximately twice as big for the same functionality, and I had to spend a lot more time "digging" through the second one to get an overall idea of how everything works. Jumping around between lots of tiny files is a big waste of time and overhead, and it's one of my pet peeves with how a lot of "modern" software is organised. If you believe that the number of bugs is directly proportional to the number of lines of code, thus "less code, fewer bugs", then backendA is far superior.

> backendB required a bit more work

I'm not surprised that it did. This experiment reminds me of the "enterprise Hello World" parodies, and although backendB isn't quite as extreme, it has some indications of going in that direction. The excessive bureaucracy of Enterprise Java (and to a lesser extent, C#) leads to even simple changes requiring lots of "threading the data" through many layers. I've worked with codebases like that before, many years ago, and don't ever wish to do it again.

I really don't get this fetish for lots of tiny files and nested directories, which seems to be a recent trend; "maintainability" is often dogmatically quoted as the reason, but when it comes time to actually do something to the code, I much prefer a few larger files in a flat structure, where I can scroll through and search, instead of jumping around lots of tiny files nested several directories deep. It might look simpler at the micro level if each file is tiny, or the functions in them are also very short, but all that means is the complexity of the system has increased at the macro level and largely become hidden in the interaction of the parts.



> I really don't get this fetish for lots of tiny files and nested directories, which seems to be a recent trend;

I suspect it is the same kind of thinking that says all functions should be very small (without reference to whether each function provides a single meaningful behaviour). Locally, this keeps things relatively simple, but it ignores the global issue that now there are potentially many more connections to follow around and everything becomes less cohesive. As far as I’m aware, such research as we have available on this still tends to show worse results (in particular, higher bug frequencies) in very short and very long functions, but that doesn’t stop a lot of people from making an intuitive argument for keeping individual elements very small.

A similar issue comes up once again in designing APIs: do you go for minimal but complete, or do you also provide extra help in common cases even if it is technically redundant? The former is “cleaner”, but in practice the latter is often easier to use for those writing a client for that API. Smaller isn’t automatically better.


The book "A Philosophy of Software Design" should interest you then: https://www.amazon.com/t/dp/1732102201 It argues, among other things, that deep interfaces matter more than code complexity inside a module.


That was the first software book I'd read in a while where I got to the end and felt that, if I wrote a book myself, it would be very close to what I'd want it to say. I highly recommend it to anyone who has built up a bit of practical programming experience and wants to improve further.


> The excessive bureaucracy of Enterprise Java (and to a lesser extent, C#) leads to even simple changes requiring lots of "threading the data" through many layers. I've worked with codebases like that before, many years ago, and don't ever wish to do it again.

Yeah I tend to like something like a semantic compression approach: I'll start in a single file, and then split it into separate files organized by domain as the length of the file starts to get unwieldy. And so on into more files and later subdirectories as the program grows.

In my opinion it's much better to let the "needs of the program" dictate code and filesystem structure rather than some academic ideas about how a program should be organized. As you say, when I've worked on projects which are very strict about adopting a particular structure, a lot of time ends up being wasted figuring out how to map my intent to that structure rather than just writing the damn code.


> excessive bureaucracy

I like to call this mountain of abstractions forced on you (as opposed to coming from your domain): gratuitous object astronautics.


When we are talking about 500-1500 sloc I completely agree this kind of structure is overkill. But when dealing with medium to large codebases (anything beyond, say, 100kloc) I much prefer the second approach, bonus points if you can get a fractal-like hierarchy.

Digging through files manually (i.e. using a mouse) is painful, but your IDE is your friend. It takes me less than 3 seconds to search and open any file of the codebase I currently work in (it has a bit more than 2k files). And having a sane hierarchy means I type the folder / file name as I remember it, and filter the search results on-demand.


Splitting things up into multiple independent translation units enables incremental compilation. One function per file is the most extreme version of this. For example:

https://git.musl-libc.org/cgit/musl/tree/src/stdio


That seems like a problem for compiler optimizers to solve, not programmers.


It's actually the domain of build systems. Splitting code into as many independent files as possible gives the build system more data to work with, allowing it to compile parts of the program in parallel and to recompile them only when necessary.

If a file contains two functions and the developer changes one of them, both functions will be recompiled. If two files contain one function each, only the file with the changed function will be recompiled.

Build times increase with language power and complexity as well as the size of the project. Avoiding needless work is always a major victory.
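The timestamp logic behind that kind of selective recompilation can be sketched in a few lines of shell. The file names (add.c, mul.c) are hypothetical, and `touch` stands in for the compiler so the sketch runs without one; a real build system (make, ninja) applies the same newer-than rule per object file:

```shell
# Create a throwaway two-function "project", one function per file.
dir=$(mktemp -d)
cd "$dir"

printf 'int add(int a, int b) { return a + b; }\n' > add.c
printf 'int mul(int a, int b) { return a * b; }\n' > mul.c

# Initial "build": one object file per translation unit
# (touch stands in for `cc -c` here).
touch add.o mul.o

# Edit only add.c.
sleep 1
touch add.c

# make-style rule: a .o is rebuilt only if its .c is newer.
if [ add.c -nt add.o ]; then add_status=stale; else add_status=fresh; fi
if [ mul.c -nt mul.o ]; then mul_status=stale; else mul_status=fresh; fi
echo "add.o: $add_status, mul.o: $mul_status"
```

With one function per file, only add.o is stale after the edit; had both functions lived in one file, its single object would be rebuilt in full.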


In C++, the experience is the opposite - a "unity build", where everything is #included into a single translation unit, tends to be faster:

https://mesonbuild.com/Unity-builds.html

http://onqtam.com/programming/2018-07-07-unity-builds/

https://buffered.io/posts/the-magic-of-unity-builds/


Unity builds are useful too but they have limitations. They are equivalent to full rebuilds and can't be done in parallel. The optimizations they enable can also be achieved via link time optimization. Language features that leverage file scope can interact badly with this type of build. They require a lot of memory since the compiler reads and processes the entire source code of the project and its dependencies.

Unity builds improve compilation times because the preprocessor and compiler are invoked only once. They are most useful in projects with lots of huge dependencies that require the inclusion of complex headers. The effect is less pronounced in simpler projects, and they shouldn't be necessary at all in languages that have an actual module system instead of a preprocessor: Rust, Zig.


I have to clarify here a little bit and say that it is faster on one core. If you have multiple cores, having your translation unit count in the same order of magnitude as your core count will be faster. There is a lot more redundant work going on, but the parallelism can make up for it.


> If a file contains two functions and the developer changes one of them, both functions will be recompiled. If two files contain one function each, only the file with the changed function will be recompiled.

Still sounds like a compiler problem


Multiple files also seems like a problem for IDEs to solve, not programmers.


That brings to mind an interesting idea for an IDE: having one big virtual file that you edit, which gets split into multiple physical files on disk (based on module/class/whatever). Although, thinking about it, there are some languages that would make such automatic restructuring rather difficult.


You've just described Leo - leoeditor.com - where you're effectively editing a gigantic single xml file hidden by a GUI. The structuring is only occasionally automatic - mostly manual. It has python available the way emacs has elisp.

Git conflict resolution of that single file is intractable, so I convert the representation into thousands of tiny files for git, which I reassemble into the xml for Leo.


Yes! Why can't OOP language editors (IDEs) simply represent the source code of classes, interfaces and other type definitions as they are, without even revealing anything about the files they reside in? The technical detail of source code being stored in files is mundane.


> That brings to mind an interesting idea for an IDE: having one big virtual file that you edit, which gets split into multiple physical files on disk

If you're going to work with it as one big file, then what's the point of multiple physical files anyway? Just store it as one big file then.


Why store it as a (text) file at all? Why not store the code in a database? Or as binary? Then you can store metadata pertaining to the code and not just the code itself. Unreal Blueprints are an interesting way of structuring code and providing a componentized API. It would be interesting if they were more closely integrated with the code itself. Then you could manipulate data flows, code, and even do debugging from inside the same interface.

Yes, this is all pie in the sky stuff, but it's interesting to think about.


I have been toying with the idea of storing programming projects in a single sqlite3 database, but I've never seen enough value to actually pursue it.

As you mentioned though, it's interesting to think about.


There’s no one perfect answer; beauty is in the eye of the beholder.

I personally have a harder time coming up to speed on things that don’t break things down into fairly small chunks. I have an easier time dealing with abstraction, and would rather the implementation details of what I’m looking at be hidden until I drill in another level. IDEs make that latter part easy.

However I’ve come to realize that there’s not a one size fits all here. I’ve worked with people who are the exact opposite, and everything in between.

The best one can do is try to find the happiest medium for everyone involved and power on.



