Kraken Technologies: How we organise our large Python monolith

CraigJPerry · on July 18, 2023

This is java circa 2009 flavoured, this model was a popular approach in the java community around that timeframe. Was enforcer the name of the maven plugin? I can’t remember now, but it became fairly popular.

I wonder if this might be susceptible to the same issues that we had in java, specifically it doesn’t prevent accidental duplication, it makes refactoring harder/slower and it’s problematic for the team when it unexpectedly doesn’t catch a misuse - and this happens because the rules enforcement can only happen effectively if the rules themselves are perfect and complete and that gets harder to guarantee with scale and speed of iteration.

For my money today, there’s another model “polylith” which is similar in some ways and as easy to understand but maybe simpler to evolve/change your mind over time.

shoo · on July 19, 2023

> prevent accidental duplication

sometimes sections of code can be syntactically duplicate at some point in time but aren't really semantically duplicate.

e.g. in terms of the example in the post, of having different client-specific logic. at times two different clients may end up with syntactically duplicate code in their corresponding client-specific layers. but unless this code reflects that some detail specific to both clients is actually subject to the same pressures and constraints, what is syntactically duplicate today may not be tomorrow when client A's code needs to change due to pressure P that does not also impact client B, so the syntactically duplicate code is no longer syntactically duplicate.

factoring out syntactically duplicate common code between things that need to change at different times for different reasons is unhelpful as it introduces unnecessary coupling.

not sure if i've understood the context of "prevent accidental duplication" but reasons for change and avoiding unnecessarily coupling is something that jumps into mind when considering multiple clients or multiple regions

skinnyarms · on July 19, 2023

Maven Enforcer does have a rule for duplicate class _names_ (which includes the entire namespace), but I think the author was more talking about it's enforcement of duplicate dependencies. For example, it will detect when you have different (transitive) versions of the same library referenced by your code or dependencies.

photonthug · on July 18, 2023

> this happens because the rules enforcement can only happen effectively if the rules themselves are perfect and complete and that gets harder to guarantee with scale and speed of iteration.

IMHO, project-level assertions about initialization-order feels a little bit dirty, but potentially practical as long as there's only a few rules. Surely even a bazillion line monolith can still often be thought of in terms of a few layers. But by the time you get to lots of rules it's getting hard to stomach the code-smell and keep justifying it as practical. At that point I'd want something dynamic and backed by a real solver.. not just a layer of static-analysis.

The deal framework for DBC doesn't quite do this afaik but is starting to expose some stuff along these lines, and at least can assert imports are "pure". https://deal.readthedocs.io/details/module_load.html

ianandrich · on July 19, 2023

Have you had any experience using deal on personal projects or professionally yet?

photonthug · on July 19, 2023

So far I'm working it into personal projects gradually. Runtime checks are at least as useful as something like pydantic's validation, and more expressive, because validation is basically "pre" but now deal can provide the "post". DBC via pre/post might be "just" syntactic sugar on asserts, but notation matters: it looks great, it reduces cognitive load. It easily does stuff that's impossible or just awkward with types, so having an alternate way to express constraints available probably keeps me out of certain rabbit holes.

Really looking forward to trying verification ( https://deal.readthedocs.io/basic/verification.html ) but I think I need to hit some critical mass of annotation first.

fijiaarone · on July 19, 2023

You can see duplication in the one example they provide —- between clients and territories.

Every client has to have duplicate modules for every territory that they operate in and if they begin to operate in another territory, that has to be copied in — or they have to keep a copy of all territory modules for every client. I think it is probably the former, since they talk about pruning and ignored imports.

bnert · on July 19, 2023

polylith ftw.

Been learning some of the idioms of a polylith architecture using the `poly` tool + clojure. Really enjoying the idioms/practices it encourages.

barefeg · on July 19, 2023

What is the current flavor in Java?

mentaltomorrow · on July 21, 2023

Spring (and all its reflections under the hood), of course.

gyulai · on July 19, 2023

I quite like layered dependency structures.

I even have a lua codebase where I've linearized the sourcecode files into a particular order. The layout looks like this:

  01_foo/  
    03_foo.lua  
    04_bar.lua  
    15_whatever.lua  
  02_bar/  
    22_whatever.lua  
    56_something.lua

It has always bothered me in codebases that they don't have a "beginning" and an "end", and therefore it's difficult for a reader to know where to start when they just want to read the codebase.

So, my approach is to impose this linear order and only allow code that comes later to depend on code that comes earlier, never the other way around. This way, you can read through the code rather easily. As you look at any piece of code, it will only depend on code you've already looked at, so you don't have to be constantly jumping through the codebase as you're reading.

I've also found that failure to correspond to a linearizable dependency structure is a "code smell", and that, as I try to eradicate that code smell, I frequently end up with code that's better in all kinds of ways.

globalreset · on July 19, 2023

That sounds like one of the things they could write about in dailywtf.

First, who has time to read a project from start to beginning. Second, as the time changes the dependencies between parts will change as well.

gyulai · on July 19, 2023

Who has time to write documentation? As the software evolves, the documentation will need changing. Who has time to write unit tests? As the modules evolve, the unit tests will need updating. Why do any of these things?

A well-written piece of code that actually solves a complex problem is a demanding mistress and doesn't appreciate the "who has time..." attitude.

Among the software engineering good practices that I engage in, I'm finding this to be one of the highest bang-for-the-buck ones.

For example, literate programming is a thing that many people believe is worth the effort, at least in certain scenarios and at least for certain parts of codebases: But, if you can make your codebase more readable without requiring additional prose to be written, then that's a comparatively high bang-for-the-buck alternative.

In practice, linear structure doesn't need all that much updating at all: You may have noticed that I don't use strictly sequential numbers. Early in the life of the codebase, I will generally skip lots of numbers, so that I can fill in the gaps with code that comes later. Mostly, all it really takes is to be intentional about where in the linear order to insert new code.

Also, note that it's still useful, even if no one ever goes on a mission to read the codebase in its entirety. If the codebase as a whole has a linear structure, that also imposes a linear structure on any subset of it. If you have five sourcecode files that are somehow pertinent to something you're doing, you immediately know the order in which to look at them.

globalreset · on July 19, 2023

Large part of biggest crimes I've seen in software comes down to picking some allegedly good property and optimizing for it irrespective of costs and limitations it involves. Encapsulation, modularity, flexibility, readability, purity, testability - you name it. Some dubious, some reasonable - no matter what, it usually involves costs and drawbacks.

I have 99 big problems developing software, and not being able to read it from start to end ain't one.

I'm a bit jealous for people advocating such things, being able to work on software tiny enough to even be able to think this is practical. Literate programming could work for tiny software that can be approximately flattened to a 1d "story": "beginning -> end" with only minor detours.

Most of projects I worked with was large enough that things are optimized precisely for not having to understand large parts that were abstracted away (and the problem is to try to do it without introducing too much complexity).

gyulai · on July 19, 2023

It's not so much about the size of the codebase, it's more about the tradeoff of how much code there is, versus how much intrinsic/real complexity.

By "real" complexity, I mean to exclude complexity for complexity's sake, or complexity that's a side effect of a piece of software being concerned with itself and its own structure rather than with solving any problem that exists outside of it. -- We can surely all agree that the latter is bad.

Academic prototypes, for example, tend to solve highly complex problems with small codebases, but the problems are often contrived or they are tiny pieces of systems that would have to be much larger to be of any use.

Software systems you encounter in your run-of-the-mill software engineering business often have huge codebases, but the large size is not due to the fact that there's any problem there of any real complexity. The system just needs to solve trivial problems, and there are just a lot of them.

But you can certainly have systems that need to solve many problems and some of those problems have high intrinsic complexity. In such a scenario, it's important to recognize that there's a part of your system that's "extra special" due to its complexity, and needs to be approached in a way that's different from how you approach most of your engineering. You might need to write a library in a literate programming style, and then use that library from your other code, which isn't written in literate style. Engineering organizations are seldom able to have that kind of flexibility. The opinion you're likely to get is "we don't do literate programming here". The outcome is that such organizations are just not capable of writing such systems, because a system is only as good as the worst job you've done at solving any of the problems it raises.

I'd also like to point out that you're guilty of a bit of a strawman argument there, because all I did was point out a technique I sometimes find useful, and you responded as if I was advocating applying it "irrespective of costs and drawbacks" by pointing out that this often ends up as "crimes against software".

banashark · on July 19, 2023

This was always one of my favorite parts of f#. It’s not only mandated for structures within a file (no hoisting), but also for files within a project.

Reading and understanding an f# code base has always been vastly more simple to me than the incredible indirection of some other codebases (trying to find where the provider for the factory of the abstract class member that’s an interface implemented by 10 almost exactly similar c# classes? Good luck /s)

samsquire · on July 19, 2023

I really like this idea, it reminds me of my "entrypoints" folder idea. The idea is you put the file where everything is registered or started in one place. Such as int main() {} and your dependency inversion container or controllers.

electroly · on July 19, 2023

In F# you have to order your source files and it sucks, hard. It's not typically considered a benefit but rather an unfortunate side effect of how F# works that you have to just live with.

gyulai · on July 19, 2023

> [...] It's not typically considered a benefit[...]

...see sibling comment, where the commenter does seem to imply they rather like it.

electroly · on July 19, 2023

I stand by my statement. Sibling comment is not typical. There's a reason that file order is rare among programming languages. Languages never decide to mandate this out of the blue when their language design doesn't otherwise require file order; it's something you only mandate if you can't implement orderless files. You get a bunch of nice properties when they're not strictly ordered, like compiling files in parallel, rebuilding an earlier file without having to rebuild a later file, etc. In F#, compilation is forced to start at the first modified file and proceed linearly to the end of the project.

OP's technique, notably, avoids these problems by numbering the files but not actually mandating at the compiler/language level that they be ordered. Here they are acting as a hint to the programmer but the files are still, as far as the language is concerned, orderless. That would seem to be a nice compromise if you like the idea of an ordered codebase. More of a literate programming technique.

gyulai · on July 19, 2023

Is that just because F# is a typed language?

My personal experience has been that it's a pain to implement this as soon as you have strong typing and try to avoid "unsafe" casts. For example, in modern Java, if you have some kind of a Container<T> over elements of type T, and a matching ContainerIterator<T>, you will want some way in each of the type declarations to refer to the other type declaration, so you need cyclical imports.

If you look at it through the lens of a dynamically-typed language like Python 3.4 or Lua, the problem just disappears. This is the case to a lesser extent even for Java before it had generics, and collections were just collections of Object, and you used "unsafe" casts all the time. I am finding this practice not the slightest bit limiting in Lua, nor do I get the feeling that the readability of my code suffers in any way at all. (Rather the contrary, as I've pointed out before).

electroly · on July 19, 2023

I wouldn't blame the static type system; this is more of a language design decision of its own. IMO, F# does it this way because ML did it this way. I think it's more of a cultural norm than a technical decision. There is a certain theoretical purity to requiring ordered declarations across a whole project.

It's valuable to compare F# against C++ here. F# and C++ both require ordered declarations within a file (while C# and Java do not), but F# requires the files themselves to be ordered, too, while C++ does not. Both are statically typed languages. C++ avoids requiring a file-level order by defining file-level compilation units and a system for importing and exporting symbols between compilation units to allow each file to be processed separately with the imports/exports resolved during a final linking phase. There's complexity involved in accomplishing this. F# has not done it; an F# project is the same as concatenating all of the source files together, and thus declarations must be ordered across files just like they are within a file.

IshKebab · on July 19, 2023

I also work on a codebase that has to have a particular order (because the compiler doesn't support any kind of modules or interfaces).

It's a nightmare. Very difficult to follow.

photonthug · on July 18, 2023

Never heard of https://import-linter.readthedocs.io/ before. Not sure if I like this type of solution, but it's interesting, and certainly the problem is real.

bitdivision · on July 18, 2023

The author, David Seddon, is also the author of the package [0]. So I'm guessing Kraken was an initial user.

0: https://import-linter.readthedocs.io/en/stable/authors.html

walthamstow · on July 19, 2023

Interesting that he doesn't blow his own horn about being the author of the linting tool. He simply describes it as 'an open source tool'

shoo · on July 19, 2023

Yeah, the problem is certainly real. I remember working with a much smaller python monorepo some years ago (probably 100x fewer python modules but the beginnings of multiple client-specific variants that depended upon a swathe of core libraries) and we were already starting to see places it would be valuable to declare constraints banning certain types of imports that broke our desired module architecture and enforce those constraints during CI without needing humans to notice the violations during code review.

mint2 · on July 19, 2023

Note, this is not the crypto kraken, it’s a different kraken for those who ignore crypto related stuff

loveparade · on July 19, 2023

Why does it matter? Just because I don't like Google doesn't mean they suck at writing Python code. The same goes for anything crypto.

ericjmorey · on July 19, 2023

People are emotional and sometimes addressing an emotional concern helps. In this case, people who are exhausted by cryptocurrency culture would be more willing to read this write-up being assured that it's not cryptocurrency related.

mint2 · on July 19, 2023

I personally almost skipped it, thinking it might be a crypto company puff piece masquerading as a python show and tell. crypto is known for puffery and it’s exhausting. But this article turned out to be not even from a crypto company, instead just having the misfortune of sharing name.

calderwoodra · on July 19, 2023

Code layers makes sense, but organizing your code in layers does not.

For example, I much prefer this:

  project/
    feature_1/
      view.py
      service.py
      model.py
    feature_2/
      view.py
      service.py
      model.py

over this:

  project/
    views/
      feature_1_view.py
      feature_2_view.py
    services/
      feature_1_service.py
      feature_2_service.py
    models/
      feature_1_models.py
      feature_2_models.py

And their import-linter library doesn't support this very well.

david-j-vujic · on July 19, 2023

Some have already mentioned Polylith, but what might not be known is that it is available for Python too (I’m the maintainer of the Python tooling).

What is mentioned in the article reminds of the Polylith Architecture, even if Polylith probably has a more simplistic view on code: a flat structure, made of something called “bricks” - that are small namespace packages. These ones are combined into features, the features are combined into apps or services.

You can choose how to deploy your artifacts - as a single Monolith, or several microservices. Since the code is not coupled to the built artifact, it isn’t a big thing to change it from one type to another.

Docs here if you want to know more: https://davidvujic.github.io/python-polylith-docs/

kgeist · on July 19, 2023

>Enforcing layering with Import Linter

We use a similar tool for PHP - deptrac, it checks that our architectural decisions about layers and modules, and their interactions, are not violated. If they're violated, the build fails.

Interestingly, we don't use similar tools for our microservices. I'm not sure whether it's an oversight or it's just unnecessary in a microservice.

febed · on July 19, 2023

IMHO the customer specific stuff should have been forked out into a separate repo instead of cluttering up the main one

dezgeg · on July 19, 2023

Hard disagree, splitting to multiple repos has a heavy cost; biggest of them the pain if you ever need to commit something to both repos at same time. Splitting the codebase to multiple repos shouldn't be done lightly.

lijok · on July 19, 2023

unusable_link · on July 19, 2023

I'm very sorry for the offtopic but how am I supposed to read the article if they covered the button to close the cookie popup by the button to subscribe to their newsletter? [0]

I tried clicking the bottom and the corner and it just doesn't work. Is it a dark pattern? Is it done deliberately to make people click on the newsletter button more?

[0] https://ibb.co/dg2krqL

nick-of-time · on July 19, 2023

I used uBlock's element zapper to kill it. Mobile Firefox supports it. To exit zapper mode oncxe you're done (I had to look this up) swipe right twice.

pxul · on July 19, 2023

Yes this bugged me too. Was able to clear the cookie pop-up by rotating my phone.