moth-fuzz's comments | Hacker News

I'm a huge fan of the 'parse, don't validate' idiom, but it feels like a bit of a hurdle to use it in C - in order to really encapsulate and avoid errors, you'd need to use opaque pointers to hidden types, which requires the use of malloc (or a per-type object pool, or some other scaffolding that would get quite repetitive after a while, but I digress).

You basically have to trade performance for correctness, whereas in a language like C++, that's the whole purpose of the constructor, which works for all kinds of memory: auto, static, dynamic, whatever.

In C, to initialize a struct without dynamic memory, you could always do the following:

    struct Name {
        const char *name;
    };

    int parse_name(const char *name, struct Name *ret) {
        if(name) {
            ret->name = name;
            return 1;
        } else {
            return 0;
        }
    }

    //in user code, *hopefully*...
    struct Name myname;
    parse_name("mothfuzz", &myname);
But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever. This is very close to 'validation' type behaviour. So to get real 'parsing' behaviour, dynamic memory is required, which is off-limits for many of the kinds of projects one would use C for in the first place.

I'm very curious as to how the author resolves this, given that they say they don't use dynamic memory often. Maybe there's something I missed while reading.
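For concreteness, the opaque-pointer version alluded to above might look like this - a hedged sketch, with hypothetical names, showing both the public interface and the hidden definition in one listing:

```c
#include <stdlib.h>

/* name.h -- public interface: struct Name is an incomplete type here,
 * so callers cannot instantiate or inspect it directly. */
struct Name;
struct Name *name_parse(const char *name);   /* returns NULL on invalid input */
const char *name_get(const struct Name *n);
void name_free(struct Name *n);

/* name.c -- the definition is hidden, so the only way to obtain a
 * valid struct Name is through name_parse(), at the cost of a malloc. */
struct Name {
    const char *name;
};

struct Name *name_parse(const char *name) {
    if (!name) return NULL;
    struct Name *n = malloc(sizeof *n);
    if (!n) return NULL;
    n->name = name;
    return n;
}

const char *name_get(const struct Name *n) { return n->name; }

void name_free(struct Name *n) { free(n); }
```

This is the performance-for-correctness trade described above: every "parsed" value now lives behind a heap allocation.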


You can play tricks if you’re willing to compromise on the ABI:

    typedef struct foo_ foo;
    enum { FOO_SIZE = 64 };
    foo *foo_init(void *p, size_t sz);
    void foo_destroy(foo *p);
    #define FOO_ALLOCA() \
      foo_init(alloca(FOO_SIZE), FOO_SIZE)
Implementation (size checks, etc. elided):

    struct foo_ {
        uint32_t magic;
        uint32_t val;
    };
    
    foo *foo_init(void *p, size_t sz) {
        foo *f = (foo *)p;
        f->magic = 1234;
        f->val = 0;
        return f;
    }
Caller:

    foo *f = FOO_ALLOCA();
    // Can’t see inside
    // APIs validate magic


> But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever

This is nothing new in C. This problem has always existed by virtue of all struct members being public. Generally, programmers know to search the header file / documentation for constructor functions, instead of doing raw struct instantiation. Don't underestimate how good documentation can drive correct programming choices.

C++ is worse in this regard, as constructors don't really allow this pattern, since they can't return a None / false. The alternative is to throw an exception, which requires runtime support, similar to malloc.


In C++ you can do:

    struct Foo {
    private:
        int val = 0;
        Foo(int newVal) : val(newVal) {}
    public:
        int value() const { return val; }
        static optional<Foo> CreateFoo(int newVal) {
            if (newVal != SENTINEL_VALUE) { return Foo(newVal); }
            return {};
        }
    };

    int main(int argc, char* argv[]) {
      if (auto f = Foo::CreateFoo(argc)) {
        cout << "Foo made with value " << f->value();
      } else {
        cout << "Foo not made";
      }
    }


In C++ you would have a protected constructor and related friend utility class to do the parsing, returning any error code, and constructing the thing, populating an optional, shared_ptr, whatever… don’t make constructors fallible.


Sometimes you want the struct to be defined in a header so it can be passed and returned by value rather than pointer.

A technique I use is to leverage GCC's `poison` pragma to cause an error if attempting to access the struct's fields directly. I give the fields names that won't collide with anything, use macros to access them within the header and then `#undef` the macros at the end of the header.

Example - an immutable, pass-by-value string which couples the `char*` with the length of the string:

    #ifndef FOO_STRING_H
    #define FOO_STRING_H
    
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include "config.h"
    
    typedef size_t string_length_t;
    #define STRING_LENGTH_MAX CONFIG_STRING_LENGTH_MAX
    
    typedef struct {
        string_length_t _internal_string_length;
        char *_internal_string_chars;
    } string_t;
    
    #define STRING_LENGTH(s) (s._internal_string_length)
    #define STRING_CHARS(s) (s._internal_string_chars)
    
    #pragma GCC poison _internal_string_length _internal_string_chars
    
    constexpr string_t error_string = { 0, nullptr };
    constexpr string_t empty_string = { 0, "" };
    
    inline static string_t string_alloc_from_chars(const char *chars) {
        if (chars == nullptr) return error_string;
        size_t len = strnlen(chars, STRING_LENGTH_MAX);
        if (len == 0) return empty_string;
        if (len < STRING_LENGTH_MAX) {
            char *mem = malloc(len + 1);
            if (mem == nullptr) return error_string;
            strncpy(mem, chars, len);
            mem[len] = '\0';
            return (string_t){ len, mem };
        } else return error_string;
    }
    
    inline static char * string_to_chars(string_t string) {
        return STRING_CHARS(string);
    }

    inline static string_length_t string_length(string_t string) {
        return STRING_LENGTH(string);
    }

    inline static void string_free(string_t s) {
        free(STRING_CHARS(s));
    }
    
    inline static bool string_is_valid(string_t string) {
        return STRING_CHARS(string) != nullptr
            && strnlen(STRING_CHARS(string), STRING_LENGTH_MAX) == STRING_LENGTH(string);
    }
    

    ...

    
    #undef STRING_LENGTH
    #undef STRING_CHARS
    
    #endif /* FOO_STRING_H */
It just wraps `<string.h>` functions in a way that is slightly less error prone to use, and adds zero cost. We can pass the string everywhere by value rather than needing an opaque pointer. It's equivalent on SYSV (64-bit) to passing them as two separate arguments:

    void foo(string_t str);
    //vs
    void foo(size_t length, char *chars); 
These have the exact same calling convention: length passed in `rdi` and `chars` passed in `rsi`. (Or equivalently, `r0:r1` on other architectures).

The main advantage is that we can also return by value without an "out parameter".

    string_t bar();
    //vs
    size_t bar(char **out_chars);
These DO NOT have the same calling convention. The latter is less efficient because it needs to dereference a pointer to return the out parameter. The former just returns length in `rax` and chars in `rdx` (`r0:r1`).

So returning a fat pointer is actually more efficient than returning a size and passing an out parameter on SYSV! (Though only marginally because in the latter case the pointer will be in cache).

Perhaps it's unfair to say "zero-cost" - it's slightly less than zero - cheaper than the conventional idiom of using an out parameter.

But it only works if the struct is <= 16-bytes and contains only INTEGER types. Any larger and the whole struct gets put on the stack for both arguments and returns. In that case it's probably better to use an opaque pointer.

That aside, when we define the struct in the header we can also `inline` most functions, so that avoids unnecessary branching overhead that we might have when using opaque pointers.

`#pragma GCC poison` is not portable, but it will be ignored wherever it isn't supported, so this won't prevent the code being compiled for other platforms - it just won't get the benefits we get from GCC & SYSV.

The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar trick to prevent using compound initializers with the type; then we could have full encapsulation without resorting to opaque pointers.


> The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar trick to prevent using compound initializers with the type; then we could have full encapsulation without resorting to opaque pointers.

Hmm, I found a solution and it was easier than expected. GCC has `__attribute__((designated_init))` we can stick on the struct which prevents positional initializers and requires the field names to be used (assuming -Werror). Since those names are poisoned, we won't be able to initialize except through functions defined in our library. We can similarly use a macro and #undef it.

Full encapsulation of a struct defined in a header:

    #ifndef FOO_STRING_H
    #define FOO_STRING_H

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #if defined __has_include
    # if __has_include("config.h")
    #  include "config.h"
    # endif
    #endif

    typedef size_t string_length_t;
    #ifdef CONFIG_STRING_LENGTH_MAX
    #define STRING_LENGTH_MAX CONFIG_STRING_LENGTH_MAX
    #else
    #define STRING_LENGTH_MAX (1 << 24)
    #endif

    typedef struct __attribute__((designated_init)) {
        const string_length_t _internal_string_length;
        const char *const _internal_string_chars;
    } string_t;

    #define STRING_CREATE(len, ptr) (string_t){ ._internal_string_length = (len), ._internal_string_chars = (ptr) }
    #define STRING_LENGTH(s) (s._internal_string_length)
    #define STRING_CHARS(s) (s._internal_string_chars)
    #pragma GCC poison _internal_string_length _internal_string_chars


    constexpr string_t error_string = STRING_CREATE(0, nullptr);
    constexpr string_t empty_string = STRING_CREATE(0, "");

    inline static string_t string_alloc_from_chars(const char *chars) {
        if (__builtin_expect(chars == nullptr, false)) return error_string;
        size_t len = strnlen(chars, STRING_LENGTH_MAX);
        if (__builtin_expect(len == 0, false)) return empty_string;
        if (__builtin_expect(len < STRING_LENGTH_MAX, true)) {
            char *mem = malloc(len + 1);
            if (__builtin_expect(mem == nullptr, false)) return error_string;
            strncpy(mem, chars, len);
            mem[len] = '\0';
            return STRING_CREATE(len, mem);
        } else return error_string;
    }

    inline static const char *string_to_chars(string_t string) {
        return STRING_CHARS(string);
    }

    inline static string_length_t string_length(string_t string) {
        return STRING_LENGTH(string);
    }

    inline static void string_free(string_t s) {
        free((char*)STRING_CHARS(s));
    }

    inline static bool string_is_valid(string_t string) {
        return STRING_CHARS(string) != nullptr;
    }

    // ... other string functions

    #undef STRING_LENGTH
    #undef STRING_CHARS
    #undef STRING_CREATE

    #endif /* FOO_STRING_H */
Aside from horrible pointer aliasing tricks, the only way to create a `string_t` is via `string_alloc_from_chars` or other functions defined in the library which return `string_t`.

    #include <stdio.h>
    int main() {
        string_t s = string_alloc_from_chars("Hello World!");
        if (string_is_valid(s)) 
            puts(string_to_chars(s));
        string_free(s);
        return 0;
    }


If you don't want your types to be public, don't put them in the public interface, put them into the implementation.


I'm not a fan of the recent trend in software development, started by the OOP craze but in the modern day largely driven by Rust advocates, of noun-based programming, where type hierarchies are the primary interface between the programmer and the code, rather than the data or the instructions. It's just so... dogmatic. Inexpressive. It ultimately feels to me like a barrier between intention and reality, another abstraction. The type system is the program, rather than the program being the program. But speaking of dogma, the author's insistence that not abiding by this noun-based programming model is a form of 'lying' is quite the accusatory stretch of language... but I digress at the notion that I might just be a hit dog hollering.


The kind of noun-based programming you don’t like is great for large teams and large code bases where there is an inherent communication barrier based on the number of people involved. (N choose 2 = N*(N-1)/2 so it grows quadratically.) Type hierarchies need to be the primary interface between the programmers and the code because it communicates invariants on the data more precisely than words. It is dogmatic, because that’s the only way it could work for large teams.

When you are the only programmer, this matters way less. Just do whatever based on your personal taste.


That sounds eerily similar to the "OOP is for large teams" defence which is simply not true.

On the contrary, this noun-based programming explodes with complexity on large teams. Yes, interfaces are obviously important, but when every single thing is its own type and you try to solve problems with the type system leading to a combinatoric explosion of types and their interactions, what do you think happens when you scale the team up?


> That sounds eerily similar to the "OOP is for large teams" defence

False. They are only similar to you. Haskell is a pure functional programming language and it is very much noun-based. Type classes like functors and monads are nouns that describe the structure of many types. Modern Haskell best practices involve way more types than other languages. Very few people operate on JSON for example, instead almost everyone will parse that JSON into a domain-specific type. The “parse don’t validate” idea is based on the idea that data that has been checked and data that has not been checked should have different types.

Rust also is decidedly not OOP: it does not even have inheritance. Yet it also has way more types than usual. Most languages would be satisfied with something like a Hashable interface, but Rust further decouples the calculation of hash values into traversing a type's fields and updating the internal state of the hash function. This results in the separate Hash and Hasher traits. This is a wonderful design decision that helps programmers despite an increase in the number of nouns.

> a combinatoric explosion of types and their interactions

Absolutely not my experience at all. There is nothing combinatoric here. Most types do not interact with many other types. The structure is more like a tree than a complete graph.


"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

> It's just so... dogmatic. Inexpressive. It ultimately feels to me like a barrier between intention and reality, another abstraction.

On the contrary, it's a much more effective way to express intention when you have a language that can implement it. Programmers in C-family languages waste most of their time working around the absence of sum types, they just don't realise that that's what they're doing. Yes it is an abstraction, all programming is abstraction.
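For what it's worth, the usual C workaround is the hand-rolled tagged union - a minimal sketch, with hypothetical names, of the bookkeeping a sum type automates:

```c
/* A hypothetical result type: either a parsed value or an error code.
 * The tag must be checked by hand, and nothing stops a caller from
 * reading the wrong union member -- exactly what sum types prevent. */
enum result_tag { RESULT_OK, RESULT_ERR };

struct result {
    enum result_tag tag;
    union {
        int value;    /* meaningful only when tag == RESULT_OK  */
        int err_code; /* meaningful only when tag == RESULT_ERR */
    };
};

static struct result parse_digit(char c) {
    if (c >= '0' && c <= '9')
        return (struct result){ .tag = RESULT_OK, .value = c - '0' };
    return (struct result){ .tag = RESULT_ERR, .err_code = -1 };
}
```

Every use site has to repeat the tag check that a language with sum types enforces in the type system.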


Agreed. It's often accompanied by the dogma "make invalid states unrepresentable" which sounds good until you start trying to encode into the type system foo.bar being 1-42 unless foo.baz is above 10, where now foo.bar can be -42-1 instead, but if foo.omfg is prefixed with "wtf" then foo.baz needs to be above 20 for its modifiers to kick in.

Yeah good luck doing that in the type system in a way that is maintainable, open to modification, and scales with complexity.


You have misunderstood what it means to make invalid states unrepresentable.

    {-# LANGUAGE RecordWildCards #-}
    
    import Control.Monad (guard, when)
    import Data.List (isPrefixOf)
    
    data UnvalidatedFoo = UnvalidatedFoo
      { unvalidatedOmfg :: String,
        unvalidatedBar, unvalidatedBaz :: Int
      }
    
    data ValidatedFoo = ValidatedFoo
      { validatedOmfg :: String,
        validatedBar, validatedBaz :: Int
      }
    
    validate :: UnvalidatedFoo -> Maybe ValidatedFoo
    validate UnvalidatedFoo {..} = do
      when ("wtf" `isPrefixOf` unvalidatedOmfg) $
        guard (unvalidatedBaz > 20)
      if unvalidatedBaz > 10
        then guard (unvalidatedBar >= -42 && unvalidatedBar <= 1)
        else guard (unvalidatedBar >= 1 && unvalidatedBar <= 42)
      pure ValidatedFoo {validatedOmfg = unvalidatedOmfg, validatedBaz = unvalidatedBaz, validatedBar = unvalidatedBar}


I love Ruby and have tried to use mruby several times, but the one thing that always becomes an issue is that it uses Ruby’s own native-extension build system for compilation, which is configured in Ruby itself. It makes it a total pain to include in other build systems, or when compiling to other targets (e.g. WASM).

Frankly, I love Ruby as a language, but if it were as easy to embed as Lua I would have no reason to use Lua.


I agree with the author - I'm sick of hearing the cliches from people who prefer 'dark mode'. But I remember long before there was 'light mode' and 'dark mode' there were themes based on a spectrum of hues and values - actual colors. Why not bring that back? "Light mode" can be way more bearable if it's not pure #ffffff. I dislike the invented dichotomy of light and dark anyway, there's an entire spectrum that designers can use, and I think apps in general would look way better if they took advantage of that.


There are many light mode themes just like there are many dark mode themes. They even have colors.


You'd also be more productive and have fewer unknowns and potentially less decision paralysis if, say, everyone started using Excel hooked up to a database instead of writing their own bespoke CRUD app, but alas, those aren't the reasons one asks programmers to program.


I love Crystal but I’m surprised at how nonexistent the WASM story is this late in the game. I’d love to run Crystal directly in the browser, especially given how web-focused they seem to be.

Also, Windows support has been more or less “done” for a couple of years now, so is the “preview” tag still necessary?


Regarding the 'Modular Monoliths' bit, I wholeheartedly agree. I always found it kind of disappointing that while we're told in our OOP classes that using interfaces increases modularity and cohesion and decreases coupling, in reality in most programming languages you're relying on the nominal type of said interface regardless. All libraries would have to agree on a common interface at the source-code level, which is obscenely rare. For interfaces to truly live up to what they're describing, they ought to be structural (the equivalent, for behaviour, of what structural typing is for data).

Edit, since I remembered Go has this behaviour: I think Go's auto-interfaces are easily one of its biggest selling points.


I'm of two minds when I see comments complaining about header files. Practically speaking, I think "have the preprocessor copy & paste source files together" is a bit of a hackjob, but, conceptually speaking, having your interface and implementation separate is ultimately a good thing.

The problem of course lies not with header files but with C++ the language, as all public and private members must be specified in the class declaration so that the compiler knows the memory layout. The separation is kind of defeated in that sense. You can move private methods out to a separate source file, but you don't gain much in doing so, at least in terms of strict encapsulation. And of course, if you use templates at all, you can no longer even do that. Which is its own can of worms.

Unfortunately, none of these problems are problems that modules solve. Implementations very much disagree on interfaces vs implementations, precompiled vs simply included, etc. In my own usage of modules I've just found them to be header files with different syntax. Any API implemented via modules is still very leaky - it's hard to just import a module and know what's truly fair game for application usage. You still ultimately have to rely on documentation for usage details.

At the end of the day I don't really care how the implementation puts together a particular feature, I care about how it affects the semantics and usability of the language. And modules do not really differ in proper usage from headers, even though the whole backend had to be changed, the frontend ends up being the same. So it's net nothing.

All said and done, when it comes to defining library APIs, I prefer C. No public/private, you just have some data laid out a particular way, and some functions to operate on it. The header file is essentially just a symbol table for the binary code - and said code can be a .c file or a .o file or even a .a or .lib or .dll or whatever - C doesn't care. Raw functionality, raw usability. No hoops.


The idea that arrays of structs are inherently more cache friendly and thus data-oriented-er is a bit reductive of the whole practice of data-oriented code. The point is to optimize data layout for access patterns. Putting fields of a struct into their own arrays is only actually an optimization if you're only accessing that field in-bulk. And if so, why is it even in a struct in the first place? If you use all fields of a struct in your algorithm, then an array of structs is the optimal way.

All the same is true for enums.


Access patterns matter, but just as important is to have less stuff to access. That's why arrays-of-structs are considered cache friendly - columnar data layouts open the door to optimizations that significantly reduce memory footprint. You no longer waste memory with struct padding. Boolean fields can become bitsets. Enums can be bit-packed. Often-null optional fields can become sparse maps. 8-byte pointers can become narrower-sized indices into object pools.
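A hedged sketch of the padding and bitset points, assuming a hypothetical particle type on a typical 64-bit target:

```c
#include <stdint.h>

/* Per-element (AoS) layout: each particle pays alignment padding. */
struct particle {
    double pos;    /* 8 bytes, 8-byte aligned */
    uint8_t alive; /* 1 byte + 7 bytes of padding before the next pos */
};

/* Columnar (SoA) layout over N particles: no per-element padding,
 * and the boolean column collapses into a bitset (1 bit each). */
#define N 64
struct particles {
    double pos[N];
    uint64_t alive_bits; /* bit i set => particle i is alive */
};
```

On such a target, 64 particles occupy 1024 bytes in the first layout but only 520 in the second; the memory-footprint win comes before any access-pattern considerations.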


> “That's why arrays-of-structs are considered cache friendly”

Sounds like you mean structs-of-arrays?


Oops, brainfart on my part. Unfortunately, the edit window has passed.


"Putting fields of a struct into their own arrays is only actually an optimization if you're only accessing that field in-bulk" ... "If you use all fields of a struct in your algorithm, then an array of structs is the optimal way."

This is wrong! Cache optimization isn't the only factor here. Even given an algorithm that seemingly handles each object one-by-one and uses all fields, SIMD turns individual operations into a hidden bulk access, and giving each field its own array will speed things up. This is counter-intuitive at first but becomes obvious if you write SIMD by hand (the article mentions this but doesn't make it super clear IMO)
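A minimal sketch of the two layouts (hypothetical `point` type): the scalar loop looks identical in both, but in the SoA version the summed field is contiguous, which is what lets an optimizing compiler emit packed SIMD loads:

```c
#include <stddef.h>

#define NPTS 1024

/* AoS: the x values the loop streams through are 16 bytes apart. */
struct point { double x, y; };

static double sum_x_aos(const struct point *pts, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += pts[i].x; /* strided loads */
    return s;
}

/* SoA: all x values are contiguous, so the same scalar loop can be
 * auto-vectorized into packed loads. */
struct points { double x[NPTS], y[NPTS]; };

static double sum_x_soa(const struct points *pts, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += pts->x[i]; /* contiguous loads */
    return s;
}
```

Both compute the same result; the difference only shows up in the generated code and the cache traffic.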


Indeed, a struct can also be cooked to pack down with no padding, and/or be dynamically redefined with a union.

Performance issues start to crop up with naive prefetching, and thus near-guaranteed cache misses, if the arrays are larger than L2.

This is why LLM AI generated slop degrades blogs into slop delivery services. =3


> This is why LLM AI generated slop degrades blogs into slop delivery services. =3

Not sure what LLMs and AI have to do with any of this.


That is the primary problem domain, as there are a lot of folks that see well-structured nonsense as meaningful. =3


Why do you keep putting a penis "=3" at the end of your messages?


Same with row-major vs. column major, accessing contiguous data is faster than non-contiguous data, so you should align your algorithms and data structures.
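A small sketch of the same idea for C's row-major 2-D arrays (hypothetical functions; both return the same total, only the access pattern differs):

```c
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* C arrays are row-major: m[i][j] and m[i][j+1] are adjacent in memory. */
static long sum_row_major(int m[ROWS][COLS]) {
    long s = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            s += m[i][j]; /* inner loop walks contiguous ints */
    return s;
}

/* Same total, but each inner step jumps COLS * sizeof(int) bytes,
 * which is cache-hostile once the matrix outgrows the caches. */
static long sum_col_major(int m[ROWS][COLS]) {
    long s = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            s += m[i][j]; /* strided access */
    return s;
}
```

Swapping the loop order (or transposing the data) is often all the "alignment" an algorithm needs.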


> The point is to optimize data layout for access patterns.

Yes. That's the point.

> Putting fields of a struct into their own arrays is only actually an optimization if you're only accessing that field in-bulk.

Yes, that's the scenario.

> And if so, why is it even in a struct in the first place?

Because that's how everyone is taught to model domains.

> If you use all fields of a struct in your algorithm, then an array of structs is the optimal way.

No. Your personal belief goes against both theoretical and empirical evidence. Others already talked about cache, padding, vectorized instructions, etc. I recommend you do a quick googling on the topic.


I have a problem with how procrastination and perfectionism, this sense of being 'not good enough', is almost universally phrased as not being good enough for others. For caring too much about others' opinions. And that the solution is to just Do Art For Yourself :tm:.

I've tried that. I've tried shunting out everyone else's opinions. But then of course, if you lock me in a room with me, myself, and I, you now have 3 of my biggest critics all in the same room.

I don't really care what others think, never really did, and none of these anti-procrastination or anti-perfectionism pieces help when it's my own standards that I'm not meeting.


I was in a similar boat, and this book[1] helped me a lot. It deals with the roots of procrastination, which lie in poor mood management and the lack of self-compassion. If you want to go deeper into self-compassion, I found this book[2] very helpful.

[1] Procrastination - Fuschia M. Sirois PhD

[2] Radical Acceptance - Tara Brach


I definitely feel this more than worrying about others' perception.

Paradoxically, I also often feel that some tasks are _too easy_ and won't start them because they're simple, uninteresting, or unrewarding. I feel like I can execute them perfectly, so I put off getting started.

