Using Zig to Unit Test a C Application (mtlynch.io)
160 points by todsacerdoti on Dec 18, 2023 | 52 comments


I also now always use Zig to write tests and benchmarks for C code.

This is for example the case in libaegis: https://github.com/jedisct1/libaegis

Calling C functions from Zig is easy (C headers can be imported directly) and doesn't have any overhead. So I can take advantage of the convenience of Zig, even if the tested code only requires a C compiler.

I also now always add Zig build files as an alternative to make/libtool/automake/cmake/meson/etc. The main advantage is that cross-compilation to many targets is supported out of the box, including to WebAssembly. So I can quickly test if the C code compiles fine before actually trying to run it on an emulator.


> The main advantage is that cross-compilation to many targets is supported out of the box

For the readers who aren't familiar:

The code for running the tests and then cross-compiling (on one machine and OS) for different target platforms is:

https://github.com/jedisct1/libaegis/blob/main/.github/workf...

and the only zig file in the repo which drives the build process is:

https://github.com/jedisct1/libaegis/blob/main/build.zig


Using it to benchmark C is a cool idea. I hadn't thought of that!

Thanks for sharing!


This is going to get really good when zlibc is done. You'll be able to, for example, override the C stdlib "free" function in the C code and add features such as runtime UAF/DF (use-after-free/double-free) detection with metadata tracking (like stack traces of where the memory was allocated and freed).
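
For anyone wondering what that bookkeeping amounts to, here is a rough sketch in Python. It is not zlibc and not a libc-level override: it wraps the real malloc/free through ctypes and only shows the metadata tracking idea. The TrackingAllocator and HeapMisuse names are made up for illustration, and it assumes a POSIX system where ctypes can load the C library.

```python
import ctypes
import ctypes.util
import traceback

# Assumption: POSIX. If find_library returns None, CDLL(None) falls back to
# the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


class HeapMisuse(RuntimeError):
    """Raised when a pointer is freed twice or used after being freed."""


class TrackingAllocator:
    """Hypothetical wrapper that records a stack trace per malloc/free."""

    def __init__(self):
        self.live = {}   # address -> stack trace at allocation
        self.freed = {}  # address -> stack trace at free

    def malloc(self, size):
        addr = libc.malloc(size)
        if addr is None:
            raise MemoryError("malloc returned NULL")
        # The C allocator may hand out a previously freed address again.
        self.freed.pop(addr, None)
        self.live[addr] = traceback.format_stack()
        return addr

    def check_use(self, addr):
        # Call before touching the memory to catch use-after-free.
        if addr in self.freed:
            raise HeapMisuse("use after free; freed at:\n" + "".join(self.freed[addr]))

    def free(self, addr):
        if addr in self.freed:
            raise HeapMisuse("double free; first freed at:\n" + "".join(self.freed[addr]))
        if addr not in self.live:
            raise HeapMisuse("free of a pointer this allocator never returned")
        del self.live[addr]
        self.freed[addr] = traceback.format_stack()
        libc.free(addr)
```

The misuse checks run before the real free is called, so a double free is reported instead of reaching the C allocator.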


What is zlibc? A libc implementation in zig? Googling did not turn up much.


You can do that in any C program today


You can already do this with LD_PRELOAD, no?


Could be wrong, but only if libc is expected to be dynamically linked (and anyways you'd still need zlibc)


You can just define your own malloc and free in any C program. Am I missing what you're talking about?


I've done this where I use Python's ctypes library to write tests for a C codebase. This feels very similar in that it can be tricky to get the type interop correct the first time around. What other strategies/solutions do people like to use for testing their C projects?
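
To make the ctypes approach concrete, here is a minimal sketch that tests the C library's strlen instead of a project function, so it runs without building anything. It assumes a POSIX system; c_strlen and test_strlen are illustrative names. The argtypes/restype lines are the part that is easy to get wrong.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library returns None, CDLL(None) falls back to
# the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Mirror the C prototype: size_t strlen(const char *s);
# Without restype, ctypes assumes the function returns a C int, which is
# the wrong width for size_t (and for pointers) on 64-bit platforms.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Call the C strlen on a Python bytes object."""
    return libc.strlen(text)


def test_strlen():
    assert c_strlen(b"") == 0
    assert c_strlen(b"hello") == 5


if __name__ == "__main__":
    test_strlen()
    print("ok")
```

For a real project you would load your own shared library with ctypes.CDLL and declare each function under test the same way.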


I've done a lot of CFFI with Python for this kind of testing in the past. But nowadays I'm also looking at Zig for this.


Has the way Python handles this changed a lot in the last decades? I never did it because I feared having to chase changes in Python and an ever-changing build environment. It was mostly a focus thing: I knew I alone would not be the one to fix it across all build envs. At the time I was the lone Pythonista; now everyone we hire is comfortable with Python, so I made the wrong choice.


the point of zig is that there is no FFI to get right


I think (but I'm not sure) that in Zig you can just @cInclude("header.h") or something like that? So your C types are automatically imported/converted to Zig types.


It would be nice if something in Zig could auto-magically generate the zig to c interop call and type declarations.

I see there's "translate-c" to help migrate C to Zig, and maybe that's enough.

Generally I fall on the side of if a compiler can toss out a type warning, it could certainly give a reasonable solution also. Best-guess with a "// Warning type was guessed" is better than an opaque "RTFM" in most every case.


> It would be nice if something in Zig could auto-magically generate the zig to c interop call and type declarations.

Do you mean building a .h file from zig code?

The other direction is just @cImport.


If you watch some of the zig streams, there are some C libraries that are quite complex for importing. It’s not always just an @cImport.

I wouldn’t so much pin this on zig as much as the complete clusterfuck that is building c libraries.


Oh sure, was just curious what feature gp was looking for. I would love the first one for example


Curious now if you could use D in the same way? Walter Bright built a faster preprocessor for Facebook (kind of a sidenote and not fully relevant), so it wouldn't surprise me if D is another good candidate, especially given its age.


Yes: https://dlang.org/spec/importc.html

The D compiler(s), as Zig, can be used to compile C code... and D code can import C files as if they were D modules (as in Zig, there are some special types to represent C strings and other types that are not exactly the same).


Thanks for that! I'm a big fan of D, as someone who primarily codes in C# and Python. I really wish a big company would take on D in a serious way that puts it on the map significantly. It is an underrated language in my opinion.


Absolutely. It was way ahead of its time. And it's improving, albeit slowly. With stuff like @safe, @nogc, pure, betterC[1], and its unmatched metaprogramming, it's a more modern language than most of the newer languages popping up.

[1] https://dlang.org/blog/2017/08/23/d-as-a-better-c/


If someone would hire me to use it, I would take the offer in a heartbeat. Until then I just play with it to test out language features and check it out. I wanted to like Vibe.d but I wanted something a little more batteries included. I might check out the Hunt Framework though.


This is interesting! I like how unit tests are easier with Zig.

The approach I am more familiar with is using Google test and C++ to test C code. It's pretty easy if you already have a cmake project set up, and most C developers can wrap their heads around GTest.


Great use case and a possible killer app for Zig.

However, for the complete newcomer to Zig, this post seems a bit convoluted in this presentation.

What if the Zig authors did a step-by-step tutorial on how to leverage Zig to implement tests in a C codebase?

Would be a great entry point for the broader community!


zig , ocaml, odin

many system languages are making the headlines lately its very hard to pick one to learn

not sure how to deal with this, learn them all, bet on one, what should we do


I think the two main ones are Rust and Zig. This may be controversial, but it seems very obvious to me that:

Rust is created in the same spirit that created and evolved C++: create a complex and featureful language that enables compiling your solution from a high level representation in an expressive/safe/performant way.

Zig is created in the same spirit that created and evolved C: create a simple language that allows you to directly and transparently represent and reason about what you want to have happen.

You'll probably think that one of these statements is more biased than the other, and that probably reflects your own preferences :)


There's something to be said about the philosophy of simplicity in C. However, C pretty clearly evolved in the opposite direction. This is nearly all due to compiler developers, and the fact that C has to cater to so many different hardware requirements.

Unlike C++, ISO C is nothing more than a culmination of features that more than one compiler has implemented (and that don't interrupt the compilation process of a micro-controller firmware that was released literally 40+ years ago). Anything else is GNU C. And it is so incredibly complex and obtuse at times that clang still can't compile glibc after years of work.

Zig was not created with the same spirit that created and evolved C. Zig was created with the idea of a simple C, one that does not match reality, and frankly leans more on Go rather than C. Zig, Odin, V, nearly all these better-C languages are more inspired by Go itself, than what C actually is. What they want from C is just the performance; that's why they're so focused on manual memory management one way or another.


Zig borrows some ideas from Go. Probably defer is the big one, but if you watch "the road to zig 1.0" you will understand that Zig is not really a Go derivative. Most things in Zig are directly addressing issues in C.

If you squint zig's error return fusion looks a bit like go's tuple error return but it actually is more "first-classing certain c conventions" than "adopting a go pattern". Same goes for slices.


Go's defer is incredibly flawed:

1. It only allows function calls instead of any expression.

2. It allocates memory dynamically and attaches the function call expression to the function, rather than the current scope exit. This has surprising and harmful consequences if you use it inside a loop.

So, I wouldn't say that zig's defer is borrowed from Go.


Not in implementation but surely in spirit. Didn't you mention in road 1.0 that the idea came from Go?

Edit: ok, rewatched it and I didn't see that come up, I was just misremembering


Most of C's issues were directly addressed in Go as well. Only, Go did away with manual memory management.

C never had the philosophy of keeping things simple through the years. If it did, we would not have time traveling UBs to begin with. The lauded simplicity and explicitness comes directly from Go, where the philosophy was crystalized and preserved very early on.

You might say it is semantics, to call improving upon C being a derivative of Go (with manual memory management). You would be partially correct, it is semantics, but one that holds up very well if you look at how languages developed over the decades.


> The lauded simplicity and explicitness comes directly from Go

I have two words for you: json marshalling


I think a stronger statement is that zig is explicitly created to out-C c. Even the high level comptime stuff evolved out of simplification and explicit-ification of some opaque things that c (and especially c++) compilers might do.


I was in the exact same situation. I know Rust but don't want to use it for everything... had a look at:

Zig - attempts to stay simple, like C, but with warts fixed and with cool compile-time programming. Its biggest strength seems to be not the language itself, but the compiler and build system which can cross-compile seamlessly, including C code.

Nim - a systems-language that looks like Python and tries to be fun to write. Has macros that may remind you of Lisp macros. Compiles to C or JS.

D - older but very cool as well... I was surprised to find out its metaprogramming capabilities are as good as Zig's or Nim's, and that it has a lot of cool features not seen in mainstream languages, like contract programming and executable documentation. Much more mature than the previous ones. Also seamlessly compiles and imports C.

Odin - really reminds me of Go. It's used in production to create fluid simulation for Hollywood movies apparently. Very minimalistic language but I couldn't see what it brings to the table that the ones above do not. It's kind of similar also to Jai which is also upcoming but focusing on game programming from what I understand... that's still not even publicly available yet.

Which one to choose really depends on your taste, hope my descriptions above help, even if they're pretty rough simplifications.

If you want the most popular language in this area, that's undoubtedly Rust though.


> its very hard to pick one to learn

If you don't know it already, you learn C, as well as you can. It is not a hard language to learn, but it has a lot of footguns. That's why you learn it along with tools like Valgrind & sanitizers.

Then you look at Rust. That's somewhat harder to learn, but nails some important details you need to think about while writing C. It will make some things obvious that you'd need to learn by shooting yourself in the foot repeatedly in C. You don't need to use Rust once you got what you need out of it education-wise, but a lot of people like and use it.

At this point it is kind of unimportant what other "systems" language you decide to learn, but here's my opinion of some of them:

- Personally I like Odin's ergonomics. It is incredibly convenient. You can just jump in and start writing OpenGL code without dealing with wrappers and all that. Included vendor libraries take care of a lot.

- I also like the explicitness of Zig. It seems like it'll be the most popular one in the future, most likely not because of the language itself but because of the tooling. By the way the reason I say "not because of the language" is that the maintainers seem uninterested in having some way to constrain generics. The language sorely needs some sort of comptime interface / traits / concepts, anytype-everything is not nice. In 10 years someone will come up with a Boost-like library that implements just that in userspace and it'll be horrible.

- Ocaml is garbage collected Rust, or rather, Rust is non-garbage-collected Ocaml. It is underrated. Jane Street people are adding borrow checker to it. Could be more popular in the future. Also, all languages are "systems" languages depending on how you wield them. No need to bikeshed about Ocaml's "system"ness status.

- D is pretty cool. Very fragmented library ecosystem if you want to do betterC or no-gc though. Be prepared to just use C or C++ libraries, which it can talk to pretty easily.

- Nim is a nice language. Does reference counting so it has a low memory footprint compared to other GC'd languages. It compiles to C so technically it is the most portable language in this list. You can easily run it on microprocessors that others won't run on. Try it, you'll very quickly land on "like it" / "don't like it" territory depending on your programming style.

- Jai is non-existent right now. Doesn't warrant a discussion until Jon Blow feels it is ready for prime time. But since that's his strategy, expect something practical and polished. If it sucks, two possibilities: 1) he didn't deliver and it won't get drastically better or 2) your use case was not in consideration.

- C3, it exists, it is usable, it is like a halfway between C and D. I didn't spend much time on it yet.

- Free Pascal: I didn't use it but just putting it here because this list is getting long & it kind of deserves a shout. Lazarus looks nice.

- Go: Use Java or C# instead, they can compile to native now.

- C++: It exists, it is used everywhere, it sucks. As opposed to most other languages on this list, it wasn't designed. It kind of picked up random features along the way because they looked good. You kind of design it by picking up a subset and putting up with its weirdnesses. Don't use it if you can help it. If you have to use it you most likely didn't have a choice in the first place.


> The language sorely needs some sort of comptime interface / traits / concepts, anytype-everything is not nice.

I'm not sure I agree. Anything which attempts to constrain "anytype" makes the language much, much more complicated.

For example, if it were constrained, Zig "allocgate" would likely have necessitated compiler changes instead of just library changes.

And I often think that we conflate two different things--"Generic" and "Libraries like the big boys build".

I generally don't need fully generic programming.

What I do need is the ability to build a library that works exactly like the standard library. All libraries need to be precisely equal in expressive power and composability to those that have been officially "blessed".

Oddly, Rust fails at this due to things like the Orphan Rule even though its "genericity" is quite expansive. The invasiveness of Serde is a prime example. If an external crate doesn't support Serde, you can't add it. You have to literally copy the entire library over to your code in order to add Serde support. This blocks an alternative to Serde from ever arising because it will never be as convenient as Serde which has been "blessed" by the community.


A strong replacement for OCaml that is also performant and has a rich ecosystem is F#.

Technically speaking, Java and C# would serve somewhat different purposes and I would rather recommend Kotlin and C# with the former serving higher-level code goals better with existential types, SRTPs and overall strong type system and the latter for lower-level and/or performance-sensitive code with SIMD, pointers/byrefs, struct generics (zero-cost abstractions) and free/cheap interop.

The only concern is how well Kotlin Native works today regarding compatibility. NativeAOT has seen a lot of work to improve this and scenarios that will never be supported (runtime reflection emit which needs JIT or arbitrary unbound reflection) are now well-documented.

You may also be interested in Bflat[0], which has 'UEFI' as a target, or even Zerosharp[1] as a demonstration of how far you can push this (which is, of course, impractical, just use Rust :D)

[0] https://github.com/bflattened/bflat

[1] https://github.com/MichalStrehovsky/zerosharp/tree/master/no...


I've been having the same dilemma, right now my focus is:

- Typescript for anything related to webdev, not a huge fan of the language but can't deny the ecosystem for productivity

- Rust for systems-level projects

- Crystal for quick scripts, although I'm still on the fence here


ocaml is a systems language ?


Is ocaml a system language?


It depends on what you mean by "system language." For me, this category mainly includes languages that provide fine-grained control over memory management (C, C++, Zig, Rust, ...), so I personally wouldn't include OCaml.


yes it can be used to create systems and backend tools

   - it is used to create an OS https://github.com/mirage
   - it is used to create a transpiler https://melange.re/v2.2.0/
   - it was used to create Rust's first compiler
ocaml is surely a systems language


The predominant OCaml shop is Jane Street, they have a good podcast where they talk to those involved in their infrastructure. A lot of the episodes go into the tradeoffs between the GC'd and functional OCaml vs languages like C++ and Rust:

https://signalsandthreads.com/


There's nothing "system" about a compiler. What language hasn't been used to create a compiler or two? If languages used to create compilers are system languages, then all languages are system languages.

Writing an OS is more "system", and certainly those using the Mirage operating system use OCaml as a system language.


So sad IncludeOS https://github.com/includeos/IncludeOS is no longer developed.


This question reminds me of "is golf a sport or a game?"

A colloquial definition of systems language seems close to: "exposes low level details and doesn't have garbage collection." By this definition c, c++, zig and rust are system languages, java and ocaml are not. Go is debatable (and people do debate this).

Personally though, I prefer to think of what types of _systems_ I can build with a language. I can design a framework in a high level language which then transpiles to, say, c. I consider this high level language to be a systems language because it facilitates a system (the framework). Others will disagree but it comes down essentially to a semantic question over what counts as a system language. That debate doesn't seem especially fruitful to me.


If Go is a "system language" then OCaml certainly is. It's suitable for writing system tools in, and people do. TBH the whole concept is pretty meaningless though.


Shell is a systems language; people boot systems with it, including embedded.


Yes believe it or not ocaml is a systems language.


C++ has been doing this for decades.


Unit test it with TXR:

  $ pwd
  /home/kaz/ustreamer
  $ gcc -D_GNU_SOURCE -shared src/libs/base64.c -o base64.o
  $ valgrind txr --free-all
  ==8785== Memcheck, a memory error detector
  ==8785== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
  ==8785== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
  ==8785== Command: txr --free-all
  ==8785==
  This is the TXR Lisp interactive listener of TXR 292.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  Psst! The complimentary Allen key that comes with TXR is inpired by IKEA.
  1> (with-dyn-lib "./base64.o"
       (deffi us-base64-encode "us_base64_encode"
              void (buf size-t (ptr (array 1 str-d)) (ptr (array 1 size-t)))))
  us-base64-encode
  2> (let ((out (vec nil))
           (sz (vec 0)))
       (us-base64-encode #b'00112233445566778899AABBCCDDEEFF' 16 out sz)
       (list out sz))
  (#("ABEiM0RVZneImaq7zN3u/w==") #(25))
  3> (let ((out (vec "abcde"))
           (sz (vec 5)))
       (us-base64-encode #b'00112233445566778899AABBCCDDEEFF' 16 out sz)
       (list out sz))
  (#("ABEiM0RVZneImaq7zN3u/w==") #(25))
  4> (let ((out (vec "xxxxxxxxxxxxxxxxxxxxxxxxxx"))
           (sz (vec 25)))
       (us-base64-encode #b'00112233445566778899AABBCCDDEEFF' 16 out sz)
       (list out sz))
  (#("ABEiM0RVZneImaq7zN3u/w==") #(25))
  5> (let ((out (vec "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"))
           (sz (vec 27)))
       (us-base64-encode #b'00112233445566778899AABBCCDDEEFF' 16 out sz)
       (list out sz))
  (#("ABEiM0RVZneImaq7zN3u/w==") #(27))
  6>
  ==8785==
  ==8785== HEAP SUMMARY:
  ==8785==     in use at exit: 0 bytes in 0 blocks
  ==8785==   total heap usage: 24,782 allocs, 24,782 frees, 4,806,317 bytes allocated
  ==8785==
  ==8785== All heap blocks were freed -- no leaks are possible
  ==8785==
  ==8785== For counts of detected and suppressed errors, rerun with: -v
  ==8785== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

We covered the cases when the destination buffer is already allocated and is smaller than, equal to, or larger than the space required, thus testing the cases when realloc is or isn't necessary. No memory errors or leaks.

We can see in the last case that when the buffer is larger, the function leaves the size alone, returning our original 27.
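
As an independent cross-check of the expected values in that session, here is a short sketch using Python's standard base64 module rather than TXR. It does not call us_base64_encode. The encoded_size_with_nul helper is my own, and it rests on the assumption that the reported size of 25 is the 24 output characters plus a terminating NUL byte.

```python
import base64

# The same 16 input bytes used in the TXR session.
data = bytes.fromhex("00112233445566778899AABBCCDDEEFF")

# The string the C function produced in the transcript.
expected = "ABEiM0RVZneImaq7zN3u/w=="


def encoded_size_with_nul(raw: bytes) -> int:
    """Padded Base64 length (4 chars per 3-byte group) plus one NUL byte.

    Assumption: the C function's size output counts the terminator.
    """
    return 4 * ((len(raw) + 2) // 3) + 1


encoded = base64.b64encode(data).decode("ascii")

if __name__ == "__main__":
    print(encoded == expected, encoded_size_with_nul(data))
```

Under that assumption the required size for 16 input bytes works out to 4 * 6 + 1 = 25, which matches the value reported in the first three cases.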



