Gonix – Unix tools written in Go (github.com/polegone)
148 points by polegone on May 9, 2015 | hide | past | favorite | 82 comments


Not meaning to bring up a Rust vs Go debate, but since there are a few comments claiming that this is a waste of time, I figured it's worth mentioning the Rust based re-implementation of coreutils:

https://github.com/uutils/coreutils/

And another by suckless in plain C:

http://git.suckless.org/sbase/tree/README


Or my rewrite of GNU's coreutils in Go: https://github.com/EricLagerg/go-coreutils

It's not complete yet, but I couldn't pass up this thread :-)


I must say that yours is much more complete than mine, though.


Work together!

It would be great if a complete coreutils were implemented!


I'd love some help. I'm juggling 3 side projects right now, and it's hard making time for all of them. :)


I linked it in the readme so all those people visiting my project will see yours.

I can't help much myself (you are way ahead of me), but more people should visit your project now.


Thank you! That's very generous of you.


No problem.


I'd love if our projects could somehow work together. My contact info is in my account's description, so shoot me a message!


At least the Rust version has CLI options implemented.


Nice. This is the first time I've actually read Rust code, and aside from a "something macro" that scares me, it's actually mostly readable. Will definitely check it out.


That rewrite lists its license as MIT, not GPLv2. Is that legitimate? In general, can you re-implement a library that has one license and release your implementation under an incompatible license?


From everything I've read I'm pretty sure that if it's a complete rewrite in another language then it's legally "divorced" from the original project. You can own (and ergo license) an implementation of an idea, but you can't own or license the idea itself. Same with patents - you own the how, not the why.



I don't know much about Go, but taking a quick look at the implementations, they seem to be written by a programming novice, and they are quite primitive. I don't mean to be negative, just my opinion.


I am a novice, and that is one of the reasons why I started this project (to learn).


    bytes, _ := ioutil.ReadAll(os.Stdin)
    lines := strings.Split(string(bytes), "\n")
Tip: use bufio.Scanner.

    scanner := bufio.NewScanner(reader)
    scanner.Split(bufio.ScanLines) // the default split function, shown explicitly
    for scanner.Scan() {
      // Do stuff with scanner.Text()
    }
    if err := scanner.Err(); err != nil {
      // Handle the read error
    }
And you can iterate over the lines 'lazily'. If you don't want \r's consumed, make your own ScanLines :).
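To illustrate that last point, here's a sketch of a custom split function that behaves like bufio.ScanLines but keeps a trailing \r on each line (the function name and demo input are mine, not from the project):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanLinesKeepCR is like bufio.ScanLines, but leaves any trailing '\r'
// on the token instead of stripping it.
func scanLinesKeepCR(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// Consume through the '\n', but return the line with its '\r' intact.
		return i + 1, data[:i], nil
	}
	if atEOF {
		// Final line with no trailing newline.
		return len(data), data, nil
	}
	return 0, nil, nil // request more data
}

func main() {
	scanner := bufio.NewScanner(strings.NewReader("a\r\nb\n"))
	scanner.Split(scanLinesKeepCR)
	for scanner.Scan() {
		fmt.Printf("%q\n", scanner.Text()) // "a\r", then "b"
	}
}
```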


Tail also doesn't always need to read its entire input. When given a file path, it can read backwards from the end of the file until it has found its n lines. That makes tail fast on huge files (as long as they have normal line lengths), but it also complicates the code.

And you may want to mmap the file rather than open it. Whether that speeds things up depends on the OS, OS version, file size, file system, available memory, phase of the moon, etc.


[deleted]


You mean bufio.ReadLine? As the documentation says, it's low-level, since you have to do the buffer allocation yourself. ReadBytes/ReadString are nicer interface-wise, but they allocate new buffers on every call.

I like Scanner because it provides a nice high-level interface, but still maintains an internal buffer, reducing GC pressure. Of course, it's not one size fits all.


I tested it on a several-megabyte text file (not that large) and I can see a huge improvement in speed when I use a reader vs. loading the whole thing into memory as I did at first.

I can see now how much of a difference it can make on a really large file, like in the gigabyte range.

BTW, I deleted my earlier comment because the problem I had wasn't anything to do with bufio. I had just made an obvious mistake elsewhere in my code, which I've fixed now.


The problem is that something like tail should not read the whole file into memory. tail works on a 100 GB file even with 100 MB of RAM.


Great going. This will also help other novice Go programmers learning the language, at the same time getting a sense of how to implement their own Unix commands.


Forget the haters, this is awesome.

GNU started as GNU's Not Unix, reimplementing Unix userland for free. The people who are adamant about calling it GNU/Linux are, on some level, remembering that the userspace is historically a reimplementation.

Give me a busybox that I can 'go build' and that becomes really quite interesting.


Except for the fact that it'll be years before all of the subtle bugs are worked out and you can rely on those apps to be as stable as the ones we've got:

http://www.joelonsoftware.com/articles/fog0000000069.html


We have years. They're passing whether we like it or not. In the meantime you can choose whichever implementation of the Unix tools that you want. In the future there may be more options that fit the bill, developed in a safe language without the kinds of buffer overflows and things that exist in C tools today.


Yeah, just because Joel said so doesn't mean it's true.

I've been responsible for a bunch of rewrites that were resounding successes and typically reduced the bug load (subtle or not) by orders of magnitude.


What's wrong with doing this for fun's sake or educational value (both for the authors or people who want to study Go)?

If it turns out to be more widely used, at least it's safer than C code.


We really need a busybox in Rust or Go. The existing one has licensing problems and security problems, and is built into too many embedded devices.


Busybox is GPLv2. That's the same license as the linux kernel; it's not a problem. You just have to release your modified version of the busybox source if you distribute it in a product. The only difference is that busybox developers enforce the license against the many companies who can't be bothered and respond to requests with lawyers.

That said, a bsd-licensed re-implementation of busybox (still in c) that's pretty far along and has gotten support from some of those companies (including Sony) is "toybox" http://www.landley.net/toybox/


Also, Toybox will replace Toolbox in Android M.


There is a busybox in Go. https://github.com/surma/gobox


What licensing problems specifically do you mean here?



A few data points on GNU coreutils.

1. It has a fairly good test suite that rewrites should leverage. That can easily be done by prepending the dir of the new tools to $PATH and running `make check`.

2. To give an indication of the size of coreutils:

$ for r in gnulib coreutils; do (cd $r && git ls-files | tr '\n' '\0' | wc -l --files0-from=- | tail -n1); done

985050 total
243154 total


It's also worth pointing out, since busybox was mentioned a few times, that the latest release of coreutils has the ./configure --enable-single-binary option to build a multi-call binary like busybox etc.


Heh, this brings me back. I was a young guy at Sun, perl 4 was a thing, I actually argued that we should redo /usr/bin in perl. In the days of a 20mhz SPARC.

Silly me. Maybe it makes sense now.


There's "Perl Power Tools: Unix Reconstruction Project" [0], which doesn't seem to have seen any activity since 2004. I remember something even older than 2004, I think from when perl first became available on Windows, that brought UNIX command-line utilities to Windows through perl.

It's great that they all have inline POD documentation too.

[0] http://search.cpan.org/dist/ppt/


The "Perl Power Tools" project has been revived by brian d foy: https://metacpan.org/pod/PerlPowerTools


I'm curious: what was your main motivation? I can understand it as a worthy challenge, but it would probably have led to worse performance than the C-based utilities, no?


It would be pretty cool if all applications on your system were written as scripts, especially if they are very simple scripts, as it means you can open any of them in a text editor and see what they do.

It means you can modify any of them just as easily.

I remember one time when I needed information about how much CPU a program was using; I could see it in the output from `top`. To figure out how `top` was getting that info, I had to download the source code and grep through several .c files. If I could just `vim top` and read the code, that would be very cool.

As an educational resource, being able to tell all your students "all binaries in /bin and /usr/bin are editable! Read them! Find out how they do what they do!" would be incredible.


Thanks for your reply. Yes, the ready availability of the code would be a plus. In my mind utilities like grep or sort are very sensitive in terms of performance, but that's just an impression.


Oh yes, absolutely. That is the big tradeoff.

One could hope that many of these utils are mostly IO-bound, so a scripting language wouldn't change that much. I know at one point I actually had a python-based md5sum that was faster than the regular GNU one for big files, as it loaded the data off the disk in huge chunks.

I believe ack-grep was written in perl? And that's pretty fast.

Also, how fast does it actually need to be? It's not high-frequency trading or a 60fps game... A computer these days can probably load perl, run a script, and display the output faster than a computer from 20 years ago could run the original C util. And if everything is written in it, then a lot will already be in memory. If your shell is written in it too, then you could call the utils from within there, rather than having to fork/exec them anyway.

For most higher-level scripting languages (such as perl, python, ruby, etc.), things like regexps and hash tables are implemented in C underneath anyway.

With a lot of shell scripting, you call many utils many times, which would slow everything down hugely if you paid the scripting-language startup cost each time. However, if you write all your scripts in the same higher-level language, rather than writing sh scripts that call your utilities, you might even end up faster than sh calling external utils (possibly). (If that makes sense...)


Simplicity mainly, memory management.

print while <>;

makes for a pretty simple core of cat.

At the time I had written a source management system (nselite, morphed into teamware) almost entirely in perl except for one C program that was performance sensitive.

And I just really liked perl4 at the time.


Thanks for answering. I guess then what you had in mind was a change in the syntax of the shell itself, right? I'm a casual user of POSIX utilities and the typical shells, and have yet to try Perl! In any event, your last sentence pretty much validates what I had in mind. Thanks again.


https://github.com/uiri/coreutils << I wrote some with a friend a couple years ago, maybe they suck, or maybe they have usage strings. >.>


While I'm not sure if he posts here, I used to work in the same group with Jim Meyering, who maintained/maintains coreutils. Great guy.

Anyway, he told some great stories about the complexities of POSIX, what happens in Solaris when you have directories 20,000 levels deep (and how to handle that efficiently), and the fun of teaching various coreutils commands about SELinux. Lots of it gets surprisingly low-level quickly.

Coreutils is complex for many good reasons, so while these tools look all nice and clever, they are not dealing with a lot of the same kinds of issues.

(I also recall some of the heck Ansible had to go through to deal with atomic moves, grokking when something was on an NFS partition, and so on. Not the same thing, but small, seemingly easy things get tricky quickly.)

Bottom line is appreciate those little tiny Unix utilities, a lot went into them even when you think they aren't doing a whole lot :)


Agreed! So many folks see all that stuff as "bloat". In many cases, that code is there for a reason.


Bloat always has some reason behind it, but it's still bloat.

Good engineering makes good tradeoffs, rather than throwing in everything anyone wants.

(Not that I think this effort in go is "awesome". For one, it's probably bigger executables than even statically linked coreutils. The suckless sbase/ubase have some real potential though.)


People are lazy, and we take legacy complexity for granted. Times change, and rarely does a problem look the same twice. If for no other reason, taking a second (and third, and beyond) look at legacy complexity is worth the attention.


It would be cool to have a "busybox" alternative targeted at Plan9 commands. I personally find Plan9 utils much more logical (and easier to implement!). Something like 9base from suckless, but as a single binary and hopefully in a more modern language (most of the code like sam or rc is not so easy to understand).


> I personally find Plan9 utils much more logical

Could you elaborate on that?


Plan9 utils usually have less options and smaller functionality, but you can easily compose them.

"rc": I like it much more than sh for scripting. It's easier to learn and has less pitfalls.

"cat" has no options at all, it just concatenates.

"du": I like plan's "du" utility, which is easier than "find" ("du" simply lists file names recursively, while "find"... Can anyone list all the options of the find utility from memory?).

"rm" is a combination of rm and rmdir (a directory is removed only if empty, unless you pass "rm -r" to delete it recursively). Only two options: "-r" and "-f".

"tar" is a simple way to do recursive copying instead of cp, scp etc

"sleep" takes seconds only as an argument, but it can be a floating point (no suffixes like in coreutils, e.g. "sleep 1h" or "sleep 3d")

"who" takes no options at all

I'm not really good at plan9 utils, and I still use coreutils much more, but for embedded systems I would prefer to have a tiny number of simple building bricks like "du", "cp", "rm", "tar" etc, and a few smarter commands like awk/sed/grep/rc. I like toybox a lot, but if only it had an rc shell...


What happens when you tail a 32 gig file?


From a glance at the code [0], it looks like it will read it all into memory first.

A better way to do this would be to utilize the io.Reader interface.

[0] https://github.com/polegone/gonix/blob/0b65cd4fb9c6c44357d0a...


Thank you. I will look into this.


No problem. Also, please do not ignore errors. They're meant to be handled.


I'm working on that as well. I just fixed cat to crash and log on error, and I'll be fixing the other commands soon.

By the way, I'm wondering how I should go through the file line by line with a reader.

I think the most efficient way may be to scan byte by byte from the start (head) or the end (tail), counting until reaching n newlines (or stopping at the end of the file), then print the bytes between the start/end and the nth newline.

How does this sound?


Good question. I'm not sure. You might want to seek to the end and move back. You probably shouldn't do it byte-by-byte directly from the file since that's very inefficient. As you can tell, this is already starting to get complicated! Maybe you could try mmaping the file so you could treat it as a []byte.


Last year I was trying to write a Go routine that read a file backwards. I was amazed how unexpectedly difficult that proved to be.

In the end I settled for reading it from the start which worked 99.999% of the time and enabled me to finish the project to the tight deadline I had. But I've always meant to go back and "fix" that code at some point.


> By the way, I'm wondering how I should go through the file line by line with a reader.

Take a look at my Go package that allows you to programmatically do 'tail -f' - https://github.com/activestate/tail


The strategy used by the original GNU coreutils (written in C), and the one I used to implement tail in Rust, is to jump to the end of the file, then rewind AVERAGE_CHARS_PER_LINE * NUMBER_OF_LINES_TO_BE_READ bytes, check if enough lines have been read, and repeat until enough lines have been found.

I found the optimal value of AVERAGE_CHARS_PER_LINE to be around 40 characters, but of course it hugely depends on the file being read.


https://github.com/polegone/gonix/blob/master/cp.go

I think it needs some work for actual parity =)


The author specifically singles out cp as being incomplete in the readme.


Very neat. I started rewriting GNU's coreutils in Rust and find it to be a nice way to learn the language.

Also, it is interesting how many obscure and less-known features some of the tools provide. In this case I clearly see the 80/20 rule: you can implement 80% of the main functionality in 20% of the time, but if you want to make exact clones you're going to need to invest a lot more time.


I'm surprised nobody has shown someone doing this in JavaScript.


The line mentioning using gccgo to make the binaries small intrigued me... it worked! I've only written a couple of small tools in go but the size of the binary always bugged me. It's just a shame that setting up cross-compilation with gccgo looks a lot more involved vs. gc.


Very nice. Please add at least some of these: grep, rm, secure, file, top, trace, ping, whois.


Nice, I was thinking of doing something like that, but I don't know as many unix tools as the author does. :-P


I rewrote Windows' "pause" on my Ubuntu box out of habit of using it in some cases. First I wrote it in perl, then in Python. I also symlink clear as cls out of habit from Windows. There are small little "hacks" like that which are kind of fun.


Windows' cls actually erases the buffer from previous commands (so you can't scroll up past it), clear does not. I alias "cls=printf '\033c'" to get the Windows behavior.


There is no reason to ever want to do this.


I'm just doing this for fun and to learn about Go and Unix at the same time.

I am a beginner, and a lot of my code is inefficient and/or incomplete, but by putting it on the Internet I can get criticism and find out where I went wrong.

For instance, some of you have told me that the way I've been reading files is very inefficient, so now I'll try and do it the correct way.


That is a great reason to do this and the best way to learn.


That is the best reason. I am glad HN has people like you who are not afraid to put yourself out there and reminds us all that this is what being a hacker is: doing stuff for fun and learning.


Except to, you know, learn stuff.


That's lame. I logged in specially just to downvote you.


the point being?


1. Learning a programming language

2. Learning some less-known flags and use cases for the tools we use everyday

3. Reasoning about OS features and how they work

4. Having fun

Please, if you want to be this rude go back to your troll cave.


Why do Go programmers always want to redo everything? It's rare to see something actually new written in Go. Rewrites prove that Go can do everything C can (except make shared libraries and produce small binaries), but not that it's actually better.


Because that is the whole point.

If C is still present in the stack, the typical C exploits are possible; that's how many Oracle JVM exploits came to be, for example.

Reducing C's presence to the level of Assembly will make everything in our systems safer.

Not that it will ever happen in UNIX systems, given how C came to life.


The same thing happens in Java… and Ruby… well, maybe every language.


You should totally change the name to Goonix (it just reminds me of one of my favourite movies - Goonies that is)



