Fun fact: Feynman didn't work directly on the bomb. His role was managing high school students who performed computations, fixing adding machines, and doing some safety work at the uranium isotope plant in Tennessee.
More fun facts: almost nobody worked directly on the bomb. Thousands of people worked on manufacturing equipment and components and had no idea what they were supposed to be building. They split the project up into very granular pieces so that nobody really had full insight into what was going on except for the core group of scientists. (Source for this, I believe, is Feynman himself in Surely You're Joking.)
Gladys Owens, the woman seated in the foreground, did not know what she had been involved with until she saw this photo on a public tour of the facility fifty years later.
Feynman is one of the coolest guys in my book, but do note that "didn't work directly on the holocaust...was just counting beans to facilitate" was not a defense that saved necks; they were hanged anyway. No moral judgement implied here, just the fact that history is written by the winners.
Cool, email me if you need any help with it. In fact, email me regardless of whether you need help, just to tell me what your project is and how syntect might help.
The number of collisions depends only loosely on the hash function; it's much more closely tied to the fill factor.
If you keep your hash table 1% full, you're unlikely to run into a collision no matter how bad your hash function is. If you keep it 99% full, it doesn't matter how good your hash function is: you still have to reduce the hash modulo the number of buckets, and with that few empty buckets a large share of inserts will collide.
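A quick way to convince yourself is to count colliding inserts at different fill factors using a perfectly reasonable hash; here's a throwaway sketch (the mix function, bucket count, and fill levels are arbitrary choices):

    /* Rough sketch: count how many of N keys land in an already-occupied
       bucket at a few different fill factors.  The hash (a splitmix64-style
       finalizer) and the sizes are arbitrary choices. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static uint64_t mix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    int main(void) {
        const size_t nbuckets = 1 << 20;
        const double fills[] = { 0.01, 0.5, 0.9, 0.99 };

        for (size_t f = 0; f < sizeof fills / sizeof fills[0]; f++) {
            bool *occupied = calloc(nbuckets, sizeof *occupied);
            size_t nkeys = (size_t)(fills[f] * nbuckets);
            size_t collisions = 0;

            for (size_t i = 0; i < nkeys; i++) {
                size_t bucket = mix64(i) % nbuckets;   /* the modulo step */
                if (occupied[bucket]) collisions++;
                else occupied[bucket] = true;
            }
            printf("fill %.2f: %zu/%zu inserts collided\n",
                   fills[f], collisions, nkeys);
            free(occupied);
        }
        return 0;
    }

Even with a good mix, the 99% run should collide on something like a third of its inserts, while the 1% run almost never does.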
Yes, there are implementations of hash tables that solve this problem. Hash collisions still ruin your worst-case performance, but as long as nobody asks how long you spend on inserts you can solve that too. As a trivial example: on each insert, make the hash table larger and change the hash function randomly until you find a layout that gives O(1) lookup for all elements.
But now we're far outside the realm of "properties of hash tables" and instead inside "properties of my hash table", and any decent interviewer should recognize that.
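For the curious, here's a toy sketch of that grow-and-reseed idea; all names are made up, and the table balloons to roughly n^2 slots before a collision-free seed becomes likely, which is part of the joke:

    /* Toy sketch of "grow and re-seed until collision-free".  When an
       insert collides, keep doubling the table and picking a new random
       seed until every key has its own bucket, so lookups are O(1) by
       construction.  Inserts and memory usage are terrible on purpose. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define EMPTY UINT64_MAX

    struct table { uint64_t *slots; size_t nslots; uint64_t seed; };

    static size_t bucket_of(uint64_t key, uint64_t seed, size_t nslots) {
        uint64_t x = key ^ seed;                       /* seeded 64-bit mix */
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return (x ^ (x >> 31)) % nslots;
    }

    /* Try to place all keys with the current size/seed; 0 means a collision. */
    static int try_build(struct table *t, const uint64_t *keys, size_t n) {
        for (size_t i = 0; i < t->nslots; i++) t->slots[i] = EMPTY;
        for (size_t i = 0; i < n; i++) {
            size_t b = bucket_of(keys[i], t->seed, t->nslots);
            if (t->slots[b] != EMPTY) return 0;
            t->slots[b] = keys[i];
        }
        return 1;
    }

    /* Grow and re-seed until the whole key set fits with no collisions. */
    static void rebuild(struct table *t, const uint64_t *keys, size_t n) {
        do {
            free(t->slots);
            t->nslots = t->nslots ? t->nslots * 2 : 16;
            t->slots  = malloc(t->nslots * sizeof *t->slots);
            t->seed   = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
        } while (!try_build(t, keys, n));
    }

    static void insert(struct table *t, uint64_t *keys, size_t n) {
        /* keys[n-1] is the new key; keys[0..n-2] are already in the table */
        if (t->nslots == 0 ||
            t->slots[bucket_of(keys[n - 1], t->seed, t->nslots)] != EMPTY)
            rebuild(t, keys, n);                       /* collision: start over */
        else
            t->slots[bucket_of(keys[n - 1], t->seed, t->nslots)] = keys[n - 1];
    }

    int main(void) {
        enum { N = 1000 };
        uint64_t keys[N];
        struct table t = { 0 };
        for (size_t i = 0; i < N; i++) {
            keys[i] = i * 2654435761u;                 /* arbitrary distinct keys */
            insert(&t, keys, i + 1);
        }
        printf("%d keys placed in %zu slots: O(1) lookups, absurd space\n",
               (int)N, t.nslots);
        free(t.slots);
        return 0;
    }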
People here seem to be implicitly assuming linked buckets; those are bad on modern architectures for several reasons, chiefly the pointer chasing and cache misses.
Look at Hopscotch, Robin Hood, or Cuckoo hashing for open-addressing schemes with a high fill factor (~0.9) and _amortized_ O(1) operations. I've seen a paper somewhere that proved a worst-case O(log log n) probe length for Robin Hood, as far as I recall.
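For anyone unfamiliar, here's a rough sketch of the Robin Hood insertion rule: while probing linearly, if the resident entry is closer to its home bucket than the entry being inserted, swap them and carry on with the evicted one. The layout, hash, and sizes below are simplified assumptions, not any particular library; there's no resizing or deletion:

    /* Minimal sketch of Robin Hood insertion into a linear-probing table. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NSLOTS 1024                    /* power of two, for the & mask */

    struct slot { uint64_t key, val; bool used; };
    static struct slot table[NSLOTS];

    static size_t home_bucket(uint64_t key) {
        key *= 0x9e3779b97f4a7c15ULL;      /* any decent mixing step */
        return (size_t)(key >> 32) & (NSLOTS - 1);
    }

    /* How far a key stored at `slot` has wandered from its home bucket. */
    static size_t probe_dist(uint64_t key, size_t slot) {
        return (slot + NSLOTS - home_bucket(key)) & (NSLOTS - 1);
    }

    static void rh_insert(uint64_t key, uint64_t val) {
        size_t i = home_bucket(key), dist = 0;
        for (;;) {
            if (!table[i].used) {                     /* empty slot: done */
                table[i] = (struct slot){ key, val, true };
                return;
            }
            if (probe_dist(table[i].key, i) < dist) { /* resident is "richer" */
                struct slot evicted = table[i];       /* rob it, carry on with it */
                table[i] = (struct slot){ key, val, true };
                key = evicted.key;
                val = evicted.val;
                dist = probe_dist(key, i);
            }
            i = (i + 1) & (NSLOTS - 1);               /* linear probe */
            dist++;
        }
    }

    static bool rh_lookup(uint64_t key, uint64_t *val_out) {
        size_t i = home_bucket(key);
        for (size_t dist = 0; dist < NSLOTS; dist++, i = (i + 1) & (NSLOTS - 1)) {
            if (!table[i].used)
                return false;                         /* hole: key is absent */
            if (table[i].key == key) {
                *val_out = table[i].val;
                return true;
            }
            if (probe_dist(table[i].key, i) < dist)   /* invariant lets us stop early */
                return false;
        }
        return false;
    }

    int main(void) {
        for (uint64_t k = 0; k < 700; k++)            /* ~0.68 fill, fits without resizing */
            rh_insert(k, k * 10);
        uint64_t v;
        if (rh_lookup(123, &v))
            printf("123 -> %llu\n", (unsigned long long)v);
        return 0;
    }

The point of the swap is that probe lengths stay short and uniform, which is what makes the high fill factors tolerable.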
My first thought about dynamic resizing is that you can have amortized O(1) but still O(n) worst case: both when someone exploits the hash function to force all elements into the same bucket, and when an insert triggers a resize.
You can make an expanding array with O(1) worst-case operations, but I'm not sure whether the same trick applies to a hash table. It may, but it's starting to get very complex.
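As far as I know, the usual trick is to keep the old and new storage around and migrate a bounded number of elements per operation, so no single insert pays for the whole copy. Here's a hand-wavy sketch of that idea for a plain append-only array (all names are made up; incremental hash table rehashing uses the same two-buffer structure, just with rehashing instead of straight copying):

    /* Sketch of an append-only dynamic array with O(1) worst-case append:
       when the buffer fills up, allocate one twice the size but don't copy
       everything at once; every subsequent append also moves a couple of
       old elements over, so the old buffer is drained long before the new
       one can fill up. */
    #include <stdio.h>
    #include <stdlib.h>

    struct incr_vec {
        int   *cur, *old;        /* current buffer and the one being drained */
        size_t cur_cap, old_len; /* capacity of cur, element count of old */
        size_t len, moved;       /* logical length, old elements already moved */
    };

    static void iv_append(struct incr_vec *v, int x) {
        if (v->len == v->cur_cap) {          /* cur is full: start a new buffer */
            free(v->old);                    /* old is fully drained by now */
            v->old = v->cur;
            v->old_len = v->len;
            v->cur_cap = v->cur_cap ? v->cur_cap * 2 : 4;
            v->cur = malloc(v->cur_cap * sizeof *v->cur);
            v->moved = 0;
        }
        /* Move at most 2 old elements per append, so no append copies O(n). */
        for (int i = 0; i < 2 && v->moved < v->old_len; i++, v->moved++)
            v->cur[v->moved] = v->old[v->moved];
        v->cur[v->len++] = x;
    }

    static int iv_get(const struct incr_vec *v, size_t i) {
        if (i >= v->moved && i < v->old_len)
            return v->old[i];                /* not migrated yet */
        return v->cur[i];
    }

    int main(void) {
        struct incr_vec v = { 0 };
        for (int i = 0; i < 100; i++)
            iv_append(&v, i * i);
        printf("v[3]=%d v[50]=%d v[99]=%d\n",
               iv_get(&v, 3), iv_get(&v, 50), iv_get(&v, 99));
        free(v.old);
        free(v.cur);
        return 0;
    }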
"High quality" is an strange concept. I would look at code you actually use and rely on - that's the best indication of quality. A lot of critical code deals with inelegant, complex problems correctly and efficiently - I'd consider anything that can be relied on to manage that "high quality", even if it is unclean, inelegant, poorly formatted and algorithmically mundane.
That said, if you want to read elegant code, I'd recommend the stb parser libraries (written in C). They are small self-contained decoders for many common media formats, with excellent documentation:
These libraries are likely insecure, handle many edge cases incorrectly, implement fewer features, and perform worse than other options. However, they meet your criteria better.
It is not nearly enough for code to just work and be useful. Code quality is what determines how maintainable it is, how long it will stay relevant, and how long it will survive changing requirements and environments. And that is much harder to achieve than something that merely (sometimes) works.
Absolutely, by all means look at old code - code that has survived and been useful for a long time. It's either adaptable (Linux) or doesn't need to change or adapt much (TeX).
Do you currently use and rely on software that you expect won't be useful to you in ten years' time? I can't think of much, personally.
(I do use IDA Pro, which has clearly adapted poorly to changing requirements - it still has scars of the 32-bit to 64-bit transition that get in the way of day-to-day usage. I hope there'll be something better in ten years. Of course, I could buy a cheaper, "higher quality" tool instead, but none of them are as powerful or as useful.)
This is an interesting article, but the "Portability" comments could be a lot more useful: strndup and open_memstream are both "POSIX 2008", but strndup can be used on OS X while open_memstream cannot.
I deliberately didn't write that, to avoid the page going stale when OS X adds it. But OS X is behind the times, and that's harmful. POSIX 2008 has been out for years and most of the missing features are trivial to add; by not providing modern interfaces, they're being actively harmful to Unix software, forcing portable programs to be worse. The purpose of the article is to highlight the interfaces and when they're appropriate, rather than to be a replacement for your system manual pages or a portability guide. Since OS X isn't a free software Unix (though its libc is), I don't really consider it among the relevant modern Unix systems. Linux, the BSDs, and so on all have the POSIX 2008 features mentioned here.
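For anyone who hasn't used it, open_memstream hands you a FILE* that writes into a growing malloc'd buffer, which is exactly the kind of interface you miss once it's gone. A minimal example (error handling omitted):

    /* Minimal open_memstream example: build a string with ordinary stdio
       calls, then use the resulting malloc'd buffer.  POSIX 2008. */
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        char  *buf = NULL;
        size_t len = 0;
        FILE  *f = open_memstream(&buf, &len);

        fprintf(f, "pid=%d", 1234);
        fprintf(f, " name=%s", "example");
        fclose(f);                 /* flushes and NUL-terminates buf */

        printf("built %zu bytes: %s\n", len, buf);
        free(buf);
        return 0;
    }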
Just use setlocale(LC_ALL, "") in main, and use mbrtowc to translate from whatever the system encoding is into the wchar_t type. There's no need to bake assumptions about the system encoding into most programs.
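Something along these lines; the sample string and the code-point printing assume a Unicode-based wchar_t and a matching source encoding (the usual case on Linux and the BSDs), so treat it as a sketch rather than a portability guarantee:

    /* Sketch of locale-aware decoding: setlocale(LC_ALL, "") picks up the
       user's encoding, and mbrtowc converts the multibyte input into
       wchar_t without the program hard-coding UTF-8 (or anything else).
       Printing wc as a code point assumes wchar_t holds Unicode values. */
    #include <locale.h>
    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>

    int main(void) {
        setlocale(LC_ALL, "");             /* use the environment's encoding */

        const char *s = "héllo wörld";     /* assumes source encoding matches locale */
        const char *p = s;
        size_t left = strlen(s);
        mbstate_t st;
        memset(&st, 0, sizeof st);

        while (left > 0) {
            wchar_t wc;
            size_t n = mbrtowc(&wc, p, left, &st);
            if (n == (size_t)-1 || n == (size_t)-2)
                break;                     /* invalid or truncated sequence */
            if (n == 0)
                n = 1;                     /* embedded NUL */
            printf("U+%04lX\n", (unsigned long)wc);
            p += n;
            left -= n;
        }
        return 0;
    }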
I'm guessing the speed difference comes from this linker taking a different approach to symbol resolution, one that visits each input file fewer times, but I'm not sure.