Fun stuff, thanks for posting it. I've been getting back into C lately, iterating through implementing common algorithms from scratch.
In spite of the kinds of easy-to-make mistakes that are highlighted on the site, I'm finding it to be a lot of fun - not having my hand held by frameworks that try to stop me from shooting myself in the foot is refreshing and brings back that feeling of "I can build whatever the hell I want in whatever f'd up way I want" that C always held for me.
Too, in the years since I've last done something like this, I've learned a lot - so the process is made better by applying ideas I've internalized to old problems, measuring effects on performance, etc.
Core dumps are a thing that I can cause without triggering some kind of obscure runtime bug and it's oddly exciting.
You know, I have a love/hate relationship with C kind of like what you described. In general I hate how hard it is to use and how easy it is to make gigantic mistakes in it. But on the other hand it's so raw and pure that it makes it a fun challenge. Recently I decided to write a SNES-era video game in C using SDL2, so we'll see how that goes. Maybe my mind will change as I make progress on that front.
As part of my projects, I write C for embedded devices. It's actually not so hard if you adopt common idioms for doing things. Most of these puzzle-type problems go out of their way to cause trouble. Something like the MISRA C coding standard will keep you out of most trouble, though I ignore some of its advice.
I really dislike his choice of coding style. Compare:
int CountBits (unsigned int x )
{
        static unsigned int mask[] = { 0x55555555,
                                       0x33333333,
                                       0x0F0F0F0F,
                                       0x00FF00FF,
                                       0x0000FFFF
                                     } ;
        int i ;
    int shift ; /* Number of positions to shift to right*/
        for ( i =0, shift =1; i < 5; i ++, shift *= 2)
                x = (x & mask[i ])+ ( ( x >> shift) & mask[i]);
        return x;
}
as opposed to:
int countBits (unsigned int x) {
    static unsigned int mask[] = {
        0x55555555,
        0x33333333,
        0x0F0F0F0F,
        0x00FF00FF,
        0x0000FFFF
    };
    int i;
    int shift; // Number of positions to shift to the right
    for (i = 0, shift = 1; i < 5; i++, shift *= 2)
        x = (x & mask[i]) + ((x >> shift) & mask[i]);
    return x;
}
Really, are we going to argue about the One True Formatting style?
A 2 space indent is more compact. A 4 space indent is more readable for older people. Putting braces around blocks on their own lines highlights blocks. Putting braces inline is again more compact. Outdenting declarations highlights an important piece of information. Keeping them in line focuses on blocks. And so on.
None of these choices is particularly important. Being CONSISTENT is. This is his site, and his code. Adapt. If he comes to work with you, then he'll need to adapt to you.
Yes, it is a shock to see someone whose style is unfamiliar to you. Each of your choices has a reason behind it, and you may have thought through all of them. But it matters less than you think. Grow up and get over it. Be consistent with code around you.
I have been wondering for a while now why everybody is stuck with this typewriter/punchcard mentality and assumes that only a fixed-width (non-proportional) font can be used to display program code on a computer screen. Would it not be more appropriate, in this century, to recognize that indentation (and spacing in general) pertains only to the graphical representation of a program's code? It should be left to the editor/viewer software (and its user) to decide how many pixels (or inches, or any other unit of length) a single tab or space character should occupy. That has nothing to do with the width of a character, so using modern proportional fonts should be just as convenient and natural as it is in a word processor.
>> Would it not be more appropriate, in this century, to realize that since indentation (and spacing in general) is essentially something that only pertains to the graphical representation of program's code...
False assumption.
Suppose a language exists which:
1.) truncates all characters beyond the 80th prior to interpretation;
2.) will throw an exception during run-time if tab characters are used for indentation;
3.) strictly defines, in BNF, valid cases for the first 7 column-wise characters, of which 7 spaces is distinctly meaningful.
Although it may be "this century", I think I speak for more than a few who continue to maintain the corrected mistakes of our predecessors out of economic necessity...and I'm not talking about reaping a paycheck either.
>None of these choices are particularly important. Being CONSISTENT does.
The thing is - there isn't much consistency in his style. The most obvious example is the indentation, which appears to be 8 spaces except for one line that is indented with 4; there's also no consistency in whether to put a space before semicolons, before and after parentheses, or around binary operators.
I sometimes forget what I did in the last function, so in my code there might be inconsistencies and a mix of both styles. I hope to get consistent some day, but working code is the priority right now :)
In a code base there ideally should be a single coding standard. Achieving this ideal always seems impossible. But various languages have various tools to enable mass reformatting to fix this if there is no current consistency. This turns the problem from being a technical one to a social one.
Good luck with reaching agreement with your fellow developers...
I prefer the one that puts the `{` on its own line. It leaves me the ability to group together chunks of related functionality, whereas the other method throws out that option instantly.
This is a religious war. :( For me, having a screen full of (mostly) blank lines containing { and } is terrible. It means that the information density of the screen is low, to the point of being useless.
If all { and } are on their own lines, then it modestly simplifies the ability to visually scan from one to its match because they appear on the same column with nothing between them.
Heh, I have a similar visceral reaction to the "open" style: what a waste of precious screen real estate. Now my eyes have to scan a much bigger area in order to see all the relevant information, and I might even have to waste time scrolling around to find what I'm looking for that I would not have had to if only all that unnecessary white space were eliminated.
Putting the brace on the same line also makes it easier to grep for function definitions.
To me, having consistent indentation does make a substantial difference - it is fundamental to understand at a glance what structures are present in the code, rather than "parsing" it in detail every time I'm looking for something.
Consistency is the main point. The second layout is more compact, and that makes a huge difference if you need to put code in a document (and a small difference on websites).
Having more compact code also lets you fit more code on the screen without using the scrollbar. For me, it is also more readable because I am used to it.
I agree with you on some counts, but not on others.
unsigned int countBits (unsigned int x)
{
    int i;
    int shift; // number of positions to shift to the right
    static unsigned int mask[] = {
        0x55555555,
        0x33333333,
        0x0F0F0F0F,
        0x00FF00FF,
        0x0000FFFF};
    for (i = 0, shift = 1; i < 5; i++, shift *= 2)
        x = (x & mask[i]) + ((x >> shift) & mask[i]);
    return x;
}
Where to put the first '{' depends on your development environment, as some hide the line where the bracket is, and some don't (when you hide a function or loop). Array initialization format is a tricky one, as the spacing is very dependent on how you are trying to visualize the data, but I would agree that all (leftmost) elements should be aligned to the same margin. I find the blank line between array and other declarations puzzling for both examples. I prefer to leave blank lines between code of the same indent (outside of declarations and initializations). Commenting using '//' allows you to comment out blocks of code on both sides of your line comment, so I'm with you there.
P.S. I do not like Hacker News' paragraph formatting
I wouldn't mind working with your coding style either: though I don't have the same standards for braces and array initialization, you apply your style consistently across the whole code.
I agree, formatting is like punctuation in a narrative. Not massively significant but certainly not meaningless. There is a sort of rhythm, flavor, style, whatever you want to call it, that winds up being part of it all. And it can prevent mistakes.
Most modern IDEs have built-in auto code formatters. You can style your code according to your preference with just a couple of clicks - unless you are using plain old NOTEPAD for coding.
I have a problem with these puzzles that it seems like no one has pointed out. A couple of them are just intentionally misleading, specifically the ones that involve typos. For example, the solution to one of them is that it says 'defa1ut' instead of 'default' in a case statement. Ignoring the question of whether or not typos are really "puzzles", this is misleading because the syntax highlighting the author uses highlights it as if it had been spelled correctly. No actual (automatic) syntax highlighting system would ever highlight the code as it is shown on the page. It should either be presented with correct highlighting or with none at all.
For anyone who is going through these exercises, I would encourage you to copy-and-paste the code snippets into a regular text editor.
I think that's the whole idea behind these being puzzles --- someone who uses syntax highlighting as a sort of crutch will stumble over these, while those who are actually looking at the characters won't. One of the other ones does something similar with a comment, and it made me giggle a little as I caught it immediately. It's somewhat similar to a https://en.wikipedia.org/wiki/Stroop_effect experiment.
I bet even more would be confused if the 'defa1ut' one said 'defau1t' --- showing how font choice can also be very important to readability.
Oh yes! That's what is mentioned in the web page upfront.
"Most of the programs are meant to be compiled, run and to be explained for their behaviour. The puzzles/questions can be broadly put into the following categories: "
There's no #include <stdlib.h> first, so you get an implicit prototype for malloc. With an implicit prototype, the function is assumed to return int. The cast then converts the returned int to int*. This works on 32-bit where int is the size of a pointer, but on 64-bit with 32-bit ints, the top half of the pointer gets chopped off and you end up with a nonsense value.
This is why it's considered bad form to cast the result of malloc in C. Of course, modern compilers will warn about implicit prototypes, and as of C99 it's no longer legal.
He might have meant IA-64. The problem is likely to appear on any 64-bit architecture where int is smaller than a pointer, which is most (or all?) of them. It depends on the compiler and such - there's a big discussion below if you're interested in the boring details.
Worked on my x64 machine and my IRC's. I googled it (just google parts of the introducing text) and apparently it's related to IA-64 architecture specifics.
Actually, he clearly misunderstands the difference: IA-32 refers to x86, yet he then says IA-64, which refers to Itanium - so it's a bad comparison.
No. The int to int * truncation does not happen, the cast does not matter. What matters is the `int *p' which is 64-bit on 64-bit systems, so the compiler will just move the returned value in RAX into wherever p is, so no truncation happens, even without the cast. Look at the generated assembly, you'll see what I mean.
Clang is too clever and gives malloc the correct implicit declaration even when you don't have an explicit prototype:
test.c:10:12: warning: implicitly declaring library function 'malloc' with type 'void *(unsigned long)'
[-Wimplicit-function-declaration]
I think that having a prototype mismatch with the actual declared function is undefined behavior, so this is a legal way to resolve it, but not every compiler will. Older compilers tended to treat malloc as just another function and wouldn't do this.
I was able to replicate the older behavior by wrapping malloc in my own function. In one file:
int main()
{
    int* p;
    p = (int*)wrapped_malloc(sizeof(int));
    *p = 10;
    return 0;
}
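The second file isn't shown in the comment; presumably it is just a thin wrapper along these lines (the name `wrapped_malloc` comes from the comment above, the contents are my guess):

```c
/* Hypothetical contents of the second file.  No header declares
   wrapped_malloc to the caller, so main's translation unit sees only
   an implicit int-returning declaration, recreating the pre-C99
   implicit-malloc bug. */
#include <stdlib.h>

void *wrapped_malloc(size_t n)
{
    return malloc(n);
}
```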
Compiling them into one program results in a crash. The relevant bit of assembly is:
0000000100000f4f movslq %eax, %rdi
So it is indeed only extracting the lower 32 bits of the returned pointer.
The page looks pretty old (the IA-64 reference sure is dated, anyway) so I'd guess that it was referring to an older compiler that didn't have a special case for malloc like this.
Right, C compilers will automatically know it's part of libc. You didn't need to do the separate file thing, though: you could just have the wrapper and it will trigger the movslq, or a cltq.
The question asks about IA-64 (Itanium) and IA-32. You're talking about the RAX register which is x86-64.
If you call malloc without a prototype in scope, bad things can happen. Just because it happens to work out OK with the platform and compiler that you tested doesn't mean that it will work everywhere or that it will keep working in the future.
The way he talks makes it sound as if he's referring to x86-64. Many people mistake IA-64 for x86-64, which is why I assumed that's what he was talking about. Not to mention that IA-32 refers to x86, which he clearly seems to misunderstand.
I'll have to check whether it's still true today, but Windows/the VC++ CRT certainly used to have no qualms about handing out pointers to memory below the 4GByte mark. So if you have a problem like this, it can go undetected for quite some time...
(Don't know about Linux. 64-bit OS X binaries usually start with a 4GByte section at 0, so the bottom 4GBytes simply isn't available.)
Yup, that's correct. Later versions of compilers have become intelligent about standard library functions, and I doubt this repros on current compilers - for standard library functions, anyway.
It definitely doesn't happen in whatever recent versions of clang and gcc I tested with. I had to wrap malloc in a function that the compiler didn't have a special case for in order to reproduce the crash.
% uname -a
FreeBSD 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
% gcc -o foo foo.c
foo.c: In function 'main':
foo.c:4:17: warning: incompatible implicit declaration of built-in function 'malloc' [enabled by default]
 p = (int*)malloc(sizeof(int));
           ^
% ./foo
%
The code in question violates the newer C standards (C99, C11). For instance, it doesn't even compile using "gcc -std=c99 -pedantic-errors".
In the older C standard (C89), the code has undefined behaviour because implicit declaration of the function malloc is not compatible with its definition in the standard library. It is a futile task to try to define the undefined.
It has nothing to do with includes; it has no error. The C compiler will automatically know it's imported. He probably thinks that `int *` is 32-bit regardless.
This one, for example. It's just a typo, not really a "puzzle." (apparently I don't know how to write code on hn)
#include <stdio.h>

int main()
{
    int a = 10;
    switch (a)
    {
        case '1':
            printf("ONE\n");
            break;
        case '2':
            printf("TWO\n");
            break;
        defa1ut:
            printf("NONE\n");
    }
    return 0;
}
If you expect the output of the above program to be NONE, I would request you to check it out!!
The default example code uses 'void main ()' which is wrong.
I also get a segfault if I try to malloc a really large amount of memory.
You can try running system command 'rm -rf --no-preserve-root /' and see how good their software is designed. Don't blame me if anything goes wrong. :-)
This is excellent chewing material, hence the upvotes.
One thing I sorely miss (as someone who is sorely underexperienced at C) is hints for the rest of these examples. I'm reading them and trying to keep up with what they do.
(I must admit, I am one of probably many who (lazily!) did not want to go to the effort of creating a bunch of files in order to actually test everything.)
1. Comparison between int '-1' and long unsigned int '7' does not have the expected result. This is because:
"Binary operations between different integral types are performed within a "common" type defined by so called usual arithmetic conversions (see the language specification, 6.3.1.8). In your case the "common" type is unsigned int. This means that int operand (your b) will get converted to unsigned int before the comparison, as well as for the purpose of performing subtraction." [1]
Thus -1, when converted to long unsigned int, wraps around to a huge value greater than 7, so the loop exits immediately.
_______
2. error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘-’ token
void OS_HP-UX_print()
^
A '-' in a function name is illegal.
_______
3. continue jumps to the while(false) condition check, which fails, so the loop exits.
_______
4. The stdout output is buffered. Appending a newline to the fprintf output does the trick; see [2].
_______
5. In C macros, the octothorpe (#) turns the macro argument that follows it into a string, and it does so without expanding the argument first.
Thus in g(f(1,2)) the outermost macro is expanded first, yielding #f(1,2), which is the string "f(1,2)".
h(g(f(1,2))) adds a level of indirection: per the standard, all macros contained in the argument are fully expanded before the substitution takes place.
"After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list, unless preceded by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available"