
The C iceberg: surprising facts every C beginner should know

By Abdulkader Safi / June 09, 2026 / 5 min read

A beginner-friendly tour of C's weirdest corners: pointers, array decay, the stack and heap, struct padding, octal literals, and why a byte isn't always 8 bits.



C is old. It first showed up in the early 1970s, and it never really left. Your operating system, your database, and many of the languages you use day to day are built on top of it or borrow ideas from it. That age is a gift and a curse. The gift is that C is everywhere. The curse is that it carries decades of strange behavior, and some of that weirdness has spread into newer languages where it still trips people up.

I want to walk you through a stack of those facts, from the ones you really should know to the obscure trivia at the bottom. Think of it as an iceberg. The top is the stuff you meet on day one. The deeper you go, the stranger it gets. By the end you will understand a famous "JavaScript is weird" joke that is actually C's fault.

Almost none of this needs prior knowledge. Where I use a term, I explain it in one line.

Pointers, the thing everyone trips on first

A pointer is just a variable that holds the location of another variable in memory. That's it. Instead of storing a number, it stores the address where a number lives.

The syntax is what makes it feel scary. Here are the three pieces you need:

int *p declares a pointer called p that points to an integer.
&a gives you the memory address of the variable a. Read & as "address of".
*p gives you the value stored at the address p holds. Read * as "value at".

int a = 42;
int *p = &a;   // p now holds the address of a
printf("%d\n", *p);  // prints 42, the value at that address

Once those three click, most of C opens up. Pointers are how C passes big things around cheaply, how it builds linked lists and trees, and how it talks to hardware.
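Here is a small sketch of the first point, using a hypothetical swap function. It can change the caller's variables only because it receives their addresses. A version that took two plain ints would only swap its own local copies.

```c
/* Swap two ints in the caller's scope. The function receives their
   addresses, so writing through *a and *b changes the caller's variables.
   A version taking plain ints would only swap its own local copies. */
void swap(int *a, int *b) {
    int tmp = *a;   /* save the value at a */
    *a = *b;        /* store b's value at a's address */
    *b = tmp;       /* store the saved value at b's address */
}
```

Given int x = 1 and y = 2, calling swap(&x, &y) leaves x holding 2 and y holding 1.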

Arrays are (not) pointers

Here's the fact that made pointers finally click for me. An array in C is basically a pointer to its first element. Basically, not exactly. Hold that thought.

In most expressions, an array name turns into a pointer to its first element, so you can do math on it. Add 1 to that pointer and C gives you the address of the second element. The clever part is that C accounts for the size of each element automatically. Adding 1 to an int pointer moves it forward by sizeof(int) bytes, not by 1 byte, so you never multiply by anything yourself.

That means *(array + 1) gives you the same thing as array[1]. And in general, *(array + index) is the same as array[index]. The square-bracket syntax you already know is just shorthand for pointer math.
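A quick sketch makes the automatic scaling visible. Both helpers are made-up names for illustration. element_at spells out what arr[i] means, and bytes_per_step measures how many bytes p + 1 actually moves.

```c
#include <stddef.h>

/* Read element i using explicit pointer math. This is exactly what
   arr[i] means: move i elements (not i bytes) past arr, then read. */
int element_at(const int *arr, size_t i) {
    return *(arr + i);
}

/* How many bytes does "+ 1" move an int pointer? C scales the step by
   the element size, so this equals sizeof(int), typically 4. */
ptrdiff_t bytes_per_step(const int *p) {
    return (const char *)(p + 1) - (const char *)p;
}
```

On a platform with a 4-byte int, bytes_per_step returns 4 even though the code only added 1.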

Now for the party trick. If array[index] is the same as *(array + index), and addition doesn't care about order (2 + 3 is the same as 3 + 2), then you can swap the two and write index[array]. It compiles. It runs. It gives the exact same result.

int nums[3] = {10, 20, 30};
printf("%d\n", nums[1]);   // 20
printf("%d\n", 1[nums]);   // also 20, and yes this is real C

Never write 1[nums] in real code. But it proves the point: the brackets are sugar over pointer math.

So why did I say "basically, not exactly"? Because an array is not a pointer. The array owns its memory, and the compiler knows its full size. When an array acts like a pointer, you're seeing array decay: in most expressions, C quietly converts the array into a pointer to its first element. The exceptions include sizeof and &, which both see the whole array. The array itself is still its own thing. If you want the deeper why-it-matters behind tricks like this, I get into it in my post on where we actually use data structures and algorithms.
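One way to watch decay happen is with sizeof. In this sketch, ARRAY_LEN and size_of_param are hypothetical names. Where the array is declared, sizeof sees the whole array. A function parameter only ever receives a pointer, so sizeof inside the function reports the size of that pointer.

```c
#include <stddef.h>

/* Where the array itself is in scope, sizeof sees all of it, so this
   classic macro recovers the element count. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* By the time an array reaches a function, it has decayed to a pointer.
   Writing the parameter as int arr[] would mean exactly the same thing.
   So sizeof here gives the size of a pointer, not of the caller's array. */
size_t size_of_param(const int *arr) {
    return sizeof(arr);
}
```

This is why C functions that take arrays almost always take the length as a separate parameter.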

The stack: where your function's variables live

When your program runs a function, it needs somewhere to keep that function's local variables. That place is the stack.

Picture a stack of plates. Every time you call a function, a new plate (called a stack frame) goes on top. The frame holds all the local variables for that function plus a note about where to go back to when the function finishes. When the function returns, the plate comes off the top and those local variables are gone.

This leads to one of the most common beginner mistakes in C: returning a pointer to a local variable.

int *make_number() {
    int x = 5;
    return &x;   // danger: x dies when this function returns
}

The moment make_number returns, its stack frame is popped and x no longer exists. The address you handed back now points at memory that could hold anything. Using it is what C calls undefined behavior, which is the standard's way of saying "anything can happen, and none of it is your friend". I'll come back to undefined behavior, because it's the theme that ties a lot of C bugs together.
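If you need a value out of a function like that, there are three common fixes, sketched here with hypothetical names. Which one fits depends on who should own the memory.

```c
#include <stdlib.h>

/* Option 1: return the value itself. The caller gets a copy, so it
   doesn't matter that x dies with the stack frame. */
int make_number_value(void) {
    int x = 5;
    return x;
}

/* Option 2: the caller owns the storage and passes its address in.
   That variable lives in the caller's frame, which is still alive. */
void make_number_into(int *out) {
    *out = 5;
}

/* Option 3: allocate on the heap (covered next). The memory outlives
   the function, but the caller must free() it. Returns NULL on failure. */
int *make_number_heap(void) {
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = 5;
    }
    return p;
}
```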

The heap: memory that outlives the function

If you need memory to survive after a function ends, or you need a lot of it, you use the heap. The heap is memory you ask for and give back by hand.

You ask for it with malloc, short for "memory allocate". You tell it how many bytes you want and it hands you a pointer to that block. When you're done, you call free to give it back.

int *p = malloc(sizeof(int));  // ask for room for one int (malloc is in <stdlib.h>)
if (p != NULL) {               // malloc returns NULL when it can't allocate
    *p = 5;                    // use it
    free(p);                   // give it back when done
}

Forget to call free and you get a memory leak: your program keeps grabbing memory and never gives it back, so it slowly bloats until it crawls or crashes. Forgetting to free is one classic memory bug in C. Using memory after you've freed it is another. That bug is called use-after-free, and it's undefined behavior.
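Here's a sketch of the pattern the heap exists for: a function that builds an array and hands it back. make_squares is a hypothetical example. The rule it illustrates is that whoever receives the pointer owns the memory and must free it exactly once.

```c
#include <stdlib.h>

/* Build an array of the first n squares on the heap. Unlike a local
   array, this block survives after the function returns. The caller
   owns it and must call free() exactly once. Returns NULL on failure. */
int *make_squares(size_t n) {
    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        squares[i] = (int)(i * i);
    }
    return squares;
}
```

A caller would write int *sq = make_squares(5);, check it against NULL, use it, then call free(sq). Reading sq[0] after that free would be exactly the use-after-free bug described above.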

Why your struct is bigger than you think

A struct is a way to group related variables together, a bit like a simple class without the methods. Here's one:

struct Thing {
    char  a;
    int   b;
    char  c;
};

Quick question: how much memory does this take? A char is 1 byte and an int is usually 4 bytes, so 1 + 4 + 1 = 6, right?

On most systems it's actually 12 bytes. The reason is struct padding.

Processors read memory in chunks, usually 4 or 8 bytes at a time. To keep that fast, the compiler lines up each field on an address that matches its size, and it pads the gaps with invisible bytes to make the alignment work out. After the first char, it adds 3 bytes of padding so the int starts on a clean 4-byte boundary. After the second char, it pads again so the whole struct is a multiple of 4. Six bytes of real data, six bytes of padding.

Now reorder the fields so the two chars sit together:

struct Thing {
    int   b;
    char  a;
    char  c;
};

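This one takes 8 bytes: 4 for the int, 1 for each char, and only 2 bytes of padding at the end to round the size up to a multiple of 4. It's the same data with less waste. The two small fields now sit side by side, so each one no longer needs its own padding after it. Field order matters. If you're creating millions of these, or working on a device with tight memory, grouping smaller types together saves real space. A common rule of thumb is to order fields from largest to smallest.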
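If you'd like to see the padding on your own machine, sizeof and offsetof (from <stddef.h>) will show it. This sketch renames the two versions ThingPadded and ThingReordered so they can live in one file. The numbers in the comment assume a typical platform with a 4-byte, 4-aligned int.

```c
#include <stddef.h>
#include <stdio.h>

/* The original order: char, int, char. */
struct ThingPadded {
    char a;
    int  b;
    char c;
};

/* The same fields with the int first, so the two chars sit together. */
struct ThingReordered {
    int  b;
    char a;
    char c;
};

/* Print each field's offset and the total size. With a 4-byte int this
   typically shows a=0 b=4 c=8 size=12 for ThingPadded, and
   b=0 a=4 c=5 size=8 for ThingReordered. */
void print_layouts(void) {
    printf("ThingPadded:    a=%zu b=%zu c=%zu size=%zu\n",
           offsetof(struct ThingPadded, a), offsetof(struct ThingPadded, b),
           offsetof(struct ThingPadded, c), sizeof(struct ThingPadded));
    printf("ThingReordered: b=%zu a=%zu c=%zu size=%zu\n",
           offsetof(struct ThingReordered, b), offsetof(struct ThingReordered, a),
           offsetof(struct ThingReordered, c), sizeof(struct ThingReordered));
}
```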

The compiler is smarter than you expect

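Modern C compilers do some genuinely surprising things. One of my favorites comes from an analysis called scalar evolution. Both Clang (through LLVM) and GCC use it to work out how values change from one loop iteration to the next.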

Say you write a simple loop to add up the numbers from 1 to n:

int sum = 0;
for (int i = 1; i <= n; i++) {
    sum += i;
}

That loop runs n times. Double n and it takes twice as long. But turn on optimization (for example -O2) and the compiler can replace the whole loop with the formula for triangular numbers, n * (n + 1) / 2. Now it takes the same tiny amount of time no matter how big n is. And the compiler isn't matching this exact loop against a list of known patterns. It models how sum changes on each pass as a recurrence and solves that in closed form.
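Here is that transformation written out by hand, as a sketch. sum_loop and sum_formula are illustrative names, and long long is used only to push overflow further away. Whether your compiler actually rewrites the loop depends on the compiler and the optimization level. Compiling with -O2 -S and reading the assembly is one way to check.

```c
/* The loop as written: n additions, so the time grows with n. */
long long sum_loop(long long n) {
    long long sum = 0;
    for (long long i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

/* The closed form an optimizing compiler may substitute: constant time,
   however big n is. Valid for n >= 0 small enough that n * (n + 1)
   fits in a long long. */
long long sum_formula(long long n) {
    return n * (n + 1) / 2;
}
```

Both functions agree within that range. For example, sum_loop(100) and sum_formula(100) both return 5050.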

The lesson isn't "write clever loops". It's the opposite. Write clear, readable code and let the compiler do the optimizing. Twisting your code into knots to save a few cycles usually just makes it harder to read, and sometimes it even runs slower because the compiler can no longer understand your intent.

Bit tricks on negative numbers

C lets you flip the individual bits inside a value with bitwise operations. So what happens when the number is negative?

First you need to know how negative numbers are stored. Almost every machine uses a scheme called two's complement. It works like normal binary, except the leftmost bit counts as a negative amount instead of a positive one. That one change lets the same addition circuit handle both positive and negative numbers.
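To see that negative weight in action, here's a sketch that prints the 8 bits of a small signed value. bits8 is a hypothetical helper. In 8 bits the leftmost bit is worth -128, so 11111011 means -128 + 64 + 32 + 16 + 8 + 2 + 1, which is -5. By the same logic, -1 is all ones.

```c
#include <stdint.h>

/* Write the 8 bits of v into out, most significant bit first, as a
   string of '0' and '1'. Converting to uint8_t is well defined: it keeps
   the value modulo 256, which for a two's complement int8_t is exactly
   its bit pattern. out must have room for 9 chars (8 bits plus '\0'). */
void bits8(int8_t v, char out[9]) {
    uint8_t u = (uint8_t)v;
    for (int i = 0; i < 8; i++) {
        out[i] = (u & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}
```

Passing 5 fills the buffer with "00000101", and passing -5 gives "11111011".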

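Here's the catch for beginners. Before C23, the C standard also allowed two other schemes, ones' complement and sign-magnitude. That meant the bits behind a negative number could depend on your compiler and machine, and so could the result of bit operations on signed (can-be-negative) numbers. C23 finally requires two's complement, so the bit pattern is now pinned down everywhere. A few corners are still not fully defined, though. Shifting a negative number left is undefined behavior. Shifting one right gives an implementation-defined result, although most compilers copy the sign bit in. If you want bit tricks that behave the same everywhere, do them on unsigned types.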
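Here's a minimal sketch of the unsigned approach, with hypothetical helper names. Converting a negative int32_t to uint32_t is defined to wrap modulo 2^32, so you get the two's complement bits. Every shift and mask after that is fully defined.

```c
#include <stdint.h>

/* Read bit i of x (0 = least significant). Converting to uint32_t first
   avoids the implementation-defined right shift of a negative value.
   i must be less than 32. */
unsigned bit_at(int32_t x, unsigned i) {
    return ((uint32_t)x >> i) & 1u;
}

/* Set bit i of x. Shifting an unsigned 1 is always defined for i < 32,
   unlike shifting a signed 1 into the sign bit. */
uint32_t set_bit(uint32_t x, unsigned i) {
    return x | (UINT32_C(1) << i);
}
```

For example, bit_at(-1, 31) is 1 because -1 is all ones, and set_bit(0, 3) is 8.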

The standard library has unsafe tools in the box

Some of C's built-in functions will hurt you if you trust them blindly.

Take strcpy, which copies text from one place to another. It does not check whether the destination has enough room. If the text is longer than the space you gave it, it writes past the end. That's a buffer overflow, and it's behind a huge share of real security holes. The fix is to always know your buffer length and use a function that takes it, such as snprintf. Be careful with strncpy, though. Despite the name, it doesn't add the terminating null character when the source is too long to fit.
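Here's a sketch of a bounded copy built on snprintf, which has been standard since C99. copy_string is a hypothetical name. The useful trick is that snprintf returns the length the full text would have needed, so you can detect truncation instead of silently losing data.

```c
#include <stdbool.h>
#include <stdio.h>

/* Copy src into dst, which holds dst_size bytes. snprintf never writes
   more than dst_size bytes and, when dst_size > 0, always null-terminates.
   It returns the length the full text needed, so a result of dst_size or
   more means the copy was cut short. Returns true only if src fit. */
bool copy_string(char *dst, size_t dst_size, const char *src) {
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Copying "hello world" into a 6-byte buffer stores "hello" plus the terminator and returns false, so the caller knows the text didn't fit.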

Then there's atoi, which turns text like "42" into the number 42. The problem is it has no way to tell you when it fails. Hand it text that isn't a number and it quietly returns 0, exactly as it would for "0", so a parsing error can sail straight through unnoticed. Worse, if the number is too big to fit in an int, the behavior is undefined. For real work, strtol is the safer choice. It tells you where parsing stopped and flags out-of-range values through errno.
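Here's a sketch of the checks atoi can't give you. parse_int is a hypothetical wrapper. strtol, errno, and ERANGE are all standard C.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse text as a base-10 int. Stores the result in *out and returns
   true only if the whole string is a number that fits in an int.
   strtol reports where it stopped (end) and sets errno to ERANGE when
   the value doesn't fit in a long, which is what makes these checks possible. */
bool parse_int(const char *text, int *out) {
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text) return false;       /* no digits at all */
    if (*end != '\0') return false;      /* trailing junk, like "42abc" */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return false;                    /* out of range for int */
    *out = (int)value;
    return true;
}
```

With this, "42" parses to 42. The inputs "abc", "42abc", the empty string, and "99999999999999999999" are all rejected instead of quietly turning into some number.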

If security is your angle here, the same "trust nothing from outside" instinct shows up all over modern apps, which I get into in my post on why your auth system is probably wrong and how to fix it.

Digraphs and trigraphs: museum pieces

Here's a piece of pure history. Decades ago, some keyboards didn't have characters like the curly braces { and }. So C added alternative spellings.

Digraphs are two-character stand-ins. You can write <% and %> instead of { and }, <: and :> instead of [ and ], and %: instead of #. This is valid C:

int main() <%
    return 0;
%>

Trigraphs went further. They use three-character sequences that start with two question marks, such as ??= for # and ??( for [, and they make code even uglier. C23 removed them from the language. Compilers such as GCC already ignored them by default unless you asked for them with a flag like -trigraphs or a strict older standard mode. You will almost certainly never need either. They exist because C is old enough to remember keyboards that couldn't type a brace.

main is not where your program starts

You probably think main is the first thing that runs. It isn't, quite.

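Before main can run, the program has setup to do: prepare the command-line arguments, get the standard library ready, and so on. On Linux and many other Unix-like systems, that happens in an entry point usually called _start. It is supplied by the C runtime's startup code, does the setup, and then calls your main. With the right build flags (GCC's -nostartfiles, for example), you can even supply your own _start and take full control of how the program boots.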

When would you want to? If you're writing something tiny or working on an embedded chip with no room to spare, you might skip all the standard-library setup and do only what you need. For most programs, leave it alone. It's just good to know there's a step before the step you thought was first.

A byte is not always 8 bits

This one sounds pedantic until it bites you. You know a char is 1 byte. You probably also "know" a byte is 8 bits.

The C standard guarantees the first part: a char is one byte. For the second part, it only promises that a byte has at least 8 bits and leaves the exact count to the machine. On almost everything you'll touch it's 8. But there are real architectures, mostly digital signal processors and specialized embedded chips, where a byte is 16, 24, or even 32 bits.

If you ever need the true number, it's in the CHAR_BIT macro from limits.h. Don't hardcode 8 in code that has to run on exotic hardware.
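Here's a small sketch of both ideas, with a hypothetical function name. The first part counts the bits in a byte by hand, which always agrees with CHAR_BIT. The second is a compile-time check for code that genuinely depends on 8-bit bytes.

```c
#include <limits.h>

/* Count the bits in a byte the long way. UCHAR_MAX has every bit of an
   unsigned char set, so shifting it right until it reaches zero counts
   them. The standard guarantees this matches CHAR_BIT. */
int count_byte_bits(void) {
    int bits = 0;
    for (unsigned int v = UCHAR_MAX; v != 0; v >>= 1) {
        bits++;
    }
    return bits;
}

/* Only add a check like this if your code truly assumes 8-bit bytes.
   On a machine where that's false, the build stops with this message
   instead of the program misbehaving at runtime. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
```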

Don't start a number with 0

And here's the fact the whole iceberg was building toward. You've seen the posts mocking JavaScript, where 010 == "8" comes out true and everyone laughs at how broken JavaScript is. The thing is, the 010-means-8 part is true in C too, and C is where it comes from. Comparing a number to the string "8" is JavaScript's own doing.

So what's going on? Sometimes you want to write numbers in a different base. C lets you mark the base with a prefix. The common one is 0x for hexadecimal (base 16), so 0xFF is 255. You can also write numbers in octal (base 8), and the prefix chosen for octal, long ago, was just a leading 0.

So 010 isn't ten. It's "10" in base 8, which works out to 1 times 8 plus 0, or 8.

printf("%d\n", 010);  // prints 8, not 10

That same rule got carried into JavaScript, which is why the famous "weird JavaScript" screenshot is really inherited C behavior. Modern JavaScript makes these legacy octal literals a syntax error in strict mode and spells octal as 0o10 instead. If you've enjoyed seeing where one language's quirks leak into another, that's a running theme in my post on everything you need to know about ES2025.

The practical takeaway is small and useful: never pad a number with a leading zero unless you actually mean octal. Writing 010 to line things up silently gives you 8. Writing 08 or 09 won't compile at all, because 8 and 9 aren't valid octal digits.
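A few compile-time checks make the rule concrete. The same trap shows up at runtime if you parse text with strtol and let it guess the base by passing base 0. parse_auto and parse_decimal are hypothetical helpers.

```c
#include <stdlib.h>

/* The prefix on a literal picks the base. */
_Static_assert(010 == 8,    "leading 0 means octal");
_Static_assert(0x10 == 16,  "0x means hexadecimal");
_Static_assert(0755 == 493, "7*64 + 5*8 + 5");  /* a real use: Unix file permissions */

/* With base 0, strtol reads the prefix the same way the compiler does,
   so "010" becomes 8. With an explicit base 10, it becomes 10. */
long parse_auto(const char *s)    { return strtol(s, NULL, 0); }
long parse_decimal(const char *s) { return strtol(s, NULL, 10); }
```

If you're reading numbers typed by users or stored in files, passing base 10 explicitly keeps a "010" from quietly turning into 8.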

What to do with all this

You don't need to memorize an iceberg to write good C. But a handful of these will save you real pain early on, so keep them close:

A pointer holds an address; * reads the value there, & takes the address.
Arrays decay into pointers, which is why array[i] is just pointer math.
Never return a pointer to a local variable. It dies with the function.
Every malloc needs a matching free, or you leak memory.
Reorder struct fields by size to cut wasted padding.
A leading 0 means octal, so 010 is 8.

If you're learning C as a foundation for going lower or building bigger, the concepts here carry straight into systems work. I put them to use in my post on building a game engine from scratch with C and C++, where memory layout and pointers stop being trivia and start being the whole job.

Pick one fact from this list you didn't know, open a compiler, and prove it to yourself. That's how every one of these stuck for me.

Frequently asked questions

Are arrays the same as pointers in C?

No. An array name often decays into a pointer to its first element, which is why array[i] and *(array + i) do the same thing. But an array is not a pointer. The array owns its storage and knows its full size at compile time, while a pointer just holds an address. The decay is automatic in most contexts, which is what makes them feel identical.

Why does 010 equal 8 in C?

A leading zero is the literal prefix for octal, which is base 8. So 010 is read as 1 times 8 plus 0, which is 8. This is not a bug. C uses 0x for hexadecimal and a bare 0 for octal. The same rule was carried into JavaScript, which is why people post 010 == 8 as a JavaScript oddity when it actually starts in C.

Is a byte always 8 bits in C?

Not according to the C standard. The standard guarantees a char is one byte, but it only requires that byte to have at least 8 bits. On almost every machine you will use it is 8, but some digital signal processors and embedded chips use 16, 24, or 32 bits. The real value is in the CHAR_BIT macro from limits.h.

Why does my struct take more memory than the sum of its fields?

Because of padding. The compiler aligns fields to addresses that match their size so the processor can read them efficiently, and it adds invisible bytes to make that happen. A struct with a char, an int, and a char can take 12 bytes instead of 6. Reordering the fields so similar sizes sit together often shrinks it, in that case down to 8.

What is undefined behavior in C?

Undefined behavior is any operation whose result the C standard does not define, like reading memory after it is freed or using a pointer to a local variable after its function has returned. The compiler is allowed to do anything: crash, return garbage, or appear to work until it doesn't. It is the source of many of C's hardest bugs, so the goal is to never trigger it.

Abdulkader Safi Senior & Lead Software Engineer

Building scalable systems and developer-first tools. Lead Software Engineer at DSRPT.
