Bare-Metal ARM: Writing Your First OS Kernel for the Cortex-M

Ever looked at a high-level OS like Linux or Windows and wondered, “What’s really happening underneath? How does the code actually talk to the silicon?”

Writing your own operating system is the ultimate “deep dive” into computer science. It’s a challenging journey, but the rewards—in terms of pure understanding—are immense.

Today, we’re not going to build the next Linux. We’re going to do something more fundamental: write the bare-minimum “kernel” to take an ARM processor from a hard reset, through a proper C environment setup, and into our own main() function. This is the foundation upon which all other OS features (scheduling, memory management, drivers) are built.

Our target will be the ARM Cortex-M family. This family is ubiquitous in the microcontroller world (think STM32, Raspberry Pi Pico, nRF52). We’ll be focusing on the concepts from the ARMv7-M Architecture Reference Manual (ARM ARM), the bible for this kind of work. While our concepts apply broadly, our final “blinky” example will use register addresses for a popular STM32F103 (“Blue Pill”) board, which features a Cortex-M3 core.

1. The Tools You’ll Need

Before we write a line of code, we need a “cross-compiler.” We’re writing code on our PC (likely x86) to run on a different architecture (ARM).

GNU Arm Embedded Toolchain: This is the core of our setup. It provides arm-none-eabi-gcc (the compiler), arm-none-eabi-ld (the linker), and arm-none-eabi-objcopy (for converting file formats). The none-eabi part is crucial: it means the compiler won’t assume any underlying OS (the “none”) and will use the “Embedded Application Binary Interface” (EABI).
A Development Board: An STM32 “Blue Pill” (Cortex-M3) or a “Nucleo” board (Cortex-M4/M7) is a perfect, inexpensive starting point.
A Debugger/Flasher: An ST-Link V2 (or built-in on Nucleo boards).
Build Tools: make for automation and openocd for flashing the code to the chip.

2. The Boot Sequence: From Reset to `main()`

When you apply power to a Cortex-M chip, it doesn’t magically look for a main() function. The hardware is hard-wired to do something very specific, defined by the ARMv7-M architecture:

Read Initial Stack Pointer: The CPU looks at memory address 0x00000000 and loads whatever 4-byte value is there into the Main Stack Pointer (MSP) register.
Read Reset Vector: The CPU looks at memory address 0x00000004, reads the 4-byte value, and sets the Program Counter (PC) register to that value. This value is the address of our reset code.
Execute: The CPU begins executing instructions at the address it just loaded into the PC.

This pair of addresses—and the list of other “exception” handlers that follow—is called the Vector Table. This is the single most important data structure in a bare-metal ARM system. Our first job as an OS developer is to create this table.
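
To make the first two steps concrete, here is a small host-side sketch (it runs on your PC, not on the target) that decodes the first eight bytes of a firmware image the way the core does at reset. The `reset_state_t` type and the `read_le32` and `decode_reset` helpers are hypothetical names invented for illustration. The sketch assumes a little-endian image. It also reflects one detail not spelled out above: under the ARMv7-M rules, bit 0 of the reset vector marks Thumb state and is not part of the address the core fetches from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the values the core derives at reset. */
typedef struct {
    uint32_t initial_sp; /* word at offset 0x0, loaded into MSP */
    uint32_t reset_pc;   /* word at offset 0x4 with bit 0 cleared */
    int      thumb_ok;   /* non-zero when bit 0 of the reset vector is set */
} reset_state_t;

/* Read one little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Mimic the hardware reset sequence on the first 8 bytes of an image. */
reset_state_t decode_reset(const uint8_t *image, size_t len)
{
    assert(image != NULL && len >= 8);

    reset_state_t s;
    uint32_t vector = read_le32(image + 4);

    s.initial_sp = read_le32(image);
    s.thumb_ok   = (int)(vector & 1u);
    s.reset_pc   = vector & ~UINT32_C(1);
    return s;
}
```

Feeding it the bytes `00 50 00 20 09 00 00 08` gives an initial stack pointer of 0x20005000 and a first instruction address of 0x08000008.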

3. The Vector Table (Our OS’s Front Door)

We’ll create a new C file, let’s call it startup.c, to define our vector table. On most microcontrollers, the flash memory (where our code lives) is “aliased” to address 0x00000000 on boot, so we just need to make sure this table is the very first thing in our final program.

Here’s a minimal vector table.

// In startup.c

#include <stdint.h>

/*
 * The initial stack pointer is the address just past the end of RAM.
 * We'll define a symbol, _estack, in our linker script.
 * 'extern' tells the compiler this symbol exists, but is defined elsewhere.
 */
extern uint32_t _estack;

/*
 * The Reset_Handler is our entry point. It's also defined elsewhere
 * (in this same file, just further down).
 */
extern void Reset_Handler(void);

/*
 * This is our minimal vector table.
 * It's an array of 'void*' (generic pointers).
 * We use __attribute__((section(".isr_vector"))) to tell the compiler
 * to put this specific array in a section named ".isr_vector".
 * We will then tell the *linker* to place this section at the
 * very beginning of our program.
 */
__attribute__((section(".isr_vector")))
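void *const vector_table[] = {
    &_estack,               // 0. Initial Stack Pointer (MSP)
    (void *)Reset_Handler,  // 1. Reset_Handler
    // ... Other handlers (NMI, HardFault, SysTick, etc.) would follow here.
    // Until they are added, those slots hold whatever the linker places
    // next in FLASH, so a fault or interrupt would jump to a garbage address.
};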
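
For reference, the architecturally defined entries occupy the first 16 slots of the table, with device-specific interrupts starting at slot 16. The sketch below names those slots. The enumerator names and the `vector_offset` helper are made up for this article; only the positions come from the ARMv7-M architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Slot numbers of the system exceptions in an ARMv7-M vector table.
 * Names are illustrative; slots 7-10 and 13 are reserved. */
enum vector_index {
    VEC_INITIAL_SP = 0,
    VEC_RESET      = 1,
    VEC_NMI        = 2,
    VEC_HARDFAULT  = 3,
    VEC_MEMMANAGE  = 4,
    VEC_BUSFAULT   = 5,
    VEC_USAGEFAULT = 6,
    VEC_SVCALL     = 11,
    VEC_DEBUGMON   = 12,
    VEC_PENDSV     = 14,
    VEC_SYSTICK    = 15,
    VEC_FIRST_IRQ  = 16  /* first device-specific interrupt */
};

static_assert(VEC_SYSTICK == 15, "SysTick is the last system exception");

/* Byte offset of a slot from the start of the table (4 bytes per slot). */
uint32_t vector_offset(enum vector_index idx)
{
    return (uint32_t)idx * 4u;
}
```

So a SysTick handler, once added, would sit at byte offset 0x3C, and the table must be padded through the reserved slots to get it there.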

4. The Linker Script (Our Memory Map)

We’ve created a vector table, but how do we force it to be at address 0x00000000? We can’t trust the compiler to guess. We must instruct the linker.

We do this with a Linker Script (e.g., linker.ld). This file is the “blueprint” for our final executable. It tells the linker:

Where FLASH and RAM are physically located.
What to call the “end of RAM” (for our _estack symbol).
To place our .isr_vector section first at the start of FLASH.
Where to put all other code (.text).
Where to put initialized global variables (.data) and uninitialized ones (.bss).

This is a minimal linker script for a typical STM32F103 (64KB FLASH, 20KB RAM).

/* In linker.ld */

/* Define our memory regions */
MEMORY
{
  FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 64K
  RAM (rwx)  : ORIGIN = 0x20000000, LENGTH = 20K
}

/* Define a symbol for the end of RAM, which is the top of our stack */
_estack = ORIGIN(RAM) + LENGTH(RAM);

/* Define the program's entry point */
ENTRY(Reset_Handler)

/* Define the sections of our program */
SECTIONS
{
  /* The .isr_vector section goes FIRST at the origin of FLASH */
  .isr_vector :
  {
    KEEP(*(.isr_vector)) /* 'KEEP' prevents the linker from discarding it */
  } > FLASH

  /* Then, all other code (.text) and read-only data (.rodata) */
  .text :
  {
    *(.text*)
    *(.rodata*)
    . = ALIGN(4);
  } > FLASH

  /*
   * This is for initialized global variables (e.g., int x = 10;).
   * Their initial values are stored in FLASH (read-only), and the
   * startup code copies them to RAM.
   * _la_data is the "load address" in FLASH.
   * ALIGN(4) keeps both ends word-aligned for the word-by-word copy.
   */
  .data :
  {
    . = ALIGN(4);
    _sdata = .; /* Start of .data in RAM */
    *(.data*)
    . = ALIGN(4);
    _edata = .; /* End of .data in RAM */
  } > RAM AT> FLASH
  _la_data = LOADADDR(.data); /* Get the FLASH address */

  /*
   * This is for uninitialized global variables (e.g., int y;).
   * We just need to reserve space for them in RAM and clear it to zero.
   * ALIGN(4) keeps both ends word-aligned for the word-by-word clear.
   */
  .bss :
  {
    . = ALIGN(4);
    _sbss = .; /* Start of .bss */
    *(.bss*)
    *(COMMON)
    . = ALIGN(4);
    _ebss = .; /* End of .bss */
  } > RAM
}

Note: Why 0x08000000 for FLASH? The Cortex-M core fetches its vectors from 0x00000000 at reset, but the vendor (ST) physically places FLASH at 0x08000000. The chip’s BOOT pins select which memory is aliased to 0x00000000 on reset; in the normal configuration (BOOT0 low) that is main FLASH. We link against the physical address.
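
The arithmetic behind `_estack` and the two memory regions is easy to check on a PC. The following is a host-side sketch; `region_t`, `region_end`, and `region_contains` are illustrative names of our own, and the origins and lengths are the ones from the MEMORY block in the linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of one MEMORY entry from the linker script. */
typedef struct {
    uint32_t origin;
    uint32_t length;
} region_t;

const region_t FLASH_REGION = { 0x08000000u, 64u * 1024u };
const region_t RAM_REGION   = { 0x20000000u, 20u * 1024u };

/* Equivalent of ORIGIN(r) + LENGTH(r): the first address past the region. */
uint32_t region_end(region_t r)
{
    assert(r.length <= UINT32_MAX - r.origin); /* no wrap-around */
    return r.origin + r.length;
}

/* True when addr lies inside the region (the end address is excluded). */
int region_contains(region_t r, uint32_t addr)
{
    return addr >= r.origin && (addr - r.origin) < r.length;
}
```

With these values `region_end(RAM_REGION)` is 0x20005000, which is the value `_estack` takes. That address is one past the last byte of RAM, which works because the Cortex-M stack pointer is decremented before each push.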

5. The Startup Code (The `Reset_Handler`)

Now we can write our Reset_Handler function. This is the true “kernel” initialization. Its job is to create a C-compatible environment and then call main().

What does a C environment need?

.data initialized: Global variables with values (e.g., int x = 10;) must be copied from their storage in FLASH to their runtime location in RAM. Our linker script gives us the addresses: _la_data (FLASH source), _sdata (RAM destination), and _edata (RAM end).
.bss zeroed: Global variables without values (e.g., int y;) must be zeroed out. The linker gives us _sbss (RAM start) and _ebss (RAM end).

Once this is done, we can safely call main().

// In startup.c (continued)

/* Define the symbols from the linker script */
extern uint32_t _sdata, _edata, _la_data;
extern uint32_t _sbss, _ebss;

/* Our main application */
extern int main(void);

/* This is our Reset_Handler, the entry point of the program */
void Reset_Handler(void)
{
    uint32_t *data_flash = &_la_data;
    uint32_t *data_ram = &_sdata;

    /* 1. Copy .data section from FLASH to RAM */
    while (data_ram < &_edata)
    {
        *data_ram++ = *data_flash++;
    }

    uint32_t *bss_ram = &_sbss;

    /* 2. Zero-fill the .bss section in RAM */
    while (bss_ram < &_ebss)
    {
        *bss_ram++ = 0;
    }

    /* 3. Call main() */
    main();

    /*
     * If main() ever returns (it shouldn't),
     * just loop forever.
     */
    while (1);
}
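
If you want to convince yourself that the two loops do what we claim before flashing anything, the same logic can be exercised on a PC. This sketch assumes ordinary arrays standing in for FLASH and RAM; `copy_data` and `zero_bss` are hypothetical helper names, not part of startup.c.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the initial values word by word, like step 1 of Reset_Handler.
 * 'load' plays the role of _la_data, 'start' of _sdata, 'end' of _edata. */
void copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = *load++;
    }
}

/* Clear a range word by word, like step 2 of Reset_Handler.
 * 'start' plays the role of _sbss and 'end' of _ebss. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = 0;
    }
}
```

Calling `copy_data` on a three-word source and `zero_bss` on the two words after it leaves a five-word "RAM" array holding the three initial values followed by two zeros, regardless of what it contained before.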

6. The Application: `main.c` (Blinky!)

We’ve done it! We’ve handled the boot process and prepared the C environment. Now we can finally write a normal main.c file.

To prove it works, we’ll blink the onboard LED (PC13) on an STM32F103. In bare-metal, there are no “driver” functions. We talk directly to the hardware by writing values to specific memory addresses. This is called Memory-Mapped I/O.

The STM32F103 reference manual (RM0008) tells us:

To use GPIOC, we must first enable its clock in the RCC_APB2ENR register.
To set pin 13 as an output, we must configure the GPIOC_CRH register.
To turn the pin on/off, we write to the GPIOC_ODR register.
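
Memory-mapped I/O boils down to reading and writing through a `volatile` pointer. The sketch below, runnable on a PC, uses an ordinary variable as a stand-in for a hardware register. `reg_address`, `reg_set_bits`, and `reg_clear_bits` are illustrative names, not part of any vendor library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A register's address is its peripheral base plus a fixed offset. */
uint32_t reg_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Read-modify-write: set the bits in 'mask', leave the others alone. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    assert(reg != NULL);
    *reg |= mask;
}

/* Read-modify-write: clear the bits in 'mask', leave the others alone. */
void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    assert(reg != NULL);
    *reg &= ~mask;
}
```

On the target, the pointer would be a fixed address cast to `volatile uint32_t *`. The `volatile` qualifier stops the compiler from caching or removing accesses that look redundant but actually change hardware state.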

// In main.c
#include <stdint.h>

/* Define the hardware register addresses */
#define RCC_BASE      0x40021000
#define GPIOC_BASE    0x40011000

#define RCC_APB2ENR   (*((volatile uint32_t*)(RCC_BASE + 0x18)))
#define GPIOC_CRH     (*((volatile uint32_t*)(GPIOC_BASE + 0x04)))
#define GPIOC_ODR     (*((volatile uint32_t*)(GPIOC_BASE + 0x0C)))

/* A simple blocking delay function */
void delay(volatile uint32_t count)
{
    while (count--);
}

int main(void)
{
    /* 1. Enable the GPIOC peripheral clock */
    // RCC_APB2ENR register, set bit 4 (IOPCEN)
    RCC_APB2ENR |= (1 << 4);

    /* 2. Configure Pin PC13 as a push-pull output */
    // GPIOC_CRH register (controls pins 8-15)
    // We want to set PC13 (bits 20-23) to '0011' (Output, max 50MHz)
    GPIOC_CRH &= ~(0xF << 20); // Clear existing configuration
    GPIOC_CRH |=  (0x3 << 20); // Set as Output, 50MHz

    /* 3. The main application loop */
    while (1)
    {
        /* Set PC13 low (turns the LED on; it is active-low on the Blue Pill) */
        GPIOC_ODR &= ~(1 << 13);
        delay(300000);

        /* Set PC13 high (turns LED off) */
        GPIOC_ODR |= (1 << 13);
        delay(300000);
    }
    
    return 0; // Should never be reached
}
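
The clear-then-set sequence on GPIOC_CRH generalizes to any pin in the upper half of the port. Here is a host-side sketch of that calculation. `crh_with_pin_config` is a hypothetical helper, and the 0x44444444 starting value mentioned afterwards assumes the register's documented reset state (all pins floating inputs).

```c
#include <assert.h>
#include <stdint.h>

/* Return 'reg' with the 4-bit configuration field of 'pin' (8..15)
 * replaced by 'cfg'. Each pin owns one nibble; pin 8 starts at bit 0. */
uint32_t crh_with_pin_config(uint32_t reg, unsigned pin, uint32_t cfg)
{
    assert(pin >= 8u && pin <= 15u);
    assert(cfg <= 0xFu);

    unsigned shift = (pin - 8u) * 4u;

    reg &= ~(UINT32_C(0xF) << shift); /* clear the old configuration */
    reg |= cfg << shift;              /* write the new one */
    return reg;
}
```

Starting from 0x44444444, configuring pin 13 with 0x3 yields 0x44344444: only bits 20-23 change, which is why main() clears the field before setting it.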

7. Building and Running

To compile this, you’d use a simple Makefile:

# Simple Makefile

CC = arm-none-eabi-gcc
LD = arm-none-eabi-ld
OBJCOPY = arm-none-eabi-objcopy

# Flags for a Cortex-M3
CFLAGS = -mcpu=cortex-m3 -mthumb -Wall -g -std=c11
LDFLAGS = -T linker.ld -nostdlib

# Our source files
SRCS = startup.c main.c
OBJS = $(SRCS:.c=.o)

TARGET = kernel

all: $(TARGET).bin

$(TARGET).elf: $(OBJS)
	$(CC) $(LDFLAGS) $(CFLAGS) -o $@ $(OBJS)

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

$(TARGET).bin: $(TARGET).elf
	$(OBJCOPY) -O binary $< $@

clean:
	rm -f $(OBJS) $(TARGET).elf $(TARGET).bin

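Running make will produce kernel.bin. You can then use openocd (or STM32CubeProgrammer) to flash this binary to your board. A raw .bin file carries no address information, so tell the tool to write it at 0x08000000, the start of FLASH. If all went well, you’ll have a blinking LED, powered by a kernel you wrote from scratch.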

What’s Next?

This is the foundation. We have a single-threaded “OS” that can run one task. From here, the entire world of embedded OS development opens up:

SysTick Timer: Using the built-in Cortex-M SysTick timer for proper, non-blocking delays.
Interrupts: Handling a button press using an EXTI (External Interrupt).
Scheduling: Saving and restoring the CPU state (all the registers) to perform context switching—the heart of a true multi-threaded Real-Time Operating System (RTOS).

It’s a long road, but you’ve just taken the most important step. Happy hacking!
