Day 1 of making an ARM assembler: I'm learning assembly
A little background on why I'm doing this: I plan to complete the From the Transistor to the Web Browser course in 2025. I've been slacking on this, so if I don't pick up the pace now, it'll never get done and I'll never learn how computers actually work.
I come at this with the idea that I know practically nothing about computers besides what I'm learning in the course, so that I keep a clear mind and let the ideas work their magic instead of coming in with preconceived notions. I used to think that I had to work on finding all the solutions myself. But in fact I'm starting to change my idea of what it means to learn fast and effectively. I'm slowly building a mental model and a methodical approach, but I don't want to rush it. I'll let the ideas simmer in my subconscious for a little while before I put them on paper, and ultimately bring them to this blog and my study sessions!
With that introduction done, I still want to preface this article with some ideas on how I want to approach the rest of the course and learning in general. The basic idea is to start by exposing myself to as much information about the topic as possible and follow along with study material, tutorials, etc. After that I want to tweak the project just a bit to understand the core of it. Then I extract the learnings from that cycle into an article to test my knowledge (and find the gaps), which is what I'm doing right here. As a practical complement to this article I also started a mini-course called Assemblings; it's the equivalent of rustlings for ARM assembly. I just asked Claude.ai to make a course outline and ran with it. We'll see how that goes, I guess...
You can do this as well if you wish. It's similar to a concept called études in music.
An étude (/ˈeɪtjuːd/; French: [e.tyd]) or study is an instrumental musical composition, usually short, designed to provide practice material for perfecting a particular musical skill.
- Wikipedia
Jim Morrison used to follow a similar practice to improve his vocabulary[1] by writing stories around each new word he learned. This cycle of learning new information and practicing applying it in different contexts and scenarios is the blueprint for effective knowledge consolidation, as your brain becomes flexible with the ideas and understands how and when it makes sense to apply them.
Meeting Assembly
At its core, assembly is simply machine code in a human-readable format. We also have some directives, for example to place things in particular sections of memory or to make labels global so they're visible outside the file. But the most important thing to remember and focus on is the first sentence of this paragraph!
So what's machine code?
Well, here's an example from the program we're going to be covering today: 1110 0011 1010 0000 0111 0000 0000 0001. As you can see it's complete gibberish, and a totally impractical way to program. But this is the only thing your machine understands: patterns of electrical signals. So to cope with this we've made a programming language that sits one abstraction layer above it, which is called assembly.
Now you can also notice that our machine code example is 32 bits long. That's because I'm working with ARMv7 in mind, which is a 32-bit architecture, and in the ARM instruction set every instruction is exactly one 32-bit word[2] long.
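To double-check that the pattern above really is one 32-bit word, here is a small sketch in Python (chosen here purely for illustration, it isn't part of the tutorial). It strips the spaces, counts the bits and prints the same value in hexadecimal:

```python
# The machine code from above, written as groups of 4 bits.
bits = "1110 0011 1010 0000 0111 0000 0000 0001"

# Remove the spaces so only the 0s and 1s remain.
raw = bits.replace(" ", "")

# Each character is one bit, so the length is the instruction width.
width = len(raw)

# Interpret the string as a base-2 number and format it as 8 hex digits.
word = int(raw, 2)
as_hex = f"{word:08x}"

print(width)   # 32
print(as_hex)  # e3a07001
```

Each group of 4 bits maps to exactly one hex digit, which is why 32 bits fit in 8 hex digits.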
I followed a tutorial by Laurie Wired to learn my very first instructions in assembly. The main goal is to learn how to successfully exit a program. We do this by using the registers at our disposal; on ARMv7 there are 16 of them (R0 to R15). Registers are small, very fast storage locations inside the processor itself, and this is what we use to do our business.
For now that's basically all we have to remember about how our ARMv7 CPU works:
- CPU gets instructions (from main memory)
- Execute instructions on the registers
- [optionally write back to main memory]
simple
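The three steps above can be sketched as a toy loop in Python. This is only a mental model: the tuple-based instruction format and the `run` function are invented for illustration and have nothing to do with real ARM encodings.

```python
# A toy model of the fetch/execute cycle. The instruction format below
# (plain tuples) is made up for illustration; it is NOT real ARM encoding.

def run(program, memory):
    regs = [0] * 16          # 16 registers, like r0 to r15 on ARMv7
    pc = 0                   # index of the next instruction to fetch
    while pc < len(program):
        op, *args = program[pc]   # 1. fetch the instruction
        pc += 1
        if op == "mov":           # 2. execute it on the registers
            rd, value = args
            regs[rd] = value
        elif op == "add":
            rd, rn, rm = args
            regs[rd] = regs[rn] + regs[rm]
        elif op == "str":         # 3. optionally write back to memory
            rn, address = args
            memory[address] = regs[rn]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs

memory = {}
program = [
    ("mov", 0, 2),        # r0 = 2
    ("mov", 1, 3),        # r1 = 3
    ("add", 2, 0, 1),     # r2 = r0 + r1
    ("str", 2, 0x100),    # memory[0x100] = r2
]
regs = run(program, memory)
print(regs[2], memory[0x100])  # 5 5
```

One simplification worth flagging: on a real ARM CPU the program counter is itself one of the 16 registers (R15), whereas this sketch keeps it as a separate variable.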
My first exercise
Let's not waste any more time, here's the code:
.global _start

.section .text

_start:
    mov r7, #1
    swi 0
As you can see it already looks much better than the machine code I showed you earlier. So what does all of this mean?
Line 1: Declares the _start label as global, meaning that it's visible outside this file, most importantly to the linker. If you're already familiar with programming, you can think of this as scope, but at the level of the whole program instead of a single function or file!
Line 3: Declares a section, a very particular one at that. The .text section is where we'll put all the instructions of our program.
Line 5: This is the entry point of our program. I didn't mention this yet, but the _start label stands as a placeholder for the memory address of the first instruction, and that tells the linker where execution should begin. We'll see what this actually looks like later.
Line 6: Here goes our first instruction! We use mov, which is an instruction that copies a value into a register. In our case we're putting the immediate value #1 inside our R7 register. On Linux, R7 is the register that holds the syscall[3] number, and 1 is the number of the exit system call. (Something that happens in the background is that the actual exit code gets read from R0. You can set that exit code yourself; if you don't, R0 is typically still 0 when the program starts on Linux, which signals a successful exit, but it's safer not to rely on that.)
Line 7: The mov r7, #1 instruction runs right away, but it only selects which system call we want. Nothing is actually requested from the kernel until we trigger a software interrupt with swi. The actual implications of this instruction are above my current level of understanding, but I'll get back to it later in this journey.
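To make the split between the two instructions concrete, here is a toy Python model of the convention described above. The `swi` function and the register list are invented stand-ins for the kernel and the CPU, and the sketch assumes every register starts at 0.

```python
# Toy model of the exit convention: r7 holds the syscall number, r0 holds
# the exit code, and nothing is requested until the interrupt is triggered.
# The names below are made up for illustration.

SYSCALL_EXIT = 1  # exit is syscall number 1 on 32-bit ARM Linux

def swi(regs):
    """Pretend to be the kernel: inspect r7 and r0 when the interrupt fires."""
    number = regs[7]
    if number == SYSCALL_EXIT:
        return ("exit", regs[0])   # the exit code comes from r0
    raise NotImplementedError(f"syscall {number} is not modelled here")

regs = [0] * 16      # assumption: every register starts at 0

regs[7] = 1          # mov r7, #1 -> takes effect immediately, r7 is now 1
before = list(regs)  # snapshot: the mov has run, but no syscall was made yet

result = swi(regs)   # swi 0 -> only now is the exit request made
print(before[7], result)  # 1 ('exit', 0)
```

Setting `regs[0]` to another value before calling `swi(regs)` would model a program that picks its own exit code.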
So this is a high level overview of the source code. You'd now have to assemble and link it to turn it into an ELF executable!
Once you've done that we can start playing with the executable to extract more knowledge about this whole process and how assembly works. I'd first check that you get a successful exit code, which should be the value you manually put in R0, or a 0.
Let's review our disassembled executable now:
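One way to check the exit code from a script is sketched below. Since the ARM executable may not be runnable on every machine, the sketch uses a stand-in child process that simply exits with status 0. The `command` list is the assumption to change: point it at your own executable, keeping in mind that on a non-ARM machine you'd need an emulator in front of it.

```python
import subprocess
import sys

# Stand-in for the assembled program: a child process that exits with a
# chosen status. To test the real thing, replace `command` with the path
# to your own executable (the file name is up to you).
command = [sys.executable, "-c", "import sys; sys.exit(0)"]

completed = subprocess.run(command)

# returncode is the value the child handed to the exit system call.
print(completed.returncode)  # 0
```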
Disassembly of section .text:
00008000 <_start>:
8000: e3a07001 mov r7, #1
8004: ef000000 svc 0x00000000
Now perhaps the _start label makes a bit more sense to you: as you can see, it serves as a placeholder that points to the memory location of the first instruction, 0x8000 in our case. Under that we can see our two instructions, in both hexadecimal and assembly representation! Do you remember our machine code, 1110 0011 1010 0000 0111 0000 0000 0001? Perhaps it's a little clearer now: it is our first instruction (mov r7, #1)!
1110 0011 1010 0000 0111 0000 0000 0001 is the same as e3a07001 is the same as mov r7, #1
1110 1111 0000 0000 0000 0000 0000 0000 is the same as ef000000 is the same as svc 0x00000000
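To see where the 7 and the 1 of mov r7, #1 actually live inside e3a07001, here is a hedged Python sketch that pulls the word apart with shifts and masks. The field positions follow the ARM data-processing (immediate) encoding as I understand it from the architecture documentation; the function name is made up for this example.

```python
def decode_mov_immediate(word):
    """Split a 32-bit ARM data-processing (immediate) word into its fields."""
    return {
        "cond":   (word >> 28) & 0xF,   # bits 31-28: condition
        "imm":    (word >> 25) & 0x1,   # bit 25: 1 = operand is an immediate
        "opcode": (word >> 21) & 0xF,   # bits 24-21: operation (0b1101 = MOV)
        "rd":     (word >> 12) & 0xF,   # bits 15-12: destination register
        "rotate": (word >> 8) & 0xF,    # bits 11-8: rotation applied to imm8
        "imm8":   word & 0xFF,          # bits 7-0: the 8-bit immediate
    }

fields = decode_mov_immediate(0xE3A07001)
print(fields)
# {'cond': 14, 'imm': 1, 'opcode': 13, 'rd': 7, 'rotate': 0, 'imm8': 1}

# The svc encoding is simpler: condition, then 0b1111, then a 24-bit number.
svc = 0xEF000000
print((svc >> 24) & 0xF, svc & 0xFFFFFF)  # 15 0
```

So `rd` is 7 (our R7) and `imm8` is 1 (our #1), while the 24-bit number in the svc word is the 0 we wrote after swi.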
What's also worth noting is the memory address to the left of each instruction. It goes up by 4 (0x8000, then 0x8004) because we're working with the ARM instruction set, and every instruction in that ISA is 4 bytes (32 bits) long. ARMv7 also supports the Thumb instruction set, whose instructions are mostly 16 bits long (Thumb-2 mixes 16-bit and 32-bit encodings), so Thumb code takes up less space in memory.
If you were wondering why our swi instruction turned into svc, that's because swi is basically a legacy name. Starting from around ARMv6 they introduced svc because it better describes what it does: it's the Supervisor Call. However, swi is still accepted by many assemblers for backwards compatibility, and the two are identical: both assemble to the same machine code. These are all little things I'm excited to get into as I learn how computers really work!
I also have to mention that I've simplified the instruction structure quite a lot here, and we have to keep in mind that their capabilities are much more intricate. For example, I've seen in the documentation that conditional execution is controlled by a few bits of each instruction. I see these things as advanced for now, but I'll get into them in due time!
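As a small taste of those condition bits, here is a Python sketch that reads the top 4 bits of an ARM instruction and names the condition. The table follows the standard ARM condition codes; the function name is hypothetical, and the 0b1111 pattern is left unmodelled because it doesn't behave like an ordinary condition.

```python
# Condition names for the top 4 bits (31-28) of an ARM instruction.
CONDITIONS = [
    "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
    "hi", "ls", "ge", "lt", "gt", "le", "al",
]

def condition_of(word):
    """Return the condition mnemonic encoded in a 32-bit ARM instruction."""
    cond = (word >> 28) & 0xF
    if cond == 0xF:
        return "special"  # 0b1111 is not an ordinary condition; not modelled here
    return CONDITIONS[cond]

print(condition_of(0xE3A07001))  # al -> mov r7, #1 always executes
print(condition_of(0x03A07001))  # eq -> same mov, but only if the Z flag is set
```

This is why both of our instructions start with 1110 (hex e): that's the "always" condition.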
Endnotes
That pretty much concludes my first experience with ARM assembly! As evidenced by the excessive use of exclamation marks throughout this article, I'm quite happy to be learning this stuff and I'll try to make time for it each day. And who knows, if I start working on it more regularly, I'll have the capacity to digest my learnings faster and write articles that are a little less verbose and all over the place than this one!
As I always say, if you know computers better than I do, and spotted a few (or many) mistakes in this article, feel free to reach out on X @pindjouf.
References:
[1] Interview with Jim Morrison's father and sister, YouTube, Aug 9, 2010. [Online video]. Available: https://youtu.be/Kz63-q8otYM?si=VZ8Q1r7DmZ8yJLZQ&t=16
[2] Word (computer architecture), Dec 24, 2024. [Wikipedia article]. Available: https://en.wikipedia.org/wiki/Word_(computer_architecture)
[3] List of ARM system calls, 2024. [Online index]. Available: https://arm.syscall.sh/