Writing a toy DNS Resolver in Rust

June 23, 2023

I’ve been trying to learn Rust for a while now, sporadically reading the Rust book during my free time. A couple of weeks ago, I chanced upon Julia Evans’ Introducing “Implement DNS in a Weekend” and thought “oh hm this seems pretty cool! maybe I can do this in Rust as a fun weekend project”. Since the guide was written in Python, translating it was an opportunity to refresh my knowledge of DNS and practice Rust at the same time. That was about three weeks ago. The project took longer than expected (surprise!) but it was definitely a good distraction from just reading the book. The book had questions to follow along with and a mini project every few chapters, which were great, but they were still quite “guided”, so I appreciated the chance to hack on something and reify the concepts.

A protocol is just a set of predefined (and sometimes complicated) rules

That was tautological, but as I was working on parsing a nameserver’s DNS response and having to juggle bytes that represented u16 numbers, ASCII code points or just the byte itself as a u8 number, it occurred to me that implementing a protocol means hardcoding a lot of values whose meaning isn’t obvious (unless one is familiar with the protocol). For example, when decoding a domain name in the packet, the first byte of each label (a component of the domain name) gives that label’s length, while each of the next length bytes represents a character from a limited subset of ASCII. Or when parsing a DNS header, the bytes are assumed to represent certain values.

fn parse(buf: &[u8]) -> DnsHeader {
    DnsHeader {
        id: u16::from_be_bytes(buf[0..2].try_into().unwrap()),
        flags: u16::from_be_bytes(buf[2..4].try_into().unwrap()),
        num_questions: u16::from_be_bytes(buf[4..6].try_into().unwrap()),
        num_answers: u16::from_be_bytes(buf[6..8].try_into().unwrap()),
        num_authorities: u16::from_be_bytes(buf[8..10].try_into().unwrap()),
        num_additionals: u16::from_be_bytes(buf[10..12].try_into().unwrap()),
    }
}

To be expected or 🪄🔢?
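
The domain name decoding described above can be sketched in a similar way. This is a minimal sketch rather than the code from my repo: decode_name is a hypothetical helper, and it assumes the name is not compressed (it gives up on any length byte above 63, which is where compression pointers live).

```rust
/// Decodes an uncompressed DNS name that starts at `start`.
/// Returns the dotted name and the offset of the first byte after it.
/// Hypothetical helper: compression pointers are not handled.
fn decode_name(buf: &[u8], start: usize) -> Option<(String, usize)> {
    let mut labels: Vec<String> = Vec::new();
    let mut pos = start;
    loop {
        // The first byte of each label is its length.
        let len = *buf.get(pos)? as usize;
        pos += 1;
        if len == 0 {
            // A zero length byte terminates the name.
            break;
        }
        if len > 63 {
            // Not a plain label (the top two bits mark a compression
            // pointer), so this sketch gives up.
            return None;
        }
        // The next `len` bytes are the label's characters.
        let bytes = buf.get(pos..pos + len)?;
        labels.push(String::from_utf8(bytes.to_vec()).ok()?);
        pos += len;
    }
    Some((labels.join("."), pos))
}
```

Returning the offset alongside the name makes it easier to keep parsing the rest of the packet from the right place. Using get instead of indexing means a truncated packet yields None rather than a panic.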

Having small and frequent sanity checks when parsing raw data is incredibly helpful

Reading bytes in a response and assuming the position of certain values can make for really sneaky bugs if there’s an off-by-one (or more) error. It helped that the response to example.com’s DNS query (what I prototyped against) was small enough to print and work with, but even then, what helped more was forcing myself to develop a systematic approach to keeping track of bytes read and making sure that assumptions about the data being interpreted are validated often along the way (either through print or assertion statements).
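
One way to make that kind of bookkeeping systematic is a small cursor that records how many bytes have been consumed. The sketch below is an illustration under my own assumptions, not the structure used in the repo: Reader and read_header_fields are hypothetical names, and only u16 reads are covered.

```rust
/// A hypothetical cursor that remembers how many bytes have been consumed.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Reader { buf, pos: 0 }
    }

    /// Reads a big-endian u16, or reports where the read would have overrun.
    fn read_u16(&mut self) -> Result<u16, String> {
        let end = self.pos + 2;
        let bytes = self.buf.get(self.pos..end).ok_or_else(|| {
            format!("wanted bytes {}..{} but buffer has {}", self.pos, end, self.buf.len())
        })?;
        self.pos = end;
        Ok(u16::from_be_bytes([bytes[0], bytes[1]]))
    }
}

/// Reads the six u16 fields of a DNS header and checks the position afterwards.
fn read_header_fields(buf: &[u8]) -> Result<[u16; 6], String> {
    let mut reader = Reader::new(buf);
    let mut fields = [0u16; 6];
    for field in fields.iter_mut() {
        *field = reader.read_u16()?;
    }
    // Sanity check: a DNS header is always 12 bytes long.
    assert_eq!(reader.pos, 12, "header parsing consumed the wrong number of bytes");
    Ok(fields)
}
```

The error message includes the offsets involved, which is the same information I kept reaching for with print statements.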

Rust exposes the complexities of its string type upfront

One of the more difficult parts of the project for me was at the beginning, when I was attempting to translate the Python code for building the DNS query into Rust. I was still new to Rust’s slice type and unique handling of strings. This, combined with a superficial understanding of the ownership rules, led to my first-pass translation running into a flood of compiler errors. The compiler’s strictness probably helped reduce the number of bugs and debugging time overall, but it was daunting to get through. Rereading the chapters related to them (4.4 and 8.2 respectively) before continuing and lots of googling afterwards helped a lot.
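
To give a flavour of where strings, slices and bytes meet, here is a minimal sketch of encoding a domain name for a query. encode_name is a hypothetical helper written for this post, not necessarily what the repo does, and it assumes a name without a trailing dot.

```rust
/// Encodes a dotted domain name into DNS wire format.
/// Hypothetical helper: returns None if a label is empty or longer than 63 bytes.
fn encode_name(name: &str) -> Option<Vec<u8>> {
    // The caller keeps ownership of the name; we only borrow it as &str.
    let mut out: Vec<u8> = Vec::new();
    for label in name.split('.') {
        // as_bytes gives a borrowed &[u8] view of the label, no copy needed.
        let bytes = label.as_bytes();
        if bytes.is_empty() || bytes.len() > 63 {
            return None;
        }
        // Length byte first, then the label's bytes.
        out.push(bytes.len() as u8);
        out.extend_from_slice(bytes);
    }
    // A zero byte marks the end of the name.
    out.push(0);
    Some(out)
}
```

Taking &str and returning an owned Vec<u8> keeps the ownership story simple: the input is only read, and the caller owns the freshly built bytes.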

Rust’s ownership system requires memory management awareness

19: A language that doesn’t affect the way you think about programming, is not worth knowing. —Alan Perlis, Epigrams on Programming

I’m not sure if I’d put it as emphatically, but I think it’s really fun to learn a language that encourages one to think of programming and problem solving through a different lens. I’ve found that strongly typed languages like OCaml encourage one to be more aware of, and deliberate about, a program’s types and to write code in a certain way. Logic programming languages such as Prolog make one think of and solve problems in a different way than an imperative language such as C would. To a similar effect, I was curious if Rust’s ownership system would introduce a new perspective when approaching programming problems.

Based on my experience with this project, I think the ownership system made me more aware of whether data held by a variable is stored on the stack or the heap (on top of what type it has). More than once I found myself asking, “who owns this data on the heap?” and “where is its owner located?” This was especially interesting since I haven’t had to think about memory management since I worked on Pintos in my OS class a decade ago, and even so, the ownership rules are new to me and are starting to shape how I think about data in collections.
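
A small illustration of the “who owns this data?” question, using a collection of names. Both functions are hypothetical examples made up for this post; the point is only the difference between borrowing a collection and taking ownership of it.

```rust
/// Borrows the names: the caller keeps ownership of the Vec and its heap data.
fn longest_name(names: &[String]) -> Option<&String> {
    names.iter().max_by_key(|name| name.len())
}

/// Takes ownership of the Vec: the caller can no longer use it after the call.
fn into_lowercase(names: Vec<String>) -> Vec<String> {
    names.into_iter().map(|name| name.to_lowercase()).collect()
}
```

After calling into_lowercase(names), using names again is a compile error, which is exactly the kind of error I kept running into at the start.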

`match` statements!

The last time I got to use them was when I was writing OCaml in school, and it’s one of the things that I’ve missed when using JS (still waiting on this TC39 proposal 😔) so it’s great that Rust has quite expressive support for them.

if let Some(answer) = get_answer(&response) {
    match answer {
        DnsRecord {
            data: DnsRecordData::Ipv4Addr(ip),
            type_: TYPE_A,
            ..
        } => return Ok(*ip),
        DnsRecord {
            data: DnsRecordData::Name(name),
            type_: TYPE_CNAME,
            ..
        } => return resolve(name, TYPE_A),
        _ => {
            panic!("resolve: something went wrong")
        }
    }
}

Using match to destructure the data enum based on a DNS record's type value.

Other random Rust and DNS things I found

Working with bytes and strings in Rust and having to translate bytes into ASCII when Rust’s String type is always UTF-8 forced me to become more familiar with string encodings. For example, I learnt that ASCII forms the first 128 characters of UTF-8 (so I could just use String::from_utf8).
Related to the above, using the debug formatter ({:?}) was helpful since the usual formatter ({}) doesn’t print out escape codes.
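Rust has naming conventions for conversion methods (the as_, to_ and into_ prefixes) that hint at how cheap or expensive a type conversion is, for example whether it has to allocate and copy.
Network protocols such as DNS send multi-byte numbers in big-endian byte order.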
Domain names can have a maximum of 253 chars and each label a maximum of 63 chars.
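
A few of the points above, sketched as small functions. The function names are hypothetical and exist only for illustration; the standard library calls they wrap (String::from_utf8, as_bytes, to_vec, into_bytes, to_be_bytes) are real.

```rust
use std::string::FromUtf8Error;

/// ASCII is a subset of UTF-8, so ASCII label bytes convert directly.
fn label_to_string(bytes: &[u8]) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes.to_vec())
}

/// Returns the same string rendered with `{}` and with `{:?}`.
fn display_and_debug(s: &str) -> (String, String) {
    (format!("{}", s), format!("{:?}", s))
}

/// Shows the three conversion prefixes on one String.
fn conversions(s: String) -> (usize, Vec<u8>, Vec<u8>) {
    let view_len = s.as_bytes().len(); // as_: cheap borrowed view
    let copy = s.as_bytes().to_vec(); // to_: allocates a copy
    let owned = s.into_bytes(); // into_: consumes the String, reusing its buffer
    (view_len, copy, owned)
}

/// `to_be_bytes` produces network (big-endian) byte order.
fn u16_on_the_wire(value: u16) -> [u8; 2] {
    value.to_be_bytes()
}
```

For a string containing a tab, the {} rendering contains the raw tab character while the {:?} rendering shows it as \t inside quotes, which is what made stray bytes visible while debugging.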

What’s next

Thanks for making it this far! If you’re interested in checking out the code, it’s on GitHub. Caveat: I’m still unsure if the code and its organisation are idiomatic (they’re probably not), but I’m planning to make this a playground for me to explore and play with the concepts from the Rust book as I continue with it (so if you’re reading the code and wondering why we would need X here, it might be because I just read the chapter on X).

Shoutout to Julia Evans for sharing her resources on DNS and making it simple and fun to work on something like this! The debugging tips in Part 1 were especially helpful for getting started.

Hi! I’m Stacey. Welcome to my technical blog. Outside of computers, I also love brewing Japanese teas 🍵, reading fiction, and discovering random word origins.