RC02: Notes from attempting to implement TCP on macOS

October 01, 2023

I'm attending the Fall 2 batch at Recurse Center! Posts in this series cover things I'm working on or find interesting during my time here.

Last week, I wrote about what I did during my first week at Recurse and set some goals for this week. One of those goals was to find out more about Unix raw sockets since the Rust libpnet crate’s transport_channel API uses them (here) to send and receive Layer 4 packets. This week’s post recounts how I ended up debugging macOS’ raw sockets as I attempted to filter out TCP packets in my Rust program.

The case of the missing packets

To send and receive TCP packets, I modified libpnet’s transport layer echo example and set the protocol to TCP. My program (more like a hacked script at this stage) sends a TCP SYN packet to a remote server and listens for its SYN ACK response. I could see with tcpdump that both packets were sent and received, but for some reason my Rust program never received the SYN ACK packet. Since Unix sockets and low level network programming are both quite new to me, I was quite confused.
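For reference, here is roughly what goes into a SYN packet at the byte level. This is a minimal, std-only sketch and not the libpnet code my program used: the function names (internet_checksum, tcp_checksum, build_syn), the addresses, the ports and the window size are all made up for illustration, and it builds only the 20-byte TCP header with no options.

```rust
use std::net::Ipv4Addr;

/// One's-complement sum used by the IPv4, TCP and UDP checksums (RFC 1071).
fn internet_checksum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    for chunk in data.chunks(2) {
        // A trailing odd byte is padded with a zero on the right.
        let word = u16::from_be_bytes([chunk[0], *chunk.get(1).unwrap_or(&0)]);
        sum += u32::from(word);
    }
    // Fold the carries back into the low 16 bits.
    while sum >> 16 != 0 {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    !(sum as u16)
}

/// The TCP checksum covers a pseudo-header (addresses, protocol, length)
/// followed by the segment itself.
fn tcp_checksum(src: Ipv4Addr, dst: Ipv4Addr, segment: &[u8]) -> u16 {
    let mut buf = Vec::with_capacity(12 + segment.len());
    buf.extend_from_slice(&src.octets());
    buf.extend_from_slice(&dst.octets());
    buf.push(0); // reserved byte
    buf.push(6); // IP protocol number for TCP
    buf.extend_from_slice(&(segment.len() as u16).to_be_bytes());
    buf.extend_from_slice(segment);
    internet_checksum(&buf)
}

/// Builds a 20-byte TCP header with only the SYN flag set and no options.
fn build_syn(src: Ipv4Addr, dst: Ipv4Addr, src_port: u16, dst_port: u16, seq: u32) -> [u8; 20] {
    let mut seg = [0u8; 20];
    seg[0..2].copy_from_slice(&src_port.to_be_bytes());
    seg[2..4].copy_from_slice(&dst_port.to_be_bytes());
    seg[4..8].copy_from_slice(&seq.to_be_bytes());
    // Bytes 8..12 (acknowledgement number) stay zero for an initial SYN.
    seg[12] = 5 << 4; // data offset: five 32-bit words, so no options
    seg[13] = 0x02; // flags: SYN only
    seg[14..16].copy_from_slice(&65535u16.to_be_bytes()); // window size (arbitrary)
    // The checksum is computed with its own field zeroed, then written in.
    let csum = tcp_checksum(src, dst, &seg);
    seg[16..18].copy_from_slice(&csum.to_be_bytes());
    seg
}

fn main() {
    // Documentation-only addresses (RFC 5737), not a real server.
    let src = Ipv4Addr::new(192, 0, 2, 1);
    let dst = Ipv4Addr::new(198, 51, 100, 7);
    let syn = build_syn(src, dst, 54321, 80, 1);
    println!("{:02x?}", syn);
}
```

A handy property for sanity checking: running tcp_checksum over a segment that already contains its checksum gives 0.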

Is there a bug in my program? Did I misuse libpnet’s APIs? Is the program not receiving packets because I didn’t have a port attached to the receiver stream? Do I need a port? Is this because I didn’t call bind? How does bind factor into all of these anyway? This led me down a rabbit hole of reading the man page for the socket system call and related resources.

Meanwhile, I also set about looking for examples on GitHub to see how this API was being used in the wild. I found at least two other projects using the same transport_channel API to receive TCP packets. Going through the examples, I couldn’t quite spot any significant difference in how I tried to get the packets. Although I didn’t run those programs, their READMEs seem to suggest that it did work for them. Since they seemed to be side projects and both were last updated more than two years ago, I did some additional googling. The search led me to a blog post (from another Recurser!) directly using raw sockets to receive TCP packets and Stack Overflow answers suggesting that raw sockets do get TCP and UDP packets, as long as the protocol arg is set to the desired protocol.

I was out of ideas. Then I found an open tab on Creating a UDP connection with netcat and thought, maybe I could try receiving UDP packets in my program instead? UDP is a different and simpler protocol, so going through the code as I made the changes might reveal a bug, or at least some answers. The guide also had a tcpdump filter for both UDP and ICMP packets, since a client receives an ICMP Port Unreachable packet when a UDP port is unreachable. Unfortunately, capturing UDP packets proved unsuccessful too. There were packets in tcpdump, but none in the program.

At this point I was ready to leave my desk for a break, but then I saw that tcpdump also recorded some ICMP packets, and in a last-ditch attempt, I thought perhaps I could try receiving ICMP packets instead? Lo and behold, when I modified the program to listen for ICMP packets, it did capture the ICMP Port Unreachable packet from (I assume) the kernel’s network stack. Finally some progress! This made me curious as to what makes ICMP so special anyway. Aren’t all three protocols transport layer protocols?1

In any case, this led me to search Google again. Combing through the results, I saw this interesting link from the Apple Developer forum: Raw Socket recvfrom not working for TCP. Clicking on it, I found this answer:

This is not going to work on any BSD Sockets implementation. You can’t use raw sockets to read TCP or UDP. To quote my trusty (and dusty, literally!) copy of Stevens [1]:

The following rules apply;

  1. Received UDP packets and received TCP packets are never [their emphasis] passed to a raw socket.

This is UNIX Network Programming, Volume 1, Second Edition, section 25.4 Raw Socket Input, page 659. —Raw Socket recvfrom not working for TCP.

I was in disbelief at first, but checking my own copy of UNIX Network Programming, 3rd Edition, there it was on page 872, the same sentence (no pun intended). But, there was hope:

If a process wants to read IP datagrams containing UDP or TCP packets, the packets must be read at the datalink layer, as described in Chapter 29.

Debugging, when a bug is elusive, can oftentimes be a frustrating and (sometimes) cathartic event. Sometimes the bug is staring me right in the face and I’m convinced that my 4 line LeetCode answer is absolutely correct and somehow, the automatic checker is wrong. At least in such moments I can identify the offending test case and print each input and variable one by one if I need to. Debugging a program that uses a new library, in an unfamiliar language, and interacts with the OS network stack is different though: it’s much easier to end up deep in a depth-first search when an A* search is what’s called for.

The sane approach to debugging is to start from a known base of knowledge (for example a known working program) and change one thing at a time to establish some working facts. But sometimes, in the haze of confusion, it’s tempting to change multiple things at once in hopes of saving time™. And although this haphazard approach does work from time to time (if only rarely), more often than not it introduces more uncertainty and more moving variables, making it hard to keep track of things and ultimately wasting more time and energy. 2

Suppose, you wrote a sockets-based program in C. You know it is going to run on a Pentium®, so you enter all your constants in reverse and force them to the network byte order. It works well.

Then, some day, your trusted old Pentium® becomes a rusty old Pentium®. You replace it with a system whose host order is the same as the network order. You need to recompile all your software. All of your software continues to perform well, except the one program you wrote.

You have since forgotten that you had forced all of your constants to the opposite of the host order. You spend some quality time tearing out your hair, calling the names of all gods you ever heard of (and some you made up), hitting your monitor with a nerf bat, and performing all the other traditional ceremonies of trying to figure out why something that has worked so well is suddenly not working at all.

Eventually, you figure it out, say a couple of swear words, and start rewriting your code.

What debugging can feel like, even if the exact bug and details differ. Excerpt from FreeBSD Developers' Handbook: Chapter 7. Sockets.
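The byte-order trap in that excerpt is easy to sidestep in Rust, because the standard library's integer conversions name the byte order explicitly. A small sketch (port_to_wire and port_from_wire are hypothetical helper names of mine, not from any library):

```rust
/// Encodes a port number in network (big-endian) byte order.
/// `to_be_bytes` gives the same result on little- and big-endian hosts.
fn port_to_wire(port: u16) -> [u8; 2] {
    port.to_be_bytes()
}

/// Decodes a port number from network byte order, again host-independent.
fn port_from_wire(bytes: [u8; 2]) -> u16 {
    u16::from_be_bytes(bytes)
}

fn main() {
    let port: u16 = 8080; // 0x1f90
    let wire = port_to_wire(port);
    println!("on the wire: {:02x?}", wire);
    println!("round trip:  {}", port_from_wire(wire));

    // Host order, by contrast, depends on the machine the program runs on.
    println!("host order:  {:02x?}", port.to_ne_bytes());
}
```

Since nothing here hard-codes a pre-swapped constant, recompiling for a machine with a different host order should not change what ends up on the wire.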

I went back to look through the libpnet examples and found an example for an ethernet echo server that connects to the datalink layer. This is one layer lower than I expected to go, but it was worth a try since the raw sockets approach didn’t work out and the textbook above suggests that this is the only way to get TCP packets on BSD systems. And this did work, albeit not for localhost connections, since those don’t seem to go through an ethernet datalink. I could receive ethernet frames that contained TCP packets, including the anticipated SYN ACK packet from above. I just needed to parse past the ethernet headers and filter for the TCP packets, which wasn’t too difficult with the packet manipulation functions from libpnet.
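To show what that filtering involves, here is a std-only sketch that does by hand what I used libpnet's packet types for. It is not libpnet code: TcpSummary, parse_tcp_in_ethernet and sample_syn_ack_frame are names I made up, and it assumes plain Ethernet II frames carrying IPv4 (no VLAN tags, no IPv6, no fragment handling).

```rust
/// Fields of interest from a TCP segment found inside an Ethernet frame.
#[derive(Debug, PartialEq)]
struct TcpSummary {
    src_port: u16,
    dst_port: u16,
    syn: bool,
    ack: bool,
    rst: bool,
}

const ETHERTYPE_IPV4: u16 = 0x0800;
const IPPROTO_TCP: u8 = 6;

/// Returns a summary when `frame` is Ethernet II carrying IPv4 carrying TCP,
/// and None for anything else, including truncated input.
fn parse_tcp_in_ethernet(frame: &[u8]) -> Option<TcpSummary> {
    // Ethernet II header: 6 bytes dst MAC, 6 bytes src MAC, 2 bytes EtherType.
    let ethertype = u16::from_be_bytes([*frame.get(12)?, *frame.get(13)?]);
    if ethertype != ETHERTYPE_IPV4 {
        return None;
    }
    let ip = frame.get(14..)?;
    // IPv4 byte 0: version in the high nibble, header length (in 32-bit
    // words) in the low nibble. Byte 9 is the protocol number.
    let first = *ip.first()?;
    let header_len = usize::from(first & 0x0f) * 4;
    if first >> 4 != 4 || header_len < 20 || *ip.get(9)? != IPPROTO_TCP {
        return None;
    }
    let tcp = ip.get(header_len..)?;
    if tcp.len() < 20 {
        return None;
    }
    let flags = tcp[13];
    Some(TcpSummary {
        src_port: u16::from_be_bytes([tcp[0], tcp[1]]),
        dst_port: u16::from_be_bytes([tcp[2], tcp[3]]),
        syn: flags & 0x02 != 0,
        ack: flags & 0x10 != 0,
        rst: flags & 0x04 != 0,
    })
}

/// Hand-built frame for the example: zeroed MACs and IP addresses, with only
/// the fields the parser reads filled in.
fn sample_syn_ack_frame() -> Vec<u8> {
    let mut f = vec![0u8; 14 + 20 + 20];
    f[12..14].copy_from_slice(&ETHERTYPE_IPV4.to_be_bytes());
    f[14] = 0x45; // IPv4 with a 20-byte header
    f[14 + 9] = IPPROTO_TCP;
    let tcp = 14 + 20;
    f[tcp..tcp + 2].copy_from_slice(&80u16.to_be_bytes()); // source port
    f[tcp + 2..tcp + 4].copy_from_slice(&54321u16.to_be_bytes()); // destination port
    f[tcp + 12] = 5 << 4; // data offset: no options
    f[tcp + 13] = 0x12; // SYN (0x02) | ACK (0x10)
    f
}

fn main() {
    match parse_tcp_in_ethernet(&sample_syn_ack_frame()) {
        Some(s) if s.syn && s.ack => println!("got a SYN ACK: {:?}", s),
        Some(s) => println!("some other TCP segment: {:?}", s),
        None => println!("not a TCP-over-IPv4 frame"),
    }
}
```

Returning None for everything that is not TCP over IPv4 keeps the receive loop simple: it can skip ARP, IPv6 and other traffic without special cases.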

A mystery (and problem) solved, but what’s next?

Sending out a TCP SYN packet and receiving a response is just the beginning. Before I can write out the code for my TCP implementation, there are still a number of open questions:

  • It turns out that when the OS network stack receives a SYN ACK packet to a destination connection that it did not open and so doesn’t recognise (i.e. the connection from my TCP implementation), it sends out a RST packet to the server, effectively closing the connection. So, I’d need to figure out how to prevent or circumvent this.
    • One approach I’ve found is to disable all outgoing RST packets using iptables, but what are the downsides to this approach?
    • Would using a TUN/TAP device avoid this altogether? How difficult would it be to set that up on macOS?
  • How should I structure my TCP implementation and interface? I was initially planning to model it after Rust’s TcpListener and TcpStream, but maybe I should take a simpler approach? Perhaps I could just implement a CLI that’s similar to netcat with the custom TCP implementation?
  • Is it possible to use the datalink connection described above to listen to TCP packets from localhost? Considering that (I think) there’s no datalink for localhost? This is less important than the first two since I can test with a remote server instead of using localhost but I was curious since I was initially using localhost to test my implementation.
    • One approach I’m considering is to use the Rust pcap crate directly in my program to capture localhost packets, instead of going through libpnet, which uses pcap to access datalink packets, but always assumes an ethernet device.

Other things that happened this week

I didn’t just spend time on TCP and network programming (even though it sometimes felt like it). Some other highlights:

  • Attended the Moldable Development group’s overview of the Glamorous Toolkit (v cool stuff).
  • Had more coffee chats with Recursers.
  • Started on the cryptopals crypto challenges and joined the Cryptopals Group’s first meeting. This has been an unexpected source of great fun and is probably why there wasn’t further significant progress for my TCP project during the second half of the week.

Plans for next week

  • Talk to some other folks working on their own TCP implementation and figure out some of the open questions above.
  • Attend RC’s weekly presentation event. (I missed this week’s presentations because I forgot to RSVP for the calendar event and didn’t see the message for it until it was over.)
  • I’ve been using Rust for the TCP implementation and cryptopals challenges, but I’m still facing some Rust lifetime errors that I don’t quite fully understand. So far I’ve been able to work around them, but I’d like to get a better working understanding of the errors and how to reason about references and ownership, instead of just relying on SO answers.
  • Spend more time learning cryptography basics and finishing the first set of the cryptopals challenges.

  1. While I’m now aware that ICMP is actually a network layer protocol, at that time I assumed they were all operating on the transport layer, and libpnet’s use of transport in pnet::transport::icmp_packet_iter didn’t help.
  2. On a side note, I feel like debugging can sometimes be similar to a variable reward environment where in moments of irrationality, System 1 hijacks the rational mind to take the seemingly attractive quick fix, even though experience should prove otherwise, but that’s a topic to be explored in another blog post.

Stacey Tay

Hi! I’m Stacey. Welcome to my blog. I’m a software engineer with an interest in programming languages and web performance. I also like making 🍵, reading fiction, and discovering random word origins.