I bet most (if not all) engineers in the telecom industry have used packet capture tools such as tcpdump or Wireshark. I think it started when the telecom industry began adopting IP technology, in the era of SIGTRAN. Today, tcpdump and Wireshark are common tools in telecom, especially on the core network side. But have people ever asked themselves “How were those tools created in the first place?”, “Who created them?”, or “What motivated them to create such tools, and why were they open-sourced?”
We can find the answers in the keynote video and presentation from SharkFest 2011, where Steve McCanne tells the story of how it all began. IMHO, almost 80% of the video (and of the presentation document available on the website) is technical; if you’re technical enough and have plenty of time, you will enjoy them. Here, though, I will write a summary of Steve’s presentation.
The story begins when Steve took a compilers course in computer science at U.C. Berkeley back in spring 1988. It was taught by a guest lecturer named Van Jacobson. During the course, Steve learned standard compiler topics such as scanning, parsing, code generation, and optimization. He later used these topics when creating tcpdump (yes, tcpdump came first, followed by libpcap!). At the end of the term, Steve took a summer job in Van’s group, the “Network Research Group”. The group had several projects to focus on, among them BPF (BSD Packet Filter), tcpdump, and pcap.
When Steve first joined, Van was wrapping up his work on TCP congestion control. Van was analyzing why the Arpanet kept collapsing, and for that he needed a packet capture tool to look at packet traces in order to understand the problem, experiment with fixes, and see whether the solution was working. Packet capture tools already existed at the time: he used etherfind from Sun, whose syntax was based on the Unix find command. However, Van felt that etherfind had several problems: clumsy filtering syntax, weak and cryptic protocol decoding, and horrible performance. So he wanted a better tool to analyze the Arpanet, and so tcpdump was born.
The design goal of tcpdump was to “filter” packets before they come up the networking stack: compile a high-level filter specification into low-level code that filters packets at the driver level. To do so, it introduced a kernel module called the Berkeley Packet Filter (BPF). Here’s a snapshot of how tcpdump works (taken from Steve’s presentation slides):
Steve started with BPF. He had to design a VM (virtual machine) model that would run in the kernel. He eventually came up with a VM architecture and a set of machine instructions based on his experience with the Apple II, modeled after its 6502 processor (made by MOS Technology). Steve was then able to write low-level BPF programs to do packet filtering. But of course he didn’t want to write low-level code every time he wanted to filter packets, and neither did anyone else. So he needed a higher-level, more ‘human’ language for packet filtering that could then be compiled into low-level code.
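To make the VM idea more concrete, here is a toy interpreter for a tiny subset of classic BPF, written in Python purely for illustration. It is a sketch under simplifying assumptions: opcodes are symbolic strings rather than the real numeric encoding, only `ldh`, `ldb`, `jeq`, and `ret` are modeled, the index register X and scratch memory are left out, and the names `run_filter`, `IP_SCTP`, and `fake_frame` are hypothetical. It does keep a few real traits of classic BPF: each instruction has the same four fields as `struct bpf_insn` (code, jt, jf, k), there is an accumulator register, conditional jumps carry separate true/false offsets relative to the next instruction, and `ret` returns how many bytes of the packet to keep (0 means drop). The sample program is hand-written in the spirit of the listing `tcpdump -d` prints, not byte-for-byte what it would emit.

```python
"""Toy interpreter for a tiny subset of classic BPF (illustrative sketch only).

Each instruction mirrors the four fields of struct bpf_insn (code, jt, jf, k),
but opcodes are symbolic strings instead of the real numeric encoding, and
the index register X and scratch memory are omitted.
"""
from typing import List, Tuple

Insn = Tuple[str, int, int, int]  # (op, jt, jf, k)


def run_filter(prog: List[Insn], pkt: bytes) -> int:
    """Run prog over pkt and return how many bytes to keep (0 = drop)."""
    acc = 0  # accumulator register "A"
    pc = 0   # index of the next instruction to execute
    while pc < len(prog):
        op, jt, jf, k = prog[pc]
        pc += 1
        if op == "ldh":      # A <- 16-bit big-endian value at byte offset k
            if k + 2 > len(pkt):
                return 0     # out-of-bounds load: reject the packet
            acc = int.from_bytes(pkt[k:k + 2], "big")
        elif op == "ldb":    # A <- single byte at offset k
            if k + 1 > len(pkt):
                return 0
            acc = pkt[k]
        elif op == "jeq":    # forward jump, relative to the next instruction
            pc += jt if acc == k else jf
        elif op == "ret":    # accept k bytes of the packet (0 = drop)
            return k
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return 0  # ran off the end (a real BPF validator rejects such programs)


# Hand-written program: accept IPv4 packets carrying SCTP (IP protocol 132)
# on an Ethernet link.
IP_SCTP: List[Insn] = [
    ("ldh", 0, 0, 12),       # 0: load the EtherType field
    ("jeq", 0, 3, 0x0800),   # 1: IPv4? continue : jump to 5 (drop)
    ("ldb", 0, 0, 23),       # 2: load the IPv4 protocol byte (14 + 9)
    ("jeq", 0, 1, 132),      # 3: SCTP? continue : jump to 5 (drop)
    ("ret", 0, 0, 65535),    # 4: accept up to 65535 bytes
    ("ret", 0, 0, 0),        # 5: drop
]


def fake_frame(ethertype: int, ip_proto: int = 0) -> bytes:
    """Build a dummy Ethernet frame with a bare 20-byte IPv4 header."""
    ip_header = bytearray(20)
    ip_header[0] = 0x45          # version 4, header length 5 words
    ip_header[9] = ip_proto      # protocol field
    return bytes(12) + ethertype.to_bytes(2, "big") + bytes(ip_header)


if __name__ == "__main__":
    print(run_filter(IP_SCTP, fake_frame(0x0800, 132)))  # SCTP over IPv4: kept
    print(run_filter(IP_SCTP, fake_frame(0x0800, 6)))    # TCP: dropped
```

One design point worth noticing: the program can only jump forward, so every filter is guaranteed to terminate. Classic BPF enforces this, which is part of what makes it safe to run user-supplied filters inside the kernel.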
That was when he designed the BPF filter language from scratch, starting from a basic predicate consisting of a field and a value. If you use tcpdump (or Wireshark’s capture filters), I bet you use this language to select specific packets. For example, this is the command I used to capture DIAMETER traffic for a specific node:
# tcpdump -i bond0.351 -w /tmp/diameter.pcap ip host x.x.x.x and sctp port 3868
It seems like an easy task, but as Steve described, the design was difficult to implement at first. He worked on the parser and other pieces while reviewing the results with Van. He came up with various ideas, starting over with new ones to get a better implementation, and drew on the compiler knowledge from his computer science coursework (who said school is not important, eh?). In the end, his compiler not only worked but also optimized the code it generated.
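The parsing part of the story can be illustrated with a toy recursive-descent parser. This is a sketch with a hypothetical, heavily simplified grammar (not the real pcap-filter grammar) that supports `and`, `or`, `not`, parentheses, bare protocol names, and `host`/`port` primitives with an optional protocol qualifier, enough to read a filter like the one in the command above. Unlike Steve’s compiler, it does not emit BPF instructions and does no optimization: it compiles the filter into Python closures over a packet dictionary. The `Parser` class and the dictionary keys (`l3`, `l4`, `src`, `dst`, `sport`, `dport`) are made up for this example.

```python
"""Toy recursive-descent parser for a tcpdump-like filter language (sketch).

Hypothetical grammar, much simpler than the real pcap-filter syntax:
    expr   := term ("or" term)*
    term   := factor ("and" factor)*
    factor := "not" factor | "(" expr ")" | prim
    prim   := [PROTO] ("host" ADDR | "port" NUM) | PROTO
Filters compile to Python closures over a packet dict with made-up keys:
l3, l4, src, dst, sport, dport.
"""
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]
Pred = Callable[[Packet], bool]
PROTOS = {"ip", "tcp", "udp", "sctp"}


def _and(a: Pred, b: Pred) -> Pred:
    return lambda p: a(p) and b(p)


def _or(a: Pred, b: Pred) -> Pred:
    return lambda p: a(p) or b(p)


def _proto(name: str) -> Pred:
    # "ip" checks the network layer; other names check the transport layer.
    key = "l3" if name == "ip" else "l4"
    return lambda p: p.get(key) == name


class Parser:
    def __init__(self, text: str) -> None:
        # Give parentheses their own tokens, then split on whitespace.
        self.toks: List[str] = text.replace("(", " ( ").replace(")", " ) ").split()
        self.pos = 0

    def peek(self) -> str:
        return self.toks[self.pos] if self.pos < len(self.toks) else ""

    def take(self) -> str:
        tok = self.peek()
        if not tok:
            raise SyntaxError("unexpected end of filter")
        self.pos += 1
        return tok

    def parse(self) -> Pred:
        pred = self.expr()
        if self.peek():
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return pred

    def expr(self) -> Pred:
        pred = self.term()
        while self.peek() == "or":
            self.take()
            pred = _or(pred, self.term())
        return pred

    def term(self) -> Pred:
        pred = self.factor()
        while self.peek() == "and":
            self.take()
            pred = _and(pred, self.factor())
        return pred

    def factor(self) -> Pred:
        if self.peek() == "not":
            self.take()
            inner = self.factor()
            return lambda p: not inner(p)
        if self.peek() == "(":
            self.take()
            inner = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        return self.prim()

    def prim(self) -> Pred:
        proto: Optional[str] = self.take() if self.peek() in PROTOS else None
        kind = self.peek()
        if kind == "host":
            self.take()
            addr = self.take()
            base: Pred = lambda p: addr in (p.get("src"), p.get("dst"))
        elif kind == "port":
            self.take()
            port = int(self.take())
            base = lambda p: port in (p.get("sport"), p.get("dport"))
        elif proto is not None:
            return _proto(proto)  # bare protocol name, e.g. "sctp"
        else:
            raise SyntaxError(f"unexpected token {kind!r}")
        return _and(_proto(proto), base) if proto else base
```

For example, `Parser("ip host 10.0.0.1 and sctp port 3868").parse()` returns a predicate that is true only for SCTP packets to or from port 3868 involving 10.0.0.1. This sketch gives `and` higher precedence than `or` by layering `term` under `expr`; the real pcap-filter language instead gives them equal precedence, associating left to right, with `not` binding tightest.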
Tcpdump was a useful tool, but they later realized that they (and other people) would want to build packet capture applications beyond tcpdump. That’s why they pulled the compiler and filtering engine out of tcpdump, created an API and a reusable library, and finally released it as libpcap. Since different apps were going to be built around libpcap as a common library, they needed to define an interchangeable file format for packet traces, and that’s when the pcap file format came into the picture. With it, we can capture packets, bypass the protocol-decoding logic, and write packets straight to disk (which is what the -w option in the command above does).
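The classic pcap file format is simple enough to sketch in a few lines. The following Python example writes and reads it: a 24-byte global header (magic number 0xa1b2c3d4, version 2.4, time-zone offset, timestamp accuracy, snapshot length, link-layer type) followed, for each packet, by a 16-byte record header and the raw captured bytes. It is a minimal sketch assuming microsecond timestamps and the Ethernet link type (1); it does not handle the nanosecond-resolution variant or the newer pcapng format, and the function names `write_pcap` and `read_pcap` are hypothetical.

```python
"""Minimal writer/reader for the classic libpcap file format (sketch).

Global header (24 bytes): magic u32 = 0xa1b2c3d4, version_major u16 = 2,
version_minor u16 = 4, thiszone i32, sigfigs u32, snaplen u32, linktype u32.
Per-packet header (16 bytes): ts_sec, ts_usec, incl_len, orig_len (u32 each),
followed by incl_len bytes of packet data.
"""
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

MAGIC = 0xA1B2C3D4        # microsecond-resolution pcap
LINKTYPE_ETHERNET = 1


def write_pcap(out: BinaryIO, packets: Iterable[Tuple[int, int, bytes]],
               snaplen: int = 65535) -> None:
    """Write (ts_sec, ts_usec, frame) tuples, truncating frames to snaplen."""
    # This writer always uses little-endian; readers detect order from MAGIC.
    out.write(struct.pack("<IHHiIII", MAGIC, 2, 4, 0, 0, snaplen,
                          LINKTYPE_ETHERNET))
    for ts_sec, ts_usec, frame in packets:
        data = frame[:snaplen]
        out.write(struct.pack("<IIII", ts_sec, ts_usec, len(data), len(frame)))
        out.write(data)


def read_pcap(inp: BinaryIO) -> Iterator[Tuple[int, int, bytes, int]]:
    """Yield (ts_sec, ts_usec, captured_bytes, orig_len) for each record."""
    header = inp.read(24)
    if len(header) < 24:
        raise ValueError("truncated global header")
    if struct.unpack("<I", header[:4])[0] == MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    while True:
        rec = inp.read(16)
        if not rec:
            return  # clean end of file
        if len(rec) < 16:
            raise ValueError("truncated record header")
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        data = inp.read(incl_len)
        if len(data) < incl_len:
            raise ValueError("truncated packet data")
        yield ts_sec, ts_usec, data, orig_len


if __name__ == "__main__":
    buf = io.BytesIO()
    write_pcap(buf, [(1, 500, b"\xaa" * 100), (2, 0, b"\xbb" * 10)], snaplen=64)
    buf.seek(0)
    for ts_sec, ts_usec, data, orig_len in read_pcap(buf):
        print(ts_sec, ts_usec, len(data), orig_len)
```

Keeping both `incl_len` and `orig_len` is what lets a capture store only the first `snaplen` bytes of each packet while still recording how long the packet really was on the wire. Because tcpdump’s `-w` option writes files in this format, a capture taken that way can be opened later by Wireshark or any other libpcap-based tool.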
The tools and library were later released to the open-source community and then ported to various operating systems such as BSD, SunOS, Linux, Windows, and even Mac OS X.
It is interesting to see how such a tool became, over the years, one of the most powerful tools used (mostly) by network engineers. From the Arpanet era to this day, network engineers still rely on packet capture tools to analyze what happens in the network. And thanks to great tools such as tcpdump, Wireshark, and others based on libpcap, we can build, maintain, and troubleshoot networks, fix various issues, and let users enjoy the network.