
The journey of a packet through the linux 2.4 network stack

Harald Welte laforge@gnumonks.org

1.4, 2000/10/14 20:27:43


This document describes the journey of a network packet inside the Linux 2.4.x kernel. The path has changed drastically since 2.2, because the globally serialized bottom half was abandoned in favor of the new softirq system.

1. Preface

I apologize for my ignorance, but this document has a strong focus on the "default case": x86 architecture and IP packets which get forwarded.

I am definitely no kernel guru, and the information provided by this document may be wrong. So don't expect too much; I'll always appreciate your comments and bugfixes.

2. Receiving the packet

2.1 The receive interrupt

If the network card receives an Ethernet frame which matches the local MAC address or is a link-layer broadcast, it issues an interrupt. The network driver for this particular card handles the interrupt and fetches the packet data into RAM via DMA, PIO or whatever mechanism the card uses. It then allocates an skb and calls a function of the protocol-independent device support routines: net/core/dev.c:netif_rx(skb).

If the driver didn't already timestamp the skb, it is timestamped now. Afterwards the skb is enqueued in the appropriate queue for the processor handling this packet. If the queue backlog is full, the packet is dropped at this point. After enqueuing the skb, the receive softirq is marked for execution via include/linux/interrupt.h:__cpu_raise_softirq().

The interrupt handler exits and all interrupts are reenabled.
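
The enqueue-or-drop step described above can be pictured with a small userspace sketch. This is not the kernel code: struct demo_pkt, struct demo_backlog, DEMO_BACKLOG_MAX and both functions are made-up names, the queue here is a plain ring buffer, and the "softirq pending" flag is just an int standing in for the real marking.

```c
#include <stddef.h>

#define DEMO_BACKLOG_MAX 4           /* hypothetical limit, tiny on purpose */

struct demo_pkt { int id; };         /* stand-in for an skb */

struct demo_backlog {                /* stand-in for one CPU's receive queue */
    struct demo_pkt *slot[DEMO_BACKLOG_MAX];
    size_t head;                     /* index of the oldest queued packet */
    size_t len;                      /* number of queued packets */
    int rx_softirq_pending;          /* set when the receive softirq is marked */
    unsigned long dropped;           /* packets dropped because the queue was full */
};

/* Queue a packet for later processing; returns 0 if queued, -1 if dropped. */
int demo_backlog_enqueue(struct demo_backlog *q, struct demo_pkt *p)
{
    if (q->len == DEMO_BACKLOG_MAX) {
        q->dropped++;                /* backlog full: drop right here */
        return -1;
    }
    q->slot[(q->head + q->len) % DEMO_BACKLOG_MAX] = p;
    q->len++;
    q->rx_softirq_pending = 1;       /* mark the receive softirq for execution */
    return 0;
}

/* Take the oldest packet off the queue; returns NULL if the queue is empty. */
struct demo_pkt *demo_backlog_dequeue(struct demo_backlog *q)
{
    struct demo_pkt *p;

    if (q->len == 0)
        return NULL;
    p = q->slot[q->head];
    q->head = (q->head + 1) % DEMO_BACKLOG_MAX;
    q->len--;
    return p;
}
```

The point of the sketch is only the ordering: the interrupt path does the cheap work (enqueue or drop, then mark), and everything expensive is left to whoever dequeues later.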

2.2 The network RX softirq

Now we encounter one of the big changes between 2.2 and 2.4: the whole network stack is no longer a bottom half, but a softirq. Softirqs have the major advantage that they may run on more than one CPU simultaneously. Bottom halves (bh's) were guaranteed to run on only one CPU at a time.

Our network receive softirq is registered in net/core/dev.c:net_init() using the function kernel/softirq.c:open_softirq() provided by the softirq subsystem.

Further handling of our packet is done in the network receive softirq (NET_RX_SOFTIRQ), which is called from kernel/softirq.c:do_softirq(). do_softirq() itself is called from three places within the kernel:

  1. from arch/i386/kernel/irq.c:do_IRQ(), which is the generic IRQ handler
  2. from arch/i386/kernel/entry.S in case the kernel just returned from a syscall
  3. inside the main process scheduler in kernel/sched.c:schedule()

So if execution passes one of these points, do_softirq() is called. It detects that NET_RX_SOFTIRQ is marked and calls net/core/dev.c:net_rx_action(). Here the skb is dequeued from this CPU's receive queue and then handed to the appropriate packet handler. In the case of IPv4 this is the IPv4 packet handler.
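
The register / mark / run cycle of this section can be sketched in a few lines of plain C. Everything named demo_ here is invented for illustration; the real open_softirq() and do_softirq() have different signatures and deal with per-CPU state, interrupt masking and restarts, none of which is modelled.

```c
#include <stddef.h>

enum { DEMO_NET_TX, DEMO_NET_RX, DEMO_NR_SOFTIRQS };

typedef void (*demo_softirq_fn)(void *data);

struct demo_softirq_table {
    demo_softirq_fn action[DEMO_NR_SOFTIRQS];   /* registered handlers */
    void *data[DEMO_NR_SOFTIRQS];               /* argument passed to each handler */
    unsigned int pending;                       /* one bit per softirq */
};

/* Register a handler for softirq number nr. */
void demo_open_softirq(struct demo_softirq_table *t, int nr,
                       demo_softirq_fn fn, void *data)
{
    t->action[nr] = fn;
    t->data[nr] = data;
}

/* Mark softirq nr for execution; nothing runs yet. */
void demo_raise_softirq(struct demo_softirq_table *t, int nr)
{
    t->pending |= 1u << nr;
}

/* Run every marked handler once; returns how many handlers ran. */
int demo_do_softirq(struct demo_softirq_table *t)
{
    unsigned int pending = t->pending;
    int nr, ran = 0;

    t->pending = 0;     /* clear first, so a handler may mark itself again */
    for (nr = 0; nr < DEMO_NR_SOFTIRQS; nr++) {
        if ((pending & (1u << nr)) && t->action[nr]) {
            t->action[nr](t->data[nr]);
            ran++;
        }
    }
    return ran;
}

/* Example handler: counts how often it was invoked. */
void demo_count_action(void *data)
{
    (*(int *)data)++;
}
```

Calling demo_do_softirq() without a prior demo_raise_softirq() does nothing, which mirrors why the three call sites listed above are cheap when no packet has arrived.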

2.3 The IPv4 packet handler

The IP packet handler is registered via net/core/dev.c:dev_add_pack(), called from net/ipv4/ip_output.c:ip_init().

The IPv4 packet handling function is net/ipv4/ip_input.c:ip_rcv(). After some initial checks (if the packet is for this host, ...) the IP header checksum is calculated. Additional checks are done on the length and on the IP protocol version, which must be 4.

Every packet failing one of the sanity checks is dropped at this point.

If the packet passes the tests, we determine the size of the IP packet and trim the skb in case the transport medium has appended some padding.
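
To make the sanity checks more concrete, here is a rough sketch operating on a raw byte buffer. The function names are my own, the order of the checks is not claimed to match ip_rcv(), and the "trim" is expressed simply as the length the caller should keep. The header layout (version and header length in byte 0, total length in bytes 2 and 3, one's-complement checksum over the header) is the standard IPv4 one.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over len bytes (len is assumed to be even).
 * Over a header that includes a correct checksum field the result is 0. */
uint16_t demo_ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns the number of bytes to keep (the IP total length),
 * or 0 if the packet fails a check and should be dropped. */
size_t demo_ip_rcv_check(const uint8_t *pkt, size_t buflen)
{
    size_t ihl, tot_len;

    if (buflen < 20)                        /* shorter than a minimal header */
        return 0;
    if ((pkt[0] >> 4) != 4)                 /* not IP version 4 */
        return 0;
    ihl = (size_t)(pkt[0] & 0x0f) * 4;      /* header length in bytes */
    if (ihl < 20 || ihl > buflen)
        return 0;
    if (demo_ip_checksum(pkt, ihl) != 0)    /* corrupted header */
        return 0;
    tot_len = ((size_t)pkt[2] << 8) | pkt[3];
    if (tot_len < ihl || tot_len > buflen)  /* inconsistent or truncated */
        return 0;
    return tot_len;                         /* anything beyond this is padding */
}
```

A caller would drop the packet on a return value of 0 and otherwise shorten its buffer to the returned length before going on.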

This is the first point at which one of the netfilter hooks is called.

Netfilter provides a generic and abstract interface to the standard routing code. This is currently used for packet filtering, mangling, NAT and queuing packets to userspace. For further reference see my conference paper 'The netfilter subsystem in Linux 2.4' or one of Rusty's unreliable guides, e.g. the netfilter-hacking-guide.

After successful traversal of the netfilter hook, net/ipv4/ip_input.c:ip_rcv_finish() is called.
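
The pattern "run the hook, and only on success continue in the next function" can be sketched as follows. This is a toy model with invented names (demo_nf_hook, demo_verdict and so on); real netfilter has more verdicts, priorities and per-protocol hook lists, and its calling convention is different.

```c
#include <stddef.h>

enum demo_verdict { DEMO_DROP, DEMO_ACCEPT };

struct demo_nf_pkt {
    unsigned int mark;      /* something a mangling hook might change */
    int delivered;          /* set once the continuation function ran */
};

typedef enum demo_verdict (*demo_hook_fn)(struct demo_nf_pkt *p);
typedef int (*demo_okfn)(struct demo_nf_pkt *p);

/* Run all registered hook functions in order. If every one accepts the
 * packet, continue in okfn; otherwise the packet's journey ends here. */
int demo_nf_hook(const demo_hook_fn *hooks, size_t n,
                 struct demo_nf_pkt *p, demo_okfn okfn)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (hooks[i](p) != DEMO_ACCEPT)
            return -1;              /* dropped by a hook */
    return okfn(p);
}

/* Example hook: "mangles" unmarked packets by giving them mark 1. */
enum demo_verdict demo_mangle(struct demo_nf_pkt *p)
{
    if (p->mark == 0)
        p->mark = 1;
    return DEMO_ACCEPT;
}

/* Example hook: filters out packets carrying the (arbitrary) mark 99. */
enum demo_verdict demo_filter(struct demo_nf_pkt *p)
{
    return p->mark == 99 ? DEMO_DROP : DEMO_ACCEPT;
}

/* Stand-in for the function that runs after successful traversal. */
int demo_rcv_finish(struct demo_nf_pkt *p)
{
    p->delivered = 1;
    return 0;
}
```

Passing the continuation as a function pointer is what lets the same traversal code sit in front of very different next steps.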

Inside ip_rcv_finish(), the packet's destination is determined by calling the routing function net/ipv4/route.c:ip_route_input(). Furthermore, if our IP packet has IP options, they are processed now. Depending on the routing decision made by net/ipv4/route.c:ip_route_input_slow(), the journey of our packet continues in one of the following functions:

net/ipv4/ip_input.c:ip_local_deliver()

The packet's destination is local; we have to process the layer 4 protocol and pass it to a userspace process.

net/ipv4/ip_forward.c:ip_forward()

The packet's destination is not local; we have to forward it to another network.

net/ipv4/route.c:ip_error()

An error occurred; we are unable to find an appropriate routing table entry for this packet.

net/ipv4/ipmr.c:ip_mr_input()

It is a multicast packet and we have to do some multicast routing.
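
The four-way decision above can be illustrated with a deliberately naive lookup. Please treat this as a sketch under heavy assumptions: the table layout, the first-match linear search and all demo_ names are invented, and the real routing code (cache, policy routing, the exact conditions for multicast routing) is far more involved. Addresses are plain host-order 32-bit numbers here.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_route_type {
    DEMO_ROUTE_LOCAL,       /* journey continues in ip_local_deliver() */
    DEMO_ROUTE_FORWARD,     /* journey continues in ip_forward() */
    DEMO_ROUTE_MULTICAST,   /* journey continues in ip_mr_input() */
    DEMO_ROUTE_ERROR        /* journey continues in ip_error() */
};

struct demo_route_entry {
    uint32_t prefix;                /* network address, host byte order */
    uint32_t mask;                  /* netmask, host byte order */
    enum demo_route_type type;      /* what to do with matching packets */
};

/* Decide where a packet with destination daddr goes next. */
enum demo_route_type demo_route_input(const struct demo_route_entry *tbl,
                                      size_t n, uint32_t daddr)
{
    size_t i;

    if ((daddr >> 28) == 0xe)       /* 224.0.0.0/4 is the multicast range */
        return DEMO_ROUTE_MULTICAST;
    for (i = 0; i < n; i++)         /* first matching entry wins */
        if ((daddr & tbl[i].mask) == tbl[i].prefix)
            return tbl[i].type;
    return DEMO_ROUTE_ERROR;        /* no appropriate entry found */
}

/* Name of the function in which the journey continues. */
const char *demo_next_function(enum demo_route_type t)
{
    switch (t) {
    case DEMO_ROUTE_LOCAL:     return "ip_local_deliver";
    case DEMO_ROUTE_FORWARD:   return "ip_forward";
    case DEMO_ROUTE_MULTICAST: return "ip_mr_input";
    case DEMO_ROUTE_ERROR:     break;
    }
    return "ip_error";
}
```

With one host entry for 192.168.0.1 marked local and 10.0.0.0/8 marked forward, a packet to 10.1.2.3 would continue in ip_forward(), while a packet to an address that matches nothing ends up in ip_error().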

3. Packet forwarding to another device

If the routing decided that this packet has to be forwarded to another device, the function net/ipv4/ip_forward.c:ip_forward() is called.

The first task of this function is to check the IP header's TTL. If it is <= 1, we drop the packet and return an ICMP time exceeded message to the sender.

We check whether the skb has enough headroom for the destination device's link layer header (which is placed in front of the packet) and expand the skb if necessary.

Next the TTL is decremented by one.

If our new packet is bigger than the MTU of the destination device and the don't fragment bit in the IP header is set, we drop the packet and send an ICMP fragmentation needed message to the sender.
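
The TTL and MTU checks described so far boil down to very little logic. The following sketch uses invented types and names (demo_fwd_info, demo_forward_check) and only returns which of the three outcomes applies; sending the ICMP messages and the room check for the link layer header are left out.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_fwd_result {
    DEMO_FWD_OK,                /* packet may continue on its journey */
    DEMO_FWD_TTL_EXCEEDED,      /* drop, answer with ICMP time exceeded */
    DEMO_FWD_FRAG_NEEDED        /* drop, answer with ICMP frag needed */
};

struct demo_fwd_info {          /* the few header fields we care about */
    uint8_t ttl;
    int dont_fragment;          /* non-zero if the DF bit is set */
    size_t len;                 /* total length of the IP packet */
};

/* Apply the forwarding checks in the order the text describes them.
 * The TTL is only decremented if the packet survives the first check. */
enum demo_fwd_result demo_forward_check(struct demo_fwd_info *ip,
                                        size_t out_mtu)
{
    if (ip->ttl <= 1)
        return DEMO_FWD_TTL_EXCEEDED;

    ip->ttl--;

    if (ip->len > out_mtu && ip->dont_fragment)
        return DEMO_FWD_FRAG_NEEDED;

    return DEMO_FWD_OK;
}
```

Note that an oversized packet without the don't fragment bit still gets DEMO_FWD_OK here; it is simply fragmented later on.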

Finally it is time to call another one of the netfilter hooks. This time it is the NF_IP_FORWARD hook.

Assuming that the netfilter hook returns an NF_ACCEPT verdict, the function net/ipv4/ip_forward.c:ip_forward_finish() is the next step in our packet's journey.

ip_forward_finish() itself checks whether we need to set any additional options in the IP header, and has ip_optFIXME do this. Afterwards it calls include/net/ip.h:ip_send().

If we need some fragmentation, FIXME:ip_fragment gets called; otherwise we continue in net/ipv4/ip_forward:ip_finish_output().

ip_finish_output() in turn does nothing more than call the netfilter postrouting hook NF_IP_POST_ROUTING and, on successful traversal of this hook, call ip_finish_output2().

ip_finish_output2() prepends the hardware (link layer) header to our skb and calls net/ipv4/ip_output.c:ip_output().
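
Prepending a header without copying the whole packet works because the buffer keeps some unused space in front of the data. Here is a minimal sketch of that idea. struct demo_skb and its functions are hypothetical and much simpler than the real skb, which among other things can be reallocated when the room in front is too small.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BUF_SIZE 64

struct demo_skb {
    uint8_t buf[DEMO_BUF_SIZE];
    size_t data;    /* offset of the first valid byte in buf */
    size_t len;     /* number of valid bytes starting at data */
};

/* Free space in front of the packet data. */
size_t demo_headroom(const struct demo_skb *s)
{
    return s->data;
}

/* Prepend hlen bytes of header in front of the current data.
 * Returns 0 on success, -1 if there is not enough room in front. */
int demo_push_header(struct demo_skb *s, const uint8_t *hdr, size_t hlen)
{
    if (hlen > demo_headroom(s))
        return -1;
    s->data -= hlen;                        /* grow the packet to the front */
    s->len += hlen;
    memcpy(s->buf + s->data, hdr, hlen);    /* payload itself is not moved */
    return 0;
}
```

For example, a buffer holding a 20 byte IP packet at offset 14 has exactly enough room for one 14 byte Ethernet header; a second push would fail in this sketch.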
