laforge@gnumonks.org
I have to apologize for my ignorance, but this document has a strong focus on the "default case": x86 architecture and IP packets which get forwarded.
I am definitely no kernel guru, and the information provided by this document may be wrong. So don't expect too much; I'll always appreciate your comments and bugfixes.
If the network card receives an Ethernet frame which matches the local MAC address or is a link layer broadcast, it issues an interrupt. The network driver for this particular card handles the interrupt and fetches the packet data via DMA / PIO / whatever into RAM. It then allocates an skb and calls a function of the protocol independent device support routines: net/core/dev.c:netif_rx(skb).
If the driver didn't already timestamp the skb, it is timestamped now. Afterwards the skb gets enqueued in the appropriate queue for the processor handling this packet. If the queue backlog is full, the packet is dropped at this place. After enqueuing the skb, the receive softinterrupt is marked for execution via include/linux/interrupt.h:__cpu_raise_softirq().
The interrupt handler exits and all interrupts are reenabled.
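To make the enqueue step more concrete, here is a minimal userspace sketch of what happens to a packet at this point. It is not the kernel code: struct pkt, struct rx_queue, BACKLOG_MAX and sketch_netif_rx() are names I made up for illustration, and the real backlog limit is much larger.

```c
#include <assert.h>
#include <stddef.h>

#define BACKLOG_MAX 4          /* hypothetical limit, only for this sketch */

struct pkt {                   /* stand-in for struct sk_buff */
    long stamp;                /* 0 means "not yet timestamped" */
    struct pkt *next;
};

struct rx_queue {              /* stand-in for one CPU's receive queue */
    struct pkt *head, *tail;
    int len;
    int softirq_pending;       /* models the NET_RX_SOFTIRQ mark */
};

/* Returns 0 if the packet was queued, -1 if it was dropped. */
static int sketch_netif_rx(struct rx_queue *q, struct pkt *p, long now)
{
    assert(q != NULL && p != NULL);

    if (p->stamp == 0)         /* the driver did not timestamp the packet */
        p->stamp = now;

    if (q->len >= BACKLOG_MAX) /* backlog full: the packet is dropped here */
        return -1;

    p->next = NULL;            /* append to the tail of the queue */
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->len++;

    q->softirq_pending = 1;    /* mark the receive softirq for execution */
    return 0;
}
```

Note that nothing is processed here. The function only queues the packet and leaves a mark, which is why the interrupt handler can return so quickly.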
Now we encounter one of the big changes between 2.2 and 2.4: The whole network stack is no longer a bottom half, but a softirq. Softirqs have the major advantage that they may run on more than one CPU simultaneously. Bottom halves (bh's) were guaranteed to run on only one CPU at a time.
Our network receive softirq is registered in net/core/dev.c:net_dev_init() using the function kernel/softirq.c:open_softirq() provided by the softirq subsystem.
Further handling of our packet is done in the network receive softirq (NET_RX_SOFTIRQ), which is called from kernel/softirq.c:do_softirq(). do_softirq() itself is called from three places within the kernel:
arch/i386/kernel/irq.c:do_IRQ(), which is the generic IRQ handler
arch/i386/kernel/entry.S, in case the kernel just returned from a syscall
kernel/sched.c:schedule()
So if execution passes one of these points, do_softirq() is called. It detects that NET_RX_SOFTIRQ is marked and calls net/core/dev.c:net_rx_action(). Here the skb is dequeued from this CPU's receive queue and afterwards handed to the appropriate packet handler. In case of IPv4 this is the IPv4 packet handler.
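The register / mark / run cycle described above can be sketched in a few lines of userspace C. This is only a model under my own assumptions: the sketch_ names, the enum numbering and the single global pending word are made up (the kernel keeps this state per CPU), and only the general idea follows the text.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_NET_TX, SKETCH_NET_RX, SKETCH_NR_SOFTIRQS }; /* hypothetical numbering */

typedef void (*softirq_fn)(void *data);

static struct { softirq_fn action; void *data; } softirq_vec[SKETCH_NR_SOFTIRQS];
static unsigned int pending;   /* one bit per softirq */

/* Register a handler, in the spirit of open_softirq(). */
static void sketch_open_softirq(int nr, softirq_fn action, void *data)
{
    assert(nr >= 0 && nr < SKETCH_NR_SOFTIRQS);
    softirq_vec[nr].action = action;
    softirq_vec[nr].data = data;
}

/* Mark a softirq for execution; nothing runs yet. */
static void sketch_raise_softirq(int nr)
{
    pending |= 1u << nr;
}

/* Run every handler whose bit is set and return how many ran. */
static int sketch_do_softirq(void)
{
    unsigned int todo = pending;
    int ran = 0;

    pending = 0;               /* handlers may raise softirqs again */
    for (int nr = 0; nr < SKETCH_NR_SOFTIRQS; nr++) {
        if ((todo & (1u << nr)) && softirq_vec[nr].action) {
            softirq_vec[nr].action(softirq_vec[nr].data);
            ran++;
        }
    }
    return ran;
}

/* Example handler: counts how often the receive softirq ran. */
static void sketch_net_rx_action(void *data)
{
    ++*(int *)data;
}
```

Raising the same softirq twice before it runs still results in a single handler call, which is why the handler has to drain the whole receive queue rather than take one packet per call.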
The IP packet handler is registered via net/core/dev.c:dev_add_pack(), called from net/ipv4/ip_output.c:ip_init().
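Registering a packet handler boils down to remembering "for this protocol type, call that function". The following sketch shows the idea with a small fixed table. The sketch_ names, the table size and the handler signature are my own assumptions and do not match the kernel's struct packet_type; only the EtherType value 0x0800 for IPv4 is a real constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_P_IP     0x0800  /* EtherType for IPv4 */
#define SKETCH_MAX_HANDLERS 8       /* hypothetical table size */

typedef int (*pkt_handler)(const uint8_t *data, size_t len);

struct sketch_packet_type {
    uint16_t type;
    pkt_handler func;
};

static struct sketch_packet_type handlers[SKETCH_MAX_HANDLERS];
static int nr_handlers;

/* Register a handler for one protocol type; -1 if the table is full. */
static int sketch_dev_add_pack(uint16_t type, pkt_handler func)
{
    assert(func != NULL);
    if (nr_handlers == SKETCH_MAX_HANDLERS)
        return -1;
    handlers[nr_handlers].type = type;
    handlers[nr_handlers].func = func;
    nr_handlers++;
    return 0;
}

/* Hand the packet to the first handler registered for its type; -1 if none. */
static int sketch_deliver(uint16_t type, const uint8_t *data, size_t len)
{
    for (int i = 0; i < nr_handlers; i++)
        if (handlers[i].type == type)
            return handlers[i].func(data, len);
    return -1;
}

/* Stand-in for the IPv4 handler; it just reports the length it was given. */
static int sketch_ip_rcv(const uint8_t *data, size_t len)
{
    (void)data;
    return (int)len;
}
```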
The IPv4 packet handling function is net/ipv4/ip_input.c:ip_rcv(). After some initial checks (if the packet is for this host, ...) the IP checksum is calculated. Additional checks are done on the length and IP protocol version 4.
Every packet failing one of the sanity checks is dropped at this point.
If the packet passes the tests, we determine the size of the IP packet and trim the skb in case the transport medium has appended some padding.
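The sanity checks and the trimming can be illustrated with a small userspace function working on a raw byte buffer. This is a sketch, not the code from ip_rcv(): the function names are made up, the order of the checks is simplified, and the checksum is computed byte by byte instead of with the kernel's optimized routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over the header; 0 means the checksum is valid. */
static uint16_t sketch_ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    while (sum >> 16)                     /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Returns the length the buffer should be trimmed to, or 0 if the
 * packet fails a sanity check and has to be dropped.
 */
static size_t sketch_ip_rcv_checks(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);

    if (len < 20)                         /* shorter than a minimal header */
        return 0;
    if ((pkt[0] >> 4) != 4)               /* IP version must be 4 */
        return 0;

    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
    if (ihl < 20 || ihl > len)            /* bogus header length */
        return 0;
    if (sketch_ip_checksum(pkt, ihl) != 0)
        return 0;

    size_t tot_len = (size_t)(pkt[2] << 8 | pkt[3]);
    if (tot_len < ihl || tot_len > len)   /* total length must fit the buffer */
        return 0;

    return tot_len;                       /* anything beyond this is padding */
}
```

A short frame padded by the link layer to its minimum size is the typical case where the returned length is smaller than the buffer length.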
Now, for the first time, one of the netfilter hooks is called.
Netfilter provides a generic and abstract interface to the standard routing code. This is currently used for packet filtering, mangling, NAT and queuing packets to userspace. For further reference see my conference paper 'The netfilter subsystem in Linux 2.4' or one of Rusty's unreliable guides, e.g. the netfilter-hacking-guide.
After successful traversal of the netfilter hook, net/ipv4/ip_input.c:ip_rcv_finish() is called.
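The pattern "traverse the hook, then continue in a finish function" shows up several times on the packet's way, so a small model may help. In this sketch every registered hook function returns a verdict, and the continuation is only called if all of them accept. The names are made up, and only accept and drop are modelled; the real netfilter knows more verdicts than these two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts, modelled on NF_DROP / NF_ACCEPT. */
enum sketch_verdict { SKETCH_DROP, SKETCH_ACCEPT };

struct sketch_pkt { int mark; };          /* stand-in for an skb */

typedef enum sketch_verdict (*hook_fn)(struct sketch_pkt *p);
typedef int (*ok_fn)(struct sketch_pkt *p);

/*
 * Call each hook in order. If all of them accept, continue with okfn
 * and return its result; otherwise return -1 (packet dropped).
 */
static int sketch_nf_hook(hook_fn *hooks, size_t n,
                          struct sketch_pkt *p, ok_fn okfn)
{
    assert(okfn != NULL);
    for (size_t i = 0; i < n; i++)
        if (hooks[i](p) != SKETCH_ACCEPT)
            return -1;
    return okfn(p);
}

/* Two example hooks and an example continuation. */
static enum sketch_verdict accept_all(struct sketch_pkt *p)
{
    (void)p;
    return SKETCH_ACCEPT;
}

static enum sketch_verdict drop_marked(struct sketch_pkt *p)
{
    return p->mark ? SKETCH_DROP : SKETCH_ACCEPT;
}

static int sketch_rcv_finish(struct sketch_pkt *p)
{
    (void)p;
    return 1;
}
```

With no hooks registered, the continuation is called directly, so the hook costs next to nothing when nobody uses it.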
Inside ip_rcv_finish(), the packet's destination is determined by calling the routing function net/ipv4/route.c:ip_route_input(). Furthermore, if our IP packet has IP options, they are processed now. Depending on the routing decision made by net/ipv4/route.c:ip_route_input_slow(), the journey of our packet continues in one of the following functions:
The packet's destination is local, we have to process the layer 4 protocol and pass it to a userspace process.
The packet's destination is not local, we have to forward it to another network.
An error occurred, we are unable to find an appropriate routing table entry for this packet.
It is a multicast packet and we have to do some multicast routing.
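The four outcomes listed above can be pictured as the result of a classification function. The sketch below is a heavy simplification and rests on my own assumptions: a single local address, a linear table of prefixes and made-up names. The real routing code uses a cache and the FIB, not a loop like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_route {
    SKETCH_LOCAL,      /* deliver to this host */
    SKETCH_FORWARD,    /* forward to another network */
    SKETCH_MULTICAST,  /* needs multicast routing */
    SKETCH_ERROR       /* no routing table entry found */
};

struct sketch_net { uint32_t prefix, mask; };   /* one forwarding entry */

/* Addresses are given in host byte order for simplicity. */
static enum sketch_route sketch_route_input(uint32_t daddr, uint32_t local_addr,
                                            const struct sketch_net *tbl, size_t n)
{
    assert(tbl != NULL || n == 0);

    if ((daddr & 0xf0000000u) == 0xe0000000u)   /* 224.0.0.0/4 */
        return SKETCH_MULTICAST;
    if (daddr == local_addr)
        return SKETCH_LOCAL;
    for (size_t i = 0; i < n; i++)
        if ((daddr & tbl[i].mask) == tbl[i].prefix)
            return SKETCH_FORWARD;
    return SKETCH_ERROR;
}
```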
If the routing decided that this packet has to be forwarded to another device, the function net/ipv4/ip_forward.c:ip_forward() is called.
The first task of this function is to check the IP header's TTL. If it is <= 1, we drop the packet and return an ICMP time exceeded message to the sender.
We check whether the skb has enough headroom for the destination device's link layer header and expand the skb if necessary.
Next the TTL is decremented by one.
If our new packet is bigger than the MTU of the destination device and the don't fragment bit in the IP header is set, we drop the packet and send an ICMP frag needed message to the sender.
Finally it is time to call another one of the netfilter hooks; this time it is the NF_IP_FORWARD hook.
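The TTL and MTU checks can be sketched on a raw header as follows. This is an illustration under assumptions, not the kernel's ip_forward(): the names and the result enum are made up, the headroom step is left out, and the checksum is simply recomputed from scratch after the TTL change, whereas the kernel adjusts it incrementally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_fwd {
    SKETCH_FWD_OK,
    SKETCH_FWD_TTL_EXCEEDED,   /* caller would send ICMP time exceeded */
    SKETCH_FWD_FRAG_NEEDED     /* caller would send ICMP frag needed */
};

/* One's-complement header checksum; 0 means the header is consistent. */
static uint16_t sketch_csum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

static enum sketch_fwd sketch_ip_forward(uint8_t *hdr, size_t pkt_len, size_t mtu)
{
    assert(hdr != NULL);
    size_t ihl = (size_t)(hdr[0] & 0x0f) * 4;

    if (hdr[8] <= 1)                       /* TTL is byte 8 of the header */
        return SKETCH_FWD_TTL_EXCEEDED;

    hdr[8]--;                              /* decrement the TTL by one */
    hdr[10] = 0;                           /* the header changed, so the */
    hdr[11] = 0;                           /* checksum has to be redone  */
    uint16_t csum = sketch_csum(hdr, ihl);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);

    if (pkt_len > mtu && (hdr[6] & 0x40))  /* too big and DF bit set */
        return SKETCH_FWD_FRAG_NEEDED;

    return SKETCH_FWD_OK;
}
```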
Assuming that the netfilter hook returns an NF_ACCEPT verdict, the function net/ipv4/ip_forward.c:ip_forward_finish() is the next step in our packet's journey.
ip_forward_finish() itself checks if we need to set any additional options in the IP header, and has ip_optFIXME doing this. Afterwards it calls include/net/ip.h:ip_send().
If we need some fragmentation, FIXME:ip_fragment gets called, otherwise we continue in net/ipv4/ip_output.c:ip_finish_output().
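The decision between "fragment" and "send as is" is a simple size comparison. The following sketch shows it together with a naive splitting loop. All names are made up, the output step only counts what it is given, and the splitting is my own simplified model: the only protocol fact used is that every fragment except the last must carry a payload that is a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

static int frag_count;        /* pieces handed to the output step */
static size_t frag_bytes;     /* payload bytes handed over in total */
static size_t frag_last_len;  /* payload length of the most recent piece */

/* Stand-in for the output step; it only records what it was given. */
static int sketch_finish_output(size_t offset, size_t len, int more_fragments)
{
    (void)offset;
    (void)more_fragments;
    frag_count++;
    frag_bytes += len;
    frag_last_len = len;
    return 0;
}

/* Send directly if the packet fits the MTU, otherwise split the payload. */
static int sketch_ip_send(size_t hdr_len, size_t payload_len, size_t mtu)
{
    assert(hdr_len >= 20);

    if (hdr_len + payload_len <= mtu)        /* fits: no fragmentation */
        return sketch_finish_output(0, payload_len, 0);

    if (mtu < hdr_len + 8)                   /* no room for any payload */
        return -1;

    size_t chunk = (mtu - hdr_len) / 8 * 8;  /* payload per fragment */
    for (size_t off = 0; off < payload_len; off += chunk) {
        size_t len = payload_len - off;
        if (len > chunk)
            len = chunk;
        int more = off + len < payload_len;  /* "more fragments" flag */
        if (sketch_finish_output(off, len, more) != 0)
            return -1;
    }
    return 0;
}
```

For example, 3000 bytes of payload behind a 20 byte header on a device with an MTU of 1500 end up as three pieces of 1480, 1480 and 40 bytes.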
ip_finish_output() again does nothing other than calling the netfilter postrouting hook NF_IP_POST_ROUTING and calling ip_finish_output2() on successful traversal of this hook.
ip_finish_output2() prepends the hardware (link layer) header to our skb and calls net/ipv4/ip_output.c:ip_output().
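Prepending a header works without copying the packet because the buffer keeps some free space in front of the data. Here is a minimal model of that idea with an Ethernet header as the example. The struct layout, the buffer size and the sketch_ names are assumptions of mine and much simpler than the real struct sk_buff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_skb {
    uint8_t buf[64];   /* hypothetical fixed-size buffer */
    size_t data;       /* offset of the first valid byte */
    size_t len;        /* number of valid bytes */
};

/* Free space in front of the packet data. */
static size_t sketch_headroom(const struct sketch_skb *skb)
{
    return skb->data;
}

/* Grow the packet towards the front; NULL if the headroom is too small. */
static uint8_t *sketch_push(struct sketch_skb *skb, size_t n)
{
    if (n > sketch_headroom(skb))
        return NULL;
    skb->data -= n;
    skb->len += n;
    return skb->buf + skb->data;
}

/* Prepend a 14 byte Ethernet header: destination, source, EtherType. */
static int sketch_prepend_eth(struct sketch_skb *skb, const uint8_t dst[6],
                              const uint8_t src[6], uint16_t proto)
{
    assert(skb != NULL);
    uint8_t *h = sketch_push(skb, 14);

    if (h == NULL)
        return -1;                 /* the caller would have to expand the skb */
    memcpy(h, dst, 6);
    memcpy(h + 6, src, 6);
    h[12] = (uint8_t)(proto >> 8); /* EtherType in network byte order */
    h[13] = (uint8_t)(proto & 0xff);
    return 0;
}
```

This is also why the forwarding code checks the available space before it gets here: if the headroom is too small, the header cannot simply be pushed in front of the IP packet.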