June 13, 2014

The server has been humming along for years now (937 days of uptime). The traffic graphs don't show much data moving through it because it only serves DNS requests (plus MySQL replication), mostly in the form of tiny UDP packets. We started seeing spikes in traffic, but everything on the server seemed to be working properly. Test queries with dig showed that the server was answering requests correctly, yet external monitoring showed it going up and down.

I started going through the logs to see whether we were being DoSed or whether it was some sort of configuration problem. Everything seemed to be running properly and the requests looked legitimate. Within the flood of messages, though, I spotted errors like this:

    printk: 2758 messages suppressed.
    ip_conntrack: table full, dropping packet.

ip_conntrack is the kernel's connection-tracking table, which the firewall uses to keep tabs on every connection coming into the system. Let's check its current numbers:

    # head /proc/slabinfo
    slabinfo - version: 2.0
    # name : tunables : slabdata
    ip_conntrack_expect 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
    ip_conntrack 34543 34576 384 10 1 : tunables 54 27 8 : slabdata 1612 1612 108
    fib6_nodes 5 119 32 119 1 : tunables 120 60 8 : slabdata 1 1 0
    ip6_dst_cache 4 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
    ndisc_cache 1 20 192 20 1 : tunables 120 60 8 : slabdata 1 1 0
    rawv6_sock 4 11 704 11 2 : tunables 54 27 8 : slabdata 1 1 0
    udpv6_sock 0 0 704 11 2 : tunables 54 27 8 : slabdata 0 0 0
    tcpv6_sock 8 12 1216 3 1 : tunables 24 12 8 : slabdata 4 4 0

The first two numbers after each name are the active and total object counts, and the third is the object size in bytes. So ip_conntrack had 34543 entries in use when I looked, at 384 bytes each.

Let's check the current value of the limit:

    # sysctl net.ipv4.netfilter.ip_conntrack_max
    net.ipv4.netfilter.ip_conntrack_max = 65536

The table was already more than half full at a quiet moment, and the "table full" messages in the log show that we are hitting this limit during the spikes. Once the number of tracked connections reaches ip_conntrack_max, the kernel simply drops packets that would need a new entry. It does this so that the table cannot grow without bound and exhaust kernel memory. For maximum efficiency we keep this number at a power of 2.
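To see how close the table is to its limit without reading slabinfo by hand, a small check along these lines may help. This is only a sketch: the function name conntrack_usage_pct and the 80% warning threshold are my own choices, and the /proc paths are assumed to match an older kernel like the one in this post (newer kernels expose nf_conntrack_count and nf_conntrack_max under net.netfilter instead).

```shell
#!/bin/sh
# Print conntrack table usage as a whole-number percentage.
# Arguments: current entry count, configured maximum.
# (Hypothetical helper, not part of any standard tool.)
conntrack_usage_pct() {
    count=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "max must be positive" >&2
        return 1
    fi
    # Integer arithmetic: the result is rounded down.
    echo $(( count * 100 / max ))
}

# Assumed locations on older kernels; adjust for your system.
count_file=/proc/sys/net/ipv4/netfilter/ip_conntrack_count
max_file=/proc/sys/net/ipv4/netfilter/ip_conntrack_max

# Only report when both files exist, so the script is harmless elsewhere.
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    pct=$(conntrack_usage_pct "$(cat "$count_file")" "$(cat "$max_file")")
    echo "conntrack table is ${pct}% full"
    # 80 is an arbitrary warning threshold chosen for illustration.
    if [ "$pct" -ge 80 ]; then
        echo "WARNING: conntrack table is nearly full" >&2
    fi
fi
```

With the figures from this post, conntrack_usage_pct 34543 65536 prints 52.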
The upper bound depends on how much memory you have, because every tracked connection takes up kernel memory, so be careful: setting the limit too high may cause the system to run out. In my case I decided to double it to 131072. To set it temporarily, use sysctl:

    # sysctl -w net.ipv4.netfilter.ip_conntrack_max=131072
    net.ipv4.netfilter.ip_conntrack_max = 131072

Test everything out. If you run into problems with your network or the system crashes, a reboot will set the value back to what it was. To make the setting persist across reboots, add the following lines to your /etc/sysctl.conf file:

    # need to increase this due to volume of connections to the server
    net.ipv4.netfilter.ip_conntrack_max=131072

My theory is that because the server was dropping packets, remote hosts were re-sending their DNS requests. Whenever traffic was even mildly elevated, those retries amplified it into a 'flood', which produced the spikes in the traffic graph. After increasing ip_conntrack_max I immediately saw the bandwidth return to normal levels.

Your server should now be set against an onslaught of tiny packets, legitimate or not. If you have even more connections than you can safely track with ip_conntrack, you may need to move to the next level: hardware firewalls and other ways of doing packet inspection off-server on dedicated hardware.
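As a rough check on the memory caution above, the slabinfo output lists ip_conntrack objects at 384 bytes each, so the cost of a given limit can be estimated with simple arithmetic. The sketch below assumes that 384-byte figure (it varies by kernel and architecture) and counts only the entries themselves, not the hash table or other overhead, so treat the result as a lower bound. The function name conntrack_mem_mib is made up for this example.

```shell
#!/bin/sh
# Estimate memory used by a full conntrack table, in whole MiB.
# Arguments: number of entries, size of one entry in bytes.
# (Hypothetical helper; ignores hash table and allocator overhead.)
conntrack_mem_mib() {
    entries=$1
    objsize=$2
    echo $(( entries * objsize / 1048576 ))
}

# 384 bytes is the ip_conntrack object size from the slabinfo output
# in this post; check /proc/slabinfo on your own system.
limit=65536
for step in 0 1 2; do
    echo "$limit entries: about $(conntrack_mem_mib "$limit" 384) MiB"
    # Doubling keeps the limit at a power of 2.
    limit=$(( limit * 2 ))
done
```

For the value chosen here, 131072 entries at 384 bytes comes to about 48 MiB when the table is full, up from about 24 MiB at the old limit of 65536.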