Jozsef Kadlecsik, IPSET status

October 1, 2008 on 3:06 pm | In Uncategorized | No Comments

Latest developments

Latest developments include a new set type, ipportiphash, which can be used to store an IP address together with an arbitrary port number, and setlist, which is a union of sets.

Milestone planning

Ipset 2.4 will be released in October and will feature the new modules. Around the end of 2008, ipset 3.0 should be released. It will feature a new protocol based on netlink-compatible TLVs and keep the previous protocol for backward compatibility. If all goes well, ipset will be renamed to nfset and the old protocol will be removed (mid-2009). After the renaming to nfset, work will focus on IPv6 support and on the merging effort.

Patrick McHardy, NFtables

September 30, 2008 on 5:24 pm | In Uncategorized | No Comments

Iptables and Netfilter problems (bashing iptables)

  • Full dump during ruleset update. Userspace is very primitive. Extensions build a blob which is passed to the core. There is no optimisation
  • Very little abstraction: UDP and TCP ports for example don’t share port matching code
  • Inconsistencies among matches: support for negation, ranges and prefixes differs.
  • Iptables parsing code is not in the core, which introduces inconsistencies in the command line
  • One target per rule: It requires rule duplication and duplicated changes (ex: LOG + DROP, MARK + ACCEPT)
  • Static target module parametrization: the options of a target expand to constants, so a value cannot be set flexibly at runtime. This leads to building specific modules such as IPMARK, IPCLASSIFY and could lead to a TCPPORTCONNMARK module

Iptables good side

  • Lots of matches, very flexible classification
  • Features beside packet filtering: load balancing, multipath routing, packet manipulation
  • Fast built-in filtering operations compared to TC
  • Easy to add new extensions, but this ease is also responsible for the poor quality of the external modules

Some other classifiers

  • tc has u32 which is powerful but not easy to use
  • BPF is interesting because it is fully programmable

It could be interesting to have a syntax similar to that of the other Linux classifiers. NFtables aims at this.

Feature wishlist

  • Use netlink for incremental changes and change notifications
  • Multijump: Extensions can return arbitrary verdicts, and in particular jumps, in order to direct the flow into subchains
  • Runtime target parametrization

Nftables components

Three components:

  • Kernel implementation
  • nftables userspace frontend: parses the textual ruleset representation, performs error checks and postprocessing, and finally passes the raw data needed to libnl
  • libnl: netlink message parsing and construction

The libnl point is heavily discussed because libnl is distributed under the LGPL. Almost all developers do not want to make life easier for writers of proprietary interfaces; they would prefer to build a GPL-licensed library that could be used by third-party open source projects.

Libnl: Legal point and objectives

This legal point is followed by a discussion on the problem of avoiding code duplication and on how far the userspace tools provided by the Netfilter project should go. Some Netfilter developers want to provide only a library that stays close to the code and let other developers bring the tool to a higher level. Others want to provide a more user-ready interface.

Some Nftables details

Patrick shows substantial work on the internal structure definitions. He has tried to factor things out and, specifically, to avoid putting protocol-related information in the main structures. This is the case for the packet structure, which contains a pointer to the network header and a pointer to the transport header. These are used to jump easily to protocol data.

Filter expressions look like TC u32 syntax and are expressed internally as register-based operations.

ip daddr 192.168.0.1

[ payload load 4 offset network header + 16 => reg 1 ]
[ compare reg 1 192.168.0.1 ]
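
The two operations above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions, not nftables code: the tuple encoding of the operations, the `run` function and the `PacketView` class are hypothetical names invented for the illustration, and the packet is a hand-built IPv4 header (the destination address sits at offset 16 of the network header, as in the listing above).

```python
import socket
import struct
from dataclasses import dataclass


@dataclass
class PacketView:
    """Hypothetical packet structure: raw bytes plus header offsets."""
    data: bytes
    network_header: int    # offset of the network (IP) header in data
    transport_header: int  # offset of the transport header in data


def run(program, pkt):
    """Evaluate register operations; True if every compare matches."""
    regs = {}
    for op in program:
        if op[0] == "payload_load":
            _, length, base, offset, reg = op
            start = getattr(pkt, base) + offset
            regs[reg] = pkt.data[start:start + length]
        elif op[0] == "compare":
            _, reg, value = op
            if regs[reg] != value:
                return False  # mismatch: the rule does not apply
        else:
            raise ValueError(f"unknown operation {op[0]!r}")
    return True


def ipv4_header(src, dst, proto=6):
    """Build a minimal 20-byte IPv4 header (checksum left at 0)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, proto, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))


# "ip daddr 192.168.0.1" expressed as register operations
ip_daddr_rule = [
    ("payload_load", 4, "network_header", 16, 1),
    ("compare", 1, socket.inet_aton("192.168.0.1")),
]

match_pkt = PacketView(ipv4_header("10.0.0.1", "192.168.0.1"), 0, 20)
other_pkt = PacketView(ipv4_header("10.0.0.1", "192.168.0.2"), 0, 20)
print(run(ip_daddr_rule, match_pkt))  # True
print(run(ip_daddr_rule, other_pkt))  # False
```

Because the load is relative to a header pointer rather than to a protocol-specific field, the same two operations would serve any fixed-offset field, which is the kind of genericity the talk describes.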

Sets

Sets are supported and they should be able to support ipset-type sets. They are currently represented as an rbtree. This is suboptimal in many ways but there are a lot of possibilities for optimization.

An interval tree set is implemented. It could be combined with jump maps to implement the nf-hipac algorithm, but incremental changes would be very difficult.

A hash set has been developed too. It is a fixed-size (non-resizable) hash and can be used for arbitrary sets.
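
To make the interval idea concrete, here is a minimal sketch of an interval set lookup. It is an illustration only: it swaps the kernel tree for a sorted Python list searched with `bisect`, and the `IntervalSet` class is a hypothetical name, not part of nftables. Prefixes and single addresses are both stored as [start, end] integer ranges.

```python
import bisect
import ipaddress


class IntervalSet:
    """Sorted, non-overlapping [start, end] ranges of IPv4 addresses."""

    def __init__(self, elements):
        ranges = []
        for elem in elements:
            net = ipaddress.ip_network(elem, strict=False)
            ranges.append((int(net.network_address),
                           int(net.broadcast_address)))
        ranges.sort()
        merged = []
        for start, end in ranges:
            # Merge ranges that overlap or touch the previous one
            if merged and start <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        self.starts = [s for s, _ in merged]
        self.ends = [e for _, e in merged]

    def __contains__(self, addr):
        value = int(ipaddress.ip_address(addr))
        # Rightmost range whose start is <= value, if any
        i = bisect.bisect_right(self.starts, value) - 1
        return i >= 0 and value <= self.ends[i]


addrs = IntervalSet(["192.168.0.0/24", "192.168.1.1", "10.0.0.0/8"])
print("192.168.0.77" in addrs)  # True
print("192.168.1.2" in addrs)   # False
```

A sorted array gives logarithmic lookups but makes insertion expensive, which is one way to see why incremental changes are the hard part of such structures.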

Matching

  • Payload: The payload module implements generic packet parsing. It supports encapsulation even if the userspace part is not implemented
  • Meta: It is used to match on specific Netfilter data (length, protocol, mark, iifindex, iifname, …)
  • ct: conntrack matching
  • concat (planned): concatenate multiple keys to do multidimensional equality expressions and lookups

Kernel userspace interface

This is basically a netlink interface:

  • standard operations (GET/NEW/DELETE)
  • tables, chains and rules can be addressed

Performance comparison with iptables

For a ruleset consisting of 1000 instances of an ICMP rule, the average number of cycles per rule is 110 for iptables and 115-120 for nftables. Performance is thus quite comparable, although nftables is more generic and has not been optimized yet.

Userspace tool

The basic syntax is the following:

nft rule add filter output tcp dport ssh

nft rule add filter output ip daddr 191.68.0.1 ip protocol tcp

or

nft rule add filter output tcp dport == 22

nft rule add filter output ip daddr == 191.68.0.1 ip protocol == tcp

The implicit equality sign introduces some complexity in the implementation.

nft rule add filter output tcp flags syn

Flags support is not fully implemented but syntax could look like this:

nft rule add filter output tcp flags syn | ack

It is possible to define composed objects:

nft add filter output ip addr  {192.168.0.0/24, 192.168.1.1, 10.0.0.0/8}

Patrick McHardy, one year of Netfilter development

September 30, 2008 on 11:09 am | In Uncategorized | No Comments

The developer days start with a presentation by Patrick on the past year's evolution of Netfilter.

2.6.24

  • conversion of nfnetlink to new netlink infrastructure
  • conntrack defragmentation code (for IPv6)
  • remove “unique ID” from connection tracking (but still problems to identify connection)
  • ctnetlink for creating related connections
  • new time match
  • bridging fixes (netfilter bridging still suffers from a lack of time)
  • fix device reference leak

2.6.25

  • TCPOPTSTRIP: work around broken firewalls
  • x_tables unification:
  • SAME removal
  • RATEEST: useful for load balancing
  • GID logging in nfnetlink_log (and family)
  • Queueing cleanups and optimisations: deobfuscation, RCU for queue instance hash, resync of ip6_queue
  • ip6_tables/arp_tables compat support
  • hash optimization for conntrack hash (multiply+shift instead of modulo)
  • use of RCU for conntrack hash
  • getting rid of non-fixed-size types (unsigned long for example)
  • IPv6 iprange
  • hashlimit: support for network prefix and negation
  • namespace support work
  • conntrack support for inactive expectations
  • conntrack expectation group and limit
  • rewrite of SIP conntrack helper
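
The multiply+shift optimization mentioned in the list above replaces a division by a multiplication when a hash value is reduced to a bucket index. The following is a small sketch of the idea, assuming a 32-bit hash value; the function names are hypothetical and `zlib.crc32` only stands in for the real conntrack hash function.

```python
import zlib


def bucket_modulo(hash32, size):
    """Classic reduction: remainder of the division by the table size."""
    return hash32 % size


def bucket_mul_shift(hash32, size):
    """Scale a 32-bit hash into [0, size) with one multiply and one shift."""
    return ((hash32 & 0xFFFFFFFF) * size) >> 32


size = 1000  # the table size does not need to be a power of two
for key in (b"10.0.0.1:1234", b"10.0.0.2:80", b"192.168.0.1:22"):
    h = zlib.crc32(key)  # stand-in for the kernel hash function
    print(key, bucket_modulo(h, size), bucket_mul_shift(h, size))
```

Both reductions map every hash into the valid bucket range; they simply pick different buckets. The multiply+shift form uses the high bits of the hash and avoids the comparatively expensive division.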

2.6.26

  • don’t abort dump operations during ruleset changes
  • nf_nat_proto_common
  • UDP-LITE SCTP DCCP NAT

2.6.27

  • SCTP support for ctnetlink
  • conntrack accounting fixes for abnormal connection termination
  • fast dead TCP connection detection: keep track of unacknowledged data and mark the connection dead if there is no answer after some time (5 min), instead of waiting for the 5-day timeout
  • NAT port randomization improvements
  • new “security” table for SELinux
  • more netfilter namespace work

Patrick finally concludes that development focused on the core during the previous year and has now moved to the surrounding areas. Things seem to be in better shape.

Ulrich Weber, ASG V7 Cluster

September 12, 2007 on 2:43 pm | In Uncategorized | No Comments

Ulrich Weber presents the work done at Astaro to provide active clusters. He works with Patrick McHardy on this subject.

The goal was to build a high-availability/load-balancing system without an external load balancer. All services are available on a single IP.

The target services are IPsec and proxies. The distribution algorithms are round robin and per-IP hash.

The master distributes the packets to the slaves and forwards each packet to the network after having received the answer from the slave. The conntrack has been modified to contain the ID of the node, so that all packets of a connection are distributed to the same node.
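
The distribution logic described above can be sketched as follows. This is an illustration under stated assumptions, not Astaro's implementation: the `Distributor` class, the flow tuple layout and the use of `zlib.crc32` as the per-IP hash are all hypothetical. The dictionary plays the role of the modified conntrack entry that remembers the node ID.

```python
import itertools
import zlib


class Distributor:
    """Pick a node per connection and stick to it for later packets."""

    def __init__(self, nodes, policy="round_robin"):
        if policy not in ("round_robin", "ip_hash"):
            raise ValueError(f"unknown policy {policy!r}")
        self.nodes = list(nodes)
        self.policy = policy
        self._next_node = itertools.cycle(self.nodes)
        self.conntrack = {}  # flow tuple -> node ID

    def pick(self, flow):
        """flow is (src_ip, src_port, dst_ip, dst_port)."""
        node = self.conntrack.get(flow)
        if node is None:  # first packet of the connection
            if self.policy == "round_robin":
                node = next(self._next_node)
            else:
                src_ip = flow[0]
                index = zlib.crc32(src_ip.encode()) % len(self.nodes)
                node = self.nodes[index]
            self.conntrack[flow] = node
        return node


rr = Distributor(["node1", "node2", "node3"])
flow_a = ("10.0.0.1", 40000, "192.0.2.1", 80)
flow_b = ("10.0.0.2", 40001, "192.0.2.1", 80)
print(rr.pick(flow_a))  # node1
print(rr.pick(flow_b))  # node2
print(rr.pick(flow_a))  # node1 again: the connection stays on its node
```

With the per-IP hash policy, every connection from the same source address lands on the same node, which can matter for services that keep per-client state.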

Packets cannot be sent in a raw format, as the slave would lack some information. A WARP structure is therefore used to send all the needed information from the master to the slave. The packet reinjection system is quite hacky because it injects packets into the middle of the network stack. It thus only works between systems which run exactly the same version of the system.

IPsec load balancing shows that traffic increases from 100 Mb/s when a first node is added. With more than three nodes, the master's task has to be limited to packet distribution in order to maximize performance. Performance tests show that, with three machines, the IPsec bandwidth reached is about 850 Mb/s.

Jozsef Kadlecsik, ippool and ipset

September 12, 2007 on 2:01 pm | In Uncategorized | No Comments

Jozsef Kadlecsik presents ippool and ipset.

Ipset is a matching system which supports a lot of set types:

  • ipmap is a bitmap where each bit represents one IP.
  • macipmap is a bitmap where each bit represents one IP and ARP address
  • portmap is the same for port
  • iphash stores IP in a hash
  • nethash is a variant of iphash which also stores the mask
  • ipporthash stores IP addresses within a /16 range together with an arbitrary port
  • iptree stores IPv4 addresses in a tree-like structure
  • iptreemap
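
As an illustration of the simplest type in the list above, here is a minimal sketch of the ipmap idea, one bit per address of a fixed network. It is not ipset code: the `IpMap` class and its method names are hypothetical, and the real kernel module differs in many details.

```python
import ipaddress


class IpMap:
    """Bitmap with one bit per IPv4 address of a fixed network."""

    def __init__(self, network):
        self.net = ipaddress.ip_network(network)
        self.base = int(self.net.network_address)
        # One bit per address, rounded up to whole bytes
        self.bits = bytearray((self.net.num_addresses + 7) // 8)

    def add(self, addr):
        ip = ipaddress.ip_address(addr)
        if ip not in self.net:
            raise ValueError(f"{addr} is outside {self.net}")
        index = int(ip) - self.base
        self.bits[index >> 3] |= 1 << (index & 7)

    def __contains__(self, addr):
        ip = ipaddress.ip_address(addr)
        if ip not in self.net:
            return False
        index = int(ip) - self.base
        return bool(self.bits[index >> 3] & (1 << (index & 7)))


lan = IpMap("192.168.0.0/24")
lan.add("192.168.0.10")
print("192.168.0.10" in lan)  # True
print("192.168.0.11" in lan)  # False
print(len(lan.bits))          # 32 bytes for 256 addresses
```

The lookup is a single memory access, but the memory cost grows with the size of the covered range, which is why the hash and tree types exist for sparse or large address spaces.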

The problem with hash types is that you need to tune the hash properties to your own needs. There is no magic value.
Recent work includes a test suite that tries to exercise all possible bugs.

Jozsef pointed out an error he made when defining the match command line option. He used --set, which is too generic and is used by other matches.

Nfset will be the future of ipset. It is an nfnetlink port of ipset. Work is in progress but Jozsef lacks time.

Discussions:

The merging of ipset into Netfilter is discussed. The main issue is still the userspace-to-kernel interface. Nfset will also need to focus on IPv6. It is decided that ipset will stay in POM and that the merge will be done when nfset is ready. Davem badly wants to have ipset in the kernel because it is a missing feature. The only concern is that the merge will cause incompatible changes. Samir Bellabes points out that some distributions, such as Mandriva, already ship an ipset-patched kernel. Harald agrees that the main concern relates to distributions. As they test their kernels before release, it should not be a problem for them.

Samir offers to work on ipset and do the integration work. He already knows the project and will be able to improve things faster than Jozsef, who does not have enough time.

Samir raises the user demand for a P2P blocking system. Patrick and Harald propose to use the string match to do so and hope somebody will contribute a script that could block P2P.

Netfilter developments in the past two years

September 11, 2007 on 10:51 am | In Uncategorized | No Comments

Patrick McHardy gave a brief overview of two years of development on Netfilter. He describes the work done for each new kernel version.

2.6.14

This was the release of ctnetlink, the new connection tracking message system.

2.6.15

Merge of the new implementation of connection tracking that is no longer IPv4-only: nf_conntrack.

IPv6 conntrack without NAT is possible

2.6.16

It adds IPsec support and in particular it adds the policy match.
x_tables has been introduced to unify some of the code, but a complete unification was not possible for ABI compatibility reasons.

2.6.17

It brings an H323 helper and the important merge of the ip_tables compat layer for 32/64-bit.

It also features HW checksumming verification.

2.6.18
POM merging:

  • quota
  • statistic

fixed timeout for ctnetlink (mainly for nufw)
amanda conntrack helper: textsearch API conversion

H323 call forwarding
SIP

2.6.19
Checksum enhancements
Some GSO (Generic Segmentation Offload) fixes
ctnetlink optimization: don’t allocate when no listeners are present
TCP conntrack dead connection: don’t count pure window update as retransmit
ip_tables compat: simplification -> just specify the size difference due to alignment
32/64 compat support for a lot of matches: mark, connmark, limit
Add an automatic system to do endian check

2.6.20

more endian annotations for automatic problem detection
nf_conntrack: split file
nf_conntrack: finally a replacement to ip_conntrack
* All helpers are ported
New NFLOG target

2.6.21
new SANE protocol HELPER
TCP conntrack improvement: liberal tracking for picked up connection
TCPMSS in x_tables
NAT source randomization
Proper use of RCU throughout Netfilter (was hand made before)

2.6.22
removal of IPV4 only connection tracking/NAT
NAT helper HW checksum
more compat support: “conntrack” “ULOG”
DNAT port randomization: can be used to distribute the target across multiple ports
No complaints about the removal of IPv4-only connection tracking

2.6.23
packet mangling optimization:

  • Don’t copy headerless clones, 90% less CPU in some cases
  • new target u32

new target TRACE:

  • iptables -t raw $FILTER -j TRACE
  • TRACE: a line for each reached hook

Conntrack extension API:

  • conntrack conversion to hlists for better cache locality
  • removal of ability to fully mask helper and expectation tuples

Hash expectation by destination:

  • remove easy DOS
  • limit per destination

CLUSTERIP compat
new connlimit match
UDPLITE support

Second session (nfnetlink and tproxy cases)

October 5, 2005 on 4:51 pm | In Uncategorized | No Comments

Martin Josefsson presented the “hashtrie”, a new data structure that could replace the hash table used to store conntrack entries. It has very good lookup performance, but delete and insert are quite slow. In fact, this is not really a problem, as a lot of other time-consuming things already happen during these steps. He also pointed out that the default ratio of bucket size to hash table size is really bad.

After that, Harald Welte gave a presentation of the new nfnetlink framework, which will bring a bunch of new things such as ulogd2 and improved capabilities in the new queue extension.

The planned discussion about the Tproxy module went far beyond the originally defined scope, as the ability to send expectations from userspace to conntrack was seen as a key point.

Hardware Firewall

October 3, 2005 on 5:41 pm | In Uncategorized | 1 Comment

Javier de Miguel Rodriguez gave a really good talk about choosing hardware dedicated to firewalling. He pointed out interesting things such as the current performance problems of netfilter when dealing with a lot of packets. Results of benchmarks performed by Mara Systems show that performance drops as soon as the ruleset reaches a certain number of rules.

After showing this performance issue and discussing connection tracking size optimisation, he focused on hardware configuration. To sum up:
- choose a dual Opteron with one of the processors dedicated to interrupts: the bus is better for the memory accesses heavily used by the hash
- use Ethernet cards on PCI-X or PCI Express: PCI is REALLY not able to sustain gigabit traffic
- use Ethernet cards with NAPI and zero-copy features
