Workgroup:
workgroup
Internet-Draft:
draft-song-dmsc-scalable-name-forwarding-00
Published:
Intended Status:
Standards Track
Expires:
31 July 2025
Authors:
T. Song
Beijing Institute of Technology
T. Li
Beijing Institute of Technology
R. Zhu
Beijing Institute of Technology
Q. Zhang
Beijing Institute of Technology
Q. Xia
Beijing Institute of Technology

Scalable Name-Based Packet Forwarding

Abstract

This document specifies a scalable approach to name-based packet forwarding, a fundamental component of content semantic network architectures. Unlike IP-based forwarding, name-based forwarding must address the challenges of handling variable-length, unbounded keys and significantly larger namespaces. The proposed approach leverages two key insights to enable scalable forwarding for billions of prefixes. First, by representing names as binary strings, forwarding tables with millions of entries can be effectively compressed using Patricia tries to fit within contemporary line card memory. Second, the data structure is designed and optimized to ensure its size is dependent only on the number of rules, not the length of the rules themselves.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 31 July 2025.

Table of Contents

1. Introduction

Name-based packet forwarding requires handling variable-length names, performing longest-prefix match (LPM) over a global namespace, and meeting high-throughput requirements within strict memory constraints.

Traditional LPM methods for IP forwarding have proven remarkably successful. Core innovations from the late 1990s, refined with techniques such as prefix expansion and bitmap representations, have enabled scalable, cost-effective solutions for IP LPM. With fixed-length IP addresses and relatively small rulesets (fewer than 500,000 entries), these implementations achieve high performance while requiring only a few megabytes of fast memory, such as SRAM or TCAM. For IP forwarding, LPM is widely considered a solved problem.

However, these methods are not directly applicable to content semantic name-based forwarding, which must handle variable-length, unbounded keys over a much larger namespace.

Existing solutions often assume hierarchical naming structures and rely on hash tables for FIB storage. These approaches are limited by their dependency on specific name formats and the inefficiencies of redundant data storage. Furthermore, their performance degrades with longer names, posing scalability challenges for large namespaces.

This document specifies a scalable approach to name-based forwarding that addresses these challenges using a binary Patricia trie, first proposed in [Song2015]. The design is guided by two key insights: first, names represented as binary strings allow forwarding tables with millions of entries to be compressed with a Patricia trie to fit within contemporary line card memory; second, the data structure can be designed so that its size depends only on the number of rules, not on their length.

This approach is intended to work across diverse namespaces, whether flat or hierarchical, without making assumptions about their specific characteristics.

1.1. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Definitions

3. Design Rationale

Scalable packet forwarding has two fundamental requirements: (1) fast FIB lookup and (2) a small memory footprint; the two are closely related.

The main contributor to FIB lookup time is memory access time, which depends on the type of memory used to store the FIB. TCAM and SRAM are commonly used on a line card for fast FIB lookup. However, a name-based FIB requires hundreds of MiB to store even a few million name prefixes, which cannot fit in TCAM or SRAM, whose capacity is typically limited to a few MiB.

The design goal of this draft is a compact FIB data structure whose size depends only on the number of rules, not on the length of the names.

In this draft, a binary Patricia trie data structure is used to minimize the redundant information stored, compressing FIBs of a few million entries to fit in tens of MiB of SRAM. Together with a corresponding forwarding behavior termed "speculative forwarding", only the differentiating bits between rules are stored, allowing the structure to scale to billions of FIB entries.

4. Binary Patricia Trie Functional Specification

4.1. Binary Patricia Trie Structure

A Patricia trie [PatriciaTrie], also known as a radix tree, is a space-optimized variant of the general trie. It is derived from a character-based prefix trie by merging every single-child node with its child. Internal nodes in a Patricia trie branch on discrimination characters, while leaf nodes contain the match information, i.e., the prefix.

Binary Patricia is a commonly used variant that treats prefixes as binary strings when building the trie. A binary Patricia is a full binary tree in which every non-leaf node has exactly two children.

Figure 1 shows a character-based Patricia and a binary Patricia built from the same FIB. In a character-based Patricia, each internal node may have at most 256 children, branched by octet characters. From the root to a leaf, all tokens stored in the nodes and characters along the arrows together form a complete prefix. The delimiter ('/') is treated as a character as well.

ASCII encoding is used to represent prefixes when building a binary Patricia, although other encoding schemes can be used as well. Each node has at most two children, branched by a bit value of 0 or 1. All bits in the nodes and along the arrows from the root to a leaf together form a complete prefix. A prefix is divided into a sequence of tokens; in this document, this binary Patricia is therefore also called a "tokenized binary Patricia". In particular, the branching bits 0 and 1 at an internal node are part of the sequence.

                                                     +--------------+
                             +-------+               | (/)_2 011000 |
                             |  -/-  |               +--------------+
---------+------             +-------+                  0 /      \1
 prefixes| port             a/       \c             +-------+    xxxxx
---------+------          +---+      xxxxx          | 10010 |    x 4 x
 /a/b/   |  1   ==>       |   |      x 4 x   ==>    +-------+    xxxxx
 /ab/c/  |  2          ---+-+-+--    xxxxx         0/      \1  1 (/d/)_2
 /ac/d/  |  3        //    b|   c\    /d/        +----+    xxxxx
 /c/d/   |  4      xxxxx xxxxx xxxxx             | 01 |    x 1 x
---------+-----    x 1 x x 2 x x 3 x             +----+    xxxxx
                   xxxxx xxxxx xxxxx            0/    \1  111 (b/)_2
                     b/   /c/   /d/          xxxxx   xxxxx
                                             x 2 x   x 3 x
                                             xxxxx   xxxxx
                                           (/c/)_2   (/d/)_2

       (a)                     (b)                         (c)

Figure 1: A Patricia and Binary Patricia trie: (a) A small set of prefixes; (b) Patricia representation; (c) Binary Patricia representation.

(str)_2: a binary representation of str.
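As a non-normative illustration, the following Python sketch shows one possible in-memory layout for a tokenized binary Patricia node as described above. The class and field names (Node, token, next_hop, children) are choices of this sketch, not part of this specification.

# Non-normative sketch of a tokenized binary Patricia node.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    token: str                               # bits stored at this node, e.g. "011"
    next_hop: Optional[int] = None           # set where a prefix ends (leaf or marked node)
    children: Optional[List["Node"]] = None  # [child for bit 0, child for bit 1]; None for a leaf

    def is_leaf(self) -> bool:
        return self.children is None

The complete prefix represented by a node is recovered by concatenating the tokens and branch bits along the path from the root, as in Figure 1(c).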

4.2. Insert

A binary Patricia trie is constructed by sequentially inserting FIB entries into an initially empty tree. The key insertion algorithm is detailed in Figure 6. The algorithm works as follows:

  • A common prefix is determined by comparing the token stored at the root node with the key k.

  • The process terminates in one of the following three cases:

    • The common prefix equals both k and the token (Figure 2).
    • The common prefix equals k but not the token (Figure 3).
    • The common prefix equals neither k nor the token (Figure 4).
           011                                011
          +---+                              +----+
          | 3 |         Insert 011(r3)       | r3 | (3)
          +---+           ========>          +----+
         0/   \1                            0/    \1
     xxxxxx   xxxxxx                    xxxxxx    xxxxxx
  10 x r1 x   x r2 x 11              10 x r1 x    x r2 x 11
     xxxxxx   xxxxxx                    xxxxxx    xxxxxx

Figure 2: Insert 011(r3) to {011010(r1), 011111(r2)}.
                                                    01
           011                                  +--------+
          +---+                                 |   r3   | (2)
          | 3 |                                 +--------+
          +---+          Insert 01(r3)          /        \1
         0/   \1           ========>          null       +---+
     xxxxxx   xxxxxx                                     | 3 | null
  10 x r1 x   x r2 x 11                                  +---+
     xxxxxx   xxxxxx                                    0/   \1
                                                 xxxxxx   xxxxxx
                                              10 x r1 x   x r2 x 11
                                                 xxxxxx   xxxxxx

Figure 3: Insert 01(r3) to {011010(r1), 011111(r2)}.
                                                 01
         011                                  +-------+
        +---+                                 |   2   |
        | 3 |                                 +-------+
        +---+        Insert 01010(r3)        0/       \1
       0/   \1          ========>        xxxxxx       +---+
   xxxxxx   xxxxxx                    10 x r3 x       | 3 | null
10 x r1 x   x r2 x 11                    xxxxxx       +---+
   xxxxxx   xxxxxx                                   0/   \1
                                                 xxxxxx   xxxxxx
                                              10 x r1 x   x r2 x 11
                                                 xxxxxx   xxxxxx

Figure 4: Insert 01010(r3) to {011010(r1), 011111(r2)}.

If the common prefix equals the token but not k, an additional iteration is required. In this case, the remaining sequence of k is used to continue the search at the lower-level nodes of the trie (Figure 5). For clarity, certain implementation details, such as labeling bit positions and creating new nodes, are omitted here but can be found in [PatriciaTrie] and [ComputerProg].

                                                  011
         011                                     +-------+
        +---+                                    |   3   |
        | 3 |                                    +-------+
        +---+      Insert 011110101(r3)         0/       \1
       0/   \1          ========>           xxxxxx       +---+
   xxxxxx   xxxxxx                       10 x r1 x       | 5 | 1
10 x r1 x   x r2 x 11                       xxxxxx       +---+
   xxxxxx   xxxxxx                                      0/   \1
                                                  xxxxxx   xxxxxx
                                              101 x r3 x   x r2 x null
                                                  xxxxxx   xxxxxx

Figure 5: Insert 011110101(r3) to {011010(r1), 011111(r2)}.

The structure of a binary Patricia trie is independent of the order in which keys are inserted. That is, for any given set of keys, the resulting trie is unique.

Input:  t: binary Patricia trie, rooted at t[0]
        k: binary digits of a key
Output: t: binary Patricia trie containing k
Algorithm:
cp = find_common(t[0].token, k); i = 0;
while TRUE do
  if cp == t[i].token and cp == k then
    mark t[i] with k's next-hop
  else if cp == t[i].token then
    i = t[i].next[k[|cp|]]        // descend on the branch bit after cp
    k = k[|cp|+1:]                // consume the matched token and the branch bit
    cp = find_common(t[i].token, k)
    continue
  else if cp == k then
    add two nodes with an # label (Figure 3)
  else
    add two nodes (Figure 4)
  end if
  break
end while

Figure 6: Key Insertion Algorithm in Binary Patricia.
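The following non-normative Python sketch, reusing the illustrative Node class from Section 4.1, implements the cases of Figure 6 recursively. The helper common_prefix plays the role of find_common; all names are choices of this sketch.

def common_prefix(a: str, b: str) -> str:
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return a[:i]

def insert(node, key: str, next_hop):
    """Insert a bit-string key with its next hop into the subtree rooted at node;
    returns the (possibly new) root of that subtree."""
    if node is None:                           # empty branch: the key becomes a leaf
        return Node(token=key, next_hop=next_hop)
    cp = common_prefix(node.token, key)
    if cp == node.token and cp == key:         # case 1 (Figure 2): mark this node
        node.next_hop = next_hop
        return node
    if cp == node.token:                       # case 2 (Figure 5): descend on the branch bit
        if node.children is None:
            node.children = [None, None]
        b = int(key[len(cp)])
        node.children[b] = insert(node.children[b], key[len(cp) + 1:], next_hop)
        return node
    # cases 3 and 4 (Figures 3 and 4): split this node at the common prefix
    b_old = int(node.token[len(cp)])
    old = Node(token=node.token[len(cp) + 1:], next_hop=node.next_hop,
               children=node.children)
    parent = Node(token=cp, children=[None, None])
    parent.children[b_old] = old
    if cp == key:                              # case 3: the new parent carries the key
        parent.next_hop = next_hop
    else:                                      # case 4: the key goes into a new sibling leaf
        b_new = int(key[len(cp)])
        parent.children[b_new] = Node(token=key[len(cp) + 1:], next_hop=next_hop)
    return parent

For example, with root = Node("011", children=[Node("10", next_hop=1), Node("11", next_hop=2)]) standing for {011010(r1), 011111(r2)}, insert(root, "011110101", 3) produces the structure on the right-hand side of Figure 5.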

4.3. Remove

To remove a key from a binary Patricia trie:

  • Perform a key query to locate the corresponding leaf node.
  • Delete the leaf node and recursively merge any single-child nodes with their children along the path back toward the root.
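A non-normative Python sketch of this procedure, continuing the insertion sketch in Section 4.2, is shown below. Unmarked single-child nodes are merged with their remaining child, while a marked node keeps an empty branch, as in Figure 3.

def remove(node, key: str):
    """Remove a bit-string key from the subtree rooted at node; returns the new
    subtree root, or None if the subtree becomes empty."""
    if node is None:
        return None
    cp = common_prefix(node.token, key)
    if cp == node.token and cp == key:
        if node.children is None:              # plain leaf: drop it
            return None
        node.next_hop = None                   # marked internal node: unmark it
    elif cp == node.token and node.children is not None:
        b = int(key[len(cp)])                  # descend on the branch bit
        node.children[b] = remove(node.children[b], key[len(cp) + 1:])
    else:
        return node                            # key not present
    # node.children is not None here; merge or prune as needed
    kids = [(b, c) for b, c in enumerate(node.children) if c is not None]
    if not kids:                               # both branches empty
        node.children = None
        return node if node.next_hop is not None else None
    if len(kids) == 1 and node.next_hop is None:
        b, child = kids[0]                     # merge with the single remaining child
        child.token = node.token + str(b) + child.token
        return child
    return node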

4.4. Update

Updating a key involves:

  • Deleting the old key using the procedure described in Section 4.3.
  • Inserting the new key into the trie following the procedure described in Section 4.2.
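Continuing the same non-normative sketch, an update is a removal of the old entry followed by an insertion of the new one:

def update(root, key: str, next_hop):
    """Replace the rule for key: delete the old entry (Section 4.3), then insert
    the new one (Section 4.2)."""
    return insert(remove(root, key), key, next_hop)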

5. Speculative Forwarding

In speculative forwarding, the binary Patricia trie removes tokens, leaving only the differentiating bits between prefixes. This reduces memory usage, making it independent of name length and therefore more scalable. However, it also shifts the lookup behavior from longest-prefix match (LPM) to longest-prefix classification (LPC). From the perspective of matching, LPM can be seen as two steps: longest-prefix classification, which filters candidate rules, followed by verification, which confirms the correct one; speculative forwarding performs only the first step.

These aspects are discussed in the following sections.

5.1. Forwarding Behaviors and Speculative Data Plane

5.1.1. Forwarding Behaviors

Forwarding behaviors in name-based systems involve matching packet names to prefixes. Conventional forwarding uses longest-prefix match (LPM), where classification identifies the longest matching prefix and a verification step ensures correctness. Speculative forwarding relays packets by longest-prefix classification (LPC) alone. LPC guarantees that if a matching prefix exists, the packet is forwarded to the same next hop that LPM would select; if no prefix matches, the packet is still forwarded.

Specifically, speculative forwarding omits the verification step. While it ensures accurate forwarding for packets with known prefixes, it may incorrectly forward packets with unknown prefixes that a traditional LPM system would drop.

5.1.2. Speculative Data Plane

The speculative data plane uses speculative forwarding in the default-free zone (DFZ) and conventional forwarding in networks with default routes. DFZ routers handle all global name prefixes and require high-speed forwarding, benefiting from speculative forwarding's smaller routing tables and faster lookups. Edge routers, which manage fewer prefixes due to default routes, use conventional forwarding as they can store tokens and terminate packets misforwarded by speculative forwarding. This hybrid approach combines speculative forwarding in core networks and conventional forwarding in edge networks to create an efficient global data plane.

5.2. Dual Binary Patricia for Speculative Forwarding

Dual Binary Patricia (DuBP) supports LPC in a speculative data plane by dividing the FIB (S) into two subsets:

Flat Subset (F): Contains flat name prefixes where no prefix is a proper prefix of another. Uses a speculative Patricia for efficient lookups.

Covered Subset (P): Contains prefixes covered by those in the flat subset. Uses a tokenized Patricia for verification.

Thus, S = F + P. This structure combines speculative forwarding for scalability with conventional forwarding for accuracy, optimizing memory usage and lookup speed. As its name implies, dual Patricia is a data structure that consists of two different styles of Patricia: a speculative binary Patricia (sBP) for subset F and a tokenized binary Patricia (tBP) for subset P.

5.2.1. Speculative Binary Patricia

Speculative Binary Patricia (sBP) is a binary Patricia variant designed for speculative forwarding. It eliminates token storage in both internal and leaf nodes. Instead, each internal node stores a discrimination bit position to determine the branching direction based on the bit value at that position in the lookup name. For example, if the root node is labeled p:15, it checks the 15th bit of the lookup name and branches left if the bit is 0 or right if it is 1. The structure only uses discrimination bits for branching and does not store or verify other prefix information.
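A non-normative Python sketch of an sBP lookup is given below. Internal nodes hold only a discrimination bit position and leaves hold only a port; the class names and the 0-based bit indexing are choices of this sketch (the text above counts bit positions from 1, as in p:15).

# Non-normative sketch of speculative Binary Patricia (sBP) lookup.
from dataclasses import dataclass
from typing import Union

@dataclass
class SBPLeaf:
    port: int                          # next-hop port; no token is stored

@dataclass
class SBPNode:
    bit_pos: int                       # discrimination bit position (0-based here)
    child0: Union["SBPNode", SBPLeaf]  # followed when the bit is 0
    child1: Union["SBPNode", SBPLeaf]  # followed when the bit is 1

def sbp_lookup(node: Union[SBPNode, SBPLeaf], name_bits: str) -> int:
    """Longest-prefix classification: always returns a port, with no verification."""
    while isinstance(node, SBPNode):
        # Bits beyond the end of the name are treated as 0 in this sketch.
        bit = name_bits[node.bit_pos] if node.bit_pos < len(name_bits) else "0"
        node = node.child1 if bit == "1" else node.child0
    return node.port

Because no tokens are stored, the returned port is only a classification result; any verification happens outside the sBP (Section 5.2.2).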

5.2.2. Dual Binary Patricia

Dual Binary Patricia (DuBP) consists of two parts: a tokenized Patricia for the proper prefix subset (P) to perform longest-prefix match (LPM) and a speculative Patricia for the flat name subset (F) to perform longest-prefix classification (LPC). For name lookup, the packet name is first processed by the tokenized Patricia. If no match is found, it is sent to the speculative Patricia, which always outputs a port. This design supports speculative forwarding while maintaining classification for flat names. If multiple ports are matched, the forwarding strategy determines the correct port.
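The lookup flow described above can be sketched, non-normatively, by composing an LPM over the tokenized Patricia of Section 4 with the sBP classification above. The helper tbp_lookup below is written for this sketch and reuses the illustrative Node class from Section 4.1.

def tbp_lookup(node, key: str):
    """LPM over the tokenized Patricia (subset P): returns the next hop of the
    longest marked prefix of key, or None on a miss."""
    best = None
    while node is not None and key.startswith(node.token):
        key = key[len(node.token):]
        if node.next_hop is not None:
            best = node.next_hop          # remember the longest match so far
        if node.children is None or key == "":
            break
        branch = int(key[0])              # the branch bit is part of the prefix
        key = key[1:]
        node = node.children[branch]
    return best

def dubp_lookup(tbp_root, sbp_root, name_bits: str) -> int:
    """DuBP lookup: try LPM over subset P first; on a miss, fall back to
    speculative classification (LPC) over subset F, which always yields a port."""
    hit = tbp_lookup(tbp_root, name_bits)
    return hit if hit is not None else sbp_lookup(sbp_root, name_bits)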

6. IANA Considerations

No IANA action is required.

7. Security Considerations

7.1. Scalability Concerns

As the network size grows, the scalability of name-based packet forwarding systems becomes increasingly critical. The FIB, which stores all the routing information, may grow to billions of entries, making it difficult to manage and look up packet names efficiently. Without proper solutions, the forwarding process can slow down, and the system may not be able to scale effectively to handle the growing number of packets and names.

To address scalability concerns, advanced data structures like Patricia tries or hash tables can be used to compress the FIB, improving both memory usage and lookup efficiency. Additionally, distributed systems can share the load of forwarding decisions across multiple nodes, allowing for parallel processing of large numbers of forwarding entries. Incremental updates to the FIB can also help, enabling the system to adjust as network conditions change without requiring complete recalculations or reconfigurations.

Despite these solutions, there are significant trade-offs. Patricia tries can still cause memory overhead when the FIB grows very large, as even compressed data structures require substantial resources to store billions of entries. Distributed systems introduce complexities like synchronization across nodes, which can increase latency and decrease system reliability. Furthermore, incremental updates can complicate real-time routing decisions, as they might cause delays in adapting to rapid network changes or lead to inconsistencies in the forwarding table.

7.2. Denial of Service (DoS) Attacks

DoS attacks represent a significant security threat to name-based packet forwarding systems. Attackers can overwhelm the system by sending a high volume of requests or injecting malformed packets, which may exhaust system resources and disrupt normal operations.

To mitigate DoS attacks, packet filtering and rate limiting can be implemented. These measures can prevent malicious packets from consuming excessive resources by limiting the frequency of requests and filtering out illegitimate packets. Additionally, distributed load balancing can help distribute the traffic load across multiple nodes, preventing a single node from becoming overwhelmed.

While packet filtering and rate limiting are effective, they may introduce additional latency, particularly under high traffic conditions. Distributed load balancing requires a more complex infrastructure and management, which can increase the cost and introduce potential points of failure.

8. References

8.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

8.2. Informative References

[ComputerProg]
Knuth, D. E., "The Art of Computer Programming, Volume 3: Sorting and Searching", Addison-Wesley.
[PatriciaTrie]
Morrison, D. R., "PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric", Journal of the ACM, Vol. 15, No. 4, 1968.
[Song2015]
Song, T., Yuan, H., Crowley, P., and B. Zhang, "Scalable Name-Based Packet Forwarding: From Millions to Billions", Proceedings of the 2nd ACM Conference on Information-Centric Networking (ICN 2015), 2015.

Authors' Addresses

Tian Song
Beijing Institute of Technology
Beijing
Tianlong Li
Beijing Institute of Technology
Beijing
Ran Zhu
Beijing Institute of Technology
Beijing
Qianyu Zhang
Beijing Institute of Technology
Beijing
Qianhui Xia
Beijing Institute of Technology
Beijing