DetNet                                                      Shaofu. Peng
Internet-Draft                                                       ZTE
Intended status: Standards Track                               Peng. Liu
Expires: 22 December 2024                                   China Mobile
                                                         Kashinath. Basu
                                               Oxford Brookes University
                                                              Aihua. Liu
                                                                     ZTE
                                                              Dong. Yang
                                             Beijing Jiaotong University
                                                             Guoyu. Peng
                      Beijing University of Posts and Telecommunications
                                                            20 June 2024


               Timeslot Queueing and Forwarding Mechanism
             draft-peng-detnet-packet-timeslot-mechanism-07

Abstract

IP/MPLS networks use packet switching (with the store-and-forward feature) and are based on statistical multiplexing. Statistical multiplexing is essentially a variant of time division multiplexing, which refers to the asynchronous and dynamic allocation of link timeslot resources. In this case, a service flow does not occupy a fixed timeslot, and the length of a timeslot is not fixed but depends on the size of the packet. Statistical multiplexing faces certain challenges and complexity in meeting deterministic QoS, and its delay performance depends on the queueing mechanism used. This document further describes a generic time division multiplexing scheme for IP/MPLS networks, which we call the timeslot queueing and forwarding (TQF) mechanism. TQF is an enhancement based on TSN TAS: it defines a cyclic period consisting of multiple timeslots, and a flow is assigned to be transmitted within one or more dedicated timeslots.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Peng, et al.              Expires 22 December 2024                [Page 1]

Internet-Draft      Timeslot Queueing and Forwarding           June 2024

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 22 December 2024.

Copyright Notice

Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Terminology
   3.  Overview
   4.  Timeslot Mapping Relationship
     4.1.  Deduced by BTM
     4.2.  Deduced by BOM
   5.  Resources Used by TQF
   6.  Arrival Position in the Orchestration Period
   7.  Residence Delay Evaluation
     7.1.  Residence Delay on the Ingress Node
     7.2.  Residence Delay on the Transit Node
     7.3.  Residence Delay on the Egress Node
     7.4.  End-to-end Delay and Jitter
   8.  Flow States in Data-plane
   9.  Queue Allocation Rule of Round Robin Queue
   10. Queue Allocation Rule of PIFO Queue
     10.1.  PIFO with On-time Scheduling Mode
     10.2.  PIFO with In-time Scheduling Mode
   11. Global Timeslot ID
   12. Multiple Orchestration Periods
   13. Admission Control on the Headend
   14. Frequency Synchronization
   15. Evaluations
     15.1.  Examples
   16. Taxonomy Considerations
   17. IANA Considerations
   18. Security Considerations
   19. Acknowledgements
   20. References
     20.1.  Normative References
     20.2.  Informative References
   Authors' Addresses

1.  Introduction

IP/MPLS networks use packet switching (with the store-and-forward feature) and are based on statistical multiplexing. Multiplexing in networks was first discussed in the context of time division multiplexing (TDM), frequency division multiplexing (FDM), and other technologies of circuit-switched telephone networks. Statistical multiplexing is essentially a variant of time division multiplexing, which refers to the asynchronous and dynamic allocation of link resources. In this case, a service flow does not occupy a fixed timeslot, and the length of a timeslot is not fixed but depends on the size of the packet.
In contrast, in synchronous time division multiplexing a sampling frame (also termed a time frame) contains a fixed number of fixed-length timeslots, and the timeslot at a specific position is allocated to a specific service. Statistical multiplexing achieves higher link utilization than synchronous time division multiplexing. However, providing deterministic end-to-end delay in packet-switched networks based on statistical multiplexing is more difficult than with synchronous time division multiplexing. The main challenge is to obtain a deterministic upper bound on the queueing delay, which is closely related to the queueing mechanism used in the network.

Besides IP/MPLS networks, other packet-switched network technologies such as ATM also address how to provide transmission quality guarantees for different service types. Before service communication, ATM establishes a connection to reserve virtual path/channel resources, and uses fixed-length short cells and timeslots. The advantage of short cells is small interference delay; the disadvantage is low encoding efficiency. The mapping between ATM cells and timeslots is not fixed, so a specific cell scheduling mechanism (such as [ATM-LATENCY]) is still needed to ensure delay performance. Although delay calculation based on short, fixed-length cells is simpler than for IP/MPLS networks with variable-length packets, both essentially depend on the queueing mechanism.

[TAS] introduces a synchronous time-division multiplexing method based on gate control list (GCL) rotation in Ethernet LANs.
Its basic idea is to calculate when the packets of a service flow arrive at a given node; the node then turns on the green light (i.e., sets the transmission state to OPEN) for the queue into which the flow is inserted during that time duration, which is defined as the TimeInterval between two adjacent items in the gating cycle. The TimeInterval is exactly the timeslot resource that can be reserved for a service flow. A set of queues is controlled by the GCL, which is executed round robin once per gating cycle. The gating cycle (e.g., 250 us) contains many items, and each item sets the OPEN/CLOSED states of all traffic class queues. By strictly controlling the release time of service flows at the network entry node, multiple flows always arrive sequentially at an intermediate node during each gating cycle and are sent during their respective fixed timeslots without conflicts, with extremely low queueing delay. However, the GCL state (i.e., the set of items, and the different TimeInterval values between any two adjacent items) depends on all ordered flows that pass through the node. Calculating and installing GCL states separately on each node has scalability issues.

[CQF] introduces a synchronous time-division multiplexing method based on fixed-length cycles in Ethernet LANs. [ECQF] is a further enhancement of classic CQF. CQF with 2-buffer mode or ECQF with x-bin mode uses only a small number of cycles to establish the cycle mapping between a port pair of two adjacent nodes, independently of individual service flows. The cycle mapping may be maintained on each node and swapped based on a single cycle id carried in the packet during forwarding ([I-D.eckert-detnet-tcqf]), or all cycle mappings may be carried in the packet as a cycle stack and read per hop during forwarding ([I-D.chen-detnet-sr-based-bounded-latency]).
According to [ECQF], the number of cycles required (i.e., the x in x-bin mode) depends on the ratio of the variation in intra-node forwarding delay to the cycle size. If the ratio is small, 3 bins are enough; otherwise, more than 3 bins are needed. Compared to TAS, CQF/ECQF no longer maintains a GCL on each node; instead, it replaces the large number of variable-length, flow-related timeslots in the GCL with a small number of fixed-length cycles unrelated to service flows. Thus, CQF/ECQF simplifies the data plane but leaves the complexity to the control plane, which must calculate and control the release time of service flows at the network entry to guarantee that flows do not conflict in any cycle on any intermediate node.

To meet large-scale requirements, this document describes a scheduling mechanism that enhances TAS. It defines a cyclic period consisting of multiple timeslots that share limited buffer resources, and a flow is assigned to be transmitted within one or more dedicated timeslots. It does not rely on time synchronization, but needs to detect and maintain the phase difference of the cyclic period between adjacent nodes. It further defines two scheduling modes: on-time mode and in-time mode. We call this mechanism Timeslot Queueing and Forwarding (TQF), a supplement to IEEE 802.1 TSN TAS. In TQF, the length of the cyclic period (i.e., the gating cycle of TAS) is selected according to the length of the supported service burst intervals. Like TAS and CQF/ECQF, TQF is a TDM-based scheduling mechanism.

*  Compared to classic TAS, TQF may use round robin queues corresponding to the number of timeslots in the gating cycle, while TAS only maintains queues corresponding to the number of traffic classes, one of which is used for Scheduled Traffic (i.e., DetNet flows).
   That means TQF needs more queues than TAS (i.e., multiple timeslot queues versus a single traffic class queue). However, TAS needs other complex methods to control the arrival order of all flows sharing the same traffic class queue in order to isolate them (so that each flow experiences almost zero queueing delay), while TQF's timeslot queues naturally isolate flows by the timeslot id of the gating cycle. In addition, TQF with in-time scheduling mode may use a single PIFO (push-in first-out) queue to approximate the ultra-low delay of TAS.

*  Compared to CQF/ECQF, TQF on-time scheduling maintains round robin queues corresponding to the number of timeslots in the gating cycle, while CQF/ECQF maintains extra tolerating queues depending on the ratio of the variation in intra-node forwarding delay to the cycle size. Because CQF/ECQF does not define a gating cycle with timeslot resources, it needs an additional flow interleaving method to control the arrival order of flows sharing the same cycle queue in order to isolate them (or must alternatively tolerate overprovisioning), while TQF's timeslot queues naturally isolate flows by the timeslot id of the gating cycle. This is also the semantic difference between a cycle id and a timeslot id: the former identifies an aggregated queue, such as the sending, receiving, or tolerating queue, rather than an individual timeslot resource within the gating cycle as the latter does.

That is, once timeslot resources are defined in IP/MPLS, TQF does not restrict the data structure used to implement them on the data plane, which may be round robin queues or a single PIFO queue.

1.1.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.  Terminology

The following terminology is introduced in this document:

Timeslot: The unit of TQF scheduling. Its length needs to be set to a reasonable value, such as 10 us, that allows at least one complete packet to be sent. Different nodes can be configured with different timeslot lengths.

Timeslot Scheduling: A packet is stored in the buffer zone corresponding to a specific timeslot id, and may be sent before (in-time mode) or within (on-time mode) that timeslot. The timeslot id is always a number within the orchestration period.

Service Burst Interval: The traffic specification of a DetNet flow generally follows the principle of generating a specific burst amount within each periodic burst interval of a specific length. For example, a service generates a burst of 1000 bits every 1 ms, where 1 ms is the service burst interval.

Orchestration Period: A cyclic period used to instantiate timeslot resources on the link. The orchestration period length is selected according to the service burst intervals of DetNet flows, e.g., as the Least Common Multiple of the service burst intervals of all flows. It is actually the gating cycle in TAS, just with different queue allocation rules. It contains a fixed number (termed N, numbered from 0 to N-1) of timeslots. For example, an orchestration period may include 1000 timeslots, each 10 us long (i.e., an orchestration period length of 10 ms). Multiple orchestration period lengths may be enabled on a link, but all nodes on a DetNet path must interoperate based on the same orchestration period length. A specific orchestration period length can be used to indicate the corresponding TQF forwarding instance.
Ongoing Sending Period: The orchestration period to which the ongoing sending timeslot belongs.

Scheduling Period: The scheduling period of a TQF forwarding instance depends on the buffer resources supported by the device hardware. Its length reflects the number of timeslot resources (termed M, numbered from 0 to M-1), with related buffer resources, that are actually instantiated on the data plane, which is limited by hardware capabilities. The scheduling period length may be less than or equal to the orchestration period length in on-time mode, or greater than or equal to the orchestration period length in in-time mode. Packets belonging to a specific timeslot (of the orchestration period) are mapped to and stored in the buffer zone of the corresponding timeslot of the scheduling period.

Incoming Timeslot: For the headend of the DetNet path, the incoming timeslot is the current timeslot of the UNI at which a flow arrives after being policed. For any intermediate node of the DetNet path, the incoming timeslot is the timeslot contained in the packet received from the upstream node (i.e., the outgoing timeslot of the upstream node). An incoming timeslot id refers to a timeslot in the context of the orchestration period.

Outgoing Timeslot: The timeslot of the outgoing port in which a packet is chosen to be sent, according to resource reservation or certain rules. An outgoing timeslot id refers to a timeslot in the context of the orchestration period.

Ongoing Sending Timeslot: When the end of the incoming timeslot to which a packet belongs reaches a specific port, the timeslot currently in the sending state is the ongoing sending timeslot of that port. Note that the ongoing sending timeslot differs from the outgoing timeslot.
   An ongoing sending timeslot id refers to a timeslot in the context of the orchestration period.

3.  Overview

This scheme introduces a time-division multiplexing scheduling mechanism based on fixed-length timeslots in IP/MPLS networks. Note that the time-division multiplexing here is an L3 packet-level scheduling mechanism rather than a TDM port (such as SONET/SDH) implemented at L1. The latter generally involves time frames and the corresponding framing specification, which are not needed in this document. The data structure associated with timeslot resources may be implemented using round robin queues, a single PIFO queue, etc.

Figure 1 shows the TQF scheduling behavior implemented by the intermediate node P through which a deterministic path passes.

   [Diagram: DetNet path PE1 -> P -> PE2. Node P receives incoming slots i, j, k and maps them to outgoing slots a, b, c @OP of the orchestration period (OP), whose timeslots are numbered 0 to N-1. The outgoing slots access slots a', b', c' @SP of the Scheduling Period (SP), instantiated either as a Round Robin Queue (queue-0 @slot 0, queue-1 @slot 1, ..., queue-n @slot M-1) or as a PIFO ranked by a, b, c @OP.]

                Figure 1: Timeslot Based Scheduling Mechanism

Here, both the orchestration period and the scheduling period consist of multiple timeslots. The number of timeslots in the orchestration period is related to the length of the service burst interval, while the number of timeslots in the scheduling period is limited by hardware capabilities; the scheduling period may be instantiated by round robin queues or a PIFO.

A TQF-enabled link may configure multiple TQF scheduling instances, each with a specific orchestration period length. Nodes communicate with each other based on the same instance.

Each TQF scheduling instance, associated with a specific orchestration period length, may configure its service rate, and the sum of the service rates of all instances must not exceed the port bandwidth. For a TQF scheduling instance, the total number of bits that can be consumed in each timeslot generally does not exceed the service rate multiplied by the timeslot length.

For each orchestration period length, the nodes in the network do not require phase alignment. However, the phase difference of the orchestration period between adjacent nodes should be detected. For a specific orchestration period configured on different links in the network, these links may configure different timeslot lengths because their capabilities vary; for example, the link capability of edge nodes is weaker than that of core nodes.

In Figure 1, a DetNet path consumes timeslots i, j, k from the orchestration period of link PE1-P, and timeslots a, b, c from the orchestration period of link P-PE2. From node P's perspective, i, j, k are incoming timeslots, while a, b, c are outgoing timeslots.
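The per-instance budget rules described in this section (and the burst resource of Section 5) can be illustrated with a minimal Python sketch. The class TqfInstance and the helper instances_fit are hypothetical names introduced here for illustration only; units (microseconds, bits per second) are assumptions, and the per-timeslot value is the upper bound stated above rather than a mandated allocation.

```python
from dataclasses import dataclass


@dataclass
class TqfInstance:
    """Hypothetical model of one TQF scheduling instance on an outgoing port."""
    opl_us: int            # orchestration period length, in microseconds
    timeslot_us: int       # timeslot length configured on this port, in microseconds
    service_rate_bps: int  # dedicated service rate of the instance, in bits/s

    @property
    def num_timeslots(self) -> int:
        # N = OPL / timeslot length; OPL must be an integer multiple of it.
        if self.opl_us % self.timeslot_us:
            raise ValueError("OPL must be a multiple of the timeslot length")
        return self.opl_us // self.timeslot_us

    @property
    def bits_per_timeslot(self) -> int:
        # Upper bound on bits flows may consume in one timeslot:
        # service rate multiplied by timeslot length (us converted to s).
        return self.service_rate_bps * self.timeslot_us // 1_000_000

    @property
    def bits_per_period(self) -> int:
        # Burst resource of the instance: the bit amounts of all N timeslots.
        return self.num_timeslots * self.bits_per_timeslot


def instances_fit(port_bandwidth_bps: int, instances: list) -> bool:
    # The sum of service rates of all instances must not exceed the port bandwidth.
    return sum(inst.service_rate_bps for inst in instances) <= port_bandwidth_bps


# Example: 1000 timeslots of 10 us (OPL = 10 ms) at 1 Gbps, plus a 1 ms instance.
inst_10ms = TqfInstance(opl_us=10_000, timeslot_us=10, service_rate_bps=1_000_000_000)
inst_1ms = TqfInstance(opl_us=1_000, timeslot_us=10, service_rate_bps=2_000_000_000)
print(inst_10ms.num_timeslots, inst_10ms.bits_per_timeslot, inst_10ms.bits_per_period)
# prints: 1000 10000 10000000
print(instances_fit(10_000_000_000, [inst_10ms, inst_1ms]))  # prints: True
```

With these example numbers, each 10 us timeslot of the 10 ms instance can carry at most 10000 bits, and the two instances together (3 Gbps) fit within a 10 Gbps port.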
The cross connection between an incoming timeslot and an outgoing timeslot results in a corresponding residence delay, which depends on the offset between the incoming and outgoing timeslots given the phase difference between the orchestration periods of links PE1-P and P-PE2.

An outgoing timeslot in the orchestration period finally accesses the mapped timeslot in the scheduling period. There is a mapping function from timeslot z in the orchestration period to timeslot z' in the scheduling period, i.e., z' = f(z). For example, the mapping function may be z' = z, z' = z + offset, z' = z % M, or z' = random(z). Which function to use depends on the queue allocation rule and the data structure instantiated for timeslot resources. This document mainly discusses two mapping functions: z' = z % M (for round robin queues) and z' = z (for PIFO). For example, with z' = z % M and M = 100, outgoing timeslots 5, 105, and 205 of the orchestration period all map to the same scheduling-period timeslot 5.

How to calculate a DetNet path that meets the flow requirements is outside the scope of this document.

4.  Timeslot Mapping Relationship

To determine the offset between the incoming timeslot and the outgoing timeslot in the context of a specific TQF scheduling instance (identified by its orchestration period length), it is first necessary to determine the ongoing sending timeslot into which the incoming timeslot falls, i.e., the mapping relationship between the incoming timeslot and the ongoing sending timeslot.

The following subsections provide two methods to determine the mapping relationship between the incoming timeslot and the ongoing sending timeslot.

4.1.  Deduced by BTM

Figure 2 shows three nodes U, V, and W in sequence along the path. All nodes are configured with the same orchestration period length (termed OPL), which is crucial for establishing a fixed timeslot mapping relationship between adjacent nodes.
*  Port_u2 has timeslot length L_u2, and an orchestration period contains N_u2 timeslots.

*  Port_v1 has timeslot length L_v1, and an orchestration period contains N_v1 timeslots.

*  Port_v2 has timeslot length L_v2, and an orchestration period contains N_v2 timeslots.

Hence, L_u2*N_u2 = L_v1*N_v1 = L_v2*N_v2 = OPL. In general, edge nodes have lower link bandwidth and are configured with longer timeslots than aggregation/backbone nodes. Because OPL is an integer multiple of each of L_u2, L_v1, and L_v2, it is also a multiple of their least common multiple (LCM).

Node U may send a detection packet at the end (or the head; the process is similar) of an arbitrary timeslot i of port_u2, which connects to node V. After a certain link propagation delay (D_propagation), the packet is received by the incoming port of node V, which regards i as the incoming timeslot. At that moment, the ongoing sending timeslot of port_v1 is j, and time T_ij remains before the end of timeslot j. To avoid confusion, we refer to this mapping relationship (incoming timeslot i of port_u2 falling into ongoing sending timeslot j of port_v1 with T_ij remaining) as the base timeslot mapping (BTM), as it is independent of DetNet flows. Later, we will see the timeslot mapping relationship related to DetNet flows, i.e., the mapping between the outgoing timeslot of port_u2 and the outgoing timeslot of port_v2, which is based on timeslot resource reservation and is termed the forwarding timeslot mapping (FTM).

BTM is generally maintained by node V when processing the probe message received from node U. However, node U may also obtain this information from node V, e.g., via an ACK message. The advantage of maintaining BTM at node U is that it matches the unidirectional link from node U to V, so it is more appropriate for node U (rather than V) to advertise it in the network.
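As a quick consistency check of the constraint L_u2*N_u2 = L_v1*N_v1 = L_v2*N_v2 and of the LCM property above, the following Python sketch computes the timeslot count of each port for a candidate OPL. The helper names and the example timeslot lengths are hypothetical, and lengths are assumed to be integer microseconds.

```python
from math import lcm  # Python 3.9+


def opl_granularity(timeslot_lengths_us: dict) -> int:
    # OPL is a common multiple of every port's timeslot length,
    # so any valid OPL must be a multiple of their least common multiple.
    return lcm(*timeslot_lengths_us.values())


def timeslot_counts(opl_us: int, timeslot_lengths_us: dict) -> dict:
    """Return N for each port such that L * N == OPL on every port."""
    unit = opl_granularity(timeslot_lengths_us)
    if opl_us % unit:
        raise ValueError(f"OPL {opl_us} us is not a multiple of LCM {unit} us")
    return {port: opl_us // length for port, length in timeslot_lengths_us.items()}


# Hypothetical lengths: the edge-facing port_u2 uses the longest timeslot.
lengths = {"port_u2": 30, "port_v1": 20, "port_v2": 10}
print(opl_granularity(lengths))          # 60
print(timeslot_counts(12_000, lengths))  # {'port_u2': 400, 'port_v1': 600, 'port_v2': 1200}
```

A candidate OPL of 10000 us would be rejected in this example, because it is not a multiple of 60 us and therefore cannot contain a whole number of 30 us timeslots.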
A BTM detection method can be found in [I-D.xp-ippm-detnet-stamp], and an advertisement method can be found in [I-D.peng-lsr-deterministic-traffic-engineering].

Note that this document does not recommend directly detecting and maintaining a BTM between the outgoing timeslot of port_u2 and the ongoing sending timeslot of port_v2 (i.e., the outgoing port of downstream node V), as this is unnecessary. Maintaining only the BTM between the outgoing timeslot of port_u2 and the ongoing sending timeslot of port_v1 (i.e., the incoming port of downstream node V) is sufficient to derive the other mapping relationships, as shown below.

   [Diagram: node U (port_u1, port_u2), node V (port_v1, port_v2), and node W (port_w1) in sequence. Timeslot i of the OP of port_u2 departs from port_u2, experiences the link delay, and arrives at port_v1, where it maps to timeslot j of the OP of port_v1 with T_ij remaining before the end of j (i map to j). After the intra-node forwarding delay, it falls into timeslot j' of the OP of port_v2, which faces port_w1 of node W.]

                        Figure 2: BTM Detection

Based on the detected BTM, and knowing the intra-node forwarding delay (F), which includes parsing, table lookup, and internal fabric switching, we can derive the mapping between any outgoing timeslot x of port_u2 and the ongoing sending timeslot y of port_v2. Let t be the offset, relative to the beginning of the orchestration period of port_v2, at which the end of timeslot x of port_u2 reaches port_v2 (i.e., after the link delay and the intra-node forwarding delay F):

*  t = ((j+1)*L_v1 - T_ij + OPL + (x-i)*L_u2 + F) % OPL

Here, (j+1)*L_v1 - T_ij is the offset at which the end of timeslot i arrives within the orchestration period of port_v1, (x-i)*L_u2 moves that arrival to the end of timeslot x, and the added OPL keeps the sum non-negative when x < i. Because all ports of node V share the same orchestration period begin time (see the recommendation below), this offset also applies to port_v2. Then:

*  y = floor(t/L_v2)

and the time T_xy left before the end of timeslot y is:

*  T_xy = (y+1)*L_v2 - t

This document recommends that all ports within the same node be time-synchronized, i.e., all ports of a node share the same local system time, which is easy to achieve. It is also recommended that the begin time of the orchestration period be the same for all ports within the same node, or differ by an integer multiple of OPL; e.g., a node may maintain a global initial time as the logical begin time of the first orchestration period for all ports. This initial time should remain in effect across node or port restarts, so that the timeslot mapping relationships between nodes are not affected. Depending on the implementation, since the initial time may lie too far in the past relative to the current system time, it may be updated regularly (e.g., increased by k*OPL, where k is a natural number) to stay closer to the current system time.

4.2.  Deduced by BOM

Figure 3 shows three nodes U, V, and W in sequence along the path. As in Section 4.1, L_u2*N_u2 = L_v1*N_v1 = L_v2*N_v2.
Node U may send a detection packet at the head (or the end; the process is similar) of the orchestration period of port_u2, which connects to node V. After a certain link propagation delay (D_propagation), the packet is received by the incoming port of node V. At that moment, time P_uv remains before the end of the ongoing sending period of port_v1. We refer to this mapping relationship (the head of the orchestration period of port_u2 arriving with P_uv remaining in the ongoing sending period of port_v1) as the base orchestration-period mapping (BOM), which is independent of DetNet flows.

BOM is generally maintained by node V when processing the probe message received from node U. However, node U may also obtain this information from node V, e.g., via an ACK message. The advantage of maintaining BOM at node U is that it matches the unidirectional link from node U to V, so it is more appropriate for node U (rather than V) to advertise it in the network. A BOM detection method can be found in [I-D.xp-ippm-detnet-stamp], and an advertisement method can be found in [I-D.peng-lsr-deterministic-traffic-engineering].

   [Diagram: node U (port_u1, port_u2), node V (port_v1, port_v2), and node W (port_w1) in sequence. A probe departs from port_u2 at the head of the OP of port_u2, experiences the link delay, and arrives at port_v1 with P_uv remaining before the end of the ongoing OP of port_v1. The intra-node forwarding delay then carries it to port_v2, whose OP faces port_w1 of node W.]

                        Figure 3: BOM Detection

Based on the BOM, and knowing the intra-node forwarding delay (F), we can derive the mapping between any outgoing timeslot x of port_u2 and the ongoing sending timeslot y of port_v2. Let t be the offset, relative to the beginning of the orchestration period of port_v2, at which the end of timeslot x of port_u2 reaches port_v2:

*  t = ((x+1)*L_u2 + OPL - P_uv + F) % OPL

Here, OPL - P_uv is the offset at which the head of the orchestration period of port_u2 arrives within the orchestration period of port_v1 (and hence of port_v2, given the shared begin time recommended in Section 4.1), and (x+1)*L_u2 is the distance from that head to the end of timeslot x. Then:

*  y = floor(t/L_v2)

and the time T_xy left before the end of timeslot y is:

*  T_xy = (y+1)*L_v2 - t

5.  Resources Used by TQF

The operation of the TQF scheduling mechanism consumes two types of resources:

*  Bandwidth: Each TQF scheduling instance may configure its service rate, which is a dedicated bandwidth resource of the outgoing port; the sum of the service rates of all instances must not exceed the port bandwidth.

*  Burst: The burst resources of a specific TQF scheduling instance can be represented as the corresponding bit amounts of all timeslots in the orchestration period. The total number of bits that flows can consume in each timeslot generally does not exceed the service rate multiplied by the timeslot length.

6.  Arrival Position in the Orchestration Period

Generally, a DetNet flow has a TSpec, e.g., periodically generating traffic of a specific burst size within a burst interval of a specific length, and this traffic regularly reaches the network entry.
The headend executes traffic regulation (e.g., setting appropriate parameters for leaky bucket shaping), which generally distributes packets evenly within the service burst interval, i.e., there are one or more shaped sub-bursts in the service burst interval. There is an ideal positional relationship between the departure time (when each sub-burst leaves the regulator) and the orchestration period of the UNI port, that is, each sub-burst corresponds to an ideal incoming timeslot of the UNI port. Based on the ideal incoming timeslot, an ideal outgoing timeslot of the NNI port may be selected and consumed by the sub-burst. For example, if a DetNet flow distributes m sub-bursts during the orchestration period, the network entry should maintain m states for that flow:

   *
   *
   ... ...
   *

However, packets do not always arrive at the network entry in an ideal way, and the departure time from the regulator may not fall within an ideal incoming timeslot. Therefore, an important operation of the network entry is to determine the ideal incoming timeslot i from the actual departure time. It first determines the actual incoming timeslot from the actual departure time, and then selects the ideal incoming timeslot that is closest to, and not earlier than, the actual incoming timeslot.

Figure 4 shows, for some typical DetNet flows, the ideal incoming timeslots in the orchestration period of the UNI port, as well as the ideal outgoing timeslots of the NNI port consumed by these flows.

            |<--------------------- OPL ---------------------->|
            +----+----+----+----+----+----+----+----------+----+
            | #0 | #1 | #2 | #3 | #4 | #5 | #6 | ... ...  |#N-1|
            +----+----+----+----+----+----+----+----------+----+

                  +--+
   Flow 1:  |     |b1|                                         |
            +-----+--+-----------------------------------------+
            |<------------------- SBI ------------------------>|

                         +--+                        +--+
   Flow 2:  |            |b1|                        |b2|      |
            +------------+--+------------------------+--+------+
            |<------------------- SBI ------------------------>|

                                        +------+
   Flow 3:  |                           |  b1  |               |
            +---------------------------+------+---------------+
            |<------------------- SBI ------------------------>|

                 +--+             +--+             +--+
   Flow 4:  |    |b1|        |    |b1|        |    |b1|        |
            +----+--+--------+----+--+--------+----+--+--------+
            |<----- SBI ---->|<----- SBI ---->|<----- SBI ---->|

                 Figure 4: Relationship between SBI and OP

As shown in the figure, the service burst interval of flows 1, 2, and 3 is equal to the orchestration period length, while the service burst interval of flow 4 is only 1/3 of the orchestration period length.

   *  Flow 1 generates a small single burst within its burst interval, which may consume timeslot 2 or a subsequent timeslot of the NNI port.

   *  Flow 2 generates two small discrete sub-bursts within its burst interval, which are also shaped and may consume timeslots 4 and N-1 of the NNI port, respectively.

   *  Flow 3 generates a large single burst within its burst interval that is not really shaped (because the flow purchased a larger burst resource and is served by a larger bucket depth). The burst may be split into multiple back-to-back sub-bursts that consume multiple consecutive timeslots, such as timeslots 8 and 9 of the NNI port.

   *  The service burst interval of flow 4 is only 1/3 of the orchestration period. Hence, flow 4' is constructed from 3 occurrences of flow 4 within one orchestration period. Flow 4' is then similar to flow 2, generating three separate sub-bursts within its burst interval, and may consume timeslots 3, 7, and N-1 of the NNI port.

7. Residence Delay Evaluation

7.1. Residence Delay on the Ingress Node

On the headend H, the received flow corresponds to an ideal incoming timeslot i of the UNI port. Although no timeslot mapping relationship is actually established between the end system and the headend, we can still assume that the end system applies the same orchestration period as the UNI port and that a phase-aligned BOM is detected. Then, according to Section 4, for the above incoming timeslot i, the ongoing sending timeslot j of the NNI port, as well as the remaining time T_ij of timeslot j, can be deduced. An outgoing timeslot z of the NNI port, offset by o (>= 1) timeslots from j, can be selected for the flow, that is, z = (j+o)%N_h2, where N_h2 is the number of timeslots in the orchestration period of the NNI port. Thus, on the headend H, the residence delay resulting from the outgoing timeslot z is:

   Best Residence Delay    = F + T_ij + (o-1)*L_h2
   Worst Residence Delay   = F + L_h1 + T_ij + o*L_h2
   Average Residence Delay = F + T_ij + (L_h1 + (2*o-1)*L_h2)/2

where L_h1 is the timeslot length of the UNI port and L_h2 is the timeslot length of the NNI port.

The best residence delay occurs when the flow arrives at the end of the ideal incoming timeslot i and is sent at the head of the outgoing timeslot z. The worst residence delay occurs when the flow arrives at the head of the ideal incoming timeslot i and is sent at the end of the outgoing timeslot z. The delay jitter within the headend is therefore (L_h1 + L_h2). However, the jitter of the entire path is not the sum of the jitters of all nodes.

Note that there is also a runtime jitter, as mentioned earlier, which depends on the deviation between the actual incoming timeslot i' and the ideal incoming timeslot i.
Assuming that i = (i'+e)%N_h1, where e is the deviation and N_h1 is the number of timeslots in the orchestration period of the UNI port, the additional runtime jitter is e*L_h1, which should be carried in the packet so that the jitter can be eliminated at the network egress.

7.2. Residence Delay on the Transit Node

On the transit node V, according to Section 4, for any given incoming timeslot i, the ongoing sending timeslot j of the outgoing port (port_v2), as well as the remaining time T_ij of timeslot j, can be deduced. An outgoing timeslot z of the outgoing port, offset by o (>= 1) timeslots from j, can be selected for the flow, that is, z = (j+o)%N_v2, where N_v2 is the number of timeslots in the orchestration period of port_v2. Thus, on the transit node V, the residence delay resulting from the outgoing timeslot z is:

   Best Residence Delay    = F + T_ij + (o-1)*L_v2
   Worst Residence Delay   = F + T_ij + L_u2 + o*L_v2
   Average Residence Delay = F + T_ij + (L_u2 + (2*o-1)*L_v2)/2

where L_u2 and L_v2 are the timeslot lengths of port_u2 and port_v2, respectively.

The best residence delay occurs when the flow is received at the end of the incoming timeslot i and sent at the head of the outgoing timeslot z. The worst residence delay occurs when the flow is received at the head of the incoming timeslot i and sent at the end of the outgoing timeslot z. The delay jitter within the node is therefore (L_u2 + L_v2). However, the jitter of the entire path is not the sum of the jitters of all nodes.

7.3. Residence Delay on the Egress Node

Generally, on the deterministic path carrying the DetNet flow, the flow needs to continue to be forwarded from the outgoing port of the egress node to the client side, where it also faces queueing. However, the outgoing port facing the client side is part of the overlay routing, and it is possible to continue supporting the TQF mechanism on that port.
In this case, the underlay DetNet path serves as a virtual link of the overlay path, providing deterministic delay performance. Therefore, for the underlay deterministic path, the residence delay on the egress node consists only of the forwarding delay (F), including parsing, table lookup, internal fabric switching, etc.

7.4. End-to-end Delay and Jitter

Figure 5 shows a path from the headend P1 to the endpoint E. For each node Pi, the timeslot length of the outgoing port is L_i, the intra-node forwarding delay is F_i, the remaining time of the mapped ongoing sending timeslot is T_i, and the number of timeslots by which the outgoing timeslot is offset from the ongoing sending timeslot is o_i. In addition, on node P1 the timeslot length of the UNI port is L_h, and F_e is the forwarding delay on the egress node E. The end-to-end delay (excluding link propagation delay) can then be evaluated as follows:

   Best E2E Delay  = sum(F_i + T_i + o_i*L_i, for 1<=i<=n) - L_n + F_e
   Worst E2E Delay = sum(F_i + T_i + o_i*L_i, for 1<=i<=n) + L_h + F_e

   +---+     +---+     +---+             +---+     +---+
   | P1| --- | P2| --- | P3| --- ... --- | Pn| --- | E |
   +---+     +---+     +---+             +---+     +---+

                   Figure 5: TQF Forwarding Path

The best E2E delay occurs when the flow arrives at the end of the ideal incoming timeslot and is sent at the head of the outgoing timeslot of each node Pi. The worst E2E delay occurs when the flow arrives at the head of the ideal incoming timeslot and is sent at the end of the outgoing timeslot of each node Pi. The E2E delay jitter is (L_h + L_n).

8. Flow States in Data-plane

The headend of the path needs to maintain timeslot resource information at the granularity of each sub-burst of each flow, so that each sub-burst of the DetNet flow can access its mapped timeslot resources. However, intermediate nodes do not need to maintain per-flow states; they only access the timeslot resources based on the timeslot id carried in the packets.
[I-D.pb-6man-deterministic-crh] and [I-D.p-6man-deterministic-eh] define methods to carry the stack of timeslot ids in IPv6 packets.

9. Queue Allocation Rule of Round Robin Queue

TQF may organize its buffer resources as a fixed number of round robin queues. In this case, only the on-time scheduling mode is considered, because the in-time scheduling mode may cause urgent and non-urgent packets to be stored in the same queue.

The number of round robin queues should be designed according to the number of timeslots included in the scheduling period. Each timeslot corresponds to a separate queue, whose buffered packets must be able to be sent within one timeslot. The length of the queue, i.e., the total number of bits that can be sent in one timeslot, equals the allocated bandwidth of the corresponding TQF instance (see Section 12) multiplied by the timeslot length.

Figure 1 shows that the scheduling period actually instantiated in the data plane is not completely equivalent to the orchestration period. The scheduling period includes M timeslots (from 0 to M-1), while the orchestration period includes N timeslots (from 0 to N-1), where N is an integer multiple of M. In the orchestration period, timeslots 0 to M-1 form the first scheduling period, timeslots M to 2M-1 form the second scheduling period, and so on. Therefore, it is necessary to convert the outgoing timeslot of the orchestration period to the target timeslot of the scheduling period, and to insert the packet into the round robin queue corresponding to the target timeslot for transmission. A simple conversion method is:

   *  target scheduling timeslot = outgoing timeslot % M

This is safe as long as o < M always holds, where o is the number of timeslots by which the outgoing timeslot z is offset from the ongoing sending timeslot j (see Section 7).
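The queue sizing and the conversion rule above can be sketched as follows. This is a minimal, hypothetical illustration: the helper names are not defined by this document, and the timeslot length is expressed in integer nanoseconds only to keep the arithmetic exact.

```python
# Hypothetical helpers illustrating Section 9; the names are not from the draft.


def queue_capacity_bits(service_rate_bps: int, timeslot_ns: int) -> int:
    """Bits one round robin queue holds: service rate x timeslot length."""
    return service_rate_bps * timeslot_ns // 1_000_000_000


def rr_enqueue_target(ongoing_slot: int, o: int, n: int, m: int) -> tuple:
    """Return (z, q): the outgoing timeslot z in the orchestration period
    (N timeslots) and the round robin queue q in the scheduling period
    (M timeslots) for a packet arriving during ongoing sending timeslot j."""
    if n % m != 0:
        raise ValueError("N must be an integer multiple of M")
    if not 1 <= o < m:
        raise ValueError("on-time round robin scheduling requires 1 <= o < M")
    z = (ongoing_slot + o) % n  # outgoing timeslot in the orchestration period
    return z, z % m             # target scheduling timeslot = z % M


if __name__ == "__main__":
    # 100 Gbps port with 10 us timeslots: 1,000,000 bits per queue.
    print(queue_capacity_bits(100_000_000_000, 10_000))
    # N = 12, M = 4: from ongoing timeslot 5 with o = 2, z = 7 uses queue 3.
    print(rr_enqueue_target(5, 2, 12, 4))
```

Because N is a multiple of M and 1 <= o < M, the selected queue is never the queue j % M that is being drained in the ongoing timeslot, which is one part of the argument that follows.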
Next, we briefly demonstrate that a sub-burst that arrives at the outgoing port during the ongoing sending timeslot j can be safely inserted into the queue of the scheduling period to which the outgoing timeslot z maps, and that this queue will not overflow.

Assume that each timeslot in the orchestration period has a virtual queue; for example, the virtual queue corresponding to the outgoing timeslot z is termed queue-z. The packets that can be inserted into queue-z may only come from the following flows:

   During the ongoing sending timeslot j = (z-M+1+N)%N, the flows that arrive at the outgoing port, i.e., flows that may consume the outgoing timeslot z with o = M-1.

   During the ongoing sending timeslot j = (z-M+2+N)%N, the flows that arrive at the outgoing port, i.e., flows that may consume the outgoing timeslot z with o = M-2.

   ... ...

   During the ongoing sending timeslot j = (z-1+N)%N, the flows that arrive at the outgoing port, i.e., flows that may consume the outgoing timeslot z with o = 1.

The total burst resources consumed by all these flows do not exceed the burst resource of the outgoing timeslot z. Then, when the ongoing sending timeslot changes to z, queue-z is sent and cleared. Afterwards, from timeslot z+1 to the last timeslot N-1, no more packets are inserted into queue-z. Obviously, such a virtual queue wastes a great deal of queue resources. In fact, queue-z can be reused by the subsequent outgoing timeslot (z+M)%N, namely:

   During the ongoing sending timeslot j = (z+1)%N, the flows that arrive at the outgoing port, i.e., flows that may consume the outgoing timeslot (z+M)%N with o = M-1.

   During the ongoing sending timeslot j = (z+2)%N, the flows that arrive at the outgoing port, i.e., flows that may consume the outgoing timeslot (z+M)%N with o = M-2.

   ... ...
   During the ongoing sending timeslot j = (z+M-1)%N, the flows that arrive at the outgoing port, i.e., flows that may consume the outgoing timeslot (z+M)%N with o = 1.

The total burst resources consumed by all these flows do not exceed the burst resource of the outgoing timeslot (z+M)%N.

It can be seen that queue-z can be used by any outgoing timeslot (z+k*M)%N, where k is a non-negative integer. Observing (z+k*M)%N, the minimum z satisfies 0 <= z < M; that is, the entire orchestration period actually requires only M queues to store packets, namely the queues corresponding to the M timeslots of the scheduling period. In other words, the minimum z is the timeslot id in the scheduling period, while the outgoing timeslot (z+k*M)%N is the timeslot id in the orchestration period. The former is obtained from the latter modulo M and is then used to access the corresponding queue. In short, a queue can store packets from multiple outgoing timeslots without overflowing because the packets stored in the queue earlier (more than M timeslots ago) have already been sent.

For example, if the total length of all queues supported by the hardware is 4G bytes, and the queue length corresponding to a 10 us timeslot at a port rate of 100 Gbps is 1M bits, then a maximum of 32K timeslot queues can be provided. The TQF scheduler can use part of these queue resources, e.g., M = 1K queues, to construct a scheduling period of 10 ms, and the corresponding orchestration period may be an integer multiple of 10 ms.

10. Queue Allocation Rule of PIFO Queue

TQF may also organize its buffer resources as a PIFO queue. In this case, both the in-time and on-time scheduling modes can be easily supported, because packets with different ranks are always scheduled in rank order.

10.1. PIFO with On-time Scheduling Mode

In the on-time mode, the buffer cost of the PIFO queue is the same as that of round robin queues. The begin time of the outgoing timeslot z can be used directly as the rank of the packet, and the packet is inserted into the PIFO for transmission:

   *  rank = z.begin

Here, the outgoing timeslot z refers to the occurrence of outgoing timeslot z that is after, and closest to, the arrival time at the scheduler.

The rule of the on-time scheduling mode is that if the PIFO is not empty and the rank of the head of the queue is equal to or earlier than the current system time, the head of the queue is sent; otherwise, nothing is sent.

10.2. PIFO with In-time Scheduling Mode

In the in-time mode, the buffer cost of the PIFO queue is generally larger than in the on-time mode due to burst accumulation. [SP-LATENCY] provides guidance for evaluating the excess buffer requirements.

Similar to Section 10.1, the begin time of the outgoing timeslot z can be used directly as the rank of the packet, and the packet is inserted into the PIFO for transmission. However, due to the in-time scheduling behavior, the outgoing timeslot z may not be the occurrence that is after, and closest to, the arrival time at the scheduler; instead, it may be an occurrence far away from the arrival time. A time deviation (E) may be carried in the packet to help determine the outgoing timeslot z.

On the headend node:

   *  E is initially the begin time of the ideal incoming timeslot minus the actual departure time from the regulator.

   *  Use the result of "departure time + E" (note that this is exactly the begin time of the ideal incoming timeslot; the main purpose here is to describe how E works) to determine the expected outgoing timeslot z that is after, and closest to, this result.
   *  rank = z.begin

   *  When the packet leaves the headend, E is updated to z.begin minus the actual sending time from the PIFO. The updated E is carried in the sent packet.

On the transit node:

   *  Obtain E from the received packet.

   *  Use the result of "arrival time + E" to determine the expected outgoing timeslot z that is after, and closest to, this result. Here, the arrival time is the time at which the packet arrives at the scheduler.

   *  rank = z.begin

   *  When the packet leaves the transit node, E is updated to z.begin minus the actual sending time from the PIFO. The updated E is carried in the sent packet.

The rule of the in-time scheduling mode is that as long as the PIFO is not empty, packets are always taken from the head of the queue for transmission.

In summary, in-time scheduling with the help of the time deviation (E) may still suffer from the uncertainty caused by burst accumulation, and it is recommended only for small networks, i.e., a limited domain with a small number of hops, where burst accumulation is not serious. On-time scheduling is recommended for large networks.

11. Global Timeslot ID

The outgoing timeslots discussed in the previous sections use the local timeslot style on all nodes. This section discusses the global timeslot style.

The global timeslot style means that all nodes on the path identify timeslots with the same timeslot id, which of course requires all nodes to use the same timeslot length. There is no need to establish an FTM for the DetNet flow on each node or to carry the FTM in packets; the packet only needs to carry the unique global timeslot id. However, one disadvantage is that the latency of the path may be large, depending on the BOM between adjacent nodes.
Another disadvantage is that the success rate of finding a path that meets the service requirements is lower than with the local timeslot style.

The global timeslot style requires the orchestration period to be equal to the scheduling period, mainly so that arriving packets with any global timeslot id can be inserted into the corresponding queue without overflow. However, as the ideal design goal is to keep the scheduling period shorter than the orchestration period, further research is needed on other methods (such as roughly aligning the orchestration periods between nodes) to ensure that packets with any global timeslot id can be queued normally when the scheduling period is shorter than the orchestration period.

Compared to the local timeslot style, the global timeslot style means that the incoming timeslot i must also map to the outgoing timeslot i.

In the example shown in Figure 6, each orchestration period contains 6 timeslots. Node V has three connected upstream nodes U1, U2, and U3. At each hop, the packet accesses the outgoing timeslot corresponding to its global timeslot id and is forwarded to the downstream node with the global timeslot id unchanged. For example, U1 sends some packets with global slot-id 0, termed g0, in its outgoing timeslot 0. The packets with the other global slot-ids 1~5 are similarly termed g1~g5, respectively. The figure shows the scheduling results when node V forwards these 6 batches of packets sent by the upstream nodes.
        0   1   2   3   4   5   0   1   2   3   4   5   0   1   2
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   U1 | g0| g1| g2|   |   |   |   |   |   |   |   |   |   |   |   |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

        1   2   3   4   5   0   1   2   3   4   5   0   1   2   3
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   U2 |   |   | g3| g4|   |   |   |   |   |   |   |   |   |   |   |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

        5   0   1   2   3   4   5   0   1   2   3   4   5   0   1
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   U3 | g5|   |   |   |   |   |   |   |   |   |   |   |   |   |   |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

        0   1   2   3   4   5   0   1   2   3   4   5   0   1   2
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   V  |   |   |   | g3| g4| g5| g0| g1| g2|   |   |   |   |   |   |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

                 Figure 6: Global Timeslot Style Example

In this example:

   *  The BTM between the outgoing timeslot of U1 and the ongoing sending timeslot of V is i -> i, so the global outgoing timeslot for the incoming timeslot i is i+6 (i.e., it belongs to the next round of the orchestration period).

   *  The BTM between the outgoing timeslot of U2 and the ongoing sending timeslot of V is i -> i-1, so the global outgoing timeslot for the incoming timeslot i is i (i.e., it belongs to the current round of the orchestration period).

   *  The BTM between the outgoing timeslot of U3 and the ongoing sending timeslot of V is i -> i+1, so the global outgoing timeslot for the incoming timeslot i is i+6 (i.e., it belongs to the next round of the orchestration period).

It can be seen that packets from U1 and U3 have a large residence delay in node V, while packets from U2 have a small residence delay in node V.
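The schedule of node V in Figure 6 can be reproduced with a small model. This sketch assumes aligned timeslots of equal length and zero link and forwarding delay, so that the "absolute index" is simply the column position in the figure and V's ongoing sending timeslot at index k is k % 6. It models the ideal behavior only (comparable to the PIFO case discussed next); the function and variable names are hypothetical.

```python
# Minimal model of the Figure 6 example (global timeslot style), assuming
# aligned timeslots of equal length and zero link/forwarding delay.
N = 6  # timeslots per orchestration period


def global_send_index(arrival_index: int, gid: int, n: int = N) -> int:
    """Absolute timeslot index at which V sends a packet with global id gid.

    V's ongoing sending timeslot at absolute index k is k % n. The packet
    waits for the next occurrence of V's timeslot gid after the ongoing one;
    if gid equals the ongoing timeslot, that is a full round later.
    """
    j = arrival_index % n          # ongoing sending timeslot of V
    o = (gid - j) % n              # offset in timeslots
    if o == 0:
        o = n                      # BTM i -> i: wait for the next round
    return arrival_index + o


# (upstream node, absolute index at which it sends, global timeslot id)
UPSTREAM = [
    ("U1", 0, 0), ("U1", 1, 1), ("U1", 2, 2),   # BTM i -> i
    ("U2", 2, 3), ("U2", 3, 4),                 # BTM i -> i-1
    ("U3", 0, 5),                               # BTM i -> i+1
]

if __name__ == "__main__":
    schedule = {global_send_index(k, g): f"g{g}" for _, k, g in UPSTREAM}
    for idx in sorted(schedule):
        print(idx, schedule[idx])
```

The computed residence times are 6 timeslots for U1, 1 for U2, and 5 for U3, which is consistent with the observation above that U1 and U3 see a large residence delay in node V.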
It should be noted that if round robin queues are used, for the original BTM i -> i (as for U1) or i -> i+1 (as for U3), the packets need to be stored in a buffer before the TQF scheduler (such as the buffer on the input port side) for a fixed latency (such as several timeslots) and then released to the scheduler. Otherwise, inserting them directly into the queue may cause jitter, i.e., some of the packets belonging to the same incoming timeslot i may be sent in the outgoing timeslot i, while the others have to be delayed until the next round of timeslot i. This fixed-latency buffer is only introduced for specific upstream nodes, and its need can be determined from the initial detection result of the BTM between adjacent nodes: if the original BTM is i -> i or i -> i+1, it is needed; otherwise, it is not. After the fixed-latency buffer is introduced, the new BTM detection result will no longer be i -> i or i -> i+1.

If a PIFO queue is used, there is no need to introduce a fixed-latency buffer, because in this case rank = i.begin + OPL, and the packet will not be scheduled in the current outgoing timeslot i but in the next round. However, in this case the PIFO itself serves as a fixed-latency buffer.

For the headend, the residence delay is similar to Section 7.1. A flow with the ideal incoming timeslot i may select a global outgoing timeslot z based on the BTM i -> j, where j is the ongoing sending timeslot of the outgoing port, and the timeslot offset o equals (N+z-j)%N.

For transit nodes, the residence delay is similar to Section 7.2, plus any latency contributed by the above fixed-latency buffer. A flow with the global incoming timeslot z still selects the global outgoing timeslot z based on the BTM z -> j, where j is the ongoing sending timeslot of the outgoing port, and the timeslot offset o equals (N+z-j)%N.
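The offset rule above can be combined with the residence delay bounds of Sections 7.1 and 7.2, as in the following sketch. The helper names are hypothetical; the sketch rejects o = 0 (z equal to j) because those sections require o >= 1, rather than guessing how an implementation would treat that case.

```python
def timeslot_offset(z: int, j: int, n: int) -> int:
    """Offset o = (N + z - j) % N between outgoing timeslot z and the
    ongoing sending timeslot j."""
    o = (n + z - j) % n
    if o == 0:
        # Sections 7.1 and 7.2 require o >= 1; this sketch rejects z == j
        # instead of guessing how a deployment would handle it.
        raise ValueError("outgoing timeslot equals the ongoing sending timeslot")
    return o


def residence_delay(f, t_ij, o, l_in, l_out):
    """Return (best, worst, average) residence delay of Sections 7.1/7.2.

    f: intra-node forwarding delay F; t_ij: remaining time T_ij of timeslot j;
    l_in: incoming timeslot length (L_h1 or L_u2);
    l_out: outgoing timeslot length (L_h2 or L_v2). Use one time unit throughout.
    """
    best = f + t_ij + (o - 1) * l_out
    worst = f + t_ij + l_in + o * l_out
    average = f + t_ij + (l_in + (2 * o - 1) * l_out) / 2
    return best, worst, average


if __name__ == "__main__":
    o = timeslot_offset(z=3, j=1, n=6)
    best, worst, avg = residence_delay(f=2, t_ij=3, o=o, l_in=10, l_out=10)
    print(o, best, worst, avg, worst - best)  # worst - best = l_in + l_out
```

The difference between the worst and best values is always l_in + l_out, i.e., the per-node jitter stated in Sections 7.1 and 7.2, and the average is the midpoint of the two bounds.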
The end-to-end delay equation is similar to Section 7.4, plus any cumulative latency contributed by the above fixed-latency buffers.

12. Multiple Orchestration Periods

A single orchestration period may not be able to cover a wide range of service needs; for example, some services have a burst interval of microseconds, while others have a burst interval of minutes or even longer. When a single orchestration period serves all these services simultaneously, the timeslot length must be on the order of microseconds while the orchestration period length is minutes or more, so the orchestration period must include a large number of timeslots. As a result, the buffer size required for the scheduling period increases proportionally.

The network may provide multiple orchestration periods, each with a different length. A TQF-enabled link can be configured with multiple TQF scheduling instances, each corresponding to a specific orchestration period length. For simplicity, the orchestration period length itself can be used to identify a specific instance. For example, one orchestration period length is 300 us, termed OPL-300us, which is the LCM of the burst intervals of the set of flows it serves. Another orchestration period length is 100 ms, termed OPL-100ms, which is the LCM of the burst intervals of another set of flows.

Each orchestration period instance has its own timeslot length. The timeslot length of a long orchestration period instance should be longer than that of a short orchestration period instance, and the former should be an integer multiple of the latter. However, the long orchestration period itself is not necessarily an integer multiple of the short orchestration period.
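The relationships above (OPL as the LCM of burst intervals, timeslot lengths as integer multiples, and long periods that need not be multiples of short ones) can be checked with a few lines of Python. The flow sets and timeslot lengths in the usage part are hypothetical values chosen only for illustration; `math.lcm` with several arguments requires Python 3.9 or later.

```python
import math


def orchestration_period_us(burst_intervals_us):
    """OPL of an instance: LCM of the burst intervals of the flows it serves."""
    return math.lcm(*burst_intervals_us)


def timeslots_per_period(opl_us: int, timeslot_us: int) -> int:
    """Number of timeslots N; the OPL must be a multiple of the timeslot length."""
    if opl_us % timeslot_us != 0:
        raise ValueError("OPL is not an integer multiple of the timeslot length")
    return opl_us // timeslot_us


def timeslot_lengths_compatible(tl_short_us: int, tl_long_us: int) -> bool:
    """Long-period timeslots must be longer than, and an integer multiple of,
    short-period timeslots."""
    return tl_long_us > tl_short_us and tl_long_us % tl_short_us == 0


if __name__ == "__main__":
    # Hypothetical flow sets and timeslot lengths, chosen only for illustration.
    short_opl = orchestration_period_us([100, 150, 300])          # 300 us
    long_opl = orchestration_period_us([20_000, 25_000, 100_000])  # 100 ms
    print(short_opl, timeslots_per_period(short_opl, 10))
    print(long_opl, timeslots_per_period(long_opl, 100))
    print(timeslot_lengths_compatible(10, 100), long_opl % short_opl)
```

With these values, OPL-100ms is not a multiple of OPL-300us (the remainder is non-zero) even though the 100 us timeslot is a multiple of the 10 us timeslot, which matches the last statement above.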
As shown in Figure 7, both link-a and link-b are configured with n orchestration period instances, with orchestration period lengths OPL_1, OPL_2, ..., OPL_n in ascending order. For each orchestration period length OPL_i, the dedicated bandwidth resource is BW_U_i for node U (or BW_V_i for node V), and the timeslot length is TL_U_i for node U (or TL_V_i for node V). For each TQF-enabled link, the sum of the dedicated bandwidth resources of all TQF scheduling instances must not exceed the total bandwidth of the link.

   +---+        link-a        +---+        link-b        +---+
   | U | -------------------- | V | -------------------- | W |
   +---+                      +---+                      +---+
   OPL_1:                     OPL_1:
     TL_U_1                     TL_V_1
     BW_U_1                     BW_V_1
   OPL_2:                     OPL_2:
     TL_U_2                     TL_V_2
     BW_U_2                     BW_V_2
   ...                        ...
   ...                        ...
   OPL_n:                     OPL_n:
     TL_U_n                     TL_V_n
     BW_U_n                     BW_V_n

                   Figure 7: Multiple TQF Instances

Long orchestration periods serve DetNet flows with large burst intervals, and for a given burst size, the larger the burst interval, the less bandwidth the DetNet flow consumes. Therefore, it is recommended that the bandwidth resource of a long orchestration period be less than that of a short orchestration period, which helps reduce the buffer required for the long orchestration period.

Interworking between different nodes is based on the same orchestration period. That means that the timeslot mapping described in Section 4 should be maintained in the context of the specific orchestration period. The orchestration period length should be carried in the forwarded packets so that the DetNet flow consumes the timeslot resources of the corresponding TQF scheduling instance.

If round robin queues are used, each TQF scheduling instance has its own separate queue set. Time division multiplexing scheduling is based on the granularity of the minimum timeslot length of all instances.
Within each time unit of this granularity, the queues in the sending state of all instances are always scheduled in the order OPL_1, OPL_2, ..., OPL_n.

If a PIFO queue is used, all TQF scheduling instances may share a single PIFO queue. An implementation may use the rank (i.e., the beginning of the outgoing timeslot z) plus the timeslot length to determine the insertion order of two packets from different instances, so that the packet from the shorter orchestration period is inserted in front.

13. Admission Control on the Headend

On the network entry, traffic regulation must be performed on the incoming port, so that the DetNet flow does not exceed its TSpec, such as burst interval, burst size, maximum packet size, etc. This regulation is usually shaping with a leaky bucket, combined with the incoming queue that receives the DetNet flow. A DetNet flow may contain multiple discrete sub-bursts within its periodic burst interval. The leaky bucket depth should be larger than the maximum packet size and should be consistent with the burst resources reserved for the maximum sub-burst.

The scheduling mechanism described in this document places a requirement on the arrival time of DetNet flows at the network entry. The sub-bursts of the DetNet flow (after regulation) are expected to always appear in an ideal incoming timeslot of the UNI port. Based on this ideal position, an ideal outgoing timeslot is selected. For a single DetNet flow, the network entry may maintain multiple forwarding states each containing , due to many sub-bursts within the service burst interval. For example, the network entry may maintain up to 3 sub-burst forwarding states for a flow. Ideally, all packets of this flow are split into 3 sub-bursts after regulation, each sub-burst matching one of the states.
Here, 3 is the maximum number of sub-bursts for this flow; during actual sending, the flow does not always contain that many sub-bursts within the burst interval.

For a specific sub-burst, some amount of deviation (i.e., the deviation between the actual incoming timeslot and the ideal incoming timeslot) is permitted. Generally, the headend selects the ideal incoming timeslot closest to the actual incoming timeslot for the packet.

For on-time scheduling, the position deviation should not exceed o-1 for late arrivals, or M-o-1 for early arrivals, where o is the offset between the outgoing timeslot and the ongoing sending timeslot as mentioned above. Intuitively, a large o can tolerate large late-arrival deviations, while a small o (or a large M, even for a large o) can tolerate large early-arrival deviations. This limitation on the position deviation helps on-time scheduling achieve the ideal design goal that the scheduling period is shorter than the orchestration period while packets can always be inserted into the scheduling queue (RR or PIFO) without overflow. For example, without this limitation there may be one or more scheduling periods between the departure time (from the regulator) and the chosen ideal incoming timeslot, and therefore there is a risk of overflow when packets are inserted, at the departure time, into the queue corresponding to the ideal outgoing timeslot z. Randomly arriving DetNet flows that cannot meet this limitation can be supported either by using a large M (or even M = N) to accommodate random arrivals (option-1), or by introducing an explicit buffer before the scheduler on the network entry so that the arrival time always meets the position deviation limitation (option-2).
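The selection of the ideal incoming timeslot and the deviation limits can be sketched as follows, with all quantities in timeslots. This is a hypothetical illustration: the selection follows the rule of Section 6 (closest ideal timeslot not earlier than the actual one, cyclically), which always yields an early-arrival deviation e with i = (i'+e)%N as in Section 7.1; the late-arrival limits of this section are checked by the same helper with `late=True`.

```python
def select_ideal_slot(actual: int, ideal_slots, n: int):
    """Section 6 rule: choose the ideal incoming timeslot that is closest to,
    and not earlier than, the actual incoming timeslot (cyclically in N).
    Returns (i, e) such that i = (actual + e) % n."""
    e, i = min(((s - actual) % n, s) for s in ideal_slots)
    return i, e


def deviation_allowed(deviation: int, o: int, m: int, late: bool,
                      mode: str = "on-time") -> bool:
    """Position deviation limits of this section, in timeslots."""
    if late:
        return deviation <= o - 1          # both modes: at most o-1 late
    if mode == "on-time":
        return deviation <= m - o - 1      # on-time: at most M-o-1 early
    if mode == "in-time":
        return True                        # in-time absorbs early arrivals
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Hypothetical UNI with N = 10 timeslots; the flow's ideal slots are 4 and 9.
    i, e = select_ideal_slot(actual=5, ideal_slots=[4, 9], n=10)
    print(i, e, deviation_allowed(e, o=2, m=8, late=False))
```

When the check fails, the flow would need either a larger M (option-1) or an explicit buffer before the scheduler (option-2), as described above.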
   *  Note that due to the randomness of the arrival time, the packet may just miss the scheduling (or arrive too early) and need to wait in the scheduling queue (in the case of option-1) or in the explicit buffer (in the case of option-2) for the next orchestration period.

For in-time scheduling, the position deviation should not exceed o-1 for late arrivals. Only late arrivals are considered here, as in-time scheduling naturally handles early arrivals. If a late arrival exceeds the above limitation, the sub-burst may need to be sent during the next orchestration period in the worst case, or may be lucky enough to be scheduled immediately.

Note that the position deviation is a runtime latency during forwarding, whether PIFO or RR queues are used. It should be carried in the packet so that the jitter can be eliminated at the network egress on demand. Please refer to [I-D.peng-detnet-policing-jitter-control] for the elimination of jitter caused by policing delay on the network entrance node. The runtime position deviation should be considered part of the policing delay.

14. Frequency Synchronization

Frequency synchronization basically means that the crystal frequencies of the hardware are consistent, which places all nodes in the network in the same inertial frame with the same time lapse rate. This is a prerequisite for the TQF mechanism. The related frequency synchronization mechanisms, such as the IEEE 1588-2008 Precision Time Protocol (PTP) [IEEE-1588] and synchronous Ethernet (SyncE) [syncE], are outside the scope of this document.

Sometimes, the term frequency asynchrony is also used for the difference in timeslot rotation frequency caused by nodes being configured with different timeslot lengths. This document supports interconnection between nodes with this type of frequency asynchrony.

15. Evaluations

This section gives the evaluation results of the TQF mechanism against the requirements defined in [I-D.ietf-detnet-scaling-requirements].

   +======================+============+===============================+
   | Requirements         | Evaluation | Notes                         |
   +======================+============+===============================+
   | 3.1 Tolerate Time    | Yes        | No time sync needed; only     |
   | Asynchrony           |            | frequency sync is needed      |
   |                      |            | (3.1.3).                      |
   +----------------------+------------+-------------------------------+
   | 3.2 Support Large    | Yes        | The timeslot mapping covers   |
   | Single-hop           |            | any value of link propagation |
   | Propagation          |            | delay.                        |
   | Latency              |            |                               |
   +----------------------+------------+-------------------------------+
   | 3.3 Accommodate the  | Partial    | The higher the service rate,  |
   | Higher Link          |            | the more buffer is needed for |
   | Speed                |            | the same timeslot length.     |
   +----------------------+------------+-------------------------------+
   | 3.4 Be Scalable to   | Yes        | Multiple OPL instances, each  |
   | the Large Number     |            | for a set of service flows,   |
   | of Flows and         |            | without overprovisioning.     |
   | Tolerate High        |            | Utilization may reach 100%    |
   | Utilization          |            | of link bandwidth.            |
   |                      |            | The unused bandwidth of the   |
   |                      |            | timeslot can be used by       |
   |                      |            | best-effort flows.            |
   |                      |            | Calculating paths is NP-hard. |
   +----------------------+------------+-------------------------------+
   | 3.5 Tolerate         | N/A        | Independent of queueing       |
   | Failures of Links or |            | mechanism.                    |
   | Nodes and Topology   |            |                               |
   | Changes              |            |                               |
   +----------------------+------------+-------------------------------+
   | 3.6 Prevent Flow     | Yes        | Flows are isolated from each  |
   | Fluctuation          |            | other through timeslots.      |
   +----------------------+------------+-------------------------------+
   | 3.7 Be Scalable to a | Yes        | E2E latency is linear with    |
   | Large Number of      |            | hops, from ultra-low to low   |
   | Hops with Complex    |            | latency by multiple OPL.      |
   | Topology             |            | E2E jitter is low in on-time  |
   |                      |            | mode.                         |
   |                      |            | Calculating paths is NP-hard. |
   +----------------------+------------+-------------------------------+
   | 3.8 Support Multi-   | N/A        | Independent of queueing       |
   | Mechanisms in        |            | mechanism.                    |
   | Single Domain and    |            |                               |
   | Multi-Domains        |            |                               |
   +----------------------+------------+-------------------------------+

           Figure 8: Evaluation for Large Scaling Requirements

15.1. Examples

This section describes an example of how the TQF mechanism supports DetNet flows with different latency requirements. As shown in Figure 9:

   *  Network transmission capacity: each link has a rate of 10 Gbps. It is assumed that the entire port bandwidth is allocated as the service rate of the TQF scheduler.

   *  The TSpec of each flow may be:

      -  burst size 1000 bits, SBI 1 ms, and average arrival rate 1 Mbps.
      -  or, burst size 1000 bits, SBI 100 us, and average arrival rate 10 Mbps.
      -  or, burst size 1000 bits, SBI 10 us, and average arrival rate 100 Mbps.
      -  or, burst size 10000 bits, SBI 10 ms, and average arrival rate 1 Mbps.
      -  or, burst size 10000 bits, SBI 1 ms, and average arrival rate 10 Mbps.
      -  or, burst size 10000 bits, SBI 100 us, and average arrival rate 100 Mbps.

   *  The RSpec of each flow may be:

      -  E2E latency 100us, and E2E jitter less than 10us or 100us.
      -  or, E2E latency 200us, and E2E jitter less than 20us or 200us.
      -  or, E2E latency 300us, and E2E jitter less than 30us or 300us.
      -  or, E2E latency 400us, and E2E jitter less than 40us or 400us.
      -  or, E2E latency 500us, and E2E jitter less than 50us or 500us.
      -  or, E2E latency 600us, and E2E jitter less than 60us or 600us.
      -  or, E2E latency 700us, and E2E jitter less than 70us or 700us.
      -  or, E2E latency 800us, and E2E jitter less than 80us or 800us.
      -  or, E2E latency 900us, and E2E jitter less than 90us or 900us.
      -  or, E2E latency 1000us, and E2E jitter less than 100us or 1ms.

             @            #            $
             @            #            $
             v            v            v
           +---+  @@@   +---+  ###   +---+  $$$    &&&   +---+
 src ----->| 0 o--------| 1 o--------| 2 o--- ... ... ---| 9 o----> dest
(flow i:*) +---+  ***   +---+  ***   +---+  ***    ***   +---+  ***
             |            |@           |#                  |&
             |            |@           |#                  |&
             |            |v           |v                  |v
           +---+        +---+        +---+               +---+
        ---|   |--------|   |--------|   |--- ... ... ---|   |---
           +---+        +---+        +---+               +---+
             |            |            |                   |
            ...          ...          ...                 ...

                   Figure 9: Common Topology Example

   For the observed flow i (marked with *), its TSpec and RSpec may be
   any of the above.  Assume that the path calculated by the controller
   for flow i passes through 10 nodes (i.e., nodes 0~9).  In
   particular, at each hop, flow i may compete with other flows that
   have TSpecs and RSpecs similar to those above but originate from
   other sources; e.g., it competes with flow-set "@" at node 0, with
   flow-set "#" at node 1, etc.

   For each link along the path, an OPL-10ms instance may be configured
   with a dedicated bandwidth of 10 Gbps, containing 1000 timeslots,
   each with a length of 10us.  Assuming no link propagation delay and
   no intra-node forwarding delay, if flow i consumes outgoing
   timeslots with o=1, an E2E latency of 100us (i.e., o * TL * 10 hops)
   can be ensured, with a jitter of 20us (on-time mode) or 100us
   (in-time mode).  Consumption with other values of o is similar.

   The table below shows the possible supported service scales.  As
   flows arrive synchronously, the consumption of each timeslot in the
   orchestration period may be caused by any value of o.
   For example, if the ideal incoming timeslots of all flows are
   perfectly interleaved, then all flows can consume timeslots with
   o=1 to get a per-hop latency of 10us, or all can consume timeslots
   with o=2 to get a per-hop latency of 20us, etc.  However, because
   the length of the OPL is fixed, once all timeslot resources are
   exhausted by a specific o value, no timeslot resources remain for
   other o values.  Another example is that the ideal incoming
   timeslots of all flows are the same; then some of them consume
   timeslots with o=1, some with o=2, and so on.  In either case, the
   total service scale is OPL * C / B, where B is the amount of data
   each flow sends within one OPL (i.e., its burst size multiplied by
   OPL / SBI), so the total simplifies to SBI * C / burst_size.  This
   total is composed of sum(s_i), where s_i is the service scale for
   o = i.  The table provides the total scale and the average scale
   corresponding to each o value.

   Note that each row of the table only shows the case where all flows,
   whatever their o value, have the same TSpec (e.g., in the first row,
   the TSpec per flow is burst size 1000 bits and arrival rate 1 Mbps),
   while in reality, flows served with different o values generally
   have different TSpecs.  Rows can easily be added to describe various
   combinations.
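   The arithmetic of this example can be checked with the minimal
   Python sketch below.  It assumes only the parameters stated above
   (OPL = 10 ms, TL = 10 us, 10 hops, C = 10 Gbps dedicated to the OPL
   instance, no propagation or intra-node forwarding delay) plus one
   extra assumption: that each SBI divides the OPL evenly.  The helper
   names num_timeslots, e2e_latency_us, total_scale, and per_o_scale
   are hypothetical illustrations, not part of the TQF specification.

```python
"""Sketch of the TQF example arithmetic (hypothetical helper names).

Assumptions from the example: OPL-10ms with TL = 10 us timeslots,
C = 10 Gbps, a 10-hop path, and no propagation or forwarding delay.
Additional assumption: each SBI divides the OPL evenly.
"""

OPL_US = 10_000          # orchestration period length: 10 ms
TL_US = 10               # timeslot length: 10 us
HOPS = 10                # nodes 0..9 on the path of flow i
C_BPS = 10_000_000_000   # 10 Gbps dedicated to the OPL instance


def num_timeslots(opl_us: int = OPL_US, tl_us: int = TL_US) -> int:
    """Number of timeslots in one orchestration period."""
    return opl_us // tl_us


def e2e_latency_us(o: int, tl_us: int = TL_US, hops: int = HOPS) -> int:
    """E2E latency bound o * TL * hops (no propagation/forwarding delay)."""
    return o * tl_us * hops


def total_scale(burst_bits: int, sbi_us: int, c_bps: int = C_BPS) -> int:
    """Flows that fit when each flow sends burst_bits once per SBI.

    Bits available per OPL divided by bits each flow sends per OPL,
    which simplifies to SBI * C / burst_size.
    """
    bits_per_opl = c_bps * OPL_US // 1_000_000
    bits_per_flow_per_opl = burst_bits * (OPL_US // sbi_us)
    return bits_per_opl // bits_per_flow_per_opl


def per_o_scale(total: int, num_o: int = 10) -> int:
    """Average scale s_i per o value when the total is split evenly."""
    return total // num_o


if __name__ == "__main__":
    assert num_timeslots() == 1000
    # (burst size in bits, SBI in us) for the six TSpecs of the example.
    tspecs = [(1000, 1000), (1000, 100), (1000, 10),
              (10000, 10000), (10000, 1000), (10000, 100)]
    for burst, sbi in tspecs:
        total = total_scale(burst, sbi)
        print(f"{burst} bits, SBI {sbi} us: total={total}, "
              f"per o={per_o_scale(total)}")
    for o in range(1, 11):
        print(f"o={o}: E2E latency bound {e2e_latency_us(o)} us")
```

   Run as a script, it reports total=10000 and per o=1000 for the
   1000-bit, 1 ms SBI TSpec, and E2E latency bounds from 100 us (o=1)
   to 1000 us (o=10), consistent with the RSpec values listed above and
   with the first row of the table that follows.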
                ===================================================
                | o=1| o=2| o=3| o=4| o=5| o=6| o=7| o=8| o=9|o=10|
   ================================================================
   | TSpec:     |                  total = 10000                  |
   | 1000 bits  |----+----+----+----+----+----+----+----+----+----|
   | SBI 1 ms   |1000|1000|1000|1000|1000|1000|1000|1000|1000|1000|
   ================================================================
   | TSpec:     |                  total = 1000                   |
   | 1000 bits  |----+----+----+----+----+----+----+----+----+----|
   | SBI 100 us | 100| 100| 100| 100| 100| 100| 100| 100| 100| 100|
   ================================================================
   | TSpec:     |                   total = 100                   |
   | 1000 bits  |----+----+----+----+----+----+----+----+----+----|
   | SBI 10 us  | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
   ================================================================
   | TSpec:     |                  total = 10000                  |
   | 10000 bits |----+----+----+----+----+----+----+----+----+----|
   | SBI 10 ms  |1000|1000|1000|1000|1000|1000|1000|1000|1000|1000|
   ================================================================
   | TSpec:     |                  total = 1000                   |
   | 10000 bits |----+----+----+----+----+----+----+----+----+----|
   | SBI 1 ms   | 100| 100| 100| 100| 100| 100| 100| 100| 100| 100|
   ================================================================
   | TSpec:     |                   total = 100                   |
   | 10000 bits |----+----+----+----+----+----+----+----+----+----|
   | SBI 100 us | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
   ================================================================

        Figure 10: Timeslot Reservation and Service Scale Example

16.  Taxonomy Considerations

   [I-D.ietf-detnet-dataplane-taxonomy] provides criteria for
   classifying data plane solutions.  TQF is a periodic, frequency
   synchronous, class level, work-conserving/non-work-conserving
   configurable, in-time/on-time configurable, time based solution.
   *  Periodic: The periodicity of TQF has two characteristics: first,
      there is a time period P (i.e., the orchestration period)
      containing multiple timeslots; second, a flow is repeatedly
      assigned to a particular set of timeslots in the period.

   *  Frequency synchronous: TQF requires frequency synchronization
      (i.e., of the crystal frequency of the hardware) so that all
      nodes in the network have the same time lapse rate.  TQF does not
      require different nodes to use the same timeslot length.

   *  Class level: DetNet flows may be grouped by similar service
      requirements, i.e., timeslot id(s), at the network entrance.
      Packets are provided TQF service based on timeslot id(s),
      without checking flow characteristics.

   *  Work-conserving/non-work-conserving configurable: A TQF
      scheduler configured with the in-time scheduling mode is
      work-conserving (i.e., it sends the packet as soon as possible
      before its outgoing timeslot), while a TQF scheduler configured
      with the on-time scheduling mode is non-work-conserving (i.e., it
      ensures that the packet is always sent within its outgoing
      timeslot).

   *  In-time/on-time configurable: A TQF scheduler configured with the
      in-time scheduling mode is in-time, providing bounded end-to-end
      latency, while a TQF scheduler configured with the on-time
      scheduling mode is on-time, providing bounded end-to-end delay
      jitter.

   *  Time based: A DetNet flow is scheduled based on its expected
      outgoing timeslot(s).  All DetNet flows are interleaved and
      arranged in different timeslots to maximize the number of
      admitted flows.  In addition, the dominant factor in the per-hop
      latency of TQF is the offset between the incoming timeslot and
      the outgoing timeslot assigned to the flow.

17.  IANA Considerations

   TBD.

18.  Security Considerations

   Security considerations for DetNet are described in detail in
   [RFC9055].
   General security considerations for the DetNet architecture are
   described in [RFC8655].  Considerations specific to the DetNet data
   plane are summarized in [RFC8938].

   Adequate admission control policies should be configured at the edge
   of the DetNet domain to control access to specific timeslot
   resources.  Access to classification and mapping tables must be
   controlled to prevent misbehavior; e.g., an unauthorized entity
   could modify a table to map traffic to a disallowed timeslot
   resource, where it would compete and interfere with normal traffic.

19.  Acknowledgements

   TBD.

20.  References

20.1.  Normative References

   [I-D.chen-detnet-sr-based-bounded-latency]
              Chen, M., Geng, X., Li, Z., Joung, J., and J. Ryoo,
              "Segment Routing (SR) Based Bounded Latency", Work in
              Progress, Internet-Draft,
              draft-chen-detnet-sr-based-bounded-latency-03,
              7 July 2023.

   [I-D.eckert-detnet-tcqf]
              Eckert, T. T., Li, Y., Bryant, S., Malis, A. G., Ryoo, J.,
              Liu, P., Li, G., Ren, S., and F. Yang, "Deterministic
              Networking (DetNet) Data Plane - Tagged Cyclic Queuing and
              Forwarding (TCQF) for bounded latency with low jitter in
              large scale DetNets", Work in Progress, Internet-Draft,
              draft-eckert-detnet-tcqf-05, 5 January 2024.

   [I-D.ietf-detnet-dataplane-taxonomy]
              Joung, J., Geng, X., Peng, S., and T. T. Eckert,
              "Dataplane Enhancement Taxonomy", Work in Progress,
              Internet-Draft, draft-ietf-detnet-dataplane-taxonomy-00,
              24 May 2024.

   [I-D.ietf-detnet-scaling-requirements]
              Liu, P., Li, Y., Eckert, T. T., Xiong, Q., Ryoo, J.,
              zhushiyin, and X. Geng, "Requirements for Scaling
              Deterministic Networks", Work in Progress, Internet-Draft,
              draft-ietf-detnet-scaling-requirements-06, 22 May 2024.

   [I-D.p-6man-deterministic-eh]
              Peng, S., "Deterministic Source Route Header", Work in
              Progress, Internet-Draft,
              draft-p-6man-deterministic-eh-00, 20 June 2024.

   [I-D.pb-6man-deterministic-crh]
              Peng, S. and R. Bonica, "Deterministic Routing Header",
              Work in Progress, Internet-Draft,
              draft-pb-6man-deterministic-crh-00, 29 February 2024.

   [I-D.peng-detnet-policing-jitter-control]
              Peng, S., Liu, P., and K. Basu, "Policing Caused Jitter
              Control Mechanism", Work in Progress, Internet-Draft,
              draft-peng-detnet-policing-jitter-control-00,
              18 January 2024.

   [I-D.peng-lsr-deterministic-traffic-engineering]
              Peng, S., "IGP Extensions for Deterministic Traffic
              Engineering", Work in Progress, Internet-Draft,
              draft-peng-lsr-deterministic-traffic-engineering-01,
              4 July 2023.

   [I-D.xp-ippm-detnet-stamp]
              Min, X., Peng, S., and X. He, "STAMP Extensions for
              DetNet", Work in Progress, Internet-Draft,
              draft-xp-ippm-detnet-stamp-00, 19 June 2024.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174,
              DOI 10.17487/RFC8174, May 2017.

   [RFC8655]  Finn, N., Thubert, P., Varga, B., and J. Farkas,
              "Deterministic Networking Architecture", RFC 8655,
              DOI 10.17487/RFC8655, October 2019.

   [RFC8938]  Varga, B., Ed., Farkas, J., Berger, L., Malis, A., and S.
              Bryant, "Deterministic Networking (DetNet) Data Plane
              Framework", RFC 8938, DOI 10.17487/RFC8938, November 2020.

   [RFC9055]  Grossman, E., Ed., Mizrahi, T., and A. Hacker,
              "Deterministic Networking (DetNet) Security
              Considerations", RFC 9055, DOI 10.17487/RFC9055, June 2021.

20.2.  Informative References

   [ATM-LATENCY]
              "Bounded Latency Scheduling Scheme for ATM Cells", 1999.

   [CQF]      "Cyclic queueing and Forwarding", 2017.

   [ECQF]     "Enhancements to Cyclic Queuing and Forwarding", 2023.

   [IEEE-1588]
              "IEEE Standard for a Precision Clock Synchronization
              Protocol for Networked Measurement and Control Systems",
              2008.
   [SP-LATENCY]
              "Guaranteed Latency with SP", 2020.

   [syncE]    "Timing and synchronization aspects in packet networks",
              2013.

   [TAS]      "Time-Aware Shaper", 2015.

Authors' Addresses

   Shaofu Peng
   ZTE
   China
   Email: peng.shaofu@zte.com.cn


   Peng Liu
   China Mobile
   China
   Email: liupengyjy@chinamobile.com


   Kashinath Basu
   Oxford Brookes University
   United Kingdom
   Email: kbasu@brookes.ac.uk


   Aihua Liu
   ZTE
   China
   Email: liu.aihua@zte.com.cn


   Dong Yang
   Beijing Jiaotong University
   China
   Email: dyang@bjtu.edu.cn


   Guoyu Peng
   Beijing University of Posts and Telecommunications
   China
   Email: guoyupeng@bupt.edu.cn