Network freak: MPLS Guide

Introduction

Multi-Protocol Label Switching (MPLS) is a highly scalable, protocol-agnostic data-carrying mechanism.
The classic way of routing packets around the network requires a routing table lookup at every hop, since every node in the network makes its own decision about each incoming packet.

In an MPLS-enabled network, packets are grouped into flows (Forwarding Equivalence Classes, FECs) that traverse the network in tunnels (Label Switched Paths, LSPs). Routing lookups are made only at the edges of the network; the "core" makes only <if label x, then interface x> decisions, which require far fewer resources and can easily be implemented in hardware.
Since the existing packet headers are irrelevant (decisions are based on the attached labels), almost any kind of traffic can be tunneled.
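The <if label x, then interface x> decision made in the core can be sketched as a plain dictionary lookup. This is only an illustration: the table contents, the interface names and the switch() helper are made up for the example and do not come from any real router.

```python
from typing import NamedTuple, Optional


class LfibEntry(NamedTuple):
    action: str                # "swap" or "pop"
    out_label: Optional[int]   # label written on swap, None on pop
    out_interface: str


# Hypothetical label table of a single core LSR: incoming label -> action.
LFIB = {
    100: LfibEntry("swap", 200, "ge-0/0/1"),
    101: LfibEntry("pop", None, "ge-0/0/2"),
}


def switch(label_stack):
    """Forward on the top label only; the payload is never inspected."""
    entry = LFIB[label_stack[0]]
    rest = list(label_stack[1:])
    if entry.action == "swap":
        return [entry.out_label] + rest, entry.out_interface
    return rest, entry.out_interface  # pop: remove the top label


print(switch([100]))       # top label swapped
print(switch([101, 300]))  # top label popped, inner label untouched
```

No routing table is consulted at any point, which is why this step is cheap to implement in hardware.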

Basic definitions:

Labels - applied to packets before the IP header; they allow layer-2-like switching. Labels are assigned in the upstream direction, starting from the last router in a given LSP, and are of local significance only.

MPLS Label

MPLS Special Use Labels:
Label values 0-15 are reserved for special uses:

                    0 - IPv4 Explicit Null
                    1 - Router Alert
                    2 - IPv6 Explicit Null
                    3 - Implicit Null
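                    4-15 - Reserved (some of these values have since been assigned, e.g. 7 for the Entropy Label Indicator and 13 for the Generic Associated Channel Label).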
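The label header layout itself is not spelled out in this guide. The sketch below assumes the standard 32-bit label stack entry from RFC 3032 (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL). The function names are invented for the example.

```python
import struct

# Special-use label values listed above.
SPECIAL = {
    0: "IPv4 Explicit Null",
    1: "Router Alert",
    2: "IPv6 Explicit Null",
    3: "Implicit Null",
}


def encode_entry(label, tc=0, bottom=True, ttl=255):
    """Pack one label stack entry into 4 bytes, network byte order."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8 or not 0 <= ttl < 256:
        raise ValueError("tc must fit in 3 bits and ttl in 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def decode_entry(data):
    """Unpack the first 4 bytes of data into the entry's fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }


def describe(label):
    """Classify a label value against the 0-15 special range."""
    if label in SPECIAL:
        return SPECIAL[label]
    return "reserved" if label <= 15 else "ordinary label"


print(decode_entry(encode_entry(100, tc=5, ttl=64)))
print(describe(3), "/", describe(16))
```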

Label Switched Path (LSP) - unidirectional path carrying label-tagged packets. 253 hops at maximum (TTL can't be higher than 255). Set-up in downstream direction.

Label Switching Router (LSR) - router with interfaces configured for MPLS

Ingress - beginning of an LSP. Performs routing lookup and label push (adding labels).

Transit - every LSR between ingress and egress. Performs label swap (replacing labels).

Penultimate - second to last LSR in LSP. Performs PHP by default.

Egress - end of an LSP. Performs routing lookup. If PHP disabled, also label popping (removing labels).

Penultimate Hop Popping (PHP) - label popping performed by the penultimate LSR instead of the egress LSR. Introduced for scaling and efficiency reasons: performing both a label pop and an IP lookup on the egress router can be taxing on hardware when it terminates many LSPs. An egress router advertising label 3 (implicit null) means "do PHP"; advertising label 0 (explicit null) means "do a label swap", so the packet still arrives labeled. Nowadays the explicit-null variant is used to achieve a uniform MPLS network, edge-to-edge QoS and VPN applications.
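The effect of the two null labels on what actually travels over each link can be sketched as follows. This is a simplified model: the list of advertised labels and both helper functions are hypothetical, and only a single-label stack is considered.

```python
IMPLICIT_NULL = 3
EXPLICIT_NULL = 0


def labels_on_wire(advertised):
    """advertised[i] is the label that router i+1 advertised to router i.

    Returns the label carried on each link, ingress to egress.
    None means the packet crosses that link unlabeled.
    """
    # A router that was told "implicit null" pops instead of swapping.
    return [None if label == IMPLICIT_NULL else label for label in advertised]


def egress_work(last_label):
    """What the egress router still has to do with an arriving packet."""
    return "IP lookup" if last_label is None else "label pop + IP lookup"


php = labels_on_wire([100, 200, IMPLICIT_NULL])
no_php = labels_on_wire([100, 200, EXPLICIT_NULL])
print(php, "->", egress_work(php[-1]))
print(no_php, "->", egress_work(no_php[-1]))
```

With explicit null the MPLS header, including its QoS bits, survives the last hop, at the cost of one more operation on the egress router.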

PHP Before Implicit Null - source alcatel-lucent.com

PHP Implicit Null - source : alcatel-lucent.com

PHP Explicit Null - source : alcatel-lucent.com

Traffic Engineering (TE) - the ability to influence how LSPs are set up. Without TE, the IGP has complete control over which path through the network an LSP takes. With TE, the administrator can specify constraints describing which links suit the LSP best.

Label Distribution

Types of label distribution:

Static - labels assigned by hand on each LSR

LDP - fully automatic label allocation without traffic engineering

RSVP - dynamic allocation protocol with traffic engineering

A typical deployment has LDP configured on every router in the domain to forward the bulk of the traffic, with RSVP used to set up LSPs for traffic requiring special treatment.

LDP

A protocol made specifically for MPLS. It has no traffic engineering support and no routing capabilities of its own: paths follow the IGP, and LDP is used only for assigning and distributing labels.
Configured interfaces are automatically advertised into the LDP domain with corresponding labels, and the advertisements spread throughout the domain hop by hop. This way, every pair of routers becomes the ingress and egress of a full mesh of LSPs. Besides turning LDP on for the chosen interfaces, no additional configuration is required.
A session between neighboring routers starts with a Hello mechanism. LDP Hello messages are UDP packets (port 646) sent out of all LDP-enabled interfaces to the well-known 224.0.0.2/32 address.

Most important fields of Hello message are:

LDP Hello Message

LDP ID - defines router's label space: source_RID:label_allocation_method. Allocation methods are:

0 - per router (Junos default)

1 - per interface

IPv4 Transport Address - IP address for establishing LDP session. By default - loopback. When empty, source address of the message is used.
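As a small illustration of the LDP ID format described above, the following sketch splits a textual LDP ID into its router ID and label space parts. The textual form "192.0.2.1:0" and the names LdpId and parse_ldp_id are assumptions made for the example.

```python
import ipaddress
from typing import NamedTuple


class LdpId(NamedTuple):
    router_id: ipaddress.IPv4Address
    label_space: int

    def allocation_method(self):
        # 0 = per router, anything else = per interface (as listed above).
        return "per router" if self.label_space == 0 else "per interface"


def parse_ldp_id(text):
    """Parse 'router_id:label_space', e.g. '192.0.2.1:0'."""
    rid, sep, space = text.rpartition(":")
    if not sep:
        raise ValueError("expected 'router_id:label_space'")
    return LdpId(ipaddress.IPv4Address(rid), int(space))


ldp_id = parse_ldp_id("192.0.2.1:0")
print(ldp_id.router_id, ldp_id.label_space, ldp_id.allocation_method())
```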

A single session between neighboring routers is established for each advertised label space, regardless of the number of interconnected interfaces.
Once the Hellos are exchanged, routers switch to TCP port 646 and exchange LDP Initialize messages - router with higher RID begins the exchange. Most important fields are hold timers and a method of advertising MPLS labels:

MPLS Label Distribution Options

Downstream Unsolicited - the router may send label values upstream at any time after the LDP session is set up (Junos default for LDP).

Downstream on Demand - the router may send label values only after receiving a request from an upstream router.

Once the session is established, information is exchanged with the use of:

LDP Address Message - contains the IP addresses of all interfaces on which LDP is configured.

LDP Label Mapping Message - contains a Forwarding Equivalence Class (prefix) and the associated label value. By default, Junos routers advertise only their loopback addresses. Every router in the upstream direction replaces the label value with one it allocated locally. This way, all routers in the LDP domain know each other's loopbacks and have "next hop" label values associated with them.

LDP Label Withdraw Message - identical to Label Mapping Message but used for withdrawing FECs.

IGP takes care of loop prevention. That's why every received FEC has to be in the routing table for LDP based forwarding to work.
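The hop-by-hop replacement of label values described above can be sketched for a single FEC travelling along one IGP path. This is a toy model: the starting label value 100000 is arbitrary, a shared counter stands in for each router's own label allocator, and the function name is made up.

```python
from itertools import count

IMPLICIT_NULL = 3


def distribute(path):
    """Propagate one FEC (the egress loopback) upstream along path.

    path is ordered ingress -> egress. Returns, per router, the pair
    (label it advertises upstream, label it uses toward the egress).
    """
    allocator = count(100000)  # assumption: arbitrary dynamic label range
    bindings = {path[-1]: (IMPLICIT_NULL, None)}  # egress asks for PHP
    advertised = IMPLICIT_NULL
    for router in reversed(path[:-1]):
        local = next(allocator)         # locally significant label
        bindings[router] = (local, advertised)
        advertised = local              # what the next upstream router learns
    return bindings


for router, (label_in, label_out) in distribute(["R1", "R2", "R3", "R4"]).items():
    print(router, "in:", label_in, "out:", label_out)
```

The order of routers in path is whatever the IGP computed, which reflects the point above that LDP itself never chooses the route.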

Tunneling

It's possible to tunnel LDP LSPs inside an RSVP LSP. This is useful when we need to join physically separate LDP domains, or when we want the benefits of RSVP traffic engineering without the hassle of setting up all LSPs by hand. Configuring the ldp-tunneling command on the edge routers connected by the RSVP LSP triggers them to send targeted Hello messages (destined to a particular router, not to the 224.0.0.2/32 address) over that LSP, which allows a remote LDP session to be created.

RSVP-TE

RSVP was originally designed to let end users reserve network resources. Since that's not something you want to allow on the Internet (the potential for abuse is significant), it never became very popular. It suits MPLS flows quite well though, so it was recycled and modified for MPLS applications. The new version was called RSVP-TE (Resource Reservation Protocol - Traffic Engineering).
In contrast to LDP, RSVP isn't fully automatic: it creates only simplex LSPs between a configured ingress-egress router pair, so you need to set up another LSP in the other direction for full-duplex connectivity. In return, it allows significant (and dynamic) control over the routes MPLS packets take, unlike LDP, which relies completely on the IGP.

LSPs are set up using:

Path and Resv Messages - source : cisco.com

Path message - sent downstream by the ingress router. Contains details of the LSP and sets soft state on each passed node for this particular LSP.

Resv message - sent upstream by the egress router (over the same hops as the Path message). More soft state info is added and when the Resv message reaches the ingress router - LSP is up.
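The two-pass exchange above can be sketched as follows. This is a simplified model that ignores timers, bandwidth and error handling; the label values and the signal_lsp name are invented, and the hop list is assumed to be non-empty.

```python
from itertools import count

IMPLICIT_NULL = 3


def signal_lsp(hops):
    """Simulate Path (downstream) and Resv (upstream) for one LSP.

    hops is ordered ingress -> egress. Returns (path_state, resv_state):
    path_state[node] is the previous hop recorded from the Path message,
    resv_state[node] is the label that node must use toward the egress.
    """
    # Path message: every node remembers who sent it the message.
    path_state = {}
    previous = None
    for node in hops:
        path_state[node] = previous
        previous = node

    # Resv message: follows the recorded previous hops back upstream,
    # each node handing a label to its upstream neighbor.
    allocator = count(300000)  # assumption: arbitrary label range
    resv_state = {}
    node, label = hops[-1], IMPLICIT_NULL
    while path_state[node] is not None:
        upstream = path_state[node]
        resv_state[upstream] = label
        node, label = upstream, next(allocator)
    return path_state, resv_state  # ingress now holds a label: LSP is up


path_state, resv_state = signal_lsp(["A", "B", "C"])
print(path_state)
print(resv_state)
```

Because the Resv message walks path_state backwards, it necessarily crosses the same hops as the Path message did.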

An LSP stays active as long as Hello messages (sent every 9 seconds) are received. If they stop, the LSP is torn down after 63 seconds. If Hello extensions are not supported, Path and Resv messages (sent every 30 seconds) are used instead.

Other RSVP messages:

PathTear - sent downstream, used for removing Path message soft states - be it because LSP is no longer needed or an error occurred.

ResvTear - sent upstream, used for removing Resv message soft states - be it because LSP is no longer needed or an error occurred.

PathErr - sent upstream, used for signaling an error, no soft states removed.

ResvErr - sent downstream, used for signaling an error, no soft states removed.

ResvConf - sent downstream, a confirmation of receiving Resv message.

To mitigate already significant overhead in comparison to LDP, messages for different LSPs are bundled together through message aggregation (up to 30 messages).
Commonly used objects in RSVP messages:

Session - contains ingress and egress router's IP addresses, and a Tunnel ID which uniquely identifies given LSP. Identifies current session, present in every message.

Hop - IP address of a neighbor. Present in Path and Resv messages.

Time - soft state refresh timer. By default 30 seconds in Junos. Present in Path and Resv messages.

Error - reason why a PathErr or ResvErr message was generated.

Style - reservation model:

Fixed Filter - every LSP requires reservation of resources - i.e. even if 2 LSPs begin and end on the same ingress and egress routers, when they share a link, reservations are separate. Default in Junos.

Shared Explicit - when multiple LSPs of the same session (the same ingress and egress routers) cross a shared link, only the highest bandwidth requirement among them is reserved. Used mainly for setting up backup LSPs.

Flow - bandwidth requirements of an LSP.

Tspec - same as Flow - contains the bandwidth requirements of an LSP.

Filter - contains LSP ID uniquely identifying given LSP.

Sender-template - same as filter - contains LSP ID.

Label - contains labels advertised in upstream direction to a neighboring router.

Label Request Object (LRO) - request for a label (by default, labels are not sent upstream if no request is received).

Explicit Route Object (ERO) - allows Path messages to transit the network independently of the IGP. The following routing constraints are available:

Loose hops - specifies routers that LSP has to cross. Besides that, route is up to IGP.

Strict hops - specifies routers that LSP has to cross one after another.

Record Route Object (RRO) - encoded in both Path and Resv messages. Lists all hops the message went through. Used for loop detection.

Session Attribute - contains priority, preemption, affinity and rerouting options.
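The difference between the Fixed Filter and Shared Explicit styles listed above comes down to how reservations are combined on a shared link. A minimal sketch, with a made-up function name and bandwidth values in arbitrary units:

```python
def reserved_on_link(lsp_bandwidths, style):
    """Bandwidth reserved on one link shared by LSPs of the same session."""
    if style == "FF":   # Fixed Filter: every LSP gets its own reservation
        return sum(lsp_bandwidths)
    if style == "SE":   # Shared Explicit: only the largest demand is reserved
        return max(lsp_bandwidths, default=0)
    raise ValueError("unknown reservation style: " + style)


primary_and_backup = [100, 40]
print("FF:", reserved_on_link(primary_and_backup, "FF"))
print("SE:", reserved_on_link(primary_and_backup, "SE"))
```

This is why Shared Explicit suits backup LSPs: the primary and its backup never carry traffic at the same time, so reserving for both would waste capacity.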

Even though RSVP is specifically used for resource reservation purposes - it can't actually do any traffic policing. For that, you need to use firewall filters.

Route Calculation
To select the best route fitting the user's constraints, the Constrained Shortest Path First (CSPF) algorithm is used. It's a modification of the SPF algorithm (used in OSPF and ISIS) that can take user-provided constraints into account.
The constraints for a given LSP are passed to CSPF, which consults the Traffic Engineering Database (TED) and removes all links and nodes that don't fit the requirements. SPF is then run on what is left.
If no named path with defined hops is used, SPF is run just once to calculate the ERO. If an LSP needs to go through user-defined LSRs, SPF is run for each pair of routers (i.e. from the ingress to the first specified hop, from the first specified hop to the second one, etc.). The results of these computations are then compiled into a single ERO.

Signaling Path using ERO - source alcatel-lucent.com

CSPF algorithm steps:

Remove all links that don't meet bandwidth requirements.

Remove all links whose colors are not included.

Remove all links whose colors are excluded.

Calculate shortest path choosing from links that are left. If required nodes were specified, perform this calculation for each pair of hops.

When multiple equal cost paths exist:

remove those where the last-hop address is not equal to the egress router address.
choose one based on the load-balancing configuration (random, most-fill, least-fill).

Create an ERO that lists all physical interfaces of the LSRs along the route.
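The pruning and shortest-path steps above can be sketched in a few lines. This is a simplified model under stated assumptions: links are unidirectional, colors are a bitmask, the result lists router names rather than interface addresses, and the equal-cost tie-breaking rules are not implemented. Link, cspf and cspf_via are names invented for the example.

```python
import heapq
from typing import NamedTuple


class Link(NamedTuple):
    src: str
    dst: str
    metric: int
    unreserved_bw: float
    colors: int  # administrative groups as a bitmask


def cspf(links, src, dst, bandwidth=0, include_any=0, exclude=0):
    """Prune the topology by constraints, then run plain Dijkstra."""
    adjacency = {}
    for link in links:
        if link.unreserved_bw < bandwidth:
            continue                       # not enough bandwidth
        if include_any and not link.colors & include_any:
            continue                       # required color missing
        if link.colors & exclude:
            continue                       # carries an excluded color
        adjacency.setdefault(link.src, []).append(link)

    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for link in adjacency.get(node, []):
            if link.dst not in visited:
                heapq.heappush(heap, (cost + link.metric, link.dst, path + [link.dst]))
    return None  # no path satisfies the constraints


def cspf_via(links, hops, **constraints):
    """Run cspf() for each consecutive pair of hops and join the results."""
    total, route = 0, [hops[0]]
    for a, b in zip(hops, hops[1:]):
        result = cspf(links, a, b, **constraints)
        if result is None:
            return None
        total += result[0]
        route += result[1][1:]
    return total, route


LINKS = [
    Link("A", "B", 1, 50, 0b01), Link("B", "D", 1, 50, 0b01),
    Link("A", "C", 5, 1000, 0b10), Link("C", "D", 5, 1000, 0b10),
]
print(cspf(LINKS, "A", "D"))
print(cspf(LINKS, "A", "D", bandwidth=100))
print(cspf_via(LINKS, ["A", "C", "D"]))
```

Asking for more bandwidth than the short path can offer moves the LSP onto the longer one, and making the constraints contradictory yields no path at all.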

User configurable constraints are:

Administrative Groups (Colors) - attach a locally significant name (e.g. Gold, Silver, Platinum) to bit values that define groups of links.

Priority and Preemption - setup and hold priority. 0 is best, 7 is worst. If a new LSP's setup priority is better than an existing LSP's hold priority and there's not enough bandwidth available, the lower-priority LSP is preempted. The default is 7/0 - the LSP can't be preempted and doesn't preempt other LSPs.
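Both constraints are easy to express in code. In the sketch below the color names, their bit positions and the function names are hypothetical; only the comparison rule comes from the text above.

```python
# Hypothetical mapping of locally significant color names to bit positions.
COLORS = {"gold": 0, "silver": 1, "platinum": 2}


def color_mask(*names):
    """Build the bitmask that represents a set of administrative groups."""
    mask = 0
    for name in names:
        mask |= 1 << COLORS[name]
    return mask


def can_preempt(new_setup_priority, existing_hold_priority):
    """0 is best, 7 is worst: a numerically lower value wins."""
    return new_setup_priority < existing_hold_priority


print(bin(color_mask("gold", "platinum")))
print(can_preempt(7, 0))  # defaults on both sides: no preemption
print(can_preempt(3, 5))  # better setup than hold priority: preemption
```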

Traffic Protection

Another advantage of using RSVP is the possibility to set up multiple backup paths for a given LSP.

Primary path
Main and default path of an LSP. By default, reverting from a backup path to the primary path is turned on. The primary path is re-established according to the retry-timer property (30 seconds by default) as many times as the retry-limit property allows (indefinitely by default). When the primary path is up again, the router waits for two intervals of the timer before switching the traffic back, to make sure the path is stable. Specifying a primary path is not required - it's possible to use only secondary paths.

Secondary path
Backup, used when primary path is down. First one from the list is chosen.

Standby secondary path
Normal secondary paths are set up only after the primary goes down. Standby paths are set up as soon as the primary path is up, to allow for as little downtime as possible if the primary path fails.

Fast reroute
To mitigate packet loss when changing paths, reroutes are set up in advance for each link or LSR along the path. If a failure occurs, the upstream LSR immediately reroutes the traffic and generates a PathErr message.

Two possible modes of fast reroute operation are:

Node Protection - each LSR along the path calculates a detour from itself to the egress router. Administrative constraints are inherited by default - the new path has to conform to the original LSP's requirements. The main idea is to circumvent the next node in case of failure, but obviously node protection also protects against link failures.

Link Protection - each LSR along the path calculates a detour from itself just to the next LSR along the path. No constraints are inherited. It is a little less flexible than node protection but allows for many-to-one repairs, since many LSPs can use the same detour.
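The two detour computations can be sketched with a plain hop-count search. This is a deliberately simplified model: it ignores metrics and administrative constraints, and the topology, helper names and arguments are invented for the example.

```python
from collections import deque


def shortest_path(adj, src, dst, banned_nodes=frozenset(), banned_links=frozenset()):
    """Breadth-first search that skips banned nodes and directed links."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in adj.get(node, ()):
            if nxt in seen or nxt in banned_nodes or (node, nxt) in banned_links:
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


def link_protection(adj, lsp, i):
    """Detour from LSR i to the next LSR, avoiding the link between them."""
    here, nxt = lsp[i], lsp[i + 1]
    return shortest_path(adj, here, nxt, banned_links={(here, nxt)})


def node_protection(adj, lsp, i):
    """Detour from LSR i to the egress, avoiding the next LSR entirely."""
    here, nxt = lsp[i], lsp[i + 1]
    return shortest_path(adj, here, lsp[-1], banned_nodes={nxt})


ADJ = {"A": ["B", "X"], "X": ["B", "C"], "B": ["C"], "C": []}
LSP = ["A", "B", "C"]
print(link_protection(ADJ, LSP, 0))
print(node_protection(ADJ, LSP, 0))
```

Note that node_protection() returns None when the next LSR is the egress itself, since the egress cannot be bypassed.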

Controlling LSPs

There are a few options available to control the way LSPs are set up and operated.

Adaptive Mode
Make-before-break: the failed path is torn down only after the second path is up and running. Also, bandwidth reservations of the secondary path are not double-counted.

Explicit Null Advertisements
When configured, egress router advertises to the penultimate router label 0 (explicit null) instead of default label 3 (implicit null). In such case, PHP doesn't occur, and the MPLS header reaches the egress router. Useful for QoS classification purposes and VPNs.

Controlling Time-to-Live
By default, the TTL value from the IP header is copied to the MPLS header, decreased at every MPLS hop, and rewritten into the IP header at the egress router. This behavior can be modified to hide the number of hops the packet crossed inside the MPLS network:

no-decrement-ttl - the TTL value in the IP header is ignored and 255 is used instead. The egress router doesn't update the value in the IP header. Configured only on the ingress router and propagated through the Label Request Object - every router has to understand it for this option to work.

no-propagate-ttl - same as no-decrement-ttl, but configured statically on each router, without the use of the LRO.
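The visible effect of the two behaviors can be sketched as below. This is a simplification that charges exactly one decrement per MPLS hop and leaves out any decrement done by ordinary IP forwarding at the edges. The function and parameter names are made up.

```python
def lsp_ttl(ip_ttl, mpls_hops, propagate=True):
    """Return (MPLS TTL at the egress, IP TTL after leaving the LSP)."""
    # Default: the MPLS TTL starts as a copy of the IP TTL.
    # Hidden:  the IP TTL is ignored and 255 is used instead.
    mpls_ttl = ip_ttl if propagate else 255
    for _ in range(mpls_hops):
        mpls_ttl -= 1
        if mpls_ttl <= 0:
            raise ValueError("TTL expired inside the LSP")
    # Default: the egress writes the MPLS TTL back into the IP header.
    # Hidden:  the IP header is left as it entered the LSP.
    return mpls_ttl, (mpls_ttl if propagate else ip_ttl)


print(lsp_ttl(64, 5, propagate=True))   # every MPLS hop is visible
print(lsp_ttl(64, 5, propagate=False))  # the LSP's hops are hidden
```

In the hidden case a traceroute through the network would not reveal the routers inside the LSP, which is the purpose of both options.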

Traffic Engineering

OSPF and ISIS both use the same Sub-TLVs to support Traffic Engineering:

Administrative Group - link color/group encoded in 32bit vector. In OSPF called Resource Class/Color.

Maximum Link Bandwidth - actual bandwidth of the local interface. In OSPF called Maximum Bandwidth.

Maximum Reservable Bandwidth - bandwidth available for reservation; it might be higher than the actual interface bandwidth due to over-subscription.

Unreserved Bandwidth - amount of bandwidth available for reservations. Calculated for each of eight priorities.

Traffic Engineering Metric - metric of the interface to be used in the CSPF calculation. Always present in OSPF (same value as in the Router LSA), sometimes present in ISIS (if different from the metric used by the ISIS routing process).
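How the eight Unreserved Bandwidth values relate to existing reservations can be sketched as follows. The sketch assumes the usual interpretation that the value for priority p is what remains after subtracting every reservation held at priority p or better. The function name and the numbers are invented.

```python
def unreserved_bandwidth(max_reservable, reservations):
    """Return the eight per-priority values advertised for one link.

    reservations is a list of (hold_priority, bandwidth) pairs.
    """
    return [
        max_reservable - sum(bw for hold, bw in reservations if hold <= priority)
        for priority in range(8)
    ]


# One LSP holding 200 at priority 0 and another holding 300 at priority 5.
print(unreserved_bandwidth(1000, [(0, 200), (5, 300)]))
```

An LSP signalled with setup priority 3 would see 800 units available on this link, because the reservation held at priority 5 could be preempted.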

Since ISIS uses TLVs instead of LSAs to convey interface information between nodes, the above Sub-TLVs are attached to the Extended IS Reachability TLV. In OSPF, these Sub-TLVs are part of the type 10 Opaque LSA. Both protocols create separate Traffic Engineering Databases for given areas/levels.
Today, there's basically no performance or usability difference between ISIS and OSPF when it comes to supporting MPLS networks. Back in the day, the overload bit offered by ISIS was useful because router CPUs were awfully slow. Today it's used mostly to prevent blackholing (dropping traffic, since BGP converges A LOT slower than IGPs) when IBGP nodes reboot, but the same thing can be achieved by modifying OSPF metrics on transit links.

ISIS is a bit more efficient in huge flat topologies (as in - not divided into areas/levels) since it doesn't trigger an SPF computation for configuration changes - but once again, with today's hardware, it hardly makes any difference anymore.
