Border Gateway Protocol (BGP) Monitoring with SolarWinds
BGP (Border Gateway Protocol) is one of the best-known protocols in networking. It allows us to propagate network prefixes across the internet, and it is commonly used by large corporations and ISPs. It is also one of the biggest sources of problems when we talk about reachability issues. This is not a surprise, as BGP is widely considered the most complex routing protocol.
In this blog post we are not going to talk about BGP itself (you have some links here and here for a better understanding of the protocol). Instead, we will cover the basics of monitoring BGP within your company network, and more specifically, how to use SolarWinds® Network Performance Monitor (NPM) for this task.
Based on my experience as a SolarWinds engineer (currently) and a network engineer (in the past), I would divide basic BGP monitoring into three areas:
- Peer Status
- Prefixes Received
- Routes
Peer Status
This is probably the simplest way to monitor BGP, and probably the most commonly used. Before a router can exchange network prefixes with its BGP peers, it first needs to establish a session with them: if the state of the BGP neighbourship is not 'Established', no routes will be exchanged. There are two ways to get the status of the BGP peers: by polling SNMP OIDs, or by receiving SNMP traps from the routers. Polling is the more important option, because it actively asks for the status, so we always know the last polled state. With SNMP traps, we wait to be told about a change. On their own, traps are not enough to know the current state, but they are very good at telling us in real time that a neighbour change has occurred.
SNMP Polling (Active Collection)
SolarWinds has built-in support for monitoring the status of BGP peers: it polls the OID 1.3.6.1.2.1.15.3.1.2 (bgpPeerState in the standard BGP4-MIB), which reports the state of each BGP peer. To enable this feature, go to Settings > Manage Nodes, select the router where you want to monitor BGP neighbours, click List Resources, and tick the BGP neighbours option.
Once this feature is enabled on a device, a resource showing the status of its BGP peers, along with other information, appears by default on that device's Node Details page.
By default, SolarWinds polls neighbour status every 5 minutes, which is enough for most situations. In others, however, this frequency will not give us the information as quickly as we need it. SNMP traps deliver the information as soon as the event happens, and they can also give us some extra information.
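The polled column is indexed by the peer's IPv4 address, so each row's OID is the bgpPeerState OID followed by the peer address (you can see this as bgpPeerState.192.168.10.101 in the trap examples later in this post). The following is a minimal Python sketch that turns a walk of this column into peer states. It assumes Net-SNMP numeric output (`snmpwalk -On`), with or without the MIB loaded. The state names are the BGP4-MIB enumeration values, from idle(1) to established(6).

```python
import re

BGP_PEER_STATE_OID = "1.3.6.1.2.1.15.3.1.2"

# BGP4-MIB bgpPeerState enumeration
STATES = {1: "idle", 2: "connect", 3: "active",
          4: "opensent", 5: "openconfirm", 6: "established"}

# Matches ".1.3.6.1.2.1.15.3.1.2.192.168.10.101 = INTEGER: 6"
# as well as "... = INTEGER: established(6)" (MIB loaded)
LINE_RE = re.compile(r"^\.?([\d.]+)\s*=\s*INTEGER:\s*(?:\w+\()?(\d+)\)?\s*$")


def parse_peer_states(walk_output: str) -> dict[str, str]:
    """Return {peer_ip: state_name} for every bgpPeerState row in the walk."""
    peers = {}
    prefix = BGP_PEER_STATE_OID + "."
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        oid, value = match.groups()
        if not oid.startswith(prefix):
            continue
        peer_ip = oid[len(prefix):]  # the row index is the peer address
        peers[peer_ip] = STATES.get(int(value), f"unknown({value})")
    return peers


def down_peers(peers: dict[str, str]) -> list[str]:
    """Peers that cannot exchange routes because they are not established."""
    return sorted(ip for ip, state in peers.items() if state != "established")


walk = """\
.1.3.6.1.2.1.15.3.1.2.192.168.10.101 = INTEGER: idle(1)
.1.3.6.1.2.1.15.3.1.2.192.168.10.102 = INTEGER: 6
"""
print(parse_peer_states(walk))
print(down_peers(parse_peer_states(walk)))
```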
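To illustrate why a 5-minute polling interval can miss events, here is a small Python sketch. It assumes polls happen at fixed multiples of the interval and that each poll sees the state at that instant (no SNMP timeouts or retries). Under those assumptions, a peer that drops at 120 s and recovers at 240 s is never seen down by a 300-second poller, whereas a trap would be sent at the moment of the transition.

```python
def outage_seen_by_polling(poll_interval: int, outage_start: int,
                           outage_end: int, first_poll: int = 0) -> bool:
    """True if at least one poll falls inside [outage_start, outage_end).

    Times are in seconds; polls happen at first_poll + k * poll_interval.
    """
    # Index of the first poll at or after the start of the outage
    # (ceiling division, clamped so we never look before the first poll).
    k = max(0, -(-(outage_start - first_poll) // poll_interval))
    next_poll = first_poll + k * poll_interval
    # The outage is only observed if that poll lands before recovery.
    return next_poll < outage_end


print(outage_seen_by_polling(300, 120, 240))  # 5-minute polling misses it
print(outage_seen_by_polling(60, 120, 240))   # 1-minute polling catches it
```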
How to enable BGP traps on Cisco
How to enable BGP traps on Juniper
There are three main SNMP traps that we want to receive when monitoring BGP:
- Backwards transition: this trap is issued when a BGP peer moves to a state 'lower' than its previous one, for example, from Established to Idle.
- Established state: this trap is issued when a BGP peer reaches the Established state.
- State change: this trap will be issued every time there is a change in the peer state, either backwards or forward.
All three of these traps are useful; however, more than one of them may be sent for the same event. For example, a backwards transition (Established to Idle) triggers the backwards transition trap (obviously) and also the state change trap, because the state of the neighbourship has changed. The same happens with the established state and state change traps.
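Because one event can produce several traps, it is worth correlating them before alerting, so that one peer going down raises one alert. The following is a minimal Python sketch that groups traps for the same peer that arrive within a short window. The 5-second window, the `Trap`/`Event` types and the trap labels are hypothetical choices for illustration and are not SolarWinds' alerting logic.

```python
from dataclasses import dataclass, field


@dataclass
class Trap:
    timestamp: float  # seconds
    peer: str         # BGP neighbour address
    name: str         # hypothetical label, e.g. "backward-transition"


@dataclass
class Event:
    peer: str
    first_seen: float
    trap_names: list[str] = field(default_factory=list)


def correlate(traps: list[Trap], window: float = 5.0) -> list[Event]:
    """Merge traps for the same peer that arrive within `window` seconds
    of the first trap of a group into a single event."""
    events: list[Event] = []
    latest: dict[str, Event] = {}  # most recent event per peer
    for trap in sorted(traps, key=lambda t: t.timestamp):
        current = latest.get(trap.peer)
        if current is not None and trap.timestamp - current.first_seen <= window:
            current.trap_names.append(trap.name)  # same event, extra trap
        else:
            current = Event(trap.peer, trap.timestamp, [trap.name])
            latest[trap.peer] = current
            events.append(current)
    return events


traps = [
    Trap(100.0, "192.168.10.101", "backward-transition"),
    Trap(100.4, "192.168.10.101", "state-change"),
    Trap(100.2, "192.168.10.102", "backward-transition"),
    Trap(400.0, "192.168.10.101", "established"),
    Trap(400.1, "192.168.10.101", "state-change"),
]
for event in correlate(traps):
    print(event.peer, event.first_seen, event.trap_names)
```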
For most BGP-capable devices, only one specific trap OID is sent for a backwards transition. However, some vendors, such as Cisco and Juniper, send an additional vendor-specific SNMP trap along with the standard one. This happens because, although the standard branch (1.3.6.1.2.1) contains a BGP MIB with event support that in theory covers all BGP events, some devices have events that this 'shared' MIB structure does not support. These devices therefore have another BGP MIB under the private enterprises branch (1.3.6.1.4.1), which allows vendors to extend the event and polling structure beyond the shared standard branch.
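When building trap filters or alert rules, it can help to classify an incoming OID by branch. A plain string-prefix test is not enough, because "1.3.6.1.2.10" starts with the characters "1.3.6.1.2.1". For that reason, this Python sketch compares whole arcs instead. The enterprise numbers 9 (Cisco) and 2636 (Juniper) are the ones that appear in the vendor OIDs listed later in this post.

```python
STANDARD_MIB2 = (1, 3, 6, 1, 2, 1)        # standard (mib-2) branch
PRIVATE_ENTERPRISES = (1, 3, 6, 1, 4, 1)  # private enterprises branch


def parse_oid(oid: str) -> tuple[int, ...]:
    """Turn '1.3.6.1.2.1.15' (optionally with a leading dot) into a tuple."""
    return tuple(int(arc) for arc in oid.strip(".").split("."))


def mib_branch(oid: str) -> str:
    """Classify an OID as standard, private (with enterprise number) or other."""
    arcs = parse_oid(oid)
    if arcs[:len(STANDARD_MIB2)] == STANDARD_MIB2:
        return "standard"
    if arcs[:len(PRIVATE_ENTERPRISES)] == PRIVATE_ENTERPRISES:
        # The arc right after 1.3.6.1.4.1 identifies the vendor.
        return f"private (enterprise {arcs[6]})" if len(arcs) > 6 else "private"
    return "other"


print(mib_branch("1.3.6.1.2.1.15.3.1.2"))           # standard
print(mib_branch(".1.3.6.1.4.1.9.9.187.1.2.1.1.1"))  # private (enterprise 9)
```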
We mentioned before that SNMP traps from the private branch normally carry more information than traps from the standard branch. Let's have a closer look. For example, these are the two backwards transition traps that a Cisco device sends when a peer drops from Established to Idle.
Standard branch SNMP Trap:
[code]TRAP: CES-BGP-DEFAULTS-MIB:bgpTraps.0.2 :
Last Error: bgpPeerLastError.192.168.10.101 = BAA=,
Current Status: bgpPeerState.192.168.10.101 = idle(1),
Device Up Time: sysUpTime = 14 days 16 hours 6 minutes 34.39 seconds,
Device IP: experimental.1057.1.0 = 192.168.10.103,
Trap Origin: snmpTrapEnterprise = CES-BGP-DEFAULTS-MIB:bgpTraps[/code]
Cisco branch SNMP Trap:
[code]TRAP: CISCO-BGP4-MIB:cbgpBackwardTransition :
Last Error: bgpPeerLastError.192.168.10.101 = BAA=,
Current Status: bgpPeerState.192.168.10.101 = idle(1),
Last Status: cbgpPeerPrevState.192.168.10.101 = established(6),
Reason: cbgpPeerLastErrorTxt.192.168.10.101 = hold time expired,
Device Up Time: sysUpTime = 14 days 16 hours 6 minutes 34.39 seconds,
Device IP: experimental.1057.1.0 = 192.168.10.103,
Trap Origin: snmpTrapEnterprise = CISCO-BGP4-MIB:ciscoBgp4MIB[/code]
As you may have noticed, the Cisco branch trap gives you a little more information: in this case, the previous state and a human-readable reason for the last error.
It is important to review and confirm which branch (standard or private) your device generates SNMP traps from. If it generates both, use the private branch traps, as they are likely to contain more information than the standard branch message. The following link provides information on creating alerts within SolarWinds Orion:
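The standard trap does carry the last error, just in encoded form. bgpPeerLastError is two octets: the error code and subcode from the last BGP NOTIFICATION message. Assuming the trap viewer renders that value as base64 (which fits the `BAA=` shown above), `BAA=` decodes to the bytes 0x04 0x00. That is error code 4, Hold Timer Expired, which is the same reason the Cisco trap spells out. The following is a minimal Python decoder that uses the error codes from RFC 4271.

```python
import base64

# NOTIFICATION error codes from RFC 4271, section 4.5
BGP_ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}


def decode_last_error(b64_value: str) -> tuple[int, int, str]:
    """Decode a base64-rendered bgpPeerLastError (2 octets: code, subcode)."""
    raw = base64.b64decode(b64_value)
    if len(raw) != 2:
        raise ValueError(f"expected 2 octets, got {len(raw)}")
    code, subcode = raw[0], raw[1]
    # 0/0 means no error has been recorded for this peer
    name = BGP_ERROR_CODES.get(code, "No error" if code == 0 else "Unknown")
    return code, subcode, name


print(decode_last_error("BAA="))  # (4, 0, 'Hold Timer Expired')
```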
How to create an alert for Traps in SolarWinds
Prefixes Received
When peering with ISPs, one common issue is that we stop receiving prefixes from the ISP router. This can be a big problem, because it can go unnoticed if we only monitor the status of the BGP neighbourship.
It is also a problem when the ISP router advertises too many prefixes, as our router might receive more routes than its memory can hold. If this router also peers internally with other routers that perform critical routing functions within the network, this overhead could seriously affect the whole network.
The standard management branch of the BGP MIB does not contain an OID that allows us to monitor this metric, so we have to rely on each vendor's private branch. Some vendors provide this information and others do not, so always review the vendor's MIB to determine whether such an OID exists and what it is.
In the table below you will find the main metrics that we recommend monitoring via SNMP active polling.
Metric | Description |
Accepted Prefixes | How many prefixes have been received from the BGP peer. If the number stays at 0 for a long time (for example, 2 hours), this might indicate a problem with the peer. |
Prefix Threshold | When configuring BGP on a Cisco router, we can define a threshold as a percentage of the maximum prefixes allowed. Once the threshold is reached, the router sends a trap warning that the number of prefixes received from the peer has exceeded it. We can monitor this value in SolarWinds to build our own automation processes. |
Maximum Prefixes Allowed | The maximum number of prefixes allowed from this neighbour. One possible action when the limit is reached is to bring down the BGP peer connection. |
Advertised Prefixes | The number of prefixes we are advertising to the peer. Monitoring it tells us whether we are advertising too many prefixes or not enough. |
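A minimal sketch of the "0 for a long time" rule from the table follows. It assumes you have (poll time, accepted prefixes) samples for one peer, ordered oldest first. The 2-hour hold time comes from the table, and the sample format is an assumption. Any non-zero sample resets the timer, so a single successful poll clears the condition.

```python
from datetime import datetime, timedelta


def zero_prefix_alert(samples: list[tuple[datetime, int]],
                      hold: timedelta = timedelta(hours=2)) -> bool:
    """True if accepted prefixes have been 0 continuously for at least `hold`.

    `samples` are (poll_time, accepted_prefixes) pairs, oldest first.
    """
    zero_since = None
    for when, accepted in samples:
        if accepted == 0:
            if zero_since is None:
                zero_since = when  # start of the current run of zeros
        else:
            zero_since = None      # a non-zero sample resets the run
    if zero_since is None:
        return False
    last_poll = samples[-1][0]
    return last_poll - zero_since >= hold


# Polls every 5 minutes (the SolarWinds default) over 130 minutes
start = datetime(2024, 1, 1, 8, 0)
samples = [(start + timedelta(minutes=5 * i), 0) for i in range(27)]
print(zero_prefix_alert(samples))
```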
Cisco is one of the vendors that exposes most of the metrics we need. The OIDs differ depending on how BGP is configured on the router: basic BGP or BGP with address families.
If BGP is configured without address families (basic BGP), the OIDs are the following:
Metric | OID |
Accepted Prefixes | 1.3.6.1.4.1.9.9.187.1.2.1.1.1 |
Maximum Prefixes Allowed | 1.3.6.1.4.1.9.9.187.1.2.1.1.3 |
Advertised Prefixes | 1.3.6.1.4.1.9.9.187.1.2.1.1.4 |
Otherwise, if BGP has been configured with address families, the OIDs are the following:
Metric | OID |
Accepted Prefixes | 1.3.6.1.4.1.9.9.187.1.2.4.1.1 |
Maximum Prefixes Allowed | 1.3.6.1.4.1.9.9.187.1.2.4.1.3 |
Prefix Threshold | 1.3.6.1.4.1.9.9.187.1.2.4.1.4 |
Advertised Prefixes | 1.3.6.1.4.1.9.9.187.1.2.4.1.6 |
Example of basic configuration for Cisco devices:
[code]neighbor 192.168.10.101 maximum-prefix 500 80[/code]
- the neighbor IP address is 192.168.10.101
- the maximum number of prefixes allowed is 500
- a warning is generated when the number of prefixes received exceeds 80% of the maximum (500 × 80% = 400)
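The threshold arithmetic above can be sketched in Python as follows. The comparisons are assumptions: a warning once the received count reaches the threshold, and the limit breached once it exceeds the maximum. Check your platform's documentation for the exact behaviour. Integer arithmetic is used for the percentage test to avoid floating-point edge cases.

```python
def warning_threshold(maximum: int, threshold_pct: int) -> float:
    """Prefix count at which the warning starts, e.g. 500 * 80% = 400."""
    return maximum * threshold_pct / 100


def prefix_limit_status(received: int, maximum: int, threshold_pct: int) -> str:
    """Classify a neighbour configured with `maximum-prefix <maximum> <threshold_pct>`."""
    if received > maximum:
        return "over limit"
    # received / maximum >= threshold_pct / 100, without floats
    if received * 100 >= maximum * threshold_pct:
        return "warning"
    return "ok"


print(warning_threshold(500, 80))         # 400.0
print(prefix_limit_status(420, 500, 80))  # warning
```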
To show how the available data differs between vendors: on Juniper routers, we only have the option to monitor received and advertised prefixes.
Metric | OID |
Accepted Prefixes | 1.3.6.1.4.1.2636.5.1.1.2.6.2.1.7 |
Advertised Prefixes | 1.3.6.1.4.1.2636.5.1.1.2.6.2.1.10 |
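To keep track of which metrics each platform can provide, the OIDs from the tables above can be collected into a lookup. The profile names (cisco-basic, cisco-af, juniper) and metric keys are hypothetical labels. The OIDs are the column OIDs listed in this post, so on a real device each polled value carries an extra row-index suffix.

```python
# Column OIDs as listed in this post, keyed by a hypothetical profile name.
PREFIX_OIDS = {
    "cisco-basic": {
        "accepted": "1.3.6.1.4.1.9.9.187.1.2.1.1.1",
        "max_allowed": "1.3.6.1.4.1.9.9.187.1.2.1.1.3",
        "advertised": "1.3.6.1.4.1.9.9.187.1.2.1.1.4",
    },
    "cisco-af": {
        "accepted": "1.3.6.1.4.1.9.9.187.1.2.4.1.1",
        "max_allowed": "1.3.6.1.4.1.9.9.187.1.2.4.1.3",
        "threshold": "1.3.6.1.4.1.9.9.187.1.2.4.1.4",
        "advertised": "1.3.6.1.4.1.9.9.187.1.2.4.1.6",
    },
    "juniper": {
        "accepted": "1.3.6.1.4.1.2636.5.1.1.2.6.2.1.7",
        "advertised": "1.3.6.1.4.1.2636.5.1.1.2.6.2.1.10",
    },
}

ALL_METRICS = ("accepted", "threshold", "max_allowed", "advertised")


def coverage(profile: str) -> tuple[dict[str, str], list[str]]:
    """Return (OIDs available for this profile, metrics it cannot provide)."""
    oids = PREFIX_OIDS[profile]
    missing = [metric for metric in ALL_METRICS if metric not in oids]
    return oids, missing


for profile in PREFIX_OIDS:
    print(profile, "missing:", coverage(profile)[1])
```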
These are the Universal Device Pollers (UnDPs) that can be imported into SolarWinds:
Custom Poller: Universal Device Poller - Juniper BGP
Custom Poller: Universal Device Poller - Cisco BGP (Basic)
Custom Poller: Universal Device Poller - Cisco BGP (AF)
Routes
On this particular topic, there are two main areas that we should monitor: flapping routes and the AS path.
Flapping Routes
Monitoring flapping routes is not exclusive to BGP: we should monitor flapping routes for every routing protocol, such as OSPF or EIGRP, as well. The good news is that SolarWinds can monitor this out of the box. Just make sure you are monitoring the routing table when you list resources on a router (see the SNMP Polling (Active Collection) section above).
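SolarWinds handles flap detection for you, but the underlying idea can be sketched as a sliding-window counter. A prefix is flagged when it changes (is added or withdrawn) more than a set number of times within a time window. The thresholds (3 changes in 600 seconds) are arbitrary assumptions. This is a generic counter, not SolarWinds' own logic. It is also not BGP route flap dampening (RFC 2439), which uses a decaying penalty instead.

```python
from collections import deque


class FlapDetector:
    """Flag a prefix as flapping if it changes too often within a window."""

    def __init__(self, max_changes: int = 3, window: float = 600.0):
        self.max_changes = max_changes  # changes tolerated per window
        self.window = window            # window length in seconds
        self._changes: dict[str, deque] = {}

    def record_change(self, prefix: str, timestamp: float) -> bool:
        """Record an add/withdraw for `prefix`; return True if it is flapping."""
        history = self._changes.setdefault(prefix, deque())
        history.append(timestamp)
        # Drop changes that have fallen out of the sliding window
        while history and timestamp - history[0] > self.window:
            history.popleft()
        return len(history) > self.max_changes


detector = FlapDetector()
for t in (0, 60, 120, 180):
    print(t, detector.record_change("10.0.0.0/24", t))
```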
AS Path
The other important metric here is the AS path. To describe the route that packets follow to reach a particular subnet, BGP uses the AS_PATH attribute, which lists the Autonomous Systems the traffic will traverse to reach the destination. Monitoring the existing AS paths helps us detect attacks that exploit the BGP protocol, such as route hijacking, or simply know the route our traffic will take.
In Cisco, we can monitor the AS path using the OID 1.3.6.1.4.1.9.9.187.1.1.1.1.8; for Juniper, it is 1.3.6.1.4.1.2636.5.1.1.3.5.1.4.
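One simple way to use AS path data for hijack detection is to compare the origin AS of the prefixes you care about with the AS you expect to originate them. The origin AS is the last AS in the path. This Python sketch assumes the AS path has already been decoded into a list of integers. The prefixes and AS numbers are documentation values (RFC 5737, RFC 5398) used for illustration. It does not handle an AS_SET at the end of the path, where the origin is ambiguous.

```python
def check_origin(prefix: str, as_path: list[int],
                 expected_origin: dict[str, int]) -> str:
    """Compare the origin AS (last AS in the path) with the expected one."""
    if prefix not in expected_origin:
        return "unmonitored"
    if not as_path:
        return "no origin AS"  # empty path: nothing to compare
    origin = as_path[-1]
    expected = expected_origin[prefix]
    if origin != expected:
        return f"origin mismatch: expected AS{expected}, saw AS{origin}"
    return "ok"


expected = {"203.0.113.0/24": 64500}  # hypothetical monitored prefix
print(check_origin("203.0.113.0/24", [64496, 64511], expected))
```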
If you are polling this metric on Cisco devices, you need to convert the output from its default hex format into a more human-readable one. This can be done with the following SQL query in the Orion Custom Table widget:
Custom Script: SQL Query AS Path
NOTE: this SQL script has only been tested with 16-bit AS numbers, not with 32-bit AS numbers. It also only decodes up to the third AS in the path; however, it could be edited to work with 32-bit AS numbers and longer AS paths.
And that’s all I wanted to share with you, guys and gals. I hope this has been informative, and don’t hesitate to contact me with any questions or ideas you may have regarding the use of SolarWinds.
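As an alternative to the SQL query, here is a Python sketch of the same conversion that also handles 32-bit AS numbers and paths of any length. It assumes the value uses the standard BGP AS path encoding, as described for bgp4PathAttrASPathSegment in the BGP4-MIB. That encoding is a sequence of segments. Each segment has a 1-octet type (1 = AS_SET, 2 = AS_SEQUENCE), a 1-octet count of AS numbers, and then the AS numbers as big-endian 2-octet (or 4-octet) values. The bytes alone do not tell you whether 2- or 4-octet AS numbers are used, so the caller chooses `as_size`. The accepted hex formatting (spaces, colons, a 0x prefix) is also an assumption.

```python
AS_SET, AS_SEQUENCE = 1, 2


def decode_as_path(hex_value: str, as_size: int = 2) -> str:
    """Decode a hex AS path octet string into text such as '64496 {64500,64511}'.

    as_size is 2 for 16-bit AS numbers and 4 for 32-bit AS numbers.
    """
    cleaned = hex_value.strip().lower().removeprefix("0x")
    for sep in (" ", ":", "-"):
        cleaned = cleaned.replace(sep, "")
    data = bytes.fromhex(cleaned)

    parts = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated segment header")
        seg_type, count = data[i], data[i + 1]
        i += 2
        end = i + count * as_size
        if end > len(data):
            raise ValueError("truncated segment value")
        asns = [int.from_bytes(data[j:j + as_size], "big")
                for j in range(i, end, as_size)]
        i = end
        if seg_type == AS_SET:
            parts.append("{" + ",".join(map(str, asns)) + "}")  # unordered set
        elif seg_type == AS_SEQUENCE:
            parts.extend(str(asn) for asn in asns)              # ordered path
        else:
            raise ValueError(f"unknown segment type {seg_type}")
    return " ".join(parts)


print(decode_as_path("02 03 FB F0 FB F4 FB FF"))  # 64496 64500 64511
```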
Raul Gonzalez
Technical Manager
Raul Gonzalez is the Technical Manager at Prosperon Networks. As a Senior SolarWinds and NetBrain Engineer for over seven years, Raul has helped hundreds of customers meet their IT monitoring needs with SolarWinds and NetBrain Solutions.