Monitoring

Monitoring is what GridNMS does around the clock so you don’t have to. It checks that your devices are reachable, collects detailed metrics like traffic and CPU, and watches those metrics against thresholds you set — raising an event the moment something looks wrong. This page covers what gets monitored, how to read the data, and how to set up bandwidth thresholds.

[Figure: The Metrics page with device charts.] The Metrics page lets you pick a device and metric, choose a time range, and read the charts.

GridNMS watches your devices in two layers:

  1. Reachability checks — a regular “are you alive?” test of each device. This is what drives the Up / Down / Unknown status you see everywhere. A device that stops responding is marked Down and raises an event.
  2. Metric collection — periodic gathering of detailed numbers from each device: interface traffic, CPU load, memory use, disk capacity, temperatures, and more, depending on what the device exposes.
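As a rough illustration of the first layer, the sketch below maps the outcome of a reachability check to a status and decides when a Down event should be raised. The names `Status`, `status_from_check`, and `should_raise_down_event` are hypothetical, and treating "no check result yet" as Unknown is an assumption, not documented GridNMS behavior.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UP = "Up"
    DOWN = "Down"
    UNKNOWN = "Unknown"


def status_from_check(responded: Optional[bool]) -> Status:
    """Map one reachability result to a status.

    Assumption: None means no check result is available yet.
    """
    if responded is None:
        return Status.UNKNOWN
    return Status.UP if responded else Status.DOWN


def should_raise_down_event(previous: Status, current: Status) -> bool:
    """Raise an event only on the transition into Down, not on every failed check."""
    return current is Status.DOWN and previous is not Status.DOWN
```

Raising the event on the transition rather than on every failed check keeps a long outage from producing a stream of duplicate events.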

For a deeper look at the mechanics, see How Monitoring Works.

You don’t configure each metric by hand. Instead, GridNMS uses monitoring packs — bundles of checks that are attached to a device class. When you set a device’s class (Router, Switch, Firewall, Access Point, and so on), it automatically inherits the right pack and starts collecting the metrics that make sense for that kind of device.

Because packs attach to classes and classes form a hierarchy, monitoring “just works” once a device has the correct class. A switch gets interface and port metrics; a server-style device gets CPU, memory, and disk; a wireless controller gets client and radio metrics — all without per-device setup.
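To make the inheritance idea concrete, here is a minimal sketch of how a device's effective checks could be resolved by walking up a class hierarchy. The hierarchy, pack contents, and the `effective_checks` helper are all illustrative assumptions; they are not GridNMS's actual classes, pack names, or API.

```python
from typing import Dict, List, Optional

# Hypothetical class hierarchy: each class points to its parent (None = root).
PARENT: Dict[str, Optional[str]] = {
    "Device": None,
    "Network": "Device",
    "Switch": "Network",
    "Router": "Network",
    "Server": "Device",
}

# Hypothetical monitoring packs: checks attached directly to each class.
PACKS: Dict[str, List[str]] = {
    "Device": ["reachability"],
    "Network": ["interface_traffic"],
    "Switch": ["port_status"],
    "Server": ["cpu", "memory", "disk"],
}


def effective_checks(device_class: str) -> List[str]:
    """Collect checks from the root class down to the device's own class."""
    chain: List[str] = []
    current: Optional[str] = device_class
    while current is not None:
        chain.append(current)
        current = PARENT[current]

    checks: List[str] = []
    for cls in reversed(chain):  # root first, most specific class last
        for check in PACKS.get(cls, []):
            if check not in checks:
                checks.append(check)
    return checks


print(effective_checks("Switch"))
print(effective_checks("Server"))
```

In this sketch, setting a device's class is the only per-device decision; everything else follows from where that class sits in the hierarchy.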

Open Metrics to explore the numbers GridNMS has collected.

  1. Select a device — start typing its name or pick it from the list.
  2. Select a metric — for example, an interface’s inbound/outbound traffic, CPU utilization, or memory usage.
  3. Choose a time range — use the time-range picker to switch between the last hour, last day, last week, or a custom window.
  4. Read the chart — the line chart plots the metric over your chosen range.
  • Time runs left to right. The most recent data is on the right edge.
  • Hover over the line to see the exact value at a point in time.
  • Traffic is usually shown in bits per second (e.g. Mbps); utilization metrics (CPU, memory, disk) are shown as a percentage.
  • Gaps in a line mean GridNMS didn’t have data for that period — often because the device was unreachable or monitoring was paused.
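If you work with the same data outside the chart, gaps can be located by comparing the spacing between consecutive samples with the expected collection interval. The sketch below assumes samples arrive as sorted Unix timestamps and that you know the interval; `find_gaps` and its `tolerance` parameter are hypothetical names, not part of GridNMS.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[int], interval_s: int, tolerance: float = 1.5
) -> List[Tuple[int, int]]:
    """Return (last_sample_before, first_sample_after) pairs around each gap.

    A gap is any spacing larger than interval_s * tolerance between
    consecutive samples. Timestamps must be sorted in ascending order.
    """
    gaps: List[Tuple[int, int]] = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        if later - earlier > interval_s * tolerance:
            gaps.append((earlier, later))
    return gaps


# Samples every 60 s, with nothing collected between t=120 and t=420.
print(find_gaps([0, 60, 120, 420, 480], interval_s=60))
```

The tolerance factor allows for small scheduling jitter so that a sample arriving a few seconds late is not reported as missing data.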

Beyond just graphing traffic, GridNMS can alert you when an interface crosses a bandwidth level you care about. This is how you find saturated uplinks before users start complaining.

[Figure: The thresholds configuration page.] Bandwidth thresholds raise an event automatically when an interface crosses your set rate, and clear it when traffic falls back.

You can set a threshold from a device’s Interfaces tab or from the thresholds configuration page:

  1. Pick the interface you want to watch.
  2. Choose the direction: inbound, outbound, or both.
  3. Set the rate that should trigger the alert (for example, 80% of the link speed, or a fixed value like 800 Mbps).
  4. Choose the severity the resulting event should carry (see severity levels).
  5. Optionally enable notification so the alert reaches your chosen endpoints, not just the events feed.
  6. Save the rule.
  • When traffic rises above your threshold, GridNMS automatically raises an event at the severity you chose. It appears on the Events page and, if you enabled notification, is delivered to your endpoints.
  • When traffic falls back below the threshold, GridNMS automatically clears the event — you don’t have to close it by hand. This keeps your events feed honest: an open bandwidth event means the link is busy right now.
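The raise-and-clear behavior described above can be pictured as a small two-state rule. This is a simplified sketch under stated assumptions: the `BandwidthRule` class and its fields are hypothetical, and a real implementation may add details this page does not describe, such as a hold time or separate raise and clear levels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BandwidthRule:
    threshold_bps: float      # rate that triggers the event
    severity: str             # severity the raised event carries
    event_open: bool = False  # True while a bandwidth event is open

    def observe(self, rate_bps: float) -> Optional[str]:
        """Feed one traffic sample; return "raise", "clear", or None."""
        if rate_bps > self.threshold_bps and not self.event_open:
            self.event_open = True
            return "raise"
        if rate_bps < self.threshold_bps and self.event_open:
            self.event_open = False
            return "clear"
        return None  # no state change


rule = BandwidthRule(threshold_bps=800e6, severity="warning")
for rate in (500e6, 850e6, 900e6, 700e6):
    print(rate, rule.observe(rate))
```

Because the rule only reports state changes, a link that stays above the threshold produces one open event rather than one event per sample.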

Interface | Suggested threshold
Internet / WAN uplink | 75–85% of the contracted speed, in the relevant direction.
Core / distribution uplinks | 80% of link speed, both directions.
Access-layer ports | Usually not worth a threshold individually; watch the aggregate uplink instead.
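To turn a percentage suggestion into the fixed rate you would enter in a rule, multiply the link or contracted speed by the percentage. The helper below is a hypothetical convenience for that arithmetic, not a GridNMS function.

```python
def threshold_bps(link_speed_bps: float, percent: float) -> float:
    """Convert a percentage of link speed into an absolute rate in bits per second."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    return link_speed_bps * percent / 100


# 80% of a 1 Gbps core uplink, expressed in Mbps.
print(threshold_bps(1_000_000_000, 80) / 1e6)

# 75-85% of a 200 Mbps contracted WAN speed, expressed in Mbps.
print(threshold_bps(200_000_000, 75) / 1e6, threshold_bps(200_000_000, 85) / 1e6)
```

For a WAN uplink, use the contracted speed rather than the physical port speed, since the port can be faster than the service you pay for.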

Everything monitoring collects can become an alert:

  • A failed reachability check → a device down event.
  • A crossed bandwidth threshold → an interface event (auto-cleared when it recovers).

All of these land on the Events & Alerts page and can be routed to people via Notifications.