Acknowledging Nagios problems

If you are listed as the “service-manager” for a project (see https://cfgman.ilrt.bris.ac.uk/), you will receive alerts when any of that project's services monitored by Nagios has a problem. If a service starts sending you alerts, please pick them up rather than ignoring them. If the alert is caused by a known problem that is already in hand, please go to the Nagios web interface and “acknowledge” the service problem. Follow these steps:

  1. Go to https://ig88.ilrt.bris.ac.uk/nagios3/
  2. Click on unhandled next to Problems – Services.
  3. Click on the name of the service you manage that is in a Critical state.
  4. Click on “Acknowledge this service problem” in the list of options on the right.
  5. Enter some text in the comment field and click Commit.
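
The web interface steps above are the supported route. For reference, Nagios also accepts the same acknowledgement as an external command, ACKNOWLEDGE_SVC_PROBLEM, written to its command file. The sketch below only formats that command line; the host name, service name, author and command file path are hypothetical examples, not values taken from our installation, and the meaning of the sticky value should be checked against the documentation for the Nagios version in use.

```python
import time


def build_ack_command(host, service, author, comment,
                      sticky=2, notify=1, persistent=1, now=None):
    """Format one ACKNOWLEDGE_SVC_PROBLEM external command line.

    Field order: host;service;sticky;notify;persistent;author;comment.
    sticky=2 is commonly documented as "keep until the service recovers";
    treat that as an assumption and check your Nagios version.
    """
    # Semicolons separate fields and a newline ends the command,
    # so reject them in any field that could break the format.
    for field in (host, service, author):
        if ";" in field or "\n" in field:
            raise ValueError("field must not contain ';' or a newline")
    if "\n" in comment:
        raise ValueError("comment must not contain a newline")

    timestamp = int(time.time()) if now is None else int(now)
    return "[{0}] ACKNOWLEDGE_SVC_PROBLEM;{1};{2};{3};{4};{5};{6};{7}\n".format(
        timestamp, host, service, sticky, notify, persistent, author, comment)


def send_command(line, command_file):
    """Write a command line to the Nagios command file (a named pipe).

    The path is installation specific and must be supplied by the caller.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Hypothetical host, service and author, for illustration only.
    print(build_ack_command("web01", "HTTP", "jbloggs",
                            "Known issue, fix in hand"), end="")
```

Calling send_command requires write access to the command file on the Nagios server, which most service managers will not have, so in practice the web interface remains the place to acknowledge problems.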

If you do not do this, it will be assumed that there is an unknown problem occurring on the system and either a sysadmin or another service manager will investigate it. If you are handling the problem you can save that person the time and effort required to investigate it.

If, on the other hand, the problem needs attention and you are not able to resolve it yourself, acknowledge the problem and submit a ticket request to ilrt-helpdesk@bris.ac.uk stating the nature of the problem and the services it is affecting. PLEASE DO NOT JUST FORWARD THE NAGIOS EMAIL! Make sure you have attempted to determine the nature of the problem first.
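
As a rough illustration of what a useful ticket request contains, compared with a forwarded Nagios email, the sketch below assembles a message with Python's standard email library. The sender address, host, service and problem description are hypothetical, and the layout of the body is only a suggestion, not a required helpdesk format. The code builds the message but does not send it.

```python
from email.message import EmailMessage

HELPDESK = "ilrt-helpdesk@bris.ac.uk"


def build_ticket(sender, host, service, nature, affected, acknowledged):
    """Build a helpdesk ticket request describing a Nagios service problem."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = HELPDESK
    msg["Subject"] = "{0} on {1}: {2}".format(service, host, nature)

    # State the nature of the problem and the services affected,
    # rather than pasting the raw Nagios notification.
    body_lines = [
        "Nature of the problem: " + nature,
        "Affected services: " + ", ".join(affected),
        "Acknowledged in Nagios: " + ("yes" if acknowledged else "no"),
    ]
    msg.set_content("\n".join(body_lines) + "\n")
    return msg


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    ticket = build_ticket(
        sender="service.manager@example.org",
        host="web01",
        service="HTTP",
        nature="web server returns 503 after deploy",
        affected=["project website", "search API"],
        acknowledged=True,
    )
    print(ticket)
```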

It is possible that we are already aware of the problem and that the service is “flapping”. In that case we should attempt to set a similar acknowledgement ourselves. Looking at the Nagios unhandled page may give some indication that there is a bigger problem at hand. This does not mean you should ignore the alerts; please still attempt to escalate the problem report. If it is out of hours, please remember that only limited support is available and response times cannot be guaranteed.