Monitoring Agent — User Guide

Email Relay, Distribution Lists, Service Monitor, Portal Watchdog, Activity Logging & Health Checks
Overview

What is the Monitoring Agent?

The Monitoring Agent provides email notification capabilities across the entire SAP landscape. All Linux servers are configured to relay email through smtp2go, a cloud SMTP service, using the verified domain fivetran-internal-sales.com.

Key features:

  • Distribution lists — managed via the web UI, stored in /usr/sap/sap_skills/mailing_lists.json, loaded dynamically on page load
  • Per-server email — each server sends as <hostname>@fivetran-internal-sales.com
  • Portal email — the Skills Portal sends as sap-skills@fivetran-internal-sales.com
  • Test email — one-click test from the cockpit page

SMTP Relay

smtp2go Configuration

| Property | Value |
| --- | --- |
| Provider | smtp2go |
| SMTP Server | mail.smtp2go.com |
| Port | 2525 |
| Auth User | antonio.carbone@fivetran.com |
| Auth Password | Stored in vault key smtp2go |
| Verified Domain | fivetran-internal-sales.com |
| TLS | Enabled (opportunistic) |
| Portal From | sap-skills@fivetran-internal-sales.com |

smtp2go dashboard: https://app.smtp2go.com — use to check delivery logs, verify senders, and manage account settings.
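For scripted diagnostics, the relay can also be exercised directly from Python. A minimal smtplib sketch using the values from the table above — the password must come from the vault, and the recipient address is a placeholder:

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email with a verified-domain sender."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_via_smtp2go(msg, user, password):
    """Relay through smtp2go the same way the servers' MTAs are configured."""
    with smtplib.SMTP("mail.smtp2go.com", 2525) as smtp:
        smtp.starttls()              # opportunistic TLS, per the table above
        smtp.login(user, password)
        smtp.send_message(msg)
```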
Server Configuration

Per-Server Email Setup

Each Linux server in the SAP landscape is configured to relay email via smtp2go. Each uses <hostname>@fivetran-internal-sales.com as its from address.

| Server | From Address | MTA | OS |
| --- | --- | --- | --- |
| sapidesecc8 | sapidesecc8@fivetran-internal-sales.com | Postfix (lmdb) | SUSE 15 SP5 |
| sapidess4 | sapidess4@fivetran-internal-sales.com | Postfix (lmdb) | SUSE 15 SP3 |
| ts-sap-hana-s4-ssh-tunnel | sshtunnel@fivetran-internal-sales.com | Exim4 | Debian 12 |
| saprouter | saprouter@fivetran-internal-sales.com | Postfix (lmdb) | Rocky Linux 10 |
| saphvrhub | hvrhub@fivetran-internal-sales.com | Postfix (hash) | Rocky Linux 8 |

Configuration Files

| MTA | Config File | Credentials File |
| --- | --- | --- |
| Postfix (SUSE) | /etc/postfix/main.cf | /etc/postfix/sasl_passwd (lmdb map) |
| Postfix (Rocky) | /etc/postfix/main.cf | /etc/postfix/sasl_passwd (hash or lmdb map) |
| Exim4 (Debian) | /etc/exim4/update-exim4.conf.conf | /etc/exim4/passwd.client |

Map types: SUSE and Rocky Linux 10 use lmdb maps. Rocky Linux 8 uses hash (Berkeley DB). After editing sasl_passwd, regenerate the map: postmap lmdb:/etc/postfix/sasl_passwd or postmap /etc/postfix/sasl_passwd.

Test Commands

Send a test email from any server:

# From sapidesecc8 (local)
echo "Test" | mailx -s "Test from sapidesecc8" -r "sapidesecc8@fivetran-internal-sales.com" recipient@email.com

# From sapidess4 (via SSH)
ssh root@sapidess4 'echo "Test" | mailx -s "Test from sapidess4" -r "sapidess4@fivetran-internal-sales.com" recipient@email.com'

# From sshtunnel (Debian's mail client takes extra headers via -a instead of -r)
ssh root@10.142.0.37 'echo "Test" | mail -s "Test from sshtunnel" -a "From: sshtunnel@fivetran-internal-sales.com" recipient@email.com'

# From saprouter
ssh root@10.128.0.111 'echo "Test" | mailx -s "Test from saprouter" -r "saprouter@fivetran-internal-sales.com" recipient@email.com'

# From hvrhub
ssh root@10.128.15.240 'echo "Test" | mailx -s "Test from hvrhub" -r "hvrhub@fivetran-internal-sales.com" recipient@email.com'

Adding SMTP to a New Server

To configure a new Linux server to relay email via smtp2go:

# For Postfix (RHEL/Rocky/SUSE):
postconf -e "relayhost = [mail.smtp2go.com]:2525"
postconf -e "smtp_sasl_auth_enable = yes"
postconf -e "smtp_sasl_password_maps = lmdb:/etc/postfix/sasl_passwd"  # or hash: on older systems
postconf -e "smtp_sasl_security_options = noanonymous"
postconf -e "smtp_tls_security_level = may"
echo "[mail.smtp2go.com]:2525 antonio.carbone@fivetran.com:PASSWORD" > /etc/postfix/sasl_passwd
chmod 600 /etc/postfix/sasl_passwd
postmap lmdb:/etc/postfix/sasl_passwd  # or: postmap /etc/postfix/sasl_passwd
systemctl restart postfix

# For Exim4 (Debian):
# Edit /etc/exim4/update-exim4.conf.conf:
#   dc_eximconfig_configtype='smarthost'
#   dc_smarthost='mail.smtp2go.com::2525'
# Add to /etc/exim4/passwd.client:
#   mail.smtp2go.com:antonio.carbone@fivetran.com:PASSWORD
update-exim4.conf
systemctl restart exim4
Password: Get the smtp2go password from the vault: key smtp2go. Do not hardcode it in documentation.

Distribution Lists

How Distribution Lists Work

Distribution lists are stored in a JSON file on the server: /usr/sap/sap_skills/mailing_lists.json. The web UI loads the lists dynamically on page load via the API. When a list is updated via the web UI:

  • The browser calls POST /sap_skills/api/update_mailing_list with the list name and email array
  • The server writes the updated list to /usr/sap/sap_skills/mailing_lists.json
  • Changes are reflected immediately in both the Monitoring Cockpit and the Server Documentation page (loaded dynamically on each page load)
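The same update the browser performs can be scripted. A minimal standard-library sketch — TLS certificate handling is omitted, and the `{"status": "ok"}` response shape is taken from the Troubleshooting table:

```python
import json
import urllib.request

def build_update(list_name, emails):
    """Build the JSON body the update endpoint expects."""
    return json.dumps({"list_name": list_name, "emails": emails}).encode()

def update_mailing_list(base_url, list_name, emails):
    """POST the new recipient array; the server rewrites mailing_lists.json."""
    req = urllib.request.Request(
        base_url + "/sap_skills/api/update_mailing_list",
        data=build_update(list_name, emails),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)       # expected: {"status": "ok"}
```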

| List Name | Recipients | Usage |
| --- | --- | --- |
| SAPSpecialists | antonio.carbone@fivetran.com, richard.brouwer@fivetran.com | Alerts, notifications, reports |

APIs

Monitoring API Endpoints

| Endpoint | Method | Auth | Description |
| --- | --- | --- | --- |
| /sap_skills/api/get_mailing_list | POST | None | Read a distribution list by name. Body: {"list_name": "SAPSpecialists"} |
| /sap_skills/api/update_mailing_list | POST | None | Update a distribution list. Body: {"list_name": "...", "emails": [...]} |
| /sap_skills/api/send_test_email | POST | None | Send test email to a list. Body: {"list_name": "...", "emails": [...]} |

Example — read list via curl:

curl -sk -X POST -H "Content-Type: application/json" \
  -d '{"list_name":"SAPSpecialists"}' \
  https://sapidesecc8.fivetran-internal-sales.com/sap_skills/api/get_mailing_list

Example — send to list from Python:

import subprocess

# Read the list from the API, or fall back to a hardcoded copy
recipients = ["antonio.carbone@fivetran.com", "richard.brouwer@fivetran.com"]
for r in recipients:
    # Pass arguments as a list and the body via stdin — no shell, no injection risk
    subprocess.run(
        ["mailx", "-s", "SAP Alert",
         "-r", "sap-skills@fivetran-internal-sales.com", r],
        input="Alert message", text=True, check=True)

Troubleshooting

Common Issues

| Issue | Fix |
| --- | --- |
| Email not delivered | Check the mail log: tail -20 /var/log/mail (SUSE) or journalctl -u postfix -n 20 (Rocky/Debian) |
| unsupported dictionary type: hash | Use lmdb instead: postconf -e "smtp_sasl_password_maps = lmdb:/etc/postfix/sasl_passwd" then postmap lmdb:/etc/postfix/sasl_passwd |
| sender domain not verified | Only fivetran-internal-sales.com is verified. Use *@fivetran-internal-sales.com as the from address. |
| TLS engine unavailable | Enable tlsmgr in /etc/postfix/master.cf (uncomment the line) and restart postfix |
| SASL authentication failed | Verify credentials in /etc/postfix/sasl_passwd match the vault. Regenerate the map after editing. |
| Distribution list changes not saving | Check the browser console for errors. Verify the API returns {"status": "ok"}. Check permissions on /usr/sap/sap_skills/mailing_lists.json. |
| Test email button does nothing | The list may be empty. Add at least one recipient first. |

Diagnostic Commands

# Check postfix queue
postqueue -p

# Flush stuck mail
postqueue -f

# Check postfix config
postconf relayhost smtp_sasl_auth_enable smtp_sasl_password_maps smtp_tls_security_level

# Check Exim4 queue (sshtunnel)
exim4 -bp

# Check smtp2go delivery log
# Go to https://app.smtp2go.com > Activity > Email Activity

Portal Watchdog

What is the Portal Watchdog?

The Portal Watchdog runs on sapidess4 and monitors the web server on sapidesecc8. If the portal goes down unexpectedly (without a planned maintenance flag), the watchdog sends email alerts to the SAP Specialists distribution list.

Because it runs on a separate server from the portal, it can detect and report outages even when sapidesecc8 is completely unreachable.

Configuration

| Property | Value |
| --- | --- |
| Script | /usr/local/bin/portal_watchdog.py on sapidess4 |
| Schedule | Every 5 minutes via cron (*/5 * * * *) on sapidess4 |
| Check method | HTTPS to https://sapidesecc8.fivetran-internal-sales.com/sap_skills/ + SSH fallback |
| State file | /var/run/portal_watchdog_state.json |
| Planned flag | /var/run/portal_planned_restart (set by cockpit before restart) |
| Local log | /var/log/portal_watchdog.log |
| Activity CSV | /var/log/portal_watchdog_activity.csv |
| Parquet output | gs://sap_cds_dbt/portal_log/ (written directly from sapidess4) |

How It Works

  • HTTPS check: Attempts to connect to the portal URL. If that fails, falls back to SSH to sapidesecc8 to check service status.
  • Planned flag check: Before sending any alert, checks for /var/run/portal_planned_restart. If the flag exists, the outage is treated as planned and no alert is sent.
  • State tracking: Compares current state against /var/run/portal_watchdog_state.json to detect transitions (up → down, down → up).
  • Alerting: Emails are only sent on state changes (portal going down or coming back up), not on every check cycle.
  • DL recipients: Recipients are hardcoded in the script as a fallback, because the watchdog cannot read from the portal API if the portal is down.
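The alerting rules above boil down to a small decision function. A sketch with hypothetical helpers, not the actual script:

```python
def detect_transition(prev_up, now_up):
    """Return 'down' or 'up' on a state change, None when nothing changed."""
    if prev_up and not now_up:
        return "down"
    if not prev_up and now_up:
        return "up"
    return None

def should_alert(prev_up, now_up, planned_flag_exists):
    """Alert only on a transition, and never while the planned flag is set."""
    return detect_transition(prev_up, now_up) is not None and not planned_flag_exists
```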

Independent Parquet Writes

sapidess4 has gsutil and PyArrow installed, allowing the watchdog to write Parquet event files directly to gs://sap_cds_dbt/portal_log/ even when sapidesecc8 is down. This ensures activity logging continuity during portal outages.

Manual Commands

# Run watchdog manually
ssh root@sapidess4 "/usr/local/bin/portal_watchdog.py"

# View state
ssh root@sapidess4 "cat /var/run/portal_watchdog_state.json"

# View recent log
ssh root@sapidess4 "tail -20 /var/log/portal_watchdog.log"

# Check cron is active
ssh root@sapidess4 "crontab -l | grep watchdog"

# Set planned maintenance flag (prevents alerts)
ssh root@sapidess4 "touch /var/run/portal_planned_restart"

# Remove planned flag after maintenance
ssh root@sapidess4 "rm -f /var/run/portal_planned_restart"

Activity Logging

Overview

Every monitoring event across the SAP landscape is logged to both a local CSV file and individual Parquet files in GCS. This provides a durable audit trail for all service state changes, health checks, restarts, and watchdog events.

Storage Locations

| Format | Location | Retention |
| --- | --- | --- |
| CSV | /var/log/sap_portal_activity.csv | 4 months (monthly cron cleanup) |
| Parquet | gs://sap_cds_dbt/portal_log/ | Individual files per event |
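The monthly CSV cleanup can be sketched as below — the actual cron job is not shown in this guide; this assumes the first CSV column is the date and the first row is a header:

```python
import csv
from datetime import datetime, timedelta
from pathlib import Path

def prune_csv(path, days=122):
    """Keep only rows from roughly the last 4 months; row[0] must be YYYY-MM-DD."""
    p = Path(path)
    with p.open(newline="") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]
    cutoff = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
    kept = [r for r in body if r and r[0] >= cutoff]   # lexicographic date compare
    with p.open("w", newline="") as f:
        csv.writer(f).writerows([header] + kept)
```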

Schema

| Column | Description |
| --- | --- |
| date | Event date (YYYY-MM-DD) |
| time | Event time (HH:MM:SS, local) |
| timestamp_utc | UTC timestamp (ISO 8601) |
| source | Origin of the event (e.g., health_check, watchdog, cockpit) |
| type | Event type (e.g., state_change, startup, alert) |
| action | What happened (e.g., start, stop, restart, check) |
| server | Target server hostname |
| service | Target service name |
| result | Outcome (e.g., success, failure, running, stopped) |
| detail | Additional context or error message |
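A row matching this schema can be assembled with a small helper (hypothetical; the field values follow the table above):

```python
from datetime import datetime, timezone

COLUMNS = ["date", "time", "timestamp_utc", "source", "type",
           "action", "server", "service", "result", "detail"]

def activity_row(source, type_, action, server, service, result, detail=""):
    """Build one activity event keyed exactly by the schema columns."""
    now_utc = datetime.now(timezone.utc)
    local = now_utc.astimezone()        # local time for the date/time columns
    return dict(zip(COLUMNS, [
        local.strftime("%Y-%m-%d"),
        local.strftime("%H:%M:%S"),
        now_utc.isoformat(),
        source, type_, action, server, service, result, detail,
    ]))
```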

What Gets Logged

  • Server startups: Logged automatically when a web server restart is detected
  • Service state changes: Start, stop, restart of SAP, HANA, Oracle, and other monitored services
  • Health check alerts: State transitions detected by the health alert system
  • Watchdog events: Portal up/down transitions detected by the Portal Watchdog on sapidess4
  • Cockpit actions: Manual start/stop/restart operations from the Management Cockpits

Service Monitor

What is the Service Monitor?

The Service Monitor is a real-time dashboard that checks the health of all 6 servers and their critical services every 5 minutes. It runs as a cron job on sapidesecc8 and writes results to a JSON file that the web UI reads.

Access it at: SAP_Monitoring.html → Service Monitor

Servers & Services Monitored

| Server | Service Checked | Check Method |
| --- | --- | --- |
| sapidess4 | SAP Dispatcher, Gateway, ICM, IGS | SSH + sapcontrol -nr 03 (SAP) |
| sapidess4 | HANA FIV (Nameserver, Indexserver, XSEngine, Compileserver) | SSH + sapcontrol -nr 00 (FIV) |
| sapidess4 | HANA PIT (Nameserver, Indexserver) | SSH + sapcontrol -nr 96 (PIT) |
| sapidesecc8 | SAP Dispatcher, Gateway, ICM | Local sapcontrol -nr 00 |
| sapidesecc8 | Oracle Database | ps -ef \| grep ora_pmon |
| sapidesecc8 | Web Server (Portal) | curl https://localhost/sap_skills/ |
| saprouter | Server reachability | SSH to saprouter-internal |
| saprouter | SAPRouter systemd service | systemctl is-active saprouter |
| sap-sql-ides | Server reachability | Ping 10.128.0.51 |
| sap-sql-ides | SAP Instance (SQ1) | SOAP to http://10.128.0.51:50013/ (sapcontrol) |
| sap-sql-ides | SQL Server Database | WinRM via sq1_system_status API |
| saphvrhub | Server reachability | SSH to saphvrhub |
| saphvrhub | HVR Hub Service | systemctl is-active hvrhubserver |
| saphvrhub | PostgreSQL 14 | systemctl is-active postgresql-14 |
| SSH Tunnel | Server reachability | SSH to 10.142.0.37 |

Status Indicators

| Color | Meaning |
| --- | --- |
| Green | Service is running / server is reachable |
| Red | Service is down / server is unreachable |
| Gray | Initial state — check in progress |

Each server card also shows a REACHABLE or UNREACHABLE badge at the top right.

Health Check Script

Script Details

| Property | Value |
| --- | --- |
| Script | /usr/local/bin/sap_health_check.sh |
| Runs on | sapidesecc8 |
| Schedule | Every 5 minutes via cron (*/5 * * * *) |
| JSON Output | /usr/sap/sap_skills/health_status.json |
| History Log | /var/log/sap_health_check.log |
| Log Rotation | Auto-truncated to the last 2000 lines |

JSON Output Format

{
  "timestamp": "2026-04-14T05:16:19Z",
  "servers": {
    "sapidess4": {
      "reachable": true,
      "sap_app_server": {"dispatcher":1,"gateway":1,"icm":1,"igs":1},
      "hana_fiv": {"nameserver":1,"indexserver":1,"xsengine":3,"compileserver":1},
      "hana_pit": {"nameserver":1,"indexserver":0}
    },
    "sapidesecc8": {
      "reachable": true,
      "sap_app_server": {"dispatcher":1,"gateway":1,"icm":1},
      "oracle_db": true,
      "web_server": true
    },
    "saprouter": {"reachable": true, "saprouter_service": true},
    "sap_sql_ides": {"reachable": true, "sap_instance": true, "sql_server": true},
    "saphvrhub": {"reachable": true, "hvr_hub_service": true, "postgresql": true},
    "ssh_tunnel": {"reachable": true}
  }
}

Values of true (or a nonzero process count, e.g. "xsengine":3 above) = running. Values of 0 or false = down.
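Consumers of this file can walk the nested structure to list what is down. A sketch (hypothetical helper; it treats any truthy value as running, matching the rule above):

```python
import json

def down_services(status):
    """Return (server, service) pairs that are down in a parsed health JSON."""
    down = []
    for server, checks in status["servers"].items():
        if not checks.get("reachable", False):
            down.append((server, "reachable"))
            continue                             # skip detail checks if unreachable
        for name, value in checks.items():
            if name == "reachable":
                continue
            if isinstance(value, dict):          # nested process map, e.g. hana_pit
                down.extend((server, name + "." + proc)
                            for proc, v in value.items() if not v)
            elif not value:                      # plain boolean/int check
                down.append((server, name))
    return down

# usage: status = json.load(open("/usr/sap/sap_skills/health_status.json"))
```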

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /sap_skills/health_status.json | GET | Current health status JSON (static file, updated by cron) |
| /sap_skills/api/admin/health_log | POST | Last 50 lines of the health check history log |

Health Alert System

After each health check, sap_health_alert.py runs automatically to detect and report state changes:

  • State comparison: Compares the current health status against the previous state to detect transitions (running → stopped, stopped → running).
  • Planned flag check: Before sending any alert, checks for planned maintenance flags. If a service or server is flagged as planned (via the cockpit or cron wrapper), no alert email is sent.
  • Alert emails: Emails are only sent for actual state changes (start, stop, restart), never for routine health checks where the state is unchanged.
  • Activity logging: Every state change is logged to CSV and Parquet (see Activity Logging).

Planned Cron Wrapper

The planned cron wrapper (/usr/local/bin/planned_cron_wrapper.py) sets planned maintenance flags before running scheduled tasks, so the health alert system does not send false alerts during expected downtime.

--also-flag option: Used when a scheduled action has indirect consequences on other services. For example, an Oracle offline_force backup also stops SAP services on the same server:

# Oracle offline_force backup: flags both Oracle AND SAP as planned
planned_cron_wrapper.py --also-flag sapidesecc8:sap sapidesecc8:oracle -- brbackup -t offline_force ...

This prevents false alerts for both the directly stopped Oracle database and the indirectly affected SAP application server.
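The wrapper's flag-then-run-then-clear behavior can be sketched as below — the flag file naming and location are assumptions, not the wrapper's actual layout:

```python
import subprocess
from pathlib import Path

def run_planned(cmd, flags, flag_dir="/var/run"):
    """Touch one planned flag per host:service, run the task, then clean up."""
    paths = [Path(flag_dir) / "planned_{}_{}".format(*f.split(":")) for f in flags]
    for p in paths:
        p.touch()                        # mark downtime as planned before the task
    try:
        return subprocess.run(cmd).returncode
    finally:
        for p in paths:
            p.unlink(missing_ok=True)    # always clear flags, even on failure
```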

Manual Commands

# Run health check manually
ssh root@sapidesecc8 "/usr/local/bin/sap_health_check.sh"

# View current status
ssh root@sapidesecc8 "cat /usr/sap/sap_skills/health_status.json"

# View health check history
ssh root@sapidesecc8 "tail -20 /var/log/sap_health_check.log"

# Check cron is active
ssh root@sapidesecc8 "crontab -l | grep health"