Monitoring Agent — User Guide

Email Relay, Distribution Lists, Service Monitor, Portal Watchdog, Activity Logging & Health Checks
Overview

What is the Monitoring Agent?

The Monitoring Agent provides email notification capabilities across the entire SAP landscape. All Linux servers are configured to relay email through smtp2go, a cloud SMTP service, using the verified domain fivetran-internal-sales.com.

Key features:

  • Distribution lists — managed via the web UI, stored in /usr/sap/sap_skills/mailing_lists.json, loaded dynamically on page load
  • Per-server email — each server sends as <hostname>@fivetran-internal-sales.com
  • Portal email — the Skills Portal sends as sap-skills@fivetran-internal-sales.com
  • Test email — one-click test from the cockpit page

SMTP Relay

smtp2go Configuration

| Property | Value |
| --- | --- |
| Provider | smtp2go |
| SMTP Server | mail.smtp2go.com |
| Port | 2525 |
| Auth User | antonio.carbone@fivetran.com |
| Auth Password | Stored in vault key smtp2go |
| Verified Domain | fivetran-internal-sales.com |
| TLS | Enabled (opportunistic) |
| Portal From | sap-skills@fivetran-internal-sales.com |

smtp2go dashboard: https://app.smtp2go.com — use to check delivery logs, verify senders, and manage account settings.
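For scripted diagnostics, the relay can also be exercised directly from Python. A minimal smtplib sketch using the values from the table above — the password must come from the vault, and the recipient address is a placeholder:

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email with a verified-domain sender."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_via_smtp2go(msg, user, password):
    """Relay through smtp2go the same way the servers' MTAs are configured."""
    with smtplib.SMTP("mail.smtp2go.com", 2525) as smtp:
        smtp.starttls()              # opportunistic TLS, per the table above
        smtp.login(user, password)
        smtp.send_message(msg)
```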
Server Configuration

Per-Server Email Setup

Each Linux server in the SAP landscape is configured to relay email via smtp2go. Each uses <hostname>@fivetran-internal-sales.com as its from address.

| Server | From Address | MTA | OS |
| --- | --- | --- | --- |
| sapidesecc8 | sapidesecc8@fivetran-internal-sales.com | Postfix (lmdb) | SUSE 15 SP5 |
| sapidess4 | sapidess4@fivetran-internal-sales.com | Postfix (lmdb) | SUSE 15 SP3 |
| ts-sap-hana-s4-ssh-tunnel | sshtunnel@fivetran-internal-sales.com | Exim4 | Debian 12 |
| saprouter | saprouter@fivetran-internal-sales.com | Postfix (lmdb) | Rocky Linux 10 |
| saphvrhub | hvrhub@fivetran-internal-sales.com | Postfix (hash) | Rocky Linux 8 |

Configuration Files

| MTA | Config File | Credentials File |
| --- | --- | --- |
| Postfix (SUSE) | /etc/postfix/main.cf | /etc/postfix/sasl_passwd (lmdb map) |
| Postfix (Rocky) | /etc/postfix/main.cf | /etc/postfix/sasl_passwd (hash or lmdb map) |
| Exim4 (Debian) | /etc/exim4/update-exim4.conf.conf | /etc/exim4/passwd.client |

Map types: SUSE and Rocky Linux 10 use lmdb maps. Rocky Linux 8 uses hash (Berkeley DB). After editing sasl_passwd, regenerate the map: postmap lmdb:/etc/postfix/sasl_passwd or postmap /etc/postfix/sasl_passwd.

Test Commands

Send a test email from any server:

# From sapidesecc8 (local)
echo "Test" | mailx -s "Test from sapidesecc8" -r "sapidesecc8@fivetran-internal-sales.com" recipient@email.com

# From sapidess4 (via SSH)
ssh root@sapidess4 'echo "Test" | mailx -s "Test from sapidess4" -r "sapidess4@fivetran-internal-sales.com" recipient@email.com'

# From sshtunnel (Debian's mail client takes extra headers via -a instead of -r)
ssh root@10.142.0.37 'echo "Test" | mail -s "Test from sshtunnel" -a "From: sshtunnel@fivetran-internal-sales.com" recipient@email.com'

# From saprouter
ssh root@10.128.0.111 'echo "Test" | mailx -s "Test from saprouter" -r "saprouter@fivetran-internal-sales.com" recipient@email.com'

# From hvrhub
ssh root@10.128.15.240 'echo "Test" | mailx -s "Test from hvrhub" -r "hvrhub@fivetran-internal-sales.com" recipient@email.com'

Adding SMTP to a New Server

To configure a new Linux server to relay email via smtp2go:

# For Postfix (RHEL/Rocky/SUSE):
postconf -e "relayhost = [mail.smtp2go.com]:2525"
postconf -e "smtp_sasl_auth_enable = yes"
postconf -e "smtp_sasl_password_maps = lmdb:/etc/postfix/sasl_passwd"  # or hash: on older systems
postconf -e "smtp_sasl_security_options = noanonymous"
postconf -e "smtp_tls_security_level = may"
echo "[mail.smtp2go.com]:2525 antonio.carbone@fivetran.com:PASSWORD" > /etc/postfix/sasl_passwd
chmod 600 /etc/postfix/sasl_passwd
postmap lmdb:/etc/postfix/sasl_passwd  # or: postmap /etc/postfix/sasl_passwd
systemctl restart postfix

# For Exim4 (Debian):
# Edit /etc/exim4/update-exim4.conf.conf:
#   dc_eximconfig_configtype='smarthost'
#   dc_smarthost='mail.smtp2go.com::2525'
# Add to /etc/exim4/passwd.client:
#   mail.smtp2go.com:antonio.carbone@fivetran.com:PASSWORD
update-exim4.conf
systemctl restart exim4
Password: Get the smtp2go password from the vault: key smtp2go. Do not hardcode it in documentation.

Distribution Lists

How Distribution Lists Work

Distribution lists are stored in a JSON file on the server: /usr/sap/sap_skills/mailing_lists.json. The web UI loads the lists dynamically on page load via the API. When a list is updated via the web UI:

  • The browser calls POST /sap_skills/api/update_mailing_list with the list name and email array
  • The server writes the updated list to /usr/sap/sap_skills/mailing_lists.json
  • Changes are reflected immediately in both the Monitoring Cockpit and the Server Documentation page (loaded dynamically on each page load)
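The same update the browser performs can be scripted. A minimal standard-library sketch — TLS certificate handling is omitted, and the `{"status": "ok"}` response shape is taken from the Troubleshooting table:

```python
import json
import urllib.request

def build_update(list_name, emails):
    """Build the JSON body the update endpoint expects."""
    return json.dumps({"list_name": list_name, "emails": emails}).encode()

def update_mailing_list(base_url, list_name, emails):
    """POST the new recipient array; the server rewrites mailing_lists.json."""
    req = urllib.request.Request(
        base_url + "/sap_skills/api/update_mailing_list",
        data=build_update(list_name, emails),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)       # expected: {"status": "ok"}
```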

| List Name | Recipients | Usage |
| --- | --- | --- |
| SAPSpecialists | antonio.carbone@fivetran.com, richard.brouwer@fivetran.com | Alerts, notifications, reports |

APIs

Monitoring API Endpoints

| Endpoint | Method | Auth | Description |
| --- | --- | --- | --- |
| /sap_skills/api/get_mailing_list | POST | None | Read a distribution list by name. Body: {"list_name": "SAPSpecialists"} |
| /sap_skills/api/update_mailing_list | POST | None | Update a distribution list. Body: {"list_name": "...", "emails": [...]} |
| /sap_skills/api/send_test_email | POST | None | Send test email to a list. Body: {"list_name": "...", "emails": [...]} |

Example — read list via curl:

curl -sk -X POST -H "Content-Type: application/json" \
  -d '{"list_name":"SAPSpecialists"}' \
  https://sapidesecc8.fivetran-internal-sales.com/sap_skills/api/get_mailing_list

Example — send to list from Python:

import subprocess

# Read the list from the API, or fall back to a hardcoded copy
recipients = ["antonio.carbone@fivetran.com", "richard.brouwer@fivetran.com"]
for r in recipients:
    # Pass arguments as a list and the body via stdin — no shell, no injection risk
    subprocess.run(
        ["mailx", "-s", "SAP Alert",
         "-r", "sap-skills@fivetran-internal-sales.com", r],
        input="Alert message", text=True, check=True)

Troubleshooting

Common Issues

| Issue | Fix |
| --- | --- |
| Email not delivered | Check the mail log: tail -20 /var/log/mail (SUSE) or journalctl -u postfix -n 20 (Rocky/Debian) |
| unsupported dictionary type: hash | Use lmdb instead: postconf -e "smtp_sasl_password_maps = lmdb:/etc/postfix/sasl_passwd" then postmap lmdb:/etc/postfix/sasl_passwd |
| sender domain not verified | Only fivetran-internal-sales.com is verified. Use *@fivetran-internal-sales.com as the from address. |
| TLS engine unavailable | Enable tlsmgr in /etc/postfix/master.cf (uncomment the line) and restart postfix |
| SASL authentication failed | Verify credentials in /etc/postfix/sasl_passwd match the vault. Regenerate the map after editing. |
| Distribution list changes not saving | Check the browser console for errors. Verify the API returns {"status": "ok"}. Check permissions on /usr/sap/sap_skills/mailing_lists.json. |
| Test email button does nothing | The list may be empty. Add at least one recipient first. |

Diagnostic Commands

# Check postfix queue
postqueue -p

# Flush stuck mail
postqueue -f

# Check postfix config
postconf relayhost smtp_sasl_auth_enable smtp_sasl_password_maps smtp_tls_security_level

# Check Exim4 queue (sshtunnel)
exim4 -bp

# Check smtp2go delivery log
# Go to https://app.smtp2go.com > Activity > Email Activity

Portal Watchdog

What is the Portal Watchdog?

The Portal Watchdog runs on sapidess4 and monitors the web server on sapidesecc8. If the portal goes down unexpectedly (without a planned maintenance flag), the watchdog sends email alerts to the SAP Specialists distribution list.

Because it runs on a separate server from the portal, it can detect and report outages even when sapidesecc8 is completely unreachable.

Configuration

| Property | Value |
| --- | --- |
| Script | /usr/local/bin/portal_watchdog.py on sapidess4 |
| Schedule | Every 5 minutes via cron (*/5 * * * *) on sapidess4 |
| Check method | HTTPS to https://sapidesecc8.fivetran-internal-sales.com/sap_skills/ + SSH fallback |
| State file | /var/run/portal_watchdog_state.json |
| Planned flag | /var/run/portal_planned_restart (set by cockpit before restart) |
| Local log | /var/log/portal_watchdog.log |
| Activity CSV | /var/log/portal_watchdog_activity.csv |
| Parquet output | gs://sap_cds_dbt/portal_log/ (written directly from sapidess4) |

How It Works

  • HTTPS check: Attempts to connect to the portal URL. If that fails, falls back to SSH to sapidesecc8 to check service status.
  • Planned flag check: Before sending any alert, checks for /var/run/portal_planned_restart. If the flag exists, the outage is treated as planned and no alert is sent.
  • State tracking: Compares current state against /var/run/portal_watchdog_state.json to detect transitions (up → down, down → up).
  • Alerting: Emails are only sent on state changes (portal going down or coming back up), not on every check cycle.
  • DL recipients: Recipients are hardcoded in the script as a fallback, because the watchdog cannot read from the portal API if the portal is down.
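The alerting rules above boil down to a small decision function. A sketch with hypothetical helpers, not the actual script:

```python
def detect_transition(prev_up, now_up):
    """Return 'down' or 'up' on a state change, None when nothing changed."""
    if prev_up and not now_up:
        return "down"
    if not prev_up and now_up:
        return "up"
    return None

def should_alert(prev_up, now_up, planned_flag_exists):
    """Alert only on a transition, and never while the planned flag is set."""
    return detect_transition(prev_up, now_up) is not None and not planned_flag_exists
```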

Independent Parquet Writes

sapidess4 has gsutil and PyArrow installed, allowing the watchdog to write Parquet event files directly to gs://sap_cds_dbt/portal_log/ even when sapidesecc8 is down. This ensures activity logging continuity during portal outages.

Manual Commands

# Run watchdog manually
ssh root@sapidess4 "/usr/local/bin/portal_watchdog.py"

# View state
ssh root@sapidess4 "cat /var/run/portal_watchdog_state.json"

# View recent log
ssh root@sapidess4 "tail -20 /var/log/portal_watchdog.log"

# Check cron is active
ssh root@sapidess4 "crontab -l | grep watchdog"

# Set planned maintenance flag (prevents alerts)
ssh root@sapidess4 "touch /var/run/portal_planned_restart"

# Remove planned flag after maintenance
ssh root@sapidess4 "rm -f /var/run/portal_planned_restart"

Activity Logging

Overview

Every monitoring event across the SAP landscape is logged to both a local CSV file and individual Parquet files in GCS. This provides a durable audit trail for all service state changes, health checks, restarts, and watchdog events.

Storage Locations

| Format | Location | Retention |
| --- | --- | --- |
| CSV | /var/log/sap_portal_activity.csv | 4 months (monthly cron cleanup) |
| Parquet | gs://sap_cds_dbt/portal_log/ | Individual files per event |
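The monthly CSV cleanup can be sketched as below — the actual cron job is not shown in this guide; this assumes the first CSV column is the date and the first row is a header:

```python
import csv
from datetime import datetime, timedelta
from pathlib import Path

def prune_csv(path, days=122):
    """Keep only rows from roughly the last 4 months; row[0] must be YYYY-MM-DD."""
    p = Path(path)
    with p.open(newline="") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]
    cutoff = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
    kept = [r for r in body if r and r[0] >= cutoff]   # lexicographic date compare
    with p.open("w", newline="") as f:
        csv.writer(f).writerows([header] + kept)
```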

Schema

| Column | Description |
| --- | --- |
| date | Event date (YYYY-MM-DD) |
| time | Event time (HH:MM:SS, local) |
| timestamp_utc | UTC timestamp (ISO 8601) |
| source | Origin of the event (e.g., health_check, watchdog, cockpit) |
| type | Event type (e.g., state_change, startup, alert) |
| action | What happened (e.g., start, stop, restart, check) |
| server | Target server hostname |
| service | Target service name |
| result | Outcome (e.g., success, failure, running, stopped) |
| detail | Additional context or error message |
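A row matching this schema can be assembled with a small helper (hypothetical; the field values follow the table above):

```python
from datetime import datetime, timezone

COLUMNS = ["date", "time", "timestamp_utc", "source", "type",
           "action", "server", "service", "result", "detail"]

def activity_row(source, type_, action, server, service, result, detail=""):
    """Build one activity event keyed exactly by the schema columns."""
    now_utc = datetime.now(timezone.utc)
    local = now_utc.astimezone()        # local time for the date/time columns
    return dict(zip(COLUMNS, [
        local.strftime("%Y-%m-%d"),
        local.strftime("%H:%M:%S"),
        now_utc.isoformat(),
        source, type_, action, server, service, result, detail,
    ]))
```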

What Gets Logged

  • Server startups: Logged automatically when a web server restart is detected
  • Service state changes: Start, stop, restart of SAP, HANA, Oracle, and other monitored services
  • Health check alerts: State transitions detected by the health alert system
  • Watchdog events: Portal up/down transitions detected by the Portal Watchdog on sapidess4
  • Cockpit actions: Manual start/stop/restart operations from the Management Cockpits

Service Monitor

What is the Service Monitor?

The Service Monitor is a real-time dashboard that checks the health of all 6 servers and their critical services every 5 minutes. It runs as a cron job on sapidesecc8 and writes results to a JSON file that the web UI reads.

Access it at: SAP_Monitoring.html → Service Monitor

Servers & Services Monitored

| Server | Service Checked | Check Method |
| --- | --- | --- |
| sapidess4 | SAP Dispatcher, Gateway, ICM, IGS | SSH + sapcontrol -nr 03 (SAP) |
| sapidess4 | HANA FIV (Nameserver, Indexserver, XSEngine, Compileserver) | SSH + sapcontrol -nr 00 (FIV) |
| sapidess4 | HANA PIT (Nameserver, Indexserver) | SSH + sapcontrol -nr 96 (PIT) |
| sapidesecc8 | SAP Dispatcher, Gateway, ICM | Local sapcontrol -nr 00 |
| sapidesecc8 | Oracle Database | ps -ef \| grep ora_pmon |
| sapidesecc8 | Web Server (Portal) | curl https://localhost/sap_skills/ |
| saprouter | Server reachability | SSH to saprouter-internal |
| saprouter | SAPRouter systemd service | systemctl is-active saprouter |
| sap-sql-ides | Server reachability | Ping 10.128.0.51 |
| sap-sql-ides | SAP Instance (SQ1) | SOAP to http://10.128.0.51:50013/ (sapcontrol) |
| sap-sql-ides | SQL Server Database | WinRM via sq1_system_status API |
| saphvrhub | Server reachability | SSH to saphvrhub |
| saphvrhub | HVR Hub Service | systemctl is-active hvrhubserver |
| saphvrhub | PostgreSQL 14 | systemctl is-active postgresql-14 |
| SSH Tunnel | Server reachability | SSH to 10.142.0.37 |

Status Indicators

| Color | Meaning |
| --- | --- |
| Green | Service is running / server is reachable |
| Red | Service is down / server is unreachable |
| Gray | Initial state — check in progress |

Each server card also shows a REACHABLE or UNREACHABLE badge at the top right.

Health Check Script

Script Details

| Property | Value |
| --- | --- |
| Script | /usr/local/bin/sap_health_check.sh |
| Runs on | sapidesecc8 |
| Schedule | Every 5 minutes via cron (*/5 * * * *) |
| JSON Output | /usr/sap/sap_skills/health_status.json |
| History Log | /var/log/sap_health_check.log |
| Log Rotation | Auto-truncated to the last 2000 lines |

JSON Output Format

{
  "timestamp": "2026-04-14T05:16:19Z",
  "servers": {
    "sapidess4": {
      "reachable": true,
      "sap_app_server": {"dispatcher":1,"gateway":1,"icm":1,"igs":1},
      "hana_fiv": {"nameserver":1,"indexserver":1,"xsengine":3,"compileserver":1},
      "hana_pit": {"nameserver":1,"indexserver":0}
    },
    "sapidesecc8": {
      "reachable": true,
      "sap_app_server": {"dispatcher":1,"gateway":1,"icm":1},
      "oracle_db": true,
      "web_server": true
    },
    "saprouter": {"reachable": true, "saprouter_service": true},
    "sap_sql_ides": {"reachable": true, "sap_instance": true, "sql_server": true},
    "saphvrhub": {"reachable": true, "hvr_hub_service": true, "postgresql": true},
    "ssh_tunnel": {"reachable": true}
  }
}

Values of true (or a nonzero process count, e.g. "xsengine":3 above) = running. Values of 0 or false = down.
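Consumers of this file can walk the nested structure to list what is down. A sketch (hypothetical helper; it treats any truthy value as running, matching the rule above):

```python
import json

def down_services(status):
    """Return (server, service) pairs that are down in a parsed health JSON."""
    down = []
    for server, checks in status["servers"].items():
        if not checks.get("reachable", False):
            down.append((server, "reachable"))
            continue                             # skip detail checks if unreachable
        for name, value in checks.items():
            if name == "reachable":
                continue
            if isinstance(value, dict):          # nested process map, e.g. hana_pit
                down.extend((server, name + "." + proc)
                            for proc, v in value.items() if not v)
            elif not value:                      # plain boolean/int check
                down.append((server, name))
    return down

# usage: status = json.load(open("/usr/sap/sap_skills/health_status.json"))
```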

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /sap_skills/health_status.json | GET | Current health status JSON (static file, updated by cron) |
| /sap_skills/api/admin/health_log | POST | Last 50 lines of the health check history log |

Health Alert System

After each health check, sap_health_alert.py runs automatically to detect and report state changes:

  • State comparison: Compares the current health status against the previous state to detect transitions (running → stopped, stopped → running).
  • Planned flag check: Before sending any alert, checks for planned maintenance flags. If a service or server is flagged as planned (via the cockpit or cron wrapper), no alert email is sent.
  • Alert emails: Emails are only sent for actual state changes (start, stop, restart), never for routine health checks where the state is unchanged.
  • Activity logging: Every state change is logged to CSV and Parquet (see Activity Logging).

Planned Cron Wrapper

The planned cron wrapper (/usr/local/bin/planned_cron_wrapper.py) sets planned maintenance flags before running scheduled tasks, so the health alert system does not send false alerts during expected downtime.

--also-flag option: Used when a scheduled action has indirect consequences on other services. For example, an Oracle offline_force backup also stops SAP services on the same server:

# Oracle offline_force backup: flags both Oracle AND SAP as planned
planned_cron_wrapper.py --also-flag sapidesecc8:sap sapidesecc8:oracle -- brbackup -t offline_force ...

This prevents false alerts for both the directly stopped Oracle database and the indirectly affected SAP application server.
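The wrapper's flag-then-run-then-clear behavior can be sketched as below — the flag file naming and location are assumptions, not the wrapper's actual layout:

```python
import subprocess
from pathlib import Path

def run_planned(cmd, flags, flag_dir="/var/run"):
    """Touch one planned flag per host:service, run the task, then clean up."""
    paths = [Path(flag_dir) / "planned_{}_{}".format(*f.split(":")) for f in flags]
    for p in paths:
        p.touch()                        # mark downtime as planned before the task
    try:
        return subprocess.run(cmd).returncode
    finally:
        for p in paths:
            p.unlink(missing_ok=True)    # always clear flags, even on failure
```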

Manual Commands

# Run health check manually
ssh root@sapidesecc8 "/usr/local/bin/sap_health_check.sh"

# View current status
ssh root@sapidesecc8 "cat /usr/sap/sap_skills/health_status.json"

# View health check history
ssh root@sapidesecc8 "tail -20 /var/log/sap_health_check.log"

# Check cron is active
ssh root@sapidesecc8 "crontab -l | grep health"