
Process and mailing monitoring via Prometheus

Prometheus-based process and mailing monitoring is used to track performance, processing stability, and message queue status.

Collecting metrics allows you to:

  • track mailing processing speed and individual lead processing stages;
  • identify delays in scenario execution and data processing;
  • monitor message publishing status in RabbitMQ;
  • detect queue growth, re-sends, and lost events;
  • analyze load on campaign, procworkflow, and proctrigger processes;
  • build dashboards and configure alerts in Prometheus and Grafana;
  • find bottlenecks during performance degradation or mailing processing errors.

Which processes support metrics

Currently, metrics are supported by the campaign, procworkflow, and proctrigger processes. Mailing metrics can be delivered in two ways:

  • via Pushgateway when campaign runs separately;
  • via the pull model inside procworkflow and proctrigger, if campaigns are executed within these processes.

Monitoring type | Processes | Collection method
--- | --- | ---
Pull | procworkflow, proctrigger | Prometheus scrapes /metrics
Push | campaign | Metrics are sent to Pushgateway

Configuring pull metrics

The pull model is used to collect metrics from the procworkflow and proctrigger processes. In this mode, Prometheus periodically queries the HTTP endpoint /metrics exposed by the corresponding process.

Metrics from procworkflow and proctrigger also include mailing metrics if mailings are executed within these processes.

Pull metrics configuration example

Example platform configuration for enabling pull metrics for both procworkflow and proctrigger simultaneously:

{
  "PROMETHEUS_METRICS": {
    "ENABLE": true,
    "PROCESSES": [
      "procworkflow",
      "proctrigger"
    ]
  },
  "WF_METRIC_HOST": "0.0.0.0",
  "WF_METRIC_PORT": 8911,
  "PROC_TRIGGER_METRIC_HOST": "0.0.0.0",
  "PROC_TRIGGER_METRIC_PORT": 8912
}

If the PROCESSES array is empty ([]), metrics are automatically enabled for all supported processes.
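The empty-array rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the platform's implementation: the helper name `enabled_processes` is hypothetical, the list of supported processes is taken from this page, and treating a missing ENABLE key as "disabled" is an assumption.

```python
# Processes that this page lists as supporting metrics.
SUPPORTED_PROCESSES = ("campaign", "procworkflow", "proctrigger")


def enabled_processes(config: dict) -> list:
    """Return the processes for which metrics would be enabled (hypothetical helper)."""
    section = config.get("PROMETHEUS_METRICS", {})
    # Assumption: a missing or false ENABLE flag switches metrics off entirely.
    if not section.get("ENABLE", False):
        return []
    processes = section.get("PROCESSES", [])
    # An empty PROCESSES array means "all supported processes".
    return list(processes) if processes else list(SUPPORTED_PROCESSES)
```

With the example configuration above, the function returns only procworkflow and proctrigger. With "PROCESSES": [] it returns every supported process.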

Add metric scrape jobs to the Prometheus configuration:

scrape_configs:
  - job_name: 'procworkflow'
    metrics_path: /metrics
    static_configs:
      - targets:
          - '10.200.5.25:8911'

  - job_name: 'proctrigger'
    metrics_path: /metrics
    static_configs:
      - targets:
          - '10.200.5.25:8912'

After starting the processes, verify that the metric services are listening on the specified ports:

netstat -tlpn | grep 8911
netstat -tlpn | grep 8912

Example output:

tcp6       0      0 :::8911                 :::*                    LISTEN
tcp6       0      0 :::8912                 :::*                    LISTEN

Check the availability of the /metrics endpoint:

curl http://10.200.5.25:8911/metrics
curl http://10.200.5.25:8912/metrics

If the services are configured correctly, the endpoint will return a list of Prometheus metrics for the procworkflow and proctrigger processes.
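The same check can be scripted. The following Python sketch downloads the endpoint and extracts the metric names from the Prometheus text exposition format, which makes it easy to confirm that an expected metric is present. The function names are illustrative, and the parser is deliberately minimal: it skips comment lines and ignores labels and values.

```python
import urllib.request


def fetch_metrics_text(url: str, timeout: float = 5.0) -> str:
    """Download the raw text served by a /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition_text: str) -> set:
    """Collect sample names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # A sample looks like `name{labels} value` or `name value`.
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


# Usage (addresses from the example above):
# names = metric_names(fetch_metrics_text("http://10.200.5.25:8911/metrics"))
# print(sorted(names))
```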

Pull metric parameters

Parameter | Description
--- | ---
PROMETHEUS_METRICS.ENABLE | Global enable of Prometheus metrics
PROMETHEUS_METRICS.PROCESSES | List of processes for which metric collection is enabled
WF_METRIC_HOST | Address on which procworkflow publishes metrics
WF_METRIC_PORT | Port of the procworkflow metrics service
PROC_TRIGGER_METRIC_HOST | Address on which proctrigger publishes metrics
PROC_TRIGGER_METRIC_PORT | Port of the proctrigger metrics service

When configuring pull metrics, consider the Prometheus location relative to the platform server.

Host value | When to use
--- | ---
127.0.0.1 | Prometheus is installed on the same server as the platform
0.0.0.0 | Prometheus is installed on a separate server

The WF_METRIC_HOST and PROC_TRIGGER_METRIC_HOST parameters define the internal address on which the processes will accept requests to the /metrics endpoint.

The WF_METRIC_PORT and PROC_TRIGGER_METRIC_PORT parameters set the metric service ports. You can use ports in the range 1024 to 9999, excluding ports occupied by other services.

caution

If Prometheus is located on a separate server, specify 0.0.0.0 in the WF_METRIC_HOST and PROC_TRIGGER_METRIC_HOST parameters.

Do not use the same port for WF_METRIC_PORT and PROC_TRIGGER_METRIC_PORT. Each process must serve metrics on a separate port.

If the metrics service does not start after changing WF_METRIC_HOST, WF_METRIC_PORT, PROC_TRIGGER_METRIC_HOST, or PROC_TRIGGER_METRIC_PORT, verify that the specified port is free and available on the server.
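The port rules above (separate ports, the 1024 to 9999 range, and a free port) can be checked before restarting the processes. The Python sketch below is one way to do it. Both helper names are hypothetical, and the free-port probe simply tries to bind a TCP socket, so it should be run on the platform server itself while the metrics services are stopped.

```python
import socket


def check_metric_ports(config: dict) -> list:
    """Return a list of problems found in the metric port settings."""
    problems = []
    ports = []
    for key in ("WF_METRIC_PORT", "PROC_TRIGGER_METRIC_PORT"):
        port = config.get(key)
        if port is None:
            continue
        if not isinstance(port, int) or not 1024 <= port <= 9999:
            problems.append(f"{key}={port!r} is outside the 1024-9999 range")
        ports.append(port)
    # Each process must serve metrics on its own port.
    if len(ports) == 2 and ports[0] == ports[1]:
        problems.append("WF_METRIC_PORT and PROC_TRIGGER_METRIC_PORT must differ")
    return problems


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

An empty list from `check_metric_ports` means the static rules are satisfied. `port_is_free("0.0.0.0", 8911)` then confirms that nothing else is already bound to the port.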

Configuring push metrics for campaign

The push model is used to send metrics from the campaign process to the Prometheus Pushgateway.

In this mode, the campaign process independently sends metrics to the Pushgateway, after which Prometheus scrapes them from the gateway server.

info

The push model is supported only for the campaign process. Metrics will not appear in Pushgateway until the mailing has been started at least once.

Before configuration, you must deploy and start the Prometheus Pushgateway.

The platform does not start Pushgateway automatically. In the ADDRESS parameter, you must specify the address of an already running Pushgateway.

Pushgateway launch example via systemd

Example unit file:

[Unit]
Description=Prometheus Pushgateway
Wants=network-online.target
After=network-online.target

[Service]
User=pushgateway
Group=pushgateway
Type=simple
ExecStart=/usr/local/bin/pushgateway

[Install]
WantedBy=multi-user.target

After starting Pushgateway, configure metric delivery in the platform configuration:

{
  "PROMETHEUS_METRICS_PUSH_GATEWAY": {
    "ENABLE": true,
    "ADDRESS": "10.200.5.20:9091",
    "PERIOD_SEC": 5
  }
}

Parameter description:

Parameter | Description
--- | ---
ENABLE | Enables sending metrics to Pushgateway
ADDRESS | Address and port of the already running Pushgateway
PERIOD_SEC | Metric send interval in seconds

Add Pushgateway to the Prometheus configuration:

scrape_configs:
  - job_name: 'pushgateway'
    static_configs:
      - targets:
          - '10.200.5.20:9091'

Check Pushgateway metrics availability:

curl http://10.200.5.20:9091/metrics

Metrics from the campaign process appear in the Pushgateway only after a mailing has been launched.

Grouping metrics by campaign ID

The CAMPAIGN_ID_PROMETHEUS_GROUPING_ENABLE parameter controls grouping of push metrics by mailing ID.

Example configuration:

{
  "CAMPAIGN_ID_PROMETHEUS_GROUPING_ENABLE": true
}

Value | Behavior
--- | ---
true | Metrics are grouped by mailing ID
false | All metrics are sent to a single group

By default, the parameter is enabled (true).

When grouping is enabled, separate metric groups are created in the Pushgateway for each mailing. This simplifies:

  • analyzing performance of individual mailings;
  • building Grafana dashboards;
  • configuring alerting rules;
  • finding issues in a specific mailing.

When grouping is disabled, metrics from all mailings are aggregated into a single Pushgateway group.
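In Pushgateway, a group is identified by the URL path that metrics are pushed to: /metrics/job/<job> optionally followed by /<label>/<value> pairs. The Python sketch below shows how the two modes could map to such paths. The job name "campaign" and the label name "campaign_id" are assumptions made for illustration; the names the platform actually uses are not stated on this page.

```python
from typing import Dict, Optional
from urllib.parse import quote


def pushgateway_group_url(address: str, job: str,
                          grouping: Optional[Dict[str, str]] = None) -> str:
    """Build the Pushgateway URL for a metric group (illustrative helper).

    Label values containing "/" need Pushgateway's base64 form,
    which this sketch does not handle.
    """
    path = f"/metrics/job/{quote(job, safe='')}"
    for label, value in (grouping or {}).items():
        path += f"/{quote(label, safe='')}/{quote(str(value), safe='')}"
    return f"http://{address}{path}"


# Grouping disabled: every mailing shares one group.
single_group = pushgateway_group_url("10.200.5.20:9091", "campaign")

# Grouping enabled: one group per mailing (hypothetical label name and ID).
per_mailing = pushgateway_group_url(
    "10.200.5.20:9091", "campaign", {"campaign_id": "42"}
)
```

Because each distinct path is a separate group, per-mailing grouping lets one mailing's metrics be inspected or deleted in Pushgateway without touching the others.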

RabbitMQ publisher metrics

RabbitMQ publisher business metrics are available for the procworkflow and proctrigger processes.

These metrics are used to monitor message publishing, delivery confirmation time, and the number of retry attempts.

Configuring histogram bucket values

For the total_duration and confirm_duration metrics, you can configure custom histogram bucket values.

Example configuration:

{
  "PROMETHEUS_METRICS_RMQ_PUBLISHER": {
    "MSEC_BUCKETS": {
      "total_duration": [10, 25, 50, 75.5],
      "confirm_duration": [10, 25, 50, 75.5]
    }
  }
}
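To see what these bounds mean in practice, the Python sketch below counts a set of durations into cumulative buckets the way a standard Prometheus histogram does: each bucket counts every observation less than or equal to its upper bound, and an implicit +Inf bucket counts everything. It assumes the platform's histograms follow this standard behavior, and the sample durations are made up.

```python
from bisect import bisect_left
from math import inf


def cumulative_bucket_counts(bounds_ms, observations_ms):
    """Count observations per cumulative "le" bucket, Prometheus-histogram style."""
    bounds = sorted(bounds_ms)
    per_bucket = [0] * (len(bounds) + 1)  # the extra slot is the implicit +Inf bucket
    for value in observations_ms:
        # bisect_left finds the first upper bound that is >= value.
        per_bucket[bisect_left(bounds, value)] += 1
    counts, running = [], 0
    for upper, count in zip(bounds + [inf], per_bucket):
        running += count
        counts.append((upper, running))
    return counts


# Made-up publishing durations in milliseconds.
durations = [5, 10, 12, 49.9, 60, 80, 200]
print(cumulative_bucket_counts([10, 25, 50, 75.5], durations))
# prints [(10, 2), (25, 3), (50, 4), (75.5, 5), (inf, 7)]
```

Two of the seven samples exceed the largest bound and are visible only in the +Inf bucket. If most real observations land there, the bounds are too low to show where the latency actually sits.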

Available metrics

Metric | Description
--- | ---
total_duration | Total message publishing duration
confirm_duration | Publishing confirmation duration
retry_counts | Number of retry attempts
retried_count | Number of messages sent with at least one retry
lost_failed_events_count | Number of messages discarded after exceeding the retry limit

Metric interpretation

When analyzing RabbitMQ publisher metrics, pay attention to the following changes:

Metric | Possible cause
--- | ---
Increase in confirm_duration | RabbitMQ slowdown or network issues
Increase in retry_counts | Unstable message delivery
Increase in retried_count | Elevated number of publishing errors
Non-zero lost_failed_events_count | Event loss after exceeding the retry limit
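The table can also be applied mechanically by comparing two snapshots of the metrics. The Python sketch below is a simplified illustration: it assumes each metric has already been reduced to a single number per snapshot (for example, an average duration or a counter value) and uses the short metric names from the table, which may differ from the full names exposed in Prometheus.

```python
# Possible causes taken from the interpretation table.
INCREASE_HINTS = {
    "confirm_duration": "RabbitMQ slowdown or network issues",
    "retry_counts": "Unstable message delivery",
    "retried_count": "Elevated number of publishing errors",
}


def interpret(previous: dict, current: dict) -> list:
    """Compare two metric snapshots and return human-readable findings."""
    findings = []
    for metric, cause in INCREASE_HINTS.items():
        if current.get(metric, 0) > previous.get(metric, 0):
            findings.append(f"{metric} increased: {cause}")
    # Any non-zero value here means events were dropped.
    if current.get("lost_failed_events_count", 0) != 0:
        findings.append(
            "lost_failed_events_count is non-zero: "
            "event loss after exceeding the retry limit"
        )
    return findings
```

In a real setup the same conditions would more likely be written as Prometheus alerting rules. The function only makes the decision logic explicit.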

Mailing metrics

Mailing metrics are used to monitor lead processing performance, individual stage execution time, and error counts during mailing execution.

The tables below list the main metrics. Actual names in Prometheus may contain additional prefixes, suffixes, and labels depending on the platform configuration.

Lag metrics

Metric | Description
--- | ---
cursor_lag_milliseconds | Mailing processing lag relative to the current queue state

An increase in cursor_lag_milliseconds may indicate insufficient resources, queue overload, or slowed lead processing.

General lead processing metrics

Metric | Description
--- | ---
lead_prepare_milliseconds | Lead preparation time
lead_processing_milliseconds | Lead processing time
lead_wait_milliseconds | Total wait time
lead_total_milliseconds | Total lead processing time

Stage processing metrics

Metric | Description
--- | ---
lead_suppress_lists_check_milliseconds | Suppress list check
lead_policy_check_milliseconds | Policy check
lead_static_milliseconds | Static data processing
lead_form_milliseconds | Form processing
lead_relation_milliseconds | Relations processing
lead_query_milliseconds | Query execution
lead_loyalty_milliseconds | Loyalty data processing
lead_loyalty_program_milliseconds | Loyalty program processing
lead_site_milliseconds | Site data processing
lead_json_milliseconds | JSON processing
lead_render_milliseconds | Content rendering
lead_links_milliseconds | Link generation
lead_sends_milliseconds | Message sending

Stage processing metrics are used to find bottlenecks during mailing execution.

Stage wait metrics

Metric | Description
--- | ---
lead_suppress_lists_check_wait_milliseconds | Wait time for suppress list check
lead_policy_check_wait_milliseconds | Wait time for policy check
lead_static_wait_milliseconds | Wait time for static data processing
lead_form_wait_milliseconds | Wait time for form processing
lead_relation_wait_milliseconds | Wait time for Relations processing
lead_query_wait_milliseconds | Wait time for query execution
lead_loyalty_wait_milliseconds | Wait time for loyalty data
lead_loyalty_program_wait_milliseconds | Wait time for loyalty programs
lead_site_wait_milliseconds | Wait time for site data
lead_json_wait_milliseconds | Wait time for JSON processing
lead_render_wait_milliseconds | Wait time for rendering
lead_links_wait_milliseconds | Wait time for link generation

An increase in wait metrics typically indicates insufficient resources, locks, or overloaded dependent services.

Stage error metrics

Metric | Description
--- | ---
lead_suppress_list_check_failure_count | Suppress list check errors
lead_policy_check_failure_count | Policy check errors
lead_static_failure_count | Static data processing errors
lead_form_failure_count | Form processing errors
lead_relation_failure_count | Relations processing errors
lead_query_failure_count | Query execution errors
lead_loyalty_failure_count | Loyalty data processing errors
lead_loyalty_program_failure_count | Loyalty program processing errors
lead_site_failure_count | Site data processing errors
lead_json_failure_count | JSON processing errors
lead_render_failure_count | Rendering errors
lead_links_failure_count | Link generation errors
lead_sends_failure_count | Message sending errors

An increase in failure metrics indicates errors at specific mailing processing stages and can be used to configure alerting rules in Prometheus and Grafana.

Monitoring verification

After configuration, verify that Prometheus receives metrics.

For pull metrics, run the following requests:

curl http://10.200.5.25:8911/metrics
curl http://10.200.5.25:8912/metrics

For push metrics, check the Pushgateway:

curl http://10.200.5.20:9091/metrics

Also check the target status in the Prometheus UI. The procworkflow, proctrigger, and pushgateway jobs should show the UP status.
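The target status can also be read from the Prometheus HTTP API, which reports each active target's health at /api/v1/targets. The Python sketch below lists the expected jobs that are missing or not healthy. The Prometheus address in the usage comment is an assumption (this page does not say where Prometheus runs), and the helper names are illustrative.

```python
import json
import urllib.request

EXPECTED_JOBS = ("procworkflow", "proctrigger", "pushgateway")


def fetch_targets(prometheus_url: str, timeout: float = 5.0) -> dict:
    """Fetch the target list from the Prometheus HTTP API."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def jobs_not_up(targets_payload: dict, expected=EXPECTED_JOBS) -> list:
    """Return expected jobs that are absent or have an unhealthy target."""
    up_jobs, unhealthy_jobs = set(), set()
    for target in targets_payload.get("data", {}).get("activeTargets", []):
        job = target.get("labels", {}).get("job", "")
        # The API reports health as "up", "down", or "unknown".
        if target.get("health") == "up":
            up_jobs.add(job)
        else:
            unhealthy_jobs.add(job)
    return [job for job in expected
            if job not in up_jobs or job in unhealthy_jobs]


# Usage (hypothetical Prometheus address):
# print(jobs_not_up(fetch_targets("http://127.0.0.1:9090")))
```

An empty list means that all three jobs have at least one target and every target of those jobs is healthy.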

Common issues

Issue | Possible cause | What to check
--- | --- | ---
Prometheus does not receive pull metrics | Process listens only on the local interface | Check WF_METRIC_HOST and PROC_TRIGGER_METRIC_HOST values
/metrics endpoint is unavailable | Metrics service did not start | Check the port with netstat -tlpn
Metrics service does not start | Port is occupied by another process | Specify a free port in the 1024–9999 range
No campaign metrics in Pushgateway | Mailing has not been launched yet | Launch the mailing and re-check
All campaign metrics fall into a single group | Mailing ID grouping is disabled | Check CAMPAIGN_ID_PROMETHEUS_GROUPING_ENABLE

Last updated on Jun 4, 2026
© 2015 - 2026 Altcraft, LLC. All rights reserved.