Monitoring Substrate node (Polkadot, Kusama, parachains) — Validator guide
Edit 2023: this article is now outdated and I am no longer maintaining this dashboard. For proper monitoring, I strongly suggest referring to the new monitoring stack article here, with a brand-new dashboard that is much more relevant than this one.

The Polkadot ecosystem has attracted incredible attention in the past few months, and this is just the beginning. Many individuals joined the adventure and set up their own validator node, which is extraordinary for decentralization. However, maintaining a validator node is a huge responsibility, as you are basically securing millions of dollars on the blockchain. The health and security of your node have to be your top priority as a validator.
This guide provides helpful monitoring and alerting content for validators. The examples use the Plasm Network node, but most of the configuration is exactly the same for a Polkadot, Kusama, or any other Substrate-based node (including parachains).
However, monitoring is of course not everything; security also has to be considered very carefully. You should review our secure SSH guide for implementing basic security on your connection.
Here are the different steps we will walk through:
- General understanding
- Set SSH tunneling
- Installation
- Configuration
- Setting services
- Test Alert Manager
- Run Grafana dashboard
- Conclusion
You can also find extra sections at the end of this tutorial: Advanced usage of Prometheus, Useful tips, and Troubleshooting.
For more convenience, all parameters you should change with your own values, or that may change over time, are identified in bold inside code blocks:
Example of a value you should change
This guide was created using Ubuntu 20.04 LTS with a Plasm node on the server side and Debian 10 Buster on the client side.
General understanding
Here is how our final configuration will look at the end of this guide.

- Prometheus is the central module; it pulls metrics from different sources to provide them to the Grafana dashboard and Alert Manager.
- Grafana is the visual dashboard tool that we access from the outside (through SSH tunnel to keep the node secure).
- Alert Manager listens to Prometheus metrics and pushes an alert as soon as a threshold is crossed (CPU % usage for example).
- Your Substrate node (any Polkadot based blockchain) natively provides metrics for monitoring.
- Node exporter provides hardware metrics for the dashboard.
- Process exporter provides processes metrics for the dashboard (optional).
Since you are running a production node, it is very important not to expose open ports to the outside (especially an HTTP port). A secure way to avoid that is to set up SSH tunneling; let’s start with that.
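As a quick sanity check you can run at any point in this guide, list the listening TCP ports on the node and verify that nothing unexpected is reachable from a public interface (this uses the standard ss tool from iproute2, nothing specific to this setup):
sudo ss -tlnp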
Set SSH tunneling
Grafana runs an HTTP server on your node, so we shouldn’t access it directly from the outside.
SSH tunneling is considered a safe way to make traffic transit from your node to your local computer (or even your phone). The principle is to make the SSH client listen on a specific port on your local machine, encrypt the traffic with the SSH protocol, and forward it to the target port on your node.
Of course, you could also configure Grafana to run an HTTPS server, but we do not want to expose another open port. Since our data will be encrypted with SSH, we do not need HTTPS.
Once we have finished installing Grafana on our node, we will access it through this address on our local machine: http://localhost:2022
If you are using Putty to connect, jump directly to this part.
OpenSSH
When using OpenSSH on your client machine, the arguments look like this:
-L 2022:localhost:3000
- -L is for local port forwarding.
- 2022 is the local port we arbitrarily chose (please use a different unused local port in the range 1024–49151).
- 3000 is Grafana’s port.
Assuming you already followed our article on securing SSH access, you will connect to the node with your private key; the full command will look like this:
ssh -i ~/.ssh/id_ed25519 <user@server ip> -p 2021 -L 2022:localhost:3000
- id_ed25519 is our local private key.
- 2021 is the custom SSH port we configured to connect to our node.
Great, now once we have finished installing Grafana on our node, we will access it through this address on our local machine:
http://localhost:2022
Automating OpenSSH connection
Remembering all parameters of the ssh command can really be a pain, especially if you have several nodes to maintain.
So, let’s just create a little config file with our parameters:
touch ~/.ssh/config
nano ~/.ssh/config
In this file, we will add the parameters for our node, including port forwarding:
Host node
  HostName 66.66.66.66
  Port 2021
  User bld
  IdentityFile ~/.ssh/id_ed25519
  LocalForward 2022 localhost:3000
  ServerAliveInterval 120
- HostName is your node IP address.
- Port 2021 is the SSH port we use to connect to our node (we changed and closed the default port 22, remember?).
- We also add a keep-alive parameter every 2 minutes to keep the session active.
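With this config file in place, the whole connection command above collapses to the following (OpenSSH reads ~/.ssh/config automatically and applies the port forwarding for you):
ssh node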
Putty
As Putty is a very popular client usable on many operating systems, here is where you can configure local port forwarding. You do not need this part if you use OpenSSH to connect.

Inside the SSH > Tunnels menu, just add the local port and destination, then click Add.
- 2022 is the local port we arbitrarily chose (please use a different unused local port in the range 1024–49151).
- 3000 is Grafana’s port.
Don’t forget to save the session.
Installation
To save your precious time, we added && after each line of long blocks. Of course, you can remove those and copy/paste lines one by one if you like to suffer :-)
Let’s start with the prerequisites:
sudo apt update && sudo apt upgrade
sudo apt install -y adduser libfontconfig1
Download the latest releases. Please check the Prometheus, Process exporter and Grafana download pages for the current versions.
wget https://github.com/prometheus/prometheus/releases/download/v2.32.0/prometheus-2.32.0.linux-amd64.tar.gz &&
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz &&
wget https://github.com/ncabatoff/process-exporter/releases/download/v0.7.10/process-exporter-0.7.10.linux-amd64.tar.gz &&
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz &&
wget https://dl.grafana.com/oss/release/grafana_8.3.3_amd64.deb
Extract the downloaded files:
tar xvf prometheus-*.tar.gz &&
tar xvf node_exporter-*.tar.gz &&
tar xvf process-exporter-*.tar.gz &&
tar xvf alertmanager-*.tar.gz &&
sudo dpkg -i grafana*.deb
Create the /etc/prometheus directory and copy the extracted files into /usr/local/bin:
sudo mkdir /etc/prometheus &&
sudo cp ./prometheus-*.linux-amd64/prometheus /usr/local/bin/ &&
sudo cp ./prometheus-*.linux-amd64/promtool /usr/local/bin/ &&
sudo cp -r ./prometheus-*.linux-amd64/consoles /etc/prometheus &&
sudo cp -r ./prometheus-*.linux-amd64/console_libraries /etc/prometheus &&
sudo cp ./node_exporter-*.linux-amd64/node_exporter /usr/local/bin/ &&
sudo cp ./process-exporter-*.linux-amd64/process-exporter /usr/local/bin/ &&
sudo cp ./alertmanager-*.linux-amd64/alertmanager /usr/local/bin/ &&
sudo cp ./alertmanager-*.linux-amd64/amtool /usr/local/bin/
Install the Alert manager plugin for Grafana:
sudo grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource
Create dedicated users:
sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus &&
sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter &&
sudo useradd --no-create-home --shell /usr/sbin/nologin process-exporter &&
sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager
Create directories for Prometheus, Process exporter and Alert manager:
sudo mkdir /var/lib/prometheus &&
sudo mkdir /etc/process-exporter &&
sudo mkdir /etc/alertmanager &&
sudo mkdir /var/lib/alertmanager
Change the ownership for all directories:
sudo chown prometheus:prometheus /etc/prometheus/ -R &&
sudo chown prometheus:prometheus /var/lib/prometheus/ -R &&
sudo chown prometheus:prometheus /usr/local/bin/prometheus &&
sudo chown prometheus:prometheus /usr/local/bin/promtool &&
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter &&
sudo chown process-exporter:process-exporter /etc/process-exporter -R &&
sudo chown process-exporter:process-exporter /usr/local/bin/process-exporter &&
sudo chown alertmanager:alertmanager /etc/alertmanager/ -R &&
sudo chown alertmanager:alertmanager /var/lib/alertmanager/ -R &&
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager &&
sudo chown alertmanager:alertmanager /usr/local/bin/amtool
Finally, clean up the download directory:
rm -rf ./prometheus* &&
rm -rf ./node_exporter* &&
rm -rf ./process-exporter* &&
rm -rf ./alertmanager* &&
rm -rf ./grafana*
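If you want to double-check that the binaries landed in your PATH, you can ask them for their version; the --version flag is supported by the Prometheus-family binaries (we leave process-exporter aside here, as its flags may differ):
prometheus --version &&
promtool --version &&
node_exporter --version &&
alertmanager --version &&
amtool --version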
Exhausting, right? We grouped all the installation steps once and for all; now let’s have some fun with configuration.
Configuration
Prometheus
Let’s edit the Prometheus config file and add all the modules in it:
sudo nano /etc/prometheus/prometheus.yml
Add the following code to the file and save:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "substrate_node"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9615"]
  - job_name: "node_exporter"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: "process-exporter"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9256"]
- scrape_interval defines how often Prometheus scrapes targets, while evaluation_interval controls how often the software evaluates the rules.
- rule_files sets the location of the Alert manager rules we will add next.
- alerting contains the Alert manager target.
- scrape_configs contains the services Prometheus will monitor.
You can notice the first scrape job, where Prometheus monitors itself.
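Later, once the node and the exporters are running (we set up their services below), you can verify that each scrape target answers on its /metrics endpoint, for example:
curl -s localhost:9615/metrics | head
curl -s localhost:9100/metrics | head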
Alert rules
Let’s create the rules.yml file that will contain the rules for Alert manager:
sudo touch /etc/prometheus/rules.yml
sudo nano /etc/prometheus/rules.yml
We are going to create 2 basic rules that will trigger an alert in case the instance is down or the CPU usage crosses 80%. Add the following lines and save the file:
groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 5 minutes."
      - alert: HostHighCpuLoad
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host high CPU load (instance bLd Kusama)
          description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
The criteria for triggering an alert are set in the expr: part. To create your own alerts, you will have to learn and test the different variables provided to Prometheus by the services we are setting up. There is an (almost) infinite number of possibilities to personalize your alerts.
As this part can be time-consuming to learn and build, we have shared a list of alerts we like to use. Please feel free to reach out to us if you have an interesting one you would like to add.
You should also have a look at alerts provided by Parity.
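As an illustration, here is one extra rule we could append under the rules: section, in the same format as above: a low disk space warning built on node_exporter’s filesystem metrics (the 10% threshold and the / mountpoint are arbitrary choices you should adapt to your server):
      - alert: HostLowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host low disk space (instance {{ $labels.instance }})
          description: "Free disk space is < 10%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"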
Then, check the rules file:
promtool check rules /etc/prometheus/rules.yml
And finally, check the Prometheus config file:
promtool check config /etc/prometheus/prometheus.yml
Process exporter
Process exporter needs a little config file to tell it which processes it should take into account:
sudo touch /etc/process-exporter/config.yml
sudo nano /etc/process-exporter/config.yml
Add the following code to the file and save:
process_names:
  - name: "{{.Comm}}"
    cmdline:
      - '.+'
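This catch-all configuration groups every process by its command name ({{.Comm}}) and matches any command line. Once the process-exporter service is running (we create it below), you can confirm metrics are exported; this exporter prefixes its metric names with namedprocess_:
curl -s localhost:9256/metrics | grep namedprocess | head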
Gmail setup
We will use a Gmail address to send the alert emails. For that, we will need to generate an app password from our Gmail account.
Note: we recommend using a dedicated email address for your alerts.
Google has this bad habit of changing its interface pretty often, so instead of giving you detailed steps here and making this guide outdated next month, we are cowardly sending you to the Gmail app password procedure page :-)

The result will look like this (sorry for the French screenshot, I was a little lazy to change my whole account language setting). Copy the password and save it for later.
Alert manager
The Alert manager config file is used to set the external service that will be called when an alert is triggered. Here, we are going to use the Gmail notification created previously.
Let’s create the file:
sudo touch /etc/alertmanager/alertmanager.yml
sudo nano /etc/alertmanager/alertmanager.yml
And add the Gmail configuration to it and save the file:
global:
  resolve_timeout: 1m

route:
  receiver: 'gmail-notifications'

receivers:
  - name: 'gmail-notifications'
    email_configs:
      - to: 'mydedicatednodealertaddress@protonmail.com'
        from: 'bldnodes@gmail.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'bldnodes@gmail.com'
        auth_identity: 'bldnodes@gmail.com'
        auth_password: 'yrymyemufalyjing'
        send_resolved: true
Of course, you have to change the email addresses and the auth_password with the one generated from Google previously (you didn’t seriously think we were going to leave this password active, right? :-))
Here you can notice we use a different address between sender and receiver, this is actually a little useful trick that lets you install a dedicated application (Protonmail is very cool) to receive alerts even on the phone. This way, you will know in the minute if something goes wrong with your node without being disturbed by other emails!
Note: the email notification is just an example; you can push notifications to many different services! Have a search with DuckDuckGo (enough with big G here), you will love what you can find.
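Before wiring this configuration into a service, you can validate the file with amtool, which we installed alongside Alert manager:
amtool check-config /etc/alertmanager/alertmanager.yml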
Setting services
Starting all programs manually is such a pain, especially with all we have here. So we are going to take a few minutes to create the systemd services.
Creating those services will allow a fully automated process that you will never have to do again in case your node reboots.
I know, it’s tedious to create them one by one, but after all, you’re maintaining a node, you can’t be a lazy person :-)
Node
Let’s start simple: in case you didn’t set up a service for your node yet, this is highly recommended.
Warning: you should not do this if your node is actively validating, unless you know exactly what you are doing. This will cause a chain resync because we change the chain storage directory.
Note: once again, this guide was made with a Plasm node. If you use Polkadot, Kusama or any other Substrate node, you just have to adapt the bold values for this one.
Create a dedicated user for the node binary and copy it to /usr/local/bin:
sudo useradd --no-create-home --shell /usr/sbin/nologin plasm &&
sudo cp ./plasm /usr/local/bin &&
sudo chown plasm:plasm /usr/local/bin/plasm
Create a dedicated directory for the chain storage data:
sudo mkdir /var/lib/plasm &&
sudo chown plasm:plasm /var/lib/plasm
Create and open the node service file:
sudo touch /etc/systemd/system/plasm.service &&
sudo nano /etc/systemd/system/plasm.service
Add the lines matching your node configuration:
[Unit]
Description=Plasm Validator
[Service]
User=plasm
Group=plasm
ExecStart=/usr/local/bin/plasm \
--validator \
--rpc-cors all \
--name <Your Validator Name> \
--base-path /var/lib/plasm
Restart=always
RestartSec=120
[Install]
WantedBy=multi-user.target
Reload daemon, start and check the service:
sudo systemctl daemon-reload
sudo systemctl start plasm.service
sudo systemctl status plasm.service
If everything is working fine, activate the service:
sudo systemctl enable plasm.service
In case of trouble, check the service log with:
journalctl -f -u plasm -n100
Note: if your chain was previously synced somewhere else, purge it:
/usr/local/bin/plasm purge-chain
For the next ones, we are going to do them all in a row.
Prometheus
Create and open the Prometheus service file:
sudo touch /etc/systemd/system/prometheus.service &&
sudo nano /etc/systemd/system/prometheus.service
Add the following lines:
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
Node exporter
Create and open the Node exporter service file:
sudo touch /etc/systemd/system/node_exporter.service &&
sudo nano /etc/systemd/system/node_exporter.service
Add the following lines:
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
Process exporter
Create and open the Process exporter service file:
sudo touch /etc/systemd/system/process-exporter.service &&
sudo nano /etc/systemd/system/process-exporter.service
Add the following lines:
[Unit]
Description=Process Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=process-exporter
Group=process-exporter
Type=simple
ExecStart=/usr/local/bin/process-exporter \
--config.path /etc/process-exporter/config.yml
[Install]
WantedBy=multi-user.target
Alert manager
Create and open the Alert manager service file:
sudo touch /etc/systemd/system/alertmanager.service &&
sudo nano /etc/systemd/system/alertmanager.service
Add the following lines:
[Unit]
Description=AlertManager Server Service
Wants=network-online.target
After=network-online.target
[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
--config.file /etc/alertmanager/alertmanager.yml \
--storage.path /var/lib/alertmanager \
--web.external-url=http://localhost:9093 \
--cluster.advertise-address='0.0.0.0:9093'
[Install]
WantedBy=multi-user.target
Wow, that was a bunch of configuration, right? Good news: if you did everything right, we are ready to fire up the engine and test it.
Grafana
Grafana’s service is automatically created during installation of the deb package; you do not need to create it manually.
Launch and activate services
Launch a daemon reload to take the new services into account in systemd:
sudo systemctl daemon-reload
Start the services:
sudo systemctl start prometheus.service &&
sudo systemctl start node_exporter.service &&
sudo systemctl start process-exporter.service &&
sudo systemctl start alertmanager.service &&
sudo systemctl start grafana-server
And check that they are working fine, one by one:
systemctl status prometheus.service
systemctl status node_exporter.service
systemctl status process-exporter.service
systemctl status alertmanager.service
systemctl status grafana-server
A service working fine should look like this:

When everything is okay, activate the services!
sudo systemctl enable prometheus.service &&
sudo systemctl enable node_exporter.service &&
sudo systemctl enable process-exporter.service &&
sudo systemctl enable alertmanager.service &&
sudo systemctl enable grafana-server
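You can also check them all in one shot, since systemctl accepts several unit names at once; each line should print active:
systemctl is-active prometheus node_exporter process-exporter alertmanager grafana-server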
Test Alert manager
Run this command to fire an alert:
curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"Test"}}]' localhost:9093/api/v1/alerts
Check your inbox, you have a surprise:

You will always receive a Firing alert first, then a Resolved notification to indicate the alert isn’t active anymore.
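If you prefer the command line, the same information is available through amtool, pointed at the local Alert manager API we just used:
amtool --alertmanager.url=http://localhost:9093 alert query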
Run Grafana dashboard
Now it is time for the most visual part: the monitoring dashboard.
From the browser on your local machine, connect to the custom port on localhost that we set at the beginning of this guide:
http://localhost:2022

Enter the default user admin and password admin, then change the password.

Add datasources
Open the Settings menu:

Click on Data Sources:

Click on Add data source:

Select Prometheus:

Just fill the URL with http://localhost:9090 and click Save & Test.
Then add a new data source and search for Alert manager

Fill the URL with http://localhost:9093 and click Save & Test.

Now you have your 2 data sources set up like this:

Import the dashboard
Open the New menu:

Click on Import:

Select our favorite dashboard 13840 (we created this one just for you :-)) and click Load:

Select the Prometheus and AlertManager sources and click Import.

In the dashboard selection, make sure you select:
- Chain Metrics: polkadot for a Polkadot/Kusama node, substrate for any other Substrate/parachain node
- Chain Instance Host: localhost:9615 to point to the chain data scraper
- Chain Process Name: the name of your node binary
And here you go, everything is set!

Easy, right? Just remember to save the dashboard once the parameters are set and working.

Note: you can also consider Parity’s dashboards for more advanced monitoring and analysis.
Conclusion
We learned in this tutorial how to set up a full monitoring solution with the most useful modules: Prometheus, Node exporter, Process exporter, Alert manager and Grafana.
There are many great guides all over the web that are much more detailed, but we wanted to provide this one as an ‘all-in-one’ solution.
The dashboard we built is mostly a compilation of different existing ones that we customized, adding features we found interesting for Substrate node validators. If you like it, we would love you to post a review on Grafana’s website and, more importantly, send us your feedback so that we can improve it based on the community’s needs.
- Twitter : @bLdNodes
- Mail : bldnodes@gmail.com
- Matrix : @bld759:matrix.org
Enjoy your node!
Advanced usage of Prometheus
Prometheus gives an incredible number of options, but you need to be familiar with queries to use it.
If you would like to test all tools provided by Prometheus, you will have to forward another local port to the server’s port 9090 (Prometheus service):
-L 2023:localhost:9090
Then access the Prometheus interface from your local port: http://localhost:2023

From here you can test queries, check status, alerts…
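For example, you can paste the CPU expression from our rules.yml into the query box to graph the value before turning it into an alert:
100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100)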
Useful tips
You should always start by checking the service(s):
systemctl status prometheus.service
A service working fine should look like this:

After a change in a service, you always have to launch a daemon reload:
sudo systemctl daemon-reload
You can get a longer history log of your node process by using:
journalctl -f -u <node service name> -n100
- <node service name>: polkadot-validator, plasm… use the name of your node service file (without the .service)
- -n100: the number of lines to display
Troubleshooting
The list below is under construction and will be updated with feedback from the community, so please reach out to share; this will help others!
Port already in use
You could have started a program manually, creating a conflict with your service. If you get this type of error, check the processes using the port (for example here, port 9090 for Prometheus):
sudo lsof -i:9090
If you see something here, you can kill it:
sudo lsof -ti:9090 | sudo xargs kill -9
Cannot listen to port (OpenSSH)
If you didn’t exit properly from a previous connection and start an SSH connection again, you may encounter this message:
channel_setup_fwd_listener_tcpip: cannot listen to port: 2022
In this case, on the client side, close the SSH connection and kill the port forwarding process still running:
lsof -ti:2022 | xargs kill -9