A Factom server monitoring bot for Authority Node Operators
The Factom M3 governance rollout began in May 2018. While the network finds its balance and the code stabilizes in these early days, Authority Node Operators (ANOs) must be on call 24 hours a day, seven days a week, ready to respond at a moment’s notice.
A network outage is detrimental to all participants in the ecosystem, and it is the responsibility of these operators to minimize server downtime to the best of their abilities, even if it means a 3am wake-up call (squeal). As challenging as this sounds, it’s what the ANOs signed up for, and they will do what it takes for the Factom protocol.
To help manage this, one ANO, The Factoid Authority, has been working on a solution to minimize downtime and maximize sleep. Today it officially launched TFA-Bot, which is now published on git for community use.
What does TFA-Bot do?
TFA-Bot continuously monitors factomd nodes and triggers an alert when it detects a problem. The bot picks up on unresponsive nodes, node stalls, network stalls and changes in latency. When an alert is triggered, admins are notified through Discord and by phone.
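To make the four alert types above concrete, here is a minimal Python sketch of how poll results might be turned into alerts. It is not TFA-Bot's actual code: the names NodeState, record_poll and find_alerts, the 15-minute stall threshold and the 500 ms latency limit are all assumptions made for illustration, and "changes in latency" is simplified to a fixed upper limit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeState:
    """Rolling state for one monitored node (all names are hypothetical)."""
    responsive: bool = True
    last_height: int = -1        # highest block height seen so far
    last_advance: float = 0.0    # time (seconds) when the height last increased
    latency_ms: float = 0.0      # round-trip time of the latest successful poll


def record_poll(state: NodeState, now: float,
                height: Optional[int], latency_ms: Optional[float]) -> None:
    """Fold one poll result into the state; height=None means no reply."""
    if height is None:
        state.responsive = False
        return
    state.responsive = True
    if latency_ms is not None:
        state.latency_ms = latency_ms
    if height > state.last_height:
        state.last_height = height
        state.last_advance = now


def find_alerts(states: Dict[str, NodeState], now: float,
                stall_after_s: float = 900.0,      # assumed threshold
                latency_limit_ms: float = 500.0    # assumed threshold
                ) -> Dict[str, str]:
    """Return {node name: alert reason} for every node that needs attention."""
    alerts = {name: "unresponsive"
              for name, s in states.items() if not s.responsive}
    responsive = {name for name, s in states.items() if s.responsive}
    stalled = {name for name in responsive
               if now - states[name].last_advance > stall_after_s}
    # If every responsive node has stopped advancing, blame the network
    # rather than any single node.
    reason = "network stall" if stalled and stalled == responsive else "node stall"
    for name in stalled:
        alerts[name] = reason
    for name in responsive - stalled:
        if states[name].latency_ms > latency_limit_ms:
            alerts[name] = "high latency"
    return alerts


states = {"node-a": NodeState(), "node-b": NodeState()}
for name in states:
    record_poll(states[name], now=0.0, height=100, latency_ms=40.0)
record_poll(states["node-a"], now=1000.0, height=102, latency_ms=45.0)
record_poll(states["node-b"], now=1000.0, height=100, latency_ms=42.0)
print(find_alerts(states, now=1000.0))  # {'node-b': 'node stall'}
```

Comparing each node against the others is what separates a node stall from a network stall in this sketch: a single node falling behind points at that server, while every node stopping at once points at the network.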
Who can use it?
It can monitor any factomd node on the mainnet or testnet, so it will be particularly useful for Authority Node Operators who need to be on continuous call.
How does it work?
TFA-Bot runs inside a Docker container. Users configure a Google Sheet to their specifications, listing their servers and admin information such as Discord IDs, phone numbers and time zones. Using the information in the spreadsheet, TFA-Bot applies logic to notify admins based on experience and time zone. Because it is configurable, it has the flexibility to meet the needs of virtually any ANO model. It can also determine who is on call at any given time, which is handy for ANOs distributed over several time zones.
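As an illustration of the on-call idea described above, the following Python sketch works out which admins are on duty at a given moment from their time zones. It is an assumption-laden example rather than TFA-Bot's real logic: the Admin fields, the shift windows and the use of fixed UTC offsets (which ignore daylight saving) are all hypothetical stand-ins for whatever the spreadsheet actually holds.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone
from typing import List


@dataclass
class Admin:
    """One spreadsheet row; every field name here is hypothetical."""
    name: str
    utc_offset_hours: float   # e.g. 10 for UTC+10, -4 for UTC-4
    shift_start: time         # start of the on-call window, local time
    shift_end: time           # end of the on-call window, local time


def is_on_call(admin: Admin, now_utc: datetime) -> bool:
    """True if the admin's local time falls inside their on-call window."""
    local_tz = timezone(timedelta(hours=admin.utc_offset_hours))
    local = now_utc.astimezone(local_tz).time()
    if admin.shift_start <= admin.shift_end:
        return admin.shift_start <= local < admin.shift_end
    # The window wraps past midnight, e.g. 22:00 to 06:00.
    return local >= admin.shift_start or local < admin.shift_end


def on_call(admins: List[Admin], now_utc: datetime) -> List[str]:
    """Names of all admins who are on call at the given UTC time."""
    return [a.name for a in admins if is_on_call(a, now_utc)]


roster = [
    Admin("alice", 10, time(8, 0), time(20, 0)),
    Admin("bob", -4, time(8, 0), time(20, 0)),
]
print(on_call(roster, datetime(2018, 7, 1, 2, 0, tzinfo=timezone.utc)))   # ['alice']
print(on_call(roster, datetime(2018, 7, 1, 14, 0, tzinfo=timezone.utc)))  # ['bob']
```

With two admins fourteen hours apart, each covers the other's night, which is the kind of rostering a geographically distributed ANO would want from the configuration.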
Users can interact with TFA-Bot through Discord, aided by a simple list of commands. For example:
“help” brings up the full list of commands
“nodes” brings up the servers being monitored
“users” shows who’s on duty
“alarm” gives the associated alarm controls
“We’re really happy with how the TFA-Bot has come together. It’s been a useful tool for The Factoid Authority, and now it’s open-sourced for the benefit of other Operators. This will be especially useful for those ANOs that are distributed geographically, as it offers flexibility to configure time zones and rostering,” said Stuart Johnson, TFA.
The M3 rollout in May and June has begun the build-out to what will eventually be 65 Authority Nodes for the Factom protocol, bringing true decentralization to Factom.
For further information on TFA-Bot, please contact The Factoid Authority by email.