I was trying to work out what was going on with my instance, and @firstname.lastname@example.org helped me figure it out.
One key thing as an admin is heading into your PostgreSQL database and looking at the tasks. The task table often holds good information about what the system is doing and why.
First, I logged into postgresql. I just started with:
Then I connected to my database using:
\c lotide [username] 127.0.0.1 [port]
The default port is 5432. For reasons known only to the Illuminati, my database is running on 5433.
It will ask you for your password, then you'll be logged in.
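If you would rather connect straight from the shell than through \c, a small wrapper function can save some typing. This is only a sketch: the function name lotide_psql and the LOTIDE_* variables are made up for illustration, and the role name is a placeholder you would replace with your own. The flags themselves (-h, -p, -U, -d) are standard psql options.

```shell
# Assumed connection settings; adjust to your own instance.
LOTIDE_DB="${LOTIDE_DB:-lotide}"
LOTIDE_DB_USER="${LOTIDE_DB_USER:-lotide}"    # hypothetical role name
LOTIDE_DB_HOST="${LOTIDE_DB_HOST:-127.0.0.1}"
LOTIDE_DB_PORT="${LOTIDE_DB_PORT:-5433}"      # PostgreSQL's default is 5432

# Wrapper around psql; any extra arguments (for example -c "SQL") are passed through.
lotide_psql() {
    psql -h "$LOTIDE_DB_HOST" -p "$LOTIDE_DB_PORT" \
         -U "$LOTIDE_DB_USER" -d "$LOTIDE_DB" "$@"
}
```

Calling lotide_psql with no arguments opens an interactive session; lotide_psql -c "SELECT 1" runs a single statement and exits.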
You can see information about the tables in the database by typing
\d. You can find information about a specific table by typing \d followed by the table name, for example \d task.
Initially I tried
select * from task; but at first it looked like it hadn't worked, because some of the columns hold a massive number of characters, far more than fit on my screen, so I was just scrolling and scrolling. When the output is too big for the screen, psql hands it to a pager that you can move around in with the arrow keys. In that case, the prompt changes to a : and you need to press q to leave the pager and return to the normal SQL prompt.
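One way around the wall of text is to switch the pager off and cut the long column down to a preview. The sketch below assumes latest_error is a text column (I have not checked the schema) and uses a made-up function name, preview_tasks. The psql options -x (expanded display, one column per line) and -P pager=off are real; left() is a standard PostgreSQL string function.

```shell
# Show only the first 80 characters of latest_error so each row stays readable.
# Column names are the ones used in the queries in this post.
TASK_PREVIEW_SQL="SELECT id, state, kind, left(latest_error, 80) AS error_preview FROM task ORDER BY created_at DESC LIMIT 20;"

# Connection flags such as -h, -p and -U are passed in by the caller.
preview_tasks() {
    psql "$@" -d lotide -x -P pager=off -c "$TASK_PREVIEW_SQL"
}
```

For example, preview_tasks -h 127.0.0.1 -p 5433 -U [username] prints the 20 newest tasks without dropping into the pager. Inside an interactive session, \x and \pset pager off do the same job.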
vpzom suggested I try
SELECT id, state, latest_error FROM task ORDER BY created_at DESC; which showed all the pending and completed requests.
Ultimately, I also tried
SELECT id, state, kind, latest_error, attempts, max_attempts FROM task WHERE state='pending'; which helped me understand not just that there were pending tasks, but in some cases why they were pending. After I resolved the issue, I could still see three tasks that were pending, but the error message indicated it was due to the remote node misbehaving, so there wasn't anything I could do about that.
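Before reading individual rows, it can help to get a count of tasks per state and kind, so you can see at a glance whether things are piling up. A minimal sketch, using only the state and kind columns from the query above; the function name task_summary is made up for illustration.

```shell
# Count tasks grouped by state and kind.
TASK_SUMMARY_SQL="SELECT state, kind, count(*) AS tasks FROM task GROUP BY state, kind ORDER BY state, kind;"

# Connection flags such as -h, -p and -U are passed in by the caller.
task_summary() {
    psql "$@" -d lotide -c "$TASK_SUMMARY_SQL"
}
```

Running task_summary -h 127.0.0.1 -p 5433 -U [username] before and after a fix gives a quick way to check whether the pending count is actually going down.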
One final thing: when I was looking at error messages, I started in the system log using
journalctl -xe. The only error messages I saw there were ones thrown by systemd. In order to see the error messages thrown by lotide itself, I had to use
systemctl status lotide which seems to show the messages that would otherwise be displayed on the command line.
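systemctl status only shows the last few lines. If the service writes to the systemd journal, which is an assumption about this setup, journalctl can filter on the unit and show more history. The function name lotide_logs is made up; -u, -n and --no-pager are standard journalctl options.

```shell
# Print the most recent journal entries for the lotide unit.
# The unit name "lotide" matches the systemctl command used in this post.
lotide_logs() {
    lines="${1:-50}"    # number of entries to show, 50 if not given
    journalctl -u lotide -n "$lines" --no-pager
}
```

lotide_logs 200 shows the last 200 entries. To watch messages as they arrive, journalctl -u lotide -f follows the log live.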
In my case, the backend didn't seem to be sending any messages or even trying. The attempts field for the pending tasks said 0, whereas the completed tasks all showed at least 1. In the systemctl output I noticed that, especially after a reboot, systemd seemed to be constantly trying to restart lotide, but lotide was complaining that it was already running or that there was some other problem. This suggested to me that a lotide process was still running in the background and hadn't exited correctly. I added a killall lotide command to the script I use to start the backend, and the tasks started running immediately.
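For anyone who wants to do something similar, here is a rough sketch of a start script that clears out a leftover process first. It is not my exact script: the binary path is hypothetical, the function names are made up, and the 10-second wait is an arbitrary choice.

```shell
# Hypothetical path to the lotide binary; change it to wherever yours lives.
LOTIDE_BIN="${LOTIDE_BIN:-/usr/local/bin/lotide}"

# Stop any leftover lotide process, then wait until it is really gone.
stop_stale_lotide() {
    if pgrep -x lotide >/dev/null 2>&1; then
        killall lotide
        # Give the old process up to 10 seconds to exit.
        tries=0
        while pgrep -x lotide >/dev/null 2>&1 && [ "$tries" -lt 10 ]; do
            sleep 1
            tries=$((tries + 1))
        done
    fi
}

# Clean up first, then replace this shell with the backend process.
start_lotide() {
    stop_stale_lotide
    exec "$LOTIDE_BIN"
}
```

The script would end with a call to start_lotide. Waiting for the old process to go away matters because starting the new one too early could hit the same "already running" complaint.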
As an additional prophylactic measure, because I'm not a full-time system administrator and can't tinker with my servers when I'm away from my computer, I added a line to my crontab to restart the lotide service every day so it won't hang. It might be overkill, but it's a simple solution that doesn't seem to have any real downsides. The restart takes seconds and the site doesn't go down if federation restarts, so it's low risk for the potential reward of not having any problems while I'm away.
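The crontab line can also be added from a script. This is a sketch under a few assumptions: the 04:00 run time is arbitrary, /usr/bin/systemctl is where systemctl lives on your system, and the crontab belongs to a user allowed to restart the service (usually root). The function name add_restart_job is made up.

```shell
# Crontab fields: minute hour day-of-month month day-of-week command.
CRON_LINE='0 4 * * * /usr/bin/systemctl restart lotide'

# Rewrite the current crontab with the restart line added exactly once.
add_restart_job() {
    {
        # Keep existing entries, dropping any earlier copy of the same line.
        crontab -l 2>/dev/null | grep -vF "$CRON_LINE" || true
        printf '%s\n' "$CRON_LINE"
    } | crontab -
}
```

Running add_restart_job twice leaves a single copy of the line, and crontab -l shows the result.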