[napkin sketch]
- Create 1-N test bot accounts that follow different subsets of each other. Have them post predictable content on a regular schedule (a posting-bot sketch follows this list).
- Monitor the bot posting software. Have it report errors and latency, and store the ID of each post when it's generated.
- Write a client that, authenticated as each bot, fetches various permutations of the bot's home timeline with/without min_id/max_id/limit set. Because each bot's posts are predictable, you can predict the timeline contents for each query using the status IDs stored earlier. Have the client report good/bad results and latency to the monitoring system (a verification sketch also follows this list).
- Alert on errors, and if latency exceeds some threshold.
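A minimal sketch of the posting bot, assuming a Mastodon-compatible server and the Python `requests` library. `SERVER_URL`, `BOT_TOKEN`, and `posts.db` are illustrative names rather than anything from the notes above, and error/latency reporting is reduced to prints where you'd hook in your real monitoring system.

```python
# Sketch of a test bot that posts predictable content and records each post's ID.
import sqlite3
import time

import requests

SERVER_URL = "https://social.example.com"   # assumption: your instance's base URL
BOT_TOKEN = "..."                            # assumption: this bot's access token
DB_PATH = "posts.db"                         # assumption: local store for post IDs


def post_status(text: str) -> None:
    """Post a status via the Mastodon API and record its ID, latency, and any error."""
    started = time.monotonic()
    try:
        resp = requests.post(
            f"{SERVER_URL}/api/v1/statuses",
            headers={"Authorization": f"Bearer {BOT_TOKEN}"},
            data={"status": text},
            timeout=10,
        )
        resp.raise_for_status()
        status_id = resp.json()["id"]
    except requests.RequestException as exc:
        print(f"ERROR posting: {exc}")        # report to your monitoring system instead
        return
    latency = time.monotonic() - started

    with sqlite3.connect(DB_PATH) as db:
        db.execute(
            "CREATE TABLE IF NOT EXISTS posts (id TEXT PRIMARY KEY, text TEXT, posted_at REAL)"
        )
        db.execute("INSERT INTO posts VALUES (?, ?, ?)", (status_id, text, time.time()))
    print(f"posted {status_id} in {latency:.3f}s")   # latency also goes to monitoring


if __name__ == "__main__":
    # Predictable content: a sequence number derived from the clock, so the
    # verification client can later predict exactly what should be on the timeline.
    seq = int(time.time()) // 600             # e.g. one post every 10 minutes
    post_status(f"beacon {seq}")
```

Run it from cron (or similar) on the posting schedule; the sequence number in the status text is what makes the content predictable.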
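And a matching sketch of the verification client, with the same caveats: it assumes the reader bot's home timeline should contain exactly the posts recorded in `posts.db` (its own plus those of the bots it follows), that status IDs are numeric as Mastodon's are, and that `READER_TOKEN` is an illustrative access token for that bot.

```python
# Sketch of a client that fetches the home timeline and checks it against predictions.
import sqlite3
import time

import requests

SERVER_URL = "https://social.example.com"    # assumption: your instance's base URL
READER_TOKEN = "..."                          # assumption: access token for a reader bot
DB_PATH = "posts.db"                          # written by the posting bot above


def expected_ids(limit: int, max_id: str | None = None, min_id: str | None = None) -> list[str]:
    """Predict one page of the home timeline from the stored post IDs.

    Mastodon returns statuses newest-first; max_id selects the page older than
    that ID, while min_id selects the page immediately newer than that ID.
    """
    with sqlite3.connect(DB_PATH) as db:
        ids = [row[0] for row in db.execute("SELECT id FROM posts")]
    ids.sort(key=int, reverse=True)                    # newest first
    if max_id is not None:
        ids = [i for i in ids if int(i) < int(max_id)]
    if min_id is not None:
        ids = [i for i in ids if int(i) > int(min_id)]
        return ids[-limit:]                            # the page adjacent to min_id
    return ids[:limit]


def check_timeline(limit: int = 20, max_id: str | None = None, min_id: str | None = None) -> bool:
    """Fetch the home timeline with the given parameters and compare against the prediction."""
    params: dict[str, object] = {"limit": limit}
    if max_id is not None:
        params["max_id"] = max_id
    if min_id is not None:
        params["min_id"] = min_id
    started = time.monotonic()
    resp = requests.get(
        f"{SERVER_URL}/api/v1/timelines/home",
        headers={"Authorization": f"Bearer {READER_TOKEN}"},
        params=params,
        timeout=10,
    )
    latency = time.monotonic() - started
    resp.raise_for_status()
    got = [status["id"] for status in resp.json()]
    ok = got == expected_ids(limit, max_id, min_id)
    # Report the good/bad result and latency to your monitoring system here.
    print(f"ok={ok} latency={latency:.3f}s params={params}")
    return ok


if __name__ == "__main__":
    check_timeline(limit=5)
```

Push the ok/latency results into whatever you already alert from, and page when a check fails or latency crosses your threshold.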
Monitoring logs for errors ultimately doesn't work on its own, because you will only ever catch problems you already know to look for, although it can be helpful as a stop-gap.
The approach above tells you about the bad behaviour your users are actually experiencing, irrespective of its cause.
Bonus: have bot accounts on multiple servers follow each other and do the same tests. Now you know if federation is working.
Bonus: call the bots mysterious names and make the content look weird (e.g. numbers stations) and drive conspiracy theorists crazy, cf. https://en.wikipedia.org/wiki/Webdriver_Torso
Bonus: make those API calls through your proxy layer, and now you know if that's working.
Bonus: run the bots and verification software in different world locations, and now you have an idea of the latency seen by your users in different parts of the world.