We process a lot of data at bitHound. Currently we analyze thousands upon thousands of commits a day in our distributed backend processes.
As with any project the size of bitHound, things can and will go wrong. We started to experience exceptional cases where our analysis would break: anything from hardware or network problems and invalid package.json data to our chaos monkey scripts randomly killing processes and servers to test our fault-handling code.
We wanted a way to quickly triage and track these failed analysis runs, so we started putting them into a Trello board.
This worked fine, but like any data-driven shop we wanted to automate it. What started as a collection of a few shell scripts grew into a rather fun and neat use of Trello.
One shell script to rule them all
When a repository fails all of its initial retries for any reason, it will automatically add a card to a failed-analysis Trello board in the Backlog list (with trello.sh -p). The card has some initial metadata about where, when, and what was attempted, to allow for developer investigation if needed (metadata.sh | trello.sh -m).
Example metadata to help with diagnosing errors
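The card-filing step can be sketched against Trello's REST API. Here is a minimal, hypothetical version of what trello.sh -p might look like; the function names and the TRELLO_KEY, TRELLO_TOKEN, and BACKLOG_LIST_ID environment variables are our assumptions, not details from the scripts themselves:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of trello.sh -p: file a failed run as a card in the
# Backlog list via Trello's REST API. TRELLO_KEY, TRELLO_TOKEN and
# BACKLOG_LIST_ID are assumed environment variables.
API="https://api.trello.com/1"

card_url() {
  # Card-creation endpoint with auth query parameters appended.
  echo "$API/cards?key=$TRELLO_KEY&token=$TRELLO_TOKEN"
}

push_card() {
  # $1 = card title (e.g. the failing repository), $2 = metadata description.
  curl -s -X POST "$(card_url)" \
    --data-urlencode "idList=$BACKLOG_LIST_ID" \
    --data-urlencode "name=$1" \
    --data-urlencode "desc=$2"
}

# Example usage: push_card "owner/repo" "$(metadata.sh)"
```

Using --data-urlencode keeps multi-line metadata in the card description intact without manual escaping.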
And now let's automate everything!
Any cards that are in the Backlog list are processed again 🎆automatically🎆 every hour (trello.sh -l Backlog | queue.sh). If the repository is successfully processed, the card will be archived with a comment (trello.sh -a). It is not uncommon for a project to end up in the Backlog list and be reprocessed successfully before a human even looks at it.
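The hourly loop above could look something like this. This is a hypothetical reading of trello.sh -l and trello.sh -a: the list, comment, and archive endpoints are Trello's real REST API, but the function names, jq filter, and credential variables are our assumptions:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the hourly retry loop. The /lists/{id}/cards,
# comment, and closed endpoints are Trello's REST API; the rest is an
# assumption for illustration.
API="https://api.trello.com/1"

cards_endpoint() {
  # Endpoint listing every card in list $1.
  echo "$API/lists/$1/cards?key=$TRELLO_KEY&token=$TRELLO_TOKEN"
}

list_cards() {
  # Print "id<TAB>name" per card, ready to pipe into queue.sh.
  curl -s "$(cards_endpoint "$1")" | jq -r '.[] | "\(.id)\t\(.name)"'
}

archive_card() {
  # Leave a comment on card $1, then close (archive) it.
  curl -s -X POST "$API/cards/$1/actions/comments?key=$TRELLO_KEY&token=$TRELLO_TOKEN" \
    --data-urlencode "text=$2"
  curl -s -X PUT "$API/cards/$1/closed?key=$TRELLO_KEY&token=$TRELLO_TOKEN" \
    --data "value=true"
}
```

Archiving via the card's closed field (rather than deleting) keeps the failure history searchable on the board.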
If the card continues to fail, we can drag it into another list. These lists act as placeholders for us to group known issues and work on fixes in the codebase. They are not automatically processed, which lets us develop a fix and deploy it to production before processing the cards again. The lists also allow us to blacklist processing of the cards (with the trello.sh -f command) so that other automated processes do not retry before we are ready.
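One simple way such a blacklist check could work is to look for a label on the card before queuing it. A sketch under assumptions: the "frozen" label name and helper functions are ours, not from the post; only the /cards/{id}/labels endpoint is Trello's:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the check behind trello.sh -f: skip any card
# carrying a "frozen" label. Label name and function names are assumptions.
API="https://api.trello.com/1"

has_frozen_label() {
  # $1 = raw JSON array of a card's labels; exit 0 if "frozen" is present.
  printf '%s' "$1" | grep -q '"name" *: *"frozen"'
}

is_blacklisted() {
  # Fetch the card's labels and test them before re-queuing the card.
  has_frozen_label "$(curl -s "$API/cards/$1/labels?key=$TRELLO_KEY&token=$TRELLO_TOKEN")"
}
```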
Our use of Trello started small and has grown into a key tool for us to manage unexpected issues in our backend processing. The API is really easy to use and we were able to get a lot of power out of it with just some creative bash commands and cURL requests.