I hadn't done any lo-fi text analysis for ages, and since we've been using Slack at my job for a little while now, it was time to get munging!
My MVP was very simple, but surprisingly good. I didn't record its output at the time, but in summary it used the Jaccard index between channel memberships (the number of members two channels share, divided by the number of members in either) as a measure of similarity.
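To make the MVP concrete, here is a minimal sketch of the Jaccard index over channel memberships. It assumes each channel is represented as a set of member user IDs. The function names and the example channels are mine, for illustration only, and fetching the member sets from Slack isn't shown.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def channel_similarities(members: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of channels (names sorted within each pair)."""
    return {
        (c1, c2): jaccard(members[c1], members[c2])
        for c1, c2 in combinations(sorted(members), 2)
    }


if __name__ == "__main__":
    # Hypothetical example data: channel name -> set of member user IDs.
    members = {
        "general": {"U1", "U2", "U3", "U4"},
        "dev": {"U1", "U2"},
        "random": {"U3", "U4", "U5"},
    }
    for pair, sim in channel_similarities(members).items():
        print(pair, round(sim, 3))
```

Here "dev" and "general" share two of four members (0.5), and "general" and "random" share two of five (0.4).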
I then did the fairly standard thing of inverting this into a distance measure and visualising the result with a D3 Force Layout. So far, so good, and without even touching any messages!
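The post doesn't say exactly how the similarity was inverted. This sketch therefore assumes distance = 1 - similarity and writes out a nodes/links structure of the kind a D3 force layout consumes. The `to_force_layout` helper, its `min_similarity` cut-off and the JSON field names are all hypothetical choices of mine.

```python
import json


def to_force_layout(similarities: dict[tuple[str, str], float],
                    min_similarity: float = 0.0) -> dict:
    """Turn pairwise similarities in [0, 1] into D3-style nodes/links data.

    Assumption: distance = 1 - similarity. Pairs whose similarity is at or
    below min_similarity (a hypothetical knob) get no link at all.
    """
    # Every channel that appears in any pair becomes a node.
    names = sorted({name for pair in similarities for name in pair})
    links = [
        {"source": a, "target": b, "distance": round(1.0 - sim, 4)}
        for (a, b), sim in sorted(similarities.items())
        if sim > min_similarity
    ]
    return {"nodes": [{"id": n} for n in names], "links": links}


if __name__ == "__main__":
    # Hypothetical similarities, e.g. Jaccard indices between channel memberships.
    sims = {("dev", "general"): 0.5, ("dev", "random"): 0.0, ("general", "random"): 0.4}
    print(json.dumps(to_force_layout(sims), indent=2))
```

On the D3 side (v4 or later), something like `d3.forceLink(links).id(d => d.id).distance(d => d.distance * 300)` would read this shape. The factor of 300 is an arbitrary pixel scale. Dropping zero-similarity pairs is a judgement call: it stops every channel being tied to every other one, but it also lets unrelated channels drift freely.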
It focusses on doing a small number of jobs, but well. I've been playing about with its word2vec implementation in another project, but here I used TF-IDF + Latent Semantic Indexing to produce channel similarities:
The procedure is similar to my MVP, except that I treat each channel as a document: a big bag of words from up to its last 1000 messages. TF-IDF boosts the effect of unusual words, and LSI extracts topics. I then compute all channel-to-channel similarities, convert them to distances, and visualise the result as a force layout.
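The library used here isn't named in this section, so rather than guess at its API, here is a from-scratch NumPy sketch of the same pipeline. Each channel becomes one bag of words, which is weighted with TF-IDF. LSI is then applied as a truncated SVD of the term-document matrix, the resulting topic vectors are compared with cosine similarity, and distance = 1 - similarity. The tokenizer, the idf = log(N / df) formula, the number of topics `k` and the function names are my assumptions, and the real implementation may differ in all of them.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text: str) -> list[str]:
    # Deliberately crude (an assumption): lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def lsi_channel_distances(channel_messages: dict[str, list[str]], k: int = 2):
    """Return (sorted channel names, matrix of 1 - cosine similarity in LSI space)."""
    names = sorted(channel_messages)
    # One document per channel: a bag of words over all of its messages.
    bags = [Counter(w for msg in channel_messages[n] for w in tokenize(msg)) for n in names]
    vocab = sorted(set().union(*bags))
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix of raw counts (terms x channels).
    tf = np.zeros((len(vocab), len(names)))
    for j, bag in enumerate(bags):
        for w, c in bag.items():
            tf[index[w], j] = c

    # IDF: terms that occur in fewer channels get a larger weight.
    df = np.count_nonzero(tf, axis=1)
    idf = np.log(len(names) / df)
    tfidf = tf * idf[:, None]

    # LSI: keep the top-k singular triplets; each channel becomes a k-dim topic vector.
    _, s, vt = np.linalg.svd(tfidf, full_matrices=False)
    k = min(k, len(s))
    docs = (np.diag(s[:k]) @ vt[:k]).T  # shape: (channels, k)

    # Cosine similarity between topic vectors, guarding against all-zero rows.
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    unit = docs / np.where(norms == 0, 1.0, norms)
    return names, 1.0 - unit @ unit.T
```

The returned matrix can be fed into the force-layout step in the same way as the membership-based distances. Because LSI vectors can have negative components, these distances range over [0, 2] rather than [0, 1].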
To run this, you literally only need Python and a Slack API_TOKEN. I would love to see visualisations of other people's Slack channels. Please share them, or ping me on Twitter, and I can host them on GitHub!
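Fetching the messages is the only part that needs the token. Below is a minimal sketch, assuming Slack's `conversations.history` Web API method (the original script may well have used an older endpoint). It uses cursor-based pagination and caps the result at 1000 messages, to match the description above. Channel IDs and the token are passed in as arguments here. `recent_messages` and `slack_get` are hypothetical names, and `get_page` exists only so the paging logic can be exercised without a network connection.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Slack's conversations.history Web API method.
HISTORY_URL = "https://slack.com/api/conversations.history"


def slack_get(params: dict, token: str) -> dict:
    """Call conversations.history once and return the decoded JSON body."""
    url = HISTORY_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def recent_messages(channel_id: str, token: str, max_messages: int = 1000,
                    get_page=slack_get) -> list[str]:
    """Return the text of up to max_messages recent messages (hypothetical helper)."""
    texts: list[str] = []
    cursor = None
    while len(texts) < max_messages:
        params = {"channel": channel_id, "limit": min(200, max_messages - len(texts))}
        if cursor:
            params["cursor"] = cursor
        body = get_page(params, token)
        if not body.get("ok"):
            raise RuntimeError(f"Slack API error: {body.get('error')}")
        texts.extend(m.get("text", "") for m in body.get("messages", []))
        # Slack signals more pages via has_more plus a non-empty next_cursor.
        cursor = body.get("response_metadata", {}).get("next_cursor")
        if not body.get("has_more") or not cursor:
            break
    return texts[:max_messages]
```

Joining each channel's list of texts gives the per-channel bag of words used in the TF-IDF + LSI step.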