Having had a chance to think about how to deal with my data set, and to articulate some ideas, I started dividing it up into blog posts by year. I like using Pandas for Python, although it can be difficult to find help with it that is pitched at the right level. Anyway, I separated out all the years from 2004 to 2017 and saved them in individual .csv files.
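A minimal sketch of how a split like this might look in Pandas. The column name `date`, the output folder `posts_by_year`, and the function name `split_posts_by_year` are assumptions made for illustration; the actual data set may be laid out differently.

```python
from pathlib import Path

import pandas as pd


def split_posts_by_year(posts, date_column="date", out_dir="posts_by_year",
                        first_year=2004, last_year=2017):
    """Write one CSV per year and return a {year: path} mapping.

    `date_column` and `out_dir` are hypothetical names, not taken from
    the original data set.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    # Parse the date column and pull out the year of each post.
    years = pd.to_datetime(posts[date_column]).dt.year

    written = {}
    # Group the rows by year; each group holds that year's posts.
    for year, group in posts.groupby(years):
        if first_year <= year <= last_year:
            target = out_path / f"posts_{year}.csv"
            group.to_csv(target, index=False)
            written[int(year)] = target
    return written
```

Calling `split_posts_by_year(posts)` on a DataFrame with a parseable `date` column would leave files such as `posts_2017.csv` in the output folder, each of which can be read back with `pd.read_csv` when it is time to work on a single year.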
Then I had a go at clustering the posts from 2017. With ‘only’ 230 blog posts, this was relatively easy to process on my laptop. I stuck with 10 clusters as I’d used this arbitrary number when I…