I started working on a text analysis crate during grad school. I’ve gotten some surface-level stuff in there already, but I’d like to flesh it out and add some more true NLP functionality.
Link: https://github.com/michael-long88/rnltk
Does this detect stylistic similarity between texts? I was thinking about porting a similar project that was developed to determine probable authorship. It would be quite useful as an open-source bot detector.
Nothing quite that advanced. It’s mostly just stemming, basic tokenization, TF-IDF, and cosine similarity at this point.
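For anyone wondering what the TF-IDF plus cosine similarity part looks like in practice, here is a rough sketch using only the standard library. This is not rnltk's API: the function names `tokenize`, `tf_idf`, and `cosine_similarity` are made up for illustration, the tokenizer is deliberately naive, stemming is left out, and the `ln(N / df) + 1` weighting is just one of several common IDF variants.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: lowercase the text and split on anything
/// that is not alphanumeric. Real tokenizers handle far more cases.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical helper: build one sparse TF-IDF vector per document.
/// Assumed weighting: tf = count / doc_len, idf = ln(N / df) + 1.
fn tf_idf(docs: &[&str]) -> Vec<HashMap<String, f64>> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n_docs = tokenized.len() as f64;

    // Document frequency: in how many documents each term appears.
    let mut df: HashMap<&str, f64> = HashMap::new();
    for tokens in &tokenized {
        let unique: HashSet<&str> = tokens.iter().map(String::as_str).collect();
        for term in unique {
            *df.entry(term).or_insert(0.0) += 1.0;
        }
    }

    tokenized
        .iter()
        .map(|tokens| {
            // Raw term counts for this document.
            let mut counts: HashMap<&str, f64> = HashMap::new();
            for t in tokens {
                *counts.entry(t.as_str()).or_insert(0.0) += 1.0;
            }
            let len = tokens.len() as f64;
            counts
                .into_iter()
                .map(|(term, count)| {
                    let idf = (n_docs / df[term]).ln() + 1.0;
                    (term.to_string(), (count / len) * idf)
                })
                .collect::<HashMap<String, f64>>()
        })
        .collect()
}

/// Cosine similarity between two sparse vectors; returns 0.0 if either is empty.
fn cosine_similarity(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a
        .iter()
        .filter_map(|(term, wa)| b.get(term).map(|wb| wa * wb))
        .sum();
    let norm_a = a.values().map(|w| w * w).sum::<f64>().sqrt();
    let norm_b = b.values().map(|w| w * w).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

To compare two texts you would call `tf_idf` on the whole collection and then pass any two of the resulting vectors to `cosine_similarity`. Note that this measures overlap in vocabulary, not writing style, which is why it falls short of authorship detection.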