Posted by Drew in news.


One of the things we’ve been working on lately with Notist is making presentations more discoverable. We’ve been blown away by the great and varied decks and videos you’ve been adding to your profiles, and we want to make sure that visitors can find them when browsing the site.

One way to do that is through the use of categories. Categorisation is a useful way of grouping similar items and making them discoverable, but it can fall short in a few key places.

The first danger of categorisation is using too few categories. A small number of categories results in subject areas so broad that they encompass almost everything and are close to useless. However, if you have too many categories, it becomes laborious for users to find the right one, and it can feel like a chore. Save for a few twisted individuals, most of us don’t like to manually put things into categories. It’s dull and uninteresting work.

These issues all stem from asking the user to categorise their own work, so we decided we didn’t want to do that. Instead, we decided to try out a method of automatically categorising presentations.

Notist already extracts text content from any uploaded PDFs, and slides can be titled and annotated by the user once uploaded. We also have presentation titles and descriptions to work with, so there’s plenty of text-based information about each presentation that we can use as a basis for analysis.
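As a rough illustration of gathering those text sources into a single input for analysis, here is a minimal Python sketch. The `Presentation` and `Slide` classes, their field names, and the `build_analysis_text` helper are all hypothetical names invented for this example; they are not Notist’s actual data model or code.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    # Hypothetical fields: a user-supplied slide title and annotation.
    title: str = ""
    notes: str = ""


@dataclass
class Presentation:
    # Hypothetical fields mirroring the text sources mentioned above.
    title: str
    description: str = ""
    pdf_text: str = ""  # text extracted from an uploaded PDF
    slides: list[Slide] = field(default_factory=list)


def build_analysis_text(presentation: Presentation) -> str:
    """Join every non-empty piece of text into one block for analysis."""
    parts = [presentation.title, presentation.description, presentation.pdf_text]
    for slide in presentation.slides:
        parts.extend([slide.title, slide.notes])
    # Skip empty or whitespace-only fields so they add no blank lines.
    return "\n".join(part.strip() for part in parts if part and part.strip())
```

Keeping everything in one string is just the simplest option for a sketch; a real pipeline might well weight titles and descriptions differently from slide text.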

The process analyses the text for themes and produces a list of topics, each with a score giving the probability that the topic is a good match for the content. The score is important: just because a presentation mentions a topic, it doesn’t necessarily mean that it’s about that topic.

For example, this presentation by Andrew Smith is about the concept of Portals in the React JavaScript library, and is themed around the Portal video game. By extracting topics from the text, we can see that the presentation matches the topic Cake, but only at about a 20% probability. Because the probability is so low, we can determine that the presentation talks about cake but isn’t about cake. The cake is a lie.

For our purposes, we’ve decided to take a probability of about 80% as our threshold for topics that accurately describe the content. Anything less than 80% is noted, but not actively used just now.
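The roughly 80% cut-off described above can be sketched as a simple filter over topic scores. This is only an illustration: the `split_topics` function is a hypothetical name, and the React and JavaScript scores below are made-up values. Only the Cake score of about 20% comes from the example in this post.

```python
THRESHOLD = 0.80  # the "about 80%" cut-off, expressed as a probability


def split_topics(scores: dict[str, float], threshold: float = THRESHOLD):
    """Separate topics that describe the content from those merely noted.

    `scores` maps a topic name to the probability that it matches.
    Returns a pair of dicts: (used, noted).
    """
    used = {topic: p for topic, p in scores.items() if p >= threshold}
    noted = {topic: p for topic, p in scores.items() if p < threshold}
    return used, noted


# Illustrative scores: only Cake's 0.20 is taken from the post.
scores = {"React": 0.93, "JavaScript": 0.88, "Cake": 0.20}
used, noted = split_topics(scores)
```

Here `used` holds React and JavaScript, while Cake lands in `noted`: recorded, but not shown as a topic for the presentation.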

The result is our new topics page, which lists some of the more common topics, sorted into tiers of popularity. We’ve also added some to the Explore page to help visitors browse around. Diving into a topic showcases videos first, and then presentations about that topic.

In time, we’ll be displaying topics on the presentation pages too, but not before we have a user interface to enable you to manually check and tweak the list for your own presentations.

We hope this is a small but positive step towards surfacing some of the great presentations and videos that are on the site, and bringing your work to a wider audience. If there are other ways you’d like to be able to explore content, be sure to let us know!

