How to Classify Data Without Markup

Yaroslav Murzaev

Data Scientist

Oct 21

iFunny users upload about 1,000,000 pieces of content to the app every day, including not only memes but also racism, violence, pornography, and other inappropriate material.

Previously, we checked all of this manually, but now we are developing automatic moderation based on convolutional neural networks. We have already trained the system to divide content into three classes: what can be shown in user feeds, what needs to be removed, and what should be hidden from the shared feed. To make the algorithms more accurate, we decided to also specify the reason for removal, a label our content did not have before.

Aleksandr Dzhumurat

Lead Data Scientist

May 20

Recommendation systems will always stay relevant: users want to see personalized content, the best of the catalog (in the case of our iFunny app, trending memes and jokes). Our team is testing dozens of hypotheses on how a smart feed can improve user experience. This article explains how we implemented a second ranking level on top of the collaborative model, what difficulties we encountered, and how they affected the metrics.

Yaroslav Murzaev

Data Scientist

Apr 20

The articles that will come in handy

Approximately 100,000 pieces of content come through our iFunny app daily, and every single one of them needs to be checked. We have already dealt with forbidden imagery by creating a classifier that automatically bans it. Next up: old memes, reuploads, and straight-up duplicates that users try to sneak past moderation.

To get rid of those, we introduced a duplicate detection system. It has already gone through several iterations, but at some point we realized we had no reliable way to measure improvements from one version to the next. So we searched the web for books and articles that would let us examine existing approaches to duplicate detection and, most importantly, to assessing their quality. You can see what we’ve found below.

Yaroslav Murzaev

Data Scientist

Jul 7

Overview of self-supervised methods.

While the demand for neural networks is growing, adapting state-of-the-art approaches to business needs often lags, hindered by insufficient or absent markup. Supervised learning is hardly feasible in this situation, and standard unsupervised methods won’t work for most tasks. This is where self-supervised methods come to the rescue. Depending on the task, they require next to no markup or none at all.

Aleksandr Dzhumurat

Lead Data Scientist

Jul 27

We have long been working on improving the user experience in UGC products with machine learning. Here are our ten key lessons from implementing recommendation systems in business to build a really good product.

Yaroslav Murzaev

Data Scientist

Sep 8

Articles

How to Classify Data Without Markup

Putting a two-layered recommendation system into production. Bonus: we reveal the dataset!

Detecting image duplicates

The articles that will come in handy

Deep Learning with a Small Training Batch (or Lack Thereof)

Ten Mistakes to Avoid When Creating a Recommendation System

Finding a picture in an image without marking it up?