Click to learn more about author Paolo Tamagnini.
Welcome to the sixth episode of our Guided Labeling Blog Series. In the
last episode, we made an analogy with a number of “friends” labeling “movies”
with three different outcomes: “good movie” (👍), “not seen movie”
(–), “bad movie” (👎). We saw how to train a
machine learning model that can also make predictions for movies no friend has
watched, by adding feature data about those movies to the model.
Let’s pick up where we left off.
You can blend friends’ movie opinions into a single model,
but how is this useful if you don’t have any labels to train a generic
supervised model? How can weak supervision become an alternative to active
learning in a generic classification task? How can this analogy with many “friends”
labeling “movies” work better than a single human expert, as in active
learning?
Weak Supervision Instead of Active Learning
The key feature that differentiates active learning from
weak supervision is the source of the labels we are using to train a generic
classification model from an unlabeled dataset.
Unique vs. Flexible
In active learning, the source of labels, referred to in the
literature as the “oracle,” is usually one of a kind, making it expensive and
hard to find. The oracle can be an expensive experiment, but, more often than not, we
are talking about a subject matter expert (SME), that is, a human with domain
expertise. In weak supervision, the weak source can be a human with less
expertise who makes mistakes, but it can also be something else, such as a
heuristic that applies only to a subset of the dataset.
For example, such a heuristic could be the rule:
IF “movie budget category” is “low”
AND “actor popularity” is “none”:
    MOVIE LABEL = “👎”
ELSE:
    MOVIE LABEL = “–”
Of course, this rule (or heuristic) is not accurate at all
and only applies to some movies, but this can be thought of as a weak source in
weak supervision and considered a labeling function. In most cases, you will
need an expensive human expert to build those heuristics, but this is still
less time consuming than manual labeling work. Once you have a set of
heuristics, you can apply them to millions of data points within a few seconds.
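As a rough sketch, the rule above could be written as a Python labeling function. The field names `budget_category` and `actor_popularity`, the sample movies, and the reading of the rule's label as "bad movie" are assumptions made for illustration only.

```python
ABSTAIN = None      # stands for the missing label "–"
BAD = "bad movie"   # assumed reading of the label the rule assigns


def low_budget_unknown_cast(movie):
    """Weak labeling function: votes 'bad movie' or abstains."""
    if movie["budget_category"] == "low" and movie["actor_popularity"] == "none":
        return BAD
    return ABSTAIN


# Hypothetical rows; a real dataset could hold millions of them.
movies = [
    {"title": "A", "budget_category": "low", "actor_popularity": "none"},
    {"title": "B", "budget_category": "high", "actor_popularity": "high"},
    {"title": "C", "budget_category": "low", "actor_popularity": "medium"},
]

# Applying the heuristic is a single cheap pass over the data.
weak_labels = [low_budget_unknown_cast(m) for m in movies]
print(weak_labels)
```

The function abstains on every movie the rule does not cover, which is what makes it a weak source rather than a full classifier.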
Solid vs. Weak
In active learning, the label source theoretically
always provides a 100 percent accurate label. In weak supervision, we can have
weak sources that cannot label all samples and can be less accurate.
Single vs. Multiple
Active learning is usually described as a system that relies on
a single, expensive source of labels. Weak supervision relies on many
not-so-accurate sources.
Human-in-the-Loop vs. Prior Model Training
In active learning, the labels are provided as the model improves within the human-in-the-loop process. In comparison, in weak supervision, the noisy labels are provided from all weak sources before the model is trained.
From Movie Opinions to Any Classification
Our example about blending movie opinions from people was
helpful in explaining the weak supervision framework with an intuitive example.
However, for movie recommendation use cases, there are better algorithms than
weak supervision (e.g., collaborative filtering). Weak supervision is powerful
because it can be used wherever:
- There is a classification task to be solved
- You want to use supervised machine learning
- The dataset to train your model is unlabeled
- You can use weak label sources
Those requirements are quite flexible, making weak
supervision versatile for a number of use cases where active learning would
have been far more time-consuming in terms of manual labeling.
Your unlabeled dataset of documents, images, or customer
data can have weak label sources just like you had “opinions from friends”
on “movies.” These “friends” can be considered labeling
functions that can label only a subset of your rows (in the example,
that would be only those “movies” they have watched) with accuracy
better than random. The “opinions” we had (“👍” or “👎”) are the output labels of
the labeling functions.
We can then extend this solution to any machine learning classification problem with missing labels. Those output labels can be only two for binary classification, like in our example, or even more for the multi-class problem. If a labeling function is not able to label a sample, it can output a missing value (“–”).
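To make this concrete, here is a small sketch of how several labeling functions could be applied to the same rows to produce one column of weak labels per source. All function names, fields, and thresholds are hypothetical, and `None` plays the role of the missing value.

```python
GOOD, BAD, ABSTAIN = 1, 0, None  # binary task plus a missing value


def lf_low_budget(row):
    # Votes BAD only for low-budget movies without popular actors.
    if row["budget"] == "low" and row["actor_popularity"] == "none":
        return BAD
    return ABSTAIN


def lf_awards(row):
    # Votes GOOD only when the movie has won at least one award.
    return GOOD if row["awards"] > 0 else ABSTAIN


def lf_critic_score(row):
    # Votes at the extremes of the score range and abstains in the middle.
    if row["critic_score"] >= 8:
        return GOOD
    if row["critic_score"] <= 3:
        return BAD
    return ABSTAIN


labeling_functions = [lf_low_budget, lf_awards, lf_critic_score]

rows = [
    {"budget": "low", "actor_popularity": "none", "awards": 0, "critic_score": 2},
    {"budget": "high", "actor_popularity": "high", "awards": 3, "critic_score": 9},
    {"budget": "mid", "actor_popularity": "mid", "awards": 0, "critic_score": 6},
]

# One row per sample, one column per labeling function.
label_matrix = [[lf(row) for lf in labeling_functions] for row in rows]
for votes in label_matrix:
    print(votes)
```

For a multi-class problem, the same layout works with more than two label values; the third row shows a sample that no source is able to label.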
In active learning, the expensive expert provides
labels row by row. In weak supervision, we can simply ask the expert to provide
a number of labeling functions. By labeling function, we mean any heuristic
that, in the expert's opinion, can correctly label a subset of rows. The expert
should provide as many labeling functions as possible that cover as many rows
as possible with as high an accuracy as possible (see Figure 1 below).
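One way to check how a set of labeling functions performs is to measure coverage (the share of rows a source labels) and accuracy (the share of its votes that are correct). The sketch below assumes a small hand-labeled sample is available for the accuracy check; the matrix and the true labels are made-up values.

```python
ABSTAIN = None

# Rows are samples, columns are labeling functions (hypothetical votes).
label_matrix = [
    [1, None, 1],
    [0, 0, None],
    [None, 1, 1],
    [1, None, 0],
]
true_labels = [1, 0, 1, 1]  # assumed small hand-labeled sample


def coverage(column):
    """Fraction of rows on which the source does not abstain."""
    return sum(v is not ABSTAIN for v in column) / len(column)


def accuracy(column, truth):
    """Fraction of correct votes among the rows the source labeled."""
    voted = [(v, t) for v, t in zip(column, truth) if v is not ABSTAIN]
    if not voted:
        return None  # the source never voted, so accuracy is undefined
    return sum(v == t for v, t in voted) / len(voted)


columns = list(zip(*label_matrix))  # transpose: one tuple per source
report = [(coverage(c), accuracy(c, true_labels)) for c in columns]
for i, (cov, acc) in enumerate(report):
    print(f"source {i}: coverage={cov:.2f}, accuracy={acc:.2f}")
```

A source with low coverage but high accuracy can still be valuable, as long as other sources cover the rows it skips.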
Labeling functions are only one example of weak label sources, though. You can, for example, use the predictions of an old model that worked only for older data points in the training set. You can blend in a public dataset or information crawled from the internet, or ask cheaper non-experts to label your data and treat them as weak label sources. Any strategy that can label a subset of your rows with accuracy better than random labeling can be added to your weak supervision input. The theory behind the Label Model (Figure 1) algorithm requires all label sources to be independent. However, recent research suggests that the approach still works in practice with a wide variety of weak label sources.
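To give a feel for what blending weak sources means, the sketch below combines their votes with a plain majority vote. This is a deliberately simple stand-in and not the Label Model, which also estimates how reliable each source is; the votes are made-up values.

```python
from collections import Counter

ABSTAIN = None


def majority_vote(votes):
    """Return the most common non-missing vote, or abstain on a tie."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN  # no source labeled this row
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the top two labels are tied
    return ranked[0][0]


# Rows are samples, columns are weak sources (hypothetical votes).
label_matrix = [
    ["good", None, "good"],
    ["bad", "good", "bad"],
    [None, None, None],
    ["good", "bad", None],
]

blended = [majority_vote(row) for row in label_matrix]
print(blended)
```

Rows that end up without a blended label can simply be left out when the final supervised model is trained.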
When dealing with tons of data and no labels at all, weak
supervision’s flexibility in blending knowledge from different generic sources can
be a way to train an accurate model without asking any expensive expert
to label thousands of samples.
In the next episode of the Guided
Labeling Blog Series, we will look at how to train a document
classifier in this way, using movie reviews: one more movie example via
interactive views! Stay tuned!
This is an ongoing series on guided labeling.