Let’s start by thinking about how we learn new skills. Suppose you’ve downloaded and installed a new piece of software, and you’re loading it up for the first time. When you enter the interface, everything is new. As you explore the menus, options, and functions, you start to realize just how complicated this program really is, and you feel lost and confused.
Thankfully, the home page guides you to a short tutorial. There, you learn the basic rules for controlling the software. You learn how to navigate through it, how to make simple changes, and other preliminaries. Half an hour later, you’re back at the home page, and the program looks a little different than it did before. Sure, there’s still plenty that you don’t know or understand, but you’re feeling confident that you can figure it out.
Armed with foundational knowledge, you start clicking around. Your skills gradually grow, building on top of each other. You keep practicing, you make some mistakes, and soon you’ve come to master it.
Building an Algorithm: The Basics of Self-Supervised Learning
The process we’ve described is a lot like what AI engineers try to mimic when we build a class of algorithms known as self-supervised learning. Essentially, we want to teach a computer to teach itself. That means we provide a launchpad, in the form of what we call pretext tasks, and then we let it loose. From there, it uses a built-in mechanism to collect clean data, learn the underlying concepts, and begin generalizing. After that, it snowballs through the downstream tasks and becomes progressively better at its assigned job. In this way, it moves from handling the least noisy data to handling the messiest data that we can throw at it.
In his keynote address at AAAI 2020, Turing Award winner and leading researcher Yann LeCun described self-supervised learning as “the idea of learning to represent the world before learning a task.”
This points to why so many researchers are now interested in self-supervised learning. Not only is it an intuitive model for learning, but it also mirrors the same progression within the AI field itself: it builds on what we’ve learned so far about AI. Self-supervised learning tries to synthesize supervised and unsupervised learning, compensating for their respective shortcomings.
Supervised learning depends on large amounts of labeled data, while unsupervised learning needs no labels but is more limited in what it can do. Self-supervised learning seeks to combine the best elements of both worlds. It requires only unlabeled data, but it can do much more than unsupervised learning because we start it off with a tutorial and the ability to teach itself.
The Process: Self-Supervised Learning from Start to End
Now let’s break down the self-supervised learning process. It starts with two key components: a sizable amount of unstructured data and an algorithm.
We start by encoding two sets of rules into the algorithm. The first is for training the algorithm during the pretext task, where it will gain a foundation for moving forward. These rules let the machine find “clean data” within the sea of unorganized data. Let’s say, for instance, that we want to do some natural language processing (NLP) and teach our computer how to read emails. We’ll tell it to find messages that are nicely formatted, follow certain patterns, and are generally easier to parse.
The machine then treats this data subset as labeled data, and it learns from it in the same way that a supervised learning algorithm would. We’re left with a kernel of intelligence that’s ready to move into uncharted territory.
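Here is a minimal sketch of that idea in Python, assuming plain-text emails with conventional “From:” and “To:” header lines. The function name make_training_pairs, the header pattern, and the sample messages are hypothetical and are not Text IQ’s implementation. The point it illustrates is that the labels come from the structure of the clean messages themselves rather than from a human annotator.

```python
import re

# Hypothetical pattern for a header line such as
# "From: Jane Doe <jane@example.com>".
HEADER = re.compile(
    r"^(From|To):\s*(.+?)\s*<([^<>\s]+@[^<>\s]+)>\s*$", re.MULTILINE
)


def make_training_pairs(clean_emails):
    """Turn well-formatted emails into (input, label) pairs without human labeling.

    The input is the raw message text. The label is the structure that could
    be read off reliably because the message was clean.
    """
    pairs = []
    for text in clean_emails:
        label = {}
        for field, name, address in HEADER.findall(text):
            label[field.lower()] = {"name": name, "address": address}
        # Keep only messages where both sender and recipient were recovered.
        if "from" in label and "to" in label:
            pairs.append((text, label))
    return pairs


clean_emails = [
    "From: Jane Doe <jane@example.com>\n"
    "To: John Roe <john@example.com>\n"
    "\n"
    "Hi John, see attached.",
    "hey, forwarding this along",  # no recoverable structure, so it is skipped
]

pairs = make_training_pairs(clean_emails)
for text, label in pairs:
    print(label)
```

Each pair can then be fed to an ordinary supervised learner, which is what lets the clean subset play the role of labeled data.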
Before we send our computer into the wilds, however, our second ruleset takes effect. This adds noise to the data. It mixes things up, changes the data, and adds an element of chaos. We do this to prevent overfitting, to stop the machine from “thinking too narrowly” or “memorizing” the data. Let’s take a moment to explore this concept.
Another great analogy is studying for a math test. If you just memorized the answers from the back of the book, you’d fail on test day because you’d know the answers to specific problems rather than how to solve such problems yourself. If, during your studies, you changed numbers, rearranged the terms, or introduced new elements, you’d be forced to work through the problems on your own. By adding noise, you learn to generalize your problem-solving approach. That’s what it means to prevent overfitting.
So, now that we’ve trained our machine on the clean data and the noise-added data, we’re ready to attack the more difficult data. The machine has learned to generalize and has learned the underlying concepts, all without any human-labeled data. The self-supervised model is ready! The machine then snowballs through the data, parsing emails without any supervision. This is the stage that ultimately produces results.
What kind of results? Let’s go through a real example of how Text IQ uses self-supervised learning to create genuine insights.
Example: Adding a Layer of Intelligence to Email Data
We begin with one of Text IQ’s advanced self-learning algorithms and a hard drive containing 5,000,000 emails. We want to search through this correspondence for potentially privileged, sensitive, or private data, but since there’s no structure, it’s impossible to find the information that we’re looking for without going through it by hand.
It’s time to set our algorithm to work. We’ll encode rules for the computer to find clean email chains, and then we’ll set parameters for noise-inducing randomization. For the first ruleset, we can require features such as full names and addresses in the “To:” and “From:” lines, consistent formatting for the “Date:” line, and other similar markers.
The machine then searches through all the data and pulls out the few emails that fit all our criteria. This becomes the tutorial, the clean data set from which our computer can learn the ropes and build that essential ground level.
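One way to express that first ruleset is as a list of small checks that a message must pass in full. The sketch below is written in Python and assumes “Name <address>” style headers and a single fixed date layout. The rule names, patterns, and sample messages are illustrative assumptions, not Text IQ’s actual criteria.

```python
import re
from datetime import datetime

# Hypothetical rule: a capitalized full name followed by an address in angle brackets.
NAMED_ADDRESS = r"[A-Z][\w.'-]*(?: [A-Z][\w.'-]*)+ <[^<>\s@]+@[^<>\s@]+>"

# Assumed "consistent" date layout, e.g. "Mon, 03 Feb 2020 10:15:00 -0500".
DATE_FORMAT = "%a, %d %b %Y %H:%M:%S %z"


def has_named_header(field):
    """Build a rule that checks for 'Field: Full Name <address>'."""
    pattern = re.compile(rf"^{field}: {NAMED_ADDRESS}$", re.MULTILINE)
    return lambda text: bool(pattern.search(text))


def has_consistent_date(text):
    """Rule: the Date line exists and follows the one expected layout."""
    match = re.search(r"^Date: (.+)$", text, re.MULTILINE)
    if not match:
        return False
    try:
        datetime.strptime(match.group(1), DATE_FORMAT)
        return True
    except ValueError:
        return False


RULES = [has_named_header("From"), has_named_header("To"), has_consistent_date]


def select_clean(emails):
    """Keep only the emails that satisfy every rule."""
    return [e for e in emails if all(rule(e) for rule in RULES)]


clean = (
    "From: Jane Doe <jane@example.com>\n"
    "To: John Roe <john@example.com>\n"
    "Date: Mon, 03 Feb 2020 10:15:00 -0500\n"
    "\n"
    "Hi John."
)
no_name = clean.replace("Jane Doe <jane@example.com>", "jane@example.com")
odd_date = clean.replace("Mon, 03 Feb 2020 10:15:00 -0500", "2/3/20")

selected = select_clean([clean, no_name, odd_date])
print(len(selected))
```

Requiring every rule to pass keeps the tutorial set small but trustworthy, which matches the goal of pulling out only the few emails that fit all the criteria.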
From there, we add noise by mixing up the order of different fields, removing names, adding random names, and introducing other chaotic elements that make the formatting difficult to parse. Once the algorithm has finished training on the noisy data, it dives into the ocean of emails at its disposal. When it’s done, we’re left with a layer of intelligence that’s derived from the data’s underlying structure. This enables us to make connections between people, documents, and concepts, find sensitive data, and complete our mission.
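The noise step can be sketched in the same spirit. The Python code below assumes each training example is a block of “Field: value” header lines, then a blank line, then the body. It shuffles the field order, strips the display name from some address lines, and swaps in a random name on others. The name pool, the probabilities, and the function name add_noise are made up for illustration.

```python
import random
import re

FAKE_NAMES = ["Alex Kim", "Sam Ortiz", "Pat Novak"]  # hypothetical name pool
NAMED = re.compile(r"^(From|To): .+? (<[^<>]+>)$")


def add_noise(email_text, rng, p_drop=0.3, p_swap=0.3):
    """Return a perturbed copy of one email; the body is left untouched."""
    header, sep, body = email_text.partition("\n\n")
    lines = header.split("\n")
    rng.shuffle(lines)  # mix up the order of the header fields

    noisy = []
    for line in lines:
        match = NAMED.match(line)
        if match:
            field, address = match.groups()
            roll = rng.random()
            if roll < p_drop:
                line = f"{field}: {address}"  # remove the name
            elif roll < p_drop + p_swap:
                line = f"{field}: {rng.choice(FAKE_NAMES)} {address}"  # random name
        noisy.append(line)
    return "\n".join(noisy) + sep + body


rng = random.Random(0)  # seeded so the sketch is reproducible
clean = (
    "From: Jane Doe <jane@example.com>\n"
    "To: John Roe <john@example.com>\n"
    "Date: Mon, 03 Feb 2020 10:15:00 -0500\n"
    "\n"
    "Hi John."
)
variants = [add_noise(clean, rng) for _ in range(5)]
for variant in variants:
    print(variant)
    print("---")
```

Generating several noisy variants of each clean message gives the model many differently formatted views of the same underlying content, so it cannot succeed by memorizing one layout.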
Conclusion: The Next Step in Artificial Intelligence
Self-supervised learning is an exciting advancement that’s just getting started. By overcoming the shortcomings of both supervised and unsupervised learning, we’re opening up opportunities for greater levels of intelligence than ever before.
The use cases speak for themselves. Besides the example that we went through above, we’re seeing great strides in using self-supervised learning for NLP, as in the case of ULMFiT, a natural language processing training approach that’s dramatically altering the landscape in a field that we see in household devices like Apple’s Siri and Amazon’s Alexa.
Researchers are also using self-supervised learning to improve computer vision programs. For instance, we’re making progress on robotic surgical platforms and teaching self-driving cars how to detect and respond to rough terrain. These problems are hard to solve with supervised learning due to the difficulty of generating enough high-quality labeled data, while these applications go well beyond the scope of what’s possible with unsupervised learning. That’s what makes self-supervised learning such an ideal candidate.
Self-supervised learning is a paradigm shift in AI technology, though its roots go back to a 1990 technical report by Jürgen Schmidhuber titled Making the World Differentiable. There he “described how the algorithm can be augmented by dynamic curiosity and boredom. This can be done by introducing (delayed) reinforcement for controller actions that increase the model network’s knowledge about the world. This in turn requires the model network to model its own ignorance, thus showing a rudimentary form of introspective behavior.”
This takeaway gets at what’s so profound about self-supervised learning. Curiosity and boredom are two forces that drive our own learning because they point to something outside of us, something that we don’t yet have or know, yet something that we can work toward. Knowing our own ignorance compels us to learn more and to grow, but opening our eyes to that Socratic wisdom remains a challenging prerequisite.
Ultimately, that’s what self-supervised learning is trying to do. We want machines to hunger for knowledge so that they can learn on their own.