Banking on data
University of Washington Computer Science Professor Pedro Domingos (Ph.D. 1997) is today at the forefront of machine learning developments—making computers more intelligent, with less human input—but back in 1994, he was an ICS graduate student under Professor Emeritus Dennis Kibler. While most computer science schools had yet to probe the depths of machine learning potential, several ICS faculty were hard at work developing machine learning methodologies. Domingos credits the school’s groundbreaking work as a catapult into his own research in the field.
His book, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, investigates the five tribes of machine learning and the effort to unify them into a master algorithm, or a grand unified theory akin to the Standard Model in physics. Once we reach this point, Domingos posits, we’ll have access to exhaustive, integrated personal digital models that function as extensions of our brains, recommending not just movies or books, but jobs, homes and even romantic dates. We’re not far from achieving this, Domingos says, but we’ll have to address crucial privacy challenges and political concerns along the way.
In a recent ACM webinar, Domingos detailed the five current approaches to machine learning and the breakneck pace at which companies are moving to unify them. He gives us further insights in the interview below.
You write about a personal model assembled out of all your data in your book, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Can you elaborate on this?
Today what we have is a bunch of recommender systems that recommend things for you based on little pieces of data that you leave in different places. Netflix recommends movies based on the reviews you left for other movies you’ve seen. Amazon recommends products, Facebook selects updates, Twitter selects tweets and Google selects search results. None of this is really what you’d like to have, though. What would be ideal is a recommender system that recommends everything based on all of the data you generate. Then, it knows you very well as opposed to just knowing one small side of you, allowing it to make much better recommendations. It can recommend not just books or movies, but jobs, dates, houses and travel destinations—everything that you do, in every stage of your life.
In order for that to happen, there’s a number of things that would have to occur first. All of that data needs to come together in one place, which is not the current case, because different companies have different parts of your data. We need machine learning algorithms that are actually able to synthesize a picture of you that is accurate from all of that data, and we’re not quite there yet.
In an essay you wrote for The Wall Street Journal you claim that we might reach this point in a matter of years, not decades.
Yes. Companies like Apple, Google, Microsoft, Facebook and Amazon are all trying to do this. They are in a race to be the first ones to do this. They believe that whoever reaches that point first will control the next era of information technology. They’re probably right: Once there is a model of you with a triage of everything that you could possibly do, everything is going to revolve around it.
The companies who make smartphones have an advantage, because that’s where most of your data goes through today. Google, actually, is in the pole position, because the biggest percentage of smartphone users today are Android users. Microsoft has Cortana, but they have a smaller user base. Facebook and Amazon have different approaches to this. Amazon has Echo, which sits in your home and listens to your voice. They’re all trying to do this, both on the side of assembling the data, and creating the learning algorithms. It’s hard to predict exactly how far this is going to go and exactly where there will be stumbling blocks, but surely, in the next few years, everyone is going to grow accustomed to the idea that they have a virtual assistant-like model that helps them with everything they do. Even if the model is not very good, it will be more than good enough to be useful. It will be better than our current individualized apps and recommender systems.
How do we account for privacy concerns with such personal digital models?
Privacy concerns are a huge part of this. Here’s the thing: On the one hand, I do want to have the 360-degree model that knows me as well as my best friend and helps me connect with the world—an extension of my brain, if you will. On the other hand, I don’t want it to be owned by a company that has other goals in mind, like selling me things or showing me apps. I think one of the things needed for this to really happen is that your data should all be in one place, and there should be a model of you from that data, but the data and the model have to be under your control. Otherwise, you should be very wary. If somebody knows too much about you, they will have too much power over you, and you don’t want that to happen.
The way I would frame the privacy issue is that the companies who are in some ways in the best position to harness this technology are, in terms of their business models, not ideal. We need a different kind of company that makes its money from something like a subscription fee, and its commitment to you is that your model is always going to be what’s in your best interest. Otherwise, you shouldn’t trust it.
Essentially, we need “data banks.”
Yes, exactly. All of these companies want to be your data bank, but they also want to lock you in. That’s the other big danger. People already think it’s a pain to switch from one smartphone to the other. It’s already difficult to move things from one cloud to another. But imagine when there is this personal model that knows your life better than you do: Giving up on that model to change to another would be like losing your memory. People are going to be severely locked in. Unfortunately, I think this is the direction things are headed in, unless we take control of our data and models, and part of my goal is to alert people to that.
You recently hosted an ACM webinar on “The Five Tribes of Machine Learning.” Tell us more about that.
The webinar was about the five tribes of machine learning and what you can take from each—a one hour introduction to machine learning centered on the five tribes, similar to what I do in my book. In the last 10 minutes, I talked about these personal digital models and other things that better machine learning will make possible, like home robots and potentially even curing cancer.
What are the five tribes of machine learning?
We need to first understand the difference between learning algorithms and regular algorithms. Most people don’t realize that everything computers do, they do by means of algorithms. With a regular algorithm, somebody programs the computer line by line, step by step, to do what it’s supposed to do. This was the first stage of the information age, where we had to explain in detail the instructions for computers to do something. Machine learning algorithms are radically different. In a machine learning algorithm, you don’t have to program the computer; it learns by itself what to do. It learns by looking at the input and the desired output. For example, let’s say the computer is learning to diagnose breast cancer from X-rays, which is actually a real application. The input is an X-ray, and the output is whether there is a tumor and where it is located. We don’t actually know how to program computers to do this, but they can learn to do this from data. Of course, the more data you have, the more you can learn.
Another amazing thing is that with regular algorithms, you need to create a different algorithm for everything you want the computer to do. If I want the computer to play chess, I have to explain to it how to play chess. If I want it to complete a medical diagnosis, I have to give it the rules by which to do the diagnosis. In machine learning, the same algorithm that knows nothing about chess or breast cancer can learn to do those things by going over the appropriate data. That’s because machine learning algorithms are master algorithms, because they make other algorithms. The question, of course, is how general can that algorithm be? Can we make one learning algorithm that could learn absolutely anything?
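As a rough illustration of that last point (a minimal sketch, not something from the interview or the book), the Python below trains one small learning algorithm, a perceptron, on two different sets of input/output examples. The function names `train_perceptron` and `predict` and the toy AND/OR data are assumptions chosen for the example. The code never states the rule for either task; only the data differs.

```python
def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Learn a linear threshold rule from (inputs, label) pairs.

    Hypothetical helper for illustration: nothing task-specific is coded here.
    """
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            activation = bias + sum(w * x for w, x in zip(weights, inputs))
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # 0 when the guess was right
            # Nudge the parameters toward the desired output.
            bias += learning_rate * error
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
    return weights, bias


def predict(model, inputs):
    """Apply a learned (weights, bias) model to new inputs."""
    weights, bias = model
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


# Two different tasks, described only by examples of input and desired output.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

# The same learning algorithm produces a different "program" for each task.
and_model = train_perceptron(AND_DATA)
or_model = train_perceptron(OR_DATA)
```

A perceptron can only learn tasks that a single straight-line boundary can separate, so this is far from a learner that can learn anything; it only shows the division of labor between a general learning procedure and task-specific data.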
What happens is that there are five different paradigms in machine learning, each of which has its own master algorithm, or its own approach to learning anything from data. One approach is to reverse engineer the brain, and that’s what’s behind deep learning and neural networks and so on. The idea is that your brain is the greatest learning algorithm there is, so we can try to reverse engineer it on the computer.
Another approach is to simulate evolution on the computer. Evolution created not just the brain, but all life on earth, so perhaps that is the greatest learning algorithm on earth, so we think of implementing that on a computer.
Another approach is based on reasoning at a high level like scientists do: formulating hypotheses, then testing and refining them. Another approach is based on Bayesian statistics, using probability and Bayes’ theorem to decide the probability of a hypothesis being true. The final one is reasoning by analogy, something very intuitive that people do all the time: I’m faced with a new situation and I try to find similarities with other situations that occurred before and take solutions from one to the other.
What can we learn from each of these tribes?
Each of these tribes is solving one important problem in machine learning. The symbolists, the tribe that employs a scientist’s reasoning, do something very important. They learn pieces of knowledge, for example, in the form of rules, that can be composed in arbitrary ways to create many different chains of reasoning that are very different from the source they originally learned from. This type of compositionality is very important for things like reasoning and understanding language, which is itself very compositional.
The connectionists, those who do neural networks and brain-inspired learning, know how to solve the credit assignment problem: in a system with lots of different parameters, deciding which ones are responsible for errors and where to fix things. When you have a system with a lot of moving parts and it’s not obvious how to do that, the connectionists have an algorithm called backpropagation that solves that problem.
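To make the idea of backpropagation a little more concrete, here is a minimal sketch in Python, assuming the smallest possible network: one input, one hidden unit and one output, each with a sigmoid activation and a squared-error loss. The network shape, the function names and the starting numbers are all assumptions made for illustration. Backpropagation passes the output error backward through the chain rule so that each weight receives its own share of the blame, and the sketch checks that share against a brute-force numerical estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Run the tiny network: x -> hidden unit h -> output y."""
    w1, w2 = params
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y


def loss(params, x, target):
    """Squared error between the network output and the desired output."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Return (dLoss/dw1, dLoss/dw2) by propagating the error backward."""
    w1, w2 = params
    h, y = forward(params, x)
    # Blame assigned to the output unit's pre-activation.
    delta_out = (y - target) * y * (1.0 - y)
    grad_w2 = delta_out * h
    # Blame passed back through w2 to the hidden unit's pre-activation.
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2


def numeric_grad(params, x, target, index, eps=1e-6):
    """Brute-force check: wiggle one weight and watch the loss change."""
    up, down = list(params), list(params)
    up[index] += eps
    down[index] -= eps
    return (loss(up, x, target) - loss(down, x, target)) / (2 * eps)


# Hypothetical starting weights and a single training example.
params, x, target = (0.5, -0.3), 1.0, 1.0
grads = backprop(params, x, target)

# One gradient-descent step: move each weight against its share of the blame.
learning_rate = 0.5
new_params = tuple(w - learning_rate * g for w, g in zip(params, grads))
```

Real networks have many units per layer and many layers, but the backward pass follows the same pattern: compute the error at the output, then hand a weighted portion of it to each earlier layer.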
The evolutionaries know how to learn structure. The connectionists think of the brain, with the structure already defined, and all that happens is that the strength of the synapses changes as you learn and remember things. The evolutionaries know how to learn that structure from scratch. They can actually build things like radios, programs and robots starting from piles of parts.
The Bayesians know how to deal with uncertainty. All knowledge that is learned from data is uncertain. I’m never sure that something is true. I generalize it, but the generalization might be wrong. The Bayesians use probability to do that, and they have probabilistic inference machinery that allows them to correctly update the probability of each hypothesis being true.
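The probability update described here follows Bayes’ theorem, and a short Python sketch can show the arithmetic. The function name `posterior` and every number below are hypothetical, chosen only to make the calculation easy to follow; they are not real diagnostic statistics.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    numerator = p_evidence_if_true * prior
    # P(E) sums over both cases: the hypothesis is true, or it is false.
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers: the hypothesis starts out 1% likely, and the evidence
# shows up 90% of the time when it is true and 10% of the time when it is not.
after_one = posterior(prior=0.01, p_evidence_if_true=0.9, p_evidence_if_false=0.1)

# A second, independent piece of the same kind of evidence: the previous
# posterior becomes the new prior.
after_two = posterior(prior=after_one, p_evidence_if_true=0.9, p_evidence_if_false=0.1)
```

With these made-up numbers, one piece of evidence raises the probability from 1 percent to about 8 percent, and a second raises it to 45 percent. The hypothesis is never declared certainly true; the belief just shifts as data arrives.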
Finally, the analogizers, they know how to do something that nobody else does, which is to learn across very different domains and from very little data. These days there’s a lot of emphasis on “big data,” but take, for example, a new job. You have no data, or you have one or two instances. We humans, however, are very good at taking everything from our previous experience and applying it in the new domain. A famous example of this is Niels Bohr’s first quantum theory of the atom, which was based on an analogy between the atom and the solar system.
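One of the simplest analogy-based learners is the nearest-neighbor method: to handle a new case, find the most similar case seen before and reuse its answer. The Python below is a minimal sketch under that assumption. The function names, the two stored examples and their labels are hypothetical, and it does not attempt the harder cross-domain analogies described above.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(query, examples):
    """Return the label of the stored example most similar to the query."""
    _, label = min(examples, key=lambda example: distance(query, example[0]))
    return label


# Only two remembered situations, each a (features, label) pair.
MEMORY = [
    ((1.0, 1.0), "A"),
    ((5.0, 5.0), "B"),
]

guess_near_a = nearest_neighbor((1.5, 0.5), MEMORY)
guess_near_b = nearest_neighbor((4.0, 6.0), MEMORY)
```

Even with just two stored examples the learner can answer new queries, which is the sense in which this family of methods can work from very little data. How good the answers are depends entirely on how similarity is measured.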
Each one of these tribes has something unique that they know how to do. That’s what we can take from them.
How can we unify these into one master algorithm? Why is it important to do so?
Each of these tribes tends to focus on its own problem and thinks it has the master algorithm. But because each of these problems is real, you don’t have the master algorithm until you solve all five of them. What we really need is a unification of these five algorithms into what would truly be the master algorithm. In some sense, what we’re looking for is a grand unified theory of machine learning, much as the Standard Model is the grand unified theory of physics.
How far are we from developing the master algorithm?
In some ways, we’re not that far because we’ve been doing this step by step. We’ve been able to unify symbolism and Bayesian thought. We’ve been able to unify symbolism and analogy-based learning, too. For example, there are learning algorithms that use evolutionary learning for structure and use connectionist learning for the parameters and whatnot. So, we’ve actually made very good progress. I think the day is not far when we will actually have a full unification of these five things.
The big question, however, is: Will we be done then? There are people who think we’ll be done, but my suspicion is that in addition to these five major ideas, there are others that have yet to be discovered. Someone is going to have to discover them. In some ways, I think it will be easier for someone who is not a professional machine learning scientist to do this. We tend to think along the tracks of the tribes of which we’re a member. One of my goals in writing my book is to get other people thinking about machine learning and maybe one of them will have the idea that we’re missing.
Tell us about your time with ICS. What did you study under Professor Emeritus Dennis Kibler?
I joined UC Irvine in 1992 and graduated in 1997 with a Ph.D. in machine learning. The reason I came to UC Irvine for my Ph.D. was because, at the time, they actually had one of the few large machine learning groups anywhere. Most departments didn’t have a machine learning group back then, and if they did, they’d have maybe one faculty member pursuing machine learning; UC Irvine had a whole bunch of them. I had decided definitively that machine learning was my field. I applied to UC Irvine and got in, and Dennis was actually teaching the introductory machine learning course for graduate students. My thesis grew out of the project that I did for that class.
How did ICS impact your current work?
It’s where I learned how to do machine learning. I’m still using today the methodology and the ideas that I learned then. Machine learning has changed and grown immensely in the last 20 years, but a lot of the fundamental things that I learned at ICS still apply, both in terms of the learning algorithms and these five paradigms, and in terms of the experimental methodology. UC Irvine is one of the places where these methodologies originated, and I’m still using them and teaching them to my students.
What drew you to machine learning?
I originally got interested in artificial intelligence when I was an undergraduate student. That was kind of accidental: I just saw this book called Artificial Intelligence and I was very intrigued as to what that might be. But then when I read the book back in the ’80s, it had one chapter about machine learning. I immediately thought two things: First, machine learning really is the crucial thing. If you can’t learn, you’ll never have anything that’s very intelligent. If you can learn, then the rest will follow. Just look at people. If you had a robot that you spent years programming and the robot doesn’t learn, and it’s as good as a person on that day, a month later a person will be much more intelligent. If we can do it, it will have all sorts of amazing consequences and applications. Second, I saw that the state of the art at the time was very primitive, which meant that I could contribute. It’s better to go into a field that’s immature than into a mature field like physics, biology or mathematics, where it’s much harder to make a contribution. With this combination, machine learning becomes a very seductive field.
I was convinced at the time that machine learning was going to take over the world one day, and I told people that. The scary thing is that, actually, it is taking over the world.
Where did you go post-ICS?
I am originally from Portugal, and when I attended UC Irvine I had a Fulbright Scholarship, which requires you to return to your country for two years. So I did, and I was a professor in Lisbon. Then I got this job at the University of Washington, and that’s where I’ve been since.
What advice would you offer graduating students?
Here’s something that applies to both undergraduates and graduate students: There is enormous impact to be had in computer science today. Don’t just think along the existing tracks. There are a lot of things that are easy to do, and there’s no harm in doing them, but the biggest opportunities today are to apply computer science to unsolved problems. Think, for example, of Uber—it’s a huge company now and all it did was apply information technology to the problem of finding cabs. That doesn’t sound like a big deal, but it is. Think of Zillow, which is revolutionizing the real estate market because somebody thought of applying computer science to it. I could go on with this list, but the point is, what you should be doing as a graduate is not just taking a job at an established company—although that’s fine too—but thinking about open opportunities, and I think there are many.