Teaching machine learning within different fields

Everyone is talking about machine learning (ML) these days. They usually call it “machine learning and artificial intelligence” and I keep wondering what exactly they mean by each term.

It seems the term “artificial intelligence” has shaken off its negative connotations from back when it meant top-down systems (as opposed to the superior bottom-up “computational intelligence” that most of today’s so-called AI actually uses) and has come to mean what cybernetics once was: robotics, machine learning, embedded systems, decision-making, visualization, control, etc., all in one.

Now that ML is important to so many industries, application areas, and fields, it is taught in many types of academic departments. We approach machine learning differently in ECE, in CS, in business schools, in mechanical engineering, and in math and statistics programs. The granularity of focus varies, with math and CS taking the most detailed view, followed by ECE and ME departments, followed by the highest-level applied version in business schools, and with Statistics covering both ends.

In management, students need to be able to understand the potential of machine learning and be able to use it toward management or business goals, but do not have to know how it works under the hood, how to implement it themselves, or how to prove the theorems behind it.

In computer science, students need to know the performance measures (and results) of different ways to implement end-to-end machine learning, and they need to be able to build such implementations on their own with a thorough understanding of the technical infrastructure. (If what I have observed is generalizable, they also tend to be more interested in virtual and augmented reality, artificial life, and other visualization and user-experience aspects of AI.)

In math, students and graduates really need to understand what’s under the hood. They need to be able to prove the theorems and develop new ones. It is the theorems that lead to powerful new techniques.

In computer engineering, students also need to know how it all works under the hood, and have some experience implementing some of it, but don’t have to be able to develop the most efficient implementations unless they are targeting embedded systems. In either case, though, it is important to understand the concepts, the limitations, and the pros and cons as well as to be able to carry out applications. Engineers have to understand why there is such a thing as PAC (probably approximately correct) learning, what the curse of dimensionality is and what it implies for how one does and does not approach a problem, what the NFL (no-free-lunch) theorem is and how that should condition one’s responses to claims of a single greatest algorithm, and what the history and background of this family of techniques are really like. These things matter because engineers should not expect to be plugging-and-playing cookie-cutter algorithms from ready-made libraries. That’s being an operator of an app, not being an engineer. The engineer should be able to see the trade-offs, plan for them, and take them into account when designing the optimal approach to solving each problem. That requires understanding parameters and structures, and again the history.
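One way to get a feel for the curse of dimensionality is a small calculation. This is only an illustrative sketch, not a method from any particular course or library. It assumes data spread uniformly over a unit hypercube and asks what fraction of that cube lies inside the ball inscribed in it, i.e., within distance 0.5 of the center. The function name `inscribed_ball_fraction` is made up for this example.

```python
import math


def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the unit hypercube's volume inside its inscribed ball.

    Uses the volume of a d-dimensional ball of radius r:
        V = pi^(d/2) / Gamma(d/2 + 1) * r^d
    with r = 0.5. The unit hypercube has volume 1, so V is the fraction.
    """
    if d < 1:
        raise ValueError("dimension must be a positive integer")
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d


if __name__ == "__main__":
    # The "neighborhood of the center" empties out as d grows.
    for d in (1, 2, 3, 10, 20):
        print(f"d = {d:2d}   fraction near the center = {inscribed_ball_fraction(d):.2e}")
```

In two dimensions the ball covers about 79% of the cube; by ten dimensions it covers roughly 0.25%. Nearly all of the volume sits in the corners, which is one reason methods that rely on "nearby" points need far more data, or far fewer features, than low-dimensional intuition suggests.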

Today, the field of ‘Neural Networks’ is popular and powerful. That was not always the case. It has been the case two other times in the past. Each time, perhaps like an overextended empire, the edifice of artificial neurons came down (though only to come up stronger some years later).

When I entered the field, with an almost religious belief in neural networks, they were quite uncool. The wisdom among graduate students seemed to be that neural nets were outdated, that we had SVMs now, and with the latter machine learning was solved forever. (This reminds me of the famous, though probably apocryphal, patent-office declaration in the late 1800s that everything that could be invented had been invented.) Fortunately, I have always benefited from doing whatever was unpopular, so I stuck to my neural nets, fuzzy systems, evolutionary algorithms, and an obsession with Bayes’ rule while others whizzed by on their SVM dissertations. (SVMs are still awesome, but the thing that has set the world on fire is neural nets again.)

One of the other debates raging, at least in my academic environment at the time, was about “ways of knowing.” I have since come to think that science is not a way of knowing. It never was, though societies thought so at first (and many still think so). Science is a way of incrementally increasing confidence in the face of uncertainty.

I bring this up because machine learning, likewise, never promised to have the right answer every time. Machine learning is all about uncertainty; it thrives on uncertainty. It’s built on the promise of PAC learning; i.e., it promises only to be at most slightly wrong, and to be so most of the time rather than always. The hype today is making ML seem like some magical panacea for all business, scientific, medical, and social problems. For better or worse, it’s only another technological breakthrough in our centuries-long adventure of making our lives safer and easier. (I’m not saying we haven’t done plenty of wrongs in that process; we have. But no one who owns a pair of glasses, a laptop, a ball-point pen, a digital piano, a smartphone, or a home-security system should be able to fail to see the good that technology has done for humankind.)
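The "slightly wrong, most of the time" promise can be made concrete with the textbook PAC bound for the simplest setting. The sketch below assumes a finite hypothesis class H, a learner that returns a hypothesis consistent with the training data, and the realizable case (some hypothesis in H is exactly right). Under those assumptions, m ≥ (ln|H| + ln(1/δ)) / ε examples suffice for the error to be at most ε with probability at least 1 − δ. The function name `pac_sample_size` is hypothetical, chosen for this illustration.

```python
import math


def pac_sample_size(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a finite class.

    epsilon: tolerated error ("only slightly wrong").
    delta:   tolerated failure probability ("only most of the time").
    Bound:   m >= (ln|H| + ln(1/delta)) / epsilon   (realizable case).
    """
    if num_hypotheses < 1:
        raise ValueError("the hypothesis class must be non-empty")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)


if __name__ == "__main__":
    # 1000 candidate hypotheses, at most 10% error, at least 95% of the time.
    print(pac_sample_size(1000, epsilon=0.1, delta=0.05))
```

Note how the guarantee is worded: neither ε nor δ can be set to zero. Demanding a smaller error or a smaller failure probability costs more data, and certainty is not on offer at any price.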

I have left the place of the field of Statistics in machine learning until the end. Statisticians are the true owners of machine learning. We engineering, business, and CS people are leasing property on their philosophical (not real) estate.

 

Science-doing

There are (at least) two types of scientists: scientist-scientists and science-doers.

Both groups do essential, difficult, demanding, and crucial work that everyone, including the scientist-scientists, needs. The latter group (like the former) includes people who work in research hospitals, water-quality labs, soil-quality labs, linear accelerators, R-&-D labs of all kinds, and thousands of other places. They carry out the daily work of science with precision, care, and a lot of hard work. Yet, at the same time, in the process of doing the doing of science, they typically do not get the luxury of stepping back, moving away from the details, starting over, and discovering the less mechanical, less operational connections among the physical sciences, the social sciences, the humanities, technology, business, mathematics, and statistics… especially the humanities and statistics.

I am not a good scientist, and that has given me the opportunity to step back, start over, do some things right this time, and more importantly, through a series of delightful coincidences, learn more about the meaning of science than about the day-to-day doing of it.[1] This began to happen during my Ph.D., but only some of the components of this experience were due to my Ph.D. studies. The others just happened to be there for me to stumble upon.

The sources of these discoveries took the form of two electrical-engineering professors, three philosophy professors, one music professor, one computer-science professor, some linguistics graduate students, and numerous philosophy, math, pre-med, and other undergrads. All of these people exposed me to ideas, ways of thinking, ways of questioning, and ways of teaching that were new to me.

As a result of their collective influence, my studies, and all my academic jobs from that period, I have come to think of science not merely as the wearing of lab coats and carrying out of mathematically, mechanically, or otherwise challenging complex tasks. I have come to think of science as the following of, for lack of a better expression, the scientific method, although by that I do not necessarily mean the grade-school inductive method with its half-dozen simple steps. I mean all the factors one has to take into account in order to investigate anything rigorously. These include double-blinding (whether clinical or otherwise, to deal with confounding variables, experimenter effects, and other biases), setting up idiot checks in experimental protocols, varying one unknown at a time (or varying all unknowns with a factorial design), not assuming unjustified convenient probability distributions, using the right statistics and statistical tests for the problem and data types at hand, correctly interpreting results, tests, and statistics, not chasing significance, setting up power targets or determining sample sizes in advance, using randomization and blocking in setting up an experiment or the appropriate level of random or stratified sampling in collecting data [See Box, Hunter, and Hunter’s Statistics for Experimenters for easy-to-understand examples.], and the principles of accuracy, objectivity, skepticism, open-mindedness, and critical thinking. The latter set of principles is given on p. 17 and p. 20 of Essentials of Psychology [third edition, Robert A. Baron and Michael J. Kalsher, Needham, MA: Allyn & Bacon, 2002].
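To illustrate just one item on that list, determining sample sizes in advance, here is a rough sketch of a standard power calculation. It is not taken from any of the books cited here. It assumes a two-sided comparison of two group means, equal group sizes, and the normal approximation; an exact t-based calculation gives slightly larger numbers. The function name `two_group_sample_size` and its defaults are illustrative choices.

```python
import math
from statistics import NormalDist


def two_group_sample_size(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate per-group n for comparing two means.

    effect_size: standardized difference (difference in means / common SD).
    alpha:       two-sided significance level, fixed before collecting data.
    power:       target probability of detecting an effect of that size.
    Formula (normal approximation):
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / effect_size)^2
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.8, 0.5, 0.2):
        print(f"effect size {d}: about {two_group_sample_size(d)} subjects per group")
```

The point of running such a calculation before the experiment is that the significance level, the power target, and the smallest effect worth detecting are all committed to in advance, which leaves less room for chasing significance afterward.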

These two books (Statistics for Experimenters and Essentials of Psychology), along with Hastie, Tibshirani, and Friedman’s The Elements of Statistical Learning and a few heavily cited papers on the misuses of Statistics, have formed the basis of my view of science. This is why I think science-doing is not necessarily the same thing as being a scientist. In a section called ‘On being a scientist’ in a chapter titled ‘Methodology Wars’, the neuroscientist Fost explains how it’s possible, although not necessarily common, to be on “scientific autopilot” (p. 209) because of the way undergraduate education focuses on science facts and methods[2] over scientific thinking and the way graduate training and faculty life emphasize administration, supervision, managerial oversight, grant-writing, and so on (pp. 208–9). All this leaves only a brief graduate or postdoc period in most careers for deep thinking and direct hands-on design of experiments before the mechanical execution and the overwhelming burdens of administration kick in. I am not writing this to criticize those who do what they have to do to further scientific inquiry but to celebrate those who, in the midst of that, find the mental space to continue to be critical, skeptical questioners of methods, research questions, hypotheses, and experimental designs. (And there are many of those. It is just not as automatic as the public seems to think it is, i.e., something that comes with getting a degree and putting on a white coat.)

 

Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, George E. P. Box, William G. Hunter, and J. Stuart Hunter, New York, NY: John Wiley & Sons, Inc., 1978 (0-471-09315-7)

Essentials of Psychology, third edition, Robert A. Baron and Michael J. Kalsher, Needham, MA: Allyn & Bacon, A Pearson Education Company, 2002

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, New York, NY: Springer-Verlag, 2009 (978-0-387-84858-7 and 978-0-387-84857-0)

If Not God, Then What?: Neuroscience, Aesthetics, and the Origins of the Transcendent, Joshua Fost, Clearhead Studios, Inc., 2007 (978-0-6151-6106-8)

[1] Granted, a better path would be the more typical one of working as a science-doer scientist for thirty years, accumulating a visceral set of insights, and moving into the fancier stuff due to an accumulation of experience and wisdom. However, as an educator, I did not have another thirty years to spend working on getting a gut feeling for why it is not such a good idea to (always) rely on a gut feeling. I paid a price, too. I realize I often fail to follow the unwritten rules of social and technical success in research when working on my own research, and I spend more time than I perhaps should on understanding what others have done. Still, I am also glad that I found so much meaning so early on.

[2] In one of my previous academic positions, I was on a very active subcommittee that designed critical-thinking assessments for science, math, and engineering classes with faculty from chemistry, biology, math, and engineering backgrounds. We talked often about the difference between teaching scientific facts and teaching scientific thinking. Among other things, we ended up having the university remove a medical-terminology class from the list of courses that counted as satisfying a science requirement in general studies.