Teaching machine learning within different fields

Everyone is talking about machine learning (ML) these days. They usually call it “machine learning and artificial intelligence” and I keep wondering what exactly they mean by each term.

It seems the term “artificial intelligence” has shaken off its negative connotations from back when it meant top-down systems (as opposed to the superior bottom-up “computational intelligence” that most of today’s so-called AI actually uses) and has come to mean what cybernetics once was: robotics, machine learning, embedded systems, decision-making, visualization, control, etc., all in one.

Now that ML is important to so many industries, application areas, and fields, it is taught in many types of academic departments. We approach machine learning differently in ECE, in CS, in business schools, in mechanical engineering, and in math and statistics programs. The granularity of focus varies, with math and CS taking the most detailed view, followed by ECE and ME departments, followed by the highest-level applied version in business schools, and with Statistics covering both ends.

In management, students need to be able to understand the potential of machine learning and be able to use it toward management or business goals, but do not have to know how it works under the hood, how to implement it themselves, or how to prove the theorems behind it.

In computer science, students need to know the performance measures (and results) of different ways to implement end-to-end machine learning, and they need to be able to do so on their own with a thorough understanding of the technical infrastructure. (If what I have observed is generalizable, they also tend to be more interested in virtual and augmented reality, artificial life, and other visualization and user-experience aspects of AI.)

In math, students and graduates really need to understand what’s under the hood. They need to be able to prove the theorems and develop new ones. It is the theorems that lead to powerful new techniques.

In computer engineering, students also need to know how it all works under the hood, and have some experience implementing some of it, but don’t have to be able to develop the most efficient implementations unless they are targeting embedded systems. In either case, though, it is important to understand the concepts, the limitations, and the pros and cons as well as to be able to carry out applications. Engineers have to understand why there is such a thing as PAC, what the curse of dimensionality is and what it implies for how one does and does not approach a problem, what the NFL is and how that should condition one’s responses to claims of a single greatest algorithm, and what the history and background of this family of techniques are really like. These things matter because engineers should not expect to be plugging-and-playing cookie-cutter algorithms from ready-made libraries. That’s being an operator of an app, not being an engineer. The engineer should be able to see the trade-offs, plan for them, and take them into account when designing the optimal approach to solving each problem. That requires understanding parameters and structures, and again the history.
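To make the curse of dimensionality concrete, here is a minimal sketch under a simplifying assumption: the data are spread uniformly over a unit hypercube, and we want a cube-shaped neighborhood that captures 10% of the points. The function name `edge_length` and the dimensions tried are hypothetical, chosen only for illustration.

```python
def edge_length(fraction: float, dims: int) -> float:
    """Edge of a sub-cube that holds `fraction` of points spread
    uniformly over a unit hypercube with `dims` dimensions."""
    # Volume of the sub-cube is edge**dims, so edge = fraction**(1/dims).
    return fraction ** (1.0 / dims)


if __name__ == "__main__":
    for d in (1, 2, 10, 100):
        print(d, round(edge_length(0.10, d), 3))
```

In ten dimensions the neighborhood already has to span roughly 79% of each axis to hold a tenth of the data, so it is hardly "local" anymore. That is one reason an approach that works in two or three dimensions cannot simply be scaled up.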

Today, the field of ‘Neural Networks’ is popular and powerful. That was not always the case. It has been the case two other times in the past. Each time, perhaps like an overextended empire, the edifice of artificial neurons came down (though only to come up stronger some years later).

When I entered the field, with an almost religious belief in neural networks, they were quite uncool. The wisdom among graduate students seemed to be that neural nets were outdated, that we had SVMs now, and with the latter machine learning was solved forever. (This reminds me of the famous patent-office declaration in the late 1800s that everything that could be invented had been invented.) Fortunately, I have always benefited from doing whatever was unpopular, so I stuck to my neural nets, fuzzy systems, evolutionary algorithms, and an obsession with Bayes’ rule while others whizzed by on their SVM dissertations. (SVMs are still awesome, but the thing that has set the world on fire is neural nets again.)

One of the other debates raging, at least in my academic environment at the time, was about “ways of knowing.” I have since come to think that science is not a way of knowing. It never was, though societies thought so at first (and many still think so). Science is a way of incrementally increasing confidence in the face of uncertainty.

I bring this up because machine learning, likewise, never promised to have the right answer every time. Machine learning is all about uncertainty; it thrives on uncertainty. It’s built on the promise of PAC learning; i.e., it promises to be only slightly wrong and to be so only most of the time. The hype today is making ML seem like some magical panacea to all business, scientific, medical, and social problems. For better or worse, it’s only another technological breakthrough in our centuries-long adventure of making our lives safer and easier. (I’m not saying we haven’t done plenty of wrongs in that process—we have—but no one who owns a pair of glasses, a laptop, a ball-point pen, a digital piano, a smart phone, or a home-security system should be able to fail to see the good that technology has done for humankind.)
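The PAC promise can be put in numbers. The sketch below uses the classic textbook bound for a finite hypothesis class in the realizable case (an assumption; real problems are rarely this tidy): if a learner outputs any hypothesis consistent with m ≥ (1/ε)(ln|H| + ln(1/δ)) training examples, then with probability at least 1 − δ its true error is at most ε. The function name `pac_sample_bound` and the numbers plugged in are illustrative only.

```python
import math


def pac_sample_bound(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Number of examples sufficient for a consistent learner over a finite
    hypothesis class to be 'probably (1 - delta) approximately (epsilon) correct'."""
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)


if __name__ == "__main__":
    # 1000 candidate hypotheses, at most 10% error, 95% of the time.
    print(pac_sample_bound(1000, 0.10, 0.05))
    # Halving the tolerated error roughly doubles the data needed.
    print(pac_sample_bound(1000, 0.05, 0.05))
```

Note what the bound does not offer: there is no finite sample size for ε = 0 or δ = 0. Being slightly wrong, some of the time, is built into the guarantee.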

I left the place of the field of Statistics in machine learning until the end. They are the true owners of machine learning. We engineering, business, and CS people are leasing property on their philosophical (not real) estate.

 

Herbie Hancock's Chameleon's BPM graph from the Android app 'liveBPM' (v. 1.2.0) by Daniel Bach

Listening to music seems easy.

Listening to music seems easy; it even appears like a passive task.

Listening, however, is not the same as hearing. In listening, i.e., attending, we add cognition to perception. The cognition of musical structures, cultural meanings, conventions, and even of the most fundamental elements themselves such as pitch or rhythm turns out to be a complex cognitive task. We know this is so because getting our cutting-edge technology to understand music with all its subtleties and its cultural contexts has proven, so far, to be impossible.

Within small fractions of a second, humans can reach conclusions about musical audio that are beyond the abilities of the most advanced algorithms.

For example, a trained or experienced musician (or even non-musician listener) can differentiate computer-generated and human-performed instruments in almost any musical input, even in the presence of dozens of other instruments sounding simultaneously.

In a rather different case, humans can maintain time-organizational internal representations of music while the tempo of a recording or performance continuously changes. A classic example is the jazz standard Chameleon by Herbie Hancock off the album ‘HEADHUNTERS’. The recording never retains any one tempo, following an up-and-down contour and mostly getting faster. Because tempo recognition is a prerequisite to other music-perception tasks like meter induction and onset detection, this type of behavior presents a significant challenge to signal-processing and machine-learning algorithms but generally poses no difficulty to human perception.

Another example is the recognition of vastly different cover versions of songs: A person familiar with a song can recognize within a few notes a cover version of that song done in another genre, at a different tempo, by another singer, and with different instrumentation.

Each of these tasks is well beyond current machine-learning techniques, even though those same techniques are achieving remarkable successes in visual recognition. There, the main challenge is invariance, which is less of an obstacle than the abstractness of music and its seemingly arbitrary meanings and structures.

Consider the following aspects of music cognition.

  • inferring a key (or a change of key) from very few notes
  • identifying a latent underlying pulse when it is completely obscured by syncopation [Tal et al., Missing Pulse]
  • effortlessly tracking key changes, tempo changes, and meter changes
  • instantly separating and identifying instruments even in performances with many-voice polyphony (as in Dixieland Jazz, Big-Band Jazz, Baroque and Classical European court music, Progressive Rock, folkloric Rumba, and Hindustani and Carnatic classical music)

These and many other forms of highly polyphonic, polyrhythmic, or cross-rhythmic music continue to present challenges to automated algorithms. Successful examples of automated tempo or meter induction, onset detection, source separation, key detection, and the like all work under the requirement of tight limitations on the types of inputs. Even for a single such task such as source separation, a universally applicable algorithm does not seem to exist. (There is some commercial software that appears to do these tasks universally, but because proprietary programs do not provide sufficiently detailed outputs, whether they really can perform all these functions or whether they perform one function in enough detail to suffice for studio uses is uncertain. One such suite can identify and separate every individual note from any recording, but does not perform source separation into streams-per-instrument and presents its output in a form not conducive to analysis in rhythmic, harmonic, melodic, or formal terms, and not in a form analogous to human cognitive processing of music.)

Not only does universal music analysis remain an unsolved problem, but also most of the world’s technological effort goes toward European folk music, European classical music, and (international) popular music. The goal of my research and my lab (Lab BBBB: Beats, Beats, Bayes, and the Brain) is to develop systems for culturally sensitive and culturally informed music analysis, music coaching, automated accompaniment, music recommendation, and algorithmic composition, and to do so for popular music styles from the Global South that are not on the industry’s radar.

Since the human nervous system is able to complete musical-analysis tasks under almost any set of circumstances, in multiple cultural and cross-cultural settings, with varying levels of noise and interference, the human brain is still superior to the highest-level technology we have developed. Hence, Lab BBBB takes inspiration and direct insight from human neural processing of audio and music to solve culturally specific cognitive problems in music analysis, and to use this context to further our understanding of neuroscience and machine learning.

The long-term goal of our research effort is a feedback cycle:

  1. Neuroscience (in simulation and with human subjects at our collaborators’ sites) informs both music information retrieval and research into neural-network structures (machine learning). We are initially doing this by investigating the role of rhythm priming in Parkinson’s (rhythm–motor interaction) and in grammar-learning performance (rhythm–language interaction) in the basal ganglia. We hope to then replicate in simulation the effects that have been observed with people, verify our models, and use our modeling experience on other tasks that have not yet been demonstrated in human cases or that are too invasive or otherwise unacceptable.
  2. Work on machine learning informs neuroscience by narrowing down the range of investigation.
  3. Deep learning is also used to analyze musical audio using structures closer to those in the human brain than the filter-bank and matrix-decomposition methods typically used to analyze music.
  4. Music analysis informs cognitive neuroscience, we conjecture, as has been done in certain cases in the literature with nonlinear dynamics.
  5. Phenomena like entrainment and neural resonance in neurodynamics further inform the development of neural-network structures and data-subspace methods.
  6. These developments in machine learning move music information retrieval closer to human-like performance for culturally informed music analysis, music coaching, automated accompaniment, music recommendation, and algorithmic composition for multicultural intelligent music systems.

 

The subjunctive is scientific thinking built into the language.

The subjunctive draws a distinction between fact and possibility, between truths and wishes. The expression “if he were” (not “if he was”) is subjunctive; it intentionally sounds wrong (unless you’re used to it) to indicate that we’re talking about something hypothetical as opposed to something actual.
This is scientific thinking built into the language (coming from its Romance-language roots).

This is beautiful. Let’s hold onto it.

You are not disinterested.

Everyone: Stop saying ‘disinterested’. You apparently don’t know what it means. It doesn’t mean ‘uninterested’.

In fact, it means you’re truly interested. ‘Disinterested’ is when you care so deeply as to want to treat the situation objectively. It is a scientific term describing the effort to rid a study of the effects of subconscious biases.

Also, please don’t say ‘substantive’ when all you mean is ‘substantial’. They’re not the same thing. Thanks. (‘Substantial’ is a good word. You’re making it feel abandoned.)

Microsoft: Fix your use of the word ‘both’.
When comparing only two files, Windows says something like “Would you like to compare both files?” As opposed to what, just compare one, all by itself? (like the sound of one hand clapping?)
The word ‘both’ is used when the default is not that of two things. It emphasizes the twoness to show that the twoness is special, unusual. But when the default is two, you say “the two” (as in “Would you like to compare the two files?”), not ‘both’, and DEFINITELY NOT ‘the both’. (It was cute when that one famous person said it once. It’s not cute anymore. Stop saying it.)
Back to ‘both’: A comparison has to involve two things, so ‘both’ (the special-case version of the word ‘two’) only makes sense if the two things are being compared to a third.
English is full of cool, meaningful nuances. I hope we stop getting rid of them.

Seriously, everyone: English is wonderful. Why are you destroying it?

 

PS: same with “on the one hand”… We used to say “on one hand” (which makes sense… either one, any one, not a definite hand with a definite article)

Science-doing

There are (at least) two types of scientists: scientist-scientists and science-doers.

Both groups do essential, difficult, demanding, and crucial work that everyone, including the scientist-scientists, needs. The latter group (like the former) includes people who work in research hospitals, water-quality labs, soil-quality labs, linear accelerators, R-&-D labs of all kinds, and thousands of other places. They carry out the daily work of science with precision, care, and a lot of hard work. Yet, at the same time, in the process of doing the doing of science, they typically do not get the luxury of stepping back, moving away from the details, starting over, and discovering the less mechanical, less operational connections among the physical sciences, the social sciences, the humanities, technology, business, mathematics, and statistics… especially the humanities and statistics.

I am not a good scientist, and that has given me the opportunity to step back, start over, do some things right this time, and more importantly, through a series of delightful coincidences, learn more about the meaning of science than about the day-to-day doing of it.[1] This began to happen during my Ph.D., but only some of the components of this experience were due to my Ph.D. studies. The others just happened to be there for me to stumble upon.

The sources of these discoveries took the form of two electrical-engineering professors, three philosophy professors, one music professor, one computer-science professor, some linguistics graduate students, and numerous philosophy, math, pre-med, and other undergrads. All of these people exposed me to ideas, ways of thinking, ways of questioning, and ways of teaching that were new to me.

As a result of their collective influence, my studies, and all my academic jobs from that period, I have come to think of science not merely as the wearing of lab coats and carrying out of mathematically, mechanically, or otherwise challenging complex tasks. I have come to think of science as the following of, for lack of a better expression, the scientific method, although by that I do not necessarily mean the grade-school inductive method with its half-dozen simple steps. I mean all the factors one has to take into account in order to investigate anything rigorously. These include double-blinding (whether clinical or otherwise, to deal with confounding variables, experimenter effects, and other biases), setting up idiot checks in experimental protocols, varying one unknown at a time (or varying all unknowns with a factorial design), not assuming unjustified convenient probability distributions, using the right statistics and statistical tests for the problem and data types at hand, correctly interpreting results, tests, and statistics, not chasing significance, setting up power targets or determining sample sizes in advance, using randomization and blocking in setting up an experiment or the appropriate level of random or stratified sampling in collecting data [See Box, Hunter, and Hunter’s Statistics for Experimenters for easy-to-understand examples.], and the principles of accuracy, objectivity, skepticism, open-mindedness, and critical thinking. The latter set of principles is given on p. 17 and p. 20 of Essentials of Psychology [third edition, Robert A. Baron and Michael J. Kalsher, Needham, MA: Allyn & Bacon, 2002].

These two books, along with Hastie, Tibshirani, and Friedman’s The Elements of Statistical Learning and a few other sources that are heavily cited papers on the misuses of Statistics, have formed the basis of my view of science. This is why I think science-doing is not necessarily the same thing as being a scientist. In a section called ‘On being a scientist’ in a chapter titled ‘Methodology Wars’, the neuroscientist Fost explains how it’s possible, although not necessarily common, to be on “scientific autopilot” (p. 209) because of the way undergraduate education focuses on science facts and methods[2] over scientific thinking and the way graduate training and faculty life emphasize administration, supervision, managerial oversight, grant-writing, and so on (pp. 208–9). All this leaves a brief graduate or post-doc period in most careers for deep thinking and direct hands-on design of experiments before the mechanical execution and the overwhelming burdens of administration kick in. I am not writing this to criticize those who do what they have to do to further scientific inquiry but to celebrate those who, in the midst of that, find the mental space to continue to be critical, skeptical questioners of methods, research questions, hypotheses, and experimental designs. (And there are many of those. It is just not as automatic as the public seems to think it is, i.e., by getting a degree and putting on a white coat.)

 

Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, George E. P. Box, William G. Hunter, and J. Stuart Hunter, New York, NY: John Wiley & Sons, Inc., 1978 (0-471-09315-7)

Essentials of Psychology, third edition, Robert A. Baron and Michael J. Kalsher, Needham, MA: Allyn & Bacon, A Pearson Education Company, 2002

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, New York, NY: Springer-Verlag, 2009 (978-0-387-84858-7 and 978-0-387-84857-0)

If Not God, Then What?: Neuroscience, Aesthetics, and the Origins of the Transcendent, Joshua Fost, Clearhead Studios, Inc., 2007 (978-0-6151-6106-8)

[1] Granted, a better path would be the more typical one of working as a science-doer scientist for thirty years, accumulating a visceral set of insights, and moving into the fancier stuff due to an accumulation of experience and wisdom. However, as an educator, I did not have another thirty years to spend working on getting a gut feeling for why it is not such a good idea to (always) rely on a gut feeling. I paid a price, too. I realize I often fail to follow the unwritten rules of social and technical success in research when working on my own research, and I spend more time than I perhaps should on understanding what others have done. Still, I am also glad that I found so much meaning so early on.

[2] In one of my previous academic positions, I was on a very active subcommittee that designed critical-thinking assessments for science, math, and engineering classes with faculty from chemistry, biology, math, and engineering backgrounds. We talked often about the difference between teaching scientific facts and teaching scientific thinking. Among other things, we ended up having the university remove a medical-terminology class from the list of courses that counted as satisfying a science requirement in general studies.

This is a sci-fi story.

The icers were out in force that night. Joey really didn’t like running into them. It wouldn’t be dangerous if he didn’t wear his lukers leather jacket when he went out alone, but unless he was a full-time luker, and flaunting it, he felt like a traitor. He felt like he wasn’t “real” or wasn’t noteworthy enough to be out on the streets. He also didn’t want to run into any girls while not visually declaring allegiance to his chosen subculture, so he wore the identifiers of his subculture even though it meant he’d likely get beaten bloody by icers or somebody else.

The icers were truly extreme, Joey thought. They claimed they would rather die than drink any but the most extreme ice-cold tap water. Lukers, like Joey, weren’t so picky, and indeed preferred avoiding brain freeze.

Water was no longer the simple commodity previous generations took for granted. It wasn’t exactly unobtainable, but most people had to save to get their monthly allotment, which was a very small amount, or perform community service for extra water which would then be delivered automatically to their approved smarthomes. Now that games, movies, music, fashion, food, cars, drones, jetpacks, body alterations, and everything else a youth could want was readily available through picofabrication, automation, and biotech, it was ironically the most basic substance of life, water, that became scarce—because desalination remained expensive—and thus became the marker of one’s identity as a young person: their subcultural in-group.

Aside from the icers and lukers, there were half a dozen other fine-grained varieties of tap-water subcultures (like the boilers and the JCs, short for “just cold”).

Mostly high-school kids, and mostly idle due to the abundance of free scavenged (recycled) energy for their picoautomation and their neural implants, these youths roamed the streets in their picofabricated faux-leather jackets emblazoned with their subcultural affiliation, and picked fights with members of other groups.

After the first few years of water shortage, this expression of identity through the one scarce resource that was critical for survival began to expand. Through a naturally stochastic clustering process, some hairstyles and preferences for clothes or shoes became associated with particular groups.

It just happened the way it did. There is no reason icers should prefer fur-lined boots and Christmas sweaters. If anything, one would expect the opposite. Yet, they wear them even in the summer… in the 130° globally warmed summers of Cascadia. That’s how you know you’ve got a genuine subculture: The clothing has got to be uncomfortable; it’s gotta require sacrifice.

The lukers likewise somehow ended up all having to wear havaianas, 20th-century motorcycle helmets over long green hair, tank tops (what the British call “vests”), bandannas tied right at the elbow, one on each arm, and pajama pants with teddy bears sewn onto them. (None of them knew that this last little detail originated with a bassist in a combo of ancient “rock” music from back when music was made by people playing instruments rather than autonomous conscious AI units that wrote every kind of music straight into digital encoding.) The more teddy bears one’s pants had on them, the greater would be one’s status as a luker.

Joey had found the time to get his automation to sew 37 onto his favorite pajama pants and another 24 on a different pair. The fact that he consequently couldn’t run was a big part of why the icers picked on him so much. They, on the other hand, spent most of their time getting their picobots to learn to assemble themselves into fists and feet for delivering punches and kicks from a distance.

So, Joey called up a game in his neural implant as he and his 37 teddy bears set out onto the streets of Seaportouver in search of a thimbleful of lukewarm water. They braced themselves (not so much the teddy bears but Joey’s bio-body and all his affiliated picobots and neurally linked semi-autonomous genetic floaters) against the onslaught of icer attacks and against old people who looked disdainfully at his awkward teddy-bear-encumbered gait and transmitted unsolicited neuro-advice that clogged up his game for an entire interminable microsecond.

 

This mini sci-fi story is an attempt to draw a parallel between how ridiculous and unlikely such tap-water-based subcultures of street-fighting youth might seem to us, and how the music-based subcultures of my youth in the ’80s must seem to today’s youth.

Music, after all, is like water now: You turn the tap, and it pours out—out of YouTube, Spotify, Pandora, Slacker, or SoundCloud, and in a sense, also out of GarageBand, FruityLoops, Acid, and myriad other tools for generating music from loops. A few dozen people in a few offices in LA may make those loops—they’re like the people working the dams and the people who run the municipal water bureau or whatever. They supply the water that we take for granted, and it just flows out of the tap, not requiring any thought or effort on our part about how it got there or how much of it there might be. Music today works the same way. You exchange memory cards or streaming playlists; you download free software that allows you to drag and drop loops and which makes sure they are in the same key and tempo. It’s about as complicated as making lemonade. Why would such a thing have any relation to one’s identity and individuality?

In contrast, when I was young, I had to save money for a year and still beg my parents for a long-playing record. I could also occasionally buy some cheap tapes or record songs off the radio (almost always with the beginning cut off and with a DJ talking over the end) onto cheap low-fi cassettes that had more hiss than hi-hat. My first compact disc, a birthday present from a wealthy relative, was like an alien artifact. It still looks a bit magical to me… so small and shiny. Today, I hear they’re referred to as “coasters” because… why bother putting music on a recording medium when it’s free and ubiquitous? 

Subculture-as-identity-marker has disappeared except among the old. (How old is Iggy today, or the guys from The Clash?) Young people today dress in combinations of the “uniforms” of ’50s, ’60s, ’70s, ’80s, and ’90s subcultures without having any interest in the sociopolitics or music of those subcultures. The last three times I talked to a (seemingly) fellow goth or punk rocker, they reacted with mild repulsion at the suggestion that they might listen to such music.

Expressing allegiance to a musical subculture must seem as silly to today’s youth (say, through age 30 or so) as expressing allegiance to a temperature of water would seem to anyone.

 

Zeno’s thermometers?

A friend just told me about the xkcd idea for the “Felsius scale”, which is the arithmetic mean of the Fahrenheit and centigrade (Celsius) scales. Naturally, my first thought was that this was a funny but pointless idea since it discarded the advantages of the centigrade scale. (The scale was renamed ‘Celsius’, but I’m using the old name to emphasize the 0-to-100 advantage.) A better discussion is found here: http://www.explainxkcd.com/wiki/index.php/1923:_Felsius

My friend, however, suggested that it was a step in the right direction.

If so, it isn’t enough. If this idea were to take hold, we would need another such step in the right direction, perhaps to be called “Eelsius”, which would take us another 50% of the way to Celsius, and eventually another halfway jump to “Delsius”. The result (aside from running out of characters between ‘f’ and ‘c’) would be a nice little Zeno’s paradox of temperature-scale systems that asymptotically approach the logical centigrade scale.
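The halving scheme is easy to write down. Here is a minimal sketch, assuming each new scale closes exactly half of the remaining gap to Celsius, so that the reading after n steps is C + (F − C) / 2^n. The function names `fahrenheit` and `zeno_scale` are made up for illustration.

```python
def fahrenheit(celsius: float) -> float:
    """Standard Celsius-to-Fahrenheit conversion."""
    return celsius * 9.0 / 5.0 + 32.0


def zeno_scale(celsius: float, steps: int) -> float:
    """Reading after `steps` halvings of the gap between Fahrenheit and Celsius.

    steps=0 is Fahrenheit, steps=1 is Felsius (the mean of the two),
    steps=2 would be "Eelsius", steps=3 "Delsius", and so on.
    """
    return celsius + (fahrenheit(celsius) - celsius) / 2 ** steps


if __name__ == "__main__":
    for name, n in (("Fahrenheit", 0), ("Felsius", 1), ("Eelsius", 2), ("Delsius", 3)):
        print(name, zeno_scale(20.0, n))
```

For a 20-degree (centigrade) day the readings go 68, 44, 32, 26, creeping toward 20 without ever arriving, which is the Zeno part.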

Overfitting, Confirmation Bias, Strong AI, and Teaching

I was asked recently by a student about how machine learning could happen. I started out by talking about human learning: how we don’t consider mere parroting of received information to be the same as learning, but that we can make the leap from some examples we have seen to a new situation or problem that we haven’t seen before. Granted, there need to be some similarities (shared structure or domain of discourse; we don’t become experts on European Union economics as a result only of learning to distinguish different types of wine), but what makes learning meaningful and fun for us is the ability to make a leap, to solve a previously inaccessible problem or deduce (really it’s ‘induce’) a new categorization.

In response, the student asked how machines could do that. I replied that not only do we give them many examples to learn from, but we also give them algorithms (ways to deal with examples) that are inspired by how natural systems work: inspired by ants or honeybees, genetics, the immune system, evolution, languages, social networks and ideas (memes), and even just the mammalian brain. (One difference is that, so far, we are not trying to make general-purpose consciousness in machines; we are only trying to get them to solve well-defined problems very well, and increasingly these days, not-so-well-defined problems also.)

So, then the student asked how machines could make the leap just like we can. This led me to bring up overfitting and how to avoid it. I explained that if a machine learns the examples it is given all too well, it will not be able to see the forest for the trees—it will be overly rigid, and will want to make all novel experiences fit the examples in its training. For new examples that do not fit, it will reject them (if we build that ability into it), or it will make justifiable wrong choices. It will ‘overfit’, in the language of machine learning.
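A toy illustration of that rigidity, as a sketch on made-up data rather than a real experiment: the true rule is simply "label 1 if x ≥ 0.5", but two of the twenty training labels are deliberately flipped to stand in for noise. One model memorizes every training example (a one-nearest-neighbor lookup); the other is only allowed a single threshold. All names and numbers here are hypothetical.

```python
def true_label(x: float) -> int:
    """The underlying rule the learner is supposed to discover."""
    return 1 if x >= 0.5 else 0


# Twenty evenly spaced training points; two labels are flipped as "noise".
train_x = [i / 20 for i in range(20)]
train_y = [true_label(x) for x in train_x]
for noisy_index in (3, 14):
    train_y[noisy_index] = 1 - train_y[noisy_index]

# Unseen test points, slightly offset from the training points, with clean labels.
test_x = [(i + 0.3) / 20 for i in range(20)]
test_y = [true_label(x) for x in test_x]


def memorizer(x: float) -> int:
    """Overfit model: copy the label of the closest training example."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]


def training_errors(threshold: float) -> int:
    """Mistakes on the training set made by the rule 'x >= threshold'."""
    return sum((1 if x >= threshold else 0) != y for x, y in zip(train_x, train_y))


# Simple model: the single threshold with the fewest training mistakes.
best_threshold = min(train_x, key=training_errors)


def threshold_model(x: float) -> int:
    return 1 if x >= best_threshold else 0


def accuracy(model, xs, ys) -> float:
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    for name, model in (("memorizer", memorizer), ("threshold", threshold_model)):
        print(name, accuracy(model, train_x, train_y), accuracy(model, test_x, test_y))
```

The memorizer is perfect on the examples it was given and then repeats their two mistakes on new data. The threshold model "fails" on the two noisy training examples and, precisely because it cannot bend to fit them, gets the unseen points right.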

Then it occurred to me that humans do this, too. We’ve all probably heard the argument that stereotypes are there for a reason. In my opinion, they are there because of the power of confirmation bias (not to mention, sometimes selection bias as well—consider the humorous example of the psychiatrist who believes everyone is psychotic).

Just as a machine-learning algorithm that has been presented with a set of data will learn the idiosyncrasies of that data set if not kept from overfitting by early-stopping, prestructuring, or some other measure, people also overfit to their early-life experiences. However, we have one other pitfall compared to machines: We continue to experience new situations which we filter through confirmation bias to make ourselves think that we have verification of the validity of our misinformed or under-informed early notions. Confirmation bias conserves good feelings about oneself. Machines so far do not have this weakness, so they are only limited by what data we give them; they cannot filter out inconvenient data the way we do.

Another aspect of this conversation turned out to be pertinent to what I do every day. Not learning the example set too well is advantageous not only for machines but for people as well, specifically for people who teach.

I have been teaching at the college level since January 1994, and continuously since probably 2004, and full-time since 2010, three or four quarters per year, anywhere from two to five courses per quarter. I listed all this because I need to point out, for the sake of my next argument, that I seem to be a good teacher. (I got tenured at a teaching institution that has no research requirement but very high teaching standards.) So, let’s assume that I can teach well.

I was, for the most part, not a good student. Even today, I’m not the fastest at catching on, whether it’s a joke, an insult, or a mathematical derivation. (I’m nowhere near the slowest, but I’m definitely not among the geniuses.) I think this is a big part of why I’m a good teacher: I know what it’s like not to get it, and I know what I have had to do to get it. Hence, I know how to present anything to those who don’t get it, because, chances are, I didn’t get it right away either.

But there is more to this than speed. I generate analogies like crazy, both for myself and for teaching. Unlike people who can operate solely at the abstract level, I make connections to other domains—that’s how I learn; I don’t overfit my training set. I can take it in a new direction more easily, perhaps, than many super-fast thinkers. They’re right there, at a 100% match to the training set. I wobble around the training set, and maybe even map it to n+1 dimensions when it was given in only n.

Overfitting is not only harmful to machines. In people, it causes undeserved confidence in prejudices and stereotypes, and makes us less able to relate to others or think outside the box.

One last thought engendered by my earlier conversation with this student: The majority of machine-learning applications, at least until about 2010 or maybe 2015, were for well-defined, narrow problems. What happens when machines that are capable of generalizing well from examples in one domain, and in another, and in another, achieve meta-generalization from entire domains to new ones we have not presented them with? Will they attain strong AI as a consequence of this development (after some time)? If so, will they, because they’ve never experienced the evolutionary struggle for survival, never develop the violent streak that is the bane of humankind? Or will they come to despise us puny humans?

 

Four simple tricks to solve many of your grammar questions without having to search online

REMOVE A WORD: “Me and Mike like ice cream.” becomes “Me like ice cream.” Apparently, that’s not it, so the original sentence should have been “I and Mike (both) like ice cream.”

ANSWER A QUESTION: “Who should I ask?” The answer could be: “You should ask hiM.” Therefore, the first sentence should have been “Whom should I ask?”—the ‘m’s match.

ASK A QUESTION: “industrial music group”: What kind of group? The industrial kind (as well as music kind)

as opposed to

“industrial-music group”: What kind of group? The industrial-music kind

CHANGE THE ORDER: Swapping the modifiers in “industrial music group” gives “music industrial group,” which fails in a different way (and also means something very different) than “red big house” fails in comparison to “big red house” (so, a hyphen was needed).

“Big red house” is both correct and proper, and a hyphen would be wrong between ‘big’ and ‘red’. The two modifiers ‘big’ and ‘red’ are independent of each other; they act separately. The house could also have been small and red, or big and green.

The other two modifiers, ‘industrial’ and ‘music’ (the latter a noun that tells what type of group) are not independent when what we mean is Einstuerzende Neubauten or Cabaret Voltaire. The opposite is true when we are talking about Roland, Yamaha, Korg, and Nord, for example.

“Verbing weirds language.”

“Verbing Weirds Language” (not so fast)

I have heard this term quite a lot lately, and it certainly gets its point across. However, as a speaker of languages from more than one family (language family, that is), I find that it’s shortsighted.

In languages that use helping verbs to make verbs out of nouns and adjectives (Japanese: suru; Turkish: etmek), there is no such problem, and, it seems, very little accompanying debate of descriptivism versus prescriptivism. For a multiculturally informed version of this aphorism, I suggest saying something along the lines of “Verbing weirds English.” (or “Verbing weirds Indo-European languages.”) because we Turks have been doing it without any weirding for quite some time, not to mention the Japanese and others—recall the infamous ‘bushusuru’.

Furthermore, verbing is not necessary. Take the new “fail” as a noun. The word ‘fail’ is a verb. The noun form is ‘failure’. I am familiar with the contention between descriptivism and prescriptivism. Descriptivists usually argue that language evolves. “It always has, so let it continue to do so.” However, we do not support this line of reasoning in other matters. Human beings necessarily form judgments and opinions based on confirmation bias, regression to the mean, and various well-documented heuristics and biases. Even those who are aware of these mistakes continue to make them much of the time. This, then, is how people behave. According to the descriptivist approach, we should let biases rule, and let informed thinking go by the wayside: The property of being something that occurs very often as part of human behavior does not bestow a sacred status on verbing or biases. Language evolves, as it perhaps ought to, but mindful people can strive for a direction of linguistic evolution that does not reduce clarity, increase redundancy, and encourage laziness of mind. I am all for positive evolution in the English language: changes that will make it more consistent, more rational, easier to understand, less redundant, and more elegant. Many of the changes brought about by the Internet and smart phones do not have this effect. To illustrate my point, I will include many examples of how brilliantly awesome English actually is below, but first, a bit more about verbing (and nouning).

I was reading the membership conditions for a retail chain, and realized that perhaps nouning[i] bothers me more than verbing. But why should either one? English is a language that has many words that serve as a verb and a noun with no change in spelling or pronunciation (‘park’, for example). Yet, as a native speaker of a non-Indo-European language, which has its moments of logical consistency, and which uses helping verbs so that neither verbing nor nouning ever needs to happen, and further, as a native-level English speaker of 33 years[ii], I value the superior logical consistency of English, and don’t like seeing it eroded. (Note: Of course I realize that English pronunciation and spelling are not logical or consistent; there is a wonderful demonstration here: https://www.youtube.com/watch?v=2_dc65V7DV8. But trust me, and read on: English grammar, syntax, and punctuation are superbly, surprisingly, wonderfully logical.) I will start small.

It may not be worth it, because I decided to give everyone full credit. (This means it ain’t worth it.)

It may not be worth it because I decided to give everyone full credit. (This means it may be worth it for some other reason.)

From a student paper: “Humans are over populating the world.” (This seems to indicate that humans have stopped reproducing, that they are no longer interested in populating the world. What the student really meant was “humans are overpopulating the world.”)

Next, I would like to discuss hyphenation. Aniruddh Patel, in one of his talks, describes the hierarchical nature of language as follows: “If you know English, and I say the following sentence, ‘The girl who kissed the boy opened the door.’, … there is a sequence of words in that sentence: ‘the boy opened the door’ … But, if you speak English and understand English, you know it’s not the boy that opened the door; it’s the girl that opened the door. In other words, you don’t just interpret language in a left-to-right fashion. …” He goes on to explain that the phrases are hierarchically related such that ‘girl’ is linked to ‘opened’, not ‘boy’, which is right next to ‘opened’.

Hyphens help us with another hierarchical aspect of language. An arithmetic analogy will help demonstrate this. 4 + 2 × 3 = 10 because precedence tells us to multiply 2 and 3 first. In arithmetic, we can use parentheses to change the hierarchy and override precedence: (4 + 2) × 3 = 18. This is exactly what hyphens do: They group words into concepts. Notice that the hyphen, typographically speaking, is rather short. It’s shorter than any of the letters in a monospaced font. That’s because, unlike dashes, hyphens serve to combine, not separate. Dashes, which are either ‘n’ long or ‘m’ long, serve to push words apart. The en dash, for example, is used for ranges, like 9–5 and Seattle–Atlanta. But let’s get back to hyphens.

Hyphens join two words into one concept, as in two-car garage, one-man band, and land-grant university. ‘Two’ and ‘car’ started life as separate concepts. In ‘two-car garage’, they are combined into a single concept, just as (4 + 2) got combined into a single number, 6. Again, just as 6 acted as one entire and single number on 3 in that multiplication above, ‘two-car’ acts as a single concept of garage size, when neither ‘two’ nor ‘car’ ordinarily signifies size or width.
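The arithmetic half of the analogy can be checked directly, and the hyphen half can be imitated with nested grouping. The short Python sketch below is only an illustration: representing a phrase as nested tuples and the helper name `render` are assumptions made for this example, not an established way of modeling grammar.

```python
# Precedence versus explicit grouping in arithmetic.
default_reading = 4 + 2 * 3      # multiplication binds first
grouped_reading = (4 + 2) * 3    # parentheses override precedence
print(default_reading, grouped_reading)


def render(phrase):
    """Write out a phrase given as a tuple of parts.

    A part that is itself a tuple is a group of words acting as one
    concept; its words are joined with hyphens. Top-level parts are
    joined with spaces.
    """
    parts = []
    for part in phrase:
        if isinstance(part, tuple):
            parts.append("-".join(part))  # the hyphen plays the role of parentheses
        else:
            parts.append(part)
    return " ".join(parts)


# ('two', 'car') is grouped first, then applied to 'garage',
# just as (4 + 2) is evaluated first and then applied to 3.
print(render((("two", "car"), "garage")))
print(render((("land", "grant"), "university")))
```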

A friend once asked (on facebook): [. . .] purple people eaters: Are they purple people who eat people or people who eat purple people? While this may have been posted in jest, the logic applies to serious cases where the intended meaning matters. I responded: “Purple-people eaters eat people who are purple, while purple people eaters are purple in color themselves. In English, the modifiers gang up on the noun at the end unless you hyphenate.” I followed this up with two examples. “A community college association is a group of college-related people from the community whereas a community-college association is a group of colleges. In other languages, nouns have cases, so you don’t have these problems.” (Just as verbing doesn’t weird all languages, this type of problem also completely fails to occur in Turkish because nouns in noun strings get modified with suffixes that place them in their proper cases, doing the job of hyphens in English. The difficulty for native speakers of English is that they grow up speaking English, wherein it’s difficult to hear the sound of the hyphen. In Turkish, there is no mistaking the suffix; you hear it from the time you’re a little baby.)

My second, and perhaps better, example was the following. “The phrase ‘lake of fire Christians’ implies a lake consisting of ‘fire Christians’, whatever that would mean. What is typically meant is ‘lake-of-fire Christians’. English is ‘endian’ unless you override that with hyphens. In spoken language, we use inflection to make these things clear (which, again, is why many native speakers have a harder time than some ESL-speakers).”

The subject line of an e-mail I received said, “Security for the Cloud Lunch in Portland”

This seems to indicate that someone is setting up security for a lunch event in Portland. What they meant, of course, was “Security-for-the-Cloud Lunch in Portland.” This type of mistake can very easily be avoided by rejecting the temptation to produce noun strings, and using the three little powerful words that make English work: ‘of’, ‘for’, and ‘from’. Calling it “Lunch Meeting about the Security of the Cloud, to take place in Portland” would remove all ambiguity.

Likewise, on the back of a 45-RPM record I recently bought[iii], the recording location is identified as “the crazy cat lady house” (another noun chain). It is clear, of course, that what is intended is “the crazy-cat-lady house” where “crazy cat lady” is a compound modifier, a unified concept and single descriptor for the house. Without proper hyphenation, the meaning is open to interpretations such as “the crazy-cat lady-house” (with the last hyphen not strictly necessary, but placed because this blog post is written, not spoken).
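To see how quickly an unhyphenated noun chain becomes ambiguous, one can list every way of grouping its words two at a time. The Python sketch below does that; the function name `bracketings` is hypothetical, and treating every binary grouping as a possible reading is a simplifying assumption (real readers rule most of them out by common sense).

```python
def bracketings(words):
    """Return every way to group a list of words into nested pairs."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Split the chain at every position and group each side recursively.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"({left} {right})")
    return results


readings = bracketings(["crazy", "cat", "lady", "house"])
for reading in readings:
    print(reading)
print(len(readings), "possible groupings")
```

Among the groupings printed are `((crazy (cat lady)) house)`, which matches the intended “crazy-cat-lady house,” and `((crazy cat) (lady house))`, which matches the stray reading “crazy-cat lady-house.”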

My first car was used, so I was a new car owner. I recently bought my third car, and it was brand new, so I am now a new-car owner (but not a new car owner).

The Coursera privacy policy states, “If you participate in an online course, we may collect from you certain student-generated content, such as assignments you submit to instructors, peer-graded assignments and peer grading student feedback.”

Note that the first (correct) expression “peer-graded assignments” and the later (incorrect) expression “peer grading student feedback” ought to be based on the same reasoning, so how could they end up different? It seems, based not only on this example, that people use other mental processes, not reasoning, for determining whether to hyphenate or not. These processes could be memorization, template-matching, or aesthetics.

There is support for this possibility. American English, as opposed to British English, is template-based in its treatment of punctuation with respect to quotation marks: They always have to be inside, unless they’re large characters like ‘?’. This is a purely aesthetic choice, and is not logical. The IT industry has, in recent years, protested this and switched to logical/British punctuation. I saw this reflected in Microsoft Word grammar-correction recommendations as early as 2012. (Way to go, Microsoft!)

“An Early Bird Sound Collage” is the title of a work by an experimental-music band. Do they mean it’s an early bird-sound collage, or an early-bird sound collage? (They are experimental, so the intended meaning could easily have been either.)

“Introducing the Möbius-Twisted Turk’s Head Knot” is a paper title from the Bridges 2015 conference. They got the first one right. Now, is it a twisted Turk and his head knot, or is it the Turk’s-head knot that’s twisted?

Compare the following. “Small plane crash” where we don’t know how big a plane it was, but the crash was a minor concern, and “small-plane crash” where we know that the plane was small, but the crash could still have been quite a big deal. In most cases, it is better not to be stingy with words; something like “a big crash involving a small plane” would be much clearer.

My next example is from course packs. In one case, one might be able to get a refund: “All packets are not refundable.” (unclear) vs. “All packets are non-refundable.” (quite clear!)

Here are some more examples from academia. There is a big difference between the “higher-ed budget” and a “higher ed budget” (we all want the latter). An “online learning report” is a report that gets posted online, while an “online-learning report” does not have to be posted, but it is about online learning. How about “main session outline” versus “main-session outline”?

What is the opposite of the right-hand rule? And what is the opposite of the right hand rule?[iv] The opposite of the former would be the left-hand rule, whereas the opposite of the latter would be the wrong hand rule.

Compare the expressions “proof of concept viruses for Linux” with “proof-of-concept viruses for Linux.” Again, from the tech fields: “no load gain” could be the opposite of “no-load gain”!

Even if it’s a stretch, one of the following could be about athletic performance whereas the other is clearly about academic performance: “college grade-point average” and “college-grade point average”

Here’s one from the field I teach in: Which technical term does not limit the model size: “small-signal model” or “small signal model”?

“Portland’s first clean air cab” was meant to indicate a regular car that doesn’t pollute, but is written like a flying car that is not dirty.

I saw this on the web as well: THE HAITIAN TERRACING FOR HOPE PROJECT. One wonders if Hope Project will get some Haitian terracing, or if there is a Haitian project called ‘Terracing for Hope’. Since hyphenation cannot be imposed on proper nouns (such as the official name of a project), using one of the magic little words or changing word order could have helped this case: “A Project in Haiti: Terracing for Hope” or “The Haitian Project of Terracing for Hope” or “Terracing for Hope, a Haitian Project,” etc.

And how about all this free stuff we’re being sold all the time? This was seen on a billboard: “NEW TRANS-FAT FREE” with the word “free” on a separate line. It appears to imply that the new product has trans-fat, and is free. Likewise with all these products that sport the expression “gluten free”: apparently, there is gluten in it, but we’re not paying for the gluten.

And then, there is verb hyphenation, which confuses many people I know even more. Verbs are not hyphenated when used as verbs. However, when non-verbs are used with verbs as a compound verb, they do get hyphenated (and this is common sense). For example, “how to fly-fish” is very different from “how to fly fish.” In the latter, one tries to make fish fly. Likewise, “moonbathing” is about a person enjoying moonlight, whereas an expression like “moon bathing” suggests it’s the moon doing the bathing (assuming the rest of a proper sentence surrounds that expression). Similarly, “battle ready” (the battle is ready?) is very different from “battle-ready” (a compound modifier that shows that some person, equipment, or army is ready for battle).

Even after verbing turns a new word like ‘blog’ into a verb as well as a noun, combining it with the noun/adjective ‘video’ requires hyphenation: “learning to video blog” makes no sense, while “learning to video-blog” does.

Alright, perhaps we do not need to be so vigilant about compound modifiers all the time. Here is one I have seen where even I have to admit that context and common sense are quite sufficient to know what is meant even in the absence of a compulsively placed hyphen. It is “sexual abuse hysteria.” Perhaps, no hyphen is needed when an adjective becomes an adverb. I think this one is clear without the need for a hyphen.

Before leaving hyphenation, I must address adverbs. Adverbs are not hyphenated (although many well-meaning and thoughtful individuals do hyphenate them).

You only need hyphenation when the target of a modifier is ambiguous, which is why adverbs are not entered into hyphenated compounds. Recall that in one of the examples above, ‘purple’ had the option of referring to the people being eaten or to the creature doing the eating, so we had to specify which by knowing when to and when not to use a hyphen. In the case of adverbs, as in “culturally sensitive employer,” for instance, there is no question about which word ‘culturally’ is attached to; there is no such thing as a “culturally employer”; so there is no ambiguity, and no need to waste time with hyphens.

Commas are another matter disproportionately consequential in comparison to the size of the punctuation mark involved. The following examples come from a variety of sources, but mostly from a delightfully brilliant book, to which I was introduced[v] during University Studies teacher training at Portland State University: Maxwell Nurnberg’s Questions You Always Wanted to Ask about English [1].

  1. Which statement clearly shows that not all bacteria are sphere-shaped?
     a) Christian A. T. Billroth called bacteria which had the shape of tiny spheres ‘cocci’. (In this case, it is implied that only some bacteria are spherical.)
     b) Christian A. T. Billroth called bacteria, which had the shape of tiny spheres, ‘cocci’. (In this case, it is implied that all bacteria are spherical.)
  2. Which sentence shows extraordinary powers of persuasion?
     a) I left him convinced he was a fool. (He is convinced, not I.)
     b) I left him, convinced he was a fool. (I am convinced.)
  3. Which is the dedication of a self-confessed polygamist?
     a) I dedicate this book to my wife, Edith, for telling me what to leave out. (In this case, he has one wife, whose name is Edith.)
     b) I dedicate this book to my wife Edith for telling me what to leave out. (In this case, we are led to believe he has at least one wife other than Edith. If it’s not clear why this is the case, see # 5 or # 6 below. The comma starts an explanation of who is being referred to.)
  4. In which sentence are you sure that “somatic” and “bodily” mean the same?
     a) Radioactive materials that cause somatic, or bodily, damage are to be limited in their use. (In this case, ‘bodily’ is offered as a more familiar synonym for ‘somatic’.)
     b) Radioactive materials that cause somatic or bodily damage are to be limited in their use. (In this case, the implication is that ‘somatic’ and ‘bodily’ are mutually exclusive, hence mean different things.)

Nurnberg’s examples were my introduction to the power of the comma. They went beyond the boilerplate rules I had been taught, like “Never place a comma before ‘because’!” and “Always put a comma before ‘too’!”

Soon, I was noticing commas where they should not have been, and a lack of commas where they were badly needed.

I read the following at http://www.riskshield.com.au/Glossary.aspx [2]: “Technique, procedure and rule used by risk manager to identify asses and examine the risks.” The missing comma could really have helped with the change in meaning caused by the missing ‘s’ in ‘assess’. On the other hand, perhaps the risk manager’s job really is to identify asses. If so, this is one particularly frank document.

In the book MIDI Systems and Control [3], I came across the following, “RS422 … is a standard for balanced communications over long lines devised by the EIA (Electronics Industries Association)” (p. 23). Without a comma right after ‘long lines’, the sentence is open to the interpretation that it was long lines that were devised by the EIA, as opposed to RS422.

Here is some correct comma use (as one would expect from Stanford University). In The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman, there is a sentence “. . . in a study to try to predict whether the email was junk email, or ‘spam’” (p. 2). The comma is used, appropriately, in its explanation-signifying role. Later on, however, the authors say “In the handwritten digit example the output is one of 10 different digit classes . . .” (p. 9). This clearly needs a hyphen connecting ‘handwritten’ and ‘digit’. Currently (without the hyphen), it is the example that is handwritten, not the digits. This difference could be meaningful if one were referring to a solutions manual, for example, where examples are often handwritten.

Here’s an example I must have gotten from someone else or a book: “King Charles walked and talked; half an hour after, his head was cut off.” Let’s try it without the comma and the semicolon: “King Charles walked and talked half an hour after his head was cut off.” This is reminiscent of the English-class exercise that was making the rounds on facebook at one point: “A woman without her man is nothing.” which can be punctuated either as “A woman: without her, man is nothing.” or as “A woman, without her man, is nothing.” [4]

Grammarly also posted this headline about Peter Ustinov’s travels: “Highlights of his global tour include encounters with Nelson Mandela, an 800-year-old demigod and a dildo collector.” The absence of the Oxford comma turns what was meant as a list of three entities into one entity (Mandela) and a description of him (an 800-year-old demigod who collects dildos). As I mention elsewhere, the structure of other languages (such as Turkish) may be such that the Oxford comma is useless, but it clearly makes a difference to the meaning of a sentence in English.

Here’s an instance from an e-mail I once wrote. I had asked my friend, “Did I go too far into unnecessary details by way of explaining what I’m doing and why?” This question addresses my explanation of what I was doing and why I was doing it. It is different from “Did I go too far into unnecessary details by way of explaining what I’m doing, and why?” which asks why my explanation is considered to be excessive.

Nurnberg does a much better job of revealing the power and importance of the comma in his book than my examples here do. I think everyone who writes in English should own a copy and read it.

I also want to address redundancy a little. Someone once said this to me during a conversation: “. . . considering they’re both not at the same time. . .” This would make sense if the two events spoken of were not coincident with a third event, but, in this case, there was no third event. All that was meant was “considering they’re not at the same time . . .”

I also frequently hear “continue on …” and “return back …” (This is frustrating.) To continue is to go on. Therefore, to continue on becomes ‘to go on on’. (We’re all familiar with “ATM machine” and “PIN number”…)

Redundancy is necessary when life-threatening situations are handled by electronic or electro-mechanical systems. Let’s leave the redundancy to those cases, and stop wasting our breath with it.

And then there is the centipede sentence: “I’d take MLK Boulevard would be my suggestion.”

My boss at my old job, fifteen years ago, was amazing at these. I wonder if he composed any regular sentences; they all seemed to be along the lines of “The reason is is that there is an address conflict was what happened.” (Otherwise, he was clearly a genius. I don’t know why he talked like that.)

I have now firmly established myself, in this post, as the worst possible killjoy nerd geek compulsive so-called “grammar N**i” ever, so let me close on a positive note.

I wrote this ridiculously long post because I love the English language, and I want people, especially its native speakers, to treat it well. I also love Turkish, Portuguese, German, and Japanese, and if you have ever read engineering material written in English by Japanese engineers, you know that the structure of non-Indo-European languages must be very different. The agglutinative use of cases in Turkish makes most of the discussion of this post unnecessary, but there is one thing Spanish, Portuguese, English, etc. have that I’m quite envious of: THE SUBJUNCTIIIIIVE (Cue dark, scary music.)

The subjunctive, which is still going strong in Spanish, but has only a few surviving occasions of use in English, draws a distinction between fact and possibility, or between wishes and truths: “if he were” makes it sound wrong to indicate something hypothetical as being actual. (Note how it does not go “if he was” but switches to the awkward subjunctive, the non-reality case.) This is scientific thinking built into the language. The speaker is required to differentiate between factual cases and wishing or wondering. (If only we could get the health-care industry to differentiate between factual evidence- and mechanism-based care and wishful-thinking-based care.)

In conclusion: Languages are awesome. Let’s not stand by and watch them get eroded into redundancy and lack of clarity by mental (and technologically aided) sloth. If languages change, fine; let them change well, preserving the characteristics that allow humans to communicate with precision and subtlety. Good communication saves lives.

And what is the deal with “the both” in 2016? The expression ‘both of’ is intended for a different meaning than the expression ‘the two’. There was no reason to make a hybrid that goes ‘the both’. Please, everyone, stop saying this.

“The two of them went away, but we stayed.” (There were more than two people involved.)

“We both went away.” (There were only two people involved.)

English has set up this great way to incorporate set theory into the language. Why are we messing it up? Consider the following:

“Are you ready to compare both files?” (I would be, if I wanted to compare two files each with a third. However, if the comparison were simply between two files, it’s “Are you ready to compare the two files?”)

PS: I need to get this one off my chest, too: “Computation methods” would be methods of computation, whereas “computational methods” would be methods that make use of computation. And don’t get me started on ‘methodologies’. How many people actually study methods? (I’m glad to have witnessed this being brought up at a discussion during the AAWM 2016 conference.)

There.

[1] Nurnberg, M., Questions You Always Wanted to Ask about English (but were afraid to raise your hand), New York: Washington Square Press, 1972.

[2] http://www.riskshield.com.au/Glossary.aspx (not there anymore)

[3] Rumsey, F., MIDI Systems and Control (Second Edition), Oxford, UK: Focal Press, 1994.

[4] facebook.com/Grammarly

[i] “12-month spend of $500” it said. At least it’s hyphenated correctly, but what was wrong with the noun ‘spending’ that we need a new noun to replace it?

[ii] People who know me can attest to this.

[iii] From the band NASALROD

[iv] Again, altering the word order could clarify such cases: After all, we never say “thumb rule” instead of “rule of thumb”; we always take the time to say “rule of thumb”!

[v] Like most of the important things in life