Look, it’s really simple…

― Dude, did you see all this stuff people are doing with CobraSpit 17.6?!!

― Yeah, I’ve been wanting to do that for a while, but I had some trouble installing it. I went to the CobraSpit download site and it said:

If you’re using [some OS], follow these instructions

If you’re using [another OS], follow these instructions.

If you’re using Windows, please switch to Linux. We can’t be bothered.

― So I went on a bunch of forums, you know, pancakexchange and stuff, and finally found someone who said: “If you have to use Windows, install the Windows versions of the following tools native to Linux. [list of tools]”

― So, I installed everything, but SmokeRrring 4.2 conflicted with Squeek and I found out that’s because it can only work with Squeek 3 which is completely different from Squeek 2. However Rootabaga only works with Squeek 2 unless you install PPa (pron. Papua), which I did. Now all I had to do was to uninstall Squeek 2, but then:

“If your computer ever had a Squeek 2 installation or has even been in the same ZIP code as another machine with Squeek 2, you should set fire to your machine and switch to Linux. If you insist on using Windows, you may be able to get around the Squeek-2-vs.-3 problem by using Screech 5. Screech 5 is for Ubuntu only, but you can install CrumblyCake 3 to make it work in Windows.”

(An hour later) “CrumblyCake 3 only works with Windows 7. If you have Windows 10, install WhippedCream 8.1 and Whisk 9.2 running inside Battenberg 4 in your CatUnicornLizard-8R or oRCAsCORPIONhAWKiBEXtIGER environment.”

So you look these up and (after 25 minutes of reading puns on “ate” and “eight”) find out that CatUnicornLizard-8R [CUL8R] has been bought by a corporation and incorporated into their “solution” which costs $12,000/year (and more if you want support).

orCAscorPIONhaWKibEXtiGER is still open-source, but to run on Windows, it needs TurnTable 17.3 running on top of Onions 4. The website for Onions 4 comes up “404” and the forums suggest another dozen or so layers of stuff you can install in its place.

Still helpful and sympathetic, your friend says:

― Alternatively, you can start a Neanderthal 3 process within an Urrk channel running on top of your Dayvid stack in Aghast, and if you can talk to that with a Gennifur script—Hey! How about building a UNIX box for that from scratch? … And why were we doing this anyway?

― We were gonna use CobraSpit 17.6 to do some cool stuff really quickly.

― Oh, nobody uses that anymore. You should try PixieRust 8.0. It’s like Banana 3 but better.

― How does that work?

― Well, you need to install a SeaShell environment first, but that only works if you have Coral Wreath 9, so start by creating a Babylon 5.0 sandbox inside a BurritoCircus virtualizer running on Celery. You need Dwindle 3 for that though, which is really just an Ohmlette 5.9 instance in a fryingPeterPan shell, so it’s no big deal if you’re running Ubuntu.

― Ah… I was on Windows, remember?

― Well, then why don’t you just install Screech 5?!

― Uhmm, that’s what we were trying to do in the first place.



[Look, it’s really simple… or, “Can we just go back to FORTRAN on VAX/VMS?”]

Teaching machine learning within different fields

Everyone is talking about machine learning (ML) these days. They usually call it “machine learning and artificial intelligence” and I keep wondering what exactly they mean by each term.

It seems the term “artificial intelligence” has shaken off its negative connotations from back when it meant top-down systems (as opposed to the superior bottom-up “computational intelligence” that most of today’s so-called AI actually uses) and has come to mean what cybernetics once was: robotics, machine learning, embedded systems, decision-making, visualization, control, etc., all in one.

Now that ML is important to so many industries, application areas, and fields, it is taught in many types of academic departments. We approach machine learning differently in ECE, in CS, in business schools, in mechanical engineering, and in math and statistics programs. The granularity of focus varies, with math and CS taking the most detailed view, followed by ECE and ME departments, followed by the highest-level applied version in business schools, and with Statistics covering both ends.

In management, students need to be able to understand the potential of machine learning and be able to use it toward management or business goals, but do not have to know how it works under the hood, how to implement it themselves, or how to prove the theorems behind it.

In computer science, students need to know the performance measures (and results) of different ways to implement end-to-end machine learning, and they need to be able to do so on their own with a thorough understanding of the technical infrastructure. (If what I have observed is generalizable, they also tend to be more interested in virtual and augmented reality, artificial life, and other visualization and user-experience aspects of AI.)

In math, students and graduates really need to understand what’s under the hood. They need to be able to prove the theorems and develop new ones. It is the theorems that lead to powerful new techniques.

In computer engineering, students also need to know how it all works under the hood, and have some experience implementing some of it, but don’t have to be able to develop the most efficient implementations unless they are targeting embedded systems. In either case, though, it is important to understand the concepts, the limitations, and the pros and cons as well as to be able to carry out applications. Engineers have to understand why there is such a thing as PAC (probably approximately correct) learning, what the curse of dimensionality is and what it implies for how one does and does not approach a problem, what the NFL (no-free-lunch) theorem is and how that should condition one’s responses to claims of a single greatest algorithm, and what the history and background of this family of techniques are really like. These things matter because engineers should not expect to be plugging-and-playing cookie-cutter algorithms from ready-made libraries. That’s being an operator of an app, not being an engineer. The engineer should be able to see the trade-offs, plan for them, and take them into account when designing the optimal approach to solving each problem. That requires understanding parameters and structures, and again the history.
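For anyone who hasn’t met the curse of dimensionality, here is a minimal sketch of two standard ways of seeing it. The function names are mine, made up for illustration. The first computes how much of a cube is filled by the ball inscribed in it (the closed-form volume of a unit ball is a textbook result); the second counts how many cells a fixed-resolution grid needs. Both get bad very quickly as the number of dimensions grows.

```python
import math


def ball_to_cube_ratio(dims: int) -> float:
    """Fraction of the cube [-1, 1]^dims occupied by the inscribed unit ball."""
    # Volume of the unit ball in `dims` dimensions (standard closed form).
    ball_volume = math.pi ** (dims / 2) / math.gamma(dims / 2 + 1)
    cube_volume = 2.0 ** dims
    return ball_volume / cube_volume


def grid_cells(bins_per_axis: int, dims: int) -> int:
    """Number of cells needed to cover the space at a fixed resolution per axis."""
    return bins_per_axis ** dims


if __name__ == "__main__":
    for d in (2, 3, 10, 20):
        print(d, f"{ball_to_cube_ratio(d):.2e}", grid_cells(10, d))
```

In two dimensions the ball fills about 79% of the square; by ten dimensions almost all of the volume sits in the corners, and a modest ten-bins-per-axis grid already has ten billion cells to fill with data.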

Today, the field of ‘Neural Networks’ is popular and powerful. That was not always the case. It has been the case two other times in the past. Each time, perhaps like an overextended empire, the edifice of artificial neurons came down (though only to come up stronger some years later).

When I entered the field, with an almost religious belief in neural networks, they were quite uncool. The wisdom among graduate students seemed to be that neural nets were outdated, that we had SVMs now, and with the latter machine learning was solved forever. (This reminds me of the famous patent-office declaration in the late 1800s that everything that could be invented had been invented.) Fortunately, I have always benefited from doing whatever was unpopular, so I stuck to my neural nets, fuzzy systems, evolutionary algorithms, and an obsession with Bayes’ rule while others whizzed by on their SVM dissertations. (SVMs are still awesome, but the thing that has set the world on fire is neural nets again.)

One of the other debates raging, at least in my academic environment at the time, was about “ways of knowing.” I have since come to think that science is not a way of knowing. It never was, though societies thought so at first (and many still think so). Science is a way of incrementally increasing confidence in the face of uncertainty.

I bring this up because machine learning, likewise, never promised to have the right answer every time. Machine learning is all about uncertainty; it thrives on uncertainty. It’s built on the promise of PAC learning; i.e., it promises to be only slightly wrong and to be so only most of the time. The hype today is making ML seem like some magical panacea to all business, scientific, medical, and social problems. For better or worse, it’s only another technological breakthrough in our centuries-long adventure of making our lives safer and easier. (I’m not saying we haven’t done plenty of wrongs in that process—we have—but no one who owns a pair of glasses, a laptop, a ball-point pen, a digital piano, a smart phone, or a home-security system should be able to fail to see the good that technology has done for humankind.)
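To make “only slightly wrong, and only most of the time” concrete, here is a rough sketch based on one standard textbook PAC bound, not on anything specific to this post. It assumes a finite set of candidate hypotheses and a learner that always returns one consistent with its training data; the function and parameter names are my own. With error tolerance epsilon (“slightly wrong”) and failure probability delta (“most of the time”), about (ln|H| + ln(1/delta)) / epsilon examples suffice.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a finite hypothesis class.

    Textbook bound: m >= (ln|H| + ln(1/delta)) / epsilon, so that with
    probability at least 1 - delta the learned hypothesis has true error
    at most epsilon.
    """
    if hypothesis_count < 1:
        raise ValueError("need at least one hypothesis")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)
```

For example, `pac_sample_bound(1000, 0.1, 0.05)` works out to 100 examples. Note what the guarantee does not say: it never promises zero error, and it never promises success every time.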

I left the place of the field of Statistics in machine learning until the end. They are the true owners of machine learning. We engineering, business, and CS people are leasing property on their philosophical (not real) estate.


Herbie Hancock's Chameleon's BPM graph from the Android app 'liveBPM' (v. 1.2.0) by Daniel Bach

Listening to music seems easy.

Listening to music seems easy; it even appears like a passive task.

Listening, however, is not the same as hearing. In listening, i.e., attending, we add cognition to perception. The cognition of musical structures, cultural meanings, conventions, and even of the most fundamental elements themselves such as pitch or rhythm turns out to be a complex cognitive task. We know this is so because getting our cutting-edge technology to understand music with all its subtleties and its cultural contexts has proven, so far, to be impossible.

Within small fractions of a second, humans can reach conclusions about musical audio that are beyond the abilities of the most advanced algorithms.

For example, a trained or experienced musician (or even non-musician listener) can differentiate computer-generated and human-performed instruments in almost any musical input, even in the presence of dozens of other instruments sounding simultaneously.

In a rather different case, humans can maintain time-organizational internal representations of music while the tempo of a recording or performance continuously changes. A classic example is the jazz standard Chameleon by Herbie Hancock off the album ‘HEADHUNTERS’. The recording never retains any one tempo, following an up-and-down contour and mostly getting faster. Because tempo recognition is a prerequisite to other music-perception tasks like meter induction and onset detection, this type of behavior presents a significant challenge to signal-processing and machine-learning algorithms but generally poses no difficulty to human perception.
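As a small illustration of what “never retains any one tempo” means numerically, the sketch below turns a list of beat timestamps into a beat-by-beat tempo curve. It assumes the beat times are already known (tapped by a person, say), which is exactly the hard part for a machine; the function name and the timestamps are hypothetical.

```python
def local_tempo_bpm(beat_times):
    """Return one BPM estimate for each pair of consecutive beats."""
    tempos = []
    for earlier, later in zip(beat_times, beat_times[1:]):
        interval = later - earlier  # seconds between neighboring beats
        if interval <= 0:
            raise ValueError("beat times must be strictly increasing")
        tempos.append(60.0 / interval)
    return tempos


# Hypothetical beat timestamps (in seconds) for a performance that speeds up.
beats = [0.0, 0.5, 1.0, 1.4, 1.8]
print([round(bpm) for bpm in local_tempo_bpm(beats)])
```

The arithmetic is trivial once the beats are given. Finding the beats while the tempo drifts is where the algorithms struggle and the listeners do not.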

Another example is the recognition of vastly different cover versions of songs: A person familiar with a song can recognize within a few notes a cover version of that song done in another genre, at a different tempo, by another singer, and with different instrumentation.

Each of these is a task that is well beyond machine-learning techniques that are exhibiting remarkable successes with visual recognition where the main challenge, invariance, is less of an obstacle than the abstractness of music and its seemingly arbitrary meanings and structures.

Consider the following aspects of music cognition.

  • inferring a key (or a change of key) from very few notes
  • identifying a latent underlying pulse when it is completely obscured by syncopation [Tal et al., Missing Pulse]
  • effortlessly tracking key changes, tempo changes, and meter changes
  • instantly separating and identifying instruments even in performances with many-voice polyphony (as in Dixieland Jazz, Big-Band Jazz, Baroque and Classical European court music, Progressive Rock, folkloric Rumba, and Hindustani and Carnatic classical music)

These and many other forms of highly polyphonic, polyrhythmic, or cross-rhythmic music continue to present challenges to automated algorithms. Successful examples of automated tempo or meter induction, onset detection, source separation, key detection, and the like all work under the requirement of tight limitations on the types of inputs. Even for a single such task such as source separation, a universally applicable algorithm does not seem to exist. (There is some commercial software that appears to do these tasks universally, but because proprietary programs do not provide sufficiently detailed outputs, whether they really can perform all these functions or whether they perform one function in enough detail to suffice for studio uses is uncertain. One such suite can identify and separate every individual note from any recording, but does not perform source separation into streams-per-instrument and presents its output in a form not conducive to analysis in rhythmic, harmonic, melodic, or formal terms, and not in a form analogous to human cognitive processing of music.)

Not only does universal music analysis remain an unsolved problem, but also most of the world’s technological effort goes toward European folk music, European classical music, and (international) popular music. The goal of my research and my lab (Lab BBBB: Beats, Beats, Bayes, and the Brain) is to develop systems for culturally sensitive and culturally informed music analysis, music coaching, automated accompaniment, music recommendation, and algorithmic composition, and to do so for popular music styles from the Global South that are not on the industry’s radar.

Since the human nervous system is able to complete musical-analysis tasks under almost any set of circumstances, in multiple cultural and cross-cultural settings, with varying levels of noise and interference, the human brain is still superior to the highest-level technology we have developed. Hence, Lab BBBB takes inspiration and direct insight from human neural processing of audio and music to solve culturally specific cognitive problems in music analysis, and to use this context to further our understanding of neuroscience and machine learning.

The long-term goal of our research effort is a feedback cycle:

  1. Neuroscience (in simulation and with human subjects at our collaborators’ sites) informs both music information retrieval and research into neural-network structures (machine learning). We are initially doing this by investigating the role of rhythm priming in Parkinson’s (rhythm–motor interaction) and in grammar-learning performance (rhythm–language interaction) in the basal ganglia. We hope to then replicate in simulation the effects that have been observed with people, verify our models, and use our modeling experience on other tasks that have not yet been demonstrated in human cases or that are too invasive or otherwise unacceptable.
  2. Work on machine learning informs neuroscience by narrowing down the range of investigation.
  3. Deep learning is also used to analyze musical audio using structures closer to those in the human brain than the filter-bank and matrix-decomposition methods typically used to analyze music.
  4. Music analysis informs cognitive neuroscience, we conjecture, as has been done in certain cases in the literature with nonlinear dynamics.
  5. Phenomena like entrainment and neural resonance in neurodynamics further inform the development of neural-network structures and data-subspace methods.
  6. These developments in machine learning move music information retrieval closer to human-like performance for culturally informed music analysis, music coaching, automated accompaniment, music recommendation, and algorithmic composition for multicultural intelligent music systems.


The subjunctive is scientific thinking built into the language.

The subjunctive draws a distinction between fact and possibility, between truths and wishes. The expression “if he were” (not “if he was”) is subjunctive; it intentionally sounds wrong (unless you’re used to it) to indicate that we’re talking about something hypothetical as opposed to something actual.
This is scientific thinking built into the language (coming from its Romance-language roots).

This is beautiful. Let’s hold onto it.

You are not disinterested.

Everyone: Stop saying ‘disinterested’. You apparently don’t know what it means. It doesn’t mean ‘uninterested’.

In fact, it means you’re truly interested. ‘Disinterested’ is when you care so deeply as to want to treat the situation objectively. It is a scientific term describing the effort to rid a study of the effects of subconscious biases.

Also, please don’t say ‘substantive’ when all you mean is ‘substantial’. They’re not the same thing. Thanks. (‘Substantial’ is a good word. You’re making it feel abandoned.)

Microsoft: Fix your use of the word ‘both’.
When comparing only two files, Windows says something like “Would you like to compare both files?” As opposed to what, just compare one, all by itself? (like the sound of one hand clapping?)
The word ‘both’ is used when the default is not that of two things. It emphasizes the two-ness to show that the two-ness is special, unusual. But when the default is two, you say “the two” (as in “Would you like to compare the two files?”), not ‘both’, and DEFINITELY NOT ‘the both’. (It was cute when that one famous person said it once. It’s not cute anymore. Stop saying it.)
Back to ‘both’: A comparison has to involve two things, so ‘both’ (the special-case version of the word ‘two’) only makes sense if the two things are being compared to a third.
English is full of cool, meaningful nuances. I hope we stop getting rid of them.

Seriously, everyone: English is wonderful. Why are you destroying it?


PS: same with “on the one hand”… We used to say “on one hand” (which makes sense… either one, any one, not a definite hand with a definite article)


There are (at least) two types of scientists: scientist-scientists and science-doers.

Both groups do essential, difficult, demanding, and crucial work that everyone, including the scientist-scientists, needs. The latter group (like the former) includes people who work in research hospitals, water-quality labs, soil-quality labs, linear accelerators, R-&-D labs of all kinds, and thousands of other places. They carry out the daily work of science with precision, care, and a lot of hard work. Yet, at the same time, in the process of doing the doing of science, they typically do not get the luxury of stepping back, moving away from the details, starting over, and discovering the less mechanical, less operational connections among the physical sciences, the social sciences, the humanities, technology, business, mathematics, and statistics… especially the humanities and statistics.

I am not a good scientist, and that has given me the opportunity to step back, start over, do some things right this time, and more importantly, through a series of delightful coincidences, learn more about the meaning of science than about the day-to-day doing of it.[1] This began to happen during my Ph.D., but only some of the components of this experience were due to my Ph.D. studies. The others just happened to be there for me to stumble upon.

The sources of these discoveries took the form of two electrical-engineering professors, three philosophy professors, one music professor, one computer-science professor, some linguistics graduate students, and numerous philosophy, math, pre-med, and other undergrads. All of these people exposed me to ideas, ways of thinking, ways of questioning, and ways of teaching that were new to me.

As a result of their collective influence, my studies, and all my academic jobs from that period, I have come to think of science not merely as the wearing of lab coats and carrying out of mathematically, mechanically, or otherwise challenging complex tasks. I have come to think of science as the following of, for lack of a better expression, the scientific method, although by that I do not necessarily mean the grade-school inductive method with its half-dozen simple steps. I mean all the factors one has to take into account in order to investigate anything rigorously. These include double-blinding (whether clinical or otherwise, to deal with confounding variables, experimenter effects, and other biases), setting up idiot checks in experimental protocols, varying one unknown at a time (or varying all unknowns with a factorial design), not assuming unjustified convenient probability distributions, using the right statistics and statistical tests for the problem and data types at hand, correctly interpreting results, tests, and statistics, not chasing significance, setting up power targets or determining sample sizes in advance, using randomization and blocking in setting up an experiment or the appropriate level of random or stratified sampling in collecting data [See Box, Hunter, and Hunter’s Statistics for Experimenters for easy-to-understand examples.], and the principles of accuracy, objectivity, skepticism, open-mindedness, and critical thinking. The latter set of principles is given on p. 17 and p. 20 of Essentials of Psychology [third edition, Robert A. Baron and Michael J. Kalsher, Needham, MA: Allyn & Bacon, 2002].

These two books, along with Hastie, Tibshirani, and Friedman’s The Elements of Statistical Learning and a few other sources (heavily cited papers on the misuses of Statistics), have formed the basis of my view of science. This is why I think science-doing is not necessarily the same thing as being a scientist. In a section called ‘On being a scientist’ in a chapter titled ‘Methodology Wars’, the neuroscientist Fost explains how it’s possible, although not necessarily common, to be on “scientific autopilot” (p. 209) because of the way undergraduate education focuses on science facts and methods[2] over scientific thinking and the way graduate training and faculty life emphasize administration, supervision, managerial oversight, grant-writing, and so on (pp. 208–9). All this leaves a brief graduate or post-doc period in most careers for deep thinking and direct hands-on design of experiments before the mechanical execution and the overwhelming burdens of administration kick in. I am not writing this to criticize those who do what they have to do to further scientific inquiry but to celebrate those who, in the midst of that, find the mental space to continue to be critical, skeptical questioners of methods, research questions, hypotheses, and experimental designs. (And there are many of those. It is just not as automatic as the public seems to think it is, i.e., by getting a degree and putting on a white coat.)


Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, George E. P. Box, William G. Hunter, and J. Stuart Hunter, New York, NY: John Wiley & Sons, Inc., 1978 (0-471-09315-7)

Essentials of Psychology, third edition, Robert A. Baron and Michael J. Kalsher, Needham, MA: Allyn & Bacon, A Pearson Education Company, 2002

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, New York, NY: Springer-Verlag, 2009 (978-0-387-84858-7 and 978-0-387-84857-0)

If Not God, Then What?: Neuroscience, Aesthetics, and the Origins of the Transcendent, Joshua Fost, Clearhead Studios, Inc., 2007 (978-0-6151-6106-8)

[1] Granted, a better path would be the more typical one of working as a science-doer scientist for thirty years, accumulating a visceral set of insights, and moving into the fancier stuff due to an accumulation of experience and wisdom. However, as an educator, I did not have another thirty years to spend working on getting a gut feeling for why it is not such a good idea to (always) rely on a gut feeling. I paid a price, too. I realize I often fail to follow the unwritten rules of social and technical success in research when working on my own research, and I spend more time than I perhaps should on understanding what others have done. Still, I am also glad that I found so much meaning so early on.

[2] In one of my previous academic positions, I was on a very active subcommittee that designed critical-thinking assessments for science, math, and engineering classes with faculty from chemistry, biology, math, and engineering backgrounds. We talked often about the difference between teaching scientific facts and teaching scientific thinking. Among other things, we ended up having the university remove a medical-terminology class from the list of courses that counted as satisfying a science requirement in general studies.

This is a sci-fi story.

The icers were out in force that night. Joey really didn’t like running into them. It wouldn’t be dangerous if he didn’t wear his lukers leather jacket when he went out alone, but unless he was a full-time luker—and flaunting it—he felt like a traitor. He felt like he wasn’t “real” or wasn’t noteworthy enough to be out on the streets. He also didn’t want to run into any girls while not visually declaring allegiance to his chosen subculture, so he wore the identifiers of his subculture even though it meant he’d likely get beaten bloody by icers or somebody else.

The icers were truly extreme, Joey thought. They claimed they would rather die than drink any but the most extreme ice-cold tap water. Lukers, like Joey, weren’t so picky, and indeed preferred avoiding brain freeze.

Water was no longer the simple commodity previous generations took for granted. It wasn’t exactly unobtainable, but most people had to save to get their monthly allotment, which was a very small amount, or perform community service for extra water which would then be delivered automatically to their approved smarthomes. Now that games, movies, music, fashion, food, cars, drones, jetpacks, body alterations, and everything else a youth could want was readily available through picofabrication, automation, and biotech, it was ironically the most basic substance of life, water, that became scarce—because desalination remained expensive—and thus became the marker of one’s identity as a young person: their subcultural in-group.

Aside from the icers and lukers, there were half a dozen other fine-grained varieties of tap-water subcultures (like the boilers and the JCs, for “just cold”).

Mostly high-school kids, and mostly idle due to the abundance of free scavenged (recycled) energy for their picoautomation and their neural implants, these youths roamed the streets in their picofabricated faux-leather jackets emblazoned with their subcultural affiliation, and picked fights with members of other groups.

After the first few years of water shortage, this expression of identity through the one scarce resource that was critical for survival began to expand. Through a naturally stochastic clustering process, some hairstyles and preferences for clothes or shoes became associated with particular groups.

It just happened the way it did. There is no reason icers should prefer fur-lined boots and Christmas sweaters. If anything, one would expect the opposite. Yet, they wear them even in the summer… in the 130° globally warmed summers of Cascadia. That’s how you know you’ve got a genuine subculture: The clothing has got to be uncomfortable; it’s gotta require sacrifice.

The lukers likewise somehow ended up all having to wear havaianas, 20th-century motorcycle helmets over long green hair, tank tops (what the British call “vests”), bandannas tied right at the elbow, one on each arm, and pajama pants with teddy bears sewn onto them. (None of them knew that this last little detail originated with a bassist in a combo of ancient “rock” music from back when music was made by people playing instruments rather than autonomous conscious AI units that wrote every kind of music straight into digital encoding.) The more teddy bears one’s pants had on them, the greater would be their status as a luker.

Joey had found the time to get his automation to sew 37 onto his favorite pajama pants and another 24 on a different pair. The fact that he consequently couldn’t run was a big part of why the icers picked on him so much. They, on the other hand, spent most of their time getting their picobots to learn to assemble themselves into fists and feet for delivering punches and kicks from a distance.

So, Joey called up a game in his neural implant as he and his 37 teddy bears set out onto the streets of Seaportouver in search of a thimbleful of lukewarm water, bracing themselves (not so much the teddy bears but Joey’s bio-body and all his affiliated picobots and neurally linked semi-autonomous genetic floaters) against the onslaught of icer attacks and against old people who looked disdainfully at his awkward teddy-bear-encumbered gait and transmitted unsolicited neuro-advice that clogged up his game for an entire interminable microsecond.


This mini sci-fi story is an attempt to draw a parallel between how ridiculous and unlikely such tap-water-based subcultures of street-fighting youth might seem to us, and how the music-based subcultures of my youth in the ’80s must seem to today’s youth.

Music, after all, is like water now: You turn the tap, and it pours out—out of YouTube, Spotify, Pandora, Slacker, or SoundCloud, and in a sense, also out of GarageBand, FruityLoops, Acid, and myriad other tools for generating music from loops. A few dozen people in a few offices in LA may make those loops—they’re like the people working the dams and the people who run the municipal water bureau or whatever. They supply the water that we take for granted, and it just flows out of the tap, not requiring any thought or effort on our part about how it got there or how much of it there might be. Music today works the same way. You exchange memory cards or streaming playlists; you download free software that allows you to drag and drop loops and which makes sure they are in the same key and tempo. It’s about as complicated as making lemonade. Why would such a thing have any relation to one’s identity and individuality?

In contrast, when I was young, I had to save money for a year and still beg my parents for a long-playing record. I could also occasionally buy some cheap tapes or record songs off the radio (almost always with the beginning cut off and with a DJ talking over the end) onto cheap low-fi cassettes that had more hiss than hi-hat. My first compact disc, a birthday present from a wealthy relative, was like an alien artifact. It still looks a bit magical to me… so small and shiny. Today, I hear they’re referred to as “coasters” because… why bother putting music on a recording medium when it’s free and ubiquitous? 

Subculture-as-identity-marker has disappeared except among the old. (How old is Iggy today, or the guys from The Clash?) Young people today dress in combinations of the “uniforms” of ’50s, ’60s, ’70s, ’80s, and ’90s subcultures without having any interest in the sociopolitics or music of those subcultures. The last three times I talked to a―seemingly―fellow goth or punk rocker, they reacted with mild repulsion at the suggestion that they might listen to such music.

Expressing allegiance to a musical subculture must seem as silly to today’s youth (say, through age 30 or so) as expressing allegiance to a temperature of water would seem to anyone.


Zeno’s thermometers?

A friend just told me about the xkcd idea for the “Felsius scale” which is the arithmetic mean of the Fahrenheit and centigrade (Celsius) scales. Naturally, my first thought was that this was a funny but pointless idea since it discarded the advantages of the centigrade scale, which was renamed ‘Celsius’, but I’m using the old name to emphasize the 0-to-100 advantage. (A better discussion is found here: http://www.explainxkcd.com/wiki/index.php/1923:_Felsius)

My friend, however, suggested that it was a step in the right direction.

If so, it isn’t enough. If this idea were to take hold, we would need another such step in the right direction, perhaps to be called “Eelsius”, which would take us another 50% of the way to Celsius, and eventually another halfway jump to “Delsius”. The result (aside from running out of characters between ‘f’ and ‘c’) would be a nice little Zeno’s paradox of temperature-scale systems that asymptotically approach the logical centigrade scale.
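For the curious, the halfway jumps can be sketched in a few lines. I am assuming that each new scale simply halves the remaining gap between the previous scale and Celsius, so that zero jumps gives Fahrenheit and one jump gives Felsius (the mean of the two); the function names are made up.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = 9C/5 + 32."""
    return celsius * 9 / 5 + 32


def zeno_scale(celsius, steps):
    """Reading after `steps` halfway jumps from Fahrenheit toward Celsius."""
    fahrenheit = celsius_to_fahrenheit(celsius)
    # Each jump halves what is left of the Fahrenheit-minus-Celsius gap.
    return celsius + (fahrenheit - celsius) / 2 ** steps


# Boiling water on each successive scale.
for steps, name in enumerate(["Fahrenheit", "Felsius", "Eelsius", "Delsius"]):
    print(name, zeno_scale(100, steps))
```

Boiling water reads 212, 156, 128, and 114 on the four scales, creeping toward 100 without ever getting there. The one temperature on which every scale agrees is −40, where Fahrenheit and Celsius already coincide.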

Overfitting, Confirmation Bias, Strong AI, and Teaching

I was asked recently by a student about how machine learning could happen. I started out by talking about human learning: how we don’t consider mere parroting of received information to be the same as learning, but that we can make the leap from some examples we have seen to a new situation or problem that we haven’t seen before. Granted, there need to be some similarities (shared structure or domain of discourse; we don’t become experts on European Union economics as a result only of learning to distinguish different types of wine), but what makes learning meaningful and fun for us is the ability to make a leap, to solve a previously inaccessible problem or deduce (really it’s ‘induce’) a new categorization.

In response, the student asked how machines could do that. I replied that not only do we give them many examples to learn from, but we also give them algorithms (ways to deal with examples) that are inspired by how natural systems work: inspired by ants or honeybees, genetics, the immune system, evolution, languages, social networks and ideas (memes), and even just the mammalian brain. (One difference is that, so far, we are not trying to make general-purpose consciousness in machines; we are only trying to get them to solve well-defined problems very well, and increasingly these days, not-so-well-defined problems also).

So, then the student asked how machines could make the leap just like we can. This led me to bring up overfitting and how to avoid it. I explained that if a machine learns the examples it is given all too well, it will not be able to see the forest for the trees: it will be overly rigid, and will want to make all novel experiences fit the examples in its training. It will reject new examples that do not fit (if we build that ability into it), or it will make justifiable but wrong choices. It will ‘overfit’, in the language of machine learning.
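A toy illustration of the idea, not anything I showed the student: the data below are invented, the “memorizer” is a hypothetical nearest-example lookup that recalls its training set perfectly, and the straight-line fit stands in for a learner that is kept from chasing every quirk of the examples.

```python
# Toy data (invented for illustration): the true rule is y = 2x,
# but every training label is off by +1 or -1.
train = [(x, 2 * x + (1 if x % 2 == 0 else -1)) for x in range(10)]
# Unseen points between the training examples, labeled with the true rule.
test = [(x + 0.25, 2 * (x + 0.25)) for x in range(9)]

def memorizer(x):
    """Return the label of the nearest training example (pure recall)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(points):
    """Fit an ordinary least-squares line and return it as a function."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_squared_error(model, points):
    """Average squared gap between the model's answers and the labels."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

line = fit_line(train)
for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name, mean_squared_error(model, train), mean_squared_error(model, test))
```

The memorizer scores a perfect zero on the examples it was given and then does much worse than the line on the points it has not seen; the line never matches the training set exactly, yet it lands close to the true rule on new points.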

Then it occurred to me that humans do this, too. We’ve all probably heard the argument that stereotypes are there for a reason. In my opinion, they are there because of the power of confirmation bias (not to mention, sometimes selection bias as well—consider the humorous example of the psychiatrist who believes everyone is psychotic).

Just as a machine-learning algorithm that has been presented with a set of data will learn the idiosyncrasies of that data set if not kept from overfitting by early-stopping, prestructuring, or some other measure, people also overfit to their early-life experiences. However, we have one other pitfall compared to machines: We continue to experience new situations which we filter through confirmation bias to make ourselves think that we have verification of the validity of our misinformed or under-informed early notions. Confirmation bias conserves good feelings about oneself. Machines so far do not have this weakness, so they are only limited by what data we give them; they cannot filter out inconvenient data the way we do.

Another aspect of this conversation turned out to be pertinent to what I do every day. Not learning the example set so well is advantageous not only for machines but for people as well, specifically for people who teach.

I have been teaching at the college level since January 1994, and continuously since probably 2004, and full-time since 2010, three or four quarters per year, anywhere from two to five courses per quarter. I listed all this because I need to point out, for the sake of my next argument, that I seem to be a good teacher. (I got tenured at a teaching institution that has no research requirement but very high teaching standards.) So, let’s assume that I can teach well.

I was, for the most part, not a good student. Even today, I’m not the fastest at catching on, whether it’s a joke, an insult, or a mathematical derivation. (I’m nowhere near the slowest, but I’m definitely not among the geniuses.) I think this is a big part of why I’m a good teacher: I know what it’s like not to get it, and I know what I have had to do to get it. Hence, I know how to present anything to those who don’t get it, because, chances are, I didn’t get it right away either.

But there is more to this than speed. I generate analogies like crazy, both for myself and for teaching. Unlike people who can operate solely at the abstract level, I make connections to other domains—that’s how I learn; I don’t overfit my training set. I can take it in a new direction more easily, perhaps, than many super-fast thinkers. They’re right there, at a 100% match to the training set. I wobble around the training set, and maybe even map it to n+1 dimensions when it was given in only n.

Overfitting is not only harmful to machines. In people, it causes undeserved confidence in prejudices and stereotypes, and makes us less able to relate to others or think outside the box.

One last thought engendered by my earlier conversation with this student: The majority of machine-learning applications, at least until about 2010 or maybe 2015, were for well-defined, narrow problems. What happens when machines that are capable of generalizing well from examples in one domain, and in another, and in another, achieve meta-generalization from entire domains to new ones we have not presented them with? Will they attain strong AI as a consequence of this development (after some time)? If so, will they, because they’ve never experienced the evolutionary struggle for survival, never develop the violent streak that is the bane of humankind? Or will they come to despise us puny humans?


Four simple tricks to solve many of your grammar questions without having to search online

REMOVE A WORD: “Me and Mike like ice cream.” becomes “Me like ice cream.” Apparently, that’s not it, so the original sentence should have been “I and Mike (both) like ice cream.”

ANSWER A QUESTION: “Who should I ask?” The answer could be: “You should ask hiM.” Therefore, the first sentence should have been “Whom should I ask?”—the ‘m’s match.

ASK A QUESTION: “industrial music group”: What kind of group? The industrial kind (as well as the music kind)

as opposed to

“industrial-music group”: What kind of group? The industrial-music kind

CHANGE THE ORDER: Swapping the modifiers in “industrial music group” gives “music industrial group”, which fails in a different way than “red big house” fails in comparison to “big red house”: it also means something very different (so, a hyphen was needed).

“Big red house” is both correct and proper, and a hyphen would be wrong between ‘big’ and ‘red’. The two modifiers ‘big’ and ‘red’ are independent of each other; they act separately. The house could also have been small and red, or big and green.

The other two modifiers, ‘industrial’ and ‘music’ (the latter a noun that tells what type of group) are not independent when what we mean is Einstuerzende Neubauten or Cabaret Voltaire. The opposite is true when we are talking about Roland, Yamaha, Korg, and Nord, for example.