Machine Learning Jargon
Term | Explanation
---|---
model | one or more functions trying to explain how some system/environment works. Most models are terrible approximations of the real world, but some are less terrible than others. The whole idea of machine learning is to find the least terrible model for a particular problem. Yes, we're pessimists.
regression | predicting a number; e.g. predicting someone's income based on their education, country and so on.
logistic regression | classification; predicting whether something belongs to a particular class ('is this photo a photo of a cat or a dog?'). Logistic regression is one of many learning algorithms for classification (there's a small sketch after the table).
feature engineering | massaging data so that it yields the most predictive power; generating variables from data that fit our understanding of the context we're trying to model but are not found as-is in the raw data we are using.
hypothesis space | a set of possible functions. It's machine learning's job to learn which of these possible functions best approximates the relationship between input and output.
hyperparameters | a set of configuration values. Actually, we just wanted a cooler name for configuration. All models have some kind of configuration variable, be that the depth of a tree-based classification model or the number of clusters we want to find in clustering methods. We spend a lot of time fiddling around with hyperparameters, because they have a huge impact on training time and model accuracy.
ensemble | a collection of models working together. More often than not, a single model has some weaknesses that can be compensated for by combining its results with those of another. This is more common in machine learning competitions than in real life.
boosting | the same idea as an ensemble, but with a single type of model fitted many times in sequence; say, by fitting many classification trees one after another, each new tree correcting the mistakes of the ones before it.
matrix | data consisting of nothing but numbers. Few models can work with non-numeric data, and even those that can will transform it to numbers under the hood, so we mainly operate using matrices.
confusion matrix | a summary of classification results. Classification can go wrong in many ways. You can predict someone is female when they are male, and vice versa. You can also predict these things correctly. Confusion matrices show us how wrong we are in each different case (there's a small example after the table).
continuous variable | it's a number. The height of a person is a continuous variable. The opposite of a continuous variable is a discrete variable (like, say, the output of a binary classifier, 0/1).
imputation | replacing missing values. Learning algorithms don't like data with missing values. To address this issue, we replace missing values with something fitting for the context. For height, it could be median height by gender (there's a sketch of this after the table).
target/response/label | the thing we're trying to predict. E.g. if we want to predict customer churn, then churn/non-churn are our targets.
training | trying to make a machine learn something; taking a set of data and letting the computer find out whether there is any relation between the dataset's features and the given target variable. Almost all machine learning entails some form of training.
cross validation | assessing how well a model generalises to data it's never seen before, typically by repeatedly splitting the data into training and validation parts. If we train a model for predicting a person's income, we can numerically check how well it generalises by comparing its predictions against labelled data it hasn't seen during training (there's a sketch after the table).
supervised learning | machine learning on fully labelled data. If we have data with the 'answers' for the thing we're trying to predict/classify, it's easier to numerically validate how much we are messing up and how we should change our parameters to mess up less. That's why supervised learning is popular. The downside is that fully labelled data is hard to get and/or laborious to make.
unsupervised learning | machine learning on data that isn't labelled. Targets are what we train our models on, i.e. the things we want to predict. There are, however, lots of problems where we don't have, or indeed want, a fixed set of correct answers. We may, for example, want to group data (e.g. text) into different clusters (e.g. topics) without fixing the clusters beforehand.
non-parametric algorithm | a learning algorithm where we don't place restrictions on the number of parameters/weights learned functions can have. A non-parametric model may, in fact, have thousands or millions of parameters.
parametric algorithm | a learning algorithm where we do place restrictions on the number of parameters/weights learned functions can have. Why these aren't called 'fixed-parameter algorithms' or something similar is beyond us.
vectorisation | matrix/vector calculations. Training a machine learning model is a process of iteration. For/while loops are obvious choices for control flow in code, but for some calculations, it turns out you can achieve the same end result by grouping values into matrices/vectors and doing operations (addition, multiplication, and so on) on them. It's usually much faster, which is the main reason we like vectorising stuff. It also makes our code incomprehensible gibberish, but we don't really care (there's a sketch after the table).
credit assignment problem | figuring out what should get the credit for some action. In reinforcement learning, where we try to learn the best action to take at any given moment, we may only get feedback on how well we did much later on. A classic example is chess, where we only get to know how we did (win/loss) when the game is over. Assigning credit to individual moves is difficult, which is why we gave it a name.
convolution | sliding a small matrix (a kernel) over a larger one and, at each position, summing the element-wise products. To be exact, this isn't even a mathematical convolution, but cross-correlation: we skip the kernel-flipping step of classical convolution because we don't need it, making mathematicians angry in the process (there's a sketch after the table).
meta-learning | learning how to learn. Finding the best configuration for a dataset/learning algorithm combination is a time-consuming process of trial and error. Since learning algorithms, erm, learn, why not have them learn how to learn optimally? Saves us the trouble of doing it ourselves.
deep learning | neural networks with more than one hidden layer. More layers can capture more complex relationships between input and output. But deep learning sounds more sci-fi than 'more than one hidden layer', so we went with that instead.
Bayes error | the irreducible error of a model. There's usually some element of randomness in data generated by a given process, which means any model we train on data generated by that process will inevitably be wrong some of the time.
innate prior | enforcing common sense in learning algorithms. Typically done by designing the learning algorithm so that not doing the common-sense thing is impossible.
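
A few of the terms above are easier to grasp in code than in prose. First, logistic regression: a minimal classification sketch using scikit-learn. The cat/dog "features" and the data points are invented purely for illustration.

```python
# A minimal classification sketch with scikit-learn's LogisticRegression.
# Features and labels below are made up for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [weight_kg, ear_length_cm]; labels: 0 = cat, 1 = dog
X = np.array([[4.0, 6.0], [3.5, 5.5], [25.0, 12.0], [30.0, 10.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[5.0, 6.5]]))        # predicted class for a new animal
print(clf.predict_proba([[5.0, 6.5]]))  # class probabilities
```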
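
Next, a confusion matrix for the male/female example in the table. The true and predicted labels here are invented; the point is just to see how the "how wrong are we in each case" summary looks.

```python
# A tiny confusion matrix for a made-up binary classifier.
from sklearn.metrics import confusion_matrix

y_true = ["male", "female", "female", "male", "female", "male"]
y_pred = ["male", "female", "male",   "male", "female", "female"]

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred, labels=["female", "male"]))
# [[2 1]   2 females predicted female, 1 female predicted male
#  [1 2]]  1 male predicted female,    2 males predicted male
```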
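
Imputation, using the "median height by gender" idea from the table. The column names and values are hypothetical; the pattern is what matters.

```python
# Median-by-gender imputation of missing heights with pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gender": ["f", "f", "f", "m", "m", "m"],
    "height_cm": [165.0, np.nan, 170.0, 180.0, 178.0, np.nan],
})

# Replace each missing height with the median height of that gender.
df["height_cm"] = df.groupby("gender")["height_cm"].transform(
    lambda s: s.fillna(s.median())
)
print(df)
```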
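
Cross validation, sketched with scikit-learn's k-fold helper: the data is split into 5 folds, and each fold takes a turn as unseen validation data. The "income" dataset here is randomly generated for illustration.

```python
# A 5-fold cross-validation sketch with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                                        # e.g. education, age, hours worked
y = X @ np.array([3.0, 2.0, 1.0]) + rng.randn(100) * 0.1    # pretend "income"

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())  # average R^2 over the 5 held-out folds
```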
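
Vectorisation: the same dot product computed with an explicit loop and with a single NumPy call. Both give the same result (up to floating-point rounding); the vectorised version is much faster because it runs in optimised compiled code.

```python
# Loop vs. vectorised dot product with NumPy.
import numpy as np

a = np.random.rand(100_000)
b = np.random.rand(100_000)

# Loop version: obvious control flow, slow in pure Python.
total = 0.0
for x, y in zip(a, b):
    total += x * y

# Vectorised version: one call, no explicit loop in our code.
print(total, np.dot(a, b))
```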
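
Finally, the "convolution" (really cross-correlation) from the table: slide a small kernel over the input and, at each position, sum the element-wise products. The toy input and kernel are arbitrary; note there is no kernel flipping, hence the angry mathematicians.

```python
# Cross-correlation ("convolution" in deep-learning speak) by hand with NumPy.
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # toy 2x2 kernel

out = np.zeros((3, 3))                             # output is (4-2+1) x (4-2+1)
for i in range(3):
    for j in range(3):
        patch = image[i:i + 2, j:j + 2]
        out[i, j] = np.sum(patch * kernel)         # element-wise product, summed
print(out)
```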