Since a NN with no activation function is just a stack of matrix multiplications, its outputs are linear combinations of the inputs. Adding something like ReLU introduces a non-linearity, but the result is still piecewise linear: it can never increase the polynomial degree of its argument.
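To make the first point concrete, here is a minimal sketch (NumPy, purely for illustration) showing that stacking linear layers without an activation collapses into a single linear map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation: y = W2 @ (W1 @ x)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))
x = rng.standard_normal(4)

two_layer = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x   # one equivalent linear layer

print(np.allclose(two_layer, collapsed))  # True: the composition is still linear
```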
For example, below is the output of a network trained to predict f(x) = x^2 on the interval [-1, 1]. The test data lie in [-2, 2]: as expected, the network fits the interval it has seen almost perfectly, but fails to generalise outside it.
The same holds for many architectures, including CNNs, RNNs, and others. Is this a significant limitation in practice? What kind of dataset would demonstrate this limitation (not a toy one like the example I used below)?
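For concreteness, a minimal sketch along the lines of the toy experiment above (a small ReLU MLP in PyTorch; the exact architecture and hyperparameters are illustrative and don't matter much):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small ReLU MLP trained to fit f(x) = x^2 on [-1, 1]
model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x_train = torch.linspace(-1, 1, 256).unsqueeze(1)
y_train = x_train ** 2

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()

# Evaluate on the wider interval [-2, 2]: inside [-1, 1] the fit is good,
# outside it the piecewise-linear extrapolation diverges from x^2
x_test = torch.linspace(-2, 2, 9).unsqueeze(1)
with torch.no_grad():
    preds = model(x_test)

for xi, yi in zip(x_test.squeeze().tolist(), preds.squeeze().tolist()):
    print(f"x = {xi:+.2f}   predicted {yi:+.3f}   true {xi**2:+.3f}")
```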