I am curious: why did you initialize from a truncated normal with stddev = sqrt(2 / <input_size>)? Why not just use a truncated normal with, say, stddev = 0.1 for all layers?
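For context, a stddev of sqrt(2 / fan_in) is the He-style scaling commonly used with ReLU layers: it keeps the activation variance roughly constant as the layer width changes, whereas a fixed stddev of 0.1 is only well matched to one particular fan-in. A minimal sketch of the two choices (TF 1.x API and a 784-unit input assumed for illustration):

import numpy as np
import tensorflow as tf

fan_in = 784  # e.g. image_size * image_size for 28x28 inputs
# He-style scaling: stddev tied to the fan-in of the layer
he_init = tf.truncated_normal([fan_in, 1024], stddev=np.sqrt(2.0 / fan_in))
# the fixed alternative asked about: the same stddev regardless of layer width
fixed_init = tf.truncated_normal([fan_in, 1024], stddev=0.1)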
Help! When I run the following code, my loss function diverges. Can someone please explain why?
batch_size = 128
# regularisation parameter
beta = 0.001

# neural network with 2 hidden layers
hidden_nodes1 = 1024
hidden_nodes2 = 512
keep_prob = 0.5  # probability of dropout
initial_learning_rate = 0.5

graph = tf.Graph()
with graph.as_default():

    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    hidden_weights1 = tf.Variable(
        tf.truncated_normal([image_size * image_size, hidden_nodes1]))
    hidden_biases1 = tf.Variable(tf.zeros([hidden_nodes1]))
    hidden_layer1 = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights1) + hidden_biases1)
    hidden_layer_drop1 = tf.nn.dropout(hidden_layer1, keep_prob)  # dropout added

    hidden_weights2 = tf.Variable(
        tf.truncated_normal([hidden_nodes1, hidden_nodes2]))
    hidden_biases2 = tf.Variable(tf.zeros([hidden_nodes2]))
    hidden_layer2 = tf.nn.relu(tf.matmul(hidden_layer_drop1, hidden_weights2) + hidden_biases2)
    hidden_layer_drop2 = tf.nn.dropout(hidden_layer2, keep_prob)  # dropout added

    weights = tf.Variable(tf.truncated_normal([hidden_nodes2, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(hidden_layer_drop2, weights) + biases
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    loss = loss + beta * tf.nn.l2_loss(weights)

    # Optimizer. The learning rate decays with the number of steps taken.
    global_step = tf.Variable(0)  # count the number of steps taken
    learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,
                                               100000, 0.95, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_relu1 = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights1) + hidden_biases1)
    valid_relu2 = tf.nn.relu(tf.matmul(valid_relu1, hidden_weights2) + hidden_biases2)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_relu2, weights) + biases)
    test_relu1 = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights1) + hidden_biases1)
    test_relu2 = tf.nn.relu(tf.matmul(test_relu1, hidden_weights2) + hidden_biases2)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu2, weights) + biases)
@zhuanquan I would assume the loss is being computed incorrectly somewhere. If I drop your initial learning rate by an order of magnitude or more, it begins to minimize, but it does not converge for me either when I start at 0.5.
@zhuanquan I would suggest initializing your weight variables with a standard deviation between 0.1 and 0.2, i.e. weights = tf.Variable(tf.truncated_normal([size], stddev=stdvalue)).
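A sketch that combines the two suggestions above, applied to the variable names from the question (the exact stddev and learning rate here are illustrative, not tuned values):

initial_learning_rate = 0.05  # roughly an order of magnitude below 0.5
hidden_weights1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, hidden_nodes1], stddev=0.1))
hidden_weights2 = tf.Variable(
    tf.truncated_normal([hidden_nodes1, hidden_nodes2], stddev=0.1))
weights = tf.Variable(
    tf.truncated_normal([hidden_nodes2, num_labels], stddev=0.1))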
Why do you use np.random.choice(np.arange(5)) instead of just np.random.choice(5)? I am looking at the docs (https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html) and wondering what I am missing. Are they the same or slightly different? Or is it just written this way for clarity?
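For what it's worth, the two calls should be interchangeable here: when np.random.choice is given an int n, it samples as if it had been given np.arange(n). A quick check (NumPy assumed):

import numpy as np

np.random.seed(0)
a = np.random.choice(np.arange(5))  # samples uniformly from [0, 1, 2, 3, 4]
np.random.seed(0)
b = np.random.choice(5)             # an int n is treated like np.arange(n)
assert a == b                       # same draw given the same seed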
The problem asks for restricting the training data, which is achieved with offset = batch_size * np.random.choice(np.arange(5)). The number of steps does not need to be reduced to show the overfitting.
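To illustrate how that offset restricts the data (the names train_dataset and train_labels are assumed from the assignment's training loop): the minibatch can only start at one of five positions, so training keeps reusing the same five batches.

offset = batch_size * np.random.choice(np.arange(5))  # offset in {0, 128, 256, 384, 512}
batch_data = train_dataset[offset:(offset + batch_size), :]
batch_labels = train_labels[offset:(offset + batch_size), :]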