Udacity neural network course -- assignment 3.4: 3-layer NN with regularization and dropout
@zhuanquan My guess is that the loss is being computed incorrectly somewhere. If I drop your initial learning rate by an order of magnitude or more, it begins to minimize, but it diverges for me too when I start at 0.5.
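Concretely, that is a one-line change to the code below; 0.05 here is just an illustrative value an order of magnitude under the original, not a tuned constant:

```python
initial_learning_rate = 0.05  # was 0.5; an order of magnitude lower converged in my run
```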
@zhuanquan I would suggest initializing your weight variables with a standard deviation between 0.1 and 0.2, i.e. `weights = tf.Variable(tf.truncated_normal([size], stddev=stdvalue))`.
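Applied to the first hidden layer of the code below, that suggestion would look something like this (0.1 is one point in the suggested range, not a tuned value):

```python
# Draw the first layer's weights with a small stddev instead of
# truncated_normal's default stddev of 1.0.
hidden_weights1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, hidden_nodes1], stddev=0.1))
```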
Why do you use `np.random.choice(np.arange(5))` instead of just `np.random.choice(5)`? Looking at the docs (https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html), I'm wondering what I'm missing. Are the two the same or slightly different, or is this form just easier to understand?
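For reference, the linked docs say that when `a` is an int the sample is generated as if `a` were `np.arange(a)`, so the two calls are equivalent; the `np.arange` form is just more explicit. A quick sanity check (the fixed seed is only there to make the equivalence visible):

```python
import numpy as np

# With the same seed, both forms consume the random stream identically,
# since an int argument is interpreted as np.arange(a) internally.
rng = np.random.RandomState(0)
a = rng.choice(5, size=10)

rng = np.random.RandomState(0)
b = rng.choice(np.arange(5), size=10)

print(np.array_equal(a, b))  # expected: True
```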
Help! When I run the following code, my loss function diverges. Can someone please explain why?
```python
batch_size = 128
# regularisation parameter
beta = 0.001

# neural network with 2 hidden layers
hidden_nodes1 = 1024
hidden_nodes2 = 512
keep_prob = 0.5  # probability of dropout
initial_learning_rate = 0.5

graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    hidden_weights1 = tf.Variable(
        tf.truncated_normal([image_size * image_size, hidden_nodes1]))
    hidden_biases1 = tf.Variable(tf.zeros([hidden_nodes1]))
    hidden_layer1 = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights1)
                               + hidden_biases1)
    hidden_layer_drop1 = tf.nn.dropout(hidden_layer1, keep_prob)  # dropout added

    hidden_weights2 = tf.Variable(
        tf.truncated_normal([hidden_nodes1, hidden_nodes2]))
    hidden_biases2 = tf.Variable(tf.zeros([hidden_nodes2]))
    hidden_layer2 = tf.nn.relu(tf.matmul(hidden_layer_drop1, hidden_weights2)
                               + hidden_biases2)
    hidden_layer_drop2 = tf.nn.dropout(hidden_layer2, keep_prob)  # dropout added

    weights = tf.Variable(tf.truncated_normal([hidden_nodes2, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(hidden_layer_drop2, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels,
                                                                  logits=logits))
    loss = loss + beta * tf.nn.l2_loss(weights)

    # Optimizer. Learning rate decreases with the number of steps taken.
    global_step = tf.Variable(0)  # counts the number of steps taken
    learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,
                                               100000, 0.95, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,
                                                                          global_step=global_step)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_relu1 = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights1) + hidden_biases1)
    valid_relu2 = tf.nn.relu(tf.matmul(valid_relu1, hidden_weights2) + hidden_biases2)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_relu2, weights) + biases)
    test_relu1 = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights1) + hidden_biases1)
    test_relu2 = tf.nn.relu(tf.matmul(test_relu1, hidden_weights2) + hidden_biases2)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu2, weights) + biases)
```
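For what it's worth, the divergence is consistent with the two replies above: `tf.truncated_normal` defaults to `stddev=1.0`, so with 784 inputs the first layer's pre-activations start out on the order of sqrt(784) ≈ 28 times the input scale, the softmax saturates, and a 0.5 learning rate then overshoots. Below is a minimal sketch of the initialization fix using He-style sqrt(2 / fan_in) scaling for the ReLU layers, a common alternative to the fixed 0.1-0.2 suggested above; the scaling choice is my substitution, not part of the original code, and `image_size` / `num_labels` are assumed defined in earlier notebook cells as in the snippet above.

```python
import numpy as np

# Drop-in replacements for the three weight Variables above
# (create them inside the same `with graph.as_default():` block).
# He-style initialization for ReLU layers: stddev = sqrt(2 / fan_in).
hidden_weights1 = tf.Variable(tf.truncated_normal(
    [image_size * image_size, hidden_nodes1],
    stddev=np.sqrt(2.0 / (image_size * image_size))))
hidden_weights2 = tf.Variable(tf.truncated_normal(
    [hidden_nodes1, hidden_nodes2],
    stddev=np.sqrt(2.0 / hidden_nodes1)))
weights = tf.Variable(tf.truncated_normal(
    [hidden_nodes2, num_labels],
    stddev=np.sqrt(2.0 / hidden_nodes2)))
```

With the weights scaled this way, the initial logits stay near zero, and in my experience the original 0.5 learning rate should no longer blow up, though lowering it as suggested above is still a reasonable extra safeguard.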