Udacity neural network course -- assignment 3.4: 3-layer NN with regularization and dropout
@zhuanquan My guess is that the loss is being computed incorrectly somewhere. If I drop your initial learning rate by an order of magnitude or more, it begins to minimize, but it diverges for me too when I start at 0.5.
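Concretely, that is a one-line change to the code below; 0.05 here is just an illustrative value an order of magnitude under the original, not a tuned constant:

```python
initial_learning_rate = 0.05  # was 0.5; an order of magnitude lower converged in my run
```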
@zhuanquan I would suggest initializing your weight variables with a standard deviation between 0.1 and 0.2, i.e. `weights = tf.Variable(tf.truncated_normal([size], stddev=stdvalue))`.
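Applied to the first hidden layer of the code below, that suggestion would look something like this (0.1 is one point in the suggested range, not a tuned value):

```python
# Draw the first layer's weights with a small stddev instead of
# truncated_normal's default stddev of 1.0.
hidden_weights1 = tf.Variable(
    tf.truncated_normal([image_size * image_size, hidden_nodes1], stddev=0.1))
```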
Why do you use `np.random.choice(np.arange(5))` instead of just `np.random.choice(5)`? Looking at the docs (https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html), I'm wondering what I'm missing. Are the two the same or slightly different, or is this form just easier to understand?
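For reference, the linked docs say that when `a` is an int the sample is generated as if `a` were `np.arange(a)`, so the two calls are equivalent; the `np.arange` form is just more explicit. A quick sanity check (the fixed seed is only there to make the equivalence visible):

```python
import numpy as np

# With the same seed, both forms consume the random stream identically,
# since an int argument is interpreted as np.arange(a) internally.
rng = np.random.RandomState(0)
a = rng.choice(5, size=10)

rng = np.random.RandomState(0)
b = rng.choice(np.arange(5), size=10)

print(np.array_equal(a, b))  # expected: True
```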
Help! When I run the following code, my loss function diverges. Can someone please explain why?
```python
batch_size = 128
# regularisation parameter
beta = 0.001

# neural network with 2 hidden layers
hidden_nodes1 = 1024
hidden_nodes2 = 512
keep_prob = 0.5  # probability of dropout
initial_learning_rate = 0.5

graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    hidden_weights1 = tf.Variable(
        tf.truncated_normal([image_size * image_size, hidden_nodes1]))
    hidden_biases1 = tf.Variable(tf.zeros([hidden_nodes1]))
    hidden_layer1 = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights1)
                               + hidden_biases1)
    hidden_layer_drop1 = tf.nn.dropout(hidden_layer1, keep_prob)  # dropout added

    hidden_weights2 = tf.Variable(
        tf.truncated_normal([hidden_nodes1, hidden_nodes2]))
    hidden_biases2 = tf.Variable(tf.zeros([hidden_nodes2]))
    hidden_layer2 = tf.nn.relu(tf.matmul(hidden_layer_drop1, hidden_weights2)
                               + hidden_biases2)
    hidden_layer_drop2 = tf.nn.dropout(hidden_layer2, keep_prob)  # dropout added

    weights = tf.Variable(tf.truncated_normal([hidden_nodes2, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(hidden_layer_drop2, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels,
                                                                  logits=logits))
    loss = loss + beta * tf.nn.l2_loss(weights)

    # Optimizer. Learning rate decreases with the number of steps taken.
    global_step = tf.Variable(0)  # counts the number of steps taken
    learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,
                                               100000, 0.95, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,
                                                                          global_step=global_step)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_relu1 = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights1) + hidden_biases1)
    valid_relu2 = tf.nn.relu(tf.matmul(valid_relu1, hidden_weights2) + hidden_biases2)
    valid_prediction = tf.nn.softmax(tf.matmul(valid_relu2, weights) + biases)
    test_relu1 = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights1) + hidden_biases1)
    test_relu2 = tf.nn.relu(tf.matmul(test_relu1, hidden_weights2) + hidden_biases2)
    test_prediction = tf.nn.softmax(tf.matmul(test_relu2, weights) + biases)
```
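For what it's worth, the divergence is consistent with the two replies above: `tf.truncated_normal` defaults to `stddev=1.0`, so with 784 inputs the first layer's pre-activations start out on the order of sqrt(784) ≈ 28 times the input scale, the softmax saturates, and a 0.5 learning rate then overshoots. Below is a minimal sketch of the initialization fix using He-style sqrt(2 / fan_in) scaling for the ReLU layers, a common alternative to the fixed 0.1-0.2 suggested above; the scaling choice is my substitution, not part of the original code, and `image_size` / `num_labels` are assumed defined in earlier notebook cells as in the snippet above.

```python
import numpy as np

# Drop-in replacements for the three weight Variables above
# (create them inside the same `with graph.as_default():` block).
# He-style initialization for ReLU layers: stddev = sqrt(2 / fan_in).
hidden_weights1 = tf.Variable(tf.truncated_normal(
    [image_size * image_size, hidden_nodes1],
    stddev=np.sqrt(2.0 / (image_size * image_size))))
hidden_weights2 = tf.Variable(tf.truncated_normal(
    [hidden_nodes1, hidden_nodes2],
    stddev=np.sqrt(2.0 / hidden_nodes1)))
weights = tf.Variable(tf.truncated_normal(
    [hidden_nodes2, num_labels],
    stddev=np.sqrt(2.0 / hidden_nodes2)))
```

With the weights scaled this way, the initial logits stay near zero, and in my experience the original 0.5 learning rate should no longer blow up, though lowering it as suggested above is still a reasonable extra safeguard.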