The conference was a great experience, but there didn't seem to be any groundbreaking work; mostly new tricks or combinations of methods. Probabilistic modelling with neural networks and GANs seemed very popular, and applying neural networks to new datasets/areas is still enough to get a NIPS poster. Some of the orals were good, but personally I think most of them were only poster level, while many posters were oral-talk level. Then again, people more intelligent than me made that selection, which probably means the process is quite random.
Differentiable Neural Computers (memory networks) will probably play a big role in more complex tasks, e.g. dialogue systems and reasoning, where the model needs to keep an internal representation of an entity and its properties (also see the EntityNet from LeCun). As Graves explained, it is conceptually a nice idea to separate the computation from the memory, and it gives the model the capability to learn from examples. Besides, Graves seems to be betting on this, and he's a clever fellow.
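As a note to self, the core trick that makes such external memories trainable is content-based (soft) addressing. Below is a minimal numpy sketch of that mechanism, my own illustration rather than Graves' actual DNC, so all names and shapes are assumptions:

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Soft read from an external memory matrix by cosine similarity.

    memory: (N, W) array, N slots of width W
    key:    (W,) query emitted by the controller
    beta:   scalar key strength (sharpens the softmax)
    """
    # Cosine similarity between the key and every memory slot
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    # Softmax over slots gives differentiable read weights
    w = np.exp(beta * sims)
    w /= w.sum()
    # Read vector is a weighted sum of the slots
    return w @ memory, w

# Toy usage: 8 slots of width 4, query close to slot 3
M = np.random.randn(8, 4)
read_vec, weights = content_addressing(M, M[3], beta=5.0)
```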
Nando de Freitas and some others were obsessed with meta-learning, and though I'm skeptical about the naming, the concept is sound. It seems to cover multiple methods but boils down to sharing information between tasks/networks (e.g. using the same network to solve multiple tasks, like playing several Atari games, speeds up learning) and to using networks to train other networks (e.g. Learning to Learn).
The deep kernel learning/deep GP concept is nice and provides a way to use GPs in large-scale or semi-parametric settings. And the infinite one-layer network is a GP (Neal, 1996).
From the Machine Learning in Health workshop it is clear that the potential is massive, but so far the applied methods have almost exclusively been linear/logistic regression. One reason seems to be the combination of skeptical doctors, who require interpretable models, and engineers who don't follow through. There are many interesting problems to work on, so I'm not surprised that the pairing of stoic doctors and bored engineers fails. The fierce regulation doesn't help either, but to get around it we need better safeguards on the data, maybe something like the Trust (something) that Neil Lawrence proposed, or national (EU?) level openPDS-like systems. Neil made many good points in his talk, but I especially liked the notion that we need to develop systems for the whole world, not just ourselves, and that it is actually in developing countries that personalised medicine and machine learning systems can be most easily implemented, since they don't yet have our rigid healthcare systems.
The invited talks by Zoubin Ghahramani, Jürgen Schmidhuber, Paul Werbos, and Ryan Adams made it clear once again that the deep learning community all but ignores the early contributions from the '90s. The Bayesian deep learning subfield in particular is currently being re-invented.
Max Welling pointed out that we need more benchmarks to compare methods; this is partly what has driven the development in image recognition/modelling, and the UCI datasets are not good enough. In addition, I still think more real-world issues/data should be addressed.
- Balaji: estimating uncertainty with an ensemble of density networks; the combined output is then a mixture of Gaussians (see the sketch after this list)
- Matthew Hoffman: using the Jacobian to analyse the VAE inference network (still not sure how)
- Matthew Hoffman: optimizing the inference network multiple (e.g. 100) times for each update of the generative model, especially in sparse-data problems
- Fiterau: modify the LSTM cell to use metadata/covariates/context in the gates
- Osborne: differentiate between risk and uncertainty in neural networks. Using dropout it is possible to get an uncertainty estimate (Gal, 2016), but only in areas with enough samples to estimate the variance; if only one sample is available, the variance will be low while the risk is potentially high (see the MC dropout sketch after this list)
- Merity: quasi-recurrent neural networks. Use convolutions that are forward/temporally pooled to create a recurrent chain; very efficient and better than RNNs in some circumstances (see the fo-pooling sketch after this list)
- Daniel Neil: Phased LSTM - add a gate governed by the timestamp of each sample, so the network learns underlying frequencies and only updates when a new sample is available, which helps when multiple modalities are sampled incoherently. Similar ideas in HMRNN. (See the time-gate sketch after this list.)
- Adversarial Message Passing: the likelihood can be factorized into segments that only depend on the Markov blanket. Optimizing the parameters can then be done iteratively and only requires samples from the parents and children of the node. The variables can thus be modelled using any distribution or parameterization thereof
- Mike (something): Hawkes Process Memory unit in RNNs to incorporate the time of each event and learn a time scale for the different processes in the data
- Nando de Freitas: learning to learn. Learn everything
- Some poster in the Adversarial workshop - use GANs for transfer/transductive learning with semi-supervised data or in a new domain
- Unk: HyperNetworks(?) - use a small network to generate parameters for a large network. So far this is done deterministically, but why not stochastically? (See the sketch after this list.)
- Ba: Fast Weights - using fast weights in an RNN as a fast associative memory model (summary). The normal RNN weights are slow, and in between updates of these there is a sequence of fast weight updates. Not totally sure how this works yet; see the sketch after this list.
- Kalchbrenner: ByteNet - use dilated convolutions instead of RNN to model sequences. Same idea as in WaveNet
- Krueger: ZoneOut - regularizing RNNs by stochastically using the identity function for the transition between states (see the sketch after this list)
- Salimans: Weight Normalization - very nice approach to speed up convergence. Easier to implement than batch norm and works for RNNs (see the sketch after this list).
- Tamar: Value Iteration Networks - a differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network.
- Bachem: Fast and Provably Good Seedings for k-Means
- Courbariaux: Binarized Neural Networks - neural networks with binary weights and activations at run-time. Interesting, with potential for embedded DL.
- Battaglia: Interaction Networks for Learning about Objects, Relations and Physics
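To make a few of the bullets above concrete, here are some rough numpy sketches. They are my own minimal illustrations of how I understood the ideas at the posters, not the authors' code, so treat all names, shapes and constants as assumptions. First, combining Balaji's ensemble of density networks into a mixture of Gaussians (training each network with a proper scoring rule is omitted):

```python
import numpy as np

def mixture_of_gaussians(means, variances):
    """Combine M per-network Gaussian predictions into one mixture.

    means, variances: (M,) arrays, one entry per ensemble member.
    Returns the mean and variance of the uniform mixture.
    """
    mix_mean = means.mean()
    # Law of total variance: E[var] + Var[mean]
    mix_var = variances.mean() + (means ** 2).mean() - mix_mean ** 2
    return mix_mean, mix_var

# Toy usage: five networks that roughly agree on the mean
mu = np.array([1.0, 1.1, 0.9, 1.05, 0.95])
var = np.array([0.2, 0.25, 0.15, 0.2, 0.3])
print(mixture_of_gaussians(mu, var))
```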
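Osborne's risk/uncertainty point rests on MC dropout (Gal, 2016): keep dropout active at test time and use the spread over repeated stochastic forward passes as the uncertainty. A sketch with one hypothetical dense layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, W2, p_drop=0.5, train=True):
    """One hidden layer with dropout; dropout stays on for MC estimates."""
    h = np.maximum(0.0, x @ W1)                    # ReLU hidden layer
    if train:
        mask = rng.random(h.shape) > p_drop        # Bernoulli dropout mask
        h = h * mask / (1.0 - p_drop)              # inverted dropout scaling
    return h @ W2

def mc_dropout_predict(x, W1, W2, n_samples=100):
    """Mean and std over stochastic forward passes as an uncertainty estimate."""
    preds = np.array([forward(x, W1, W2, train=True) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

# Toy usage with random weights
W1, W2 = rng.standard_normal((3, 16)), rng.standard_normal((16, 1))
mean, std = mc_dropout_predict(rng.standard_normal(3), W1, W2)
```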
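The quasi-RNN bullet, as I understood it: gates and candidates come from convolutions over the input, and the only recurrent part is a cheap element-wise "fo-pooling". A sketch of that pooling step, assuming the convolution outputs are already given:

```python
import numpy as np

def fo_pool(z, f, o):
    """Element-wise forget/output pooling over a sequence.

    z, f, o: (T, H) arrays from causal convolutions over the input
             (candidate, forget gate in (0,1), output gate in (0,1)).
    """
    T, H = z.shape
    c = np.zeros(H)
    hs = np.zeros((T, H))
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * z[t]   # recurrence is element-wise only
        hs[t] = o[t] * c
    return hs

# Toy usage with random "convolution outputs"
rng = np.random.default_rng(0)
T, H = 10, 4
h = fo_pool(np.tanh(rng.standard_normal((T, H))),
            1 / (1 + np.exp(-rng.standard_normal((T, H)))),
            1 / (1 + np.exp(-rng.standard_normal((T, H)))))
```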
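The Phased LSTM time gate, roughly as I remember it from the poster: each unit has a period tau, a phase shift s and an open ratio r_on, and the cell is only allowed to update during the open part of its cycle (the default constants below are my assumptions):

```python
import numpy as np

def time_gate(t, tau, s, r_on=0.05, alpha=1e-3):
    """Phased-LSTM style openness k in [0, 1] for timestamp t.

    tau: period, s: phase shift, r_on: fraction of the cycle that is open,
    alpha: small leak so gradients flow while the gate is closed.
    """
    phi = ((t - s) % tau) / tau               # position inside the cycle
    k = np.where(phi < 0.5 * r_on, 2 * phi / r_on,
        np.where(phi < r_on, 2 - 2 * phi / r_on, alpha * phi))
    return k

def gated_cell_update(c_prev, c_candidate, k):
    """Only a fraction k of the usual LSTM cell update is applied."""
    return k * c_candidate + (1 - k) * c_prev

# Toy usage: three units with different periods, sampled at t = 3.7
k = time_gate(3.7, tau=np.array([1.0, 2.0, 8.0]), s=np.zeros(3))
```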
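The HyperNetworks bullet in code: a small network maps a learned embedding to the weight matrix of a much larger layer. The sketch below is deterministic; making it stochastic, as I wonder above, would amount to sampling the embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Main layer we want to parameterize: 64 -> 64
D_IN, D_OUT, EMB = 64, 64, 8

# "Hypernetwork": here just a linear map from an embedding to the weight matrix
z = rng.standard_normal(EMB)                      # learned layer embedding
H = rng.standard_normal((EMB, D_IN * D_OUT)) * 0.01

def generate_weights(z, H):
    """The small network emits the big layer's weights as its output."""
    return (z @ H).reshape(D_IN, D_OUT)

def main_layer(x, z, H):
    W = generate_weights(z, H)                    # weights are generated, not stored
    return np.tanh(x @ W)

y = main_layer(rng.standard_normal(D_IN), z, H)
```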
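My tentative reading of Ba's fast weights (hedged, since I note above that I'm not sure how it works yet): besides the slow recurrent weights there is a fast matrix A that is decayed and updated with an outer product of the hidden state, and the next state is settled by a short inner loop that repeatedly applies A. The layer normalization used in the paper is left out:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 16
W, C = rng.standard_normal((H, H)) * 0.05, rng.standard_normal((H, H)) * 0.05

def fast_weights_step(h, A, x, lam=0.95, eta=0.5, inner_steps=3):
    """One step of an RNN with a fast associative memory A (outer-product rule)."""
    A = lam * A + eta * np.outer(h, h)            # decay + Hebbian-style update
    boundary = W @ h + C @ x                      # contribution of the slow weights
    hs = np.tanh(boundary)                        # initial guess for the next state
    for _ in range(inner_steps):                  # inner loop lets A act as memory
        hs = np.tanh(boundary + A @ hs)
    return hs, A

# Toy usage over a short sequence
h, A = np.zeros(H), np.zeros((H, H))
for x in rng.standard_normal((5, H)):
    h, A = fast_weights_step(h, A, x)
```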
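ZoneOut is the simplest of the lot: instead of dropping activations, randomly keep units from the previous time step's state. A sketch of the transition:

```python
import numpy as np

rng = np.random.default_rng(0)

def zoneout(h_prev, h_new, p_zoneout=0.15, train=True):
    """Stochastically copy units from the previous state (identity transition)."""
    if not train:
        # At test time use the expected transition
        return p_zoneout * h_prev + (1 - p_zoneout) * h_new
    mask = rng.random(h_prev.shape) < p_zoneout
    return np.where(mask, h_prev, h_new)

# Toy usage inside an RNN step
h_prev = rng.standard_normal(8)
h_candidate = np.tanh(rng.standard_normal(8))     # whatever the cell computed
h = zoneout(h_prev, h_candidate)
```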
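And weight normalization is simple enough to show in full: reparameterize each weight vector as a direction v times a scalar gain g, and optimize v and g instead of w:

```python
import numpy as np

def weight_norm(v, g):
    """w = g * v / ||v||: length and direction are decoupled (Salimans & Kingma)."""
    return g * v / np.linalg.norm(v)

# Toy usage for a single linear unit with 5 inputs
rng = np.random.default_rng(0)
v, g = rng.standard_normal(5), 2.0
w = weight_norm(v, g)
x = rng.standard_normal(5)
y = w @ x                                         # forward pass uses w as usual
```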
http://bayesiandeeplearning.org/
http://www.nipsml4hc.ws/
http://papers.ai/collections/nips.2016
Brad Neuberg
Andrew Beam
Neil Lawrence
ML Reddit
Variational Inference Tutorial
Tutorial: Introduction to Generative Adversarial Networks
Machine Learning & Likelihood Free Inference in Particle Physics
Towards a Biologically Plausible Model of Deep Learning
To collaborate on a gist:
git clone git@gist.github.com:9b44566a2cecf9cb1d4dba5c3f9368e4.git
e.g. if your friend is named Rasmus:
git remote add rasmus https://gist.github.com/rasmus/...
git fetch rasmus
git merge rasmus/master
git push origin master