This is an implementation of Programming Assignment 2 from Geoffrey Hinton's Neural Networks course on Coursera, written in Python with GPU support via Theano. The assignment trains a feed-forward neural network to predict the next word from the 3 previous words. Training took ~10s/epoch with default parameters on a GTX 660 GPU. By comparison, the original Octave/MATLAB implementation took ~100s/epoch on a Core 2 Duo E7300 @2.67GHz CPU, which is about 10 times slower.
Files in this repo:
data.mat
: data file that includes the training, validation, and test sets for the model
wordModel.ipynb
: IPython Notebook that records the main ingredients, notes, and thoughts from when I built the implementation
wordModel.py
: the implementation
Usage
- Pre-requisites: Python 2.7+, Theano, CUDA (optional, for training by GPU)
- Run
$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python wordModel.py
in the terminal
- Change device=gpu into device=cpu if you want to train on the CPU
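The same flags can also be set from inside Python, as long as this happens before Theano is imported, since Theano reads THEANO_FLAGS at import time. The following is only a sketch: the helper name theano_flags is hypothetical and is not part of wordModel.py.

```python
import os


def theano_flags(device="cpu", floatX="float32", mode="FAST_RUN"):
    # Hypothetical helper: builds the same flag string used on the command line.
    return "mode={},device={},floatX={}".format(mode, device, floatX)


# Must be set before "import theano"; Theano reads the variable at import time.
os.environ["THEANO_FLAGS"] = theano_flags(device="cpu")

# import theano  # import only after the environment variable is in place
```

Passing device="gpu" instead reproduces the flag string from the command above.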
Issues
- test_model and validate_model compute cost (average cross-entropy) on batches of the test set and validation set respectively, not on the whole sets. The difference should not be significant, though.
- The original momentum expression did not yield the expected cross-entropy for some reason. I had to use an alternative from the Stanford Deep Learning Tutorial (see Momentum section).
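For illustration, a velocity-style momentum update can be sketched as below. This assumes the form v = momentum * v + learning_rate * gradient, followed by theta = theta - v; it may not match wordModel.py line for line. The toy quadratic cost, the function names, and the hyperparameter values are all hypothetical stand-ins, chosen only so the example runs without Theano.

```python
def grad(theta):
    # Gradient of the toy cost J(theta) = 0.5 * theta**2.
    # Hypothetical stand-in for the gradient of the cross-entropy cost.
    return theta


def momentum_step(theta, velocity, learning_rate=0.1, momentum=0.9):
    # The velocity accumulates a decaying sum of past scaled gradients.
    velocity = momentum * velocity + learning_rate * grad(theta)
    # The parameter moves against the accumulated velocity.
    theta = theta - velocity
    return theta, velocity


theta, velocity = 5.0, 0.0
for _ in range(200):
    theta, velocity = momentum_step(theta, velocity)

print(abs(theta) < 1e-2)  # theta has moved close to the minimum at 0
```

In this form the learning rate scales each gradient before it enters the velocity, so changing the learning rate mid-training does not rescale the velocity already accumulated.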
References:
- deeplearning.net tutorials and this for building Python code.
- Stanford Deep Learning Tutorial