Softmax:



The softmax function, also called the normalized exponential function, is defined as follows:

P(y=j \mid \mathbf{x}) = \frac{e^{\mathbf{x}^{\mathsf{T}}\mathbf{w}_j}}{\sum_{k=1}^{K} e^{\mathbf{x}^{\mathsf{T}}\mathbf{w}_k}}
Here w_j is the weight vector of class j, so z_j = x^T w_j is the score for class j. The Python example below applies the softmax directly to such a score vector z:

>>> import math
>>> z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
>>> z_exp = [math.exp(i) for i in z]
>>> print([round(i, 2) for i in z_exp])
[2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]
>>> sum_z_exp = sum(z_exp)
>>> print(round(sum_z_exp, 2))
114.98
>>> softmax = [round(i / sum_z_exp, 3) for i in z_exp]
>>> print(softmax)
[0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
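
The same computation can be vectorized with NumPy; this is a small sketch of my own, not part of the original example. Subtracting the maximum score before exponentiating is a standard trick that avoids overflow and does not change the result.

import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())      # shift by the max for numerical stability
    return e / e.sum()

print(np.round(softmax([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175]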


Dropout rate:

From:  https://www.reddit.com/r/MachineLearning/comments/3oztvk/why_50_when_using_dropout/
For example, in the Otto competition, one of my better-performing networks had three dropout layers with values 0.15, 0.7, and 0.4 respectively. If I were to try to generalize, I'd say that it's all about balancing an increase in the number of parameters of your network without overfitting. So if e.g. you start with a reasonable architecture and amount of dropout, and you want to try increasing the number of neurons in some layer, you will likely want to increase the dropout proportionately, so that the same number of neurons are not-dropped. If p=0.5 was optimal for 100 neurons, a good first guess for 200 neurons would be to try p=0.75.

Also, I get the feeling that dropout near the top of the network can be more damaging than dropout near the bottom of the network, because it prevents all downstream layers from accessing that information. So the final error will be more sensitive to getting the tuning of the early dropout correct than to getting the tuning of the late dropout correct.

As to why 0.5 is generally used, I think it's because tuning things like the dropout parameter is really something to be done when all of the big choices about architecture have been settled. It might turn out that 0.14 is better, but it takes a lot of work to discover that, and if you change the filter size of a conv-net or change the overall number of layers or do things like that, you're going to have to re-optimize that value. So I see 0.5 as a sort of placeholder value until you're at the stage where you're chasing down percentage points or fractions of a percentage point.
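
To make that scaling rule concrete, here is a minimal Keras sketch of my own (the layer widths, input_dim, and rates are made-up values, not from the thread): each dropout rate is chosen so that roughly the same expected number of units survives, since 100 * (1 - 0.5) = 200 * (1 - 0.75) = 50.

from keras.models import Sequential
from keras.layers import Dense, Dropout

# Illustrative architecture only: widths and rates follow the
# "keep the same expected number of surviving units" heuristic.
model = Sequential()
model.add(Dense(100, input_dim=20, activation='relu'))
model.add(Dropout(0.5))     # 100 * (1 - 0.5)  = 50 units expected to survive
model.add(Dense(200, activation='relu'))
model.add(Dropout(0.75))    # 200 * (1 - 0.75) = 50 units expected to survive
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')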

Pooling:

What pooling is supposed to do is introduce positional, orientational, and proportional invariance. But what we actually need is not invariance but equivariance: invariance makes a CNN tolerant to small changes in the viewpoint, while equivariance makes a CNN understand a rotation or proportion change and adapt itself accordingly.
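
A tiny NumPy sketch of my own (not from the original note) showing the invariance that pooling introduces: a single active pixel and the same pixel shifted by one position give exactly the same 2x2 max-pooled output, so the shift is invisible to everything downstream.

import numpy as np

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 on a 2-D array with even side lengths
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0   # one active pixel
b = np.zeros((4, 4)); b[1, 1] = 1.0   # same pixel shifted by one step, still in the same window

print(max_pool_2x2(a))   # [[1. 0.], [0. 0.]]
print(max_pool_2x2(b))   # identical output: the one-pixel shift is lost after pooling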

Visualize NN :
A picture is worth a thousand words! I think it is good to visualize what you build, and luckily there is already a tool for it.

# plot_model requires the pydot and graphviz packages to be installed
from keras.models import Sequential
from keras.layers import Dense
from keras.utils.vis_utils import plot_model

model = Sequential()
model.add(Dense(2, input_dim=1, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
