Dog classifier and VGG
Very keen to run an R-CNN; hopefully it can help with my dog classifier problem, where only about 30% accuracy has been achieved so far.
The data set is from Stanford University (the Stanford Dogs dataset). There are more than one hundred breeds, each with about 100 ~ 200 pictures.
An existing work on this topic: https://github.com/lemuelbarango/dog-breed-classifier. It claims the accuracy reaches 76% on validation and 88% on training.
This work led me to study the VGG model (VGG-16, the "very deep" network of Karen Simonyan and Andrew Zisserman) and the VGG paper (published at ICLR 2015, titled "Very Deep Convolutional Networks for Large-Scale Image Recognition").
A good starting point for understanding VGG: http://www.robots.ox.ac.uk/~vgg/practicals/cnn/index.html
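Since each breed only has 100 ~ 200 images, a common approach in this situation is transfer learning from a VGG-16 pretrained on ImageNet rather than training from scratch. A minimal PyTorch sketch of that idea, assuming torchvision's pretrained VGG-16 and a 120-breed output layer; the names and hyperparameters here are illustrative, not taken from the repository above:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load VGG-16 pretrained on ImageNet and freeze its convolutional features.
    vgg = models.vgg16(pretrained=True)
    for param in vgg.features.parameters():
        param.requires_grad = False

    # Replace the final 1000-class ImageNet layer with a 120-way layer
    # for the Stanford Dogs breeds.
    vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, 120)

    # Train only the parameters that were left unfrozen.
    optimizer = torch.optim.SGD(
        (p for p in vgg.parameters() if p.requires_grad), lr=0.001, momentum=0.9
    )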
Still far from understanding R-CNN, which may be very helpful here.
https://zhuanlan.zhihu.com/p/28585873
Fine-grained image classification (FGIC)
Typical FGIC datasets: Stanford Dogs, Caltech-UCSD Birds, Oxford Flowers, FGVC-Aircraft.
SIFT (Scale-Invariant Feature Transform): SIFT can robustly identify objects even among clutter and under partial occlusion, because the SIFT feature descriptor is invariant to uniform scaling, orientation, illumination changes, and partially invariant to affine distortion. SIFT keypoints of objects are first extracted from a set of reference images and stored in a database. An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on Euclidean distance of their feature vectors. From the full set of matches, subsets of keypoints that agree on the object and its location, scale, and orientation in the new image are identified to filter out good matches. The determination of consistent clusters is performed rapidly by using an efficient hash table implementation of the generalized Hough transform. Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed model verification and subsequently outliers are discarded. Finally the probability that a particular set of features indicates the presence of an object is computed, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence. (from Wikipedia)
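A minimal sketch of the descriptor extraction and Euclidean-distance matching described above, using OpenCV; it assumes an OpenCV build where SIFT is exposed as cv2.SIFT_create (OpenCV >= 4.4 or opencv-contrib), and the image paths are placeholders:

    import cv2

    # Load a reference image and a new query image in grayscale (paths are placeholders).
    ref = cv2.imread("reference_dog.jpg", cv2.IMREAD_GRAYSCALE)
    query = cv2.imread("new_photo.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect SIFT keypoints and compute their 128-dimensional descriptors.
    sift = cv2.SIFT_create()
    kp_ref, des_ref = sift.detectAndCompute(ref, None)
    kp_query, des_query = sift.detectAndCompute(query, None)

    # Match each query descriptor to its two nearest reference descriptors by
    # Euclidean distance, then keep only distinctive matches (Lowe's ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_query, des_ref, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    print(len(good), "good matches")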
R-CNN:
https://zhuanlan.zhihu.com/p/28585873
Selective Pooling Vector for Fine-grained Recognition
LeNet architecture:
(CONV#-RELU-POOL) x N + (FC#) x M + FC-120, where the final FC-120 is the 120-way output layer, one unit per dog breed.
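A minimal PyTorch sketch of this pattern with N = 2 and M = 1; the layer sizes and the 64x64 input are arbitrary illustrative choices, only the final FC-120 is fixed by the 120 breeds:

    import torch
    import torch.nn as nn

    # (CONV-RELU-POOL) x 2 + FC x 1 + FC-120; sizes are illustrative only.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=5, padding=2),   # CONV
        nn.ReLU(),                                    # RELU
        nn.MaxPool2d(2),                              # POOL -> 16 x 32 x 32
        nn.Conv2d(16, 32, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),                              # -> 32 x 16 x 16
        nn.Flatten(),
        nn.Linear(32 * 16 * 16, 256),                 # FC
        nn.ReLU(),
        nn.Linear(256, 120),                          # FC-120: one logit per breed
    )

    x = torch.randn(1, 3, 64, 64)   # one fake 64x64 RGB image
    print(model(x).shape)           # torch.Size([1, 120])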
GoogLeNet architecture: too complicated for now.
Learning rate in Stochastic Gradient Descent (SGD) for neural nets trained with back-propagation.
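As a reminder of what the learning rate actually controls, a toy sketch of SGD on a one-parameter model; the numbers are made up purely for illustration:

    # Toy model y = w * x fit to a single point, loss = (w*x - y)^2.
    x, y = 2.0, 8.0   # made-up training example (the true w would be 4)
    w = 0.0           # initial weight
    lr = 0.05         # learning rate: step size along the negative gradient

    for step in range(20):
        grad = 2 * (w * x - y) * x   # d(loss)/dw via the chain rule (back-propagation)
        w -= lr * grad               # SGD update: w <- w - lr * grad

    print(w)   # approaches 4.0; a much larger lr would overshoot and diverge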
Top-1 accuracy and top-5 accuracy:
Top-1 accuracy counts a prediction as correct only when the right answer gets the highest score. For top-5 accuracy, a prediction counts as correct if the right answer appears anywhere among the five highest-scoring classes.
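A small sketch of computing both metrics from a matrix of class scores, using NumPy with made-up numbers:

    import numpy as np

    # scores[i, j] = model score for class j on example i (made-up values),
    # labels[i]    = true class index of example i.
    scores = np.array([[0.10, 0.50, 0.20, 0.10, 0.05, 0.05],
                       [0.30, 0.10, 0.10, 0.20, 0.20, 0.10],
                       [0.05, 0.05, 0.10, 0.10, 0.30, 0.40]])
    labels = np.array([1, 4, 2])

    # Top-1: the single highest-scoring class must equal the label.
    top1 = np.mean(scores.argmax(axis=1) == labels)

    # Top-5: the label only has to appear among the five highest-scoring classes.
    top5_idx = np.argsort(scores, axis=1)[:, -5:]
    top5 = np.mean([labels[i] in top5_idx[i] for i in range(len(labels))])

    print(top1, top5)   # 1/3 and 3/3 for these made-up scores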