@Article{dlib09,
  author  = {Davis E. King},
  title   = {Dlib-ml: A Machine Learning Toolkit},
  journal = {Journal of Machine Learning Research},
  year    = {2009},
  volume  = {10},
  pages   = {1755-1758},
}
cd examples
mkdir build
cd build
del /F /S /Q *
cmake ..
cmake --build . --config Release

That should compile the dlib examples in Visual Studio. The output executables will appear in the Release folder. The del /F /S /Q * command is there to make sure you clear out any extraneous files you might have placed in the build folder; it is not necessary if the build folder starts out empty.
matrix<double> mat;
mat.set_size(4,5);

matrix<double,0,1> column_vect;
column_vect.set_size(6);

matrix<double,0,1> column_vect2(6); // give size to constructor

matrix<double,1> row_vect;
row_vect.set_size(5);
cd examples
mkdir build
cd build
cmake ..
cmake-gui .

Which looks like this:
std::ifstream fin("myfile", std::ios::binary);or
std::ofstream fout("myfile", std::ios::binary);If you don't give std::ios::binary then the iostream will mess with the binary data and cause serialization to not work right.
Picking the right kernel all comes down to understanding your data, and obviously this is highly dependent on your problem.
One thing that's sometimes useful is to plot each feature against the target value. You can get an idea of what your overall feature space looks like and maybe tell if a linear kernel is the right solution. But this still hides important information from you. For example, imagine you have two diagonal lines which are very close together and are both the same length. Suppose one line is of the +1 class and the other is the -1 class. Each feature (the x or y coordinate values) by itself tells you almost nothing about which class a point belongs to but together they tell you everything you need to know.
On the other hand, if you know something about the data you are working with then you can also try to generate your own features. For example, if your data is a bunch of images and you know that one of your classes contains a lot of lines, then you can make a feature that attempts to measure the number of lines in an image using a Hough transform or Sobel edge filter or whatever. Generally, try to think up features which should be highly correlated with your target value. A good way to do this is to try to actually hand code N solutions to the problem using whatever you know about your data or domain. If you do a good job then you will have N really great features, and a linear or rbf kernel will probably do very well when using them.
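As a rough illustration of that kind of hand coded feature, the sketch below uses dlib's sobel_edge_detector to measure overall edge strength in an image. The file name is a placeholder and the feature is deliberately crude:

#include <dlib/image_io.h>
#include <dlib/image_transforms.h>
#include <cmath>
#include <string>

using namespace dlib;

// A crude "how many lines/edges does this image have" feature.
double edge_strength_feature(const std::string& filename)
{
    array2d<unsigned char> img;
    load_image(img, filename);

    array2d<short> horz, vert;
    sobel_edge_detector(img, horz, vert);

    // add up the horizontal and vertical edge responses
    double total = 0;
    for (long r = 0; r < horz.nr(); ++r)
        for (long c = 0; c < horz.nc(); ++c)
            total += std::abs((double)horz[r][c]) + std::abs((double)vert[r][c]);

    // normalize by image area so images of different sizes are comparable
    return total / (horz.nr()*horz.nc());
}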
Or you can just try a whole bunch of kernels, kernel parameters, and training algorithm options while using cross validation. I.e. when in doubt, use brute force :) There is an example of that kind of thing in the model selection example program.
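If you want a concrete starting point, here is a rough sketch loosely patterned after the model selection example. The parameter ranges, and the choice of svm_c_trainer with a radial basis kernel, are just illustrative:

#include <dlib/svm.h>
#include <iostream>
#include <vector>

using namespace dlib;

typedef matrix<double,0,1> sample_type;
typedef radial_basis_kernel<sample_type> kernel_type;

void grid_search(const std::vector<sample_type>& samples,
                 const std::vector<double>& labels)
{
    double best_score = 0;
    for (double gamma = 0.00001; gamma <= 1; gamma *= 5)
    {
        for (double C = 1; C <= 100000; C *= 5)
        {
            svm_c_trainer<kernel_type> trainer;
            trainer.set_kernel(kernel_type(gamma));
            trainer.set_c(C);

            // 3-fold cross validation, returns per-class accuracies
            matrix<double> result = cross_validate_trainer(trainer, samples, labels, 3);
            double score = sum(result);
            if (score > best_score)
            {
                best_score = score;
                std::cout << "gamma: " << gamma << "  C: " << C
                          << "  cv accuracy: " << result;
            }
        }
    }
}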
So you need to pick the gamma value so that it is scaled reasonably to your data. A good rule of thumb (i.e. not the optimal gamma, just a heuristic guess) is the following:
const double gamma = 1.0/compute_mean_squared_distance(randomly_subsample(samples, 2000));
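If it helps to see it in context, the heuristic gamma can be dropped straight into a kernel like this (assuming samples is a std::vector of dlib column vectors, as in the other examples on this page):

typedef matrix<double,0,1> sample_type;
typedef radial_basis_kernel<sample_type> kernel_type;

// heuristic gamma based on the spread of the data
const double gamma = 1.0/compute_mean_squared_distance(randomly_subsample(samples, 2000));

svm_c_trainer<kernel_type> trainer;
trainer.set_kernel(kernel_type(gamma));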
For example, you could reduce the amount of data by saying this:
// reduce to only 1000 samples
cross_validate_trainer_threaded(trainer,
                                randomly_subsample(samples, 1000),
                                randomly_subsample(labels, 1000),
                                4,   // num folds
                                4);  // num threads
You should try kernel ridge regression instead since it also doesn't take any parameters but is always very fast.
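As a rough sketch of what that looks like with dlib's krr_trainer (the sample type and the gamma heuristic are just the ones used earlier on this page):

#include <dlib/svm.h>
#include <vector>

using namespace dlib;

typedef matrix<double,0,1> sample_type;
typedef radial_basis_kernel<sample_type> kernel_type;

decision_function<kernel_type> train_krr(const std::vector<sample_type>& samples,
                                         const std::vector<double>& labels)
{
    krr_trainer<kernel_type> trainer;

    // The regularization parameter is picked automatically via leave-one-out
    // cross validation, so the kernel is the only thing you really have to set.
    const double gamma = 1.0/compute_mean_squared_distance(randomly_subsample(samples, 2000));
    trainer.set_kernel(kernel_type(gamma));

    return trainer.train(samples, labels);
}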