OTL (Online Transfer Learning) aims to tackle an online learning task on a target domain by transferring knowledge from one or more source domains. We do not assume that data in the target domain follows the same distribution as data in the source domain. The motivation of our work is to enhance a supervised online learning task on a target domain by exploiting existing knowledge that has been learnt from training data in source domains.
In the source code package, there are two folders: Classification and Concept Drift.
Classification
In the Classification folder, there are two sub-folders: Homogeneous and Heterogeneous.
Homogeneous
In the Homogeneous folder, there are
· all 5 binary classification algorithms: PA1_K_M (i.e., PA-I), PAIO_K_M, HomOTLf_K_M (i.e., HomOTL (fixed)), HomOTL1_K_M, and HomOTL2_K_M;
· avePA1_K_M, which outputs the average of all the online classifiers produced by PA1_K_M;
· the main procedure Experiment_OTL_K_M, which is used to compare all the online algorithms;
· the procedure EOC, which is used to evaluate the effect of parameter C;
· the procedure EObeta, which is used to evaluate the effect of parameter beta on HomOTL2.
To compare the HomOTL algorithms with other baselines, run Experiment_OTL_K_M. For example, to compare the performance of these online algorithms on the books_dvd dataset (please refer to the instructions for books_dvd), type Experiment_OTL_K_M('books_dvd') in the MATLAB command window and press the Enter key. You will get a figure of online mistake rates, a figure of online SV size, a figure of online time consumption, and a table of the final mistake rates, SV sizes, and time consumption for all the algorithms. You can use EOC('books_dvd') to evaluate the effect of parameter C on the books_dvd dataset, and EObeta to evaluate the effect of parameter beta for HomOTL2 on all the datasets.
Heterogeneous
In the Heterogeneous folder, there are
· all 5 binary classification algorithms: PA1_K_M (i.e., PA-I), PAIO_K_M, HetOTL0_K_M, Ensemble_K_M, and HetOTL_K_M;
· avePA1_K_M, which outputs the average of all the online classifiers produced by PA1_K_M;
· the procedure Experiment_OTL_K_M, which is used to compare all the online algorithms;
· the procedure EOC, which is used to evaluate the effect of parameter C.
To compare the HetOTL algorithms with other baselines, run Experiment_OTL_K_M. For example, to compare the performance of these online algorithms on the books_dvd dataset, type Experiment_OTL_K_M('books_dvd') in the MATLAB command window and press the Enter key. You will get a figure of online mistake rates, a figure of online SV size, a figure of online time consumption, and a table of the final mistake rates, SV sizes, and time consumption for all the algorithms. You can use EOC('books_dvd') to evaluate the effect of parameter C on the books_dvd dataset.
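The commands above can be chained into a small batch script. The following is only a sketch: it assumes that the current MATLAB folder is the Homogeneous or Heterogeneous folder, that the datasets have already been copied into its data sub-folder, and that both procedures accept the dataset name as a quoted string. The variable names datasets and name are introduced here for illustration.

```matlab
% Dataset names as listed in the dataset description.
% Assumption: each procedure takes the dataset name as a character string.
datasets = {'books_dvd', 'dvd_books', 'ele_kit', ...
            'kit_ele', 'landmine1', 'landmine2'};

for i = 1:numel(datasets)
    name = datasets{i};
    % Only call the package procedures when they are visible on the path,
    % so the script fails gently when run from the wrong folder.
    if exist('Experiment_OTL_K_M', 'file') == 2 && exist('EOC', 'file') == 2
        Experiment_OTL_K_M(name);   % figures and final table for this dataset
        EOC(name);                  % effect of parameter C on this dataset
    else
        fprintf('Package procedures not found on the path; skipping %s\n', name);
    end
end
```

Each run opens several figures, so it may be convenient to save or close them between datasets.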
Concept Drifting
In the Concept Drift folder, there are
· all 6 online algorithms: PE_K_M (i.e., Perceptron), PA1_K_M (i.e., PA-I), ShiftPE_K_M (i.e., Shifting Perceptron), ModiPE_K_M (i.e., Modified Perceptron), CDOLfix_K_M (i.e., CDOL(fixed)), and CDOL_K_M;
· the main procedure Experiment_OTL_K_M, which is used to compare all the online algorithms;
· the procedure EOC, which is used to evaluate the effect of parameter C.
To compare the CDOL algorithms with other baselines, run Experiment_OTL_K_M. For example, to compare the performance of these online algorithms on the emaildata dataset (please refer to the instructions for emaildata), type Experiment_OTL_K_M('emaildata') in the MATLAB command window and press the Enter key. You will get a figure of online mistake rates, a figure of online SV size, a figure of online time consumption, and a table of the final mistake rates, SV sizes, and time consumption for all the algorithms. Moreover, you can evaluate the effect of parameter C on the emaildata dataset by using EOC('emaildata').
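To make the three reported quantities (online mistake rate, SV size, and time consumption) and the role of parameter C more concrete, here is a minimal sketch of a PA-I learner on synthetic data. It is not the package's PA1_K_M: the linear kernel, the synthetic data, and all variable names are assumptions made for illustration only.

```matlab
% Synthetic, linearly separable stream (illustrative only).
n = 200; d = 5; C = 1;
X = randn(n, d);
Y = sign(X * randn(d, 1));
Y(Y == 0) = 1;                      % guard against a zero label

alpha = zeros(n, 1);                % step sizes; a nonzero entry marks a support vector
mistake_rate = zeros(n, 1);         % cumulative mistakes / rounds seen so far
sv_size = zeros(n, 1);              % number of support vectors after each round
mistakes = 0;

tic;
for t = 1:n
    x_t = X(t, :);
    % Linear-kernel prediction; alpha is zero for unseen rounds,
    % so only past support vectors contribute.
    f_t = (alpha .* Y)' * (X * x_t');
    if Y(t) * f_t <= 0
        mistakes = mistakes + 1;
    end
    % PA-I update: hinge loss, step size capped by C.
    loss = max(0, 1 - Y(t) * f_t);
    if loss > 0
        alpha(t) = min(C, loss / (x_t * x_t'));
    end
    mistake_rate(t) = mistakes / t;
    sv_size(t) = nnz(alpha);
end
elapsed = toc;                      % total online time consumption in seconds
```

A smaller C caps each update more tightly, which is the kind of trade-off that EOC is described as evaluating.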
Dataset Descriptions
In the zip file, there are two folders: Data for Classification and Data for Concept Drift.
Data for Classification
In this folder, there are 6 binary-class datasets used for testing the HomOTL algorithms, the HetOTL algorithms, and other baselines. These datasets are: books_dvd, dvd_books, ele_kit (i.e., electronics-kitchen), kit_ele (i.e., kitchen_electronics), landmine1, and landmine2.
Take books_dvd for example. It is in mat format and consists of three matrices: data, ID_old, and ID_new. The dimension of data is 4000-by-473857, which means that there are 4000 training examples and that each example has dimension 473857, consisting of one label and 473856 instance features. In the first row, for example, the first number is the label 1, while the remaining 473856 numbers form the instance vector.
[Figure: the structure of data for german]
The dimension of ID_old is 1-by-2000; it is a permutation of 1, 2, ..., 2000.
The dimension of ID_new is 20-by-2000. Every row of ID_new is a permutation of 2001, 2002, ..., 4000.
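The layout described above can be mimicked on a toy scale. The sketch below builds a small stand-in for data, ID_old, and ID_new and splits one trial into the two index sets. It assumes that ID_old indexes the first half of the rows and each row of ID_new indexes the second half, and that labels are +1 or -1. The sizes and variable names are hypothetical, and how the package itself consumes these matrices is not shown here.

```matlab
% Toy sizes (the real books_dvd has 2000 + 2000 examples and 20 trials).
n_old = 4; n_new = 4; n_feat = 3; n_trials = 2;

% First column is the label, the remaining columns are the instance features.
labels = [1; -1; 1; -1; 1; 1; -1; -1];          % assumed +1/-1 labels
data = [labels, rand(n_old + n_new, n_feat)];

% ID_old: one permutation of 1..n_old.
ID_old = randperm(n_old);
% ID_new: one permutation of n_old+1..n_old+n_new per trial (one trial per row).
ID_new = zeros(n_trials, n_new);
for r = 1:n_trials
    ID_new(r, :) = n_old + randperm(n_new);
end

% Separate labels from features, then pick the rows for one trial.
Y = data(:, 1);
X = data(:, 2:end);
trial = 1;
X_old = X(ID_old, :);             Y_old = Y(ID_old);
X_new = X(ID_new(trial, :), :);   Y_new = Y(ID_new(trial, :));
```

With a real dataset, the three matrices would instead come from loading the mat file, for example load('books_dvd.mat'), assuming the file sits on the MATLAB path.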
To use these binary datasets, please put them into the folder 1. Classification\1. Homogeneous\data or 1. Classification\1. Heterogeneous\data.
Data for Concept Drift
In this folder, there are 6 binary-class datasets used for testing the binary CDOL algorithm and other binary baselines. These datasets are: emaildata, mitface, newsgroup4, usenet1, usenet2, and usps.
Take emaildata for example: its structure is similar to that of the binary classification datasets. The only difference is that ID_ALL is designed to permute only the indices of instances within one period.
[Figure: the structure of ID_ALL for emaildata]
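One way to picture a permutation that shuffles instances only within a period is sketched below. The period lengths, the number of trials, and the one-row-per-trial layout are all assumptions made for illustration; the actual dimensions of ID_ALL are not stated in the text.

```matlab
% Hypothetical period lengths: instances 1-3, 4-7, and 8-10.
period_len = [3 4 3];
n_trials = 2;
n = sum(period_len);
starts = cumsum([1, period_len(1:end-1)]);   % first index of each period

ID_ALL = zeros(n_trials, n);
for r = 1:n_trials
    for p = 1:numel(period_len)
        idx = starts(p):(starts(p) + period_len(p) - 1);
        % Shuffle the indices of this period only; the order of periods is kept.
        ID_ALL(r, idx) = idx(randperm(period_len(p)));
    end
end
```

Because every index stays inside its own period, the order in which the concept changes over time is preserved across trials.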