GSoc2021-DBpedia-NeuralExtraction

The goals for this week are to 1) apply bootstrapping with a simple classifier, i.e., Logistic Regression, and 2) evaluate the effectiveness of bootstrapping across the learning steps.

Bootstrapping - version 1

Data processing procedures

First, it is necessary to describe the data processing procedure of bootstrapping. In our experiment, we apply bootstrapping with a classifier on the same dataset for 20 rounds. In each round, the dataset is split into 10 consecutive folds (with shuffling), where two folds are used as the test set and the eight remaining folds form the train set. Thus each round consists of 10 repetitions with different splits, as sketched below.
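To make the splitting concrete, here is a minimal sketch of one round, assuming scikit-learn is used. The variables `X` and `y` are hypothetical placeholders for the feature matrix and labels, and pairing consecutive folds is one possible interpretation of how 10 repetitions arise from 10 folds:

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.RandomState(42)
X = rng.rand(500, 16)          # hypothetical features
y = rng.randint(0, 2, 500)     # hypothetical binary labels

# Split the dataset into 10 shuffled folds.
kf = KFold(n_splits=10, shuffle=True, random_state=42)
folds = [test_idx for _, test_idx in kf.split(X)]

# One repetition per rotation: two folds as the test set,
# the remaining eight folds as the train set.
for rep in range(10):
    test_idx = np.concatenate([folds[rep], folds[(rep + 1) % 10]])
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # ... train and evaluate the classifier here ...
```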

Learning rate

As discussed, each time we obtain a trained classifier in one round, we want to include its predicted positive examples in the next round. To do so, we first need to set a learning rate, i.e., a threshold, to decide which examples count as positive (prediction probability exceeding the learning rate) and which as negative (below the learning rate). In our experiment, we set the learning rate to 0.9, with 0.8 as a comparison.
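The following is a minimal sketch of this thresholding step, again assuming scikit-learn; the data arrays and the variable `LEARNING_RATE` are hypothetical placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_train = rng.rand(200, 16)          # hypothetical labeled features
y_train = rng.randint(0, 2, 200)     # hypothetical labels
X_unlabeled = rng.rand(100, 16)      # hypothetical unlabeled pool

LEARNING_RATE = 0.9  # the threshold; 0.8 is used for comparison

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Probability of the positive class for each unlabeled example.
proba = clf.predict_proba(X_unlabeled)[:, 1]
is_new_positive = proba >= LEARNING_RATE

# Promote confident predictions to labeled positives for the next round.
X_train = np.vstack([X_train, X_unlabeled[is_new_positive]])
y_train = np.concatenate([y_train,
                          np.ones(is_new_positive.sum(), dtype=int)])
X_unlabeled = X_unlabeled[~is_new_positive]
print(f"added {is_new_positive.sum()} new positive examples")
```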

Logistic Regression Performance

Figure 1: The increase in positive examples of the LR classifier per round.

In Figure 1, the X-axis represents the bootstrapping round and the Y-axis indicates the increase (delta) in the number of positive examples after that round.

We see that the learning rate of 0.8 yields the overwhelming majority of new positive examples in the first 10 rounds, after which the count stays at 0; the learning rate of 0.9 generates a smaller number of positive examples, dropping to 0 after the 4th round.

Figure 2: Bootstrapping performance metrics per iteration.

In Figure 2, we report 4 metrics for each iteration, where each iteration on the X-axis corresponds to a round as described in this post. On the Y-axis, we plot the value averaged over the 10 repetitions. Note that each metric is shown under two conditions: with a learning rate of 0.9 and with a learning rate of 0.8.
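As a sketch of how the per-iteration values could be averaged over the 10 repetitions: the exact four metrics used in Figure 2 are an assumption here; accuracy, precision, recall, and F1 are shown as a plausible set, since the figure explicitly includes precision and recall.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def round_metrics(y_true_list, y_pred_list):
    """Average four metrics over the repetitions of one round.

    y_true_list / y_pred_list: per-repetition label arrays (10 each).
    """
    scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}
    for y_true, y_pred in zip(y_true_list, y_pred_list):
        scores["accuracy"].append(accuracy_score(y_true, y_pred))
        scores["precision"].append(precision_score(y_true, y_pred))
        scores["recall"].append(recall_score(y_true, y_pred))
        scores["f1"].append(f1_score(y_true, y_pred))
    # One averaged value per metric, plotted per iteration in Figure 2.
    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```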

We observe that the learning rate of 0.8 achieves higher precision (red line) and higher recall (purple line) than the learning rate of 0.9. Thus we conclude that a lower learning rate leads to better bootstrapping performance.

In brief, the simple bootstrapping method worked as expected with a learning rate of 0.8.