GSoc2021-DBpedia-NeuralExtraction logo GSoc2021-DBpedia-NeuralExtraction

Levi (1979) has two causality relations, which differ in directionality: CAUSE1(X,Y) means “Y causes X”, e.g., “tear/X gas/Y”. CAUSE2(X,Y) means “Y is caused by X”, e.g., “drug/X deaths/Y”.

In order to extract cause-effect pairs from a individual sentence, we define that the pairs Cause-Effect(X, Y) is true for a sentence S if and only if the situation described in S entails that X is the cause of Y, or that X causes/makes/produces/emits/… Y, according to SemEval-2010 Task 8 Dataset.

For this blog, we will try to discover the cause-effect pairs evaluated by human beings, extract them from the external datasets, e.g. SemEval-2010 Task 8 Dataset, and store them for further usage of causality relation extraction.

External Resources

Until now, we would like to focus on two external resources, one serves for seed cause-effect pairs extraction, another serves for bootstrapping for the purpose of data augmentation.

For seed cause-effect pairs

The requirement of seed cause-effect pairs is that the reliability of pairs should be really high. Any individual and non-practical evaluation is error-prone in extracting seed pairs. For this reason, we choose a published dataset in the competition platform – SemEval-2010 Task 8 Dataset, we say SemEval Dataset in the following text for short.

The SemEval Dataset includes thousands of sentences, each is tagged with 9 relations (including cause-effect relation) and Other (if none of the above relations is found). Interestingly, there are 1,003 pairs in training set and 328 pairs in test set tagged with cause-effect relation. We would like to extract those pairs as seed cause-effect pairs.

Sentence examples

Positive examples

“A person infected with a particular flu virus strain develops an antibody against that virus.” Cause-Effect(e2, e1)

Near-miss negative examples

“Playing a music instrument opens up a lot of possibilities to enrich your life." Entity-Origin(e1,e2) is not our target causality relations

Pattern matching

pattern_causal = 'Cause-Effect\((e.),(e.)\)'
pattern_other_relations = '^\S*-\S*\((e.),(e.)\)$'