We invite you to participate in our ongoing challenge on the detection of clickbait posts in social media. Clickbait refers to social media posts that are designed to entice their readers into clicking an accompanying link, at the expense of being informative and objective.
The task of the challenge is to develop a classifier that rates how clickbaiting a social media post is. For each post, the content of the post itself as well as the main content of the linked target web page are provided as JSON objects in our datasets.
{
  "id": "608999590243741697",
  "postTimestamp": "Thu Jun 11 14:09:51 +0000 2015",
  "postText": ["Some people are such food snobs"],
  "postMedia": ["608999590243741697.png"],
  "targetTitle": "Some people are such food snobs",
  "targetDescription": "You'll never guess one...",
  "targetKeywords": "food, foodfront, food waste...",
  "targetParagraphs": [
    "What a drag it is, eating kale that isn't ...",
    "A new study, published this Wednesday by ...",
    ...],
  "targetCaptions": ["(Flikr/USDA)"]
}
instances.jsonl
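To illustrate how to consume this format, here is a minimal Python sketch that iterates over the posts; the file path is an assumption about where you unzipped the dataset:

```python
import json

# one JSON object per line (JSON Lines format)
with open("instances.jsonl") as f:
    for line in f:
        post = json.loads(line)
        print(post["id"], post["postText"][0])
```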
Classifiers have to output a clickbait score in the range [0, 1], where a value of 1.0 denotes a post that is heavily clickbaiting.
{"id": "608999590243741697", "clickbaitScore": 1.0}
results.jsonl
Performance is measured against a crowd-sourced test set. The posts in the training and test sets have been judged on a 4-point scale [0, 0.33, 0.66, 1] by at least five annotators.
{"id": "608999590243741697",
"truthJudgments": [0.33, 1.0, 1.0, 0.66, 0.33],
"truthMean" : 0.6666667,
"truthMedian": 0.6666667,
"truthMode" : 1.0,
"truthClass" : "clickbait"}
truth.jsonl
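For illustration, the aggregate fields can be recomputed from the raw judgments. The displayed values 0.33 and 0.66 appear to be rounded from thirds; under that assumption, the mean and median above are reproduced exactly:

```python
from statistics import mean, median

# 0.33 and 0.66 assumed to be rounded displays of 1/3 and 2/3
judgments = [1/3, 1.0, 1.0, 2/3, 1/3]

print(mean(judgments))    # 0.6666... = truthMean
print(median(judgments))  # 0.6666... = truthMedian
```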
As the primary evaluation metric, the Mean Squared Error (MSE) with respect to the mean judgments of the annotators is used. For informational purposes, we compute further evaluation metrics such as the Median Absolute Error (MedAE) and the F1 score (F1) with respect to the truth class, as well as the runtime of the classification software. For your convenience, you can download the official Python evaluation program.
| MSE | MedAE | ACC | F1 | Runtime | Team |
|---|---|---|---|---|---|
| 0.024 | 0.174 | 0.91 | 0.760 | 17:11:16 | Team 1 |
| 0.052 | 0.201 | 0.88 | 0.533 | 02:47:50 | Team 2 |
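For illustration, the metrics above can be recomputed from a truth file and a results file. The following sketch uses scikit-learn; the file paths and the 0.5 binarization threshold for the class metrics are assumptions of this sketch, and the official evaluation program remains authoritative:

```python
import json
from sklearn.metrics import (accuracy_score, f1_score,
                             mean_squared_error, median_absolute_error)

def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f]

truth = {t["id"]: t for t in load_jsonl("truth.jsonl")}
scores = {r["id"]: r["clickbaitScore"] for r in load_jsonl("results.jsonl")}

ids = sorted(truth)  # evaluate every post that has a truth judgment
y_true = [truth[i]["truthMean"] for i in ids]
y_pred = [scores[i] for i in ids]

print("MSE:  ", mean_squared_error(y_true, y_pred))
print("MedAE:", median_absolute_error(y_true, y_pred))

# Class metrics: predicted scores are binarized at 0.5 here, which is an
# assumption of this sketch; the official program defines the actual rule.
c_true = [truth[i]["truthClass"] == "clickbait" for i in ids]
c_pred = [scores[i] > 0.5 for i in ids]
print("Acc:  ", accuracy_score(c_true, c_pred))
print("F1:   ", f1_score(c_true, c_pred))
```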
We provide three datasets for the competition. Each dataset is provided as a zip archive with the naming pattern clickbait17-<dataset>-<version>.zip. It contains the following resources (the unlabeled dataset lacks the truth file):

instances.jsonl: A line-delimited JSON file (JSON Lines). Each line is a JSON object containing the information we extracted for a specific post and its target article. Have a look at the dataset schema file for an overview of the available fields.

truth.jsonl: A line-delimited JSON file. Each line is a JSON object containing the crowdsourced clickbait judgments of a specific post. Have a look at the dataset schema file for an overview of the available fields.

media/: A folder that contains all the images referenced in the instances.jsonl file.
| Dataset | #posts | #clickbait | #no-clickbait | Download Link | Release Date |
|---|---|---|---|---|---|
| Training | 2495 | 762 | 1697 | clickbait17-train-170331.zip | March 31, 2017 |
| Unlabeled | 80012 | ? | ? | clickbait17-unlabeled-170429.zip | April 30, 2017 |
| Training / Validation | 19829 | 9656 | 10173 | clickbait17-train-170616.zip | June 16, 2017 |
| Training / Validation | 19538 | 4761 | 14777 | clickbait17-train-170630.zip | June 30, 2017 |
We use the Evaluation-as-a-Service platform TIRA to evaluate the performance of your classifier. TIRA requires that you deploy your classifier as a program that can be executed via a command line call with two arguments for the input and output directories. For example, the syntax could be:
> myClassifier -i path/to/input/directory -o path/to/output/directory
example command line call for tira.io
At runtime, the input directory contains the unzipped dataset (i.e., the instances.jsonl file and the media/ folder) your classifier has to process. The predictions of your classifier should be written to a file called results.jsonl in the given output directory. Each line of the results.jsonl file should contain a valid JSON object with the id and the predicted clickbaitScore for a post (cf. the dataset schema file).
{"id": "608999590243741697", "clickbaitScore": 1.0}
{"id": "609408598704128000", "clickbaitScore": 0.25}
...
results.jsonl
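To make the expected interface concrete, here is a minimal sketch of a TIRA-compatible classifier in Python. The scoring heuristic is a deliberately naive placeholder, not a recommended baseline:

```python
#!/usr/bin/env python3
import argparse
import json
import os

def score(post):
    # Deliberately naive placeholder: trailing "..." in the post text is
    # treated as a weak clickbait signal.
    text = " ".join(post["postText"])
    return 0.9 if text.endswith("...") else 0.1

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", dest="input_dir", required=True)
    parser.add_argument("-o", dest="output_dir", required=True)
    args = parser.parse_args()

    # Read instances.jsonl from the input directory and write results.jsonl
    # with one {"id", "clickbaitScore"} object per line to the output directory.
    with open(os.path.join(args.input_dir, "instances.jsonl")) as fin, \
         open(os.path.join(args.output_dir, "results.jsonl"), "w") as fout:
        for line in fin:
            post = json.loads(line)
            fout.write(json.dumps({"id": post["id"],
                                   "clickbaitScore": score(post)}) + "\n")

if __name__ == "__main__":
    main()
```

Invoked as `python3 myClassifier.py -i path/to/input/directory -o path/to/output/directory`, it produces a results.jsonl of the form shown above.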
We will ask you to deploy your classifier onto a virtual machine that will be made accessible to you after registration. You can choose freely among the available programming languages and between the operating systems Microsoft Windows and Ubuntu. You will be able to reach the virtual machine via SSH and via remote desktop. More information about how to access the virtual machines can be found in the user guide below.
Once your software is deployed on your virtual machine, access TIRA at www.tira.io, where you can self-evaluate it on the test data.
Note: By submitting your software, you retain full copyright. You agree to grant us usage rights only for the purpose of the Clickbait Challenge. We agree not to share your software with third parties or to use it for purposes other than the Clickbait Challenge.
The first workshop on clickbait detection took place on November 27, 2017 at Bauhaus-Universität Weimar, Germany.
| Time | Session |
|---|---|
| 09:00 - 09:30 | Welcome Reception |
| 09:30 - 10:30 | Clickbait Challenge 2017: Overview (Martin Potthast, Tim Gollub) |
| 10:30 - 11:00 | A Neural Clickbait Detection Engine (Yash Kumar Lal) |
| 11:00 - 11:30 | Clickbait Identification using Neural Networks (Philippe Thomas) |
| 11:30 - 12:00 | The Emperor Clickbait Detector (Erdan Genc) |
| 12:00 - 14:00 | Lunch Break |
| 14:00 - 14:30 | Detecting Clickbait in Online Social Media: You Won’t Believe How We Did It (Aviad Elyashar) |
| 14:30 - 15:00 | Heuristic Feature Selection for Clickbait Scoring (Matti Wiegmann) |
| 15:00 - 16:00 | Discussion and Outlook |
The following table presents the current performance achieved by the participants. As the primary evaluation measure, the Mean Squared Error (MSE) with respect to the mean judgments of the annotators is used. For further metrics, see the full result table on tira.io. Where provided, the paper and code of a submission are linked in its row.
| MSE | F1 | Prec | Rec | Acc | Runtime | Team | Paper/Code |
|---|---|---|---|---|---|---|---|
| 0.032 | 0.670 | 0.732 | 0.619 | 0.855 | 00:01:10 | albacore | paper code |
| 0.033 | 0.683 | 0.719 | 0.650 | 0.856 | 00:03:27 | zingel | paper code |
| 0.034 | 0.679 | 0.717 | 0.645 | 0.855 | 00:07:20 | anchovy | code |
| 0.036 | 0.641 | 0.714 | 0.581 | 0.845 | 00:04:03 | emperor | code |
| 0.036 | 0.638 | 0.728 | 0.568 | 0.847 | 00:08:05 | carpetshark | paper code |
| 0.039 | 0.656 | 0.659 | 0.654 | 0.837 | 00:35:24 | arowana | |
| 0.041 | 0.631 | 0.642 | 0.621 | 0.827 | 00:54:28 | pineapplefish | paper |
| 0.043 | 0.565 | 0.699 | 0.474 | 0.826 | 00:04:31 | whitebait | paper code |
| 0.044 | 0.552 | 0.758 | 0.434 | 0.832 | 00:37:34 | clickbait17-baseline | |
| 0.045 | 0.604 | 0.711 | 0.524 | 0.836 | 01:04:42 | pike | paper |
| 0.046 | 0.654 | 0.654 | 0.653 | 0.835 | 06:14:10 | tuna | paper |
| 0.079 | 0.650 | 0.530 | 0.841 | 0.785 | 00:04:55 | torpedo | paper code |
| 0.099 | 0.023 | 0.779 | 0.012 | 0.764 | 00:26:38 | houndshark | |
| 0.118 | 0.467 | 0.380 | 0.605 | 0.671 | 00:05:00 | dory | |
| 0.174 | 0.261 | 0.167 | 0.593 | 0.209 | 114:04:50 | salmon | paper |
| 0.252 | 0.434 | 0.287 | 0.893 | 0.446 | 19:05:31 | snapper | paper code |
In case of questions, don't hesitate to contact us via clickbait@webis.de.