9 files

Twemoji Dataset

dataset
posted on 28.02.2018 by S.H. Cappallo
Collection of 13M tweets divided into training, validation, and test sets for the purposes of predicting emoji based on text and/or images.

The data provides the tweet status ID and the emoji annotations associated with it. In the case of image-containing subsets, the image URL is also listed.
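As a rough illustration of how such records might be read, the following Python sketch parses rows into a small record type. The file layout is an assumption, not something stated on this page: it supposes comma-separated rows with the status ID first, space-separated integer emoji label indices second, and an optional image URL third. The names `TweetRecord` and `parse_records` are hypothetical. Check the actual files and adjust accordingly.

```python
import csv
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class TweetRecord(NamedTuple):
    """One annotated tweet (hypothetical structure for illustration)."""
    status_id: str              # tweet status ID, kept as a string
    emoji_labels: List[int]     # emoji annotations as integer label indices
    image_url: Optional[str]    # present only in image-containing subsets


def parse_records(handle: TextIO) -> Iterator[TweetRecord]:
    """Yield records from an open text file.

    ASSUMED layout (not confirmed by the dataset description):
        status_id,label label ...[,image_url]
    """
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        status_id = row[0]
        labels = [int(tok) for tok in row[1].split()] if len(row) > 1 else []
        image_url = row[2] if len(row) > 2 and row[2] else None
        yield TweetRecord(status_id, labels, image_url)


# Made-up sample rows, used only to exercise the parser.
sample = io.StringIO("123,4 17\n456,2,http://example.com/a.jpg\n")
records = list(parse_records(sample))
```

Status IDs are kept as strings here so that large identifiers are not altered by numeric conversion.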

The Full, unbalanced dataset consists of randomly selected test and validation sets of 1M tweets, with the remainder in the training set.

The Balanced test set is a subset of the test set chosen to improve emoji class balance.

The Image subsets contain only the tweets that include images.

Finally, emoji_map_1791.csv provides information regarding the emoji labels and potential metadata.

Retention period

01/01/2028
