Dataset posted on 28.02.2018 by S.H. Cappallo
A collection of 13M tweets divided into training, validation, and test sets for predicting emoji from text and/or images.
The data provides the tweet status ID and the emoji annotations associated with it. In the case of image-containing subsets, the image URL is also listed.
The Full, unbalanced dataset consists of randomly selected test and validation sets of 1M tweets, with the remainder in the training set.
The Balanced test set is a subset of the test set chosen to improve emoji class balance.
The Image subsets contain only the tweets that include an image.
Finally, emoji_map_1791.csv provides information regarding the emoji labels and potential metadata.
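The record layout described above (tweet status ID, emoji annotations, and an image URL for the image subsets) could be read along the following lines. This is only a minimal sketch: the on-disk format is not specified here, so the comma-separated columns, the space-separated emoji labels, the `TweetRecord` class, the `read_records` function, and the sample rows are all assumptions made for illustration and should be adapted to the actual files.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class TweetRecord:
    """One annotated tweet (hypothetical structure, not the dataset's own schema)."""
    status_id: str                   # tweet status ID, kept as a string to avoid precision loss
    emoji: List[str]                 # emoji annotations attached to the tweet
    image_url: Optional[str] = None  # only present in image-containing subsets


def read_records(handle, emoji_sep: str = " ") -> Iterator[TweetRecord]:
    """Yield TweetRecord objects from a CSV-like text stream.

    Assumed column order: status ID, emoji labels, optional image URL.
    """
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        status_id = row[0]
        emoji = row[1].split(emoji_sep) if len(row) > 1 and row[1] else []
        image_url = row[2] if len(row) > 2 and row[2] else None
        yield TweetRecord(status_id, emoji, image_url)


# Made-up sample rows, purely to exercise the parser.
sample = io.StringIO("1001,12 345,\n1002,7,http://example.com/a.jpg\n")
records = list(read_records(sample))
```

Status IDs are kept as strings because they are identifiers rather than quantities, and tweet IDs can exceed the range some tools handle safely as numbers.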