COLLECTION

Language Technology

Acronym: Language_Technology

Description

This collection contains multilingual linguistic data for Computational Linguistics R&D tasks such as Text Mining, Machine Translation (MT), Information Extraction (IE), Document Classification, and Cross-Lingual Information Access and Retrieval (CLIR). Further relevant keywords include Named Entity Recognition and Classification (NERC), Quotation Recognition, clustering, categorisation, terminology extraction, sentiment analysis, disambiguation, Multi-Document Summarisation, Statistical Machine Translation (SMT), and Neural Machine Translation (NMT). The related data collection Europe Media Monitor (EMM) contains data derived mainly from monitoring online traditional media or social media; the Language Technology collection, by contrast, is based on other text types and was created by other means.

Contact

Email: Bertrand.DE-LONGUEVILLE (at) ec.europa.eu

Datasets (3)

DATASET
Temporal Lexica

A set of multilingual lexica containing component parts of Temporal Expressions, i.e. phrases referring to Temporal Entities. They can be integrated in a Temporal Expression parsin...

DATASET
Sentiment analysis for Georgian

This dataset is the first ever publicly available annotated dataset for sentiment classification and semantic polarity dictionary for Georgian. We consider both three- (positive, n...

Additional information

Published by: European Commission, Joint Research Centre
Created date: 2019-06-24
Modified date: 2022-04-28