JRC Data Catalogue
COLLECTION Approved

Language Technology

This collection includes multilingual linguistic data that can be used for Computational Linguistics R&D tasks such as Text Mining, Machine Translation (MT), Information Extraction (IE), Document Classification, and Cross-Lingual Information Access and Retrieval (CLIR), among others. Further keywords relevant to this collection include Named Entity Recognition and Classification (NERC), Quotation Recognition, clustering, categorisation, terminology extraction, sentiment analysis, disambiguation, Multi-Document Summarisation, Statistical Machine Translation (SMT), and Neural Machine Translation (NMT). While the related data collection Europe Media Monitor (EMM) contains data mainly derived from monitoring online traditional media or social media, the Language Technology collection is based on other text types and has been created through other means.

Additional information

Published by
European Commission, Joint Research Centre
Contact email
Bertrand.DE-LONGUEVILLE (at) ec.europa.eu
Created date
24 Jun 2019 11:32
Modified date
13 Nov 2023 14:50