JRC Data Catalogue
COLLECTION: Approved

Semantic Text Analyser BERT-like language model for formal language understanding

SeTABERTa is a new multilingual language model pretrained from scratch on several Open Access text repositories: EU legislation, research articles, EU public documents, and US patents. Two thirds of the training data is English; the remainder covers the EU24 languages. The model was trained on the JRC Big Data Platform and can be fine-tuned for downstream tasks.

Datasets (0)

Additional information

Published by: European Commission, Joint Research Centre
Contact email: vidas.daudaravicius (at) ec.europa.eu
Created date: 30 Jan 2024 15:55
Modified date: 28 May 2024 14:23