DATASET

Semantic Text Analyser BERT-like language model for formal language understanding

Collection: SeTABERTa: Semantic Text Analyser BERT-like language model for formal language understanding

Description

SeTABERTa is a new multilingual language model pretrained from scratch on several Open Access text repositories: EU legislation, research articles, EU public documents and US patents. Two thirds of the training data is English; the remaining third covers the EU24 languages. The model was trained on the JRC Big Data Platform and can be fine-tuned for downstream tasks. It is available on Hugging Face at https://huggingface.co/vidaud/SeTABERTa-mlm-v1 and can be loaded with the Hugging Face transformers library.
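
As an illustration of loading the model with the transformers library, here is a minimal sketch in Python. It rests on assumptions not confirmed by this record: that the checkpoint exposes a masked-language-modelling head (suggested only by the "mlm" in the repository name), that PyTorch is installed, and that the tokenizer defines a mask token. The helper names load_setaberta, top_predictions and demo are hypothetical and exist only for this example.

```python
MODEL_ID = "vidaud/SeTABERTa-mlm-v1"  # repository name taken from the URL in the description


def load_setaberta(model_id: str = MODEL_ID):
    """Download (or read from the local cache) the tokenizer and model weights.

    Assumption: the checkpoint can be loaded as a masked-LM model.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def top_predictions(text_with_mask: str, tokenizer, model, k: int = 5):
    """Return the k most likely fillers for the first mask token in the text."""
    import torch

    inputs = tokenizer(text_with_mask, return_tensors="pt")
    mask_positions = (
        inputs["input_ids"][0] == tokenizer.mask_token_id
    ).nonzero(as_tuple=True)[0]
    if mask_positions.numel() == 0:
        raise ValueError("The input text contains no mask token.")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

    top_ids = logits[0, mask_positions[0]].topk(k).indices.tolist()
    return [tokenizer.decode([token_id]).strip() for token_id in top_ids]


def demo():
    """Hypothetical usage; requires network access on first run."""
    tokenizer, model = load_setaberta()
    # Use the tokenizer's own mask token rather than hard-coding one.
    sentence = (
        f"The Commission adopted a new {tokenizer.mask_token} on data protection."
    )
    print(top_predictions(sentence, tokenizer, model))
```

Calling demo() downloads the weights on first use and prints candidate fillers for the masked word. For fine-tuning on another task, the same repository name could be passed to a task-specific class such as AutoModelForSequenceClassification, in which case the task head would be newly initialised.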

Contact

Email
vidas.daudaravicius (at) ec.europa.eu

How to cite

European Commission, Joint Research Centre (JRC) (2024): Semantic Text Analyser BERT-like language model for formal language understanding. European Commission, Joint Research Centre (JRC) [Dataset] PID: http://data.europa.eu/89h/addd10f9-8325-4e49-8588-6cb681c162a5

Keywords

Language model

Data access

Semantic Text Analyser BERT-like language model for formal language understanding
URL 

Additional information

Published by
European Commission, Joint Research Centre
Created date
2024-02-01
Modified date
2024-02-08
Issued date
2024-02-01
Data theme(s)
Science and technology
Update frequency
unknown
Identifier
http://data.europa.eu/89h/addd10f9-8325-4e49-8588-6cb681c162a5