DATASET

Wastewater in-silico NGS dataset

Collection: ETOHA-OPEN - Emerging Health Threats and the One Health Approach

Description

This dataset comprises 45 paired-end wastewater next-generation sequencing (NGS) samples generated in silico with the SWAMPY tool (https://doi.org/10.1093/bioinformatics/btae532). We produced 500,000 reads per sample (--n_reads 500000), simulating Illumina MiSeq runs (--seqSys MSv3) with the ARTIC v4 amplicon panel (--primer_set a4). Each sample contains three SARS-CoV-2 lineages (JN.1-Omicron: EPI_ISL_18850060, AY.99.2-Delta: OM898676, BA.2-Omicron: OV754178) in varying proportions. The samples are organised into five datasets of nine samples each. Within each dataset, the frequency of JN.1 is fixed while the relative abundances of the other two lineages vary inversely, so that the three frequencies always sum to 100%. The fixed frequencies of JN.1 are 0%, 5%, 10%, 30%, and 50%.

- When JN.1 is at 50%, AY.99.2 increases from 5% to 45% while BA.2 decreases from 45% to 5%.

- When JN.1 is at 30%, AY.99.2 increases from 7% to 63% while BA.2 decreases from 63% to 7%.

- When JN.1 is at 10%, AY.99.2 increases from 9% to 81% while BA.2 decreases from 81% to 9%.

- When JN.1 is at 5%, AY.99.2 increases from 10% to 90% while BA.2 decreases from 85% to 5%.

- When JN.1 is at 0%, AY.99.2 increases from 10% to 90% while BA.2 decreases from 90% to 10%.
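
The design above can be enumerated with a short sketch. It assumes that the nine samples of each dataset are evenly spaced between the listed endpoints (the description gives only the endpoints and the number of samples); the metadata file remains the authoritative source for the actual frequencies. The names DESIGN and mixtures are illustrative only.

```python
# Endpoints taken from the list above:
# (JN.1 %, first AY.99.2 %, last AY.99.2 %)
DESIGN = [(50, 5, 45), (30, 7, 63), (10, 9, 81), (5, 10, 90), (0, 10, 90)]
SAMPLES_PER_DATASET = 9  # stated in the description


def mixtures():
    """Return (JN.1 %, AY.99.2 %, BA.2 %) for every sample.

    Assumption: AY.99.2 is evenly spaced between its endpoints;
    BA.2 takes whatever remains so that each sample sums to 100%.
    """
    rows = []
    for jn1, ay_first, ay_last in DESIGN:
        step = (ay_last - ay_first) // (SAMPLES_PER_DATASET - 1)
        for k in range(SAMPLES_PER_DATASET):
            ay = ay_first + k * step
            ba2 = 100 - jn1 - ay
            rows.append((jn1, ay, ba2))
    return rows


if __name__ == "__main__":
    for jn1, ay, ba2 in mixtures():
        print(f"JN.1={jn1:>2}%  AY.99.2={ay:>2}%  BA.2={ba2:>2}%")
```

Under this assumption the enumeration yields 45 mixtures, matching the number of samples in the dataset.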

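Filenames encode the following fields, separated by dots: Presence/Absence of JN.1; Frequency of JN.1; AY.99.2 ID; AY.99.2 Frequency; BA.2 ID; BA.2 Frequency; Instrument Model; Nreads (e.g., Positive.JN1.5.OM898676.80.OV754178.15.MSv3.WW.500000reads_R1.fastq.gz). The example also carries a WW token (presumably "wastewater") before the read count and a read-pair suffix (_R1) after it; neither is part of the field list.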
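
A filename can be split back into its fields with a regular expression. The sketch below is inferred from the single example filename given above, so the pattern (including the WW token, the _R1/_R2 suffix, and integer frequencies) is an assumption rather than a documented specification; parse_sample_name and the dictionary keys are hypothetical names.

```python
import re

# Pattern inferred from the one example filename in the description.
FILENAME_RE = re.compile(
    r"(?P<status>[A-Za-z]+)\.JN1\.(?P<jn1_pct>\d+)"
    r"\.(?P<ay_id>[A-Z0-9_]+)\.(?P<ay_pct>\d+)"
    r"\.(?P<ba2_id>[A-Z0-9_]+)\.(?P<ba2_pct>\d+)"
    r"\.(?P<instrument>\w+)\.(?P<tag>\w+)"
    r"\.(?P<n_reads>\d+)reads_(?P<mate>R[12])\.fastq\.gz"
)


def parse_sample_name(filename):
    """Split a dataset filename into its encoded fields."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected filename layout: {filename}")
    fields = match.groupdict()
    # Convert the numeric fields from text to integers.
    for key in ("jn1_pct", "ay_pct", "ba2_pct", "n_reads"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    example = ("Positive.JN1.5.OM898676.80.OV754178.15"
               ".MSv3.WW.500000reads_R1.fastq.gz")
    print(parse_sample_name(example))
```

Filenames that do not follow the inferred layout raise a ValueError, which makes deviations from the example visible instead of silently mis-parsing them.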

Contact

Email
gabriele.leoni (at) ec.europa.eu

Contributors

  • Gabriele Leoni
  • Mauro Petrillo
  • Victoria RUIZ SERRA

How to cite

Leoni, Gabriele; Petrillo, Mauro; RUIZ SERRA, Victoria (2024): Wastewater in-silico NGS dataset. European Commission, Joint Research Centre (JRC) [Dataset] doi: 10.2905/72cbb0ab-1807-4d68-9bd5-882312715419 PID: http://data.europa.eu/89h/72cbb0ab-1807-4d68-9bd5-882312715419

Keywords

bioinformatics, in silico, NGS, SARS-CoV-2, wastewater

Data access

in-silico wastewater NGS dataset
in-silico wastewater dataset metadata
  • This file reports sample information such as lineage frequencies, the SWAMPY error model, and read counts

Additional information

Published by
European Commission, Joint Research Centre
Created date
2024-11-11
Modified date
2024-11-19
Issued date
2024-11-11
Data theme(s)
Environment, Health, Science and technology
Update frequency
irregular
Identifier
http://data.europa.eu/89h/72cbb0ab-1807-4d68-9bd5-882312715419