A parallel corpus of news articles in the Balkan languages, originally extracted from http://www.setimes.com. The corpus is PUBLIC DOMAIN, but if you use it in your work, please cite: Francis M. Tyers and Murat Alperen (2010), "South-East European Times: A parallel corpus of the Balkan languages".
9 languages, 36 bitexts

Bottom-left triangle: download files. Upper-right triangle: sample files.

| | bg | el | en | hr | mk | ro | sq | sr | tr | |
|---|---|---|---|---|---|---|---|---|---|---|
| bg | | view | view | view | view | view | view | view | view | bg |
| el | ces | | view | view | view | view | view | view | view | el |
| en | ces | ces | | view | view | view | view | view | view | en |
| hr | ces | ces | ces | | view | view | view | view | view | hr |
| mk | ces | ces | ces | ces | | view | view | view | view | mk |
| ro | ces | ces | ces | ces | ces | | view | view | view | ro |
| sq | ces | ces | ces | ces | ces | ces | | view | view | sq |
| sr | ces | ces | ces | ces | ces | ces | ces | | view | sr |
| tr | ces | ces | ces | ces | ces | ces | ces | ces | | tr |

| language | files | tokens | sentences | bg | el | en | hr | mk | ro | sq | sr | tr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bg | 8 | 37.4M | 1.3M | | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M |
| el | 8 | 40.2M | 1.3M | 0.2M | | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M |
| en | 8 | 36.1M | 1.3M | 0.2M | 0.2M | | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M |
| hr | 8 | 36.8M | 1.3M | 0.2M | 0.2M | 0.2M | | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M |
| mk | 8 | 37.0M | 1.3M | 0.2M | 0.2M | 0.2M | 0.2M | | 0.2M | 0.2M | 0.2M | 0.2M |
| ro | 8 | 41.4M | 1.4M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | | 0.2M | 0.2M | 0.2M |
| sq | 8 | 41.5M | 1.4M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | | 0.2M | 0.2M |
| sr | 8 | 37.5M | 1.3M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | | 0.2M |
| tr | 8 | 33.9M | 1.3M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | 0.2M | |
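
The figure of 36 bitexts follows from pairing each of the 9 languages with every other language once (9 × 8 / 2 = 36), which is also why each language is listed with 8 files. A minimal Python sketch of that enumeration is given here; the `LANGUAGES` list and the `language_pairs` helper are illustrative names, not part of any corpus tooling, and the pair ordering is an assumption rather than the corpus's actual file naming.

```python
from itertools import combinations

# Language codes as they appear in the tables above.
LANGUAGES = ["bg", "el", "en", "hr", "mk", "ro", "sq", "sr", "tr"]


def language_pairs(languages):
    """Return every unordered pair of distinct languages (hypothetical helper)."""
    return list(combinations(sorted(languages), 2))


pairs = language_pairs(LANGUAGES)
print(len(pairs))  # number of bitexts

# Count how many bitexts each language takes part in.
per_language = {
    lang: sum(1 for pair in pairs if lang in pair) for lang in LANGUAGES
}
print(per_language["bg"])  # bitexts involving Bulgarian
```

Looping over `pairs` is one way a script might walk all bitexts without visiting the same pair twice.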