V21.03

@grw20blt released this 29 Nov 10:46
4065c68

Dyma gorpws o frawddegau o destun Cymraeg wedi'u trwyddedu o dan drwydded CC0. Ar hyn o bryd, mae'r corpws yn cynnwys bron i 20,000 o frawddegau a thros 180,000 o docynnau, a'r bwriad yw parhau i'w gynyddu wrth i ni gael gafael ar destunau o dan y drwydded briodol. Bwriad y corpws hwn yw galluogi hyfforddi modelau iaith Cymraeg ar gyfer sawl diben gwahanol.

This is a corpus of sentences from Welsh texts licensed under the CC0 licence. The corpus currently contains nearly 20,000 sentences and over 180,000 tokens, and we aim to keep growing it as we secure more texts under the appropriate licence. The corpus is intended to enable the training of Welsh language models for a variety of purposes.