Lexical corpora

The Institute has comprehensive collections of lexical corpora. They contain data on Finnish dialects, Old Literary Finnish, Karelian, and Finnish slang.

The largest lexical collections are the corpora of the Word Archive of Finnish Dialects, collected since the beginning of the 20th century. The main corpus of Finnish dialects contains more than 8 million data items about approximately 400,000 words. The material covers all Finnish dialects spoken in current Finnish territory as well as on the Karelian Isthmus and in Ingria. It also includes the Finnish dialects spoken in West Bothnia (Sweden) and Finnmark (Norway), and the extinct dialect of the immigrants from Savo, spoken in Värmland (western central Sweden).

The Dictionary of Old Literary Finnish is compiled from a corpus of approximately 500,000 entry slips. The data was collected from printed works and coherent manuscripts in Finnish, ranging from the 1540s until around 1810.

The Dictionary of Contemporary Finnish contains over 100,000 entries of modern standard Finnish. It provides information on the meaning, usage, inflection and spelling of the words.

The word corpus of Karelian consists of more than 550,000 dialect word entry slips. The oldest entries are from the late 19th century and the newest ones from the 1970s.


Entry slip for the word solmu. Word Archive of Finnish Dialects.
An entry slip. Photo: The archives of the Institute.