Online Latin Corpora for Linguistic Research - Texts

Research unit

COST

Project number

C12.0053

Project title

Online Latin Corpora for Linguistic Research

Texts for this project

	German	French	Italian	English
Key words	-	-	-
Research programs	-	-	-
Short description	-	-	-
Partners and International Organizations	-	-	-
Abstract	-	-	-
References in databases	-	-	-

Inserted texts

Category	Text
Key words (English)	Latin; Middle Ages; Lexicography; Text corpora; Linguistics; Open Source
Research programs (English)	COST-Action IS1005 - Medieval Europe - Medieval Cultures and Technological Resources
Short description (English)	This project aims to set up several corpora of Latin texts, each belonging to a well-defined segment of the Latin language, on our existing experimental server (http://mlat.uzh.ch/) for linguistic research. The work requires a part-time programmer (who has collaborated with us before) and a research assistant (Hilfswissenschaftler); we are applying for funding for these two part-time collaborators. All our text corpora will be freely accessible online and thus of further use to other researchers. All words in our texts will be automatically lemmatised and linked to the site of Medieval Latin dictionaries currently being developed by our COST partners (Mittellateinisches Wörterbuch in Munich, the Polish and Czech national dictionaries, and the Novum Glossarium located in France). We hope that an open, large-scale Latin text collection, built in cooperation with other projects, will grow out of this collaboration.
Partners and International Organizations (English)	BE; BG; CZ; DK; FI; FR; DE; EL; HU; IE; IL; IT; MT; NL; NO; PL; PT; RO; ES; SE; CH; UK
Abstract (English)	In its third year, the project (http://www.mlat.uzh.ch) has quickly reached its final goals. Several bugs have been corrected and performance has been improved. The toponomastic lexicon by Graesse was added to the dictionary tools. The dictionary tool can now also be used directly as a search engine in the Chrome / Chromium browser; the search URL to use is: http://mlat.uzh.ch/MLS/info_frame.php?w=%s. Other minor new features include direct links to any text in the collection (which enable other sites to link to a specific text of ours) and the rendering of quotations in the texts in a different colour (provided they are encoded in the XML file). New texts from new collaborators were adapted to our standards and incorporated into the database (cf. the list “cooperation partners” below), in particular the important collection (now corpus 13) of Latin grammatical texts from the Corpus Grammaticorum Latinorum by Prof. Alessandro Garcea (Université Paris Sorbonne), and another large corpus of Latin texts (corpus 16) from the Digital library of late-antique Latin texts (digiliblt), Università di Vercelli. Each contains some 100 texts, most of them in the most recent and best edition available (but, for legal reasons, without apparatuses). In addition, the applicant has OCRed some texts on travelling in the Middle Ages and united them in a corpus “Itinera” (corpus 12), and some more Greek texts from the Perseus Library have been included (esp. Plato). Corpora 13 and 14 will soon receive additional theological and poetical texts, respectively, from several sources. Currently we have 134 million words from 6210 texts on display. General résumé of the entire project: We have been able to achieve most of our initial goals; indeed, the project has grown much larger than originally intended and is now the largest open Latin full-text meta-collection in existence.
Our logs currently show between 200 and 300 individual users daily (excluding search engines and robots), and we are increasingly contacted by researchers from around the world who intend to use our data for their research. The project has remained faithful to its open source approach: it still uses only open software and is freely accessible to everyone. The proposed linkage to the planned new dictionary platform has not been realised, as this platform has not (yet?) advanced beyond the planning stage. Instead, several large public domain dictionaries were incorporated directly into the interface. As stated above, they can now also be queried together as a browser search engine. The input does not have to be the lemma form but may be any form known to our wordlists. The software Philologic, which would have provided word statistics, could not be installed because no version runs on contemporary server operating systems. We hope to be able to incorporate some of its functions at a future stage. We have applied for further funding within the European Horizon 2020 project MAMEUR, led by the University of Siena. If it is granted, we will be able to implement more features, such as: further linkage to other projects, like links for all texts and authors to their respective pages on mirabile (http://www.mirabileweb.it/ricerca_semplice.aspx) and to their GND and VIAF numbers (international library catalogues identifying texts and authors uniquely); visualisation of images accompanying texts; more options for truncated (reverse) searches; grammatical searches (e.g. lucror + DAT within a distance of 5 words); and quality control: the quality of the editions used (critical, non-critical, mere transcription), the quality of their OCR, and the richness of the tagging should be communicated more clearly to the user. If the large-scale project MAMEUR is not accepted, we will try to find other funding to continue the development.
We are in contact with Lukas Rosenthaler from the Digital Humanities Lab (University of Basel) with the aim of preserving the resource for the long-term future.
References in databases (English)	Swiss Database: COST-DB of the State Secretariat for Education and Research, Hallwylstrasse 4, CH-3003 Berne, Switzerland, Tel. +41 31 322 74 82. Swiss Project-Number: C12.0053
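
The abstract gives the search URL http://mlat.uzh.ch/MLS/info_frame.php?w=%s, where %s is the placeholder that the browser replaces with the query word. As a minimal sketch of the same substitution done in a script, the following builds a lookup URL for a given word form. The helper name dictionary_url is hypothetical, and the assumption that the server accepts UTF-8 percent-encoded input is not confirmed by the source.

```python
from urllib.parse import quote

# Search URL template quoted in the abstract; %s marks where the query word goes.
SEARCH_TEMPLATE = "http://mlat.uzh.ch/MLS/info_frame.php?w=%s"


def dictionary_url(word: str) -> str:
    """Build a dictionary lookup URL for a word form (hypothetical helper)."""
    # Assumption: the server accepts UTF-8 percent-encoded query values.
    encoded = quote(word.strip(), safe="")
    # Plain string replacement, so a literal % in the encoded word is left alone.
    return SEARCH_TEMPLATE.replace("%s", encoded)


if __name__ == "__main__":
    # Per the abstract, any form known to the wordlists may be used, not only the lemma.
    print(dictionary_url("lucror"))
```

The script only constructs the URL and sends no request, so it makes no claim about what the server returns.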