Hermann-Gundert Portal – a treasure trove for South Indian languages


The printed and handwritten source materials from the collection of Hermann Gundert (and his Basel Mission colleagues), held in the Tuebingen University Library, have been scanned, catalogued, partially transcribed, and made accessible with the support of the German Research Foundation (DFG). The digital collection can be accessed through the Hermann Gundert Portal at http://gundert-portal.de. The portal was officially opened to the public at a small function held in the Tuebingen University Library on 20th November 2018.

Hermann Gundert Portal

Gundert Legacy project

During 2012-2013, a request was made to the Tuebingen University authorities to digitize the Malayalam documents in the Gundert collection. The university took it forward and planned a project to scan not only the Gundert collection but also his colleagues' collections, which are in other South Indian languages. On 12 September 2013, a function was organized in Kochi, Kerala, and the Gundert Legacy project was announced. The project was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG). One of the highlights of the project is the decision to convert the printed Malayalam documents to Malayalam Unicode text. About 24,000 pages from the collection have been converted to Malayalam Unicode.

Gundert Portal

The Gundert Portal is designed to display the results of the Gundert Legacy Project. The portal was inaugurated at a small function held in the Tuebingen University Library on 20th November 2018. The portal can be accessed here: http://gundert-portal.de.

In the Gundert Portal, Tuebingen University has included the 19th-century works in South Indian languages that are in the holdings of the University Library, even if they are not, strictly speaking, from the Gundert legacy. English and German material written by Gundert and his closest colleagues was also added.

A flyer created by Tuebingen University about the Gundert Portal is available here.

Languages covered

While the major portion of the source materials is in Malayalam, the next largest portion is in Kannada. There are also source materials in Tamil, Telugu, Tulu, Sanskrit, English, and German. The English and German materials are related to South India and the Basel Mission.

Types of documents

The source materials include printed documents (both letterpress and lithography) and handwritten documents (paper manuscripts and palm leaves).

Subjects covered

The subjects covered in these documents include Christian and Hindu literature, mythological topics, dictionaries, grammars, fiction, and non-fiction books.

Number of documents

As noted above, Malayalam has the major share of this collection, followed by Kannada. The figures for the documents are as follows:

  • 293 titles in Malayalam script, including 73 paper manuscripts and 20 palm-leaf manuscripts
  • 187 titles in Kannada script, including 37 paper manuscripts and 2 palm-leaf manuscripts
  • 4 titles in the Tulu language (in Kannada script)
  • 25 titles in Tamil script, including 2 paper manuscripts
  • 4 titles in Telugu script
  • 340 titles in German or English, including 190 articles from periodicals

These documents amount to 137,148 pages in total (written 1,37,148 in Indian digit grouping).
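As a quick cross-check, the title counts listed above can be tallied. The short Python sketch below simply sums the figures from the list. It assumes the six categories do not overlap (for instance, that the Tulu titles are not already counted among the Kannada-script titles), which the list itself does not state; the dictionary labels are chosen for this sketch only.

```python
# Title counts copied from the list above; the keys are labels
# chosen for this sketch, not official portal categories.
titles = {
    "Malayalam script": 293,
    "Kannada script": 187,
    "Tulu (Kannada script)": 4,
    "Tamil script": 25,
    "Telugu script": 4,
    "German or English": 340,
}

# Manuscript counts (paper + palm-leaf) mentioned for each script.
manuscripts = {
    "Malayalam script": 73 + 20,
    "Kannada script": 37 + 2,
    "Tamil script": 2,
}

# Assumption: the categories are disjoint, so a plain sum is meaningful.
total_titles = sum(titles.values())
total_manuscripts = sum(manuscripts.values())

print(f"Total titles (assuming no overlap): {total_titles}")
print(f"Of which manuscripts: {total_manuscripts}")
```

Under that assumption, the list adds up to 853 titles, 134 of which are manuscripts.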

Accessing the portal

The portal is user-friendly, but the default interface language is German. Those who are not familiar with German need to change the language: click the text “SPRACHE” at the top right and select English from the list, as shown in the image below.

The collection is arranged by language, type of document, subject of the document, and so on. Since browsing the collection is easy, I am not going into the details of it.

Under the Home tab you can see the following pages:

These pages provide a general overview of the management of this project.

Unicode Conversion

In the initial phase of the project, Tuebingen University decided to convert the printed Malayalam documents to Unicode. I got the opportunity to coordinate the Unicode conversion. Since I plan to document this in detail very soon, I am not going into the project management of the Unicode conversion here.

You can view the Unicode version of Malayalam books from the portal. To do this, select a printed Malayalam book from the Gundert Portal. The book opens in the digital library. Take, for example, the book Prakrtisastram (പ്രകൃതിശാസ്ത്രം) by L.J. Frohnmeyer. The Unicode text appears on the Transcript tab, as shown in the following image. The Unicode conversion reproduces the source text exactly as it is printed. I will write in detail about these things later.
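Because the transcripts are Unicode text, they can be copied out and processed with ordinary text tools. As a minimal sketch (not part of the portal; the helper names `is_malayalam_char` and `malayalam_ratio` are made up for this illustration), the Python snippet below measures how much of a copied string falls within the Unicode Malayalam block (U+0D00 to U+0D7F), using the book title as sample input.

```python
# Illustrative only: these helpers are hypothetical and not provided by the portal.
MALAYALAM_START = 0x0D00  # first code point of the Unicode Malayalam block
MALAYALAM_END = 0x0D7F    # last code point of the Unicode Malayalam block


def is_malayalam_char(ch: str) -> bool:
    """Return True if the single character ch lies in the Malayalam block."""
    return MALAYALAM_START <= ord(ch) <= MALAYALAM_END


def malayalam_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Malayalam code points."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_malayalam_char(ch) for ch in chars) / len(chars)


if __name__ == "__main__":
    # The title in Malayalam script, then its Latin transliteration.
    print(malayalam_ratio("പ്രകൃതിശാസ്ത്രം"))
    print(malayalam_ratio("Prakrtisastram"))
```

A check like this can help confirm that text copied from a Transcript tab is real Malayalam Unicode rather than a legacy font encoding, which would show up as Latin or symbol code points instead.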



Over the past 5 years, almost 250 people across the globe participated in this project. It is nice to see the result of this international collaboration through the Gundert Portal. I hope that researchers, the Open Knowledge community (for example, Wikimedians), and language computing experts will make good use of this excellent collection from Tuebingen University.

My thanks to the Tuebingen University officials for creating such an excellent treasure trove for South Indian languages and their speakers.