The Language Technology Group

News and Events

The REMU project - Reliable Multilingual Digital Communication: Methods and Applications.

Third GF Summer School 2013 - Frontiers of Multilingual Technology.

Background

The Language Technology Group at the Department of Computer Science and Engineering was founded in 2001. It built on the Department's earlier competence areas:

With this background, the group initially focused on precision-oriented tasks rather than on wide coverage.

In more recent years, the efforts have been extended to the creation of tools and resources usable in all kinds of language technology tasks:

The main characteristics identifying our group are

Currently (February 2008), the group has 8 members with a PhD and 4 PhD students. Some of the senior members have their principal affiliations at the Departments of Linguistics and Swedish Language.

People

Alumni

Multilingual grammars

Grammatical Framework (GF), grammaticalframework.org, is a multilingual grammar formalism based on the idea of a shared abstract syntax and mappings between the abstract syntax and concrete languages. GF has hundreds of users all over the world.
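The idea of one shared abstract syntax with per-language mappings can be illustrated with a small sketch. This is plain Python, not GF notation and not the GF API: the `Order` tree type, the `Eng` and `Swe` tables, and the `linearize`, `parse`, and `translate` functions are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass

# Abstract syntax: language-independent trees (toy example, not GF code).
@dataclass(frozen=True)
class Order:
    count: int   # 1..3 in this toy grammar
    item: str    # abstract item identifier, e.g. "pizza"

# Concrete syntaxes: one set of lookup tables per language (assumed data).
NUMERALS = {
    "Eng": {1: "one", 2: "two", 3: "three"},
    "Swe": {1: "en", 2: "två", 3: "tre"},
}
NOUNS = {  # abstract item -> (singular, plural)
    "Eng": {"pizza": ("pizza", "pizzas")},
    "Swe": {"pizza": ("pizza", "pizzor")},
}

def linearize(tree: Order, lang: str) -> str:
    """Map an abstract tree to a string in one concrete language."""
    singular, plural = NOUNS[lang][tree.item]
    noun = singular if tree.count == 1 else plural
    return f"{NUMERALS[lang][tree.count]} {noun}"

def parse(text: str, lang: str) -> Order:
    """Map a string in one concrete language back to an abstract tree.

    The toy parser does not check number agreement.
    """
    num_word, noun_word = text.split()
    count = next(n for n, w in NUMERALS[lang].items() if w == num_word)
    item = next(k for k, forms in NOUNS[lang].items() if noun_word in forms)
    return Order(count, item)

def translate(text: str, src: str, dst: str) -> str:
    """Translate by parsing in the source and linearizing in the target."""
    return linearize(parse(text, src), dst)
```

Translation never maps one language directly to another: `translate("two pizzas", "Eng", "Swe")` goes through the tree `Order(2, "pizza")`, so adding a language only requires new tables for that language.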

The GF Resource Grammar Library, grammaticalframework.org/lib/doc/synopsis.html, implements the morphology (inflection) and basic syntax (phrase structure) of 16 languages: Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Polish, Romanian, Russian, Spanish, Swedish, and Urdu. These resources are freely available as open-source software. More languages are under construction, in both in-house and external projects.

Embedded grammars are parsing and generation programs compiled from GF grammars and usable as parts of programs written in other languages: Haskell, Java, and JavaScript.

Multilingual grammar applications

The Numeral Translator, http://www.cse.chalmers.se/alumni/bringert/gf/translate/, is a demo of embedded grammars in Java. It translates number words between 88 languages.

The Letter Editor, http://www.cse.chalmers.se/alumni/markus/gramlets/letter-applet.html, is another demo of embedded grammars in Java. It allows the user to write a letter in languages she doesn't know while viewing it in a language she knows.

The Pizza ordering system, http://www.cse.chalmers.se/alumni/bringert/xv/pizza/pizza-movie-large.html, is a demo of integrated generation of speech language models, JavaScript, and VoiceXML from GF grammars. In a browser supporting the required W3C standards (e.g. Opera on Windows), the user can construct an order by using spoken language.

Morphology and lexicon

SALDO, a large-scale freely available morphological lexicon of Swedish, with semantic association.

Lexicon Extraction from Raw Text Data, www.cse.chalmers.se/alumni/markus/extract, a tool for collecting a morphological lexicon by the use of inflection paradigms.
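As a rough illustration of paradigm-based lexicon collection, the sketch below proposes a stem for a paradigm only when every inflected form predicted by that paradigm occurs in the text. It is a simplified assumption about the general approach, not the tool's actual algorithm or input format; the `PARADIGMS` table and the `extract` function are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical paradigm: suffixes of Swedish first-declension nouns
# such as flicka, flickan, flickor, flickorna.
PARADIGMS: Dict[str, List[str]] = {"decl1": ["a", "an", "or", "orna"]}

def extract(words: Set[str], paradigms: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per paradigm, the stems whose full set of forms is attested.

    Assumes the first suffix of every paradigm is non-empty; it is used
    to propose candidate stems.
    """
    found: Dict[str, List[str]] = {}
    for name, suffixes in paradigms.items():
        first = suffixes[0]
        # Candidate stems: strip the first suffix from every word that ends in it.
        candidates = sorted(
            {w[: -len(first)] for w in words if w.endswith(first) and len(w) > len(first)}
        )
        # Keep a candidate only if all predicted forms occur in the data.
        found[name] = [
            stem for stem in candidates if all(stem + suffix in words for suffix in suffixes)
        ]
    return found
```

For the word set `{"flicka", "flickan", "flickor", "flickorna", "gata", "gatan", "och"}` only the stem `flick` is accepted, because `gator` and `gatorna` are not attested. Requiring all forms is a strict choice made for the sketch; on real text one would likely accept a stem when only some of the forms occur.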

Functional Morphology, www.cse.chalmers.se/alumni/markus/FM, a Haskell library for developing inflection engines and morphological lexica.

Unsupervised Learning of Morphology, www.cs.chalmers.se/~harald2/lic.pdf, a technique usable for languages with scarce resources.

Language identification

A Fine-Grained Model for Language Identification, www.cs.chalmers.se/~harald2/id_inews07.pdf, a technique usable for short passages and language switching.

Compiler technology

The BNF Converter (BNFC) is a high-level multi-backend compiler tool, inspired by GF. It has thousands of users and is included in Linux distributions such as Debian and Ubuntu.