About

The CESAR program is being developed as part of the ACAD research pilot of CLARIAH.

ACAD stands for Automatic Coherence Analysis of Dutch and it serves a number of goals:

  • develop a web application that allows computationally naive linguists to define complex syntactic searches
  • use the application for automatic analysis of causal coherence in discourse
  • add parsed corpora of underrepresented categories (WhatsApp and high-quality newspapers)

The first goal concentrates on a web application, which is Cesar: a Corpus Editor for Syntactically Annotated Resources. The Cesar application allows for defining, hosting and browsing syntactically annotated text corpora (see the Browse menu item), and it allows for editing and executing searches through these corpora, as well as viewing the results.

The second goal of ACAD is to allow for automatic analysis of causal coherence in discourse. This goal is accomplished by developing a number of searches for the different causal connectives under review (e.g. omdat, want). These searches are made available to different groups of CESAR users, such as researchers and students. Users can copy a search to their own account, adapt it and use it on a corpus to perform the automatic analysis.

ACAD provides the first part of a coherence analysis. It not only finds all occurrences of the causal marker that is being investigated, but also supplements each of those finds with a user-programmable number of output features. The table of the found markers and their output features can be exported from Cesar as a CSV (or Excel) file, and these data can then be analysed further with statistical tools.
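
As a rough illustration of that last step, the sketch below reads an exported table and counts the finds per causal marker. It is only a minimal example under stated assumptions: the column names "marker" and "clause_type" and the sample rows are hypothetical and do not reflect the actual output features or export layout of Cesar.

```python
import csv
import io
from collections import Counter

# Hypothetical export: the column names "marker" and "clause_type" and the
# rows below are assumptions for illustration, not Cesar's actual output.
SAMPLE_EXPORT = """marker,clause_type
omdat,subordinate
want,main
omdat,subordinate
want,main
want,main
"""


def count_by_marker(csv_text):
    """Count how often each causal marker occurs in an exported table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["marker"] for row in reader)


counts = count_by_marker(SAMPLE_EXPORT)
for marker, n in counts.most_common():
    print(marker, n)
```

With a real export, the sample string would be replaced by the contents of the downloaded CSV file, and the column names would need to match the output features defined in the search.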


Overview of the ACAD components

The CESAR application builds on previous programs:

  • Cesax: a Windows application for editing coreference in syntactically annotated XML corpora
  • CorpusStudio: a Windows application to search corpora on one's computer
  • CorpusStudioWeb: a web application, developed through the sponsorship of CLARIN-NL, that allows specifying complex queries in the XQuery language

The source code of Cesar is available on GitHub.

References

Komen, Erwin R. 2013. “Corpus Databases with Feature Pre-Calculation.” The Twelfth Workshop on Treebanks and Linguistic Theories (TLT12). Bulgaria.
Komen, Erwin R. 2017. “Beyond Counting Syntactic Hits.” In CLARIN in the Low Countries, edited by J. Odijk and A. van Hessen. Netherlands: Ubiquity Press.
Komen, Erwin R., and Jet Hoek. 2018. “Automatic Coherence Analysis for the Computationally Challenged.” http://clin28.cls.ru.nl.