From 2704b29bc010c97eae88910d94e6018021542df1 Mon Sep 17 00:00:00 2001
From: Jouni Tuominen
Date: Thu, 28 Sep 2023 15:09:21 +0300
Subject: [PATCH] FI: add documentation

---
 Samples/ParlaMint-FI/README.md | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/Samples/ParlaMint-FI/README.md b/Samples/ParlaMint-FI/README.md
index b31affdcc..54459a2e2 100644
--- a/Samples/ParlaMint-FI/README.md
+++ b/Samples/ParlaMint-FI/README.md
@@ -6,12 +6,30 @@
 
 ### Characteristics of the national parliament
 
+The Parliament of Finland is the unicameral and supreme legislature of Finland. The Parliament consists of 200 members, 199 of whom are elected every four years from 13 multi-member districts, each electing 7 to 36 members using the proportional D'Hondt method. In addition, one member is elected from Åland. Most MPs belong to parliamentary groups that correspond to the political parties.
+
+The ParlaMint-FI corpus contains the minutes of the Finnish Parliament's plenary sessions from the 2015 parliamentary session to the 2021 parliamentary session (28.4.2015-28.1.2022).
+
 ### Data source and acquisition
 
+The minutes of the Finnish Parliament's plenary sessions from the 2015 parliamentary session onwards are freely available from the Open Data service of the Parliament of Finland (https://avoindata.eduskunta.fi) through an API that returns them in XML format wrapped in JSON. The minutes were fetched from the API with a Python script.
+
 ### Data encoding process
 
+The original XML data was transformed into TEI-XML using a series of Python and shell scripts (https://github.com/SemanticComputing/semparl-data-transformation).
+
 ### Corpus-specific metadata
 
+No metadata is available beyond what is common to all corpora.
+
 ### Structure
 
-### Linguistic annotation
\ No newline at end of file
+There are no additional TEI elements beyond those described in the ParlaMint schema.
+
+### Linguistic annotation
+
+The linguistic annotation was generated with a Python script from a previously produced, linguistically annotated version of the minutes of the Finnish Parliament's plenary sessions in RDF format. That version was produced in the Finnish Semantic Parliament project (https://seco.cs.aalto.fi/projects/semparl/en/).
+
+There is an issue in the linguistically annotated data concerning speeches that contain transcriber comments or interruptions: the transcriber comments and interruptions themselves, as well as any part of a speech that follows a transcriber comment or interruption, are not included in the linguistically annotated version.
+
+There is no linguistic annotation beyond what is common to all corpora.
\ No newline at end of file
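The acquisition step described in the README (plenary minutes served by the open data API as XML wrapped in JSON) can be illustrated with a minimal Python sketch. It assumes, hypothetically, that an API response is a JSON object with `columnNames` and `rowData` keys and that one column (here called `XmlData`) holds each document as an XML string. These names, the function `extract_xml_documents`, and the sample element names are illustrative assumptions, not a documented description of the Eduskunta API or of the author's actual script, and the HTTP request itself is omitted.

```python
import json
import xml.etree.ElementTree as ET


def extract_xml_documents(response_text: str, xml_column: str) -> list[ET.Element]:
    """Unwrap XML documents embedded as strings in a JSON API response.

    Assumed (hypothetical) response shape:
        {"columnNames": ["Id", "XmlData", ...],
         "rowData": [["1", "<...>", ...], ...]}
    """
    payload = json.loads(response_text)
    # Locate the column that carries the serialized XML; list.index raises
    # ValueError if the assumed column name is not present.
    column_index = payload["columnNames"].index(xml_column)
    # Parse each row's XML string into an ElementTree element.
    return [ET.fromstring(row[column_index]) for row in payload["rowData"]]


if __name__ == "__main__":
    # Hypothetical sample response; element names are illustrative only.
    sample = json.dumps({
        "columnNames": ["Id", "XmlData"],
        "rowData": [["1", "<Minutes><Speech>Arvoisa puhemies!</Speech></Minutes>"]],
    })
    documents = extract_xml_documents(sample, "XmlData")
    print(documents[0].find("Speech").text)  # prints: Arvoisa puhemies!
```

Keeping the JSON unwrapping separate from the HTTP request makes it easy to test against saved API responses before passing the parsed XML on to a TEI transformation step.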