The Annotated Corpus

A wide diachronic corpus of political speeches encompassing the whole history of the Italian Republic, annotated for the relevant categories of implicits.

The most demanding part of the project will be the collection and annotation of a corpus of around 1,500 of the most influential political speeches, distributed over the whole history of republican Italy since 1946 and totalling around 3 million words. It will be the largest resource devoted to Italian political discourse and one of the largest corpora of transcribed spoken Italian. All texts will be tagged for implicit communication and made available through a dedicated web resource. The corpus will be balanced for parameters such as date, speaker, political party and diaphasic situation. It will be carefully annotated for the relevant linguistic persuasion strategies, such as presuppositions, implicatures, vagueness, metaphors and topicalizations. The annotation will include a quantitative assessment of the amount of implicits in each speech, made along the lines of the following Table:

Availability of the largest existing corpus of Italian political communication

This balanced, transcribed and aligned corpus will be fully searchable and thoroughly tagged, not only for morphosyntactic categories but also for the pragmatically relevant categories of implicit persuasive strategies. It will give scholars and other professionals interested in the subject an important resource that has so far been lacking. An example of provisional text annotation follows:

Data collection 

The list of speakers will comprise around 150 political figures, including all Presidents of the Republic, of the Senate and of the Chamber of Deputies, leaders of the major parties, mayors of the main cities and Presidents of the main regions.

The corpus will be organized into sub-corpora balanced according to two parameters: communication context and speaker.

1. The parameter of COMMUNICATION CONTEXT will specify the following features:

  • TIME PERIOD: 1946-1968, 1969-1992, 1993 to present. The largest sub-corpus will include the speeches produced in the Second Republic (from 1993 to the present), because they are of greater interest to civil society when it comes to unveiling tendentious implicitness. The speeches produced in the First Republic (1946-1968 and 1969-1992) will constitute smaller sub-corpora.
  • CHANNEL (broadcasting, face-to-face, new media). Besides conventional face-to-face and broadcast speeches, a smaller section of the corpus will provide samples of political discourse spread via the new media, namely Facebook and Twitter, which are acknowledged to share lexical and structural features with spoken language.
  • PUBLIC (supporters, general public). 
  • STRUCTURE (assembly, meeting, interview, declaration). 
  • SOCIAL CONTEXT (institutional context, election campaign, other). 
  • TOPIC (the largest and potentially open feature). 

2. The parameter of SPEAKER will specify the speaker’s relevant sociolinguistic characteristics: SEX, AGE, GEOGRAPHICAL ORIGIN, POLITICAL PARTY, OFFICE AT THE MOMENT OF THE SPEECH. 
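The two balancing parameters described above can be pictured as a metadata record attached to each speech. The following is only a minimal sketch in Python, not the project's actual data model: all class, field and function names (SpeechRecord, CommunicationContext, Speaker, period_for_year) are hypothetical, and the only values taken from the description are the feature names and their listed options.

```python
from dataclasses import dataclass
from enum import Enum


class TimePeriod(Enum):
    P1946_1968 = "1946-1968"
    P1969_1992 = "1969-1992"
    P1993_PRESENT = "1993 to present"


class Channel(Enum):
    BROADCASTING = "broadcasting"
    FACE_TO_FACE = "face-to-face"
    NEW_MEDIA = "new media"


class Public(Enum):
    SUPPORTERS = "supporters"
    GENERAL_PUBLIC = "general public"


class Structure(Enum):
    ASSEMBLY = "assembly"
    MEETING = "meeting"
    INTERVIEW = "interview"
    DECLARATION = "declaration"


class SocialContext(Enum):
    INSTITUTIONAL = "institutional context"
    ELECTION_CAMPAIGN = "election campaign"
    OTHER = "other"


def period_for_year(year: int) -> TimePeriod:
    """Map the year of a speech to one of the three time periods (hypothetical helper)."""
    if year < 1946:
        raise ValueError("the corpus starts in 1946")
    if year <= 1968:
        return TimePeriod.P1946_1968
    if year <= 1992:
        return TimePeriod.P1969_1992
    return TimePeriod.P1993_PRESENT


@dataclass(frozen=True)
class CommunicationContext:
    time_period: TimePeriod
    channel: Channel
    public: Public
    structure: Structure
    social_context: SocialContext
    topic: str  # open feature, so kept as free text in this sketch


@dataclass(frozen=True)
class Speaker:
    sex: str
    age: int
    geographical_origin: str
    political_party: str
    office: str  # office held at the moment of the speech


@dataclass(frozen=True)
class SpeechRecord:
    context: CommunicationContext
    speaker: Speaker


# Usage with placeholder values only (no real speech is described here).
example = SpeechRecord(
    context=CommunicationContext(
        time_period=period_for_year(1994),
        channel=Channel.BROADCASTING,
        public=Public.GENERAL_PUBLIC,
        structure=Structure.INTERVIEW,
        social_context=SocialContext.ELECTION_CAMPAIGN,
        topic="placeholder topic",
    ),
    speaker=Speaker(
        sex="placeholder",
        age=50,
        geographical_origin="placeholder region",
        political_party="placeholder party",
        office="placeholder office",
    ),
)
```

Closed features are modelled as enumerations so that sub-corpora can be selected by exact value, while TOPIC, described as potentially open, is left as free text. This is one possible design choice among many.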

Sources for the corpus will be audio-video resources (available online or from institutional archives).