Part II: Who wrote the last coalition agreement in Germany?
After discovering the limitations of pretrained topic classification, I pivot to using actual Manifesto Project data to trace party contributions through embeddings and cosine similarity.
Please see Part I of this blog entry for an introduction to the topic.
The Original Plan (And Why It Failed)
In Part I, I used sentence embeddings and cosine similarity to identify which party programs were most similar to each sentence in the coalition agreement. The next logical step seemed straightforward: classify each sentence by policy topic using a pretrained transformer model trained on Manifesto Project data. This would allow me to see whether parties contributed sentences according to their core issue areas.
The theory was sound. If the Greens dominated sentences about environmental protection, or the SPD controlled welfare policy language, this would validate both the attribution method and reveal each party’s negotiating influence.
So I loaded the manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1 model and classified every sentence in the coalition agreement and party programs.
The results were… not great.
Click here to see classification code
```python
import os
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import pandas as pd
from tqdm import tqdm

output_path = "sentences_with_topics.csv"

if os.path.exists(output_path):
    df = pd.read_csv(output_path)
else:
    # Load model & tokenizer
    model = AutoModelForSequenceClassification.from_pretrained(
        "manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1"
    )
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
    device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
    model.to(device)

    # Read sentences (sampled in R, accessed via reticulate)
    df = r.sentence_sample

    def get_topic(sentence):
        inputs = tokenizer(
            str(sentence),
            return_tensors="pt",
            max_length=200,
            padding="max_length",
            truncation=True,
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}
        with torch.no_grad():
            logits = model(**inputs).logits
        predicted_class = model.config.id2label[logits.argmax().item()]
        return predicted_class

    tqdm.pandas()
    df['topic'] = df['text'].progress_apply(get_topic)
    df.to_csv(output_path, index=False)
    print(f"Saved results to {output_path}")
```
**Classification Results** (from coalition and party programs)

| Party | Topic | Text |
|---|---|---|
| FDP | 701 - Labour Groups: Positive | PRESSE- UND MEINUNGSFREIHEIT SCHÜTZEN Presse- und Meinungsfreiheit sind Grundpfeiler unserer liberalen und offenen demokratischen Gesellschaft. |
| FDP | 107 - Internationalism: Positive | Zeitarbeit sichert Teilhabe für die Beschäftigten und Flexibilität für die Unternehmen. |
| GRUENE | 304 - Political Corruption | Wir wollen insbesondere die demokratische Kontrolle 81 Ka pi te l 2 bei der regulatorischen Kooperation verbessern. |
| GRUENE | 110 - European Community/Union: Negative | Sommercamps und Nachhilfe in den Kernfächern alleine werden nicht ausreichen, um die Folgen der Krise zu bewältigen. |
| KOALITION | 606 - Civic Mindedness: Positive | Geprüft wird die Errichtung einer Stiftung oder Gesellschaft, die den Rückbau der Kohleverstromung und die Renaturierung organisiert. |
| KOALITION | 104 - Military: Positive | Das Brennstoffemissionshandelsgesetz (BEHG), einschließlich der erfassten Brennstoffemissionen in der Industrie (industrielle Prozesswärme), wollen wir auf seine Kompatibilität mit einem möglichen ETS 2 überprüfen und gegebenenfalls so anpassen, dass ein möglichst reibungsloser Übergang gewährleistet ist. |
| SPD | 606 - Civic Mindedness: Positive | Wir werden eine nationale Leitstelle Mobilität einrichten, die die Erarbeitung regionaler Mobilitätspläne unterstützt und eine frühzeitige Beteiligung vor Ort sicherstellt. |
| SPD | 110 - European Community/Union: Negative | Wir stellen uns konsequent gegen Diskriminierung und Gewalt. |

*These samples were chosen at random*
```r
library(readr)
library(dplyr)
library(stringr)
library(gt)

# Draw two random classified sentences per party for inspection
examples <- read_csv("sentences_with_topics.csv") %>%
  mutate(party = if_else(str_detect(party, "koalition"), "KOALITION", toupper(party))) %>%
  group_by(party) %>%
  slice_sample(n = 2) %>%
  ungroup() %>%
  select(party, topic, text)

# Render as a gt table
examples %>%
  gt() |>
  cols_label(
    party = md("**Party**"),
    text = md("*Text*"),
    topic = md("**Topic**")
  ) |>
  tab_header(
    title = md("**Classification Results**"),
    subtitle = "from coalition and party programs"
  ) |>
  tab_footnote(md("*These samples were chosen at random*"))
```
Validate, validate, validate
Does it hold up? The classification results contain several clear misclassifications that highlight the limits of the pretrained topic model in this context. A sentence on fuel emissions trading ("Brennstoffemissionshandelsgesetz") was classified as "104 - Military: Positive", a sentence against discrimination and violence as "110 - European Community/Union: Negative", and the sentence on press freedom and freedom of expression was tagged as "701 - Labour Groups: Positive".
As Grimmer and Stewart (2013) famously emphasize, when working with automatically classified texts you must "validate, validate, validate." The ManifestoBERTa sentence model's authors report a top-1 accuracy of 57% and a macro F1 score of 0.45; while seemingly reasonable, these aggregate metrics can mask idiosyncratic errors and biases when the model is applied to new texts or domains (Burst 2024). They are also hard to interpret without a human baseline: how well would a human coder fare at the same task? Thorough validation, including comparison to manual coding, qualitative inspection, and robustness checks, is therefore essential for trustworthy inferences. Relying solely on published performance measures risks drawing misleading conclusions from noisy or systematically biased automatic classifications (Grimmer and Stewart 2013).
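To make the manual-coding step concrete, here is a minimal sketch of how such a validation could start in R; the sample size and the `human_topic` column are placeholders for an actual hand-coding exercise, not something I have done here:

```r
library(dplyr)
library(readr)

# Draw a small random sample of classified sentences to code by hand
set.seed(42)
validation <- read_csv("sentences_with_topics.csv") %>%
  slice_sample(n = 100)

# write_csv(validation, "validation_sample.csv")  # code by hand, then reload

# After adding a hand-coded 'human_topic' column, agreement is simply:
# validation %>% summarise(accuracy = mean(topic == human_topic))
```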
By doing a simple sanity check on a few sentences, I can already determine that this model makes too many errors to be used for my task as is.
Why did this happen? Pretrained models work brilliantly when applied to data similar to their training corpus. But the manifestoberta model was trained on a specific structure: coded sentences from official party manifestos in the Manifesto Project database. My documents, extracted from PDFs and split by tidytext, with all their formatting quirks, were different enough that the model couldn't reliably generalize. With a little more time, it might make sense to take a pretrained transformer specialized for the German language, fine-tune it on the seven main topic domains only, and create separate classifiers within each domain.
The Pivot: Using Actual Manifesto Project Data
Rather than fighting the pretrained classifier, I turned to a better source: in the years since the 2021 election, the Manifesto Project has published the actual coded sentences from these same party programs, already classified by trained human coders (Lehmann et al. 2025).
This approach has the obvious advantage of reliability: human-coded classifications from the Manifesto Project are the gold standard for this type of analysis. The trade-off is that I need to redo the embedding and similarity matching using the Manifesto Project's sentence-level data rather than my PDF extractions. But this potentially improves the analysis, since the Manifesto data is cleaner and more consistently structured.
Loading the Manifesto Project Data
Code for loading the data
```r
library(manifestoR)
library(manifestoEnhanceR)

if (!file.exists("manifesto_data.fst")) {
  # Load German documents from 2021
  key <- Sys.getenv("manifestoAPIkey")
  ger_corpus <- mp_corpus(countryname == "Germany" & edate > as.Date("2020-01-01"), apikey = key)
  ger_docs <- as_tibble(ger_corpus)

  # Get metadata
  cmp <- mp_maindataset(south_america = FALSE)
  meta <- cmp %>%
    filter(countryname == "Germany" & edate > as.Date("2020-01-01")) %>%
    transmute(
      manifesto_id = paste0(party, "_", date),
      party,
      partyname,
      partyabbrev
    ) %>%
    unique()

  ger_docs <- left_join(ger_docs, meta)

  # Extract sentence-level data with topic codes
  ger_enhanced <- ger_docs %>%
    mutate(data = map(data, enhance_manifesto_df)) %>%
    unnest(data)

  fst::write_fst(ger_enhanced, "manifesto_data.fst")
} else {
  ger_enhanced <- fst::read_fst("manifesto_data.fst")
}
```
The analysis here is made possible by using two wonderful R packages that allow querying the Manifesto Project database directly via its API: manifestoR and manifestoEnhanceR. Now, each sentence has a cmp_code indicating its policy domain according to the Manifesto coding scheme.
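As a quick sanity check, a small sketch (reusing the ger_enhanced object created above) that counts the most frequent codes in one manifesto; code 501, for instance, is Environmental Protection:

```r
library(dplyr)

# Most frequent policy codes in the Greens' 2021 manifesto
ger_enhanced %>%
  filter(partyabbrev == "90/Greens", date == 202109) %>%
  count(cmp_code, sort = TRUE) %>%
  head(5)
```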
This setup is incredibly handy. Having seamless, programmatic access to a rich collection of party manifestos and their detailed coding, without manual downloads or preprocessing, saves countless hours. It is quite cool to have so many party manifestos at our fingertips!
Thanks to the Manifesto Project team for maintaining this comprehensive database and offering open API access, and to Hauke Licht for creating and sharing the manifestoEnhanceR package that enhances data usability with sentence-level metadata.
Re-embedding with Manifesto Data
Now I repeat the embedding process from Part I, but using the Manifesto Project sentences instead of my PDF extractions. The details are discussed in Part I, so I won't go into them again.
Embedding manifesto sentences
```python
import os
import pandas as pd
import torch
from sentence_transformers import SentenceTransformer, util
from tqdm import tqdm

output_path = "manifesto_similarity_results.csv"

if os.path.exists(output_path):
    data = pd.read_csv(output_path)
else:
    # Use the local GPU on a Mac M3, fall back to CPU
    device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

    # Load model
    model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2').to(device)

    # Encode coalition sentences (passed in from R via reticulate)
    coalition_text = r.sentences[r.sentences['party'] == 'koalition'].reset_index(drop=True).text.values
    embedding_coal = model.encode(
        coalition_text,
        convert_to_tensor=True,
        show_progress_bar=True,
        device=device
    )

    # Prepare output DataFrame
    data = pd.DataFrame()

    # Define parties to process
    parties = ['SPD', '90/Greens', 'FDP']
    target_date = 202109

    for party in tqdm(parties, desc="Processing Parties"):
        # Get party sentences and reset index
        party_sentences = r.ger_enhanced[
            (r.ger_enhanced['partyabbrev'] == party) &
            (r.ger_enhanced['date'] == target_date)
        ].reset_index(drop=True)
        party_text = party_sentences['text'].values
        party_codes = party_sentences['cmp_code'].values

        # Encode party sentences
        embedding_party = model.encode(
            party_text,
            convert_to_tensor=True,
            show_progress_bar=True,
            device=device
        )

        # Compute cosine similarity matrix (party sentences × coalition sentences)
        cosine_scores = util.pytorch_cos_sim(embedding_party, embedding_coal).cpu().numpy()

        # Prepare DataFrame for all pairwise combinations, including cmp_code
        rows, cols = cosine_scores.shape
        results = pd.DataFrame({
            'party': party,
            'party_sentence': party_text.repeat(cols),
            'topic_code': party_codes.repeat(cols),  # repeat to match sentences
            'coalition_sentence': list(coalition_text) * rows,
            'similarity_score': cosine_scores.flatten()
        })

        # Keep only the best party sentence match per coalition sentence
        top_matches = (
            results.sort_values('similarity_score', ascending=False)
            .groupby(['party', 'coalition_sentence'])
            .head(1)
        )
        data = pd.concat([data, top_matches], ignore_index=True)

    # Save combined results with topic codes
    data.to_csv("manifesto_similarity_results.csv", index=False)
```
Mapping Topics to Party Strengths
Now we can ask the interesting questions: Do sentences attributed to each party fall into their expected policy domains?
The Manifesto coding scheme groups topics into seven main domains:
1 - External Relations (Foreign policy, military, international cooperation)
2 - Freedom & Democracy (Freedom, human rights, constitutionalism)
3 - Political System (Decentralization, governmental efficiency, political corruption)
4 - Economy (Free market economy, incentives, market regulation)
5 - Welfare & Quality of Life (Social justice, education, environment)
6 - Fabric of Society (National identity, multiculturalism, tradition)
7 - Social Groups (Labor, agriculture, demographic groups)
We’d expect:
Greens: Strong in Domain 5 (especially environmental protection)
SPD: Strong in Domains 5 and 7 (welfare expansion, labor groups)
FDP: Strong in Domain 4 (free market economy, deregulation)
However, as discussed in Part I, the automated attribution method I chose is not perfect. Where similarity scores are very high it mostly gets the match right; everywhere else it is a bit murky. Proper validation would require manually coding a sample, which is too labor-intensive here, so I proceed conservatively: I only evaluate sentences with a similarity score of 0.9 or higher. This excludes about 90% of the data, so any statements refer only to the subsample of sentences that are copied almost verbatim from the party programs.
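Before committing to the cut-off, it is worth checking how much data it actually leaves. A minimal sketch using the similarity results file produced above:

```r
library(dplyr)
library(readr)

# Share of coalition sentences whose single best match clears 0.9
read_csv("manifesto_similarity_results.csv") %>%
  group_by(coalition_sentence) %>%
  slice_max(similarity_score, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  summarise(share_kept = mean(similarity_score >= 0.9))
```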
Topic Distribution across parties
In the 2021 German federal election, the Social Democratic Party (SPD) received 25.7% of the vote, making it the strongest party for the first time since 2002. Alliance 90/The Greens secured 14.8%, their best federal result to date, and the Free Democratic Party (FDP) obtained 11.5%, a small gain over the previous election. These vote shares reflect the relative strength of each party going into the coalition talks. However, this strength does not seem to be reflected in the topic distribution.
Overall, the domains Welfare & Quality of Life and External Relations dominate the direct contributions from the party programs. Across all domains, the Greens and the SPD seem to dominate, even in Economy, where the FDP contributes only a small share.
Figure 1: Topic Domain Distribution in Coalition Agreement
```r
library(ggplot2)
library(dplyr)
library(patchwork)
library(readr)
library(paletteer)

# Load similarity results
results <- read_csv("manifesto_similarity_results.csv")

# Select the best match (highest similarity) for each coalition sentence
best_matches <- results %>%
  group_by(coalition_sentence) %>%
  slice_max(similarity_score, n = 1) %>%
  filter(similarity_score >= .9) %>%
  ungroup()

# Map topic codes to broader domains
get_domain <- function(code) {
  domain_num <- substr(as.character(code), 1, 1)
  case_when(
    domain_num == "1" ~ "1 External Relations",
    domain_num == "2" ~ "2 Freedom & Democracy",
    domain_num == "3" ~ "3 Political System",
    domain_num == "4" ~ "4 Economy",
    domain_num == "5" ~ "5 Welfare & Quality of Life",
    domain_num == "6" ~ "6 Fabric of Society",
    domain_num == "7" ~ "7 Social Groups",
    TRUE ~ "Missing"
  )
}

best_matches <- best_matches %>%
  mutate(domain = get_domain(topic_code))

# Proportions by domain and party, relative to the whole agreement
domain_party_props <- best_matches %>%
  count(domain, party) %>%
  ungroup() %>%
  mutate(total = sum(n)) %>%
  group_by(party, domain) %>%
  mutate(prop = n / total) %>%
  ungroup()

# Proportions by party within each topic domain
domain_party_props_in_topic <- best_matches %>%
  count(domain, party) %>%
  group_by(domain) %>%
  mutate(total = sum(n)) %>%
  group_by(party, domain) %>%
  mutate(prop = n / total) %>%
  ungroup()

# Use Frida Kahlo palette colors from paletteer for the parties
frida_colors <- c(
  "FDP" = paletteer::paletteer_d("lisa::FridaKahlo")[4],
  "90/Greens" = paletteer::paletteer_d("lisa::FridaKahlo")[2],
  "SPD" = paletteer::paletteer_d("lisa::FridaKahlo")[5]
)

# Common theme to avoid duplicate titles and y-axis text in the second plot
common_theme <- theme_minimal(base_size = 14) +
  theme(
    legend.position = "bottom",
    axis.title.x = element_text(size = 12),
    axis.title.y = element_blank(),
    axis.text.y = element_text(size = 10)
  )

# First plot: proportions across the whole agreement
p1 <- ggplot(domain_party_props, aes(x = prop, y = domain, fill = party)) +
  geom_col(position = "stack", width = 0.75) +
  scale_fill_manual(values = frida_colors) +
  common_theme +
  labs(
    title = "Topic Domain Distribution in Coalition Agreement",
    subtitle = "by party according to best match between program and agreement",
    x = "Proportion across agreement",
    fill = "Party"
  )

# Second plot: proportions within each topic domain
p2 <- ggplot(domain_party_props_in_topic, aes(x = prop, y = domain, fill = party)) +
  geom_col(position = "stack", width = 0.75) +
  scale_fill_manual(values = frida_colors) +
  common_theme +
  theme(axis.text.y = element_blank()) +
  labs(
    subtitle = "by party proportion within topic domain",
    x = "Proportion within topic domain"
  )

# Combine plots with a unified legend
combined_plot <- (p1 + p2) +
  plot_layout(guides = "collect") &
  theme(legend.position = "bottom")

# Add a shared caption using patchwork's plot_annotation
combined_plot <- combined_plot +
  plot_annotation(
    caption = "Proportions show party influence based on best sentence matches"
  ) &
  theme(plot.caption = element_text(hjust = 0.5, size = 12, face = "italic"))

combined_plot
```
Sub-topic distribution - a more detailed look
Looking at the more detailed topics beyond the main policy domains, the initial hypothesis can largely be confirmed, though the finding is somewhat tentative due to limited data. The Greens show strong positive contributions related to environmentalism and sustainability, as expected. However, the SPD and FDP also have sentences in these areas, albeit to a lesser degree. True to their “Free Democrats” name, the FDP’s highest contributing subtopic is indeed “freedom,” and administrative efficiency, a core campaign issue for the FDP, also appears reflected in the coalition agreement through their party sentences.
Overall, this suggests that coalition agreements are at least partially shaped by the parties’ dominant core issues. Interestingly, vote share does not seem to directly correlate with the volume of contributions, as the Greens have punched well above their weight in terms of textual influence. That said, these results rest on a limited dataset and should be interpreted with caution.
Figure 2: Top Topics by Party in Coalition Agreement
```r
library(tidytext)  # for reorder_within() / scale_y_reordered()

# Identify the top 5 topics per party (with ties)
top_topics <- topic_match %>%
  group_by(party) %>%
  slice_max(order_by = n, n = 5, with_ties = TRUE) %>%
  ungroup() %>%
  filter(!is.na(label))

# Restrict original data to only those top topics
filtered_data <- topic_match %>%
  semi_join(top_topics, by = c("party", "code"))

# Plot, reordering topics by count within each party
ggplot(filtered_data, aes(
  x = n,
  y = reorder_within(str_wrap(label, 18), n, party),
  fill = party
)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~party, ncol = 3, scales = "free_y") +
  scale_fill_manual(values = frida_colors) +
  scale_y_reordered() +
  labs(
    x = "Count of matched sentences",
    y = "Topic",
    title = "Top Topics by Party in Coalition Agreement",
    subtitle = "Five most frequent topics per party",
    caption = "with ties"
  ) +
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 9),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold")
  )
```
Lessons learnt
Domain shift and small textual markers matter more than you think: The gap between training data (clean, structured Manifesto database entries) and application data (PDF extractions with formatting artifacts) was enough to break the classifier.
Conservative thresholds are your friend: By requiring 0.9+ similarity, I sacrifice 90% of the data but gain confidence in the remaining 10%. In exploratory research, this trade-off is often worthwhile.
Future Directions
1. Validating Attribution with LLMs
The next step is to validate whether the high-similarity matches represent genuine text reuse rather than coincidental similarity. One could use local LLMs via rollama to (a rough sketch follows the list):
Present sentence pairs to Llama 3.1 or Mistral
Ask: “Does sentence B appear to be derived from sentence A?”
Compare LLM judgments against similarity scores
Identify false positives where similarity is high but influence is questionable
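A minimal sketch of this idea, assuming a local Ollama server with a llama3.1 model pulled; the prompt wording, the helper function, and the output = "text" argument are illustrative, so check the rollama documentation for your installed version:

```r
library(rollama)  # talks to a locally running Ollama server

# Hypothetical helper: ask the LLM for a yes/no judgment on one pair
judge_pair <- function(party_sentence, coalition_sentence, model = "llama3.1") {
  prompt <- paste0(
    "Sentence A (party program): ", party_sentence, "\n",
    "Sentence B (coalition agreement): ", coalition_sentence, "\n\n",
    "Does sentence B appear to be derived from sentence A? Answer 'yes' or 'no'."
  )
  # output = "text" (recent rollama versions) returns just the reply string
  tolower(trimws(query(prompt, model = model, screen = FALSE, output = "text")))
}

# Example: judge the high-similarity matches and compare with the scores
# top_matches$llm_judgment <- mapply(judge_pair,
#                                    top_matches$party_sentence,
#                                    top_matches$coalition_sentence)
```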
2. Fine-Tuning a Domain-Specific Classifier
Rather than using the pretrained manifestoberta model, I could:
Start with a German language model (e.g., german-gpt2 or gbert-large)
Fine-tune on the seven main Manifesto domains rather than 56 granular topics
Use the Manifesto Project’s coded sentences as training data
Create an ensemble of domain-specific classifiers
This would likely improve accuracy while remaining computationally feasible.
3. Sentence-Level Negotiation Analysis
With validated attributions, I could investigate:
Bargaining patterns: Which topics show more "mixed" sentences (moderate similarity to multiple parties)? A rough way of flagging these is sketched after this list.
Compromise detection: Can we identify sentences that blend language from multiple programs?
Sequential influence: Do sentences from one party tend to cluster together in the agreement?
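For the first of these, a rough sketch of how "mixed" sentences could be flagged from the existing similarity results; the thresholds are illustrative, not validated:

```r
library(dplyr)
library(readr)

# Coalition sentences where the two best party matches are moderately
# similar and nearly tied, i.e. candidates for blended language
read_csv("manifesto_similarity_results.csv") %>%
  group_by(coalition_sentence) %>%
  summarise(
    top1 = max(similarity_score),
    top2 = sort(similarity_score, decreasing = TRUE)[2]
  ) %>%
  filter(top1 >= 0.7, top1 < 0.9, top1 - top2 < 0.05) %>%
  arrange(desc(top1))
```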
Grimmer, Justin, and Brandon M. Stewart. 2013. "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts." Political Analysis 21 (3): 267–97. https://doi.org/10.1093/pan/mps028.
Lehmann, Pola, Simon Franzmann, Denise Al-Gaddooa, Tobias Burst, Christoph Ivanusch, Sven Regel, Felicia Riethmüller, Andrea Volkens, Bernhard Weßels, and Lisa Zehnter. 2025. “The Manifesto Data Collection. Manifesto Project (MRG/CMP/MARPOR). Version 2025a.” Dataset. https://doi.org/10.25522/manifesto.mpds.2025a.
Citation
BibTeX citation:
@online{bochtler2025,
author = {Bochtler, Paul},
title = {Part {II:} {Who} Wrote the Last Coalition Agreement in
{Germany?}},
date = {2025-09-28},
url = {https://www.paulbochtler.de/blog/2025/01/},
langid = {en}
}