Part II: Who wrote the last coalition agreement in Germany?

After discovering the limitations of pretrained topic classification, I pivot to using actual Manifesto Project data to trace party contributions through embeddings and cosine similarity.
Tags: r, python, sentence-transformer, nlp

Author: Paul Bochtler
Published: Sunday, September 28, 2025

Please see Part I of this blog entry for an introduction to the topic.

The Original Plan (And Why It Failed)

In Part I, I used sentence embeddings and cosine similarity to identify which party programs were most similar to each sentence in the coalition agreement. The next logical step seemed straightforward: classify each sentence by policy topic using a pretrained transformer model trained on Manifesto Project data. This would allow me to see whether parties contributed sentences according to their core issue areas.

The theory was sound: if the Greens dominated sentences about environmental protection, or the SPD controlled welfare policy language, this would both validate the attribution method and reveal each party’s negotiating influence.

So I loaded the manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1 model and classified every sentence in the coalition agreement and party programs.

The results were… not great.

Classification code
import os
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import pandas as pd
from tqdm import tqdm

output_path = "sentences_with_topics.csv"

if os.path.exists(output_path):
    df = pd.read_csv(output_path)
else:
    # Load model & tokenizer
    model = AutoModelForSequenceClassification.from_pretrained("manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1")
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
    device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
    model.to(device)

    # Sentences passed from the R session via reticulate
    df = r.sentence_sample
    def get_topic(sentence):
        inputs = tokenizer(
            str(sentence),
            return_tensors="pt",
            max_length=200,
            padding="max_length",
            truncation=True
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}
        with torch.no_grad():
            logits = model(**inputs).logits
        predicted_class = model.config.id2label[logits.argmax().item()]
        return predicted_class

    tqdm.pandas()
    df['topic'] = df['text'].progress_apply(get_topic)

    df.to_csv(output_path, index=False)
    print(f"Saved results to {output_path}")
Classification Results
from coalition and party programs

Party     | Topic                                     | Text
FDP       | 701 - Labour Groups: Positive             | PRESSE- UND MEINUNGSFREIHEIT SCHÜTZEN Presse- und Meinungsfreiheit sind Grundpfeiler unserer liberalen und offenen demokratischen Gesellschaft.
FDP       | 107 - Internationalism: Positive          | Zeitarbeit sichert Teilhabe für die Beschäftigten und Flexibilität für die Unternehmen.
GRUENE    | 304 - Political Corruption                | Wir wollen insbesondere die demokratische Kontrolle 81 Ka pi te l 2 bei der regulatorischen Kooperation verbessern.
GRUENE    | 110 - European Community/Union: Negative  | Sommercamps und Nachhilfe in den Kernfächern alleine werden nicht ausreichen, um die Folgen der Krise zu bewältigen.
KOALITION | 606 - Civic Mindedness: Positive          | Geprüft wird die Errichtung einer Stiftung oder Gesellschaft, die den Rückbau der Kohleverstromung und die Renaturierung organisiert.
KOALITION | 104 - Military: Positive                  | Das Brennstoffemissionshandelsgesetz (BEHG), einschließlich der erfassten Brennstoffemissionen in der Industrie (industrielle Prozesswärme), wollen wir auf seine Kompatibilität mit einem möglichen ETS 2 überprüfen und gegebenenfalls so anpassen, dass ein möglichst reibungsloser Übergang gewährleistet ist.
SPD       | 606 - Civic Mindedness: Positive          | Wir werden eine nationale Leitstelle Mobilität einrichten, die die Erarbeitung regionaler Mobilitätspläne unterstützt und eine frühzeitige Beteiligung vor Ort sicherstellt.
SPD       | 110 - European Community/Union: Negative  | Wir stellen uns konsequent gegen Diskriminierung und Gewalt.

These samples were chosen at random
# Prepare examples dataset: sample two classified sentences per party
examples <- read_csv("sentences_with_topics.csv") |>
  mutate(party = if_else(str_detect(party, "koalition"), "KOALITION", toupper(party))) |>
  group_by(party) |>
  slice_sample(n = 2) |>
  ungroup() |>
  select(party, topic, text)

# Create gt table:
examples |>
  gt() |>
  cols_label(
    party = md("**Party**"),
    text = md("*Text*"),
    topic = md("**Topic**")
  ) |>
  tab_header(
    title = md("**Classification Results**"),
    subtitle = "from coalition and party programs"
  ) |>
  tab_footnote(md("*These samples were chosen at random*")) 

Validate, validate, validate

Does it hold up? A quick look at the classification results reveals several clear misclassifications that highlight the limits of the pretrained topic model in this context. A sentence about the fuel emissions trading law (“Brennstoffemissionshandelsgesetz”) was classified as “104 - Military: Positive”, a sentence opposing discrimination and violence as “110 - European Community/Union: Negative”, and a passage on protecting freedom of the press and expression as “701 - Labour Groups: Positive”.

As Grimmer and Stewart (2013) famously emphasize, when working with automatically classified texts you must “validate, validate, validate.” The ManifestoBERTa sentence model authors report a top-1 accuracy of 57% and a macro F1 score of 0.45. While these numbers seem reasonable, such aggregate metrics can mask idiosyncratic errors and biases when the model is applied to new texts or domains (Burst, Franzmann, and Lehmann 2024), and they say little about performance on my data without a comparison against a human-coded gold standard. How well would a human fare at classifying these sentences? Thorough validation, including comparison to manual coding, qualitative inspection, and robustness checks, is therefore essential for trustworthy inferences. Relying solely on published performance measures risks drawing misleading conclusions from noisy or systematically biased automatic classifications (Grimmer and Stewart 2013).

A simple sanity check on a few sentences is enough to show that this model makes too many errors to be used as-is for my task.
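
To make such a sanity check more systematic, one could hand-code a small random sample and compare it against the model’s predictions. Below is a minimal sketch; the file sentences_with_topics_manual_sample.csv and its manual_topic column are hypothetical placeholders for a sample I would have coded by hand.

Sketch: agreement between model and manual codes
library(readr)
library(dplyr)

# hypothetical sample: model predictions plus my own manual codes
validation <- read_csv("sentences_with_topics_manual_sample.csv")

# overall agreement between model and human coder
validation %>%
  summarise(
    n = n(),
    agreement = mean(topic == manual_topic)
  )

# which manual codes does the model confuse most often?
validation %>%
  filter(topic != manual_topic) %>%
  count(manual_topic, topic, sort = TRUE)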

Why did this happen? Pretrained models work brilliantly when applied to data similar to their training corpus. But the manifestoberta model was trained on a specific structure: coded sentences from official party manifestos in the Manifesto Project database. My documents, extracted from PDFs and split by tidytext, with all their formatting quirks, were different enough that the model could not reliably generalize. With a little more time, it might make sense to take a pretrained transformer specialized for German, fine-tune it on the seven main topic domains, and then build separate classifiers within each domain.

The Pivot: Using Actual Manifesto Project Data

Rather than fighting with the pretrained classifier, I can go straight to the source: a few years after the election, the Manifesto Project now provides the actual coded sentences from these same party programs, already classified by trained human coders (Lehmann et al. 2025).

This approach has one obvious advantage: it is very reliable. Human-coded classifications from the Manifesto Project are the gold standard for this type of analysis. The trade-off is that I need to redo the embedding and similarity matching using the Manifesto Project’s sentence-level data rather than my PDF extractions. But this potentially improves the analysis, as the Manifesto data is cleaner and more consistently structured.

Loading the Manifesto Project Data

Code for loading the data
library(tidyverse)  # dplyr, purrr and tidyr verbs used below
library(manifestoR)
library(manifestoEnhanceR)

if (!file.exists("manifesto_data.fst")) {
  # Load German documents from 2021
  key <- Sys.getenv("manifestoAPIkey")
  ger_corpus <- mp_corpus(
    countryname == "Germany" & edate > as.Date("2020-01-01"),
    apikey = key
  )

  ger_docs <- as_tibble(ger_corpus)

  # Get metadata
  cmp <- mp_maindataset(south_america = FALSE)
  meta <- cmp %>%
    filter(countryname == "Germany" & edate > as.Date("2020-01-01")) %>%
    transmute(
      manifesto_id = paste0(party, "_", date),
      party,
      partyname,
      partyabbrev
    ) %>%
    unique()

  ger_docs <- left_join(ger_docs, meta)

  # Extract sentence-level data with topic codes
  ger_enhanced <- ger_docs %>%
    mutate(data = map(data, enhance_manifesto_df)) %>%
    unnest(data)

  fst::write_fst(ger_enhanced, "manifesto_data.fst")
} else {
  ger_enhanced <- fst::read_fst("manifesto_data.fst")
}

The analysis here is made possible by two wonderful R packages: manifestoR, which queries the Manifesto Project database directly via its API, and manifestoEnhanceR, which adds sentence-level metadata to the returned documents. Now each sentence carries a cmp_code indicating its policy category according to the Manifesto coding scheme.
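
As a quick orientation, the enhanced data can be summarised per party. A small sketch using the ger_enhanced tibble from the chunk above, counting coded sentences and distinct cmp_codes for the 2021 programs:

Sketch: coded sentences per 2021 party program
library(dplyr)

# coded sentences and distinct policy categories per 2021 party program
ger_enhanced %>%
  filter(date == 202109) %>%
  group_by(partyabbrev) %>%
  summarise(
    sentences = n(),
    distinct_codes = n_distinct(cmp_code)
  )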

This setup is incredibly handy. Having seamless, programmatic access to a rich collection of party manifestos and their detailed coding without manual downloads or preprocessing saves countless hours. It is quite cool to have so many party manifestos at our fingertips with such ease!

Thanks to the Manifesto Project team for maintaining this comprehensive database and offering open API access, and to Hauke Licht for creating and sharing the manifestoEnhanceR package that enhances data usability with sentence-level metadata.

Re-embedding with Manifesto Data

Now I repeat the embedding process from Part I, but using the Manifesto Project sentences instead of my PDF extractions. The details are discussed in Part I, so I won't go into them again.

Embedding manifesto sentences
import os
from sentence_transformers import SentenceTransformer, util
import pandas as pd
import torch
from tqdm import tqdm

output_path = "manifesto_similarity_results.csv"

if os.path.exists(output_path):
  data = pd.read_csv(output_path)
else:
  # use the Apple Silicon GPU (MPS) if available, otherwise fall back to CPU
  device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

  # Load model
  model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2').to(device)

  # Encode coalition sentences
  coalition_text = r.sentences[r.sentences['party'] == 'koalition'].reset_index(drop=True).text.values
  embedding_coal = model.encode(
      coalition_text,
      convert_to_tensor=True,
      show_progress_bar=True,
      device = device
  )

  # Prepare output DataFrame
  data = pd.DataFrame()

  # Define parties to process
  parties = ['SPD', '90/Greens', 'FDP']
  target_date = 202109

  for party in tqdm(parties, desc="Processing Parties"):
      # Get party sentences and reset index
      party_sentences = r.ger_enhanced[
              (r.ger_enhanced['partyabbrev'] == party) & 
              (r.ger_enhanced['date'] == target_date)
          ].reset_index(drop=True)
      
      party_text = party_sentences['text'].values
      party_codes = party_sentences['cmp_code'].values
      # Encode party sentences
      embedding_party = model.encode(
          party_text,
          convert_to_tensor=True,
          show_progress_bar=True,
          device = device
      )
      
      # Compute cosine similarity matrix (party sentences × coalition sentences)
      cosine_scores = util.pytorch_cos_sim(embedding_party, embedding_coal).cpu().numpy()
      
      # Prepare DataFrame for all pairwise combinations including cmp_code
      rows, cols = cosine_scores.shape
      results = pd.DataFrame({
          'party': party,
          'party_sentence': party_text.repeat(cols),
          'topic_code': party_codes.repeat(cols),   # repeat to match sentences
          'coalition_sentence': list(coalition_text) * rows,
          'similarity_score': cosine_scores.flatten()
      })
      
      # Filter to keep only the best party sentence match per coalition sentence
      top_matches = (
          results.sort_values('similarity_score', ascending=False)
                .groupby(['party', 'coalition_sentence'])
                .head(1)
      )
      
      data = pd.concat([data, top_matches], ignore_index=True)

  # Save combined results with topic codes
  data.to_csv(output_path, index=False)

Mapping Topics to Party Strengths

Now we can ask the interesting questions: Do sentences attributed to each party fall into their expected policy domains?

The Manifesto coding scheme groups topics into seven main domains:

  1. External Relations (Foreign policy, military, international cooperation)
  2. Freedom & Democracy (Civil rights, democracy, constitutionalism)
  3. Political System (Government efficiency, decentralization, political authority)
  4. Economy (Market regulation, planning, protectionism, corporatism)
  5. Welfare & Quality of Life (Social justice, education, environment)
  6. Fabric of Society (National identity, multiculturalism, tradition)
  7. Social Groups (Labor, agriculture, demographic groups)

We’d expect:

  • Greens: Strong in Domain 5 (especially environmental protection)
  • SPD: Strong in Domains 5 and 7 (welfare expansion, labor groups)
  • FDP: Strong in Domain 4 (free market economy, deregulation)

However, as discussed in Part I, the automated attribution method I chose is not perfect. Where similarity scores are very high it mostly gets things right; everywhere else it is a bit murky. Properly validating it would require manually coding a sample, which is too labor-intensive here, so I proceed conservatively: I only evaluate sentences with a similarity score of 0.9 or higher. This excludes about 90% of the data, so any statements apply only to the subsample of sentences that were copied almost verbatim from party programs.
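
To see how much data the cutoff leaves, here is a quick sketch using the similarity results generated above: take the best-matching party sentence per coalition sentence, then compute the share at or above 0.9.

Sketch: how much data survives the 0.9 cutoff
library(readr)
library(dplyr)

results <- read_csv("manifesto_similarity_results.csv")

# best match per coalition sentence, then the share at or above the cutoff
results %>%
  group_by(coalition_sentence) %>%
  slice_max(similarity_score, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  summarise(
    n_sentences = n(),
    share_kept = mean(similarity_score >= 0.9)
  )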

Topic Distribution across parties

In the 2021 German federal election, the Social Democratic Party (SPD) received 25.7% of the vote, making it the largest party for the first time since 2002. Alliance 90/The Greens secured 14.8%, their best result to date. The Free Democratic Party (FDP) obtained 11.5% of the vote, a small gain over the previous election. These vote shares reflect each party’s relative strength going into the coalition talks. However, this strength does not seem to be reflected in the topic distribution.

Overall, the domains Welfare & Quality of Life and External Relations dominate the direct contributions from party programs. Within every domain, the Greens and the SPD account for most of the matched sentences, even in Economy, where the FDP contributes only a small share.

Figure 1: Topic Domain Distribution in Coalition Agreement
library(ggplot2)
library(dplyr)
library(patchwork)
library(readr)
library(paletteer)

# Load similarity results
results <- read_csv("manifesto_similarity_results.csv")

# Select the best match (highest similarity) for each coalition sentence
best_matches <- results %>%
  group_by(coalition_sentence) %>%
  slice_max(similarity_score, n = 1) %>%
  filter(similarity_score >= .9) %>%
  ungroup()

# Map topic codes to broader domains
get_domain <- function(code) {
  domain_num <- substr(as.character(code), 1, 1)
  case_when(
    domain_num == "1" ~ "1 External Relations",
    domain_num == "2" ~ "2 Freedom & Democracy",
    domain_num == "3" ~ "3 Political System",
    domain_num == "4" ~ "4 Economy",
    domain_num == "5" ~ "5 Welfare & Quality of Life",
    domain_num == "6" ~ "6 Fabric of Society",
    domain_num == "7" ~ "7 Social Groups",
    TRUE ~ "Missing"
  )
}

best_matches <- best_matches %>%
  mutate(domain = get_domain(topic_code))

# Proportions by domain and party, relative to the whole agreement
domain_party_props <- best_matches %>%
  count(domain, party) %>%
  ungroup() %>%
  mutate(total = sum(n)) %>%
  group_by(party, domain) %>%
  mutate(prop = n / total) %>%
  ungroup()

# Proportions by party within each topic domain
domain_party_props_in_topic <- best_matches %>%
  count(domain, party) %>%
  group_by(domain) %>%
  mutate(total = sum(n)) %>%
  group_by(party, domain) %>%
  mutate(prop = n / total) %>%
  ungroup()

# Use Frida Kahlo palette colors from paletteer for the parties
frida_colors <- c(
  "FDP" = paletteer::paletteer_d("lisa::FridaKahlo")[4],
  "90/Greens" = paletteer::paletteer_d("lisa::FridaKahlo")[2],
  "SPD" = paletteer::paletteer_d("lisa::FridaKahlo")[5]
)

# Shared theme for both panels: legend at the bottom, no y-axis title
common_theme <- theme_minimal(base_size = 14) +
  theme(
    legend.position = "bottom",
    axis.title.x = element_text(size = 12),
    axis.title.y = element_blank(),
    axis.text.y = element_text(size = 10)
  )

# First plot: proportions across whole agreement
p1 <- ggplot(domain_party_props, aes(x = prop, y = domain, fill = party)) +
  geom_col(position = "stack", width = 0.75) +
  scale_fill_manual(values = frida_colors) +
  common_theme +
  labs(
    title = "Topic Domain Distribution in Coalition Agreement",
    subtitle = "by party according to best match between program and agreement",
    x = "Proportion across agreement",
    fill = "Party"
  )

# Second plot: proportions within each topic domain
p2 <- ggplot(domain_party_props_in_topic, aes(x = prop, y = domain, fill = party)) +
  geom_col(position = "stack", width = 0.75) +
  scale_fill_manual(values = frida_colors) +
  common_theme +
  theme(axis.text.y = element_blank()) +
  labs(
    subtitle = "by party proportion within topic domain",
    x = "Proportion within topic domain"
  )

# Combine plots with unified legend and shared x-axis label
combined_plot <- (p1 + p2) + 
  plot_layout(guides = "collect") & 
  theme(legend.position = "bottom")

# Add shared x-axis label using patchwork plot_annotation
combined_plot <- combined_plot + 
  plot_annotation(
    caption = "Proportions show party influence based on best sentence matches"
  ) & 
  theme(plot.caption = element_text(hjust = 0.5, size = 12, face = "italic"))

combined_plot

Sub-topic distribution - a more detailed look

Looking at the more detailed topics beyond the main policy domains, the initial hypothesis can largely be confirmed, though the finding remains tentative given the limited data. The Greens show strong positive contributions related to environmentalism and sustainability, as expected, although the SPD and FDP also contribute sentences in these areas to a lesser degree. True to their “Free Democrats” name, the FDP’s largest subtopic is indeed “freedom”, and administrative efficiency, a core FDP campaign issue, is also reflected in the coalition agreement through their matched sentences.

Overall, this suggests that coalition agreements are at least partially shaped by the parties’ dominant core issues. Interestingly, vote share does not seem to directly correlate with the volume of contributions, as the Greens have punched well above their weight in terms of textual influence. That said, these results rest on a limited dataset and should be interpreted with caution.

Merging Manifesto Code Labels
topic_match <- best_matches |> 
  count(party, topic_code) |> 
  rename(code = topic_code)

codes <- read_csv("https://manifesto-project.wzb.eu/down/data/2020a/codebooks/codebook_categories_MPDS2020a.csv") |> 
  select(code, label)

topic_match <- topic_match |> 
  left_join(codes)
Figure 2: Top Topics by Party in Coalition Agreement
library(tidytext)  # reorder_within() / scale_y_reordered()
library(stringr)   # str_wrap()

# Identify the five most frequent topics per party (with ties)
top_topics <- topic_match %>%
  group_by(party) %>%
  slice_max(order_by = n, n = 5, with_ties = TRUE) %>%
  ungroup() %>%
  filter(!is.na(label))

# Restrict original data to only those top topics
filtered_data <- topic_match %>%
  semi_join(top_topics, by = c("party", "code"))

# Topics are reordered within each party via reorder_within() in the plot below

# Plot
ggplot(filtered_data, aes(x = n, y = reorder_within(str_wrap(label, 18),n, party), fill = party)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~party, ncol = 3, scales = "free_y") +
  scale_fill_manual(values = frida_colors) +
  scale_y_reordered() +
  labs(
    x = "Count of matched sentences",
    y = "Topic",
    title = "Top Topics by Party in Coalition Agreement",
    subtitle = "Five most frequent topics per party",
    caption = "with ties"
  ) +
  theme_minimal() +
  theme(
    axis.text.y = element_text(size = 9),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold")
  )

Lessons learnt

  1. Domain shift and small textual markers matter more than you think: The gap between training data (clean, structured Manifesto database entries) and application data (PDF extractions with formatting artifacts) was enough to break the classifier.

  2. Conservative thresholds are your friend: By requiring 0.9+ similarity, I sacrifice 90% of the data but gain confidence in the remaining 10%. In exploratory research, this trade-off is often worthwhile.

Future Directions

1. Validating Attribution with LLMs

The next step is to validate whether the high-similarity matches represent genuine text reuse or merely coincidental similarity. One could use local LLMs via rollama (a short sketch follows this list) to:

  • Present sentence pairs to Llama 3.1 or Mistral
  • Ask: “Does sentence B appear to be derived from sentence A?”
  • Compare LLM judgments against similarity scores
  • Identify false positives where similarity is high but influence is questionable
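
A minimal sketch of what this could look like with rollama, assuming a local Ollama server is running and a model such as llama3.1 has been pulled via pull_model(); the prompt wording is only a placeholder.

Sketch: asking a local LLM about one matched pair
library(rollama)
library(dplyr)

# take one high-similarity pair as an example; in practice, loop over all matches
pair <- best_matches %>% slice(1)

prompt <- paste0(
  "Sentence A (party program): ", pair$party_sentence, "\n",
  "Sentence B (coalition agreement): ", pair$coalition_sentence, "\n",
  "Does sentence B appear to be derived from sentence A? ",
  "Answer yes or no, then give one short reason."
)

# send the prompt to the locally running model; the reply is printed to the console
query(prompt, model = "llama3.1")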

2. Fine-Tuning a Domain-Specific Classifier

Rather than using the pretrained manifestoberta model, I could:

  • Start with a German language model (e.g., german-gpt2 or gbert-large)
  • Fine-tune on the seven main Manifesto domains rather than 56 granular topics
  • Use the Manifesto Project’s coded sentences as training data
  • Create an ensemble of domain-specific classifiers

This would likely improve accuracy while remaining computationally feasible.

3. Sentence-Level Negotiation Analysis

With validated attributions, I could investigate:

  • Bargaining patterns: Which topics show more “mixed” sentences (moderate similarity to multiple parties)?
  • Compromise detection: Can we identify sentences that blend language from multiple programs?
  • Sequential influence: Do sentences from one party tend to cluster together in the agreement?

References

Burst, Tobias, Simon Franzmann, and Pola Lehmann. 2024. “Manifestoberta. Version 56topics.sentence.2024.1.1.” Berlin / Göttingen: Wissenschaftszentrum Berlin für Sozialforschung / Göttinger Institut für Demokratieforschung. https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1.
Grimmer, Justin, and Brandon M Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–97. https://doi.org/10.1093/pan/mps028.
Lehmann, Pola, Simon Franzmann, Denise Al-Gaddooa, Tobias Burst, Christoph Ivanusch, Sven Regel, Felicia Riethmüller, Andrea Volkens, Bernhard Weßels, and Lisa Zehnter. 2025. “The Manifesto Data Collection. Manifesto Project (MRG/CMP/MARPOR). Version 2025a.” Dataset. https://doi.org/10.25522/manifesto.mpds.2025a.

Citation

BibTeX citation:
@online{bochtler2025,
  author = {Bochtler, Paul},
  title = {Part {II:} {Who} Wrote the Last Coalition Agreement in
    {Germany?}},
  date = {2025-09-28},
  url = {https://www.paulbochtler.de/blog/2025/01/},
  langid = {en}
}
For attribution, please cite this work as:
Bochtler, Paul. 2025. “Part II: Who Wrote the Last Coalition Agreement in Germany?” September 28, 2025. https://www.paulbochtler.de/blog/2025/01/.