Election Prediction - Jamhooriat (Case Study)

The Opportunity

Politics is, without doubt, the most talked-about topic in Pakistan and worldwide. Elections are the heart of politics and the most anticipated events for anyone interested in it. Whether it is a general election or a by-election, the curiosity about who will win remains!

The idea behind this module is to accurately predict constituency-wise winners for elections. The three candidates most likely to win are returned along with their win scores. Hence, the model not only predicts the likely finishing positions of candidates but also lets us gauge the competition between rival contestants.

What We Did

Collect and preprocess the past four general election results, i.e. 2002, 2008, 2013, and 2018.
Collect province-wise public polls and survey forms from the last two years, i.e. 2021 and 2022, to capture how public opinion varies across provinces.
Scrape tweets related to PTI, PML-N and PPPP, the country's top three political parties, from Twitter to gauge public sentiment towards these parties.
Calculate a winning score for every candidate contesting a particular constituency seat, and return the three contestants most likely to win.
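The last step can be illustrated with a small sketch. It assumes each candidate already has a raw, non-negative winning score. The scores are normalised within one constituency so that rivals can be compared directly, and the three highest are returned. The `Candidate` type, the `top_three` function, the normalisation and the candidate names are hypothetical illustrations, not Jamhooriat's actual code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for one contestant in a single constituency.
    name: str
    party: str
    score: float  # raw, non-negative winning score (assumed already computed)


def top_three(candidates: list[Candidate]) -> list[tuple[str, float]]:
    """Return the three highest-scoring candidates with normalised win scores.

    Scores are divided by the constituency total so they sum to 1 across all
    contestants, which makes the gap between rivals easy to read.
    """
    total = sum(c.score for c in candidates)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive score")
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return [(c.name, c.score / total) for c in ranked[:3]]


# Made-up candidates and scores for one illustrative seat.
seat = [
    Candidate("Candidate A", "PTI", 4.0),
    Candidate("Candidate B", "PML-N", 3.0),
    Candidate("Candidate C", "PPPP", 2.0),
    Candidate("Candidate D", "Independent", 1.0),
]
for name, share in top_three(seat):
    print(f"{name}: {share:.2f}")
```

Normalising is one design choice among several. Returning the raw scores would rank candidates identically.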

The Results

After training and optimizing, our election forecasting model predicts the winning candidate in each constituency with the following accuracies:

The actual winner is predicted correctly 74.5% of the time.
The actual winner is among the top two predicted candidates 77% of the time.
The actual winner is among the top three predicted candidates 82.7% of the time.
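The three figures above are top-k accuracies. A constituency counts as a hit if the actual winner appears among the model's k highest-ranked candidates. The following is a minimal sketch of how such figures can be computed. It assumes each constituency's prediction is a ranked list of candidate names; the function name, data layout and toy data are hypothetical.

```python
def top_k_accuracy(predictions: dict[str, list[str]],
                   winners: dict[str, str],
                   k: int) -> float:
    """Fraction of constituencies whose actual winner is in the top-k predictions.

    `predictions` maps a constituency ID to candidate names ranked from most
    to least likely; `winners` maps the same IDs to the actual winner.
    """
    hits = sum(1 for seat, ranked in predictions.items()
               if winners[seat] in ranked[:k])
    return hits / len(predictions)


# Toy data for four made-up constituencies (not real results).
predictions = {
    "seat-1": ["A", "B", "C"],
    "seat-2": ["D", "E", "F"],
    "seat-3": ["G", "H", "I"],
    "seat-4": ["J", "K", "L"],
}
winners = {"seat-1": "A", "seat-2": "E", "seat-3": "I", "seat-4": "M"}

for k in (1, 2, 3):
    print(f"top-{k}: {top_k_accuracy(predictions, winners, k):.0%}")
```

A hit at k is also a hit at every larger k, so these figures can only grow with k. This matches the reported 74.5%, 77% and 82.7%.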

How We Did It

The model's pipeline starts by fetching and preprocessing Twitter data for the country's top three political parties: PTI, PPPP and PML-N. Sentiment analysis is performed on these tweets using VADER, NLTK's sentiment analyzer, and a probability is then derived from the sentiment scores using a few mathematical expressions.
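The sentiment step can be sketched as follows. NLTK's VADER analyser (`nltk.sentiment.vader.SentimentIntensityAnalyzer`) returns a `compound` score between -1 and 1 for each text via `polarity_scores`. The case study does not give its exact expressions for turning these scores into a probability, so the conversion below is an assumption. Each party's weight is its share of positive tweets (compound >= 0.05, VADER's conventional positive threshold), and the weights are normalised across the three parties so they sum to 1. The function names and the sample scores are hypothetical.

```python
def compound_scores(tweets: list[str]) -> list[float]:
    """Score tweets with NLTK's VADER analyser.

    Requires `pip install nltk` and a one-off `nltk.download("vader_lexicon")`.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    return [sia.polarity_scores(t)["compound"] for t in tweets]


def party_probabilities(scores: dict[str, list[float]],
                        threshold: float = 0.05) -> dict[str, float]:
    """Turn per-party VADER compound scores into probabilities summing to 1.

    Assumed rule (not from the case study): a party's weight is the fraction
    of its tweets classed as positive, then the weights are normalised.
    """
    weights = {
        party: sum(s >= threshold for s in vals) / len(vals) if vals else 0.0
        for party, vals in scores.items()
    }
    total = sum(weights.values())
    if total == 0:
        # No positive tweets anywhere: fall back to a uniform distribution.
        return {party: 1 / len(weights) for party in weights}
    return {party: w / total for party, w in weights.items()}


# Illustrative compound scores (made-up numbers, not real Twitter data).
scores = {
    "PTI": [0.6, 0.3, -0.2, 0.1],
    "PML-N": [0.4, -0.5, 0.2, -0.1],
    "PPPP": [-0.3, 0.07, -0.6, 0.0],
}
print(party_probabilities(scores))
```

In a full run, each party's list would come from `compound_scores` applied to its scraped tweets. The import is kept inside the function so the probability logic can be used without NLTK installed.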

Next, district- and year-wise win probabilities, along with a combined party win probability, are computed from the results of the last four general elections (2002, 2008, 2013 and 2018).
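The sketch below shows one way to derive these historical probabilities. It rests on assumptions the case study does not state:

- A party's district-and-year win probability is the share of that district's seats it won in that election.
- The combined probability is a weighted average over the elections, with more weight on recent results.

The record layout, the `RECENCY_WEIGHTS` values, the function names and the toy data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical recency weights for the four general elections (assumption).
RECENCY_WEIGHTS = {2002: 1.0, 2008: 2.0, 2013: 3.0, 2018: 4.0}


def district_year_probs(results):
    """Map (district, year, party) -> share of the district's seats won.

    `results` is a list of (year, district, winning_party) tuples, one per seat.
    """
    seats = defaultdict(int)
    wins = defaultdict(int)
    for year, district, party in results:
        seats[(district, year)] += 1
        wins[(district, year, party)] += 1
    return {key: n / seats[(key[0], key[1])] for key, n in wins.items()}


def combined_party_prob(probs, district, party):
    """Recency-weighted average of a party's win probability in one district.

    Only elections with recorded results for the district are averaged.
    """
    years_present = {y for d, y, _ in probs if d == district}
    num = den = 0.0
    for year in years_present:
        weight = RECENCY_WEIGHTS.get(year, 0.0)
        num += weight * probs.get((district, year, party), 0.0)
        den += weight
    return num / den if den else 0.0


# Toy data for one made-up district (not actual election results).
results = [
    (2013, "District-A", "PML-N"), (2013, "District-A", "PML-N"),
    (2013, "District-A", "PTI"),
    (2018, "District-A", "PML-N"), (2018, "District-A", "PTI"),
    (2018, "District-A", "PTI"),
]
probs = district_year_probs(results)
print(f"{combined_party_prob(probs, 'District-A', 'PTI'):.3f}")
```

Averaging only over elections present in the data avoids penalising a party for years in which a district had no recorded results.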

Lastly, a province-wise party popularity probability is derived from public survey polls. The model is then optimized to calculate the final party and candidate scores: the results of constituencies where the competition is clearly one-sided are fixed in advance, and Bayesian optimization is applied over them to train the model's parameters.
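The final scoring and tuning step could look roughly like this. Everything here is an assumption for illustration:

- Province-wise poll shares are normalised into probabilities.
- The final party score is a weighted sum of the Twitter, historical and survey probabilities.
- The weights are chosen to maximise top-1 accuracy on clearly one-sided constituencies, whose winners are treated as known.

The case study uses Bayesian optimization for the tuning. To keep the sketch dependency-free, it substitutes a plain random search over the same objective. A Bayesian optimizer such as scikit-optimize's `gp_minimize` could be dropped in on the negated accuracy. All names and data are hypothetical.

```python
import random


def poll_probs(poll_shares: dict[str, float]) -> dict[str, float]:
    """Normalise one province's raw poll shares (any scale) to sum to 1."""
    total = sum(poll_shares.values())
    return {party: share / total for party, share in poll_shares.items()}


def party_score(weights, twitter, history, survey):
    """Hypothetical final score: weighted sum of the three probabilities."""
    w_t, w_h, w_s = weights
    return w_t * twitter + w_h * history + w_s * survey


def accuracy(weights, seats):
    """Top-1 accuracy of the weighted score on seats with a known winner.

    Each seat is (winner, {party: (twitter_p, history_p, survey_p)}).
    """
    hits = 0
    for winner, features in seats:
        best = max(features, key=lambda p: party_score(weights, *features[p]))
        hits += best == winner
    return hits / len(seats)


def random_search(seats, n_trials=200, seed=0):
    """Stand-in for Bayesian optimisation: sample weights, keep the best."""
    rng = random.Random(seed)
    best_w, best_acc = None, -1.0
    for _ in range(n_trials):
        w = [rng.random() for _ in range(3)]
        acc = accuracy(w, seats)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy, made-up data: two one-sided seats whose winners are treated as known.
survey = poll_probs({"PTI": 45, "PML-N": 45, "PPPP": 10})  # one province's poll
seats = [
    ("PTI", {"PTI": (0.5, 0.6, survey["PTI"]),
             "PML-N": (0.3, 0.3, survey["PML-N"])}),
    ("PML-N", {"PTI": (0.6, 0.2, survey["PTI"]),
               "PML-N": (0.2, 0.7, survey["PML-N"])}),
]
weights, acc = random_search(seats)
print(f"best weights {weights}, training accuracy {acc:.0%}")
```

Random search evaluates the objective blindly. A Bayesian optimizer instead fits a surrogate model to past evaluations and picks promising weights next, which matters when each evaluation covers hundreds of constituencies.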