Cohort WhatsApp Group Analysis with Python, Plotly and Matplotlib

José Christian Topete
MCD-UNISON
6 min read · Dec 1, 2020

Based on “Whatsapp Group Chat Analysis using Python and Plotly” by Saiteja Kura

This exercise tracks the behaviour of a WhatsApp group created during the first semester by students of the first cohort of the Data Science graduate degree at Sonora University.

But first… Libraries! 📚

As the project grew, libraries were added and removed, then added again… then removed again…

These are the ones that made it…

Note: WhatsApp Export DateTime RegEx 📅

  • iPhone exports WhatsApp chats in a different format from the one commonly shown in online tutorials, so a different RegEx pattern is applied to detect the date and time at the start of each message.
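As a rough sketch of what such a pattern can look like, the snippet below assumes an iOS line shaped like "[12/1/20, 10:15:32 AM] Author: message". That shape is an assumption: real exports change with the phone's locale and clock settings, so the pattern (and the names IOS_LINE and starts_with_datetime, which are made up here) will likely need tweaking.

```python
import re

# Assumed iOS export line shape: "[12/1/20, 10:15:32 AM] Author: message".
# Real exports vary with the phone's locale and 12h/24h clock settings.
IOS_LINE = re.compile(
    r"^\u200e?\[(?P<date>\d{1,2}/\d{1,2}/\d{2,4}),\s"
    r"(?P<time>\d{1,2}:\d{2}(?::\d{2})?(?:\s?[APap]\.?\s?[Mm]\.?)?)\]\s"
    r"(?P<rest>.*)$"
)


def starts_with_datetime(line: str) -> bool:
    """Return True when a line opens a new message rather than continuing one."""
    return IOS_LINE.match(line) is not None
```

Lines that fail the check are usually the second or third line of a multi-line message.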

Chatting it up! 💬

Up next, the chat file is located by its file path, opened, read line by line and parsed into date, time, author and message columns in a Pandas DataFrame.
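A minimal sketch of that parsing step could look like the following. It assumes the "[date, time] Author: message" line shape and uses a small in-memory sample instead of a real export; parse_chat and the column names are illustrative choices, not necessarily the ones used in the original notebook.

```python
import re
import pandas as pd

# Assumed line shape: "[date, time] Author: message"; adjust to your export.
LINE = re.compile(r"^\u200e?\[([^,\]]+),\s([^\]]+)\]\s([^:]+):\s(.*)$")


def parse_chat(lines):
    """Turn raw export lines into rows of date, time, author and message."""
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE.match(line)
        if match:
            date, time, author, message = match.groups()
            rows.append({"Date": date, "Time": time,
                         "Author": author, "Message": message})
        elif rows:
            # No timestamp: treat the line as a continuation of the last message.
            rows[-1]["Message"] += "\n" + line
    return pd.DataFrame(rows, columns=["Date", "Time", "Author", "Message"])


sample = [
    "[12/1/20, 10:15:32 AM] Person 1: Hola a todos\n",
    "nos vemos en clase\n",
    "[12/1/20, 10:16:05 AM] Person 2: jajaja ok\n",
]
df = parse_chat(sample)
```

With a real export, the same function can be fed an open file, for example `with open(path, encoding="utf-8") as f: df = parse_chat(f)`.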

Regular expressions aren’t a laughing matter

To handle the common ways of writing laughter in Spanish (jaja, jajaja, jeje and friends), regular expressions were applied.
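One way to do this, sketched under the assumption that a laugh is two or more "ja/je/ji/jo" syllables in a row, is to collapse every variant into a single token. The pattern and the normalize_laughs helper are hypothetical, not the exact expression used in the project.

```python
import re

# Hypothetical pattern: two or more "ja/je/ji/jo" syllables,
# with optional trailing j's (as in "jajaj").
LAUGH = re.compile(r"\b(?:j+[aeiou]+){2,}j*\b", re.IGNORECASE)


def normalize_laughs(text: str) -> str:
    """Collapse every laugh variant into the single token 'jaja'."""
    return LAUGH.sub("jaja", text)
```

Normalizing first keeps "jajaja" and "JAJAJAJA" from showing up as different words later on.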

Cover your privates! 🙊

The following script was executed to mask the names of the message senders and the respective mentions in chat.
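A possible shape for that masking step is sketched below. The names are made up, and build_alias_map and mask_text are illustrative helpers; the "Person N" labels match the ones used later in this post.

```python
import re


def build_alias_map(authors):
    """Map each distinct author to "Person N" in order of first appearance."""
    aliases = {}
    for name in authors:
        aliases.setdefault(name, f"Person {len(aliases) + 1}")
    return aliases


def mask_text(text, aliases):
    """Replace any mention of a real name inside a message with its alias."""
    for name, alias in aliases.items():
        text = re.sub(re.escape(name), alias, text)
    return text


authors = ["Ana López", "Luis Pérez", "Ana López"]  # made-up names
aliases = build_alias_map(authors)
masked_authors = [aliases[a] for a in authors]
masked_message = mask_text("Gracias @Ana López por las notas", aliases)
```

Mentions that appear as phone numbers rather than names would need their own entries in the alias map.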

Data for Data Science conversations.

It’s a fresh group and we only talk data science… capicci? 🤌🏻

A whopping 777 messages have been shared and read by the group. Slow but steady. Obtaining this figure was as simple as reading the first element of the DataFrame's shape, which is its number of rows.

Image and sticker messages are prevalent thanks to meme culture, so it is important to handle them explicitly when analyzing social networking data. We flag each of the two message types in a new column and sum those columns for further handling.
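Both counts can be sketched in a few lines. The placeholder strings ("image omitted", "sticker omitted", preceded by an invisible left-to-right mark) are an assumption about English-language iOS exports, and the column names is_image and is_sticker are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Author": ["Person 1", "Person 2", "Person 1", "Person 3"],
    "Message": ["Hola", "\u200eimage omitted", "\u200esticker omitted", "jaja"],
})

total_messages = df.shape[0]  # shape is (rows, columns); each row is a message

# Assumed iOS placeholders; other platforms and languages export different text.
clean = df["Message"].str.replace("\u200e", "", regex=False)
df["is_image"] = clean.eq("image omitted")
df["is_sticker"] = clean.eq("sticker omitted")

# Summing a boolean column counts its True values.
images = int(df["is_image"].sum())
stickers = int(df["is_sticker"].sum())
```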

For emojis, we apply the previous splitCount function… Remember this one?
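In case the function has slipped your mind, here is a rough stand-in written with the standard library only. It is not the original splitCount: it matches a few common emoji code point ranges, keeps a skin-tone modifier attached to its base emoji, and will split multi-part emojis joined by a zero-width joiner. A dedicated emoji library is more thorough.

```python
import re

# Rough stand-in for an emoji library: main pictograph blocks only,
# with an optional skin-tone modifier or variation selector attached.
EMOJI = re.compile(
    "[\U0001F300-\U0001F3FA\U0001F400-\U0001FAFF\u2600-\u27BF]"
    "[\U0001F3FB-\U0001F3FF\uFE0F]?"
)


def split_count(text):
    """Return the list of emojis found in a message, in order of appearance."""
    return EMOJI.findall(text)
```

Applying it to every message, for example with `df["Message"].apply(split_count)`, gives one list of emojis per row.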

Metrics 📐

From 777 messages involving 183 emojis and 87 hilarious stickers, it’s time to see who is causing all this ruckus.

Pyplot is simple and handy for bar charts. We pass the value counts of the Author column as the data to plot.
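A minimal sketch of that chart, using made-up rows in place of the real chat data, might look like this:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; the real DataFrame comes from the parsed chat export.
df = pd.DataFrame({"Author": ["Person 4", "Person 5", "Person 4",
                              "Person 6", "Person 4", "Person 5"]})

author_counts = df["Author"].value_counts()  # messages per author, descending

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(author_counts.index, author_counts.values)
ax.set_xlabel("Author")
ax.set_ylabel("Messages")
ax.set_title("Messages per author")
fig.tight_layout()
```

In a notebook, drop the `matplotlib.use("Agg")` line and the figure renders inline.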

And voila!

Conversation in detail 🗣

So now we see that Person 4, 5 and 6 (what are the odds…) are the ones filling phones with notifications. Let’s dive deeper!

To discover details for each author, we start with a list of every author in the DataFrame. A new DataFrame is then populated by iterating over that list and filtering the messages by author.

Information like total messages, average words per message, stickers, unique emojis and links sent is useful to understand the dynamics of the group.
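The loop described above can be sketched as follows. The sample rows and the column names (is_sticker, emoji, words, links) are assumptions for illustration; the idea is simply to filter by author and compute each metric on the subset.

```python
import pandas as pd

# Made-up sample with the helper columns built in earlier steps.
df = pd.DataFrame({
    "Author": ["Person 1", "Person 2", "Person 1"],
    "Message": ["hola a todos", "mira https://example.com", "gracias"],
    "is_sticker": [False, False, False],
    "emoji": [[], ["👍"], ["😂", "😂"]],
})
df["words"] = df["Message"].str.split().str.len()
df["links"] = df["Message"].str.count(r"https?://\S+")

rows = []
for author in df["Author"].unique():
    sub = df[df["Author"] == author]  # keep only this author's messages
    rows.append({
        "Author": author,
        "Messages": len(sub),
        "Avg words": sub["words"].mean(),
        "Stickers": int(sub["is_sticker"].sum()),
        "Unique emojis": len({e for lst in sub["emoji"] for e in lst}),
        "Links": int(sub["links"].sum()),
    })
stats = pd.DataFrame(rows).set_index("Author")
```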

Emoji as a form of communication? 👀

39 unique emojis have been used in the group to express their emotions. Let’s dive even deeper!

Joy to the world through emojis! 🌎

We obtain the full list of emojis with a list comprehension that pulls every emoji out of the main DataFrame.

The emoji_dict variable tallies each emoji in a dictionary, using the emoji as the key and its frequency as the value.

Then we sort in descending order by frequency and store the result in a Pandas DataFrame for a simple but effective display.
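Those three steps can be sketched with collections.Counter, which handles both the tally and the sort. The sample data is made up and emoji_df is an illustrative name; the original code built its emoji_dict by hand.

```python
from collections import Counter
import pandas as pd

# Made-up sample: one list of emojis per message.
df = pd.DataFrame({"emoji": [["😂", "😂"], [], ["👍", "😂"], ["🙏"]]})

# Flatten the per-message lists into one list of emojis.
all_emojis = [e for per_message in df["emoji"] for e in per_message]

# Counter tallies frequencies; most_common() sorts from most to least used.
emoji_counts = Counter(all_emojis).most_common()
emoji_df = pd.DataFrame(emoji_counts, columns=["emoji", "count"])
```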

Most emojis used in the group express laughter and joy. What a cheery group of chaps and chapettes!

Here we make use of the Plotly Express library for an interactive and easy to understand visual representation of our emoji frequencies.

Round and round with the radar graph.

For the following graph we obtain a visual representation of the frequency of messages by day of the week.

We create a new DataFrame with a column holding the day of the week. For this we use the .dt accessor on our main Pandas DataFrame and store the day name in the “day_of_date” column.

To make counting simpler, we also add a column to the new DataFrame with a value of 1 for each row and sum it by day.
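Sketched with a handful of made-up dates, the day-of-week aggregation could look like this. The messagecount column name and the reindex step, which keeps the days in calendar order and fills empty days with zero, are illustrative additions.

```python
import pandas as pd

# Made-up dates: two on a Friday, one on a Saturday, one on a Monday.
df = pd.DataFrame({"Date": pd.to_datetime(
    ["2020-11-27", "2020-11-27", "2020-11-28", "2020-11-30"])})

day_df = pd.DataFrame({"day_of_date": df["Date"].dt.day_name()})
day_df["messagecount"] = 1  # one per message row, so a sum is a count

order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]
by_day = (day_df.groupby("day_of_date")["messagecount"].sum()
          .reindex(order, fill_value=0)   # calendar order, zero for quiet days
          .rename_axis("day_of_date")
          .reset_index())
```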

We then pass our newest DataFrame to Plotly Express for a quick, easy-to-read radar plot.

Fridays seem to be interestingly active.

WordCloud roundup

To finish off, we use the WordCloud library. It ships with a predefined list of stopwords, but that list consists of English words. Since our local language is Spanish, I extended the stopword list with additional words.

But first… we find an issue…

It appears that one of the most recent updates of WhatsApp for iOS exports images, stickers, GIFs and links differently. To deal with that, we add a few lines that explicitly exclude the terms that might muddle our Word Cloud.
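One way to do that exclusion is sketched below. The placeholder wording is an assumption about English-language iOS exports (each placeholder preceded by an invisible left-to-right mark), and the PLACEHOLDERS list is illustrative, so check your own export before reusing it.

```python
import pandas as pd

# Assumed placeholder wording for an English-language iOS export.
PLACEHOLDERS = ["image omitted", "sticker omitted", "GIF omitted",
                "video omitted", "audio omitted"]

messages = pd.Series([
    "\u200eimage omitted",
    "gracias profe",
    "\u200esticker omitted",
    "link https://example.com/clase",
])

cleaned = messages.str.replace("\u200e", "", regex=False)  # drop invisible mark
cleaned = cleaned[~cleaned.isin(PLACEHOLDERS)]             # drop media rows
cleaned = cleaned.str.replace(r"https?://\S+", "", regex=True).str.strip()
```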

We transform our Message column into a single string, define a few common stopwords used in the Spanish language and set our plotting configurations and… presto!

It definitely seems like a busy and dedicated group. References to classes and expressions of gratitude are quite prevalent, along with scheduling terms, class platform names and other related keywords.

Conclusion

Dipping toes into simple data analysis with such powerful tools is fun and exciting. The versatility of Python and the outstanding modularity with Plotly and Matplotlib do provide extensive and granular representation of information that can make even the most menial piece of information interesting.

As for the group… What a bunch of good fellows!

José Christian Topete
MCD-UNISON

🇲🇽 Market research professional and Data Science Graduate student.