Exploring Google Gemini from a Data Analyst’s perspective
Aron Saläng, Data Consultant, Solita
Published 28 Oct 2024
Reading time 4 min
I’ve recently been exploring how you can use GenAI models to do data analysis. In June I tried using ChatGPT to solve some data analysis tasks that we use for training at Solita. You can read that post here but let me save you a click: I was impressed by the model’s capabilities. Since we have a tech-agnostic approach at Solita, my ambition is to try other models as well. For this post, I will look at Gemini from Google.
Did Google Gemini’s data analysis chops impress me the same way that ChatGPT’s did? Spoiler alert: they did not. I’ll get into it more below but first some general information about Gemini. It’s Google’s GenAI chat service, accessible at gemini.google.com. You need a Google account and, just as with ChatGPT, basic functionality is free. If you want to be able to upload files, you will need a Gemini Advanced subscription priced at $20/month. Unlike ChatGPT, however, Gemini offers a 30-day free trial.
The basic layout of the interface is quite familiar by now. You have a chatbox and an option to upload a file. The file types supported for data are csv, xlsx, xls, tsv, gsheet, gdocs.
My data file is an Excel file that contains sales data for a fictitious retail company. After upload and a simple instruction Gemini returns a summary with some stats about the contents of the file.
I would have preferred to get the metadata about the file up front, but a quick follow-up prompt got me that.
Okay then, let’s get going. First I’ll just state my intentions.
We get a histogram of the quantity data points. Maybe not the first thing I would have wanted to know from the dataset but at least there’s a graph.
Next up, some more priming of how I would like the answers displayed.
Some contradictory information there but let’s start the querying and see what we’ll get.
Question 1
The first question is about costs, a measure that isn’t present in the dataset but can be inferred by subtracting Profit from Sales Gross.
We get an answer right away but Gemini seems to have forgotten my instruction about getting answers visualised. I’ll remind it.
Still no graph but there’s Python code that I can run. Looks like that can give me something visual. There’s a play button to run it.
Oops, that gives me an error. That’s not great. Well, at least I got an answer in the previous response: that the Central region had the highest costs in 2016. That’s actually incorrect. Let’s probe some more into the formula that Gemini used for cost.
Okay, that’s not in any way right. The first question is a fail.
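For reference, the cost inference the question called for is a short pandas aggregation. Since the actual file isn’t included here, the sketch below uses a tiny made-up frame, and the column names ("Order Date", "Region", "Sales", "Profit") are my assumptions about the dataset:

```python
import pandas as pd

# Hypothetical stand-in for the sales file; values and column names are assumptions
df = pd.DataFrame({
    "Order Date": pd.to_datetime(["2016-03-01", "2016-07-15", "2016-11-20", "2017-01-05"]),
    "Region": ["Central", "North", "North", "Central"],
    "Sales": [1000.0, 800.0, 1200.0, 500.0],
    "Profit": [200.0, 100.0, 300.0, 50.0],
})

# Cost isn't a column in the data; infer it as Sales minus Profit
df["Cost"] = df["Sales"] - df["Profit"]

# Total cost per region, restricted to 2016
costs_2016 = (
    df[df["Order Date"].dt.year == 2016]
    .groupby("Region")["Cost"]
    .sum()
)
top_region = costs_2016.idxmax()  # region with the highest 2016 costs
```

With these toy numbers the answer would be North, not Central; the point is only the derived-column-then-groupby pattern.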
Question 2
This answer is correct. Maybe Gemini wasn’t warmed up properly for the first question. There’s no visualisation this time either so I’ll give a friendly reminder about that. Again.
Getting Python code again that results in yet another error message.
Question 3
The answer is wrong; it should be Sapphira Shifley with a discount amount of 2 636. The names in the returned list look suspiciously alphabetical and the provided amounts are all incorrect for those respective customers.
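For what it’s worth, finding the top-discount customer is a one-line aggregation in pandas. The rows below are invented (and the column names are my assumptions), chosen only so the totals illustrate the expected result:

```python
import pandas as pd

# Invented line items; a customer can appear on several rows
df = pd.DataFrame({
    "Customer Name": ["Sapphira Shifley", "Alan Adams", "Sapphira Shifley"],
    "Discount Amount": [1500.0, 900.0, 1136.0],
})

# Sum discounts per customer, then take the largest
by_customer = df.groupby("Customer Name")["Discount Amount"].sum()
top_customer = by_customer.idxmax()
```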
Question 4
Once more we get code that won’t run, so we don’t know the answer. Let’s stick with a format that works: plaintext.
Incorrect again. The answer should be Spain with 90,08 hours. The answer for the UK is also wrong; it should be 98,48 h.
Question 5
This is getting tiresome.
This time the calculations in the returned list are actually correct. Unfortunately, Gemini fails to realise that a negative percent difference versus a target is actually good: you are performing better than the target. Thus, the correct answer should be Same Day with -87%.
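The sign mix-up is easy to make in code too: picking the largest percent difference instead of the most negative one. A minimal sketch, where the ship modes, metrics and targets are made up for illustration:

```python
import pandas as pd

# Hypothetical performance figures; the real ones come from the dataset
perf = pd.DataFrame({
    "Ship Mode": ["Same Day", "First Class", "Standard"],
    "Actual": [2.0, 40.0, 120.0],   # e.g. average hours to ship
    "Target": [16.0, 36.0, 100.0],
})

# Percent difference vs target: negative means beating the target
perf["Pct vs Target"] = (perf["Actual"] - perf["Target"]) / perf["Target"] * 100

# The best performer has the MOST NEGATIVE value, so use idxmin, not idxmax
best = perf.loc[perf["Pct vs Target"].idxmin(), "Ship Mode"]
```

With these toy numbers, Same Day comes out at -87,5% and is correctly identified as the best performer.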
Question 6
That’s interesting because I know that there are 475 such orders. Fail.
Question 7
The presented reasoning, to look at distinct combinations of Order ID and Customer Name, is correct, but the given answer of 10 000 is not: that’s how many rows there are in the dataset. The right answer is 5 121 orders.
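The distinction between rows and orders is a classic one: an order can span several line items, so counting rows overcounts. A sketch with toy data (column names are assumptions):

```python
import pandas as pd

# Toy rows: order A-1 spans two line items, so row count != order count
df = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2", "B-9"],
    "Customer Name": ["Ada", "Ada", "Ada", "Bo"],
    "Quantity": [1, 3, 2, 5],
})

total_rows = len(df)  # number of line items
# Count distinct (Order ID, Customer Name) combinations
n_orders = df[["Order ID", "Customer Name"]].drop_duplicates().shape[0]
```

Here `total_rows` is 4 but `n_orders` is 3, which is exactly the 10 000-vs-5 121 mistake in miniature.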
That was my last analysis question and, all in all, I’m not impressed by Google Gemini as a data analyst (at this time). Visualisations aren’t working properly, explicit instructions are forgotten, it gives incorrect answers and presents overt logical fallacies. Out of seven questions, it only got one right. The only viz I managed to get out of Gemini was the somewhat irrelevant quantity histogram from the very start of the session.
With my test of ChatGPT from June fresh in mind, this result was somewhat of a surprise. If the singularity is coming anytime soon, I have my doubts that it will spring out of Mountain View.