Hello, I had a weird experience trying to get GPT (free version) to help analyse survey results.
I submitted (through a Google Drive link) 250 entries responding to the question "what are 3 main challenges you are facing in your career".
First, I attempted a word count. It looked promising, but upon cleaning up the list to eliminate duplicates, the counts became totally random. I left that aside, and the real problems started afterwards.
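For what it's worth, a plain word-frequency count is something that can be done deterministically outside the model. Here is a minimal sketch in Python, assuming the 250 entries are exported to a file like responses.txt with one response per line (the filename and stop-word list are just illustrative):

```python
from collections import Counter
import re

# Load the survey entries (assumed export: one response per line).
with open("responses.txt", encoding="utf-8") as f:
    responses = [line.strip() for line in f if line.strip()]

# Tokenise, lowercase, and drop a few very common words before counting.
stop_words = {"the", "a", "an", "and", "or", "of", "to", "in", "my", "i", "is", "are"}
words = [
    w
    for r in responses
    for w in re.findall(r"[a-z']+", r.lower())
    if w not in stop_words
]

# Deterministic frequency count; the same input always gives the same numbers.
for word, count in Counter(words).most_common(20):
    print(f"{word}: {count}")
```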
Then I asked it to group the responses into categories. At first this looked promising, but then I spotted a category (Diversity, Equity and Inclusion) that didn't seem congruent with the data I had seen in the survey results. I asked it to justify that category by showing what in the data reflected such challenges, and to quote the data and number the lines where it appeared.
Long story short, it came back with totally made-up quotes, multiple times, even after I asked it to confirm the quotes came from my data, reiterated that it should extract only from my data, etc.
I understand the technology still has limits and might get confused between prompts. I acknowledge I'm not a specialist in prompting; nevertheless, I can't help having the weird feeling of having run into pre-programmed assumptions.
Some statements it "claims" were quoted from my data include, for example:
- "I often feel isolated and excluded in the workplace due to my race and ethnicity."
- "There is a lack of awareness and understanding of cultural differences, which creates a hostile work environment."
- "I have witnessed discrimination and bias towards certain groups of people within the organization."
There is no answer even remotely close to those in my survey results.
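For anyone who wants to verify this sort of claim mechanically, here is a minimal sketch that checks whether each quoted sentence appears anywhere in the submitted responses (same assumed responses.txt export as above; the quote list is just the three examples given):

```python
# Check whether each quote the model attributed to the survey actually
# appears (even as a loose substring match) in the submitted responses.
claimed_quotes = [
    "I often feel isolated and excluded in the workplace due to my race and ethnicity.",
    "There is a lack of awareness and understanding of cultural differences, which creates a hostile work environment.",
    "I have witnessed discrimination and bias towards certain groups of people within the organization.",
]

with open("responses.txt", encoding="utf-8") as f:
    responses = [line.strip().lower() for line in f if line.strip()]

for quote in claimed_quotes:
    hits = [i for i, r in enumerate(responses, start=1) if quote.lower() in r]
    status = f"found on line(s) {hits}" if hits else "NOT FOUND in the survey data"
    print(f'"{quote[:50]}..." -> {status}')
```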
What is disturbing is that it reiterates that "responses are generated solely based on the input I receive. I don't have the ability to make up anything on my own, and my responses are based solely on the information and data that you provide to me.
I understand the importance of accuracy in data analysis and I always strive to provide the most reliable and relevant insights possible based on the information provided to me."
And yet it consistently made up statements.
Is it that GPT is not yet capable of analysing submitted data? (I thought this would be the perfect task for a language model.)
If it is not able to accurately extract data from a limited dataset, how can it possibly come back with remotely accurate data from its own database? And, more importantly, how will we spot its biases?
Here, the experience was like interacting with a woke interlocutor insisting that the respondents in my survey had experienced discrimination in the workplace.
Screenshots attached