In Defense of LLMs in Data Science: What ChatGPT Can and Can’t Do for Your Data Science Career
By Murtaza Ali | Apr, 2024

Opinion

ChatGPT can take your data science game to the next level, if you know how to use it.

An image of a data scientist using ChatGPT, generated by ChatGPT.

When ChatGPT first came out in November 2022, the LLM (Large Language Model) craze was immense. Straight out of Tony Stark’s lab, we finally had an artificial intelligence that communicated like a human. Even for the tech-initiated, its capabilities were shocking at first, almost frightening. Granted, LLMs had been around for some time by then, but GPT-3 took things to a new level.

But then the issues started to show themselves. ChatGPT hallucinates, said machine learning researchers: it would often make things up and cite “sources” that did not exist. ChatGPT is a disaster for academic integrity, cautioned ethicists: students could cheat more easily than ever. And, arguably most importantly, ChatGPT is not ethically sound, warned AI ethics researchers: much of its training data was full of bias, and that bias is reflected in its responses.

This leads to a dilemma. ChatGPT is powerful, yes; it certainly can do things. But at the same time, it is far from perfect. So should we use it? And if so, how?

I acknowledge the arguments against ChatGPT above. In fact, in many cases, you’ll find me actively making them. My own lab at the University of Washington is rife with research concerning the ethics of LLMs.

That said, I maintain it would be foolish to ignore LLMs altogether. Technology is advancing, and we must advance with it. We can only combat the issues with LLMs by actively using them in effective ways to learn what must be changed, not by avoiding them.

Every field has its own unique drawbacks and benefits in this new technological age. In this article, I’ll discuss three ways in which you, the aspiring data scientist, can harness the power of ChatGPT. We’ll talk about what you can do, and, perhaps more importantly, what you can’t.

I want to consider this dilemma from two different perspectives. First, I’ll give a technical example, and then I’ll provide a broader, subtler perspective.