I: INTRODUCTION & PROJECT OVERVIEW
Adam Steidley
Welcome to The Check with Joseki Tech. I'm Adam Steidley, the President of Joseki Tech. This week, we're joined by Dan, our Vice President of Engineering, and Campbell, our intern this summer.
Dan Bush
Yeah, I'm Dan. I'm the Vice President of Joseki Technologies, and I'm here to learn what Campbell did on her interesting internship. She kept me up to speed for the first couple of weeks she was here and then disappeared into a hole. So, I'm very interested to see what she did for her internship.
Campbell Drahus
Hi, I'm Campbell. I am a current student at Rensselaer Polytechnic Institute, where I'm studying mathematics with a focus on operations research. I had the honor of being an intern at Joseki Tech this summer, and I'm here to speak about it.
Adam Steidley
Thanks, Campbell. We thought it would be good for you to work on the internship with us. One of the things we did was we placed you with one of our major customers because while we've certainly got the skills to be working on these things and helping to mentor you and give you some exposure to this stuff, we don't have our own data set. So, for all the kind of fun that we're going to do, we need a data set. We need a customer who's going to do some real business applications. So, the final project you worked on was the big culmination of the summer. Can you talk about the goal of that project?
Campbell Drahus
Of course, yeah. Our client's sales team approached us regarding the issue of their sales reps spending a little too much time prepping for calls, you know, gathering information about the customer's previous history with the company and whatnot. Also, some of the newer account managers with this company had issues maintaining and continuing a conversation. They didn't have the language to have it flow naturally. So, as a solution, I created an AI-powered calling script generator that took in a specific customer's data and their purchasing histories and would produce a calling script based on that. These customers were also classified into different categories based on their purchasing history, such as declining or growing, to help further tailor the script to their purchasing patterns.
And we had those customers classified through a process called RFM: Recency, Frequency, Monetary, which is essentially a way to classify a customer based on how recent their purchases are, how frequently they're purchasing with us, and how much money they're spending. Overall, I got it to create multiple versions of scripts that would change the language used and alter what is being spoken about. It all comes back to producing a sale and giving the sales rep the language to come in and speak to a customer in an amazing and hopefully sales-inspiring way.
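As a rough illustration of the RFM idea Campbell describes, here is a minimal Python sketch that scores customers on recency, frequency, and monetary value and hangs a segment label off the scores. The column names, scoring bands, and segment labels are assumptions for illustration, not the client's actual logic.

```python
import pandas as pd

# Hypothetical purchase history: one row per order.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(
        ["2024-05-01", "2024-07-15", "2024-02-10",
         "2024-06-01", "2024-06-20", "2024-07-30"]),
    "amount": [120.0, 80.0, 45.0, 300.0, 150.0, 90.0],
})

today = pd.Timestamp("2024-08-01")

# Aggregate per customer: days since last order, order count, total spend.
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (today - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Score each dimension 1-3 by rank (terciles here; real projects often use quintiles).
rfm["r_score"] = pd.qcut(rfm["recency"].rank(method="first"), 3, labels=[3, 2, 1]).astype(int)
rfm["f_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 3, labels=[1, 2, 3]).astype(int)
rfm["m_score"] = pd.qcut(rfm["monetary"].rank(method="first"), 3, labels=[1, 2, 3]).astype(int)

# A simple label such as "growing" vs. "declining" can hang off the combined score.
rfm["segment"] = rfm[["r_score", "f_score", "m_score"]].sum(axis=1).map(
    lambda s: "growing" if s >= 7 else "steady" if s >= 5 else "declining")

print(rfm)
```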
Adam Steidley
That's great. How long did this project last from start to finish?
Campbell Drahus
It lasted about a month and a half. We started in mid-July, and it came right up to the end of my internship in August.
Adam Steidley
That's cool. That sounds like a short span to get a result. Has it gone into the market yet? Are they using it in production now, or is that still rolling out?
Campbell Drahus
Not quite yet. We hope it will be available in the representatives' dashboard for prepping their calls. Each customer should have a box the rep can pull up to see the script, which should make calls quicker and more efficient.
Adam Steidley
Excellent. It'd be super interesting if we could get a follow-up with the customer on how much lift we're seeing.
Campbell Drahus
Yeah, definitely.
Adam Steidley
Now, Dan asks an insightful question.
II: AI IMPLEMENTATION & TESTING
Dan Bush
How does it all work? Where's the magic?
Campbell Drahus
The magic is in artificial intelligence. I got to use OpenAI's API from a Python script, and I could access the GPT 4.0 model to generate the script because, through some testing, we determined that GPT 4.0 was the best model for this text generation.
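For readers who haven't used it, here's a minimal sketch of calling OpenAI's API from Python, roughly the pattern Campbell describes. The model name, prompt wording, and customer fields are placeholders, not the project's actual prompt.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical customer summary pulled from the warehouse.
customer_summary = (
    "Rewards #12345, segment: declining, last purchase 75 days ago, "
    "top department: fasteners, total spend last 12 months: $4,200."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; the project's exact choice may differ
    messages=[
        {"role": "system",
         "content": "You write short, natural-sounding sales call scripts."},
        {"role": "user",
         "content": f"Write a calling script for this customer:\n{customer_summary}"},
    ],
)

print(response.choices[0].message.content)
```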
Adam Steidley
What test did you do to land on 4.0?
Campbell Drahus
I was basically running the script a bunch of times and personally going in and seeing, you know, what kind of language it was using and how well it was producing what I thought was a good version of a script. Through that, I tested multiple different models to find which one would be best for this specific project, and we landed on GPT 4.0, as it's also an easy model to access and use.
Dan Bush
That's an interesting progression, because you started life with an LLM in Snowflake. What transpired to get you from Snowflake to, I assume, ChatGPT 3.5 for maybe a couple of heartbeats and then 4.0? Can you talk me through that?
Campbell Drahus
You nailed the progression. Yes, when I started with a Llama model in Snowflake, I was sending prompts through and getting it to produce responses. And every time something came back, it was almost up to par, but it was never quite using the right language or organizing things properly. Maybe that was a prompt error on my part. But after a while, we determined that the model wasn't working for us. So we jumped to GPT 4.0, because my mentor was super familiar with it. I continued down that route and discovered it was much more efficient and better for the project I was working on.
Dan Bush
The same thing happens in our internal AI projects. We started with ChatGPT 3.5, then it went to Turbo, and now 4.0 seems the way to go. They made it cheaper and faster. So that progression is just natural.
Adam Steidley
And were the counts that you're using in this project small enough that cost wasn't a consideration? How many records did you wind up processing?
Campbell Drahus
For my testing purposes, I was processing one at a time: I would send in a rewards number, and it would produce a script for that customer. The processing time was initially not so good; it took roughly a minute to produce a specific script. So I went back in and did some finicking and fiddling with the token usage. That's one way you can limit the AI and what it produces: by limiting the tokens it's allowed to use. Through multiple rounds of testing, I found the threshold between how low I could limit the tokens and when that would compromise how the script came out.
I landed on a limit where I was happy with the response time, which went down to about 30 seconds per script, while, on the flip side, maintaining the integrity of the script.
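A hedged sketch of the kind of token-limit experiment Campbell describes: timing the same request at different max_tokens ceilings to see where response time and script quality balance out. The model name, prompt, and ceiling values are illustrative.

```python
import time
from openai import OpenAI

client = OpenAI()
prompt = "Write a calling script for rewards #12345 (declining segment)."

# Try a few output ceilings and compare latency; quality still has to be judged by eye.
for limit in (1200, 800, 400):
    start = time.time()
    response = client.chat.completions.create(
        model="gpt-4o",      # assumed model name
        max_tokens=limit,    # cap on generated tokens, not prompt tokens
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.time() - start
    print(f"max_tokens={limit}: {elapsed:.1f}s, "
          f"{response.usage.completion_tokens} tokens generated")
```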
Adam Steidley
And how many scripts are you producing in a nightly batch? Is it hundreds or thousands?
Campbell Drahus
It would be hundreds. On a daily or weekly basis, a certain number of scripts would be created for the sales representatives to meet their quota of calling whoever they need to call, because they needed to call each customer at least once every 60 days. We wanted to generate a script around the time a customer was due to be called.
Adam Steidley
So your counts aren't crazy high, because, let's say there are 100 sales reps, they can only make maybe 30 calls a day. Sounds like a lot, but you don't have a huge count you need to do daily.
Because I know with the work that we've done, a lot of times we want to go back to 3.5: if we need to do thousands and thousands in a run and don't want to wait overnight for it, then it becomes a question of what's fastest. Then we also start to run into questions about dollars and cents.
Campbell Drahus
Yes.
Adam Steidley
Right. We want to go to the cheaper model to save money.
Campbell Drahus
Yes, and we also considered that in the project I was doing. Ultimately, we determined that the quality of the generated script took precedence over the generation time. I still tried to optimize it as much as possible, because that's always important, but the goal wasn't primarily speed.
Dan Bush
You don't necessarily have to be fast because your business process is batch-oriented.
Campbell Drahus
Yes, correct.
III: TECHNICAL DETAILS & TIPS FOR IMPROVEMENT
Dan Bush
But it is interesting: you can make it orders of magnitude faster. Prompt engineering trick number one is to tell it, or constrain, how many tokens it will generate. It's not so much about what's going in; it's what it has to produce. We saw very similar things, where it took a minute to generate a result, and then, just by limiting the output, it came in under 30 seconds.
Adam Steidley
And, Dan, didn't we also get better results with hallucinations by limiting the tokens? Do you get more hallucinations with more tokens?
What are AI hallucinations?
AI hallucination is a phenomenon wherein a large language model (LLM)—often a generative AI chatbot or computer vision tool—perceives patterns or objects that are nonexistent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate.
Definition by IBM
Dan Bush
Yeah, we gave it less noise. And there's also the temperature you can play with, which will help with the hallucinations.
Adam Steidley
And what does the temperature do exactly?
Dan Bush
Temperature is how likely it is to adhere to the data point you gave it.
Adam Steidley
So, Campbell, how does adjusting the temperature affect the model results?
Campbell Drahus
So, temperature relates to the math behind how the AI produces language. Every word, every token, is matched and given a score for how likely it is to be the next word. Temperature adjusts the threshold for how willing the model is to rope in other, lower-scoring words. Raising it makes the model more creative, but also more likely to hallucinate.
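To make the temperature point concrete, here's a small sketch comparing a low and a high setting on the same prompt. The model name, prompt, and values are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()
prompt = "Suggest one opening line for a sales call about cordless drills."

for temp in (0.2, 1.2):
    response = client.chat.completions.create(
        model="gpt-4o",     # assumed model name
        temperature=temp,   # low = stick to the most likely wording, high = more variety
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"temperature={temp}: {response.choices[0].message.content}")
```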
Dan Bush
You just explained that better than the OpenAI documentation.
Adam Steidley
Okay. A higher temperature gives us more randomness because it'll do things that are less likely to be the next word. So it might suddenly say "zebra" because it's a crazy-high temperature, where that wouldn't otherwise make sense. Okay, that's cool. So you did some playing with the onesie-twosies to get a model you liked and some basic prompts you liked. Now, just on the mechanics of this: you're sitting on top of Snowflake, you're writing things in Python. How does it fit together? For example, where does your script run in the infrastructure?
Campbell Drahus
So, as an option for just demoing it to other people, I wrote a quick Streamlit app so that, you know, you could quickly see what it's producing. I could pull up a tab on my screen and have it show the script.
Dan Bush
So it's essentially Python calling out to OpenAI, you know, using a Python library.
Campbell Drahus
Yes. So I could pull that up in a tab and show someone, or you could throw it onto a server and have someone quickly type in a number and pull up the script.
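A minimal sketch of the kind of Streamlit demo Campbell describes: type a rewards number, get a script back. The generate_script helper here stands in for her actual pipeline and is an assumption.

```python
# pip install streamlit openai; run with: streamlit run app.py
import streamlit as st
from openai import OpenAI

client = OpenAI()

def generate_script(rewards_number: str) -> str:
    """Hypothetical stand-in for the real pipeline: look up the customer,
    build the prompt, and return the generated calling script."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user",
                   "content": f"Write a calling script for rewards #{rewards_number}."}],
    )
    return response.choices[0].message.content

st.title("Calling Script Generator (demo)")
rewards_number = st.text_input("Rewards number")

if st.button("Generate script") and rewards_number:
    with st.spinner("Generating..."):
        st.write(generate_script(rewards_number))
```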
Adam Steidley
Okay, so you did the testing, got the script ready to run locally, and got some happy results. And from when we had talked about this a bit before, you're pulling all the data you need about the customer directly from Snowflake, and then you're pushing your responses back to Snowflake, right?
Campbell Drahus
Yes, correct. So that they can then be accessed in the customer's dashboard when the sales rep goes to see what customer they're calling.
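For the read-from-Snowflake and write-back flow, here's a hedged sketch using the Snowflake Python connector. The connection parameters, table names, and columns are invented for illustration.

```python
# pip install snowflake-connector-python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="sales", schema="public",
)  # connection parameters are placeholders

def fetch_customer(rewards_number: str) -> dict:
    # Hypothetical table holding the per-customer RFM summary.
    with conn.cursor(snowflake.connector.DictCursor) as cur:
        cur.execute(
            "SELECT recency_days, order_count, total_spend, segment "
            "FROM customer_rfm WHERE rewards_number = %s",
            (rewards_number,),
        )
        return cur.fetchone()

def save_script(rewards_number: str, script: str) -> None:
    # Write the generated script back so the reps' dashboard can display it.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO call_scripts (rewards_number, script) VALUES (%s, %s)",
            (rewards_number, script),
        )
```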
Dan Bush
So, in AI speak, that looks like a bit of RAG and one-shot completion.
Adam Steidley
So when you say RAG and RFM, what does some of that mean for those of us who aren't writing this code every day, like you and Dan?
Campbell Drahus
Yes, RAG: Retrieval Augmented Generation.
What is retrieval-augmented generation?
RAG is an AI framework for retrieving facts from an external knowledge base to ground large language models (LLMs) on the most accurate, up-to-date information and to give users insight into LLMs' generative process.
Definition by IBM
Dan Bush
Let's unpack retrieval-augmented generation, because RAG is the terminology for plugging in that data: you went and looked it up, retrieved it, and then augmented the prompt with it.
And then it's a one-shot because there was no conversation, and you didn't give it any samples, say, "based on these three examples, generate," and then ask the question. Right? So it was just go?
Campbell Drahus
Yes, it was go. As Dan explained, I gave the model the customer's information, which was pulled through a SQL query into my script, and it produced the calling script based on that customer's information.
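Putting the pieces together, a hedged sketch of the retrieve-then-generate step being described here: customer fields retrieved from the warehouse get folded into a single prompt, and one completion call produces the script with no conversation history or worked examples. The field names and wording are assumptions.

```python
from openai import OpenAI

client = OpenAI()

def build_prompt(customer: dict) -> str:
    # "Augmentation": fold the retrieved facts into the prompt text itself.
    return (
        "You are writing a phone script for a sales rep.\n"
        f"Customer segment: {customer['segment']}\n"
        f"Days since last order: {customer['recency_days']}\n"
        f"Orders in the last year: {customer['order_count']}\n"
        f"Total spend: ${customer['total_spend']:.2f}\n"
        "Write a friendly, concise calling script tailored to this customer."
    )

def generate_script(customer: dict) -> str:
    # Single completion: no conversation history, no example scripts in the prompt.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": build_prompt(customer)}],
    )
    return response.choices[0].message.content

# Example with made-up retrieved values:
print(generate_script({"segment": "declining", "recency_days": 75,
                       "order_count": 6, "total_spend": 4200.0}))
```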
Adam Steidley
The idea behind this is that you pull the customer info. So, like we said, you're pulling RFM stuff. Did you pull specific products or categories of products they were buying?
Campbell Drahus
Yes. Part of the data I was pulling was obviously what products they were buying and what categories those products fell into, because part of the sales goal, from the people I was connecting with on the sales team, is to expand where customers are buying from. They don't mind as much if a customer is only buying in one specific category, because they know that customer will keep going back to it, but their goal is to have them branch out and buy different kinds of things. So I could provide the top and bottom departments a customer was buying from and how much money they were spending in each department. That would give the reps insight into expanding the customer's purchasing.
Adam Steidley
And was there anything, as you were playing with the prompts, that you found interesting or surprising, where adding something to the prompt gave you a much better result?
Campbell Drahus
Yes. Dan briefly mentioned this before: telling the LLM to take its time and think about what it's going to respond with helps with how the responses come out. I don't know exactly how that works, but it's one method I used to increase the accuracy of what the AI was giving me. I also did a lot of iterations on the prompts. That was one of the most time-consuming parts of this project: I had to review my prompt, get it to do something else, and see whether that would work. How can I change my language so it's dead-on every time?
That was a big challenge, but also very rewarding when I got the right words so that it would always produce what I wanted. And, yeah, that's just one of the many things I noticed.
Dan Bush
Another well-understood approach is to tell it to "take its time." If you're giving it instructions or procedures, tell it to make sure it follows them and doesn't skip any. It's very interesting that I have to tell the computer to do that, and that it gives me better results when I do. Computer, take your time.
Campbell Drahus
And also, saying "don't hallucinate" oddly helps.
Dan Bush
I have not tried that.
Campbell Drahus
Yeah: "Base everything you do in fact and on the information I give you." It helps.
Dan Bush
I have taken the output and said, "Here, can you improve this"?
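The tips in this exchange translate directly into prompt text. Here's a hedged sketch of what a system prompt with those instructions, plus the "can you improve this" second pass Dan mentions, might look like; the wording and model name are illustrative, not the project's actual prompt.

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You write calling scripts for sales reps.\n"
    "Take your time and think through the customer's data before writing.\n"
    "Follow every instruction; do not skip any steps.\n"
    "Base everything on the facts provided; do not invent details."
)

def draft_and_refine(customer_summary: str) -> str:
    # First pass: generate a draft script from the grounded instructions.
    draft = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user",
                   "content": f"Write a calling script.\n{customer_summary}"}],
    ).choices[0].message.content

    # Second pass: feed the output back in and ask for an improvement.
    improved = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user",
                   "content": f"Here is a draft script. Can you improve it?\n\n{draft}"}],
    ).choices[0].message.content
    return improved
```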
IV: FUTURE IMPROVEMENTS & REFLECTIONS
Adam Steidley
So, if you had another four to six weeks on this project, what were some of the next things you'd like to play with and try to do with it?
Campbell Drahus
We were talking about changing the script based on how well we know the customer, because the data also captured how long we've been communicating with them and how loyal they are to us. That was another version of the script they would like to see in the future: how do I have an introductory conversation with a new customer versus a customer I've known long-term? That's something I would be interested in doing.
Dan Bush
So, if memory serves, you started with a generic script, then abandoned that approach and went to a more dynamic script. Can you briefly talk about what happened and why you changed direction?
Campbell Drahus
Initially, I was given a template as a basis to see how a sales rep would make a very generic, average call, and I started with a prompt that included that template for the script. As I iterated and tried to expand into different versions of the scripts based on how the customer was purchasing, I noticed that the template was holding me back: the model would stick too closely to it and stop giving me the right language or the right ideas for certain kinds of customers. So, I moved away from that and allowed the LLM to produce whatever it wanted based on the information I gave it.
It could produce whatever script it thought was necessary. I did give it the context of, hey, you're creating a calling script for a customer that a sales rep would potentially use. It took that context and background and, on top of the information I gave it, produced something more tailored to the specific situation we're working in.
Adam Steidley
So, Dan, we've been playing a little bit with training the models. Is there a decent opportunity to take some of these results and use a feedback loop of what the customer buys after one of these calls to tune the good and bad scripts?
Dan Bush
The training and tuning we're still wrapping our heads around. Tuning is good when you want to change the sentiment or language slightly, or make it adhere to a format. But if you want it to know more about the subject matter, that has to be baked into the model itself. So you're no longer talking about ChatGPT 4.0; you're talking about building your own LLM with that information in it. You don't add information with tuning.
Adam Steidley
I guess when I'm talking tuning, an exercise we could go through is looking at the scripts and trying to tune the model to use more of the jargon the salespeople use, talking about the different categories and classes in their own speak instead of more generic speech. So you need that in your model, so it sounds a bit more like the sales reps.
Dan Bush
If you want it to pick certain words, say you love the word "bespoke" and you like to see it come up all the time, then that's a good use of fine-tuning: you can take those outputs, change them up, send them back in as training data, produce a private model for yourself, switch over to it, and you'll see your generations are now using "bespoke" more.
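As a hedged sketch of the flow Dan describes, OpenAI's fine-tuning takes JSONL examples of the conversations you want the model to imitate, and the finished job yields a private model you can switch over to. The example pair, file name, and base model below are assumptions for illustration.

```python
import json
from openai import OpenAI

client = OpenAI()

# A few edited prompt/response pairs showing the vocabulary we want the model to favor.
examples = [
    {"messages": [
        {"role": "user",
         "content": "Write an opening line for a loyal hardware customer."},
        {"role": "assistant",
         "content": "I wanted to walk you through a bespoke bundle we put together for your shop."},
    ]},
]

with open("tuning_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the examples and start a fine-tuning job on a tunable base model.
upload = client.files.create(file=open("tuning_examples.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-4o-mini-2024-07-18",  # assumed fine-tunable base model
)
print(job.id)  # when the job finishes, it produces a private model name to use instead
```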
V: CLOSING & NEXT STEPS
Adam Steidley
So, Campbell, what does the year look like for you going into your next classes? What will you take from this experience into your coursework?
Campbell Drahus
I enjoyed playing around with AI and having the chance to apply it to a real-world situation. I was also able to help another team and see them be very happy with the results, which was very rewarding for me. Going forward, I take a lot of computer science classes, and in particular I'm going into a research class where I'll have free rein over whatever kind of data research I want to do. So I'm very excited to take these AI skills, along with working with super large data sets, and apply them to my new class. I'm super excited about that coming up.
Adam Steidley
Excellent. Well, we've had a lot of strong feedback from the client about your work. We're really happy about the success that you've had, and we're sure you'll have lots more in the future. So, thanks, everyone, for joining the call today.