I’m assuming exploratory work from the exec, not something they make decisions with or put into production. If you need something you can trust, you typically need a lot of checks, including multiple humans.
I play a weird role at work around AI. I use it all the time, but I’m the first person to warn everyone that it’s absolutely not trustworthy. No matter the prompt, the data, or the guidelines built into it, the output is fundamentally flaky. I use it knowing that and work around it. Making the overall process reliable is a big part of my focus, and usually that means minimising the part the LLM plays. Checks and balances live where things are predictable.