There’s been a lot of speculation about the future of AI Agents & agentic applications, much of it vague or overly optimistic.
After doing my own research and prototyping, I’ve found one way to avoid falling into the trap of false forecasts: focus on the technology we have now.
Look at how it’s evolving, the directions it’s pulling us, and build projections from that. It’s not about dreaming up distant possibilities but understanding the path that’s already being paved, with all its limits and quiet promises.
The Shift From Large Language Models to Smaller, Vision-Enhanced Models & The Rise of AI Agents — Version 6
In recent developments within AI research, the focus has gradually shifted from reliance on Large Language Models (LLMs) to more agile, adaptable Small Language Models (SLMs), which in turn increasingly incorporate multimodal capabilities such as visual understanding. These foundation models, which blend natural language processing with vision, overcome many of the limitations of LLMs, such as high computational cost and dependency on extensive datasets.
AI Agents are agentic software applications which primarily live within a digital environment.
Agentic applications are premised on basic LLM related skills like decomposing a problem into smaller sub-tasks, and leveraging reasoning to move from task to task, by observing and assessing each task.
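The decompose-then-execute loop described above can be sketched in a few lines. This is a minimal illustration, not any framework's actual implementation: `call_llm` is a stand-in that returns canned text, and the three sub-tasks are invented so the loop runs end-to-end.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call; returns canned text so the
    # loop below is runnable without an API key.
    if "break down" in prompt.lower():
        return "1. Find the user's city\n2. Fetch the weather\n3. Summarise"
    return "done"

def run_agent(goal: str) -> list[str]:
    """Decompose a goal into sub-tasks, then work through them one by one,
    observing the outcome of each step before moving to the next."""
    plan = call_llm(f"Break down this goal into numbered sub-tasks: {goal}")
    sub_tasks = [line.split(". ", 1)[1] for line in plan.splitlines()]
    observations: list[str] = []
    for task in sub_tasks:
        # Each step sees prior observations, which is what lets the agent
        # assess progress before moving to the next task.
        result = call_llm(f"Complete this task: {task}\nPrior observations: {observations}")
        observations.append(f"{task} -> {result}")
    return observations

results = run_agent("Tell me if I need an umbrella today")
```

The key point is that planning and execution are separate model calls, with the output of each step fed back in as context.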
The Self-Discover approach allows Large Language Models to automatically compose their own reasoning structures.
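Self-Discover works in stages (select reasoning modules, adapt them to the task, implement them as a structure, then solve). A rough sketch of that pipeline is below; the module list and the `llm` stub are my own illustrations, not the paper's code.

```python
# Illustrative seed modules; the actual paper uses a much larger catalogue.
REASONING_MODULES = [
    "Break the problem into sub-problems",
    "Think step by step",
    "Consider counter-examples",
]

def llm(prompt: str) -> str:
    # Stand-in for a real model call; echoes the prompt's last line so the
    # four-stage pipeline below is runnable end-to-end.
    return prompt.splitlines()[-1]

def self_discover(task: str) -> str:
    # SELECT: pick modules relevant to this task
    selected = llm(f"Select useful modules for: {task}\n{REASONING_MODULES[0]}")
    # ADAPT: rephrase the selected modules for this specific task
    adapted = llm(f"Adapt '{selected}' to the task: {task}\n{selected}")
    # IMPLEMENT: turn the adapted modules into an ordered reasoning structure
    structure = llm(f"Turn these steps into an ordered reasoning plan:\n{adapted}")
    # SOLVE: answer the task by following the self-composed structure
    return llm(f"Follow this plan to solve '{task}':\n{structure}")
```

The point is that the reasoning structure is composed per task, rather than hard-coded by the developer.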
Balancing Latency, Interpretability, and Consistency in Hallucination Detection for Conversational AI
If you find any of my observations to be inaccurate, please feel free to let me know…
I appreciate that this study focuses on introducing guardrails & checks for conversational UIs.
When interacting with real users, incorporating a human-in-the-loop approach helps with data annotation and continuous improvement by reviewing conversations.
It also adds an element of discovery, observation and interpretation, providing insights into the effectiveness of hallucination detection.
The architecture presented in this study offers a glimpse into the future, showcasing a more orchestrated approach where multiple models work together.
The study also addresses current challenges like cost, latency, and the need to critically evaluate any additional overhead.
Using small language models is advantageous as it allows for the use of open-source models, which reduces costs, offers hosting flexibility, and provides other benefits.
Additionally, this architecture can be applied asynchronously, where the framework reviews conversations after they occur. These human-supervised reviews can then be used to fine-tune the SLM or perform system updates.
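That asynchronous, human-supervised loop can be sketched as a small review queue. Everything here is a hypothetical shape I'm using to make the idea concrete: the `grounded` flag stands in for whatever the hallucination detector outputs, and confirmed hallucinations accumulate as fine-tuning examples.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    user: str
    bot: str
    grounded: bool  # did the detector find support for the bot's answer?

@dataclass
class ReviewQueue:
    """Asynchronous review: flagged turns wait for a human verdict, and
    confirmed hallucinations become fine-tuning examples for the SLM."""
    pending: list = field(default_factory=list)
    training_examples: list = field(default_factory=list)

    def ingest(self, conversation: list) -> None:
        # Runs after the conversation ends, so it adds no user-facing latency.
        self.pending.extend(t for t in conversation if not t.grounded)

    def human_verdict(self, turn: Turn, is_hallucination: bool) -> None:
        # The human-in-the-loop step: annotate, then feed back into training.
        self.pending.remove(turn)
        if is_hallucination:
            self.training_examples.append((turn.user, turn.bot))
```

Because the review happens offline, the detector can afford slower, more interpretable checks than anything sitting in the live response path.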
What I love about a two-by-two matrix is that it provides a clear, structured way to visualise and analyse complex relationships between two sets of variables or factors.
Applied to Language Models & Conversational AI, breaking a problem down into four distinct categories provides a simplified view that makes the concepts easier to grasp. I came across these diagrams in the OpenAI & Ragas documentation.
Applications based on LLMs are evolving, and the next step in this progression of AI Agents is Agentic Applications. Agentic applications still have a Foundation Model as their backbone, but have more agency.
Agentic applications are AI-driven systems designed to autonomously perform tasks and make decisions based on user inputs and environmental context.
These applications leverage advanced models and tools to plan, execute, and adapt their actions dynamically.
By integrating capabilities like tool access, multi-step reasoning, and real-time adjustments, agentic applications can generate and complete complex workflows and provide intelligent solutions.
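The tool-access and multi-step pattern described above boils down to a loop: the model picks an action, the runtime executes it, and the observation feeds the next decision. A minimal sketch, with a hard-coded `fake_planner` standing in for the model's next-action choice:

```python
# Hypothetical tool registry; a real agent would expose tool schemas to the model.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "uppercase": lambda text: text.upper(),
}

def fake_planner(goal: str, history: list):
    # Stand-in for the model's decision: given the goal and what has happened
    # so far, either return (tool, argument) or None when the goal is met.
    if not history:
        return ("calculator", "6 * 7")
    return None

def run(goal: str) -> list:
    history = []
    while (action := fake_planner(goal, history)) is not None:
        tool, arg = action
        observation = TOOLS[tool](arg)          # execute the chosen tool
        history.append((tool, arg, observation))  # feed back as context
    return history
```

Real-time adjustment falls out of the loop structure: every observation is available to the planner before the next action is chosen.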
I must add that while many theories and future projections are based on speculation, I prioritise prototyping and creating working examples. This approach grounds commentary in practical experience, leading to more accurate future projections.
A manual analysis was conducted on 2,364 sampled questions, leading to the development of a taxonomy of challenges that includes 27 distinct categories, such as:
- Prompt Design,
- Integration with Custom Applications, and
- Token Limitation.
Based on this taxonomy, the study also summarises key findings and provides actionable implications for LLM stakeholders, including developers and providers.
The Six Challenges Working With LLMs
1. Automating Task Processing: LLMs can automate tasks like text generation and image recognition, unlike traditional software that requires manual coding.
2. Dealing with Uncertainty: LLMs produce variable and sometimes unpredictable outputs, requiring developers to manage this uncertainty.
3. Handling Large-Scale Datasets: Developing LLMs involves managing large datasets, necessitating expertise in data preprocessing and resource efficiency.
4. Data Privacy and Security: LLMs require extensive data for training, raising concerns about ensuring user data privacy and security.
5. Performance Optimisation: Optimising LLM performance, particularly in output accuracy, differs from traditional software optimisation.
6. Interpreting Model Outputs: Understanding and ensuring the reliability of LLM outputs can be complex and context-dependent.
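Challenge 2 in particular has a common practical shape: treat the model as unreliable, validate its output against a schema, and retry rather than trusting the first response. A small sketch, with `flaky_model` as a stand-in for an LLM whose first reply is malformed:

```python
import json

def flaky_model(prompt: str, attempt: int) -> str:
    # Stand-in for a real model call; the first attempt is deliberately broken
    # to simulate the variable outputs the challenge describes.
    return "Sure! Here you go: {..." if attempt == 0 else '{"city": "Paris"}'

def call_with_validation(prompt: str, retries: int = 2) -> dict:
    """Validate-and-retry: only accept output that parses and matches the
    expected shape, instead of assuming the model behaves deterministically."""
    for attempt in range(retries + 1):
        raw = flaky_model(prompt, attempt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry instead of crashing
        if isinstance(parsed, dict) and "city" in parsed:
            return parsed
    raise ValueError("model never produced valid output")
```

This is exactly the kind of defensive layer traditional software rarely needs, which is why it features so prominently in the taxonomy.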
The reason OpenAI bought Rockset is to enhance its retrieval infrastructure and move closer to the implementation side of Generative AI.
With more models being open-sourced, and the focus moving to small language models (many of which are open-source), it seems the market value is shifting from Zone 4 to Zone 5.
Only a few months ago, it seemed like OpenAI had captured the LLM market and no-one would ever be able to compete.
Then Meta AI open-sourced numerous very capable models.
This was followed by stellar work from Microsoft on imbuing Small Language Models (SLMs) with enhanced reasoning capabilities, and by open-sourcing those models.
Organisations moved their focus from gradient approaches (fine-tuning) for adapting models to their environment, to non-gradient approaches like RAG and prompt engineering techniques.
These non-gradient frameworks demand vector technology, data-centric tooling and RAG frameworks.
LLMs are becoming a mere utility and innovation is taking place in building applications which are powered by LLMs under the hood.
At last, the market is moving toward a data-centric approach.
This acquisition also brings OpenAI closer to the developer community with Rockset’s integration with LangChain & LlamaIndex.
AI agents are becoming capable of navigating screens within the context of an operating system, particularly in web browsers and mobile iOS environments.
As I’ve discussed, the architecture and implementation of text-based AI agents (Agentic Applications) are converging on similar core principles.
The next chapter for AI agents is now unfolding: AI agents capable of navigating mobile or browser screens, with a particular focus on using bounding boxes to identify and interact with screen elements.
Some frameworks propose a solution where the agent has the power to open browser tabs, navigate to URLs, and perform agent tasks by interacting with a website.
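The bounding-box idea reduces to a grounding step: a vision model labels screen elements with boxes, and the agent matches its instruction against those labels to get click coordinates. A toy sketch (the labels and the string-matching rule are my simplification; real systems use a vision-language model for both detection and matching):

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    label: str  # e.g. "search button", as produced by a vision model
    x: int
    y: int
    w: int
    h: int

    def centre(self) -> tuple:
        # Agents typically click the centre of a detected element.
        return (self.x + self.w // 2, self.y + self.h // 2)

def pick_target(instruction: str, boxes: list) -> tuple:
    """Toy grounding: match the instruction against detected element labels
    and return the coordinates to click."""
    for box in boxes:
        if box.label.lower() in instruction.lower():
            return box.centre()
    raise LookupError("no matching element on screen")
```

The same interface works whether the screen is a browser page or a mobile app, which is what makes this the convergence point for screen-navigating agents.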
Also known as graph data, defined by nodes and edges, this concept is becoming increasingly relevant in the context of Agentic Applications, also referred to as AI Agents. Traditionally, people approach development through no-code, low-code, or pro-code methods. However, when it comes to AI Agents, there’s been a notable shift toward flow or graph representations. But why is that?
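One answer: a graph makes the agent's control flow explicit and inspectable. Each node is a step (an LLM call, a tool, a check) and each edge is a possible transition. A dependency-free sketch of that pattern, in the spirit of graph-style frameworks, with invented node names:

```python
# Nodes: functions that read and update a shared state dictionary.
def classify(state):
    state["route"] = "math" if any(c.isdigit() for c in state["input"]) else "chat"
    return state

def do_math(state):
    state["output"] = "math-answer"
    return state

def do_chat(state):
    state["output"] = "chat-answer"
    return state

NODES = {"classify": classify, "math": do_math, "chat": do_chat}

# Edges: given the state, name the next node (None terminates the run).
EDGES = {
    "classify": lambda s: s["route"],
    "math": lambda s: None,
    "chat": lambda s: None,
}

def run_graph(text: str) -> dict:
    state, node = {"input": text}, "classify"
    while node is not None:
        state = NODES[node](state)
        node = EDGES[node](state)
    return state
```

Because both the nodes and the transitions are data, the same structure can be drawn as a flow diagram, which is exactly why no-code and pro-code tooling for agents converge on the graph representation.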