I'm working on an AI Data Analyst in MLJAR Studio and found this small but interesting bug while testing a medical data use case. The AI-generated Pandas code looked correct and executed without errors, but the dataframe was misaligned: the first patient had 148 pregnancies, because glucose values were shifted into the Pregnancies column. The interesting part for me was that the bug was caught only because we reviewed both the displayed dataframe and an extra LLM step that checks the outputs.
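For illustration, here is one plausible way such a shift can happen with the Pima-style diabetes data; this is a hypothetical sketch, not the actual generated code, and the file contents and column list are assumptions. Passing a column-name list that is one name short makes pandas silently promote the first data column to the index, so every value slides one column to the left and Glucose (148) lands under Pregnancies:

```python
import io
import pandas as pd

# Headerless Pima-style rows: Pregnancies, Glucose, BloodPressure, ...
csv_text = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"
cols = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
        "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# Hypothetical bug: one column name missing -> pandas uses the first data
# column as the index and every value shifts one column to the left.
df_bad = pd.read_csv(io.StringIO(csv_text), header=None, names=cols[:-1])
df_ok = pd.read_csv(io.StringIO(csv_text), header=None, names=cols)

print(df_bad["Pregnancies"].iloc[0])  # 148 -> shifted, looks like the reported bug
print(df_ok["Pregnancies"].iloc[0])   # 6   -> correct

# A cheap range check like this would flag the shifted frame automatically.
assert df_ok["Pregnancies"].between(0, 20).all()
```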
The user does not touch the notebook at all: they just ask questions in natural language, the AI uses Python to compute the answer, and the ipynb notebook format is used to save the conversation.
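As a rough sketch of the idea (not MLJAR Studio's actual storage code), one chat turn can be written into a .ipynb file with nbformat: the user question becomes a markdown cell and the AI-generated Python becomes a code cell. The question, code, and file name below are made-up examples:

```python
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

# One conversation turn: user prompt as markdown, AI-generated code as a code cell.
nb = new_notebook()
nb.cells.append(new_markdown_cell("**User:** What is the average glucose level?"))
nb.cells.append(new_code_cell(
    'import pandas as pd\n'
    'df = pd.read_csv("diabetes.csv")\n'
    'df["Glucose"].mean()'
))

# Save the conversation in the standard notebook format.
with open("conversation.ipynb", "w", encoding="utf-8") as f:
    nbformat.write(nb, f)
```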
Human in the loop in data analysis is a really challenging task. We provide the Python code for inspection, so the user can check the details of how the results were produced. Additionally, we run AI on the results - the user needs to check both the outputs and the AI-provided insights.
Thanks for sharing! MLJAR Studio was created for people with domain knowledge but not much technical expertise. For them, setting up a Python environment, installing required packages, configuring Jupyter Lab, the MCP server, and Claude Code might be technically demanding.
MLJAR Studio is a desktop application available for Windows, macOS, and Linux. It creates a Python environment for the user and installs all required packages, so the user can focus on data rather than fighting technical challenges.
The goal of MLJAR Studio is to make data analysis easy for people with deep domain knowledge but little programming experience. We do not focus on notebooks: for us, the Python notebook is the compute and storage layer. Our main interface is a chat with an AI data analyst. The conversation can be opened as a classic notebook, but the main UI is a simple chat.
Thank you! I will check them out. It is worth mentioning that MLJAR Studio is a desktop application that is easy to install. It runs locally and supports local LLMs, so all data stays safe.
Python notebooks are not reproducible when used by humans, who run cells out of order and edit them in place. When the notebook format is used to store a conversation for AI data analysis, the cells are created and executed top to bottom, so it preserves the chat history and is ideal for reproducibility.
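One way to check that claim is to re-execute a stored conversation notebook top to bottom and compare outputs; a minimal sketch using nbclient, with file names as placeholders:

```python
import nbformat
from nbclient import NotebookClient

# Load a saved conversation and re-run every cell in order.
nb = nbformat.read("conversation.ipynb", as_version=4)
client = NotebookClient(nb, timeout=600, kernel_name="python3")
client.execute()

# Store the re-executed copy; its outputs can be diffed against the original.
nbformat.write(nb, "conversation_rerun.ipynb")
```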
I like the Ollama Cloud service (I'm a paid pro user) because it lets me test several open-source LLMs very quickly - I don't need to download anything locally, I just change the model name in the API call. If I like a model, I can download it and run it locally with sensitive data. I also like their CLI, because it is simple to use.
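The "just change the model name" part looks roughly like this with the ollama Python client; the host and model names below are examples (Ollama Cloud uses its own endpoint and API key, a local server listens on http://localhost:11434):

```python
from ollama import Client

# Point the client at a local or remote Ollama server.
client = Client(host="http://localhost:11434")

# Swapping models is only a change of the model name string.
for model in ["llama3.1:8b", "qwen2.5:14b"]:
    reply = client.chat(
        model=model,
        messages=[{"role": "user", "content": "Summarize the Pima diabetes dataset columns."}],
    )
    print(model, "->", reply["message"]["content"][:80])
```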
The fact that they are trying to make money is normal - they are a company. They need to pay the bills.
I agree that they should improve communication, but I assume it is still a small company with a lot of different requests, and some things might get overlooked.
Overall I like the software and services they provide.
We built a benchmark to evaluate LLMs on real data analysis workflows. Instead of single prompts, each task is a sequence of prompts (steps), similar to how a human data analyst works in practice. Each run is saved as a full Python notebook, including prompts, code, and outputs. We evaluated the runs across task completion, code correctness, output quality, reasoning, and reliability. Each workflow is executed multiple times and scored automatically.
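A toy sketch of how such scores could be aggregated, assuming a mean over dimensions and repeated runs (the dimension names come from the list above, the numbers and function are made up for illustration):

```python
import statistics

DIMENSIONS = ["task_completion", "code_correctness", "output_quality",
              "reasoning", "reliability"]

def score_model(runs):
    """runs: list of dicts mapping dimension name -> score on a 0-10 scale."""
    # Average the dimensions within each run, then average across runs.
    per_run = [statistics.mean(r[d] for d in DIMENSIONS) for r in runs]
    return round(statistics.mean(per_run), 2)

runs = [
    {"task_completion": 10, "code_correctness": 9.5, "output_quality": 10,
     "reasoning": 9.8, "reliability": 10},
    {"task_completion": 9.7, "code_correctness": 10, "output_quality": 9.6,
     "reasoning": 10, "reliability": 9.9},
]
print(score_model(runs))  # 9.85
```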
Modern LLMs perform very well on individual steps. The benchmark currently includes 23 workflows from different data analysis tasks (EDA, ML, NLP, statistics, ...). Across the 23 workflows, the top 3 models were gpt-oss:120b at 9.87/10, followed by gpt-5.4 at 9.65/10 and glm-5.1 at 9.48/10, which is very high in my opinion. The results show that modern LLMs perform very well on data analysis tasks. All feedback is welcome! I uploaded all notebooks for each model: https://github.com/pplonski/ai-for-data-analysis