There are a bunch of tedious / routine tasks that AI can automate.
I think the big hurdle is mostly education / a shift in mindset. We are so used to doing the task manually that most of us (including me) don't pause to ask whether we should be doing it ourselves or whether we can hand it to an agent.
I had BrowserOS do a bunch of data validation for me in my Dolibarr ERP system. It cross-checked my new master data against our old ERP, flagging bad links and filling in missing data. I could have done it much quicker overall with the API and some scripting, but it was easy to just write a two-line prompt telling it where the data is and how to manage disagreements. Then I just watched it run on a second monitor for a few hours while I worked on other projects.
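For comparison, here is a minimal Python sketch of what the "API and some scripting" route for that cross-check might look like. It assumes both systems' records have already been exported (for example via each ERP's API) into dicts keyed by a shared reference code. The field names (`barcode`, `supplier_ref`) and the fill-then-log disagreement policy are hypothetical, not Dolibarr's schema and not what the agent actually did.

```python
from dataclasses import dataclass, field

EMPTY = (None, "")


@dataclass
class Report:
    filled: list = field(default_factory=list)     # (ref, key, value) copied from the old ERP
    conflicts: list = field(default_factory=list)  # (ref, key, new_value, old_value) left for review
    bad_links: list = field(default_factory=list)  # (ref, key, target) pointing at unknown records


def reconcile(new_items, old_items, known_targets, link_field="supplier_ref"):
    """Cross-check new master data against an export of the old ERP.

    new_items / old_items map a shared reference code to a dict of fields.
    Disagreement policy (an assumption for this sketch): an empty field in the
    new record is filled from the old one; if both are set and differ, the new
    value is kept and the pair is logged for a human to decide.
    Mutates new_items in place and returns a Report.
    """
    report = Report()
    for ref, new in new_items.items():
        old = old_items.get(ref, {})
        for key, old_value in old.items():
            new_value = new.get(key)
            if old_value in EMPTY:
                continue  # nothing to learn from the old system
            if new_value in EMPTY:
                new[key] = old_value
                report.filled.append((ref, key, old_value))
            elif new_value != old_value:
                report.conflicts.append((ref, key, new_value, old_value))
        # Check the link after filling, so a link copied from the old ERP is validated too.
        target = new.get(link_field)
        if target not in EMPTY and target not in known_targets:
            report.bad_links.append((ref, link_field, target))
    return report
```

The design choice here is to never silently overwrite a non-empty value: conflicts are only reported, which mirrors the "how to manage disagreements" part of the prompt without guessing which system is right.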
I used a local Ollama model and thought it was kind of amazing that it worked. I couldn't turn a typical user loose with something like this yet, but I think I see the vision. I imagine a lot of automation could happen this way in the future. I put less effort into the prompt than I would have needed to spend teaching someone from the office pool to accomplish the same goal, and got a good enough result.
In practice I have found that I can accomplish the same results in a stricter, more accurate, and faster way just using Codex on the command line with some scripting and API access, but that's not going to work for a lot of people, and putting it in the browser is pretty convenient... The built-in MCP server can also become a bit of an API for the entire web if you're careful in how you use it, which opens up possibilities for things that don't have real APIs.
Have you ever thought about a marketplace for premade workflows? Or a library of already-tested integrations that a user can mix and match to create complex automations? Or access to more MCP servers?
For example, it would be really neat to trigger jobs that perform some task and then make a call to Twilio or something to send an alert. Or some building blocks that tie into my Square account or Amazon account. I want to be able to describe the results I want, but I don’t want to explain how to interact with a particular service and then test that.
I would love to be able to give a prompt like this: “review my item library in Square, identify items that are missing descriptions or are miscategorized, propose the fixes, and confirm with me before making any changes.” That’s an extremely tedious task that requires a lot of clicking and page loads. I hate it and I would pay for your product if you could save me that time.
Or this: “Every month, alert me to any fluctuations in product cost and which items in my Square catalog are affected. Highlight any items where my COGS exceeds 35%. All the invoices are available in my email.” That would be incredibly powerful. Doing this manually can take days.
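The COGS part of that prompt is simple arithmetic once the invoice costs are extracted: an item's COGS percentage is its input cost divided by its selling price, flagged when it exceeds 0.35. Below is a minimal Python sketch, assuming the invoice parsing is already done and that each catalog item has a hypothetical "recipe" mapping input products to the quantity used per unit sold. It does not talk to Square or to email; those would be the agent's job.

```python
from decimal import Decimal

COGS_THRESHOLD = Decimal("0.35")  # 35%, the threshold from the prompt above


def item_cost(recipe, unit_costs):
    # recipe: input product -> quantity used per catalog item sold (hypothetical shape)
    return sum((qty * unit_costs[product] for product, qty in recipe.items()), Decimal("0"))


def monthly_report(recipes, prices, last_month, this_month, threshold=COGS_THRESHOLD):
    """Return (changed inputs, affected catalog items, items over the COGS threshold).

    last_month / this_month map input product -> unit cost taken from invoices.
    prices maps catalog item -> selling price.
    """
    # Inputs whose unit cost moved (or that are new this month).
    changed = {p for p, cost in this_month.items() if last_month.get(p) != cost}
    # Catalog items that use at least one of those inputs.
    affected = sorted(item for item, recipe in recipes.items() if changed.intersection(recipe))
    # COGS ratio = input cost / selling price, flagged when strictly above the threshold.
    over = {}
    for item, recipe in recipes.items():
        ratio = item_cost(recipe, this_month) / prices[item]
        if ratio > threshold:
            over[item] = ratio
    return changed, affected, over
```

`Decimal` is used instead of `float` so that money values like `1.20` are represented exactly and the threshold comparison is not affected by binary rounding.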
> “Every month, alert me to any fluctuations in product cost and which items in my Square catalog are affected. Highlight any items where my COGS exceeds 35%. All the invoices are available in my email.” That would be incredibly powerful.
You could try this use case with the agent builder today. We also have scheduled tasks, so you can set it to run monthly.
> Have you ever thought about a marketplace for premade workflows?
We want to do this and are moving towards that! But we first need to make the premade (or user published) workflows very reliable.
> whole Browser and not a Chrome Extension argument
Both of us are definitely biased to think our own approach is better :)
But without owning the binary, we couldn't have shipped today's feature: an agent with access to your filesystem that can run shell commands, like Claude Cowork.
> your interface is still literally a chrome extension side panel
Yep, our interface is a Chrome extension to make iterating on the UX faster. But it uses a ton of C++ APIs that we expose under `chrome.browseros.*`.
> Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
> But without owning the binary, we couldn't shipped today's feature -- Agent with access to your filesystem and being able to run shell commands like Claude Cowork
A Chrome extension can also access local files and execute LLM-generated code in sandboxes.
At the Chromium level, you have access to every single DOM element and the coordinate space around it. So when a click happens, whether from the user or the agent, we have a neat way of enforcing the required action (either allowing it or nullifying the click).
We are still at an early version, and we're mostly targeting enterprise sites (like SAP) that don't change that often.
Ohh, interesting. Technically this should already be possible, because we already package gemini-cli into the sidecar (Bun) binary. We just have to create a good UX.
What angle are you looking at this from? Is it for convenience? Or do you not like terminal UI and need a web-friendly UI for these agents?