It all started with https://github.com/pixeltable/pixeltable, the core project we are actively working on, and with the problem space of video and frames: Computer Vision teams struggle with the explosion of data and with maintaining lineage and versioning for frames of video.
The same problems arise with LLMs for Documents and Chunks, Audio and Samples, etc.
An LLM is just a function call. We are agnostic to whatever framework, library, or data format is used; we focus on solving the data plumbing issues (multimodal storage and orchestration) with a simple Python SDK: https://github.com/pixeltable/pixeltable.
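To make the "just a function call" point concrete, here is a rough plain-Python sketch. It is not the Pixeltable API; every name in it (`Chunk`, `split_into_chunks`, `apply_model`, `fake_llm`) is made up for illustration. It assumes documents are split into fixed-size chunks, that each chunk keeps a pointer back to its source document, and that the model is any callable taking and returning a string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Chunk:
    """A derived item that remembers which source document it came from."""
    doc_id: str   # lineage: the parent document
    index: int    # lineage: position of this chunk within the parent
    text: str
    outputs: Dict[str, str] = field(default_factory=dict)


def split_into_chunks(doc_id: str, text: str, size: int) -> List[Chunk]:
    """Split a document into fixed-size chunks, keeping (doc_id, index) lineage."""
    return [
        Chunk(doc_id=doc_id, index=i, text=text[start:start + size])
        for i, start in enumerate(range(0, len(text), size))
    ]


def apply_model(chunks: List[Chunk], name: str, fn: Callable[[str], str]) -> None:
    """Store fn's result on each chunk under `name`; fn could wrap any LLM client."""
    for chunk in chunks:
        chunk.outputs[name] = fn(chunk.text)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call so the sketch runs offline."""
    return prompt.upper()


chunks = split_into_chunks("doc-1", "hello multimodal world", size=8)
apply_model(chunks, "summary", fake_llm)
for c in chunks:
    print(c.doc_id, c.index, c.outputs["summary"])
```

Swapping `fake_llm` for a call into any real client would not change the rest of the code, which is the sense in which the plumbing stays framework-agnostic. The same shape applies to video and frames or audio and samples.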