I'm a Data Scientist. For some time, I've been working on a library for feature engineering.
• GitHub: https://github.com/feature-express/feature-express
• Website: https://feature.express
It isn't yet complete, and I wouldn't consider it ready for production use or handling larger datasets. Here are some of its characteristics:
• Event-based workflows: All input data is first converted to an event format and ingested into an event store; everything is processed from there.
• In-memory: Both the event store and the evaluation run in memory and were built from scratch.
• Written in Rust, but there's a Python package available.
• A DSL (Domain Specific Language) for defining aggregations, similar to SQL.
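To make the event-based idea more concrete, here is a rough plain-Python sketch of converting tabular rows into events and keeping them in a small in-memory store. This is not FeatureExpress's actual API or internals; the names `Event`, `rows_to_events`, and `InMemoryEventStore` are made up for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """Hypothetical event record: who, what, when, plus free-form attributes."""
    entity_id: str
    event_type: str
    timestamp: datetime
    attrs: dict


def rows_to_events(rows, entity_key, time_key, event_type):
    """Turn plain dict rows into events; remaining columns become attributes."""
    return [
        Event(
            entity_id=row[entity_key],
            event_type=event_type,
            timestamp=row[time_key],
            attrs={k: v for k, v in row.items() if k not in (entity_key, time_key)},
        )
        for row in rows
    ]


class InMemoryEventStore:
    """Hypothetical store: events grouped per entity, kept sorted by time."""

    def __init__(self):
        self._by_entity = defaultdict(list)

    def ingest(self, events):
        for event in events:
            self._by_entity[event.entity_id].append(event)
        # Re-sort each entity's events so reads are always in time order.
        for entity_events in self._by_entity.values():
            entity_events.sort(key=lambda e: e.timestamp)

    def events_for(self, entity_id):
        return list(self._by_entity.get(entity_id, []))


rows = [
    {"customer": "c1", "ts": datetime(2023, 1, 2), "amount": 20.0},
    {"customer": "c1", "ts": datetime(2023, 1, 1), "amount": 10.0},
]
store = InMemoryEventStore()
store.ingest(rows_to_events(rows, "customer", "ts", "purchase"))
```

Once everything is an event with a timestamp, any feature can in principle be expressed as a query over the events of one entity up to some point in time.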
Why am I developing this? I've always found it challenging to build models on time-dependent data. These models can be surprisingly tricky, and there's a high risk of accidentally using future data when computing features, which causes data leakage. FeatureExpress is designed to nearly eliminate such mistakes. Moreover, I believe that representing data as events is an intuitive approach.
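As an illustration of the leakage problem (again in plain Python, not the FeatureExpress DSL), the sketch below compares a naive aggregate over all events with a point-in-time aggregate that only sees events strictly before the observation time. The function names and the dict-based event layout are assumptions made for this example.

```python
from datetime import datetime, timedelta


def leaky_sum(events, field):
    """Naive aggregate: sums every event, including ones from the future."""
    return sum(e[field] for e in events)


def sum_before(events, observation_time, window, field):
    """Point-in-time aggregate: only events in [observation_time - window, observation_time)."""
    start = observation_time - window
    return sum(
        e[field]
        for e in events
        if start <= e["timestamp"] < observation_time
    )


events = [
    {"timestamp": datetime(2023, 1, 1), "amount": 10.0},
    {"timestamp": datetime(2023, 1, 5), "amount": 20.0},
    {"timestamp": datetime(2023, 1, 10), "amount": 40.0},  # after the observation time
]
observation_time = datetime(2023, 1, 6)

leaky = leaky_sum(events, "amount")  # 70.0, includes the Jan 10 event
safe = sum_before(events, observation_time, timedelta(days=7), "amount")  # 30.0
```

A model trained on the leaky value would see the January 10 purchase when predicting as of January 6, which is information it could never have at prediction time.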