
Com TW NOw News 2024

(P) AnyClassifier – Synthetic data generation for text classification

I'd like to share this because I think it could be a useful resource for many ML engineers and software engineers.

I created a module that generates synthetic data for text classification. It lets you build classifiers from scratch, even without a dataset. Across 5 benchmarks, classifiers trained on its synthetic data achieve results comparable to those trained on real data.

Benchmark results: https://preview.redd.it/rqyzrcbl1njd1.png?width=1704&format=png&auto=webp&s=476804a719f8619b46fb35d88361583e285c777a
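To make the idea of "building a classifier without a dataset" concrete, here is a minimal Python sketch. It is not AnyClassifier's API or method: the library's own pipeline is described in the blog post linked below. Here, hand-written templates (`TEMPLATES`, `ITEMS`, `generate_synthetic`, all hypothetical names) stand in for a real generator such as a language model, and a small multinomial naive Bayes classifier (`NaiveBayes`, also hypothetical) is trained only on that synthetic data. It uses the standard library only, so it runs offline.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical stand-in for a real synthetic-data generator (e.g. an LLM prompt).
# Each label gets a few templates; {item} is filled with every entry in ITEMS.
TEMPLATES = {
    "positive": ["I really loved this {item}.", "The {item} was excellent and worth it.", "What a great {item}!"],
    "negative": ["I hated this {item}.", "The {item} was terrible and a waste of money.", "What an awful {item}!"],
}
ITEMS = ["movie", "book", "phone", "meal", "hotel"]


def generate_synthetic():
    """Return (text, label) pairs by filling every template with every item."""
    return [
        (template.format(item=item), label)
        for label, templates in TEMPLATES.items()
        for template, item in product(templates, ITEMS)
    ]


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".!,?").lower() for t in text.split() if t.strip(".!,?")]


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, data):
        self.label_counts = Counter(label for _, label in data)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in data:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods for each token
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = generate_synthetic()          # 30 synthetic examples, no real data
clf = NaiveBayes().fit(train)
print(clf.predict("This hotel was excellent"))   # positive
print(clf.predict("An awful phone, terrible"))   # negative
```

Templates are used only so the sketch is deterministic and self-contained; a real generator would produce far more varied text, which is what makes synthetic data competitive with real data in practice.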

There is plenty of open research and implementation work ahead:

  • research into synthetic data algorithms resulting in higher performance
  • agentic workflow of model evaluation, failure analysis and model improvement
  • multilingual support

The project is released under the MIT license. If you are interested, try it out and contribute.

Code: https://github.com/kenhktsui/anyclassifier

Detailed blog: https://huggingface.co/blog/kenhktsui/anyclassifier

EDIT: added benchmark results

submitted by /u/transformer_ML