Here you can find my discoveries on GitHub: projects I have starred and liked. You can also visit my personal GitHub profile.
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Check on GitHub
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Check on GitHub
tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Check on GitHub
kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
Check on GitHub