2021 InfoWorld BOSSIE Awards: 29 Must‑Know Open‑Source Projects Across AI, Data & Cloud

InfoWorld's 2021 BOSSIE Awards highlight 29 standout open-source projects, from front-end frameworks like Svelte and cloud-native tools such as Minikube to AI platforms like Hugging Face and data-engineering tools including Presto and Apache Arrow, giving developers a curated snapshot of the year's most influential software.

MaGe Linux Operations

Svelte and SvelteKit

Svelte and its full-stack counterpart SvelteKit are ambitious JavaScript frameworks that use a compile-time strategy to deliver outstanding performance and developer experience. SvelteKit, in beta at the time of the awards, adds support for serverless deployments.

Address: https://github.com/sveltejs/svelte

Minikube

Minikube provides an easy way to run a single‑node Kubernetes cluster locally in a VM, ideal for trying out Kubernetes or daily development.

Address: https://github.com/kubernetes/minikube

Pixie

Pixie is an observability tool for Kubernetes, offering high‑level views like service maps and detailed insights such as pod status, flame graphs, and request tracing, all while using less than 5% of cluster CPU.

Address: https://github.com/pixie-io/pixie

FastAPI

FastAPI is a high-performance Python web framework for building APIs on top of standard type hints. Its documentation lists the following selling points; the development-speed and error-rate figures are the project's own estimates, based on an internal development team:

Fast: performance on par with NodeJS and Go

Rapid coding: features developed roughly 200-300% faster

Fewer errors: about 40% fewer developer-induced bugs

Intuitive: strong editor support and auto-completion

Easy to learn: less time spent reading documentation

Concise: less code duplication

Robust: production‑ready with automatic interactive docs

Standard‑based: fully compatible with OpenAPI and JSON Schema

Address: https://github.com/tiangolo/fastapi

Crystal

Crystal combines C‑level speed with Ruby‑like expressiveness, using static typing and LLVM compilation, offering seamless C interop, compile‑time macros, and a stable 1.0 release suitable for general workloads.

Address: https://github.com/crystal-lang/crystal

Windows Terminal

Windows Terminal is a modern, feature‑rich command‑line tool with multi‑tab support, rich text, theming, GPU‑accelerated rendering, and low resource consumption.

Address: https://github.com/Microsoft/Terminal

OBS Studio

OBS Studio is a real‑time streaming and screen‑recording application designed for efficient capture, composition, encoding, and broadcasting to any platform.

High‑performance real‑time video/audio capture and mixing

Unlimited scenes with custom transitions

Intuitive audio mixer with filters (noise gate, suppression, gain)

Powerful yet easy configuration and layout docking

Address: https://github.com/obsproject/obs-studio

Shotcut

Shotcut is a cross‑platform video editor offering standard editing features, a responsive UI, and strong community support on macOS, Linux, BSD, and Windows.

Address: https://github.com/mltframework/shotcut

Weave GitOps Core

Weave GitOps enables effective GitOps workflows for continuous delivery to Kubernetes clusters, built on the CNCF Flux engine.

Address: https://github.com/weaveworks/weave-gitops

Apache Solr

Apache Solr is a Lucene-based enterprise search server. It can run as a cluster, deploys to the cloud, and includes learning-to-rank algorithms for fine-tuning how results are weighted.

Address: https://github.com/apache/solr

MLflow

MLflow, created by Databricks and hosted by the Linux Foundation, is an MLOps platform for tracking, managing, and deploying machine‑learning experiments and models.

Address: https://github.com/mlflow/mlflow

Orange

Orange makes data mining productive and fun by allowing users to build visual workflows for machine‑learning and analytics without coding.

Address: https://github.com/biolab/orange3

Flutter

Flutter, built by Google, enables high‑performance, cross‑platform mobile app development with low latency input and high frame rates on Android and iOS.

Address: https://github.com/flutter

Apache Superset

Apache Superset, originally developed at Airbnb, is an open‑source data exploration and visualization platform that provides an intuitive, enterprise‑grade BI web app.

Address: https://github.com/apache/superset

Presto

Presto is an open‑source distributed SQL engine for interactive analytics, capable of querying heterogeneous data sources—including Hive, Cassandra, and relational databases—without moving data.

Address: https://github.com/prestodb/presto

Apache Arrow

Apache Arrow defines a language‑agnostic columnar memory format optimized for modern CPUs and GPUs, supporting zero‑copy reads and libraries across many languages.
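The columnar, zero-copy idea can be illustrated without Arrow itself. The sketch below uses only Python's standard library (array and memoryview) as a stand-in; it is not the Arrow API, and the column names "id" and "score" are made up for illustration. It stores each field in its own contiguous typed buffer and takes a slice as a view rather than a copy.

```python
from array import array

# Row-oriented layout: each record keeps its fields together.
rows = [(1, 9.5), (2, 7.25), (3, 8.0), (4, 6.5)]

# Column-oriented layout: each field lives in its own contiguous typed buffer.
# The column names "id" and "score" are hypothetical.
columns = {
    "id": array("q", (r[0] for r in rows)),     # signed 64-bit integers
    "score": array("d", (r[1] for r in rows)),  # 64-bit floats
}

# A memoryview exposes the buffer without copying it,
# and slicing a memoryview is also copy-free.
score_view = memoryview(columns["score"])
tail = score_view[2:]

# A write through the original buffer is visible through the view,
# which shows that both names refer to the same memory.
columns["score"][3] = 10.0
tail_values = tail.tolist()

# Scanning one column touches only that column's buffer.
total_score = sum(columns["score"])
```

Real Arrow adds a standardized layout, validity bitmaps, nested types, and cross-language libraries on top of this basic idea.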

Address: https://github.com/apache/arrow

InterpretML

InterpretML is an open‑source Explainable AI library offering state‑of‑the‑art techniques, including glass‑box models like the Explainable Boosting Machine and post‑hoc methods such as LIME.

Address: https://github.com/interpretml/interpret

LIME

LIME (Local Interpretable Model‑agnostic Explanations) is a post‑hoc technique that perturbs input features to explain predictions of any classifier, supporting both text and image domains.
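The perturbation idea can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the lime package and not the full LIME algorithm: LIME samples perturbations, weights them by proximity to the original input, and fits an interpretable surrogate model, whereas this sketch simply tries every on/off combination of features and compares mean predictions. The function names and the toy model are hypothetical.

```python
from itertools import product

def perturbation_attributions(predict, instance, baseline=0.0):
    """Estimate how much each feature of `instance` moves the prediction.

    Every on/off combination of features is tried; a feature that is "off"
    is replaced by `baseline`. The attribution of a feature is the mean
    prediction with it on minus the mean prediction with it off.
    """
    n = len(instance)
    on_sum = [0.0] * n
    off_sum = [0.0] * n
    half = 2 ** (n - 1)  # each feature is on in exactly half of the masks
    for mask in product((0, 1), repeat=n):
        perturbed = [x if keep else baseline for x, keep in zip(instance, mask)]
        y = predict(perturbed)
        for i, keep in enumerate(mask):
            if keep:
                on_sum[i] += y
            else:
                off_sum[i] += y
    return [(on_sum[i] - off_sum[i]) / half for i in range(n)]

# Hypothetical black-box model: a fixed linear scorer.
def toy_model(features):
    weights = [2.0, -1.0, 0.5]
    return sum(w * x for w, x in zip(weights, features))

attributions = perturbation_attributions(toy_model, [1.0, 3.0, 4.0])
```

For a linear model the result is each feature's weight times its value, which makes the sketch easy to sanity-check. Enumerating every mask grows exponentially with the number of features, which is one reason practical methods sample perturbations instead.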

Address: https://github.com/marcotcr/lime

Dask

Dask is an open‑source parallel computing library that scales Python workloads across multiple machines or GPUs, integrating with RAPIDS, NumPy, Pandas, and Scikit‑learn.

Address: https://github.com/dask/dask

BlazingSQL

BlazingSQL is a GPU‑accelerated SQL engine built on the RAPIDS ecosystem, leveraging Apache Arrow and cuDF to provide a SQL interface for large‑scale data science.

Address: https://github.com/BlazingDB/blazingsql

RAPIDS

NVIDIA RAPIDS is an open-source suite of GPU-accelerated libraries (cuDF, cuML, cuGraph) for end-to-end data science pipelines, built on Apache Arrow.

Address: https://github.com/rapidsai/cudf

PostHog

PostHog is an open‑source product analytics platform for developers that automatically captures every event on a website or app without sending data to third parties.

Address: https://github.com/PostHog/posthog

lakeFS

lakeFS adds a Git-like version-control layer on top of object storage such as S3 and Azure Blob, giving data lakes zero-copy branching, commits, metadata, and validation hooks.
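One way to picture zero-copy branching is a branch that holds only pointers to stored objects. The toy model below is an assumption made for illustration and says nothing about how lakeFS is implemented internally; the TinyRepo class and its methods are hypothetical. Creating a branch copies the path-to-hash mapping, while the object bytes stay shared until a branch writes something new.

```python
import hashlib

class TinyRepo:
    """Toy model of Git-like branching over an object store (hypothetical)."""

    def __init__(self):
        self.objects = {}             # content hash -> bytes (the "object store")
        self.branches = {"main": {}}  # branch name -> {path: content hash}

    def put(self, branch, path, data):
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)  # identical content is stored once
        self.branches[branch][path] = key

    def get(self, branch, path):
        return self.objects[self.branches[branch][path]]

    def create_branch(self, name, source):
        # Only the path -> hash mapping is copied; object bytes are shared.
        self.branches[name] = dict(self.branches[source])

repo = TinyRepo()
repo.put("main", "events/day1.csv", b"id,value\n1,10\n")
repo.create_branch("experiment", "main")
objects_after_branch = len(repo.objects)

# Writing on the new branch adds one object and leaves main untouched.
repo.put("experiment", "events/day1.csv", b"id,value\n1,99\n")
```

After the branch is created the store still holds a single object; only the write on the experiment branch adds a second one.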

Address: https://github.com/treeverse/lakeFS

Meltano

Meltano is an open-source DataOps platform positioned as an alternative to the traditional ELT tool stack. Its extractors and loaders are Singer-compatible taps and targets, and it orchestrates the pipelines built from them.
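The tap and target contract can be sketched roughly as follows. In Singer, a tap emits JSON messages of type SCHEMA, RECORD, and STATE, one per line, and a target consumes them; real taps and targets are separate processes connected by a pipe. Here a generator stands in for the pipe, and the stream name, fields, and function names are hypothetical. This is not Meltano's own code.

```python
import json

def tap_users():
    """Hypothetical tap: yields Singer-style messages as JSON lines."""
    yield json.dumps({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {"properties": {"id": {"type": "integer"},
                                  "name": {"type": "string"}}},
        "key_properties": ["id"],
    })
    for row in ({"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}):
        yield json.dumps({"type": "RECORD", "stream": "users", "record": row})
    yield json.dumps({"type": "STATE", "value": {"users": {"last_id": 2}}})

def target_memory(lines):
    """Hypothetical target: loads records into a dict keyed by stream."""
    loaded, state = {}, None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]  # bookmark for resuming a later run
    return loaded, state

loaded, state = target_memory(tap_users())
```

Because the two sides share only the message format, any tap can in principle be paired with any target.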

Address: https://github.com/meltano/meltano

Trino

Trino (formerly PrestoSQL) is a distributed SQL analytics engine that queries large data sources—including data lakes and relational stores—without moving data, integrating smoothly with BI tools.

Address: https://github.com/trinodb/trino

StreamNative

StreamNative is a highly scalable messaging and event‑stream platform that combines Apache Pulsar with Kubernetes, hybrid‑cloud support, and enterprise‑grade tooling for real‑time analytics.

Address: https://github.com/streamnative

Hugging Face

Hugging Face hosts one of the most widely used open-source repositories of deep-learning models, which now extends beyond text to images, audio, video, and object detection.

Address: https://github.com/huggingface/transformers

EleutherAI

EleutherAI is a distributed research group that released The Pile (825 GB dataset) and large language models such as GPT‑J (6 B parameters) and GPT‑NeoX, aiming to democratize GPT‑3‑scale models.

Address: https://github.com/EleutherAI/gpt-neo

Colab notebooks for generative art

OpenAI's CLIP model, released under an MIT license, can be combined with open-source generators like BigGAN and VQGAN in community-maintained Colab notebooks to create generative art from text prompts.

Address: https://github.com/openai/CLIP

These 29 projects make up the 2021 InfoWorld BOSSIE Awards. Many of them were new to the author and are worth adding to an open-source toolkit.

Written by

MaGe Linux Operations

