Discover the Top 29 Open‑Source Projects That Dominated InfoWorld’s 2021 BOSSIE Awards
This translated article presents InfoWorld’s 2021 Best of Open Source Software Awards (BOSSIE), showcasing 29 winning projects, from frontend frameworks and cloud‑native tools to AI libraries and data platforms, with their key features, use cases, and GitHub repositories.
InfoWorld, an information‑technology media company founded in 1978, presents its Best of Open Source Software Awards (BOSSIE) every year, recognizing projects for their contributions to the open‑source community and their industry impact. This article translates the 2021 list of 29 winners, which spans development, cloud‑native, AI, data, and more.
1. Svelte and SvelteKit
Of the many innovative open‑source JavaScript frameworks, Svelte and its full‑stack companion SvelteKit are among the most ambitious and forward‑looking. Svelte breaks with the status quo by doing its work at compile time: components are compiled into small, imperative JavaScript that updates the DOM directly instead of relying on a virtual DOM at runtime. The result is outstanding performance, steady evolution, and an excellent developer experience. SvelteKit, in public beta at the time of the awards, carries that tradition forward with modern tooling and built‑in support for serverless deployment.
Address: https://github.com/sveltejs/svelte
2. Minikube
Minikube is an easy‑to‑use tool for running Kubernetes locally. It creates a Kubernetes cluster (a single node by default) inside a VM or container on your laptop, which makes it convenient both for trying out Kubernetes and for day‑to‑day development.
Address: https://github.com/kubernetes/minikube
3. Pixie
Pixie is an observability tool for Kubernetes applications. It provides high‑level views such as service maps, cluster resources, and application traffic, as well as detailed views like pod status, flame graphs, and full‑body request traces. Pixie uses eBPF to automatically collect telemetry data locally in the cluster, consuming less than 5% of cluster CPU. Use cases include network monitoring, infrastructure health, service performance, and database query profiling.
Address: https://github.com/pixie-io/pixie
4. FastAPI
FastAPI is a high‑performance web framework for building APIs. Its main features include:
Fast: very high performance, on par with NodeJS and Go.
Fast to code: the project estimates it increases feature development speed by about 200‑300%.
Fewer bugs: by the project’s own estimate, about 40% fewer developer‑induced errors.
Intuitive: strong editor support, with auto‑completion everywhere and less time spent debugging.
Easy: designed to be easy to use and learn, so you spend less time reading documentation.
Short: concise code with minimal duplication.
Robust: production‑ready code with automatic interactive documentation.
Standards‑based: built on the open standards OpenAPI and JSON Schema.
Address: https://github.com/tiangolo/fastapi
5. Crystal
Crystal is a programming language that pairs C‑like speed with Ruby‑like expressiveness. With the release of Crystal 1.0 in early 2021, the language became stable enough for general workloads. It is statically typed and compiles to native code through LLVM, which makes it fast, and its type checker catches common runtime errors such as null references at compile time. Crystal can call existing C code and offers compile‑time macros for extending the language’s syntax.
Address: https://github.com/crystal-lang/crystal
6. Windows Terminal
Windows Terminal is a modern, powerful command‑line terminal that supports multiple tabs and panes, Unicode and UTF‑8 text (including emoji), extensive theming, and GPU‑accelerated text rendering, while remaining fast, efficient, and light on resources.
Address: https://github.com/Microsoft/Terminal
7. OBS Studio
OBS Studio is software for real‑time streaming and screen recording, designed for efficient capture, composition, encoding, recording, and streaming to any platform. Key features include high‑performance video/audio capture and mixing, unlimited scenes with custom transitions, intuitive audio mixer with filters, modular dock UI, and extensive configuration options.
Address: https://github.com/obsproject/obs-studio
8. Shotcut
Shotcut is a cross‑platform video editor that allows standard video/audio corrections, effects, and layering. It has an active community, extensive tutorials, and runs on macOS, Linux, BSD, and Windows with a lightweight, user‑friendly interface.
Address: https://github.com/mltframework/shotcut
9. Weave GitOps Core
Weave GitOps supports efficient GitOps workflows for continuously delivering applications to Kubernetes clusters. It is built on the CNCF Flux engine.
Address: https://github.com/weaveworks/weave-gitops
10. Apache Solr
Apache Solr is a popular enterprise‑grade search server built on the Apache Lucene full‑text search library. It can run as a cluster (SolrCloud), deploys readily to the cloud, and includes a learning‑to‑rank module that re‑ranks results using trained machine‑learning models.
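Solr’s fast full‑text lookups rest on Lucene’s inverted index, which maps each term to the documents that contain it. The minimal Python sketch below illustrates only that data structure; the whitespace tokenizer and raw term‑frequency scoring are simplifying assumptions, not Lucene’s analyzers or its BM25 ranking.

```python
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    # Inverted index: term -> postings {doc_id: term frequency in that doc}.
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index: dict[str, dict[int, int]], query: str) -> list[int]:
    # Score = summed term frequencies of the query terms (a toy stand-in for BM25).
    scores: Counter = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Rank by score descending, then by doc id for a stable order.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


docs = {
    1: "open source search engine",
    2: "search search search and rank results",
    3: "distributed SQL engine",
}
index = build_index(docs)
print(search(index, "search engine"))  # [2, 1, 3]
```

A query only visits the postings lists of its own terms, so documents that share no term with the query are never examined.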
Address: https://github.com/apache/solr
11. MLflow
MLflow, created by Databricks and hosted by the Linux Foundation, is an MLOps platform that tracks, manages, and maintains machine‑learning models, experiments, and deployments. It provides tools for logging experiments, packaging code, and integrating projects into workflows.
Address: https://github.com/mlflow/mlflow
12. Orange
Orange aims to make data mining productive and fun. Users build data‑analysis workflows visually, run machine‑learning tasks, and visualize the results, which offers a more intuitive experience than code‑centric tools such as RStudio or Jupyter.
Address: https://github.com/biolab/orange3
13. Flutter
Flutter, built by Google engineers, enables high‑performance, cross‑platform mobile app development. It optimizes for current and future mobile devices, focusing on low‑latency input and high frame rates for Android and iOS.
Address: https://github.com/flutter/flutter
14. Apache Superset
Apache Superset, originally developed at Airbnb, is an open‑source data exploration and visualization platform. It offers a rich, user‑friendly web UI for creating interactive dashboards and performing business‑intelligence analysis.
Address: https://github.com/apache/superset
15. Presto
Presto is an open‑source distributed SQL engine for interactive analytics. It can query a variety of data sources—including files, Hive, Cassandra, relational databases, and proprietary stores—allowing federated queries across multiple systems. Facebook uses Presto for its massive data warehouse.
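To make “federated query” concrete, here is a loose analogy using Python’s built‑in sqlite3 module: two separate databases hang off one connection, and a single SQL statement joins and aggregates across them. Presto does this across real systems through connectors and distributed workers; sqlite3 is only a stand‑in, and the orders and customers tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # "main" stands in for one source, e.g. a warehouse
# Attach a second, independent database standing in for another system, e.g. a CRM.
# SQLite refuses ATTACH inside an open transaction, so attach before any INSERT.
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 11, 30.0), (3, 10, 15.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(10, "Ada"), (11, "Linus")],
)

# One statement joins and aggregates across both "sources" without copying data between them.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN crm.customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY total DESC
    """
).fetchall()
print(rows)  # [('Ada', 40.0), ('Linus', 30.0)]
```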
Address: https://github.com/prestodb/presto
16. Apache Arrow
Apache Arrow defines a language‑agnostic columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern CPUs and GPUs. Because implementations share the same in‑memory layout, data can be read with zero‑copy access instead of being serialized and deserialized between systems. Arrow provides libraries for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.
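The difference between a row‑oriented layout and a columnar one can be sketched with the standard library alone. This is not the Arrow format or the pyarrow API; it only shows why keeping each column in one contiguous, typed buffer suits analytics, and how a memoryview slice reads part of that buffer without copying it.

```python
from array import array

# Row-oriented: each record keeps its fields together.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
    {"id": 4, "price": 3.0},
]

# Column-oriented: each field lives in its own contiguous, typed buffer.
columns = {
    "id": array("q", (r["id"] for r in rows)),        # 64-bit signed integers
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
}

# An aggregate over one field scans one buffer instead of every record.
total_price = sum(columns["price"])

# Zero-copy view: the memoryview slice shares memory with the original buffer.
price_view = memoryview(columns["price"])[1:3]
columns["price"][1] = 11.0  # mutate the original buffer in place...
print(total_price, price_view.tolist())  # 31.75 [11.0, 7.25]  (...and the view sees it)
```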
Address: https://github.com/apache/arrow
17. InterpretML
InterpretML is an open‑source Explainable AI (XAI) package that includes state‑of‑the‑art model‑interpretability techniques. It lets you train glass‑box models and explain black‑box systems, offering global behavior insights and per‑prediction explanations. It includes Microsoft Research’s Explainable Boosting Machine and post‑hoc LIME support.
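Explainable Boosting Machines are a form of generalized additive model: a prediction is an intercept plus one learned function per feature (EBMs can also add a few pairwise interaction terms), so each term’s value is that feature’s contribution to the prediction. The sketch below hard‑codes hypothetical piecewise‑constant shape functions in place of ones an EBM would learn by boosting; it is not InterpretML’s API.

```python
from bisect import bisect_right

# Hypothetical shape functions, stored as bin edges and one score per bin.
SHAPES = {
    "age": ([30, 50], [-0.4, 0.1, 0.6]),             # bins: <30, 30-49, >=50
    "income": ([40_000, 80_000], [0.5, 0.0, -0.3]),  # bins: <40k, 40k-79,999, >=80k
}
INTERCEPT = -1.0  # hypothetical baseline log-odds


def term_score(feature: str, value: float) -> float:
    # Find the bin the value falls into and return that bin's score.
    edges, scores = SHAPES[feature]
    return scores[bisect_right(edges, value)]


def explain_prediction(sample: dict[str, float]) -> tuple[float, dict[str, float]]:
    # Additive model: prediction = intercept + sum of per-feature scores,
    # so the per-feature scores are themselves the explanation.
    contributions = {name: term_score(name, value) for name, value in sample.items()}
    return INTERCEPT + sum(contributions.values()), contributions


log_odds, contributions = explain_prediction({"age": 55, "income": 35_000})
print(contributions)        # {'age': 0.6, 'income': 0.5}
print(round(log_odds, 6))   # 0.1
```

Because nothing is hidden between the terms and the output, the same lookup tables give both the global picture (plot each shape function) and the per‑prediction breakdown.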
Address: https://github.com/interpretml/interpret
18. LIME
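LIME (Local Interpretable Model‑agnostic Explanations) is a post‑hoc technique that can explain the predictions of any classifier. It perturbs the input around a single instance, observes how the predictions change, and fits a simple interpretable model (such as a sparse linear model) weighted toward the samples closest to that instance; the local model’s weights form the explanation. It supports text, image, and tabular data. LIME is also included in InterpretML.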
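The core LIME loop fits in a few dozen lines: switch features on and off around one instance, query the black‑box model, weight each sample by its proximity to the original, and fit a weighted linear model whose coefficients serve as the explanation. This is a simplified, stdlib‑only sketch, not the lime package’s API; the black‑box function, the 0.0 baseline for “removed” features, the distance measure, and the kernel width are all assumptions chosen for readability.

```python
import math
import random


def black_box(x: list[float]) -> float:
    # Hypothetical opaque model: LIME only calls it, never inspects it.
    return 2.0 * x[0] - 3.0 * x[1] + 0.0 * x[2]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    # Gaussian elimination with partial pivoting for the small normal equations.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_explain(instance, model, num_samples=500, kernel_width=0.75, seed=0):
    rng = random.Random(seed)
    d = len(instance)
    # Binary masks: 1 keeps a feature, 0 replaces it with the assumed 0.0 baseline.
    masks = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)] for _ in range(num_samples)]
    rows, targets, weights = [], [], []
    for z in masks:
        perturbed = [v if keep else 0.0 for v, keep in zip(instance, z)]
        distance = (d - sum(z)) / d  # assumed distance: fraction of features removed
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))  # proximity kernel
        rows.append([1.0] + [float(k) for k in z])  # intercept column + mask bits
        targets.append(model(perturbed))
    # Weighted least squares via the normal equations (X^T W X) beta = X^T W y.
    p = d + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights)) for i in range(p)]
    beta = solve(xtwx, xtwy)
    return beta[1:]  # drop the intercept; the rest are per-feature local weights


importance = lime_explain([1.0, 1.0, 1.0], black_box)
print([round(w, 3) for w in importance])  # ≈ [2.0, -3.0, 0.0]
```

Because this toy black box is linear in the kept features, the surrogate recovers its coefficients; for a nonlinear model the weights describe only its behavior near the chosen instance, which is what makes the explanation local.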
Address: https://github.com/marcotcr/lime
19. Dask
Dask is an open‑source library for parallel computing in Python that scales Python workloads from a single machine to a cluster, and onto GPUs. It integrates with RAPIDS cuDF, XGBoost, and cuML for GPU‑accelerated analytics and machine learning, and parallelizes familiar NumPy, pandas, and scikit‑learn workflows.
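Dask parallelizes array and dataframe work by splitting the data into chunks, running the same task on every chunk, and combining the partial results. The stdlib sketch below mimics that blocked pattern with concurrent.futures on one machine; it illustrates the idea only, not Dask’s scheduler or API, and the chunk size is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data: list[float], size: int) -> list[list[float]]:
    # Partition the input, much as Dask splits an array or dataframe into chunks.
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk: list[float]) -> tuple[float, int]:
    # Per-chunk task: independent of the other chunks, so chunks can run concurrently.
    return sum(chunk), len(chunk)


def parallel_mean(data: list[float], chunk_size: int = 1000) -> float:
    # Threads keep the sketch portable; they only speed things up when the per-chunk
    # work releases the GIL (as many NumPy operations do). Dask can also use processes
    # or a distributed cluster.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, chunked(data, chunk_size)))
    # Reduce step: combine partial sums and counts into the global mean.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


values = [float(i) for i in range(10_001)]
print(parallel_mean(values))  # 5000.0
```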
Address: https://github.com/dask/dask
20. BlazingSQL
BlazingSQL is a GPU‑accelerated SQL engine built on the RAPIDS ecosystem and the Apache Arrow columnar format. It serves as the SQL interface for cuDF, supporting large‑scale data‑science workflows and enterprise data sets.
Address: https://github.com/BlazingDB/blazingsql
21. RAPIDS
NVIDIA’s RAPIDS open‑source libraries and APIs enable end‑to‑end data‑science and analytics pipelines to run entirely on GPUs. Built on the Apache Arrow format, RAPIDS includes cuDF (a Pandas‑like DataFrame), cuML (GPU‑accelerated machine‑learning algorithms), and cuGraph (GPU‑accelerated graph analytics).
Address: https://github.com/rapidsai/cudf
22. PostHog
PostHog is an open‑source product analytics platform for developers. It automatically captures every event on your website or app without sending data to third parties, providing user‑level, event‑based analysis and insights.
Address: https://github.com/PostHog/posthog
23. LakeFS
LakeFS adds a Git‑like version‑control layer on top of object storage, supporting zero‑copy branches, commits with messages and metadata, and rollbacks. It brings familiar Git semantics to data lakes on stores such as Amazon S3 and Azure Blob Storage, helping maintain data integrity and quality.
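Branching a data lake without copying it works because a commit only records which stored objects make up each path, while the objects themselves are shared. The toy in‑memory model below illustrates that idea; the class and method names are hypothetical and do not reflect lakeFS’s API or storage design.

```python
import hashlib


class ToyDataRepo:
    """Hypothetical Git-like layer over an object store (illustration only)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}          # content-addressed blobs
        self.commits: list[dict[str, str]] = [{}]    # commit id -> {path: blob hash}
        self.branches: dict[str, int] = {"main": 0}  # branch name -> commit id
        self.staging: dict[str, dict[str, str]] = {"main": {}}  # uncommitted writes

    def _tree(self, branch: str) -> dict[str, str]:
        # A branch's current view: its last commit overlaid with staged writes.
        return {**self.commits[self.branches[branch]], **self.staging[branch]}

    def write(self, branch: str, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(digest, data)  # identical content is stored only once
        self.staging[branch][path] = digest

    def read(self, branch: str, path: str) -> bytes:
        return self.objects[self._tree(branch)[path]]

    def commit(self, branch: str) -> int:
        # A commit snapshots metadata (path -> hash), never the blobs themselves.
        self.commits.append(self._tree(branch))
        self.branches[branch] = len(self.commits) - 1
        self.staging[branch] = {}
        return self.branches[branch]

    def branch(self, new: str, source: str) -> None:
        # Zero-copy branch: point at the source's latest commit; no data is duplicated.
        self.branches[new] = self.branches[source]
        self.staging[new] = {}

    def reset(self, branch: str, commit_id: int) -> None:
        # Rollback: move the branch pointer and discard uncommitted writes.
        self.branches[branch] = commit_id
        self.staging[branch] = {}


repo = ToyDataRepo()
repo.write("main", "sales/2021.csv", b"id,amount\n1,10\n")
good = repo.commit("main")
repo.branch("experiment", "main")
repo.write("experiment", "sales/2021.csv", b"id,amount\n1,-999\n")
repo.commit("experiment")
print(repo.read("main", "sales/2021.csv"))  # main still sees the original data
repo.reset("experiment", good)              # roll the experiment back
print(len(repo.objects))                    # 2: each distinct version stored once
```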
Address: https://github.com/treeverse/lakeFS
24. Meltano
Meltano, spun out of GitLab in 2021, is a free, open‑source DataOps toolchain built around ELT (extract, load, transform) pipelines. It provides a framework for extracting data, loading it into a warehouse, and modeling and transforming it there, with built‑in analytics dashboards and support for Singer taps and targets.
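Singer taps (extractors) and targets (loaders) communicate through JSON messages written one per line to stdout: a SCHEMA message describes a stream, RECORD messages carry rows, and STATE messages hold a bookmark for incremental runs. The sketch below hand‑writes a tiny tap in plain Python rather than using the singer‑python helper library; the stream name, fields, and bookmark layout are hypothetical.

```python
import json
import sys

# Hypothetical source rows; a real tap would read them from an API or database.
USERS = [
    {"id": 1, "email": "ada@example.com", "updated_at": "2021-09-01T00:00:00Z"},
    {"id": 2, "email": "linus@example.com", "updated_at": "2021-09-15T00:00:00Z"},
]


def tap_messages(rows: list[dict]) -> list[dict]:
    # SCHEMA first, so a target knows the shape and primary key of the stream.
    messages = [{
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "email": {"type": "string"},
                "updated_at": {"type": "string", "format": "date-time"},
            },
        },
        "key_properties": ["id"],
    }]
    # One RECORD message per row.
    messages += [{"type": "RECORD", "stream": "users", "record": row} for row in rows]
    # STATE lets the next run resume after the newest row seen (hypothetical bookmark shape).
    latest = max(row["updated_at"] for row in rows)
    messages.append({"type": "STATE", "value": {"users": {"updated_at": latest}}})
    return messages


if __name__ == "__main__":
    for message in tap_messages(USERS):
        # One JSON document per line on stdout, which a Singer target reads from stdin.
        sys.stdout.write(json.dumps(message) + "\n")
```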
Address: https://github.com/meltano/meltano
25. Trino
Trino (formerly PrestoSQL) is a distributed SQL analytics engine that executes fast queries across large, distributed data sources. It can query data lakes, relational stores, and multiple heterogeneous sources without moving data, integrating smoothly with BI and analytics tools.
Address: https://github.com/trinodb/trino
26. StreamNative
StreamNative is a highly scalable messaging and event‑stream platform that simplifies real‑time reporting, analytics, and enterprise application pipelines. It combines Apache Pulsar’s distributed streaming architecture with Kubernetes, hybrid‑cloud support, connectors, authentication, and monitoring tools.
Address: https://github.com/streamnative
27. Hugging Face
Hugging Face’s Transformers library is one of the most important open‑source resources for deep learning. It is not a deep‑learning framework itself; it provides pretrained models that run on top of PyTorch, TensorFlow, and JAX, and it has expanded beyond text to images, audio, video, and object detection, making it a must‑watch repository for deep‑learning practitioners.
Address: https://github.com/huggingface/transformers
28. EleutherAI
EleutherAI is a distributed group of machine‑learning researchers aiming to bring GPT‑3 to everyone. In early 2021 they released The Pile, an 825 GB diverse training dataset, followed by GPT‑J (6 billion parameters) and later GPT‑NeoX, targeting up to 175 billion parameters to compete with OpenAI’s GPT‑3.
Address: https://github.com/EleutherAI/gpt-neo
29. Colab notebooks for generative art
The OpenAI CLIP (Contrastive Language‑Image Pre‑training) model is open source, but the generative DALL‑E model is not. To fill the gap, Ryan Murdock and Katherine Crowson created Colab notebooks that pair CLIP with other open‑source generative models such as BigGAN and VQGAN, using CLIP to steer image generation toward a text prompt. The resulting prompt‑based generative art has been widely shared and adapted.
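CLIP maps images and text into a shared embedding space and scores how well an image matches a caption by the cosine similarity of their embeddings; the generative‑art notebooks repeatedly adjust a generator’s output to raise that score for the user’s prompt. The sketch below shows only the scoring step, with made‑up 4‑dimensional vectors standing in for CLIP’s encoders; the embeddings and the scale factor are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical embeddings standing in for CLIP's image and text encoders.
image_embedding = [0.9, 0.1, 0.3, 0.0]
prompt_embeddings = {
    "a painting of a lighthouse": [0.8, 0.2, 0.4, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.0, 0.3],
    "an abstract city at night": [0.2, 0.1, 0.1, 0.9],
}

scale = 100.0  # assumed logit scale; CLIP learns this temperature during training
prompts = list(prompt_embeddings)
probs = softmax([scale * cosine(image_embedding, prompt_embeddings[p]) for p in prompts])
best = prompts[probs.index(max(probs))]
print(best)  # a painting of a lighthouse
```

In a CLIP‑guided generator, this similarity becomes the objective: gradients of the score with respect to the generator’s latent input push the image toward the prompt.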
Address: https://github.com/openai/CLIP
These are the 2021 InfoWorld BOSSIE award winners. Many of them are new to me, and my open‑source toolbox now includes several high‑end, impressive projects. 🥳
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
