Circos: The Beauty of Circle - Data Visualization with Circos
Yang Zhentao's 2018 conference talk surveys data-visualization fundamentals and the multidisciplinary skills they require, introduces the open-source Circos tool and its polar-coordinate workflow, and showcases genomic and business use cases. It closes by comparing alternative platforms and stressing data quality, query capability, and appropriate view selection.
This presentation was delivered by Yang Zhentao at the 9th China Database Technology Conference on May 12, 2018. It provides a comprehensive introduction to data visualization and the Circos circular visualization tool.
Part 1: Data Visualization Analysis
The presentation begins by exploring what data visualization entails. According to the DIKW (Data-Information-Knowledge-Wisdom) model, data visualization accelerates the transformation from data to wisdom. The speaker argues that data visualization is a multidisciplinary field combining data processing, programming/algorithms, design/aesthetics, and statistics.
Various data types and visualization tools are discussed, including ECharts for general charting, specialized tools for network and timeline data, and domain-specific browsers like genomic browsers and virtual planetarium software. The presentation emphasizes that visualization is both a technical and artistic endeavor, requiring knowledge of computer graphics, planar geometry, coordinate transformations, and programming languages.
Key visualization technologies mentioned include SVG, Canvas (for HTML5 animations), OpenGL/WebGL for game development, and Three.js for 3D rendering. The fundamental elements of a chart (coordinate system, axes, scales, legend, and title) are also discussed, along with the importance of professional presentation standards.
Part 2: Circos Features and Key Principles
Circos is introduced as a Perl-based open-source visualization tool (GPL license) that creates circular layout graphics. Originally developed for genomics, it has found applications across many industries. The tool uses configuration files to drive visualization output in PNG/SVG formats.
The presentation covers practical aspects of using Circos: installation from the official website or the GitHub mirror, dependency checking with ./circos -modules, and basic usage with configuration files. The three-step workflow consists of writing a configuration file, running the circos command, and collecting the generated image.
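The three-step workflow can be sketched as a small Python helper. This is a minimal sketch rather than the speaker's own setup: the configuration text follows the style of the Circos quick-start tutorials, and the karyotype path, the ideogram values, the file name circos.conf, and the helper names write_config and build_command are all assumptions chosen for illustration. The command is only assembled, not executed, so the sketch runs without Circos installed.

```python
from pathlib import Path

# Minimal configuration in the style of the Circos quick-start tutorials.
# The karyotype path and the ideogram values are illustrative assumptions.
MINIMAL_CONF = """\
karyotype = data/karyotype/karyotype.human.txt

<ideogram>
<spacing>
default = 0.005r
</spacing>
radius    = 0.90r
thickness = 20p
fill      = yes
</ideogram>

<image>
<<include etc/image.conf>>
</image>

<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>
"""


def write_config(directory):
    """Step 1: write the configuration file and return its path."""
    conf_path = Path(directory) / "circos.conf"
    conf_path.write_text(MINIMAL_CONF, encoding="utf-8")
    return conf_path


def build_command(conf_path, output_dir=".", output_file="demo.png"):
    """Step 2: assemble the circos command line (not executed here)."""
    return [
        "circos",
        "-conf", str(conf_path),
        "-outputdir", str(output_dir),
        "-outputfile", output_file,
    ]
```

Once Circos is installed, the list returned by build_command can be passed to subprocess.run; step 3 is then the PNG or SVG image that Circos writes to the output directory.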
Key technical concepts explained include:
Coordinate transformations: Cartesian coordinates → polar coordinates → SVG coordinates
Bezier curves (quadratic Bezier curves for connecting data points)
Data object distribution on circles
Curve intersection control for organized visualizations
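The concepts listed above can be illustrated with a short Python sketch. It is not Circos's actual implementation: the function names, the clockwise-from-12-o'clock angle convention, the fixed gap between segments, and the choice of the circle's centre as the Bezier control point are all assumptions made for the example.

```python
import math


def layout_segments(sizes, gap_deg=2.0):
    """Distribute segments on the circle, each spanning an angle
    proportional to its size, with a fixed gap after every segment."""
    total = sum(sizes.values())
    usable = 360.0 - gap_deg * len(sizes)
    spans, start = {}, 0.0
    for name, size in sizes.items():
        width = usable * size / total
        spans[name] = (start, start + width)
        start += width + gap_deg
    return spans


def position_to_angle(span, pos, size):
    """Map a linear position within a segment to an angle in degrees."""
    start, end = span
    return start + (end - start) * pos / size


def polar_to_svg(radius, angle_deg, cx=0.0, cy=0.0):
    """Convert polar coordinates to SVG coordinates.
    The angle runs clockwise from 12 o'clock; the SVG y axis points down."""
    theta = math.radians(angle_deg)
    return cx + radius * math.sin(theta), cy - radius * math.cos(theta)


def quadratic_bezier(p0, p1, p2, t):
    """Evaluate B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2 for 0 <= t <= 1."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return x, y


def link_path(p0, p2, control):
    """Build an SVG path string for a quadratic Bezier link."""
    return (f"M {p0[0]:.1f},{p0[1]:.1f} "
            f"Q {control[0]:.1f},{control[1]:.1f} {p2[0]:.1f},{p2[1]:.1f}")


if __name__ == "__main__":
    center = (200.0, 200.0)
    sizes = {"chrA": 100, "chrB": 300}  # hypothetical segment lengths
    spans = layout_segments(sizes)
    a = polar_to_svg(150, position_to_angle(spans["chrA"], 50, 100), *center)
    b = polar_to_svg(150, position_to_angle(spans["chrB"], 150, 300), *center)
    print(link_path(a, b, center))
```

Pulling the control point toward the centre bends every link inward, which is one simple way to keep many curves from tangling near the rim; moving it closer to the rim flattens the curve.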
The presentation also mentions alternative implementations, such as R packages, that offer Circos-like functionality with added interactivity.
Part 3: Circos Cases and Application Scenarios
Typical applications include genome visualization (e.g., Chinese genome, panda genome, cucumber genome projects) and business analytics (e.g., DHL's global express delivery visualization).
Ideal Circos use cases have these characteristics:
Limited number of data entities (not suitable for 200,000+ individual data points)
Quantifiable relationships between entities that need directionality
Need for different resolutions or local zooming
Multi-dimensional data requiring comparison
Part 4: Rethinking Visualization
The final section discusses visualization platforms like Kibana (with good programmability), Grafana (strong platform capabilities supporting mainstream data sources), and Alibaba's Quick BI and DataV. The speaker emphasizes three key elements for successful visualization: data quality, query/search capabilities, and appropriate view selection.