AWS S3 SELECT, Aurora Multi-Master & Serverless, DynamoDB Backup & Global Tables, and Neptune Overview

The article introduces AWS S3 SELECT for in‑object SQL filtering, highlights Aurora's new Multi‑Master and Serverless capabilities, describes DynamoDB's on‑demand backup and Global Tables, and provides a brief overview of the managed graph database Neptune, emphasizing performance and cost benefits.

Liulishuo Tech Team

Dec 8, 2017

Storage

As more users adopt S3 as a data lake, AWS has released S3 SELECT, which pushes SQL SELECT statements down to the S3 object itself so that only the matching data is returned. This reduces data transfer and can deliver up to a fourfold performance improvement.

# Official sample code
import boto3
from s3select import ResponseHandler

class PrintingResponseHandler(ResponseHandler):
    def handle_records(self, record_data):
        print(record_data.decode('utf-8'))

handler = PrintingResponseHandler()
s3 = boto3.client('s3')
response = s3.select_object_content(
    Bucket="super-secret-reinvent-stuff",
    Key="stuff.csv",
    SelectRequest={
        'ExpressionType': 'SQL',
        'Expression': 'SELECT s._1 FROM S3Object AS s',
        'InputSerialization': {
            'CompressionType': 'NONE',
            'CSV': {
                'FileHeaderInfo': 'IGNORE',
                'RecordDelimiter': '\n',
                'FieldDelimiter': ',',
            }
        },
        'OutputSerialization': {
            'CSV': {
                'RecordDelimiter': '\n',
                'FieldDelimiter': ',',
            }
        }
    }
)
handler.handle_response(response['Body'])

The preview currently supports CSV and JSON objects, and support for more formats is expected. S3 SELECT is free during the preview, and for Athena users it can significantly reduce the amount of data scanned, which lowers costs.
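The sample above targets the preview SDK. As a rough sketch of the same query against the generally available boto3 client, the request fields become top-level keyword arguments and the result comes back as an event stream under `Payload`. The function name `select_first_column` is made up for illustration, and the client is passed in so the snippet does not assume any particular credentials setup.

```python
def select_first_column(s3_client, bucket, key):
    """Run the article's query with S3 Select and return the rows as text.

    `s3_client` is expected to behave like boto3.client('s3');
    `select_first_column` itself is a hypothetical helper name.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        Expression='SELECT s._1 FROM S3Object AS s',
        InputSerialization={
            'CompressionType': 'NONE',
            'CSV': {
                'FileHeaderInfo': 'IGNORE',
                'RecordDelimiter': '\n',
                'FieldDelimiter': ',',
            },
        },
        OutputSerialization={
            'CSV': {'RecordDelimiter': '\n', 'FieldDelimiter': ','},
        },
    )

    chunks = []
    # The result is an event stream; only 'Records' events carry row data.
    # Other events (for example 'Stats' and 'End') are skipped here.
    for event in response['Payload']:
        if 'Records' in event:
            chunks.append(event['Records']['Payload'])

    # Join before decoding so a multi-byte character split across two
    # events does not break UTF-8 decoding.
    return b''.join(chunks).decode('utf-8')
```

With real credentials this would be called as `select_first_column(boto3.client('s3'), bucket_name, 'stuff.csv')`, where the bucket name is whatever bucket holds the object.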

The surrounding ecosystem also has to adopt the feature; vendors such as Cloudera and Databricks are expected to add support.

Glacier offers a similar capability called Glacier SELECT, which pushes SELECT logic to Glacier for faster data access.

Databases

Database updates include:

Aurora now supports Multi‑Master and Serverless.

DynamoDB adds on‑demand backup/recovery and Global Tables.

Amazon Neptune, a managed graph database, has been launched.

Aurora Multi‑Master

Previously Aurora allowed only a single master. It now supports masters in multiple Availability Zones (AZs), which provides automatic failover with zero downtime. Write throughput is expected to improve, but how much latency conflict resolution will add remains to be seen.

Aurora Serverless

Aurora Serverless offers true on‑demand, pay‑as‑you‑go OLTP capabilities, automatically scaling compute resources based on load without user‑managed nodes.

For more Aurora details, see the AWS Database team session “Deep Dive on the Amazon Aurora MySQL version” [1].

DynamoDB

DynamoDB now offers one‑click, on‑demand backup that can back up a table of any size, even at petabyte scale, in seconds.
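Besides the console, the backup can be triggered through the API. The following is a minimal sketch using boto3's `create_backup` call; the helper name `backup_table` and the timestamped naming scheme are assumptions made for illustration, not anything the article prescribes.

```python
from datetime import datetime, timezone


def backup_table(dynamodb_client, table_name, now=None):
    """Create an on-demand backup and return its ARN.

    `dynamodb_client` is expected to behave like boto3.client('dynamodb').
    The backup name format below is a hypothetical convention.
    """
    now = now or datetime.now(timezone.utc)
    # Example result: "orders-20171208T000000Z"
    backup_name = f"{table_name}-{now:%Y%m%dT%H%M%SZ}"

    response = dynamodb_client.create_backup(
        TableName=table_name,
        BackupName=backup_name,
    )
    return response['BackupDetails']['BackupArn']
```

Passing `now` explicitly keeps the naming deterministic in tests; in normal use it can be left out.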

Global Tables provide cross‑region data availability with automatic conflict resolution, akin to an AWS‑managed version of Google Spanner.
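As a sketch of how a global table is declared, the snippet below wraps boto3's `create_global_table` call as it exists for the 2017 release of the feature. It assumes a table with the same name already exists, empty and with streams enabled, in each listed region. The helper name `make_global_table` is hypothetical.

```python
def make_global_table(dynamodb_client, table_name, regions):
    """Link same-named tables in several regions into one global table.

    `dynamodb_client` is expected to behave like boto3.client('dynamodb').
    Returns the status reported by the service for the new global table.
    """
    # Drop duplicate regions while keeping the caller's order.
    unique_regions = list(dict.fromkeys(regions))

    response = dynamodb_client.create_global_table(
        GlobalTableName=table_name,
        ReplicationGroup=[{'RegionName': r} for r in unique_regions],
    )
    return response['GlobalTableDescription']['GlobalTableStatus']
```

Once the replicas are linked, writes in any region are replicated to the others, and concurrent writes to the same item are reconciled on a last-writer-wins basis.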

Amazon Neptune

Neptune is a fully managed graph database that exposes standard access interfaces (Gremlin and SPARQL), positioned as the “Aurora of the graph database space”.
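To make the "standard interfaces" point concrete, here is a small sketch that builds (but does not send) a Gremlin query for Neptune's HTTP endpoint using only the standard library. The helper name `build_gremlin_request` and the host name in the usage note are placeholders; port 8182 is assumed as the default.

```python
import json
import urllib.request


def build_gremlin_request(endpoint, query, port=8182):
    """Build a POST request carrying a Gremlin query as JSON.

    `endpoint` is the cluster host name (a placeholder in this sketch).
    The request is only constructed here, not sent.
    """
    body = json.dumps({'gremlin': query}).encode('utf-8')
    return urllib.request.Request(
        url=f'https://{endpoint}:{port}/gremlin',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
```

For example, `build_gremlin_request('example-neptune-host', 'g.V().limit(1)')` prepares a query for a single vertex; sending it with `urllib.request.urlopen` would require network access to the cluster from inside its VPC.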

[1] https://www.youtube.com/watch?v=rPmKo2g9znA

Stay tuned for more engineer insights in the coming days.
