Comparative Study of JSON Processing Methods in MaxCompute
The study compares MaxCompute JSON extraction functions—FROM_JSON, get_json_object, and custom JMESPath/JSONPath UDFs—showing simple field extraction with get_json_object is fastest, while complex queries benefit from FROM_JSON or JMESPath, and outlines corresponding JSON generation methods and best‑practice recommendations.
Background: Processing large JSON fields in MaxCompute often requires extracting specific parts or generating new JSON structures.
Methods for JSON extraction compared: FROM_JSON , get_json_object , and custom UDFs based on JMESPath or JSONPath.
Selection criteria: simple field extraction (use get_json_object ); need to retain the original JSON or apply complex logic (use JMESPath); other cases (use FROM_JSON ).
Performance test on a table with 3,484,503 rows shows extraction speeds (rec/ms): get_json_object 146.951 (head) / 70.277 (tail), custom JMESPath UDFs ~15‑16, FROM_JSON 46.958 (head) / 29.239 (tail). Simple extraction is fastest, while complex logic incurs higher cost.
Recommendations: use FROM_JSON or JMESPath for complex queries, get_json_object for straightforward field or array‑index extraction.
JSON generation methods in MaxCompute: TO_JSON , custom UDFs ( bi_to_json_string , bi_to_json_string_ex , bi_put_json_values , bi_merge_json_array_distinct , bi_sort_json_array ), and JMESPath‑based concatenation.
Example of TO_JSON :
SELECT to_json(named_struct('a',1,'b',2));Example of a custom UDF for merging values:
SELECT bi_put_json_values('{"fav":[12,3,4]}','name',bi_to_json_string_ex(name),'age',bi_to_json_string_ex(age));Example of JMESPath merge to add a field to each array element:
SELECT jmespath_get_json('[{"a":1,"b":"b0"},{"a":1,"b":"b1"}]','map(&merge(@,{"c":1}),@)');Conclusion: choose the method based on whether you need to generate JSON from scratch, modify shallow structures, or perform conditional, deep transformations.
DaTaobao Tech
Official account of DaTaobao Technology
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.