How to Achieve Zero‑Copy String Construction Across JDK Versions
This article explains the internal differences of Java's String implementation from JDK 8 to JDK 9+, demonstrates how to use sun.misc.Unsafe and trusted MethodHandles.Lookup to build zero‑copy String objects, and provides practical code examples for high‑performance string handling.
1. JDK String Implementation
In JDK 8 a String stores its characters in a char[] value array and copies the array in its public constructor. The class looks like:
class String {
    char[] value;
    // Constructor copies the array
    public String(char[] value) {
        this.value = Arrays.copyOf(value, value.length);
    }
    // Non‑copying constructor used internally
    String(char[] value, boolean share) {
        this.value = value;
    }
}
From JDK 9 onward the representation changes to a byte[] value plus a byte coder field that indicates LATIN1 (0) or UTF‑16 (1). Most strings in typical applications contain only Latin‑1 characters and are stored one byte per character, so a byte[] that already holds Latin‑1 data can back a String directly, without copying.
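As a rough illustration of how the coder is chosen, the sketch below predicts which coder a string would get. The helper expectedCoder is hypothetical (it is not a JDK API), and it assumes compact strings are enabled, which is the default.

```java
final class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Hypothetical helper (not a JDK API): predicts the coder a JVM with
    // compact strings enabled would use, by checking whether every char
    // fits into a single byte.
    static byte expectedCoder(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    public static void main(String[] args) {
        System.out.println(expectedCoder("2024-01-31"));   // prints 0 (LATIN1)
        System.out.println(expectedCoder("caf\u00e9"));    // prints 0 (LATIN1)
        System.out.println(expectedCoder("\u65e5\u4ed8")); // prints 1 (UTF16)
    }
}
```

The simplified class that follows shows the fields this coder value is stored alongside.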
class String {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;
    byte coder;
    byte[] value;
    // Non‑copying constructor used internally
    String(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }
}
2. Using sun.misc.Unsafe
Unsafe provides low‑level operations that can bypass normal Java safety checks. The following utility obtains the singleton Unsafe instance:
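import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeUtils {
    public static final Unsafe UNSAFE;
    static {
        Unsafe unsafe = null;
        try {
            // theUnsafe is the private static field that holds the singleton
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            unsafe = (Unsafe) f.get(null);
        } catch (Throwable ignored) {
            // Unsafe is unavailable; UNSAFE stays null and callers must check
        }
        UNSAFE = unsafe;
    }
}
3. Trusted MethodHandles.Lookup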
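To invoke non‑public constructors or methods, a trusted MethodHandles.Lookup object is required. The code below reads the internal IMPL_LOOKUP field via Unsafe and creates a lookup that can access any JDK class. Reading the field through Unsafe avoids setAccessible, which strong encapsulation rejects for java.base internals on newer JDKs: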
static final MethodHandles.Lookup IMPL_LOOKUP;
static {
    MethodHandles.Lookup trusted = null;
    try {
        Class<?> lookupClass = MethodHandles.Lookup.class;
        Field f = lookupClass.getDeclaredField("IMPL_LOOKUP");
        Object base = UNSAFE.staticFieldBase(f); // holder object of the static field
        long offset = UNSAFE.staticFieldOffset(f);
        trusted = (MethodHandles.Lookup) UNSAFE.getObject(base, offset);
    } catch (Throwable ignored) {
        // lookup unavailable; IMPL_LOOKUP stays null
    }
    IMPL_LOOKUP = trusted;
}
public static MethodHandles.Lookup trustedLookup(Class<?> cls) throws Exception {
    return IMPL_LOOKUP.in(cls);
}
4. Zero‑Copy String Construction
Using the trusted lookup, a BiFunction that creates a String without copying can be built for each JDK version.
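Before the version‑specific variants, the mechanics of findConstructor and invokeExact can be sketched against the public, copying String(char[]) constructor, which needs no trusted lookup. The class and method names here are illustrative only; the point is the exact call‑site typing that invokeExact demands and the copy that the public constructor makes.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ConstructorHandleSketch {
    // Handle for the public, copying constructor String(char[]),
    // resolved once rather than on every call.
    static final MethodHandle COPYING_CTOR;
    static {
        try {
            COPYING_CTOR = MethodHandles.lookup().findConstructor(
                    String.class, MethodType.methodType(void.class, char[].class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String create(char[] chars) {
        try {
            // invokeExact requires the call site to match (char[])String
            // exactly, hence the explicit cast on the result.
            return (String) COPYING_CTOR.invokeExact(chars);
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};
        String s = create(chars);
        chars[0] = 'x'; // the public constructor copied, so s is unaffected
        System.out.println(s); // prints "abc"
    }
}
```

With a non‑copying constructor the same write to chars[0] would change the String, which is why arrays handed to a zero‑copy creator must never be modified afterwards.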
JDK 8
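// Resolve the non‑public String(char[], boolean) constructor once, through the trusted lookup
MethodHandle ctorJdk8 = trustedLookup(String.class).findConstructor(String.class,
        MethodType.methodType(void.class, char[].class, boolean.class));
BiFunction<char[], Boolean, String> STRING_CREATOR_JDK8 = (chars, share) -> {
    try {
        // invokeExact needs the exact (char[], boolean) call‑site type, so unbox explicitly
        return (String) ctorJdk8.invokeExact(chars, share.booleanValue());
    } catch (Throwable e) {
        throw new IllegalStateException(e);
    }
};
JDK 9‑15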
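// Resolve the non‑public String(byte[], byte) constructor once, through the trusted lookup
MethodHandle ctorJdk9 = trustedLookup(String.class).findConstructor(String.class,
        MethodType.methodType(void.class, byte[].class, byte.class));
BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11 = (bytes, coder) -> {
    try {
        // invokeExact needs the exact (byte[], byte) call‑site type, so unbox explicitly
        return (String) ctorJdk9.invokeExact(bytes, coder.byteValue());
    } catch (Throwable e) {
        throw new IllegalStateException(e);
    }
};
When the JVM is started with -XX:-CompactStrings, every String is stored as UTF‑16, so a byte[] built for the LATIN1 coder no longer yields a valid String and this trick must not be used.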
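One way to guard against that situation is to inspect the JVM's startup arguments before enabling the LATIN1 path. The sketch below is an assumption about how such a guard could look, not part of the original utility: the class name is hypothetical, it only checks for the literal flag, and flags supplied through other channels may not show up in this list.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class CompactStringsCheck {
    // True when the given JVM arguments contain the flag that turns
    // compact strings off. Only the literal flag is checked.
    static boolean disabledByFlag(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:-CompactStrings");
    }

    // Convenience overload that inspects the arguments of the running JVM.
    static boolean disabledByFlag() {
        return disabledByFlag(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }

    public static void main(String[] args) {
        System.out.println("compact strings disabled: " + disabledByFlag());
    }
}
```

If the check returns true, the safe choice is to skip the zero‑copy creator and build the String with a public constructor.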
5. Direct Access to String Internals
For JDK 8 the internal char[] value field can be read via Unsafe:
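static final Field FIELD_STRING_VALUE;
static final long FIELD_STRING_VALUE_OFFSET;
static {
    Field f = null;
    long offset = -1;
    try {
        f = String.class.getDeclaredField("value");
        offset = UNSAFE.objectFieldOffset(f);
    } catch (Throwable ignored) {
        // field lookup failed; getCharArray falls back to copying
    }
    FIELD_STRING_VALUE = f;
    FIELD_STRING_VALUE_OFFSET = offset;
}
public static char[] getCharArray(String s) {
    if (FIELD_STRING_VALUE_OFFSET == -1) {
        return s.toCharArray();
    }
    try {
        // Returns the backing array itself, not a copy: callers must not modify it
        return (char[]) UNSAFE.getObject(s, FIELD_STRING_VALUE_OFFSET);
    } catch (Exception e) {
        // e.g. ClassCastException on JDK 9+, where value is a byte[]
        return s.toCharArray();
    }
}
For JDK 9+ the coder() and value() accessor methods are not public either; they can be reached through the trusted lookup: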
MethodHandles.Lookup lookup = trustedLookup(String.class);
MethodHandle coderHandle = lookup.findSpecial(String.class, "coder", MethodType.methodType(byte.class), String.class);
MethodHandle valueHandle = lookup.findSpecial(String.class, "value", MethodType.methodType(byte[].class), String.class);
// invokeExact declares Throwable, so each lambda catches it and rethrows unchecked
ToIntFunction<String> STRING_CODER = (String s) -> {
    try { return (byte) coderHandle.invokeExact(s); } catch (Throwable e) { throw new IllegalStateException(e); }
};
Function<String, byte[]> STRING_VALUE = (String s) -> {
    try { return (byte[]) valueHandle.invokeExact(s); } catch (Throwable e) { throw new IllegalStateException(e); }
};
6. Practical Example: Fast Date Formatting
The following method formats a LocalDate to YYYY‑MM‑DD using the zero‑copy creators appropriate for the running JDK:
static String formatYYYYMMDD(LocalDate date) {
    // Assumes 0 <= year <= 9999; other years do not fit the 4‑digit layout
    int y = date.getYear();
    int m = date.getMonthValue();
    int d = date.getDayOfMonth();
    if (STRING_CREATOR_JDK11 != null) {
        byte[] bytes = new byte[10];
        bytes[0] = (byte) (y / 1000 + '0');
        bytes[1] = (byte) ((y / 100) % 10 + '0');
        bytes[2] = (byte) ((y / 10) % 10 + '0');
        bytes[3] = (byte) (y % 10 + '0');
        bytes[4] = '-';
        bytes[5] = (byte) (m / 10 + '0');
        bytes[6] = (byte) (m % 10 + '0');
        bytes[7] = '-';
        bytes[8] = (byte) (d / 10 + '0');
        bytes[9] = (byte) (d % 10 + '0');
        return STRING_CREATOR_JDK11.apply(bytes, (byte) 0); // LATIN1
    } else {
        char[] chars = new char[10];
        chars[0] = (char) (y / 1000 + '0');
        chars[1] = (char) ((y / 100) % 10 + '0');
        chars[2] = (char) ((y / 10) % 10 + '0');
        chars[3] = (char) (y % 10 + '0');
        chars[4] = '-';
        chars[5] = (char) (m / 10 + '0');
        chars[6] = (char) (m % 10 + '0');
        chars[7] = '-';
        chars[8] = (char) (d / 10 + '0');
        chars[9] = (char) (d % 10 + '0');
        return STRING_CREATOR_JDK8 != null ? STRING_CREATOR_JDK8.apply(chars, true) : new String(chars);
    }
}
This approach is considerably faster than using SimpleDateFormat or the standard java.time.format.DateTimeFormatter.
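Hand‑rolled digit arithmetic is easy to get subtly wrong, so it is worth checking against the standard formatter. The sketch below repeats the same arithmetic but builds the String with the public, copying constructor so that it runs on any JDK. The class name and the fallback for years outside 0 to 9999 are assumptions added for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateFormatCheck {
    // Same digit arithmetic as the method above, using the copying
    // public constructor instead of a zero-copy creator.
    static String formatCopying(LocalDate date) {
        int y = date.getYear();
        int m = date.getMonthValue();
        int d = date.getDayOfMonth();
        if (y < 0 || y > 9999) {
            // Outside the 4-digit layout: defer to the standard formatter
            return DateTimeFormatter.ISO_LOCAL_DATE.format(date);
        }
        byte[] bytes = {
            (byte) (y / 1000 + '0'), (byte) ((y / 100) % 10 + '0'),
            (byte) ((y / 10) % 10 + '0'), (byte) (y % 10 + '0'),
            '-',
            (byte) (m / 10 + '0'), (byte) (m % 10 + '0'),
            '-',
            (byte) (d / 10 + '0'), (byte) (d % 10 + '0')
        };
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Compare against ISO_LOCAL_DATE for every day of a leap year
        LocalDate day = LocalDate.of(2024, 1, 1);
        while (day.getYear() == 2024) {
            String expected = DateTimeFormatter.ISO_LOCAL_DATE.format(day);
            if (!expected.equals(formatCopying(day))) {
                throw new AssertionError("mismatch for " + expected);
            }
            day = day.plusDays(1);
        }
        System.out.println(formatCopying(LocalDate.of(2024, 2, 29))); // prints "2024-02-29"
    }
}
```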
7. Caveats
The techniques rely on internal APIs and unsafe operations; they should only be used by experienced developers who understand the risks, as incorrect usage can crash the JVM or break with future JDK releases.
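Since these internals can disappear in a future JDK release, one defensive pattern is to treat the zero‑copy creator as optional and fall back to the public API. The wrapper below is a minimal sketch under that assumption; SafeStringCreator and latin1 are hypothetical names, and the creator parameter stands in for something like STRING_CREATOR_JDK11.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.BiFunction;

final class SafeStringCreator {
    // Uses the zero-copy creator when one is available and working,
    // otherwise falls back to the public, copying constructor.
    static String latin1(byte[] bytes, BiFunction<byte[], Byte, String> zeroCopyCreator) {
        if (zeroCopyCreator != null) {
            try {
                return zeroCopyCreator.apply(bytes, (byte) 0); // 0 = LATIN1
            } catch (RuntimeException e) {
                // Internal constructor not usable on this JDK: fall through
            }
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = {'o', 'k'};
        System.out.println(latin1(bytes, null)); // prints "ok"
    }
}
```

A fallback of this kind only covers the case where the creator is missing or throws. It does not detect a creator that silently produces an invalid String, for example when compact strings are disabled.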
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
