GKI Transformation Principles and Implementation Methods
This article explains the requirements of Google's GKI transformation: preserving a stable Kernel Module Interface (KMI), using only exported and whitelisted symbols, and using the vendor hook mechanism for custom SoC/OEM code. It also covers how to detect interface mismatches and alternatives that avoid breaking the KMI, such as padding macros and existing kernel event registration interfaces.
Google requires all downstream manufacturers to use the Generic Kernel Image (GKI) on the android11-5.4 branch. To address kernel fragmentation, SoC- and device-specific code must be moved out of the core kernel into loadable modules (referred to here as GKI transformation). GKI provides a stable Kernel Module Interface (KMI), so that modules and the kernel can be updated independently. This article introduces the principles to follow during GKI transformation, the problems commonly encountered, and their solutions.
1. Cannot Break KMI
Once the KMI is frozen, it stays frozen for the lifetime of the branch. Changes that break the KMI are generally not accepted, unless a severe security issue is found that cannot be mitigated without affecting KMI stability. A frozen branch accepts only bug fixes and partner features that leave the KMI intact. New symbols may still be exported to extend the KMI as long as existing interfaces are unaffected, and once an interface has been added to the KMI it must itself remain stable and cannot be broken by later changes.
1. Problem Phenomenon:
After replacing the boot image with Google's boot.img, the serial console shows the following error during boot:
[ 1.669135] init: Loading module /lib/modules/foo.ko with args ""
[ 1.676281] foo: disagrees about version of symbol xxx
2. Cause Analysis:
This error typically occurs when the CRC of a symbol referenced by the .ko does not match the CRC recorded in vmlinux. For example, adding a field to a structure used by a KMI interface changes the interface definition indirectly, and with it the symbol's CRC.
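To see why a seemingly harmless field addition trips this check, consider a toy model. With symbol versioning enabled, each exported symbol's CRC is derived from its full type signature, including the layout of the structures it references. The sketch below is only an illustration of that idea, not the kernel's actual tooling: it hashes a textual signature with a plain CRC-32, and foo_submit, struct foo_req, and vendor_prio are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
uint32_t crc32_str(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t len = strlen(s);

    for (size_t i = 0; i < len; i++) {
        crc ^= (unsigned char)s[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical signature of an exported function as the frozen kernel sees it. */
const char *sig_vmlinux =
    "int foo_submit(struct foo_req { int id; u64 flags; } *)";

/* The same function after a vendor added a field to struct foo_req. */
const char *sig_module =
    "int foo_submit(struct foo_req { int id; u64 flags; u32 vendor_prio; } *)";

/* Model of the loader's check: the module loads only if both CRCs agree. */
int versions_match(const char *kernel_sig, const char *module_sig)
{
    return crc32_str(kernel_sig) == crc32_str(module_sig);
}

/* Prints both CRCs and, on mismatch, a message shaped like the boot log. */
void demo_crc_mismatch(void)
{
    printf("vmlinux crc: 0x%08x\n", (unsigned)crc32_str(sig_vmlinux));
    printf("module  crc: 0x%08x\n", (unsigned)crc32_str(sig_module));
    if (!versions_match(sig_vmlinux, sig_module))
        printf("foo: disagrees about version of symbol foo_submit\n");
}
```

The function prototype is identical in both signatures. Only the structure behind the pointer changed, which is enough to change the checksum and make the load fail.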
3. Recommended Methods:
(1) Extend native kernel structures and interfaces
(2) Apply to Google to add padding to native kernel structures
Google added two new macros in the android11-5.4 branch:
ANDROID_VENDOR_DATA: Reserves padding in structures for potential future use, typically at the end of structures, with padding variable identifiers starting from 1.
ANDROID_VENDOR_DATA_ARRAY: Similar to ANDROID_VENDOR_DATA, allocates an array of size s with u64 type elements.
The following example shows how to use ANDROID_VENDOR_DATA and ANDROID_VENDOR_DATA_ARRAY to add new fields to kernel structures:
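A minimal userspace sketch of the two padding macros in use follows. The macro bodies are assumed to match include/linux/android_vendor.h and should be checked against the tree. struct foo_device, the slot indices, and foo_set_boost are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Userspace stand-ins; assumed to match include/linux/android_vendor.h. */
#define ANDROID_VENDOR_DATA(n)          u64 android_vendor_data##n
#define ANDROID_VENDOR_DATA_ARRAY(n, s) u64 android_vendor_data##n[s]

/* Hypothetical core-kernel structure with padding reserved at its end. */
struct foo_device {
    int id;
    unsigned long state;

    ANDROID_VENDOR_DATA(1);          /* one u64 slot */
    ANDROID_VENDOR_DATA_ARRAY(2, 4); /* four u64 slots */
};

/* Vendor-side meaning of the reserved array slots (hypothetical). */
enum {
    FOO_VENDOR_BOOST_LEVEL   = 0,
    FOO_VENDOR_LAST_BOOST_NS = 1,
};

/* The vendor module stores its data in the padding; the layout is untouched. */
static inline void foo_set_boost(struct foo_device *dev, u64 level, u64 now_ns)
{
    dev->android_vendor_data2[FOO_VENDOR_BOOST_LEVEL] = level;
    dev->android_vendor_data2[FOO_VENDOR_LAST_BOOST_NS] = now_ns;
}

void demo_vendor_padding(void)
{
    struct foo_device dev = { .id = 7 };

    dev.android_vendor_data1 = 0x1234; /* single slot used directly */
    foo_set_boost(&dev, 3, 1000);

    printf("sizeof(struct foo_device) = %zu\n", sizeof(dev));
    printf("boost level = %llu\n",
           (unsigned long long)dev.android_vendor_data2[FOO_VENDOR_BOOST_LEVEL]);
}
```

Because the vendor data lives in space that was already part of the structure when the KMI was frozen, the structure's size and member offsets stay the same, and so do the CRCs of the symbols that reference it.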
4. Measures:
Add a check to the build script. After the GKI kernel is built, compare the CRC values of the generated symbols with those in the native android/abi_gki_aarch64.xml file. If the values do not match, the build fails, which indicates that a native interface or structure was modified in a non-standard way.
2. Kernel Modules Can Only Use Exported and Whitelisted Interfaces
The android11-5.4 branch build.config.gki.aarch64 file has the following configuration:
This indicates modules can only use symbols in the abi_gki_aarch64 file.
1. Problem Phenomenon:
After replacing the boot image with Google's boot.img, the serial console shows the following error:
[ 1.735506] foo: Unknown symbol xxx (err -2)
2. Cause Analysis:
Either the native kernel does not export interface xxx with EXPORT_SYMBOL_GPL, or the interface is exported but has not been added to the whitelist (for example, because its ABI may be unstable).
3. Recommended Methods:
(1) Google recommends applying to the upstream Linux community to export the kernel interfaces to be used, and applying to Google to add the interface to the whitelist android/abi_gki_aarch64_xxx
(2) Use other interfaces already on the whitelist as alternatives.
3. Vendor Hook Mechanism
Considering that SoC and OEM manufacturers may need to make custom modifications and optimizations to the native kernel, Google provides a vendor hook mechanism. Downstream manufacturers add hooks where kernel source code needs to be modified and apply to Google to upstream the patch to AOSP.
1. Vendor Hook Implementation Steps:
(1) Create a new header file xxx.h in the include/trace/hooks/ directory, defining a set of hook interfaces register_trace_android_vh_xxx, trace_android_vh_xxx, and the global variable __tracepoint_android_vh_xxx
(2) Include the hook header file xxx.h in drivers/android/vendor_hooks.c and export the hook variable __tracepoint_android_vh_xxx for module use
(3) In the kernel module, add register code to bind the callback function to the hook variable __tracepoint_android_vh_xxx
(4) In the kernel xxx.c file, include the hook header file xxx.h and call the hook interface trace_android_vh_xxx, which invokes the callback function bound to the hook variable __tracepoint_android_vh_xxx
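The four steps above can be modelled in plain userspace C. This is a sketch of the control flow only. In the kernel, the hook variable and the register_trace_/trace_ functions are generated from tracepoints rather than written by hand. The hook name "demo", core_pick_level, and the vendor callback are hypothetical, and the leading double underscore of the hook variable is dropped because such names are reserved in userspace C.

```c
#include <stddef.h>

/* Callback type: like a tracepoint probe, it receives private data first. */
typedef void (*demo_probe_t)(void *data, int request, int *result);

struct hook_model {
    demo_probe_t probe;
    void *data;
};

/* Steps 1 and 2: the hook variable, defined once and visible to modules. */
struct hook_model tracepoint_android_vh_demo;

/* Step 3 (module side): bind a callback to the hook variable. */
int register_trace_android_vh_demo(demo_probe_t probe, void *data)
{
    if (!probe)
        return -1;
    tracepoint_android_vh_demo.probe = probe;
    tracepoint_android_vh_demo.data = data;
    return 0;
}

/* Step 4 (kernel side): call the hook; a no-op when nothing is bound. */
static inline void trace_android_vh_demo(int request, int *result)
{
    struct hook_model *h = &tracepoint_android_vh_demo;

    if (h->probe)
        h->probe(h->data, request, result);
}

/* Core kernel function carrying the hook. */
int core_pick_level(int request)
{
    int level = request; /* default behaviour when no vendor is bound */

    trace_android_vh_demo(request, &level);
    return level;
}

/* Vendor module: cap the level at a vendor-specific limit. */
static int vendor_cap = 5;

static void vendor_cap_level(void *data, int request, int *result)
{
    int cap = *(int *)data;

    if (request > cap)
        *result = cap;
}

/* Stand-in for the vendor module's init function. */
int vendor_module_init(void)
{
    return register_trace_android_vh_demo(vendor_cap_level, &vendor_cap);
}
```

Before vendor_module_init() runs, core_pick_level() returns its default. Afterwards the vendor callback can adjust the result, and the core function itself contains no vendor code.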
2. Problem Phenomenon:
Testing occasionally produces a dump beginning with "BUG: scheduling while atomic:"
3. Cause Analysis:
Vendor hook variables come in two types, both based on tracepoints:
Normal: Uses the DECLARE_HOOK macro to create a tracepoint function trace_<name>, where <name> must be a unique identifier for the trace. Bound callbacks run with preemption disabled, so they must not sleep or otherwise trigger scheduling.
Restricted: Declared with the DECLARE_RESTRICTED_HOOK macro and used for cases such as scheduler hooks, where the bound callback may be called while the CPU is offline or in a non-atomic context (preemption is not disabled before the call). A restricted vendor hook cannot be unbound, so a module that binds to one can never be unloaded, and only one binding is allowed (any further binding returns -EBUSY).
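The single-binding rule of restricted hooks can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: the hook name "demo" and the two vendor probes are hypothetical, and only the -EBUSY behaviour described above is modelled.

```c
#include <errno.h>
#include <stddef.h>

typedef void (*rvh_probe_t)(void *data, int cpu);

/* Model of a restricted hook: at most one probe, and no unregister function. */
static rvh_probe_t rvh_demo_probe;
static void *rvh_demo_data;

int register_trace_android_rvh_demo(rvh_probe_t probe, void *data)
{
    if (rvh_demo_probe)
        return -EBUSY; /* a second binding is refused */
    rvh_demo_probe = probe;
    rvh_demo_data = data;
    return 0;
}

/* Call site in the modelled scheduler path. */
void trace_android_rvh_demo(int cpu)
{
    if (rvh_demo_probe)
        rvh_demo_probe(rvh_demo_data, cpu);
}

/* Two hypothetical vendor modules competing for the same hook. */
void vendor_a_probe(void *data, int cpu) { *(int *)data += cpu; }
void vendor_b_probe(void *data, int cpu) { *(int *)data -= cpu; }
```

Whichever module registers first owns the hook for the rest of the system's uptime, so a module that binds to a restricted hook should check the return value of the register call.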
4. Recommended Methods:
Choose appropriate vendor hook variables based on usage scenarios. Use restricted vendor hooks in scenarios where scheduling may occur.
4. Vendor Hook Extension
SoC and OEM features must be separated from the kernel and compiled into kernel modules. Calling exported interfaces between kernel source code is not a problem, but what about calling exported interfaces between modules?
1. Problem Phenomenon:
Compilation error: depmod: ERROR: Found 2 modules in dependency cycles!
2. Cause Analysis:
When two modules call each other's exported interfaces, they form a dependency cycle, which depmod rejects.
3. Recommended Methods:
Inspired by Google's vendor hook mechanism, define and export global variables in Module A, and in Module B's initialization function, register and bind the callback function to this global variable. This way, only Module B calls variables and interfaces in Module A, and Module A calls back Module B's interface through the hook variable, solving the compilation call problem.
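The approach described above can be sketched as follows. Both "modules" are shown in one userspace file for illustration, and all names (mod_a_notify_hook, mod_a_get_status, mod_b_on_event, and so on) are hypothetical. In a real kernel module the global variable would be exported from Module A with EXPORT_SYMBOL_GPL.

```c
#include <stddef.h>

/* ---- Module A: exports a service function and a hook variable ---- */
typedef int (*mod_a_notify_t)(int event);

/* Hook variable; exported to other modules in a real build. */
mod_a_notify_t mod_a_notify_hook;

/* Ordinary exported interface that Module B may call directly. */
int mod_a_get_status(void)
{
    return 42;
}

/* Module A reaches Module B only through the hook, never by symbol name. */
int mod_a_raise_event(int event)
{
    return mod_a_notify_hook ? mod_a_notify_hook(event) : 0;
}

/* ---- Module B: depends on A, while A has no symbol dependency on B ---- */
static int mod_b_on_event(int event)
{
    return mod_a_get_status() + event;
}

/* Stand-in for Module B's init function: bind the callback to A's hook. */
int mod_b_init(void)
{
    if (mod_a_notify_hook)
        return -1; /* already bound by someone else */
    mod_a_notify_hook = mod_b_on_event;
    return 0;
}
```

Module A never references a symbol defined in Module B, so the dependency graph has a single edge from B to A and the cycle disappears. Module B must clear the hook variable in its exit function before it is unloaded, otherwise Module A would call into freed code.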
5. Using Existing Kernel Event Registration Interfaces Instead of Vendor Hook
Is vendor hook the only way to modify the kernel?
In the android11-5.4 branch log, there is a commit about vendor hook:
Adding vendor hook for key combinations
The kernel already has an interface, input_register_handle, whose comment says that it registers a new input handle and puts it on the device's and the handler's lists, so that events can flow through the handle once it has been opened with input_open_device(). The comment also says that input_register_handle should be called from the handler's connect() method. Therefore, input_register_handler and input_register_handle can be used to implement the key combination feature without adding a vendor hook to the kernel.
This example suggests asking, before adding a vendor hook, whether the hook mechanism is really the only option, or whether an existing kernel mechanism or event registration interface can be used to separate a feature originally embedded in the kernel.
OPPO Kernel Craftsman