Long Story Short
Compiler vulnerabilities tend to be overlooked because compiler developers may perceive them as either having little to no consequences or being easily avoidable in the final DApps. However, DApp developers who are unaware of these compiler bugs are likely to miss the unintended effects those bugs have on their products. The information gap between the two groups can lead to severe incidents, resulting in significant financial loss. Notable examples include the Vyper reentrancy bug, which led to the loss of over $26 million held in smart contracts.
In this article, we will dive into the details of the Fuel-Swaylend buffer overflow vulnerability, which led to an incident where the contract was halted for 2 days. The story begins with an issue reported by our team during the Fuel Attackathon, an audit competition held on Immunefi, along with 20+ other Sway miscompilation bugs. These bugs, however, were not prioritized for immediate fixing.
Months later, shortly after Fuel launched its mainnet, we noticed Swaylend transactions failing with error code 123. This error, which rarely occurs under normal operation, coincided with the impact of one of the compiler bugs we had reported during the Attackathon. Our further investigation confirmed that the bug was indeed the root cause. Consequently, Swaylend was paused until the compiler bug was fixed and a newly recompiled version was deployed. However, a 2-day shutdown had already occurred.
While the bugs we reported were initially not considered high priority, the incident highlighted their potential severity, prompting further attention to these issues. Some of the bugs still remain unresolved, pointing to the need for ongoing vigilance in addressing compiler-related vulnerabilities.
Now, let’s explore the technical details in more depth, shall we? :)
The Incident
Discovery
While casually browsing through Fuel transactions, we noticed certain Swaylend transactions were failing with error code 123. This was alarming, as code 123 is reserved for mismatched selector reverts, in other words, calling an unknown function of an external contract.
pub(crate) const MISMATCHED_SELECTOR_REVERT_CODE: u32 = 123;
Such errors rarely occur during normal operation. Moreover, this coincided with an observable artifact of one of the compiler bugs we reported during the Fuel Attackathon, leading us to suspect that it might have a similar root cause. We will walk you through our journey of investigating the issue, as well as highlight key takeaways.
Investigation
Starting with the failing transaction, we first need to gather information on what happened. The Operations section of the transaction provides a simple overview of the events. It shows that the execution begins with a script calling the Sway proxy contract (0x657ab45a6eb98a4893a99fd104347179151e8b3828fd8f2a108cc09770d1ebae), which then calls the Pyth oracle contract (0x1c86fdd9e0e7bc0d2ae1bf6817ef4834ffa7247655701ee1b031b52a24c523da) before reverting. While this is helpful, it doesn’t reveal which function of the Sway contract was called. To determine that, we need to examine the script.
The script, along with the script data, can be obtained from the advanced transaction view. The script itself is quite short, and when plugged into the disassembler, the logic is also fairly straightforward: it simply calls a contract with parameters from the script data.
1 | byte op notes |
Let’s examine the script data to identify which functions are called. Consulting the core library, we see that target_struct (or params) is the serialization of 3 fields: contract_id, method_name, and the other function arguments.
1 | pub fn contract_call<T, TArgs>( |
Cross-referencing it with the script data, we find that the contract_id is 0x657ab45a6eb98a4893a99fd104347179151e8b3828fd8f2a108cc09770d1ebae, which matches the call we observed earlier. The method pointer points to the address 0x28f0, which holds the string withdraw_collateral with its length prepended. From the Swaylend contract, we find the function signature for withdraw_collateral is fn withdraw_collateral(asset_id: AssetId, amount: u64, price_data_update: PriceDataUpdate). We then proceed to decode the arguments as (AssetId, u64, PriceDataUpdate). The respective fields are annotated below (type definitions).
1 | 0x2890(10384) : 00 00 00 00 00 00 00 07 |
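Reading a length-prefixed string such as the method name out of the script data takes only a few lines. The sketch below is a minimal illustration, not the decoder used by any Fuel tooling. It assumes the length is stored as an 8-byte big-endian word immediately before the UTF-8 bytes, and read_len_prefixed_str is a hypothetical helper name.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical helper: reads a string stored as an 8-byte big-endian length
/// followed by that many UTF-8 bytes, starting at `offset` in `data`.
/// Returns None if the data is truncated or the bytes are not valid UTF-8.
pub fn read_len_prefixed_str(data: &[u8], offset: usize) -> Option<&str> {
    // Assumed layout: [len: u64 big-endian][len bytes of UTF-8 text].
    let len_end = offset.checked_add(8)?;
    let len_bytes: [u8; 8] = data.get(offset..len_end)?.try_into().ok()?;
    let len = usize::try_from(u64::from_be_bytes(len_bytes)).ok()?;

    // Bounds-checked slice of the string body.
    let str_end = len_end.checked_add(len)?;
    std::str::from_utf8(data.get(len_end..str_end)?).ok()
}
```

Calling it with the script data and the offset that corresponds to the method pointer should yield the method name, provided the layout assumption holds.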
Bug Analysis
Now that we know withdraw_collateral is called and have its arguments, we are ready to dive into the code. We also know the actual failure occurs when calling the pyth contract, so let’s jump directly to update_price_feed_if_necessary_internal within withdraw_collateral, where the pyth contract is called. This is when things start getting interesting. The failure happens after calling oracle.update_price_feeds_if_necessary, but Swaylend uses the correct ABI, so what can possibly go wrong?
1 | impl Market for Contract { |
This brings us to the compiler internals of Fuel. For contract ABI method calls, Fuel automatically translates them into the contract_call function defined in the core library, which we’ve already shown above. So, we can mentally unpack oracle.update_price_feeds_if_necessary into an explicit call instead.
1 | pub(crate) fn type_check_method_application( |
contract_call is responsible for several tasks: serializing the arguments, calling the external contract, and then deserializing the return value. The panic occurred when an external contract was called and the function name could not be found. This indicates either the serialized function name provided to the external contract is incorrect, or the function dispatching in the external contract does not work properly.
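To picture where the revert comes from, callee-side dispatch can be modeled as a comparison of the received method name against the known entry points. This is a toy model, not the code the Sway compiler generates: dispatch and the two method names are placeholders, and only the revert code 123 comes from the constant quoted earlier.

```rust
/// Revert code reserved for a mismatched selector (value quoted earlier).
pub const MISMATCHED_SELECTOR_REVERT_CODE: u32 = 123;

/// Toy model of callee-side dispatch: compare the received method name
/// against the known entry points and revert when nothing matches.
/// The two method names are placeholders, not a real contract's ABI.
pub fn dispatch(method_name: &[u8]) -> Result<&'static str, u32> {
    match method_name {
        b"update_price_feeds" => Ok("update_price_feeds"),
        b"price" => Ok("price"),
        // Any corrupted or unknown name ends up here.
        _ => Err(MISMATCHED_SELECTOR_REVERT_CODE),
    }
}
```

In this model, a name that differs from every entry point by even a single byte is rejected with code 123, which is the symptom seen in the failing transactions.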
Before we dig further into the Swaylend incident, let’s take a step back and discuss the compiler bug we mentioned earlier, which we discovered during the Fuel Attackathon. This will provide important context for the Swaylend case when we revisit it later.
The codec library defines a trait called AbiEncode, used for encoding data. Any structure passed across contract boundaries must implement this trait for the compiler to be able to serialize it.
1 | pub trait AbiEncode { |
At the core of the trait is a Buffer structure, which is used to track encoded data. A Buffer is created with the __encode_buffer_empty intrinsic, and serialized structure bytestreams are appended to it through the __encode_buffer_append intrinsic. Once encoding is complete, the Buffer is destructured into a raw_slice using the __encode_buffer_as_raw_slice intrinsic.
1 | pub struct Buffer { |
While the usage of all these intrinsics may seem overwhelming at first, the compiler implementations are actually quite simple.
In EncodeBufferEmpty, the compiler allocates a memory chunk of size 1024, and packs the (ptr, capacity = 1024, len = 0) tuple into the Buffer structure before returning it.
1 | Intrinsic::EncodeBufferEmpty => { |
Appending to the Buffer is slightly more involved, but it can be broken down into a few simple steps:
- Calculate the address of &Buffer.ptr[Buffer.len].
- Store the encoded data at the calculated address.
- Increase Buffer.len.
1 | Intrinsic::EncodeBufferAppend => { |
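The three steps above can be sketched on a flat byte array standing in for VM memory. This is a simplified model under our own assumptions, not the compiler's IR-generation code: the Buffer struct below merely mirrors the (ptr, capacity, len) triple, and append_unchecked is a hypothetical name.

```rust
/// Simplified model of the encoding buffer: a (ptr, cap, len) triple that
/// points into a flat byte array standing in for VM memory.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Buffer {
    pub ptr: usize,
    pub cap: usize,
    pub len: usize,
}

/// Models an append that follows only the three steps listed above.
/// Note that `buf.cap` is never consulted.
pub fn append_unchecked(mem: &mut [u8], buf: Buffer, data: &[u8]) -> Buffer {
    // Step 1: compute the address of &buf.ptr[buf.len].
    let dst = buf.ptr + buf.len;
    // Step 2: store the encoded bytes there (no comparison against buf.cap).
    mem[dst..dst + data.len()].copy_from_slice(data);
    // Step 3: bump the length by the number of bytes written.
    Buffer { len: buf.len + data.len(), ..buf }
}
```

With a 4-byte buffer and a 6-byte append, the returned len exceeds cap and the last two bytes land outside the chunk that was allocated for the buffer.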
And EncodeBufferAsRawSlice packs Buffer.ptr and Buffer.len into a raw_slice structure.
1 | Intrinsic::EncodeBufferAsRawSlice => { |
It is clear that EncodeBufferAppend contains a critical bug: the buffer is never resized when the encoded data exceeds the original buffer length. If the encoded data is large, the append operation will silently overflow the allocated heap memory and overwrite subsequent data.
So, what consequences could this bug have? To answer that, we need to understand what lies after the overflowed data chunk. The Fuel VM heap grows from high memory towards low memory and is never garbage collected. Thus, chunks allocated later are always placed at lower memory addresses than those allocated earlier. As a result, a sufficiently large overflow on a newer chunk can always overwrite data in an older chunk.
1 | Fuel VM Heap Layout |
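The allocation order described above can be sketched with a toy bump allocator. This is an illustrative model, not the Fuel VM's allocator: DownwardHeap and its methods are hypothetical names, and the only behavior it captures is that the heap pointer moves toward lower addresses and nothing is ever freed.

```rust
/// Toy bump allocator that mimics a heap growing from high addresses
/// toward low ones, with no freeing or garbage collection.
pub struct DownwardHeap {
    /// Lowest address handed out so far (the heap pointer).
    pub hp: usize,
}

impl DownwardHeap {
    /// Creates a heap whose first allocation will end at `top`.
    pub fn new(top: usize) -> Self {
        Self { hp: top }
    }

    /// Allocates `size` bytes and returns the start address of the chunk.
    /// Panics on underflow in debug builds; good enough for a sketch.
    pub fn alloc(&mut self, size: usize) -> usize {
        self.hp -= size;
        self.hp
    }
}
```

Two consecutive 1024-byte allocations come back adjacent, with the newer chunk ending exactly where the older one begins, so a write that runs past the end of the newer chunk lands in the older one.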
In contract_call, we can identify 3 encodings at play. The method_name is generally hardcoded and short, making it unlikely to overflow during encoding. On the other hand, the args are often user-controllable and can have dynamic lengths, and are thus susceptible to overflow. The same applies to params. Since method_name is encoded before args, the heap chunk in the Buffer for the first_parameter (method_name) is allocated before, and therefore sits at a higher address than, the heap chunk in the Buffer for second_parameter (args). This means a sufficiently long args can overflow during execution and overwrite the method_name being called, resulting in the unknown function name error we observed.
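The overwrite can be reproduced in a small simulation. This is a sketch under simplifying assumptions, not the actual contract_call encoding: it models two adjacent 1024-byte chunks on a downward-growing heap, ignores length prefixes and the params encoding, and encode_call is a hypothetical name.

```rust
/// Size of each encoding buffer in this model (the 1024-byte chunk size
/// described earlier).
pub const CHUNK: usize = 1024;

/// Simulates encoding `method_name` first and `args` second into two
/// fixed-size chunks of a downward-growing heap, with no bounds check on the
/// second write. Returns the bytes the callee would read as the method name.
/// Toy-model limits: method_name.len() <= CHUNK and args.len() <= 2 * CHUNK.
pub fn encode_call(method_name: &[u8], args: &[u8]) -> Vec<u8> {
    let mut mem = vec![0u8; 4 * CHUNK];
    let mut hp = mem.len(); // heap pointer starts at the top of memory

    // First allocation: buffer for method_name (higher address).
    hp -= CHUNK;
    let name_ptr = hp;
    mem[name_ptr..name_ptr + method_name.len()].copy_from_slice(method_name);

    // Second allocation: buffer for args, directly below the first.
    hp -= CHUNK;
    let args_ptr = hp;
    // Unchecked append: anything beyond CHUNK bytes spills into name_ptr.
    mem[args_ptr..args_ptr + args.len()].copy_from_slice(args);

    mem[name_ptr..name_ptr + method_name.len()].to_vec()
}
```

Short arguments leave the method name intact, while arguments longer than CHUNK replace its leading bytes with whatever follows the first 1024 bytes of args.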
Returning to Swaylend, is this what happened? Close, but not exactly. It turns out the Fuel team had attempted to fix this bug at one point. In this commit, they added code to double the size of the Buffer whenever it runs out of space. Unfortunately, doubling the buffer size was not enough to fully resolve the bug. Take the failing transaction as an example: the final field to serialize is 0xba3 bytes, and the entire param exceeds 0xc00 bytes. Since doubling the 1024-byte Buffer to 2048 bytes is not enough to store the entire param, the encoding still overflows into method_name, corrupting it.
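The arithmetic is easy to check. The sketch below contrasts a single doubling with one possible alternative that keeps growing until the data fits. Both functions are illustrative only: the names are hypothetical, and grow_until_fits is not a description of the fix that was eventually shipped.

```rust
/// Capacity after the one-time doubling described above.
pub fn grow_once(cap: usize) -> usize {
    cap * 2
}

/// One possible alternative: keep doubling until `needed` bytes fit.
/// Illustrative sketch only, not the fix that was shipped.
pub fn grow_until_fits(cap: usize, needed: usize) -> usize {
    // Guard against a zero capacity, which would never grow by doubling.
    let mut cap = cap.max(1);
    while cap < needed {
        cap *= 2;
    }
    cap
}
```

For a 1024-byte buffer and a payload of 0xc00 (3072) bytes, one doubling yields 2048 bytes, which is still too small, whereas growing until the payload fits yields 4096 bytes.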
The End of the Story?
Reverting when it shouldn’t is bad enough on its own. But hold on: does an overflow always end with a revert? Let’s consider the bug more carefully. What if attackers craft their overflowing encoded argument to control the method_name, directing it to an existing function rather than some corrupted data? The hypothetical DApp below demonstrates how this could turn the bug into a serious loss-of-funds issue. Readers are encouraged to take some time with this to truly understand how the bug works 0.<
1 | contract; |
1 | contract; |
Besides controlling the method_name to redirect code execution, other attack vectors also exist. Since the overflow doesn’t necessarily stop at the method_name buffer, if there is other data placed on the heap before the call, an attacker could tamper with that as well. The potential for powerful exploits surrounding this bug is truly unlimited.
On the bright side for Swaylend, the Pyth contract they’re calling doesn’t have any functionality that could enable a more severe attack. Additionally, there’s no useful data on the heap for an attacker to corrupt. This limits the impact of the bug to transaction failures only. However, other DApps may not be as fortunate. Our suggestion to Sway developers is to review your code and check whether any dynamic-length arguments are passed between contracts. If there are, recompile your contract with the latest version of the Sway compiler and upgrade it immediately.
Reflection
So, what can we learn from the incident, and why do we call this a “preventable” issue?
Let’s take a look at the reporting timeline:
- 6/19 : We reported the compiler bug to Fuel via Immunefi, but the report was automatically closed because the contest didn’t include an appropriate impact option. The custom impact we provided—“Incorrect Sway intrinsics leading to Fuel heap buffer overflow”—was deemed out of scope
- 7/1 : We reached out to Immunefi and received a response that they would ask Fuel to review the reports that were automatically, but incorrectly, closed
- 8/23 : Long after the end of the Attackathon, we reminded Immunefi and Fuel that the report had not been reviewed
- 8/26 : The report was once again automatically closed due to “out of scope” impacts
- 8/30 : The report was accepted, but its severity was downgraded from Critical to Low
- 8/30 : We provided the proof of concept DApp above to strengthen our claim that the bug could have severe impacts, but were unable to convince Fuel and Immunefi to reassess its severity
- 10/31: Two months later, we noticed transactions failing with error code 123 (mismatched selector reverts)
- 11/1 : Swaylend was halted
- 11/3 : Swaylend was recompiled and upgraded in this transaction
Compiler bug severities can be a source of contention. The main arguments for assigning a lower severity are:
- Compiler bugs rarely make it to production. They are typically caught by DApp developers during testing and can easily be identified and fixed.
- Compiler bugs don’t have an immediate impact, so by nature, they can’t be considered severe.
- It’s uncommon for DApps to encounter compiler bugs, as code that triggers them often involves anti-patterns.
On the other hand, the counterarguments for high severity are:
- It is unreasonable to expect DApp developers to catch compiler bugs during testing. Testing coverage is often insufficient, and even with high coverage, bugs may still go undetected.
- Programming languages are meant to provide developers with a trusted foundation. If developers cannot rely on a language to function as intended, building anything useful becomes impossible.
- Nearly all miscompilations have the potential to lead to critical consequences. If not taken seriously, it’s only a matter of time before compiler bugs result in significant losses.
While the consequences of compiler bugs are still up for debate, we want to highlight that negligence in compiler security has already had visible impacts in the industry. A few well-known examples include:
- The Vyper reentrancy bug => over $26 million stolen.
- The ZKSync-Aave optimization bug => fortunately identified before activation.
- The Fuel-Swaylend buffer overflow vulnerability => Swaylend halted for 2 days.
Although it is common for people to underestimate the potential impact of vulnerabilities that have yet to be exploited, recent examples demonstrate that our industry has reached a point where threats such as compiler issues are imminent. Some of these incidents could likely have been avoided if reported vulnerabilities had received more attention and if security researchers had been more actively engaged in reviewing fixes.
Bugs are an inherent part of the development process, and determining the timing and approach for addressing them is a crucial decision. The more seriously security bugs are handled, the less likely they are to come back to bite us later. If you are concerned about compiler bugs affecting your contracts or need help assessing the risks, contact us at th3.anatomist@gmail.com. We can help you conduct the most thorough and rigorous review.
In our next post, we will dive into the Sway compiler, breaking down the pipeline of modern compilers and examining the bugs we discovered during the Attackathon. Feel free to follow us on X to stay tuned!