Skip to main content

Let's flood a HashMap!

· 14 min read
Rynco Maekawa

This article gives a brief introduction of the structure of a hash table, demonstrates hash flooding attack -- a common attack on it, and how to militate it when implementing this data structure.

Everybody loves hashmaps.

They provide a blazing fast average O(1)O(1) access* to associate any value to any key, asking for only two things in return: an equality comparer and a hash function, nothing more. This unique property makes hashmaps often more efficient than other associative data structures like search trees. As a result, hashmaps are nowadays one of the most used data structures in programming languages.

From the humble dict in Python, to databases and distributed systems, and even JavaScript objects, they're everywhere. They power database indexing systems, enable efficient caching mechanisms, and form the backbone of web frameworks for routing requests. Modern compilers use them for symbol tables, operating systems rely on them for process management, and virtually every web application uses them to manage user state.

Whether you're building a web server, parsing JSON values, dealing with configurations, or just counting word frequencies, chances are you'll reach for a hashmap. They've become so fundamental that many developers take their O(1)O(1) magic for granted -- but the 11 in O(1)O(1) has got some strings* attached.

The anatomy of a hashmap

A hashmap is made of two parts: a bucket array and a hash function.

struct MyHashMap[K, V] {
  
Array[ChainingBucket[K, V]]
buckets
:
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
struct ChainingBucket[K, V] {
  values: Array[(K, V)]
}
Bucket
[

type parameter K

K
,

type parameter V

V
]]
(K) -> UInt
hash_fn
: (

type parameter K

K
) ->
UInt
UInt
}

The bucket array contains a list of what we call "buckets". Each bucket stores some data we have inserted.

The hash function H associates each key with an integer. This integer is used to find an index in the bucket array to store our value. Usually, the index is derived by simply moduloing the integer with the size of the bucket array, i.e. index = H(key) % bucket_array_size. The hashmap expects the function to satisfy two important properties:

  1. The same key is always converted to the same number. i.e., if a == b, then H(a) == H(b).

    This property ensures that, once we have found a bucket to insert using a key, we can always find the same bucket where it has been inserted, using the same key.

  2. The resulting number is distributed uniformly across the space of possible results for different keys.

    This property ensures that different keys are unlikely to have the same associated integer, and in consequence, unlikely to be mapped to the same bucket in the array, allowing us to retrieve the value efficiently.

Now, you may ask, what would happen if two keys map to the same bucket? This comes to the realm of hash collisions.

Hash collisions

When two keys have the same hash value, or more broadly, when two keys map to the same bucket, a hash collision occurs.

As hashmaps determines everything based on the hash value (or bucket index), the two keys now look the same to the hashmap itself -- they should be put into the same place, but still unequal enough to not overwriting each other.

Hashmap designers have a couple of strategies to deal with collisions, which fall into one of the two broad categories:

  • The chaining method puts these keys in the same bucket. Each bucket now may contain the data for a number of keys, instead of just one. When searching for a colliding key, all keys in the bucket are searched at once.

    struct ChainingBucket[K, V] {
      
    Array[(K, V)]
    values
    :
    type Array[T]

    An Array is a collection of values that supports random access and can grow in size.

    Array
    [(

    type parameter K

    K
    ,

    type parameter V

    V
    )]
    }

    Java's HashMap is a popular example of this approach.

  • The open addressing method still has one key per bucket, but uses a separate strategy to choose another bucket index when keys collide. When searching for a key, buckets are searched in the order of the strategy until the it is obvious that there are no more keys that could match.

    struct OpenAddressBucket[K, V] {
      
    Int
    hash
    :
    Int
    Int
    K
    key
    :

    type parameter K

    K
    V
    value
    :

    type parameter V

    V
    }

    MoonBit's standard library Map is an example of this approach.

Either case, when a hash collision happens, we have no choice but to search through everything corresponding to the bucket we've found, to determine whether the key we are looking for is there or not.

Using a chaining hashmap (for simplicity), the whole operation looks something like this:

typealias 
struct ChainingBucket[K, V] {
  values: Array[(K, V)]
}
ChainingBucket
as Bucket
/// Search for the place where the key is stored. /// /// Returns `(bucket, index, number_of_searches_done)` fn[K :
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
, V]
struct MyHashMap[K, V] {
  buckets: Array[ChainingBucket[K, V]]
  hash_fn: (K) -> UInt
}
MyHashMap
::
fn[K : Eq, V] MyHashMap::search(self : MyHashMap[K, V], key : K) -> (Int, Int?, Int)

Search for the place where the key is stored.

Returns (bucket, index, number_of_searches_done)

search
(
MyHashMap[K, V]
self
:
struct MyHashMap[K, V] {
  buckets: Array[ChainingBucket[K, V]]
  hash_fn: (K) -> UInt
}
MyHashMap
[

type parameter K

K
,

type parameter V

V
],
K
key
:

type parameter K

K
) -> (
Int
Int
,
Int
Int
?,
Int
Int
) {
let
UInt
hash
= (
MyHashMap[K, V]
self
.
(K) -> UInt
hash_fn
)(
K
key
)
let
Int
bucket
= (
UInt
hash
fn Mod::mod(self : UInt, other : UInt) -> UInt

Calculates the remainder of dividing one unsigned integer by another.

Parameters:

  • self : The unsigned integer dividend.
  • other : The unsigned integer divisor.

Returns the remainder of the division operation.

Throws a panic if other is zero.

Example:

let a = 17U
let b = 5U
inspect(a % b, content="2") // 17 divided by 5 gives quotient 3 and remainder 2
inspect(7U % 4U, content="3")
%
MyHashMap[K, V]
self
.
Array[ChainingBucket[K, V]]
buckets
.
fn[T] Array::length(self : Array[T]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length
().
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
()).
fn UInt::reinterpret_as_int(self : UInt) -> Int

reinterpret the unsigned int as signed int For number within the range of 0..=2^31-1, the value is the same. For number within the range of 2^31..=2^32-1, the value is negative

reinterpret_as_int
()
// Result let mut
Int?
found_index
=
Int?
None
let mut
Int
n_searches
= 0
// Search through all key-value pairs in the bucket. for
Int
index
,
(K, V)
keyvalue
in
MyHashMap[K, V]
self
.
Array[ChainingBucket[K, V]]
buckets
fn[T] Array::op_get(self : Array[T], index : Int) -> T

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

let arr = [1, 2, 3]
inspect(arr[1], content="2")
[
bucket].
Array[(K, V)]
values
{
Int
n_searches
fn Add::add(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+=
1
if
(K, V)
keyvalue
.
K
0
(_ : K, _ : K) -> Bool
==
K
key
{ // Check if the key matches.
Int?
found_index
=
(Int) -> Int?
Some
(
Int
index
)
break } } return (
Int
bucket
,
Int?
found_index
,
Int
n_searches
)
} /// Insert a new key-value pair. /// /// Returns the number of searches done. fn[K :
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
, V]
struct MyHashMap[K, V] {
  buckets: Array[ChainingBucket[K, V]]
  hash_fn: (K) -> UInt
}
MyHashMap
::
fn[K : Eq, V] MyHashMap::insert(self : MyHashMap[K, V], key : K, value : V) -> Int

Insert a new key-value pair.

Returns the number of searches done.

insert
(
MyHashMap[K, V]
self
:
struct MyHashMap[K, V] {
  buckets: Array[ChainingBucket[K, V]]
  hash_fn: (K) -> UInt
}
MyHashMap
[

type parameter K

K
,

type parameter V

V
],
K
key
:

type parameter K

K
,
V
value
:

type parameter V

V
) ->
Int
Int
{
let (
Int
bucket
,
Int?
index
,
Int
n_searches
) =
MyHashMap[K, V]
self
.
fn[K : Eq, V] MyHashMap::search(self : MyHashMap[K, V], key : K) -> (Int, Int?, Int)

Search for the place where the key is stored.

Returns (bucket, index, number_of_searches_done)

search
(
K
key
)
if
Int?
index
is
(Int) -> Int?
Some
(
Int
index
) {
MyHashMap[K, V]
self
.
Array[ChainingBucket[K, V]]
buckets
fn[T] Array::op_get(self : Array[T], index : Int) -> T

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

let arr = [1, 2, 3]
inspect(arr[1], content="2")
[
bucket].
Array[(K, V)]
values
fn[T] Array::op_set(self : Array[T], index : Int, value : T) -> Unit

Sets the element at the specified index in the array to a new value. The original value at that index is overwritten.

Parameters:

  • array : The array to modify.
  • index : The position in the array where the value will be set.
  • value : The new value to assign at the specified index.

Throws an error if index is negative or greater than or equal to the length of the array.

Example:

let arr = [1, 2, 3]
arr[1] = 42
inspect(arr, content="[1, 42, 3]")
[
index] = (
K
key
,
V
value
)
} else {
MyHashMap[K, V]
self
.
Array[ChainingBucket[K, V]]
buckets
fn[T] Array::op_get(self : Array[T], index : Int) -> T

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

let arr = [1, 2, 3]
inspect(arr[1], content="2")
[
bucket].
Array[(K, V)]
values
.
fn[T] Array::push(self : Array[T], value : T) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
((
K
key
,
V
value
))
}
Int
n_searches
}

This is the string attached to the O(1)O(1) access magic -- we'd have to search through everything if we're unlucky. This gives the hashmap a worst-case complexity of O(n)O(n), where nn is the number of keys in the hashmap.

Crafting a collision

For most hash functions we use for hashmaps, unlucky collisions are rare. This means that we usually won't need to bother with the worst case scenario and enjoy the O(1)O(1) speed for the vast majority of the time.

That is, unless someone, maybe some black-suited hackerman with some malicious intent, forces you into one.

Hash functions are usually designed to be deterministic and fast, so even without advanced cryptanalysis of the function itself, we can still find some keys that will collide with each other by brute force. 1

fn 
fn find_collision(bucket_count : Int, target_bucket : Int, n_collision_want : Int, hash_fn : (String) -> UInt) -> Array[String]
find_collision
(
Int
bucket_count
:
Int
Int
,
Int
target_bucket
:
Int
Int
,
Int
n_collision_want
:
Int
Int
,
(String) -> UInt
hash_fn
: (
String
String
) ->
UInt
UInt
,
) ->
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
String
String
] {
let
Array[String]
result
= []
let
UInt
bucket_count
=
Int
bucket_count
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
()
let
UInt
target_bucket
=
Int
target_bucket
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
()
for
Int
i
= 0; ;
Int
i
=
Int
i
fn Add::add(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
1 {
// Generate some string key. let
String
s
=
Int
i
.
fn Int::to_string(self : Int, radix~ : Int) -> String

Converts an integer to its string representation in the specified radix (base). Example:

inspect((255).to_string(radix=16), content="ff")
inspect((-255).to_string(radix=16), content="-ff")
to_string
(
Int
radix
=36)
// Calculate the hash value let
UInt
hash
=
(String) -> UInt
hash_fn
(
String
s
)
let
UInt
bucket_index
=
UInt
hash
fn Mod::mod(self : UInt, other : UInt) -> UInt

Calculates the remainder of dividing one unsigned integer by another.

Parameters:

  • self : The unsigned integer dividend.
  • other : The unsigned integer divisor.

Returns the remainder of the division operation.

Throws a panic if other is zero.

Example:

let a = 17U
let b = 5U
inspect(a % b, content="2") // 17 divided by 5 gives quotient 3 and remainder 2
inspect(7U % 4U, content="3")
%
UInt
bucket_count
let
UInt
bucket_index
= if
UInt
bucket_index
fn Compare::op_lt(x : UInt, y : UInt) -> Bool
<
0 {
UInt
bucket_index
fn Add::add(self : UInt, other : UInt) -> UInt

Performs addition between two unsigned 32-bit integers. If the result overflows, it wraps around according to the rules of modular arithmetic (2^32).

Parameters:

  • self : The first unsigned 32-bit integer operand.
  • other : The second unsigned 32-bit integer operand to be added.

Returns the sum of the two unsigned integers, wrapped around if necessary.

Example:

let a = 42U
let b = 100U
inspect(a + b, content="142")

// Demonstrate overflow behavior
let max = 4294967295U // UInt::max_value
inspect(max + 1U, content="0")
+
UInt
bucket_count
} else {
UInt
bucket_index
} // Check if it collides with our target bucket. if
UInt
bucket_index
fn Eq::equal(self : UInt, other : UInt) -> Bool

Compares two unsigned 32-bit integers for equality.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand to compare with.

Returns true if both integers have the same value, false otherwise.

Example:

let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
==
UInt
target_bucket
{
Array[String]
result
.
fn[T] Array::push(self : Array[T], value : T) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
String
s
)
if
Array[String]
result
.
fn[T] Array::length(self : Array[T]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length
()
fn Compare::op_ge(x : Int, y : Int) -> Bool
>=
Int
n_collision_want
{
break } } }
Array[String]
result
}

Hash flooding attack

With colliding values in hand, we (in the role of malicious hackermen) can now attack hashtables to constantly exploit their worst-case complexity.

Consider the following case: you are inserting keys into the same hashmap, but every key hashes into the same bucket. With each insert, the hashmap must search through all the existing keys in the bucket to determine whether the new key is already there.

The first insertion compares with 0 keys, the second with 1 key, the third compares with 2 keys, and the number of keys compared grows linearly with each insertion. For nn insertions, the total number of keys compared is:

0+1++(n1)=n(n1)2=n2+n20 + 1 + \dots + (n - 1) = \frac{n(n - 1)}{2} = \frac{n^2 + n}{2}

The total list of nn insertions now takes O(n2)O(n^2) compares to complete2, as opposed to the average case of O(n)O(n) compares. The operation will now take far more time than it ought to.

The attack is not just limited to insertion. Every time when an attacked key is being searched for, the same number of keys will be compared, so every single operation that would have been O(1)O(1) now becomes O(n)O(n). These hashmap operations that would otherwise take negligible time will now be severely slower, making the attacker far easier to deplete the program's resources than before.

This, is what we call a hash flooding attack, taken its name from it flooding the same bucket of the hashmap with colliding keys.

We can demonstrate this with the hashmap implementation we wrote earlier:

/// A simple string hasher via the Fowler-Noll-Vo hash function.
/// https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
fn 
fn string_fnv_hash(s : String) -> UInt

A simple string hasher via the Fowler-Noll-Vo hash function. https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

string_fnv_hash
(
String
s
:
String
String
) ->
UInt
UInt
{
// In reality this should directly operate on the underlying array of the string let
Bytes
s_bytes
=
fn @moonbitlang/core/encoding/utf16.encode(str : StringView, bom? : Bool, endianness? : @encoding/utf16.Endian) -> Bytes

Encodes a string into a UTF-16 byte array.

Assuming the string is valid.

@encoding/utf16.encode
(
String
s
)
let mut
UInt
acc
:
UInt
UInt
= 0x811c9dc5
for
Byte
b
in
Bytes
s_bytes
{
UInt
acc
= (
UInt
acc
fn BitXOr::lxor(self : UInt, other : UInt) -> UInt

Performs a bitwise XOR (exclusive OR) operation between two unsigned 32-bit integers. Each bit in the result is set to 1 if the corresponding bits in the operands are different, and 0 if they are the same.

Parameters:

  • self : The first unsigned 32-bit integer operand.
  • other : The second unsigned 32-bit integer operand.

Returns the result of the bitwise XOR operation.

Example:

let a = 0xFF00U // Binary: 1111_1111_0000_0000
let b = 0x0F0FU // Binary: 0000_1111_0000_1111
inspect(a ^ b, content="61455") // Binary: 1111_0000_0000_1111
^
Byte
b
.
fn Byte::to_uint(self : Byte) -> UInt

Converts a Byte to a UInt.

Parameters:

  • byte : The Byte value to be converted.

Returns the UInt representation of the Byte.

to_uint
())
fn Mul::mul(self : UInt, other : UInt) -> UInt

Performs multiplication between two unsigned 32-bit integers. The result wraps around if it exceeds the maximum value of UInt.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand.

Returns the product of the two unsigned integers. If the result exceeds the maximum value of UInt (4294967295), it wraps around to the corresponding value modulo 2^32.

Example:

let a = 3U
let b = 4U
inspect(a * b, content="12")
let max = 4294967295U
inspect(max * 2U, content="4294967294") // Wraps around to max * 2 % 2^32
*
0x01000193
}
UInt
acc
} fn
fn test_attack(n_buckets : Int, keys : Array[String], hash_fn : (String) -> UInt) -> Int
test_attack
(
Int
n_buckets
:
Int
Int
,
Array[String]
keys
:
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
String
String
],
(String) -> UInt
hash_fn
: (
String
String
) ->
UInt
UInt
,
) ->
Int
Int
{
let
MyHashMap[String, Int]
map
= {
Array[ChainingBucket[String, Int]]
buckets
:
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
::
fn[T] Array::makei(length : Int, value : (Int) -> T raise?) -> Array[T] raise?

Creates a new array of the specified length, where each element is initialized using an index-based initialization function.

Parameters:

  • length : The length of the new array. If length is less than or equal to 0, returns an empty array.
  • initializer : A function that takes an index (starting from 0) and returns a value of type T. This function is called for each index to initialize the corresponding element.

Returns a new array of type Array[T] with the specified length, where each element is initialized using the provided function.

Example:

let arr = Array::makei(3, i => i * 2)
inspect(arr, content="[0, 2, 4]")
makei
(
Int
n_buckets
, _ => {
Array[(String, Int)]
values
: [] }),
(String) -> UInt
hash_fn
}
let mut
Int
total_searches
= 0
for
String
key
in
Array[String]
keys
{
Int
total_searches
fn Add::add(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+=
MyHashMap[String, Int]
map
.
fn[K : Eq, V] MyHashMap::insert(self : MyHashMap[K, V], key : K, value : V) -> Int

Insert a new key-value pair.

Returns the number of searches done.

insert
(
String
key
, 0)
}
Int
total_searches
} test {
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("Demonstrate hash flooding attack")
let
Int
bucket_count
= 2048
let
Int
target_bucket_id
= 42
let
Int
n_collision_want
= 1000
//
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("First, try to insert non-colliding keys.")
let
Array[String]
non_colliding_keys
=
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
::
fn[T] Array::makei(length : Int, value : (Int) -> T raise?) -> Array[T] raise?

Creates a new array of the specified length, where each element is initialized using an index-based initialization function.

Parameters:

  • length : The length of the new array. If length is less than or equal to 0, returns an empty array.
  • initializer : A function that takes an index (starting from 0) and returns a value of type T. This function is called for each index to initialize the corresponding element.

Returns a new array of type Array[T] with the specified length, where each element is initialized using the provided function.

Example:

let arr = Array::makei(3, i => i * 2)
inspect(arr, content="[0, 2, 4]")
makei
(
Int
n_collision_want
,
Int
i
=> (
Int
i
fn Mul::mul(self : Int, other : Int) -> Int

Multiplies two 32-bit integers. This is the implementation of the * operator for Int.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns the product of the two integers. If the result overflows the range of Int, it wraps around according to two's complement arithmetic.

Example:

inspect(42 * 2, content="84")
inspect(-10 * 3, content="-30")
let max = 2147483647 // Int.max_value
inspect(max * 2, content="-2") // Overflow wraps around
*
37).
fn Int::to_string(self : Int, radix~ : Int) -> String

Converts an integer to its string representation in the specified radix (base). Example:

inspect((255).to_string(radix=16), content="ff")
inspect((-255).to_string(radix=16), content="-ff")
to_string
(
Int
radix
=36))
let
Int
n_compares_nc
=
fn test_attack(n_buckets : Int, keys : Array[String], hash_fn : (String) -> UInt) -> Int
test_attack
(
Int
bucket_count
,
Array[String]
non_colliding_keys
,
fn string_fnv_hash(s : String) -> UInt

A simple string hasher via the Fowler-Noll-Vo hash function. https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

string_fnv_hash
,
)
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
(
"Total compares for \{
Int
n_collision_want
} non-colliding keys: \{
Int
n_compares_nc
}",
)
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("")
//
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("Now, we want all keys to collide into bucket #\{
Int
target_bucket_id
}.")
let
Array[String]
colliding_keys
=
fn find_collision(bucket_count : Int, target_bucket : Int, n_collision_want : Int, hash_fn : (String) -> UInt) -> Array[String]
find_collision
(
Int
bucket_count
,
Int
target_bucket_id
,
Int
n_collision_want
,
fn string_fnv_hash(s : String) -> UInt

A simple string hasher via the Fowler-Noll-Vo hash function. https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

string_fnv_hash
,
)
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("Found \{
Array[String]
colliding_keys
.
fn[T] Array::length(self : Array[T]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length
()} colliding keys.")
let
Int
n_compares_c
=
fn test_attack(n_buckets : Int, keys : Array[String], hash_fn : (String) -> UInt) -> Int
test_attack
(
Int
bucket_count
,
Array[String]
colliding_keys
,
fn string_fnv_hash(s : String) -> UInt

A simple string hasher via the Fowler-Noll-Vo hash function. https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

string_fnv_hash
)
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
(
"Total compares for \{
Int
n_collision_want
} colliding keys: \{
Int
n_compares_c
}",
) // let
Double
increase
=
Int
n_compares_c
.
fn Int::to_double(self : Int) -> Double

Converts a 32-bit integer to a double-precision floating-point number. The conversion preserves the exact value since all integers in the range of Int can be represented exactly as Double values.

Parameters:

  • self : The 32-bit integer to be converted.

Returns a double-precision floating-point number that represents the same numerical value as the input integer.

Example:

let n = 42
inspect(n.to_double(), content="42")
let neg = -42
inspect(neg.to_double(), content="-42")
to_double
()
fn Div::div(self : Double, other : Double) -> Double

Performs division between two double-precision floating-point numbers. Follows IEEE 754 standard for floating-point arithmetic, including handling of special cases like division by zero (returns infinity) and operations involving NaN.

Parameters:

  • self : The dividend (numerator) in the division operation.
  • other : The divisor (denominator) in the division operation.

Returns the result of dividing self by other. Special cases follow IEEE 754:

  • Division by zero returns positive or negative infinity based on the dividend's sign
  • Operations involving NaN return NaN
  • Division of infinity by infinity returns NaN

Example:

inspect(6.0 / 2.0, content="3")
inspect(-6.0 / 2.0, content="-3")
inspect(1.0 / 0.0, content="Infinity")
/
Int
n_compares_nc
.
fn Int::to_double(self : Int) -> Double

Converts a 32-bit integer to a double-precision floating-point number. The conversion preserves the exact value since all integers in the range of Int can be represented exactly as Double values.

Parameters:

  • self : The 32-bit integer to be converted.

Returns a double-precision floating-point number that represents the same numerical value as the input integer.

Example:

let n = 42
inspect(n.to_double(), content="42")
let neg = -42
inspect(neg.to_double(), content="-42")
to_double
()
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("The number of compares increased by a factor of \{
Double
increase
}")
}

The output of the code above is:

Demonstrate hash flooding attack
First, try to insert non-colliding keys.
Total compares for 1000 non-colliding keys: 347

Now, with colliding keys...
Found 1000 colliding keys.
Total compares for 1000 colliding keys: 499500
The number of compares increased by a factor of 1439.4812680115274

... as can be seen directly, now the insertion is some 1000 times slower!

In reality, although the number of buckets in hashmaps is not fixed like our examples, they often follow a certain growing sequence, such as doubling or following a list of predefined prime numbers. This growth pattern makes the bucket count very predictable. Thus, an attacker can initiate a hash flooding attack even if they don't know the exact bucket count.

Mitigating hash flooding attacks

Hash flooding attack works because the attacker knows exactly how a hash function works, and how it connects to where the key is inserted into the hashmap. If we change either of them, the attack will no longer work.

Seeded hash function

By far, the easiest way to do this is to prevent the attacker from knowing how the hash algorithm exactly works. This might sound impossible, but the properties of the hash function actually only need to hold within a single hashmap!

When dealing with hashmaps, we don't need a single, global "hash value" that can be used everywhere, because hashmaps don't care about what happens outside them. Simply swapping out the hash function from table to table, and you get something that's unpredictable to the attacker.

But hey, you may say, "we don't have an infinite supply of different hash algorithms!"

Well, you do. Remember that hash functions need to distribute the value across the result space as uniform as possible? That means, for a good hash function, a slight change in the input can cause a large change in the output. So, in order to get a hash function unique to each table, we only need to feed it some data unique to the table before feeding it the data we want to hash. This is called a "seed" to the hash function, and each table can now have a different seed to use.

Let's demonstrate how the seed solves the problem with a seeded hash function and two tables with different seeds:

/// A modified version of the FNV hash before to allow a seed to be used.
fn 
fn string_fnv_hash_seeded(seed : UInt) -> (String) -> UInt

A modified version of the FNV hash before to allow a seed to be used.

string_fnv_hash_seeded
(
UInt
seed
:
UInt
UInt
) -> (
String
String
) ->
UInt
UInt
{
let
Bytes
seed_bytes
=
UInt
seed
.
fn UInt::to_le_bytes(self : UInt) -> Bytes

Converts the UInt to a Bytes in little-endian byte order.

to_le_bytes
()
fn
(String) -> UInt
string_fnv_hash
(
String
s
:
String
String
) ->
UInt
UInt
{
let
Bytes
s_bytes
=
fn @moonbitlang/core/encoding/utf16.encode(str : StringView, bom? : Bool, endianness? : @encoding/utf16.Endian) -> Bytes

Encodes a string into a UTF-16 byte array.

Assuming the string is valid.

@encoding/utf16.encode
(
String
s
)
let mut
UInt
acc
:
UInt
UInt
= 0x811c9dc5
// Mix in the seed bytes. for
Byte
b
in
Bytes
seed_bytes
{
UInt
acc
= (
UInt
acc
fn BitXOr::lxor(self : UInt, other : UInt) -> UInt

Performs a bitwise XOR (exclusive OR) operation between two unsigned 32-bit integers. Each bit in the result is set to 1 if the corresponding bits in the operands are different, and 0 if they are the same.

Parameters:

  • self : The first unsigned 32-bit integer operand.
  • other : The second unsigned 32-bit integer operand.

Returns the result of the bitwise XOR operation.

Example:

let a = 0xFF00U // Binary: 1111_1111_0000_0000
let b = 0x0F0FU // Binary: 0000_1111_0000_1111
inspect(a ^ b, content="61455") // Binary: 1111_0000_0000_1111
^
Byte
b
.
fn Byte::to_uint(self : Byte) -> UInt

Converts a Byte to a UInt.

Parameters:

  • byte : The Byte value to be converted.

Returns the UInt representation of the Byte.

to_uint
())
fn Mul::mul(self : UInt, other : UInt) -> UInt

Performs multiplication between two unsigned 32-bit integers. The result wraps around if it exceeds the maximum value of UInt.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand.

Returns the product of the two unsigned integers. If the result exceeds the maximum value of UInt (4294967295), it wraps around to the corresponding value modulo 2^32.

Example:

let a = 3U
let b = 4U
inspect(a * b, content="12")
let max = 4294967295U
inspect(max * 2U, content="4294967294") // Wraps around to max * 2 % 2^32
*
0x01000193
} // Hash the string bytes. for
Byte
b
in
Bytes
s_bytes
{
UInt
acc
= (
UInt
acc
fn BitXOr::lxor(self : UInt, other : UInt) -> UInt

Performs a bitwise XOR (exclusive OR) operation between two unsigned 32-bit integers. Each bit in the result is set to 1 if the corresponding bits in the operands are different, and 0 if they are the same.

Parameters:

  • self : The first unsigned 32-bit integer operand.
  • other : The second unsigned 32-bit integer operand.

Returns the result of the bitwise XOR operation.

Example:

let a = 0xFF00U // Binary: 1111_1111_0000_0000
let b = 0x0F0FU // Binary: 0000_1111_0000_1111
inspect(a ^ b, content="61455") // Binary: 1111_0000_0000_1111
^
Byte
b
.
fn Byte::to_uint(self : Byte) -> UInt

Converts a Byte to a UInt.

Parameters:

  • byte : The Byte value to be converted.

Returns the UInt representation of the Byte.

to_uint
())
fn Mul::mul(self : UInt, other : UInt) -> UInt

Performs multiplication between two unsigned 32-bit integers. The result wraps around if it exceeds the maximum value of UInt.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand.

Returns the product of the two unsigned integers. If the result exceeds the maximum value of UInt (4294967295), it wraps around to the corresponding value modulo 2^32.

Example:

let a = 3U
let b = 4U
inspect(a * b, content="12")
let max = 4294967295U
inspect(max * 2U, content="4294967294") // Wraps around to max * 2 % 2^32
*
0x01000193
}
UInt
acc
}
(String) -> UInt
string_fnv_hash
} test {
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("Demonstrate flooding attack mitigation")
let
Int
bucket_count
= 2048
let
Int
target_bucket_id
= 42
let
Int
n_collision_want
= 1000
// The first table has a seed of 42. let
UInt
seed1
:
UInt
UInt
= 42
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("We find collisions using the seed \{
UInt
seed1
}")
let
(String) -> UInt
hash_fn1
=
fn string_fnv_hash_seeded(seed : UInt) -> (String) -> UInt

A modified version of the FNV hash before to allow a seed to be used.

string_fnv_hash_seeded
(
UInt
seed1
)
let
Array[String]
colliding_keys
=
fn find_collision(bucket_count : Int, target_bucket : Int, n_collision_want : Int, hash_fn : (String) -> UInt) -> Array[String]
find_collision
(
Int
bucket_count
,
Int
target_bucket_id
,
Int
n_collision_want
,
(String) -> UInt
hash_fn1
,
) let
Int
n_compares_c
=
fn test_attack(n_buckets : Int, keys : Array[String], hash_fn : (String) -> UInt) -> Int
test_attack
(
Int
bucket_count
,
Array[String]
colliding_keys
,
(String) -> UInt
hash_fn1
)
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
(
"Total compares for \{
Int
n_collision_want
} colliding keys with seed \{
UInt
seed1
}: \{
Int
n_compares_c
}",
)
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
("")
// The second table has a different seed let
UInt
seed2
:
UInt
UInt
= 100
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
(
"We now use a different seed for the second table, this time \{
UInt
seed2
}",
) let
(String) -> UInt
hash_fn2
=
fn string_fnv_hash_seeded(seed : UInt) -> (String) -> UInt

A modified version of the FNV hash before to allow a seed to be used.

string_fnv_hash_seeded
(
UInt
seed2
)
let
Int
n_compares_nc
=
fn test_attack(n_buckets : Int, keys : Array[String], hash_fn : (String) -> UInt) -> Int
test_attack
(
Int
bucket_count
,
Array[String]
colliding_keys
,
(String) -> UInt
hash_fn2
)
fn[T : Show] println(input : T) -> Unit

Prints any value that implements the Show trait to the standard output, followed by a newline.

Parameters:

  • value : The value to be printed. Must implement the Show trait.

Example:

  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println
(
"Total compares for \{
Int
n_collision_want
} keys that were meant to collide with seed \{
UInt
seed1
}: \{
Int
n_compares_nc
}",
) }

The output of the program above was:

Demonstrate flooding attack mitigation
We find collisions using 42
Total compares for 1000 colliding keys with seed 42: 499500

We now use a different seed for the second table, this time 100
Total compares for 1000 keys that were meant to collide with seed 42: 6342

We can see that, the keys that were colliding in the first table are not colliding in the second. 3 Therefore, we have successfully mitigated the hash flooding attack using this simple trick.

As of where the seed that randomizes each hashmap comes from... For programs with access to an external random source (like Linux's /dev/urandom), using that would generally be the best choice. For programs without such access (such as within a WebAssembly sandbox), a per-process random seed is also a preferrable solution (this is what Python does). Even simpler, a simple counter that increments with each seeding attempt could be good enough -- guessing how many hashmaps have been created can still be quite hard for an attacker.

Other choices

Java uses a different solution, by falling back to a binary search tree (red-black tree) when too many values occupy the same bucket. Yes, this requires the keys to be also comparable in addition to being hashable, but now it guarantees O(logn)O(\log n) worst-case complexity, which is far better than O(n)O(n).

Why does it matter to us?

Due to the ubiquitous nature of hashmaps, it's extremely easy to find some hashmap in a program where you can control the keys, especially in Web programs. Headers, cookies, query parameters and JSON bodies are all key-value pairs, and often stored in hashmaps, which might be vulnerable to hash flooding attacks.

A malicious attacker with enough knowledge of the program (programming language, frameworks, etc.) can then try to send carefully-crafted request payloads to the Web API endpoints. These requests take a lot longer to handle, so if a regular denial-of-service (DoS) attack takes n requests/s to bring down a server, a hash flooding attack might only a tiny fraction of that number, often a magnitude smaller -- making it far more efficient for the attacker. This turns the DoS attack into a HashDoS attack.

Fortunately, by introducing some even slightly unpredictable patterns (such as a per-process randomness or keyed hashing) into hashmaps, we can make such attack significantly harder, often impractical. Also, as such attack is highly dependent on the language, framework, architecture and implementation of target application, crafting one could be quite hard already, and modern, well-configured systems are even more harder to exploit.

Takeaways

Hashmaps give us powerful, constant-time average access -- but that "constant" depends on assumptions an attacker can sometimes break. A targeted hash-flooding attack forces many keys into the same bucket and turns O(1) operations into O(n), enabling highly efficient resource exhaustion.

The good news is the mitigations are simple and practical: introduce some unpredictableness to your hashmaps, use side-channel information when hash alone is not enough, or rehash when the behavior doesn't look right. With these, we can keep our hashmaps fast and secure.

Footnotes

  1. Side note, this is also similar to how Bitcoin mining works -- finding a value to add to an existing string, so the hash of the entire thing (with bits reversed), modulo some given value, is zero.

  2. There's even a Tumblr blog for unexpected quadratic complexity in programming languages, Accidentally Quadratic. You can even find a hashmap-related one here! -- It's almost a manually-introduced hash flooding attack.

  3. You may notice that this number is still slightly higher than that we got with randomly-generated, non-colliding keys. This might be related to that FNV is not designed for the best quality of its output. Since the two seeds are pretty close to each other, the result might still have some similarity. Using a better hash function (or even a cryptographically-secure one like SipHash) would greatly reduce this effect.