As a follow-up to my last post on computing the size of a HashMap I've compiled a list of some common data types and their memory footprints. The caveat from the previous post applies: the following only applies to GHC.
N is the length of the value (for string types) or the number of elements (for container types).
|Data type||Memory footprint||Notes|
||9 words + N bytes||1|
||6 words + 2N bytes||1, 2|
||6N words + size of keys & values|
||5N words + size of elements|
||3N + 5(N-1) words + size of values||4|
||2N + 5(N-1) words||4|
||5N + 4(N-1) words + size of keys & values||4|
||5N + 4(N-1) words + size of elements||4, 5|
For example, on a 32-bit machine a
HashMap Text Int where the average string length is 8 characters will use roughly
4 * (5N + 4(N-1)) + N * (4 * 6 + 16) + 4 * 2N = 84N - 16
bytes, or about 84 bytes per key-value pair.
N.B. The given sizes assumes that the data type isn't
UNPACKed into some other data type, which decreases the total memory usage.
The size of the content is rounded up to a multiple of the machine word size.
Data.Textstores the Unicode string encoded using UTF-16. The memory usage depends on the actual code points in the string, but can be approximated as 2N bytes.
GHC allocates a small pool of shared
Chars so the actual memory usage might be smaller depending, on the content of the string. In addition, if the
Stringisn't fully evaluated it might us more or less space, depending on the size of the thunk representing the rest of the computation.
Assuming perfectly balanced tree.
At the time of writing
HashSetis a simple wrapper around
HashMap. The overhead could easily be decreased by N words to 4N + 4(N-1) words by not storing a dummy value.