What Is Base64? The Math, the History, and the Story Behind It

A scientific look at Base64 - the formula that defines it, the engineers behind its standards, and why a 1980s solution to an email problem still runs the modern internet.

The Short Answer

Base64 is a binary-to-text encoding scheme: a mathematical method for representing arbitrary binary data (any sequence of bytes - images, PDFs, encrypted keys, audio, anything) using only 64 printable ASCII characters: A-Z, a-z, 0-9, +, and / (with = reserved for padding). It doesn't compress or encrypt data - it simply re-packages it so that systems built only for plain text can carry it safely.

The Mathematics: Why 64, and Why It Works

The number 64 isn't arbitrary - it's a deliberate consequence of binary math. Computers store data as bytes, and each byte is 8 bits. Base64's job is to slice that 8-bit stream into smaller chunks that map cleanly onto a set of safe characters. The key relationship is:

2^6 = 64

Six bits can represent exactly 64 distinct values (0 through 63) - and 64 symbols is few enough to draw entirely from characters that text-only systems handle safely: the upper- and lower-case Latin letters and the digits (26 + 26 + 10 = 62), plus two extra symbols (+ and /) to bring the total to 64. Every possible 6-bit value therefore has exactly one corresponding character.
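
As a quick check of that counting argument, here is a minimal Python sketch that builds the 64-character table from its parts. The name ALPHABET is our own label for illustration, not something taken from a standard or library.

```python
import string

# The standard Base64 alphabet: 26 upper-case + 26 lower-case + 10 digits + 2 symbols.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Six bits give 2**6 = 64 distinct values, one per character.
assert len(ALPHABET) == 2 ** 6 == 64
assert len(set(ALPHABET)) == 64  # no character appears twice

# A 6-bit value is simply an index into the table.
print(ALPHABET[0], ALPHABET[25], ALPHABET[26], ALPHABET[62], ALPHABET[63])  # A Z a + /
```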

The Core Formula: 3 Bytes In, 4 Characters Out

The encoding works by finding the lowest common multiple of 8 (bits per byte) and 6 (bits per Base64 character), which is 24. So Base64 processes data in blocks of:

  • 3 bytes = 24 bits of input
  • 4 characters = 24 bits of output (4 × 6 bits)

The 24 input bits are split into four 6-bit groups. Each group is a number from 0 to 63, which is used as an index into the 64-character alphabet table. The result: every 3 bytes of original data becomes exactly 4 Base64 characters.
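
The block step can be sketched in a few lines of Python. This is an illustrative version that handles only complete 3-byte blocks and ignores padding; encode_block and ALPHABET are hypothetical names chosen for this example.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_block(block: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(block) != 3:
        raise ValueError("encode_block expects exactly 3 bytes")
    # Pack the three bytes into one 24-bit integer.
    n = (block[0] << 16) | (block[1] << 8) | block[2]
    # Peel off four 6-bit groups, most significant first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Each 6-bit value (0-63) is an index into the alphabet table.
    return "".join(ALPHABET[i] for i in indices)

print(encode_block(b"Man"))  # TWFu
```

Real encoders typically work on whole buffers at once rather than one block at a time, but the shift-and-mask idea is the same.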

This gives Base64 its signature size overhead. The output is always:

output_size = ceil(input_bytes / 3) × 4

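...which works out to roughly 4/3 of the input size, or about a 33% increase. If the input length isn't a multiple of 3 bytes, the final partial group is filled out with zero bits and padded with = characters: one leftover byte yields two characters plus ==, and two leftover bytes yield three characters plus =. The output length therefore always remains a multiple of 4.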
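
The size formula is easy to check against Python's standard base64 module. In this sketch, encoded_length is a hypothetical helper name, and the integer expression (n + 2) // 3 is just ceil(n / 3) written without floating point.

```python
import base64

def encoded_length(input_bytes: int) -> int:
    """Length of padded Base64 output: ceil(input_bytes / 3) * 4."""
    return (input_bytes + 2) // 3 * 4

# Compare the formula with the real encoder for a few small inputs.
for n in range(7):
    encoded = base64.b64encode(b"x" * n).decode("ascii")
    assert len(encoded) == encoded_length(n)
    print(n, encoded_length(n), repr(encoded))
```

For 1, 2, and 3 bytes of b"x" the encoder produces eA==, eHg=, and eHh4, showing the two, one, and zero padding characters in turn.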

A Worked Example

Take the three letters "Man". As bytes, these are 77, 97, 110, or in binary: 01001101 01100001 01101110 (24 bits total). Regrouped into four 6-bit chunks: 010011 010110 000101 101110, or 19, 22, 5, 46 in decimal. Looking those values up in the Base64 alphabet table gives T, W, F, u - so "Man" becomes "TWFu". This exact lookup-table approach is what every Base64 encoder and decoder performs, billions of times a second, across the internet.
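
The same walk-through can be reproduced step by step in Python. This is a tracing sketch meant for reading, not an efficient encoder; the variable names are our own.

```python
import base64

text = b"Man"

# Write each byte as 8 bits and join them into one 24-bit string.
bits = "".join(f"{byte:08b}" for byte in text)
# Regroup the same bits into four 6-bit chunks.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
# Read each chunk as a number from 0 to 63.
values = [int(group, 2) for group in groups]

print(list(text))  # [77, 97, 110]
print(groups)      # ['010011', '010110', '000101', '101110']
print(values)      # [19, 22, 5, 46]
print(base64.b64encode(text).decode("ascii"))  # TWFu
```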

The History: From a 1980s Email Problem to a Web Standard

Base64 doesn't have a single inventor or a single "eureka" moment - it evolved gradually to solve one persistent, very practical problem.

The Root Cause: 7-Bit Networks

In the early days of computer networking, protocols like SMTP (email, standardized in 1982) and many transmission systems were built around 7-bit ASCII text. They were never designed to carry raw 8-bit binary data - sending a binary file straight through them could corrupt bytes, trigger control characters, or simply get stripped out entirely. Yet people very quickly wanted to send programs, images, and compressed files over these text-only channels.

The Early Attempts: uuencode and BTOA (Around 1980)

The first widely used answer was uuencode ("Unix-to-Unix encoding"), created around 1980 by Mary Ann Horton for the Unix-to-Unix Copy Program (UUCP). It converted binary files into lines of printable ASCII text so they could be posted to Usenet newsgroups and emailed safely. Around the same time, similar encodings like BTOA ("binary to ASCII") appeared on other systems. These were the direct conceptual ancestors of Base64: take bytes, map them onto a safe character set, send them through a text pipe, and reverse the process on the other end.

Standardization: RFC 1421 and the Birth of "Base64" (1993)

The encoding and its now-familiar 64-character alphabet were formally specified by the Internet Engineering Task Force (IETF) in RFC 1421 (February 1993), part of the "Privacy Enhanced Mail" (PEM) specification effort, authored by a working group that included John Linn. The RFC itself describes the scheme as a "printable encoding"; the name "base64" comes from the MIME specifications, which adopted essentially the same alphabet. PEM needed a way to encode encrypted and digitally signed email content into plain ASCII, and the 6-bit-to-character scheme it defined became the template for everything that followed.

Going Mainstream: MIME (1993-1996)

Base64 became truly ubiquitous through MIME (Multipurpose Internet Mail Extensions), defined in RFCs 1521 and 1522 (1993), later revised as RFC 2045-2049 (1996) by Nathaniel Borenstein and Ned Freed. MIME adopted Base64 as its standard mechanism for embedding binary attachments - images, documents, audio - inside plain-text email messages. This is the moment Base64 went from a niche encoding to something every email client on Earth had to understand.

The Modern Standard: RFC 4648 (2006)

As Base64 spread far beyond email - into HTTP, XML, JSON, cryptography, and URLs - its many slightly-incompatible variants needed a unifying reference. RFC 4648, "The Base16, Base32, and Base64 Data Encodings," published in October 2006 by Simon Josefsson, is the document most modern systems cite today. It formally defined standard Base64, the URL-and-filename-safe variant ("Base64URL", using - and _ instead of + and /), and clarified padding rules - the exact specification this site's Base64URL tool implements.
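
To see the difference between the two alphabets, the sketch below encodes three bytes chosen (by us, purely for illustration) so that every output character lands on index 62 or 63. It uses Python's standard base64 module.

```python
import base64

# 0xFB 0xFF 0xFE splits into the 6-bit values 62, 63, 63, 62.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +//+
print(url_safe)  # -__-

# Both forms decode back to the same bytes with the matching decoder.
assert base64.b64decode(standard) == data
assert base64.urlsafe_b64decode(url_safe) == data
```

Only the characters at indices 62 and 63 change; the other 62 characters and the 3-bytes-to-4-characters arithmetic are identical in both variants.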

Why Base64 Is Still Everywhere Today

Decades after its origins in dial-up email, Base64 remains a foundational piece of internet infrastructure because the underlying problem never went away: text-based formats (JSON, XML, HTML, URLs, JWTs) still can't safely contain raw binary bytes. Today you'll find Base64 used to:

  • Embed images directly into HTML/CSS using data: URIs (see our Image to Base64 tool)
  • Encode the header and payload of JWTs (JSON Web Tokens) for authentication
  • Send binary file attachments through JSON-based APIs
  • Store cryptographic keys, certificates, and hashes as readable text
  • Pass binary data safely inside URL query parameters via Base64URL
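
Two of the uses above can be sketched with Python's standard library. This is an illustration under stated assumptions: the text/plain payload and the header values are made up for the example, b64url_encode and b64url_decode are hypothetical helper names, and the snippet only encodes a JWT-style header segment without signing or verifying anything.

```python
import base64
import json

# 1. A data: URI carrying a tiny text payload.
data_uri = "data:text/plain;base64," + base64.b64encode(b"Hello").decode("ascii")
print(data_uri)  # data:text/plain;base64,SGVsbG8=

# 2. A JWT-style segment: Base64URL with the trailing "=" padding dropped.
def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    # Restore the padding the encoder stripped, then decode.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

header = {"alg": "HS256", "typ": "JWT"}
segment = b64url_encode(json.dumps(header, separators=(",", ":")).encode("utf-8"))

# Anyone holding the segment can read the header back.
assert json.loads(b64url_decode(segment)) == header
```

Because the padding length is fully determined by the encoded length, stripping it loses no information, which is why URL-oriented formats can leave it out.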

Frequently Asked Questions

Who invented Base64?

No single person "invented" it. Base64 evolved from earlier schemes like Unix uuencode (circa 1980, Mary Ann Horton) and was formally defined by the IETF in RFC 1421 (1993) and popularized through MIME (RFCs 2045-2049, edited by Nathaniel Borenstein and Ned Freed). The modern unified spec, RFC 4648, was published in 2006 by Simon Josefsson.

Why does Base64 use 64 characters?

Because 2^6 = 64. Six bits is the largest "clean" chunk size that maps to a manageable, printable character set with no fractional bits left over, making the math simple and fast.

How does Base64 encoding work?

Data is processed in 3-byte (24-bit) blocks, split into four 6-bit groups (4 × 6 = 24), each mapped to one of 64 characters. Output size = ceil(input bytes / 3) × 4, roughly a 33% increase over the original size.

Why was Base64 needed?

Early networks like SMTP email were built for 7-bit ASCII text and could corrupt raw binary data. Base64 "wraps" binary data into safe printable characters so it survives these text-only systems intact.

Is Base64 a form of encryption?

No. It's an encoding with a fixed, public lookup table and no secret key - anyone can decode it instantly. It exists for compatibility, not security.
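
A short Python sketch makes the point concrete: decoding needs nothing beyond the encoded text itself. The sample string is made up for this example.

```python
import base64

encoded = base64.b64encode(b"not a secret").decode("ascii")

# No key, password, or shared state is involved in reversing the encoding.
decoded = base64.b64decode(encoded)

assert decoded == b"not a secret"
print(encoded, "->", decoded.decode("ascii"))
```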

Try It Yourself

Now that you know the math and the history behind it, see Base64 in action with our free, instant, 100% client-side tools below.