Understanding Strings in Go — More Than Just Text


When you first start working with Go (Golang), strings might appear simple. You wrap some text in quotes, concatenate them, print them — and everything seems fine. But under the hood, strings in Go are not just a series of characters. They are far more structured, optimized, and — for many beginners — surprising.

Let’s dive deep into how Go treats strings, what makes them different from languages like Java or Python, and how to handle them effectively.


🔤 What Exactly Is a String?

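In languages such as Java, a string is modeled as a sequence of char values (UTF-16 code units, in Java's case), so it is natural to think of a string as a list of characters.

But in Go, a string is a read-only slice of bytes that conventionally holds UTF-8 encoded text. The language does not enforce the encoding: a string may contain arbitrary bytes, but string literals and most of the standard library assume UTF-8.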

In other words, Go strings are byte-based, not character-based.
You can think of them as:

“An immutable view of a sequence of bytes that represent Unicode text.”
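As a minimal sketch of the "bytes first, text second" idea, the program below measures a string in bytes and asks the standard unicode/utf8 package whether those bytes happen to be valid UTF-8. The helper name describe is made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length of s and whether its bytes are valid UTF-8.
// (Hypothetical helper, used only for this example.)
func describe(s string) (int, bool) {
	return len(s), utf8.ValidString(s)
}

func main() {
	// Two Chinese characters, three UTF-8 bytes each.
	n, ok := describe("你好")
	fmt.Println(n, ok) // 6 true

	// Escape sequences can place arbitrary bytes in a string;
	// 0xff and 0xfe never occur in valid UTF-8.
	n, ok = describe("\xff\xfe")
	fmt.Println(n, ok) // 2 false
}
```

The second call shows that the string type itself does not reject bytes that are not UTF-8; validity is something you check when it matters.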

💡 Why UTF-8?

Let’s rewind for a bit.

Computers understand numbers, not letters. To bridge that gap, early computer systems adopted ASCII (American Standard Code for Information Interchange). ASCII uses 7 bits, which means it can represent only 128 characters: enough for English letters, digits, punctuation, and a set of control codes.

But the world speaks more than English — and we love emojis.
ASCII couldn’t represent characters such as ñ, 你, or 🙂.

To fix this limitation, the Unicode standard was born.


🌍 Unicode — One Code Point to Rule Them All

Unicode assigns a unique number, called a code point, to every symbol, letter, or emoji across all languages.
Each code point is written like this: U+XXXX.

For example:

  • 'A' → U+0041
  • 'ñ' → U+00F1
  • '你' → U+4F60
  • '🙂' → U+1F642
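To see these code points from Go itself, here is a small sketch that formats a rune with the fmt package's %U verb. The function name codePoint is an illustrative choice, not something from the standard library.

```go
package main

import "fmt"

// codePoint formats a rune in the conventional U+XXXX notation.
// (Hypothetical helper, used only for this example.)
func codePoint(r rune) string {
	return fmt.Sprintf("%U", r)
}

func main() {
	for _, r := range []rune{'A', 'ñ', '你', '🙂'} {
		// %c prints the character, %d its numeric value.
		fmt.Printf("%c -> %s (decimal %d)\n", r, codePoint(r), r)
	}
}
```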

But we still need a way to store these numbers efficiently in memory — that’s where UTF-8 comes in.


🧩 What Is UTF-8?

UTF-8 (Unicode Transformation Format, 8-bit) is a clever encoding scheme for storing Unicode code points as bytes.
It’s variable-length: each code point takes 1 to 4 bytes, and higher code point values need more bytes.

Character   Unicode    UTF-8 Bytes (Hex)   Bytes Used
A           U+0041     41                  1
ñ           U+00F1     C3 B1               2
あ          U+3042     E3 81 82            3
🙂          U+1F642    F0 9F 99 82         4

UTF-8 is efficient for English (ASCII text stays at 1 byte per character) and universal for everything else, which is why Go source files are defined to be UTF-8 and string literals are stored in that encoding.
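The byte values in the table can be reproduced with the standard unicode/utf8 package. This is a rough sketch; utf8Hex is a name invented for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Hex returns the UTF-8 encoding of r as space-separated hex bytes.
// (Hypothetical helper, used only for this example.)
func utf8Hex(r rune) string {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest possible encoding
	n := utf8.EncodeRune(buf, r)     // n is the number of bytes written
	return fmt.Sprintf("% X", buf[:n])
}

func main() {
	for _, r := range []rune{'A', 'ñ', 'あ', '🙂'} {
		fmt.Printf("%c  %U  %s  (%d bytes)\n", r, r, utf8Hex(r), utf8.RuneLen(r))
	}
}
```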


⚙️ How Go Represents a String Internally

In Go, a string isn’t stored as a plain text sequence.
It’s actually a small descriptor structure (like a mini struct) that holds two things:

  1. A pointer to the actual bytes in memory.
  2. The length of the string (in bytes).

Roughly speaking, Go’s internal representation is equivalent to:

type stringStruct struct {
    str *byte // pointer to string data
    len int   // length in bytes
}

This means:

  • len("hello") → 5
  • len("🙂") → 4 (because the emoji takes 4 bytes)
Important: len() in Go counts bytes, not characters.
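One way to observe the descriptor is to measure a string value with unsafe.Sizeof, which reports the size of the header rather than the bytes it points to. The sketch below assumes the standard Go toolchain, where the header is a pointer plus an int; descriptorWords is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unsafe"
)

// descriptorWords reports how many machine words a string value occupies.
// Assumption: the standard gc toolchain, where this is pointer + length.
func descriptorWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	short := "hi"
	long := "a much longer string than the first one"

	// Sizeof measures the descriptor only, so both values report the same size
	// even though their contents differ in length.
	fmt.Println(unsafe.Sizeof(short), unsafe.Sizeof(long))
	fmt.Println(len(short), len(long)) // the byte lengths stored in the descriptors
	fmt.Println(descriptorWords())
}
```

Passing a string to a function therefore copies only this small header, never the underlying bytes.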

🧱 Strings Are Immutable

Once you create a string in Go, you cannot modify it. Any operation that seems to change a string (such as concatenation) actually creates a new string.

Example:

s := "hello"
s2 := s + " world"

fmt.Println(s)  // "hello"
fmt.Println(s2) // "hello world"

Here, "hello" and "hello world" occupy different areas in memory.

This immutability ensures safety, predictability, and easy sharing across goroutines without needing locks.
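A short sketch of what immutability means in practice: "changing" a string inside a function only rebinds the local variable, and the caller's value is untouched. The function name appendWorld is made up for this illustration.

```go
package main

import "fmt"

// appendWorld returns a new string; the caller's value is left untouched.
// (Hypothetical helper, used only for this example.)
func appendWorld(s string) string {
	s += " world" // rebinds the local variable to a newly built string
	return s
}

func main() {
	original := "hello"
	alias := original // copies the descriptor; both refer to the same bytes

	result := appendWorld(original)

	// original[0] = 'H' // would not compile: string elements are not assignable

	fmt.Println(original) // hello
	fmt.Println(alias)    // hello
	fmt.Println(result)   // hello world
}
```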


🔡 The Rune Type — Go’s Way of Representing Characters

In Go, a rune is an alias for int32 that represents a single Unicode code point.
If you want to work with code points rather than bytes, iterate over the string with for range or convert it to a slice of runes ([]rune).

Example:

s := "🙂hello"
fmt.Println(len(s))                    // 9 bytes (4 for the emoji + 5 for "hello")
fmt.Println(utf8.RuneCountInString(s)) // 6 runes

for i, r := range s {
    fmt.Printf("%d: %c (U+%04X)\n", i, r, r)
}

Output:

0: 🙂 (U+1F642)
4: h (U+0068)
5: e (U+0065)
6: l (U+006C)
7: l (U+006C)
8: o (U+006F)

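Notice how the emoji starts at byte position 0 but takes 4 bytes, so the next index is 4. The index that range yields is the byte offset of each rune, not its ordinal position.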
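When you need positional access by code point, converting to []rune gives you a slice you can index and modify. As a hedged example, here is a rune-wise reversal; reverseRunes is an illustrative name, and the approach treats each code point separately, so text with combining marks may still come out wrong.

```go
package main

import "fmt"

// reverseRunes reverses s one code point at a time.
// (Hypothetical helper; it does not handle combining character sequences.)
func reverseRunes(s string) string {
	r := []rune(s) // decodes the UTF-8 bytes into a mutable slice of code points
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r) // re-encodes the code points as UTF-8
}

func main() {
	fmt.Println(reverseRunes("🙂hi"))  // ih🙂
	fmt.Println(reverseRunes("hello")) // olleh
}
```

Reversing the raw bytes instead would scramble the four bytes of the emoji and produce invalid UTF-8.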


🧰 Common Gotchas and Best Practices

1. Avoid Direct Indexing

If you index a string directly (s[i]), you get a byte, not a character.
To safely access characters, use a for range loop or convert to []rune.

s := "你好"
fmt.Println(s[0])         // 228 (just a byte)
fmt.Println(string(s[0])) // "ä": byte 0xE4 is read as code point U+00E4, not 你
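If you only need the first character, decoding it directly avoids converting the whole string. A minimal sketch using utf8.DecodeRuneInString from the standard library; firstRune is a hypothetical wrapper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune returns the first code point of s and the number of bytes it occupies.
// (Hypothetical helper, used only for this example.)
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	s := "你好"
	r, size := firstRune(s)
	fmt.Printf("%c %d\n", r, size) // 你 3

	// Slicing at the returned size lands on a rune boundary.
	fmt.Println(s[size:]) // 好
}
```

For an empty string the function returns utf8.RuneError with a size of 0, so check the size before slicing in real code.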

2. Use utf8 Package for Counting

Use utf8.RuneCountInString() to count actual characters (runes), not bytes.

3. Conversions Between []byte and string

Converting between string and []byte is common, especially for I/O operations:

b := []byte("hello")  // String to bytes
s := string(b)        // Bytes to string

Be cautious: each conversion generally allocates and copies the data (the compiler can skip the copy only in a few special cases), which can hurt performance when processing large amounts of data.
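The copy is easy to observe: modifying the byte slice leaves the source string unchanged. A small sketch, with mutateCopy as an invented name.

```go
package main

import "fmt"

// mutateCopy converts s to bytes, changes the first byte, and returns
// both the original string and the modified copy.
// (Hypothetical helper, used only for this example.)
func mutateCopy(s string) (string, string) {
	b := []byte(s) // independent copy of the string's bytes
	if len(b) > 0 {
		b[0] = 'j'
	}
	return s, string(b) // string(b) copies again into a new string
}

func main() {
	before, after := mutateCopy("hello")
	fmt.Println(before) // hello
	fmt.Println(after)  // jello
}
```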

4. String Comparison

String comparison in Go is lexicographical by byte value, which makes it case-sensitive (all uppercase ASCII letters sort before lowercase ones):

fmt.Println("apple" < "banana") // true
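Two consequences are worth seeing side by side: byte-wise ordering puts "Zebra" before "apple", and case-insensitive equality needs an explicit call such as strings.EqualFold. The wrapper names below are made up for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// bytewiseLess reports whether a sorts before b using Go's built-in comparison.
// (Hypothetical helper, used only for this example.)
func bytewiseLess(a, b string) bool {
	return a < b
}

// equalIgnoreCase compares two strings under Unicode case folding.
// (Hypothetical helper wrapping strings.EqualFold.)
func equalIgnoreCase(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println(bytewiseLess("Zebra", "apple")) // true: 'Z' (0x5A) < 'a' (0x61)
	fmt.Println("Go" == "go")                   // false
	fmt.Println(equalIgnoreCase("Go", "GO"))    // true
}
```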

5. Multi-line Strings

Go supports multi-line strings using backticks:

msg := `This is
a multi-line
string.`

This is useful for embedding templates or SQL queries without escaping.
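"Without escaping" also means escape sequences are not interpreted inside backticks. A quick sketch comparing a raw literal with an interpreted one; the function name literalLengths is illustrative.

```go
package main

import "fmt"

// literalLengths returns the byte lengths of a raw and an interpreted literal
// that look the same in source code.
// (Hypothetical helper, used only for this example.)
func literalLengths() (int, int) {
	raw := `a\nb`         // backslash and 'n' are kept as two separate bytes
	interpreted := "a\nb" // \n becomes a single newline byte
	return len(raw), len(interpreted)
}

func main() {
	rawLen, interpretedLen := literalLengths()
	fmt.Println(rawLen, interpretedLen) // 4 3
}
```

The one character a raw string literal cannot contain is the backtick itself.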


🧠 Performance and Memory Considerations

  • Immutable data = fewer race conditions.
  • Copy on conversion = potential performance cost in loops.
  • Small descriptor (16 bytes on 64-bit platforms) = lightweight for variable passing.
  • UTF-8 = compact for ASCII-heavy text, universal for global apps.

If you’re manipulating large amounts of text, consider using strings.Builder for efficient concatenation:

var sb strings.Builder
sb.WriteString("Hello, ")
sb.WriteString("World!")
fmt.Println(sb.String())

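When a string is built up over many steps, such as in a loop, this avoids the repeated allocation and copying that + incurs, since each + produces a brand-new string.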
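The benefit shows up mainly in loops. Here is a minimal sketch that also uses Builder.Grow to reserve capacity up front; repeatWithBuilder is a hypothetical helper (the standard library already offers strings.Repeat for this exact task).

```go
package main

import (
	"fmt"
	"strings"
)

// repeatWithBuilder concatenates word n times using a strings.Builder.
// (Hypothetical helper, used only for this example.)
func repeatWithBuilder(word string, n int) string {
	if n <= 0 {
		return ""
	}
	var sb strings.Builder
	sb.Grow(len(word) * n) // reserve the final size so the buffer is allocated once
	for i := 0; i < n; i++ {
		sb.WriteString(word)
	}
	return sb.String()
}

func main() {
	fmt.Println(repeatWithBuilder("ab", 3)) // ababab
}
```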


🔒 Thread Safety

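Because string values are immutable, they are safe for concurrent reads by design.
You can share a string across multiple goroutines without synchronization, a huge advantage when writing concurrent Go programs. This applies to the value itself: a string variable that one goroutine reassigns while others read it still needs synchronization.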
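As a rough sketch of lock-free sharing, several goroutines below read the same string at once, and each writes its result to its own slice element. The function name concurrentRuneCounts is made up for the example.

```go
package main

import (
	"fmt"
	"sync"
	"unicode/utf8"
)

// concurrentRuneCounts has several goroutines read the same string concurrently.
// (Hypothetical helper, used only for this example.)
func concurrentRuneCounts(s string, workers int) []int {
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Read-only access to s needs no lock. Each goroutine writes
			// to a distinct element of counts, so those writes do not race.
			counts[i] = utf8.RuneCountInString(s)
		}(i)
	}
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(concurrentRuneCounts("🙂hello", 4)) // [6 6 6 6]
}
```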


📜 Summary Table

Concept         Description
String          Immutable sequence of bytes, conventionally UTF-8
Rune            int32, represents a Unicode code point
len()           Returns byte count, not rune count
UTF-8           Variable-length Unicode encoding (1–4 bytes)
Immutability    Strings cannot be changed once created
Thread-safety   Strings are safe for concurrent reads
Best Practice   Use rune or utf8 package for character-level work

✨ Finally

Strings in Go are simple on the surface but deep in design.
Their UTF-8 foundation, immutability, and lightweight representation make them efficient and reliable — ideal for modern, multilingual software.

Once you understand that a Go string is just an immutable byte sequence with UTF-8 meaning, you unlock a more powerful way to work with text — one that’s efficient, safe, and globally compatible.
