Understanding Strings in Go — More Than Just Text
When you first start working with Go (Golang), strings might appear simple. You wrap some text in quotes, concatenate them, print them — and everything seems fine. But under the hood, strings in Go are not just a series of characters. They are far more structured, optimized, and — for many beginners — surprising.
Let’s dive deep into how Go treats strings, what makes them different from languages like Java or Python, and how to handle them effectively.
🔤 What Exactly Is a String?
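In many languages, a string is presented as a sequence of characters. In Java, for instance, a `String` is a sequence of `char` values, each a 16-bit UTF-16 code unit (an emoji such as 🙂 needs two of them), and in Python 3 a `str` is a sequence of Unicode code points.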
But in Go, a string is in effect a read-only slice of bytes. Those bytes usually hold UTF-8 encoded text, since Go source code is UTF-8, but a string may hold arbitrary bytes.
In other words, Go strings are byte-based, not character-based.
You can think of them as:
“An immutable view of a sequence of bytes that represent Unicode text.”
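Here is a small sketch that shows the byte-level view directly. `hexBytes` is a hypothetical helper written for this example. Indexing a string with `s[i]` yields one byte, so walking the indexes prints the raw UTF-8 encoding. A literal such as `"\xff"` is a perfectly legal Go string even though it is not valid UTF-8.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// hexBytes (hypothetical helper) returns every byte of s as two hex
// digits, separated by spaces. s[i] is a byte, not a character.
func hexBytes(s string) string {
	parts := make([]string, 0, len(s))
	for i := 0; i < len(s); i++ {
		parts = append(parts, fmt.Sprintf("%02x", s[i]))
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(hexBytes("héllo"), len("héllo")) // 68 c3 a9 6c 6c 6f 6
	fmt.Println(utf8.ValidString("héllo"))       // true
	// "\xff" is a legal string value, but not valid UTF-8.
	fmt.Println(utf8.ValidString("\xff")) // false
}
```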
💡 Why UTF-8?
Let’s rewind for a bit.
Computers understand numbers, not letters. To bridge that gap, early computer systems standardized on ASCII (American Standard Code for Information Interchange). ASCII uses 7 bits, so it can represent only 128 characters: enough for English letters, digits, punctuation, and a few control codes.
But the world speaks more than English — and we love emojis.
ASCII couldn’t represent `ñ`, `你`, or `🙂`.
To fix this limitation, the Unicode standard was born.
🌍 Unicode — One Code Point to Rule Them All
Unicode assigns a unique number, called a code point, to every symbol, letter, or emoji across all languages.
Each code point is written in the form `U+XXXX`: `U+` followed by at least four hexadecimal digits.
For example:
- `'A'` → `U+0041`
- `'ñ'` → `U+00F1`
- `'你'` → `U+4F60`
- `'🙂'` → `U+1F642`
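To check these code points, Go's `fmt` package can print a rune in exactly this notation with the `%U` verb. This is a minimal sketch. `codePoint` is a hypothetical helper.

```go
package main

import "fmt"

// codePoint (hypothetical helper) formats r in U+XXXX notation.
// The %U verb prints at least four hex digits, e.g. "U+0041".
func codePoint(r rune) string {
	return fmt.Sprintf("%U", r)
}

func main() {
	for _, r := range []rune{'A', 'ñ', '你', '🙂'} {
		fmt.Printf("%c → %s\n", r, codePoint(r))
	}
}
```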
But we still need a way to store these numbers efficiently in memory — that’s where UTF-8 comes in.
🧩 What Is UTF-8?
UTF-8 (Unicode Transformation Format, 8-bit) is a clever encoding scheme for storing Unicode code points as bytes.
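It’s variable-length: each code point takes 1 to 4 bytes, depending on its numeric value. Code points up to U+007F (plain ASCII) take 1 byte, up to U+07FF take 2, up to U+FFFF take 3, and everything above takes 4.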
Character | Code point | UTF-8 bytes (hex) | Bytes used |
---|---|---|---|
A | U+0041 | 41 | 1 |
ñ | U+00F1 | C3 B1 | 2 |
あ | U+3042 | E3 81 82 | 3 |
🙂 | U+1F642 | F0 9F 99 82 | 4 |
UTF-8 is compact for English (ASCII characters stay 1 byte, so ASCII text is already valid UTF-8) and can encode every Unicode code point. That is why Go uses UTF-8 for its source code and, by convention, for the text stored in strings.
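The UTF-8 encoding table can be reproduced with the standard `unicode/utf8` package. This sketch uses only the standard library. `encode` is a hypothetical helper that wraps `utf8.EncodeRune`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode (hypothetical helper) returns the UTF-8 bytes of r as
// uppercase hex pairs, plus the number of bytes used.
func encode(r rune) (string, int) {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest encoding
	n := utf8.EncodeRune(buf, r)
	return fmt.Sprintf("% X", buf[:n]), n
}

func main() {
	for _, r := range []rune{'A', 'ñ', 'あ', '🙂'} {
		hex, n := encode(r)
		fmt.Printf("%c | %U | %s | %d\n", r, r, hex, n)
	}
}
```

Each printed row should match the corresponding row of the table.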
⚙️ How Go Represents a String Internally
A Go string value doesn’t hold the text itself.
It’s a small descriptor (essentially a two-field `struct`) that holds:
- A pointer to the actual bytes in memory.
- The length of the string (in bytes).
Roughly speaking, Go’s internal representation is equivalent to:
```go
// Simplified view; the runtime's own definition uses unsafe.Pointer for str.
type stringStruct struct {
	str *byte // pointer to string data
	len int   // length in bytes
}
```
This means `len("hello")` is 5, while `len("🙂")` is 4, because the emoji takes 4 bytes.
Important: `len()` in Go counts bytes, not characters.
🧱 Strings Are Immutable
Once you create a string in Go, you cannot modify it. Any operation that seems to change a string (such as concatenation) actually creates a new string.
Example:
```go
s := "hello"
s2 := s + " world"
fmt.Println(s)  // hello
fmt.Println(s2) // hello world
```
Here, `"hello"` and `"hello world"` occupy different areas in memory: concatenation allocated new bytes for `s2` and left `s` untouched.
This immutability ensures safety, predictability, and easy sharing across goroutines without needing locks.
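The bytes of a string cannot be changed in place: `s[0] = 'H'` does not compile. The usual pattern is to copy the bytes into a `[]byte`, modify the copy, and convert back. This minimal sketch uses a hypothetical `capitalizeFirst` helper that only handles ASCII letters. For real text, `strings.ToUpper` and the `unicode` package are the better tools.

```go
package main

import "fmt"

// capitalizeFirst (hypothetical, ASCII-only) returns a copy of s with its
// first byte upper-cased. The original string is never modified.
func capitalizeFirst(s string) string {
	if s == "" || s[0] < 'a' || s[0] > 'z' {
		return s
	}
	b := []byte(s)    // copy the bytes into a mutable slice
	b[0] -= 'a' - 'A' // change the copy, not the string
	return string(b)  // copy back into a new, immutable string
}

func main() {
	s := "hello"
	t := capitalizeFirst(s)
	fmt.Println(s, t) // hello Hello
}
```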
🔡 The Rune Type — Go’s Way of Representing Characters
In Go, `rune` is an alias for `int32` and represents a single Unicode code point.
To work with code points rather than bytes, iterate with a `for range` loop, which decodes UTF-8 as it goes, or convert the string to a slice of runes with `[]rune(s)`.
Example:
```go
s := "🙂hello"
fmt.Println(len(s))                    // 9 bytes
fmt.Println(utf8.RuneCountInString(s)) // 6 runes
for i, r := range s {
	fmt.Printf("%d: %c (U+%04X)\n", i, r, r)
}
```
Output:
```
0: 🙂 (U+1F642)
4: h (U+0068)
5: e (U+0065)
6: l (U+006C)
7: l (U+006C)
8: o (U+006F)
```
Notice that `range` yields byte offsets, not rune positions. The emoji starts at byte 0 and takes 4 bytes, so `h` starts at byte 4.
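Converting to `[]rune` is useful when you need random access by code point, for example to reverse a string without splitting the emoji's bytes. This sketch uses a hypothetical `reverseRunes` helper. It works on code points, not on user-perceived characters, so a letter followed by a combining accent would come out with the accent detached.

```go
package main

import "fmt"

// reverseRunes (hypothetical helper) reverses s code point by code point.
// Reversing the raw bytes instead would scramble multi-byte sequences.
func reverseRunes(s string) string {
	r := []rune(s) // decode UTF-8 into code points (allocates a copy)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r) // re-encode as UTF-8
}

func main() {
	s := "🙂hello"
	r := []rune(s)
	fmt.Println(len(r), string(r[0])) // 6 🙂
	fmt.Println(reverseRunes(s))      // olleh🙂
}
```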
🧰 Common Gotchas and Best Practices
1. Avoid Direct Indexing
If you index a string directly (`s[i]`), you get a byte, not a character.
To safely access characters, use a `for range` loop or convert to `[]rune`.
```go
s := "你好"
fmt.Println(s[0])         // 228: the first of the 3 bytes that encode 你
fmt.Println(string(s[0])) // ä: the byte 228 is converted as code point U+00E4, not as part of 你
```
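When you only need the first character, or need to step through a string by hand, `utf8.DecodeRuneInString` decodes one rune and reports its width in bytes. This sketch uses a hypothetical `firstRune` wrapper around that function.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune (hypothetical wrapper) decodes the first code point of s and
// its size in bytes. For an empty string it returns utf8.RuneError, 0.
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	s := "你好"
	r, size := firstRune(s)
	fmt.Printf("%c %d\n", r, size) // 你 3
	fmt.Println(s[:size])          // 你 (slicing on a rune boundary is safe)
}
```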
2. Use utf8 Package for Counting
Use `utf8.RuneCountInString()` from the `unicode/utf8` package to count runes (code points) rather than bytes.
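This sketch compares the two counts for a few inputs. `countBoth` is a hypothetical helper. Two cases are worth noting. An invalid byte such as `"\xff"` counts as one rune. And `"e\u0301"` (an `e` followed by a combining acute accent) displays as one letter but counts as two runes, because runes are code points rather than user-perceived characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countBoth (hypothetical helper) returns the byte length and rune count of s.
func countBoth(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"hello", "你好", "🙂", "e\u0301", "\xff"} {
		b, r := countBoth(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```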
3. Conversions Between []byte and string
Converting between `string` and `[]byte` is common, especially for I/O operations:
```go
b := []byte("hello") // String to bytes
s := string(b)       // Bytes to string
```
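Be cautious: in general, each conversion copies the data, which can hurt performance when processing large amounts of it. (The compiler avoids the copy in a few special cases, such as a map lookup written as `m[string(b)]`.)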
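The copy is what keeps strings immutable even though `[]byte` is mutable: after the conversion, the two values no longer share memory. This minimal sketch uses a hypothetical `snapshot` helper.

```go
package main

import "fmt"

// snapshot (hypothetical helper) converts b to a string. The conversion
// copies the bytes, so later writes to b cannot affect the result.
func snapshot(b []byte) string {
	return string(b)
}

func main() {
	b := []byte("hello") // string -> []byte: copies 5 bytes
	s := snapshot(b)     // []byte -> string: copies them again
	b[0] = 'j'
	fmt.Println(string(b), s) // jello hello
}
```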
4. String Comparison
String comparison in Go is lexicographical by byte value, and therefore case-sensitive:
```go
fmt.Println("apple" < "banana") // true
```
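Because the comparison works on byte values, every uppercase ASCII letter sorts before every lowercase one. For case-insensitive equality, the standard library provides `strings.EqualFold`. This sketch wraps it in a hypothetical `sameIgnoringCase` helper.

```go
package main

import (
	"fmt"
	"strings"
)

// sameIgnoringCase (hypothetical wrapper) reports whether a and b are equal
// under Unicode case folding, e.g. "Go" and "GO".
func sameIgnoringCase(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println("Zebra" < "apple")                          // true: 'Z' (0x5A) < 'a' (0x61)
	fmt.Println(strings.Compare("a", "b"))                  // -1
	fmt.Println("Go" == "go", sameIgnoringCase("Go", "go")) // false true
}
```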
5. Multi-line Strings
Go supports multi-line strings using backticks (raw string literals):
```go
msg := `This is
a multi-line
string.`
```
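This is useful for embedding templates or SQL queries. Inside backticks, backslashes and quotes have no special meaning, so nothing needs escaping. The only character a raw string cannot contain is a backtick.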
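The difference between the two kinds of literal shows up in their lengths. In a raw literal, `\n` is two characters (a backslash and an `n`), while in a double-quoted literal it is a single newline byte. This is a sketch, and the `query` constant is a made-up example rather than a real schema.

```go
package main

import "fmt"

// Raw literals (backticks) keep backslashes as-is; interpreted literals
// (double quotes) process escape sequences.
const (
	raw         = `a\nb` // 4 bytes: 'a', '\\', 'n', 'b'
	interpreted = "a\nb" // 3 bytes: 'a', newline, 'b'
)

// query is a hypothetical SQL statement kept readable with a raw literal.
const query = `SELECT name
FROM users
WHERE id = $1`

func main() {
	fmt.Println(len(raw), len(interpreted)) // 4 3
	fmt.Println(query)
}
```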
🧠 Performance and Memory Considerations
- Immutable data = fewer race conditions.
- Copy on conversion = potential performance cost in loops.
- Small descriptor (16 bytes on 64-bit platforms) = cheap to pass around by value.
- UTF-8 = compact for ASCII-heavy text, universal for global apps.
If you’re building a large string piece by piece, consider using `strings.Builder` for efficient concatenation:
```go
var sb strings.Builder
sb.WriteString("Hello, ")
sb.WriteString("World!")
fmt.Println(sb.String())
```
This avoids the intermediate strings that repeated `+` allocates, which matters most inside loops.
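When the number of pieces is only known at run time, a loop over a `strings.Builder` is the usual pattern. This sketch uses a hypothetical `joinWords` helper. It also calls `Grow` to reserve the final size up front, so the loop appends without reallocating. In practice, `strings.Join` already covers this common case.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords (hypothetical helper) concatenates words separated by sep.
func joinWords(words []string, sep string) string {
	n := 0 // total length of the result, computed up front
	for _, w := range words {
		n += len(w)
	}
	if len(words) > 1 {
		n += len(sep) * (len(words) - 1)
	}

	var sb strings.Builder
	sb.Grow(n) // one allocation for the whole result
	for i, w := range words {
		if i > 0 {
			sb.WriteString(sep)
		}
		sb.WriteString(w)
	}
	return sb.String()
}

func main() {
	fmt.Println(joinWords([]string{"Hello", "World!"}, ", ")) // Hello, World!
}
```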
🔒 Thread Safety
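Because strings are immutable, their contents are safe to read from many goroutines at once.
You can share a string across goroutines without synchronization, which is a real advantage in concurrent Go programs. One caveat: a string variable that one goroutine reassigns while others read it is still a data race, like any other shared variable.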
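Here is a minimal sketch of several goroutines reading one shared string, using a hypothetical `countInParallel` helper. Each goroutine writes only to its own slot of the result slice, and `sync.WaitGroup` waits for them all, so the string itself needs no lock.

```go
package main

import (
	"fmt"
	"sync"
	"unicode/utf8"
)

// countInParallel (hypothetical helper) has `workers` goroutines read the
// same string concurrently. s is only read, never written, so it needs
// no lock; each goroutine writes to a distinct element of results.
func countInParallel(s string, workers int) []int {
	results := make([]int, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(slot int) {
			defer wg.Done()
			results[slot] = utf8.RuneCountInString(s)
		}(i)
	}
	wg.Wait() // all goroutines have finished writing before we return
	return results
}

func main() {
	fmt.Println(countInParallel("🙂hello", 4)) // [6 6 6 6]
}
```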
📜 Summary Table
Concept | Description |
---|---|
String | Immutable byte sequence, conventionally UTF-8 |
Rune | `int32` alias representing one Unicode code point |
len() | Returns byte count, not rune count |
UTF-8 | Variable-length Unicode encoding (1–4 bytes) |
Immutability | Strings cannot be changed once created |
Thread-safety | Strings are safe for concurrent reads |
Best Practice | Use `for range`, `[]rune`, or the `unicode/utf8` package for character-level work |
✨ Final Thoughts
Strings in Go are simple on the surface but deep in design.
Their UTF-8 foundation, immutability, and lightweight representation make them efficient and reliable — ideal for modern, multilingual software.
Once you understand that a Go string is just an immutable byte sequence with UTF-8 meaning, you unlock a more powerful way to work with text — one that’s efficient, safe, and globally compatible.