What Is the CSV Format? Delimiters, Quoting & Pitfalls
By Justin Le · 6 min read · Updated June 27, 2026
CSV is the lingua franca of tabular data: every spreadsheet and data tool reads it. It looks almost too simple to need explaining, but the details (especially quoting) are where data gets silently corrupted. Here's what you actually need to know.
What is CSV?
CSV stands for Comma-Separated Values. It's a plain-text format where each line is a row and fields within a row are separated by a delimiter, usually a comma. The first row is typically a header naming each column. That's the whole idea — its simplicity is why it's so universal.
name,role,active
Alice,admin,true
Bob,user,false
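As a minimal sketch, here is how the sample above could be read with Python's standard `csv` module. The `sample` string and the `rows` variable are illustrative names, and the data is held in memory rather than read from a file.

```python
import csv
import io

# The three-line sample from above, held in memory instead of a file.
sample = "name,role,active\nAlice,admin,true\nBob,user,false\n"

# DictReader treats the first row as the header and yields one dict per data row.
rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    # Every value is a string, including "true" and "false".
    print(row["name"], row["role"], row["active"])
```

Note that `active` comes back as the text `"true"`, not a boolean: the format carries no type information.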
The quoting rules (where it gets tricky)
The simplicity breaks down as soon as a value contains the delimiter. What if a name is "Doe, John"? The comma would be misread as a field separator. The convention, formalised in RFC 4180, handles this with quoting:
- A field containing a comma, quote or newline is wrapped in double quotes.
- A double quote inside a quoted field is escaped by doubling it ("").
So Doe, John becomes "Doe, John", and a value that itself contains quotes, such as say "hi", becomes "say ""hi""". A naive "split on comma" parser gets both of these wrong, which is why you should use a proper CSV parser.
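The two quoting rules can be seen in action with Python's standard `csv` module, which by default quotes only the fields that need it. This is a small sketch; the `values` list is made up for illustration, and it compares a naive comma split with a real parser on the same line.

```python
import csv
import io

# Illustrative values: one with a comma, one with double quotes, one plain.
values = ["Doe, John", 'say "hi"', "plain"]

# Write a single row; the default dialect quotes only where required.
buf = io.StringIO()
csv.writer(buf).writerow(values)
line = buf.getvalue()
print(repr(line))

# Naive approach: split on every comma, including the one inside the quotes.
naive = line.rstrip("\r\n").split(",")
print(len(naive), naive)

# Proper parser: respects the quotes and un-doubles the embedded ones.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == values)
```

The naive split produces four pieces from three fields and leaves the quote characters in place, while the parser returns the original values unchanged.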
Values can even contain newlines
Because of quoting, a single field can span multiple lines if it's wrapped in quotes. That means you can't reliably parse CSV by splitting on line breaks either — a quoted newline is part of the value, not a new row. This trips up many home-grown parsers.
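To illustrate, the sketch below feeds a small made-up document with a quoted line break to both a line splitter and Python's `csv.reader`. The `data` string is an assumption for demonstration only.

```python
import csv
import io

# Two data rows; the first has a note that spans two lines inside quotes.
data = 'id,note\r\n1,"first line\nsecond line"\r\n2,short\r\n'

# Splitting on line breaks cuts the quoted field in half.
naive_rows = data.splitlines()
print(len(naive_rows))

# A CSV parser keeps the quoted newline as part of the value.
rows = list(csv.reader(io.StringIO(data)))
print(len(rows))
print(repr(rows[1][1]))
```

When reading from a real file in Python, the `csv` documentation recommends opening it with `newline=""` so that embedded line breaks reach the parser untouched.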
Common pitfalls
- Delimiter confusion. Some locales use a semicolon (;) instead of a comma, because the comma is their decimal separator. "CSV" doesn't always mean comma.
- Lost leading zeros. Spreadsheets may read 00123 or a phone number as a number and drop the zeros. CSV itself has no types; everything is text.
- Encoding issues. Non-ASCII characters need a consistent encoding (UTF-8); the wrong one turns accents into garbage.
- No nested data. CSV is flat. Nested objects or arrays have no native representation and must be flattened or serialised.
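The first three pitfalls can be reproduced in a few lines of Python. This is a rough sketch with invented data (`raw` is a hypothetical semicolon-delimited export encoded as UTF-8), not a general-purpose loader.

```python
import csv
import io

# Hypothetical export: semicolon delimiter, comma as decimal separator, UTF-8 bytes.
raw = "id;name;price\n00123;Zoë;3,50\n".encode("utf-8")

# Decode with the right encoding and name the delimiter explicitly.
text = raw.decode("utf-8")
rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))
row = rows[0]

# Fields stay as text, so the leading zeros and the decimal comma survive.
print(row["id"], row["name"], row["price"])

# Converting the ID to a number is what drops the zeros.
print(int(row["id"]))

# Decoding the same bytes with the wrong encoding garbles the accent.
wrong = raw.decode("latin-1")
print(wrong)
```

Keeping identifier-like columns as strings and stating the delimiter and encoding explicitly avoids relying on a tool's guesses.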
CSV vs JSON
CSV is compact and perfect for flat, tabular data that goes into spreadsheets. JSON handles nested structures and carries (some) type information, making it better for APIs and complex data. Converting between them is a common task — just remember that going to CSV flattens and stringifies everything. See our JSON vs YAML guide for the broader format picture.
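Here is one way the JSON-to-CSV direction might look in Python, as a sketch rather than a definitive converter. The records and the `to_cell` helper are hypothetical, and serialising nested values as JSON text is just one of several possible flattening choices.

```python
import csv
import io
import json

# Hypothetical input: a JSON array of objects with a nested list.
records = json.loads(
    '[{"name": "Alice", "active": true, "tags": ["a", "b"]},'
    ' {"name": "Bob", "active": false, "tags": []}]'
)

def to_cell(value):
    # Nested values have no CSV representation, so store them as JSON text.
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "active", "tags"])
writer.writeheader()
for rec in records:
    writer.writerow({key: to_cell(val) for key, val in rec.items()})
csv_text = buf.getvalue()
print(csv_text)

# Reading it back returns strings only; the types are gone.
back = list(csv.DictReader(io.StringIO(csv_text)))
print(back[0])
```

After the round trip, the boolean is the string `"True"` (Python's spelling, not JSON's) and the list is a JSON string that has to be parsed again, which is the flattening and stringifying described above.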
Try it
Convert a JSON array of objects to clean, correctly-quoted CSV — and back — with our JSON ↔ CSV converter, which handles the RFC 4180 quoting for you. Tidy the JSON side first with the JSON formatter.
Frequently asked questions
How does CSV handle a comma inside a value?
The field is wrapped in double quotes, so 'Doe, John' becomes "Doe, John". A double quote inside a quoted field is escaped by doubling it. This is the RFC 4180 convention.
Can a CSV value contain a line break?
Yes, if the field is wrapped in double quotes. That's why you can't reliably parse CSV by splitting on newlines — a quoted newline is part of the value, not a row boundary.
Why did my leading zeros or long numbers change in CSV?
CSV has no types — every field is text. Spreadsheets often interpret values like 00123 or long IDs as numbers and reformat them. Keep such fields as text, or use a format that preserves types.