CSV files explained: how they work and break in production

CSV files look simple. In real B2B SaaS onboarding flows, they break in 8 predictable ways. Learn what causes each and how to handle it.

Alain Tiemblo, Co-founder, CTO

Who this article is for

This article is written for CTOs, Heads of Product, and technical founders at B2B SaaS companies. If your customers send you data files as part of their onboarding, and you are wondering whether to build your own CSV importer or why your current import flow is slowing customer activation, this article is for you. If you handle customer file imports at scale, the production problems described here will be familiar.

What a CSV file actually is

A CSV (Comma-Separated Values) file is a plain text file where each line represents a row of data, and each value within the row is separated by a delimiter, usually a comma. That is the whole idea.

RFC 4180 documents the most common form of the format. An optional header row contains column names. Subsequent rows contain data. Fields may be enclosed in double quotes, which is required when a field itself contains a comma, a line break, or a double quote (written as two double quotes inside the quoted field). CSV's simplicity is its original advantage: any spreadsheet, any ERP, any legacy system can export CSV. No proprietary format, no special software to open. In 1983, that was a meaningful advantage. In 2026, it is also the reason CSV imports fail in production.
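The quoting rules are easier to see on a concrete file. The sketch below uses Python's standard csv module; the sample content and the helper name parse_rows are made up for illustration.

```python
import csv
import io

# Three records; the header row is optional in RFC 4180 but present here.
# Record 2 has a comma and a line break inside quoted fields.
# Record 3 has a double quote inside a quoted field, escaped by doubling it.
RAW = (
    'name,note\r\n'
    '"Dupont, Marie","Line one\r\nLine two"\r\n'
    '"ACME ""Pro""",ok\r\n'
)

def parse_rows(text: str) -> list[list[str]]:
    """Parse CSV text with the csv module's default (comma, double-quote) dialect."""
    # newline="" leaves line breaks untouched so quoted multi-line fields survive.
    return list(csv.reader(io.StringIO(text, newline="")))

for row in parse_rows(RAW):
    print(row)
```

Four physical lines of text become three logical rows, which is why splitting a CSV file on line breaks and commas by hand is not a safe substitute for a real parser.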

Why CSV looks simple but breaks in production

RFC 4180 is an informational memo, not a binding standard, so nothing forces an exporter to follow it. Most CSV files in the wild do not conform to it strictly. Different systems use different delimiters: semicolons in French Excel exports, tabs in many database exports. Different encodings: UTF-8 vs Windows-1252 vs Latin-1. Different line ending conventions: CRLF vs LF.

When you test with your own CSV files, everything works. When your customers send their files, exported from a French accounting tool, a Spanish ERP, a UK logistics platform, the edge cases appear. The file looks fine in Excel. Your parser chokes on it. This is not a bug. It is the structural reality of a format that was designed for simple use cases and stretched to handle production data flows it was never meant for.
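A small reproduction makes the gap concrete. In this sketch the file content is hypothetical (semicolon-delimited, saved as Windows-1252), and naive_parse stands in for a quick in-house importer that assumes UTF-8 and commas.

```python
import csv
import io

# Hypothetical customer export: semicolon-delimited, saved as Windows-1252.
EXPORTED = "nom;ville\r\nRenée;Orléans\r\n".encode("cp1252")

def naive_parse(data: bytes) -> list[list[str]]:
    """Assume UTF-8 and commas, and replace undecodable bytes instead of failing."""
    text = data.decode("utf-8", errors="replace")
    return list(csv.reader(io.StringIO(text, newline="")))

def informed_parse(data: bytes) -> list[list[str]]:
    """Parse with the encoding and delimiter the file was actually written with."""
    text = data.decode("cp1252")
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))

print(naive_parse(EXPORTED))     # one column per row, accents replaced
print(informed_parse(EXPORTED))  # two columns per row, accents intact
```

Neither call raises an exception. The naive version returns rows that look plausible to the code and are useless to the customer, which is what makes this class of failure hard to notice.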

The 8 most common CSV errors in B2B SaaS onboarding flows

WeTransform processes thousands of customer file imports each month. Across that volume, 8 failure patterns appear with high frequency.

  1. Encoding mismatch. The file was saved in Windows-1252 or Latin-1 but your system expects UTF-8. Special characters become garbage. Silent failures are common: the file imports successfully, but the data is corrupted.

  2. Wrong delimiter. French and German Excel exports use semicolons by default. A parser expecting commas reads each row as a single column. The file appears to import, but every row holds one long, unusable value.

  3. Inconsistent quoting. Some fields are quoted, others are not. Some fields contain unescaped quotes within quoted strings. The parser loses track of where each field ends.

  4. Extra columns in data rows. A customer adds a column to their export after you built the importer. The rows no longer match the expected schema. The parser either errors or silently drops the extra data.

  5. Missing required fields. A column that your system requires is blank or absent in the customer's export. Without explicit validation, null values propagate downstream.

  6. Date and number format variations. '01/06/2026' means January 6 in the US and June 1 in France. '1,500.00' and '1.500,00' both represent fifteen hundred, depending on locale. Neither format is wrong at the source. Both will break a parser that assumes one convention.

  7. BOM (Byte Order Mark) in the header. Some Windows tools prepend a UTF-8 BOM (the bytes EF BB BF) to the start of the file. The first column name then begins with an invisible character, which causes schema matching to fail.

  8. Schema drift. The customer changed their export template three months ago. Column names shifted. Your import code, written against the original schema, silently skips or misroutes the data.

Figure: the 8 most common CSV errors in B2B SaaS customer onboarding imports (encoding, delimiter, quoting, extra columns, missing fields, date formats, BOM, schema drift).

Each of these errors is individually fixable. Together, across a growing customer base where every customer exports from a different tool, they represent a maintenance surface that expands with every new logo.
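Two of the patterns above, the BOM in the header and locale-dependent dates and numbers, can be reproduced in a few lines. This is a minimal sketch: the sample bytes and the helper names (read_header, parse_date, parse_amount) are illustrative, and the caller is assumed to know the locale already, which is the hard part in practice.

```python
import csv
import io
from datetime import date, datetime
from decimal import Decimal

# Hypothetical export: UTF-8 with BOM, day-first date, decimal comma.
DATA = b'\xef\xbb\xbfid,date,amount\r\n7,01/06/2026,"1.500,00"\r\n'

def read_header(data: bytes, encoding: str) -> list[str]:
    """Decode the bytes and return the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))

def parse_date(value: str, day_first: bool) -> date:
    """Interpret DD/MM/YYYY or MM/DD/YYYY depending on the stated convention."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(value, fmt).date()

def parse_amount(value: str, decimal_comma: bool) -> Decimal:
    """Drop the thousands separator and normalize the decimal mark to a dot."""
    if decimal_comma:
        normalized = value.replace(".", "").replace(",", ".")
    else:
        normalized = value.replace(",", "")
    return Decimal(normalized)

# With plain "utf-8" the BOM stays attached to the first column name.
print(repr(read_header(DATA, "utf-8")[0]))
# "utf-8-sig" strips a leading BOM if present and is harmless if absent.
print(repr(read_header(DATA, "utf-8-sig")[0]))
```

The same string '01/06/2026' yields two different valid dates depending on day_first, so a wrong guess produces no error at all. Only values such as '25/06/2026' fail loudly under the month-first reading.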

What a production-ready CSV importer handles automatically

A CSV importer that handles the 8 breakage patterns above does three things that a basic parser does not.

It detects format automatically. Delimiter, encoding, quoting style, date locale: a production importer infers these from the file itself rather than requiring the customer to specify them. When Sellermania reduced customer onboarding from 3 days to 2 hours, automatic detection was a major factor. No more back-and-forth asking the customer to re-export in a specific format.
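As a rough illustration of what detection involves, here is a minimal sketch using only the Python standard library. It is not how WeTransform implements detection: the encoding order, the candidate delimiters, and the function name detect_format are all assumptions, and csv.Sniffer is a heuristic that can guess wrong on short or unusual samples.

```python
import csv

def detect_format(data: bytes, candidates: str = ",;\t|") -> tuple[str, str]:
    """Guess (encoding, delimiter) for raw CSV bytes. Heuristic, not a guarantee."""
    # Try strict encodings first. "utf-8-sig" also consumes a leading BOM.
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            text = data.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        # Latin-1 maps every byte to a character, so it never fails to decode.
        encoding = "latin-1"
        text = data.decode(encoding)

    try:
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=candidates).delimiter
    except csv.Error:
        delimiter = ","  # assumed fallback when the sniffer cannot decide
    return encoding, delimiter

sample = "nom;ville;montant\r\nRenée;Orléans;12\r\nLuc;Paris;7\r\n".encode("cp1252")
print(detect_format(sample))
```

Because a successful decode does not prove the encoding is the right one, a real importer would typically show the customer a preview of the detected result rather than trust the guess silently.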

It validates against the expected schema before ingesting. Instead of failing after ingestion, a production importer checks that required columns are present, that data types match expectations, and that missing values are flagged, before a single row enters the system. This moves the error handling from your support queue to the customer's browser. See B2B SaaS data import use cases for examples across logistics, e-commerce, and financial services.
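A pre-ingestion check along these lines might look like the sketch below. The required column names and the error wording are hypothetical, and type checks are left out to keep it short.

```python
import csv
import io

REQUIRED = ("customer_id", "email")  # hypothetical schema for illustration

def validate(text: str, required=REQUIRED, delimiter: str = ",") -> list[str]:
    """Return human-readable problems found in the file; an empty list means it passed."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    headers = reader.fieldnames or []
    errors = [f"missing column: {name}" for name in required if name not in headers]
    if errors:
        return errors  # row-level checks are meaningless without the columns

    # Row numbers count records, with the header as row 1.
    for row_no, row in enumerate(reader, start=2):
        if None in row:  # DictReader stores surplus values under the None key
            errors.append(f"row {row_no}: extra values beyond the header")
        for name in required:
            if not (row.get(name) or "").strip():
                errors.append(f"row {row_no}: {name} is empty")
    return errors

sample = (
    "customer_id,email\r\n"
    "C1,a@example.com\r\n"
    "C2,\r\n"
    "C3,c@example.com,extra\r\n"
)
for problem in validate(sample):
    print(problem)
```

Collecting every problem before rejecting the file, instead of stopping at the first one, lets the customer fix the export in a single pass.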

It maps intelligently when schemas drift. When a customer's export uses 'client_ref' instead of 'Customer ID', AI-assisted mapping detects the correspondence and applies it automatically. This eliminates the silent data misrouting that breaks downstream processes.
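To make the mapping step concrete without an AI model, the sketch below substitutes a much simpler technique: header normalization, a hand-maintained alias table, and fuzzy string matching from Python's difflib. The alias entries, the 0.8 cutoff, and the function names are assumptions chosen for illustration.

```python
import difflib
import re

def normalize(name: str) -> str:
    """Lowercase and collapse punctuation and spaces, so 'Customer ID' becomes 'customer_id'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")

def map_headers(incoming, targets, aliases, cutoff: float = 0.8) -> dict:
    """Map each incoming header to a target column, or to None when unsure."""
    by_norm = {normalize(t): t for t in targets}
    alias_to_target = {
        normalize(alias): target
        for target, names in aliases.items()
        for alias in names
    }
    mapping = {}
    for header in incoming:
        key = normalize(header)
        if key in by_norm:                      # same name, different casing or spacing
            mapping[header] = by_norm[key]
        elif key in alias_to_target:            # known synonym
            mapping[header] = alias_to_target[key]
        else:                                   # close spelling, otherwise unmapped
            close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
            mapping[header] = by_norm[close[0]] if close else None
    return mapping

TARGETS = ["Customer ID", "Email"]
ALIASES = {"Customer ID": ["client_ref"]}  # hypothetical alias table
print(map_headers(["client_ref", "E-mail", "fax"], TARGETS, ALIASES))
```

String similarity alone cannot connect 'client_ref' to 'Customer ID', which is why the alias table exists here and why a semantic approach is attractive. Headers that map to None are left for a human to resolve instead of being routed by guesswork.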

Figure: production CSV import flow, from raw customer file through AI detection, validation, and clean delivery to the destination system.

When CSV is not the right format (and what to use instead)

CSV is appropriate when the data is flat (one row per record, consistent columns), the customer's system can export it reliably, and the schema is stable. E-commerce product catalogs, contact lists, transaction exports: these are well-suited to CSV.

CSV becomes the wrong choice when the data is hierarchical (orders with line items, invoices with multiple tax lines), when the schema changes frequently, or when precision is critical and encoding ambiguity is unacceptable. In these cases, JSON and XML provide the structure and type safety that CSV cannot. That said: most of your customers will continue to send CSV files. The real question is not which format to accept, but whether your import layer can handle what customers actually send, rather than what you wish they would send.
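The difference shows up as soon as one record owns several child records. The following sketch uses a made-up order with two line items: JSON keeps the nesting and the integer quantities, while the CSV version has to repeat the order fields on every row and returns every value as a string.

```python
import csv
import io
import json

# Hypothetical order with two line items.
ORDER = {
    "order_id": "A-1",
    "customer": "ACME",
    "lines": [{"sku": "X", "qty": 2}, {"sku": "Y", "qty": 1}],
}

def flatten(order: dict) -> list[dict]:
    """One flat row per line item, repeating the order-level fields."""
    return [
        {"order_id": order["order_id"], "customer": order["customer"], **line}
        for line in order["lines"]
    ]

def to_csv(rows: list[dict]) -> str:
    """Serialize flat rows with a header, using the csv module's default dialect."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_csv(flatten(ORDER))
print(csv_text)
print(json.dumps(ORDER))
```

Reading the CSV back gives qty as the text '2', and nothing in the file says that the two rows belong to one order beyond the repeated order_id. Rebuilding the hierarchy becomes the importer's job.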

Building a CSV import layer vs embedding one

If you decide to build a production-grade CSV importer in-house, the scope is larger than it first appears. You need an encoding detection layer, a delimiter inference engine, a schema validation library, an error presentation layer that customers can act on, an AI mapping layer for schema drift, and ongoing maintenance as customers add new export sources.

The "embedded CSV import decision guide" and the "why not build internally" page cover the full build vs embed argument. The short version: the initial build is the smaller cost. The maintenance cost compounds with every new customer format, and engineering time spent on import edge cases is time not spent on core product.

Embedding an existing import layer eliminates the maintenance surface. Your customers bring whatever format they have. Your system receives clean, mapped data. Engineering moves on. If your customer onboarding includes a data import step and that step is currently manual, measured in days, or producing support tickets, the embedded approach deserves a closer look.

See how WeTransform handles CSV imports in a live product context.
