How to Fix CSV Encoding Problems: UTF-8, Shift-JIS, and Beyond
You downloaded a CSV, opened it, and instead of readable text you got Ã©, 繧ア繧ォ, or a wall of question marks. That's an encoding mismatch — the file was saved in one character encoding, but your app is reading it in another.
The fix is straightforward once you know which encoding you're dealing with. This guide covers five methods to convert your file (from text editors to Python one-liners), explains the BOM problem that trips up even experienced users, and shows how to stop dealing with encoding issues altogether.
5 Ways to Fix a CSV File with Encoding Problems
Before converting anything, figure out what encoding the file actually uses. Then pick the method that fits your workflow.
Identify the Encoding First (Pattern Cheat Sheet)
Garbled characters aren't random. The specific garbage you see tells you exactly what went wrong.
| What you see | What it means | Actual encoding → Read as |
|---|---|---|
| Ã© instead of é | UTF-8 read as Windows-1252 | UTF-8 → Windows-1252 |
| Ã¼ instead of ü | UTF-8 read as Windows-1252 | UTF-8 → Windows-1252 |
| â€œ instead of " | UTF-8 smart quotes read as Windows-1252 | UTF-8 → Windows-1252 |
| 繧ア繧ォ繧ォ | UTF-8 Japanese read as Shift-JIS | UTF-8 → Shift-JIS |
| ÿþ at file start | UTF-16 Little Endian BOM | UTF-16 LE → Windows-1252 |
|  at file start | UTF-8 BOM read as Windows-1252 | UTF-8 → Windows-1252 |
| ? replacing characters | Unmappable characters lost in an earlier conversion | Various → lossy conversion |
Most text editors show the detected encoding in the status bar. In VS Code, it's in the bottom-right corner. In Notepad++, it's in the bottom bar labeled "UTF-8", "ANSI", etc. If the status bar says "UTF-8" but the text looks wrong, the file probably isn't actually UTF-8 — the editor guessed wrong.
Fix It in a Text Editor — Notepad++, VS Code, or Sublime Text
This is the most common approach and works for any file your editor can open comfortably.
VS Code:
- Open the CSV file
- Look at the encoding label in the bottom-right status bar (it might say "UTF-8" even if that's wrong)
- Click the encoding label → select "Reopen with Encoding"
- Choose the correct source encoding (e.g., "Shift JIS", "Windows 1252")
- The text should now display correctly
- Click the encoding label again → select "Save with Encoding" → choose "UTF-8"
Notepad++:
- Open the CSV file
- Go to Encoding menu → check what's currently selected
- Select "Encode in UTF-8" (or "Encode in UTF-8-BOM" if you need Excel compatibility)
- Save the file
The critical distinction here: "Reopen with Encoding" re-reads the raw bytes using a different decoder — it's diagnostic. "Save with Encoding" re-encodes the current text and writes it — it's the actual conversion. If you "Save with Encoding" on a file that's already displaying garbled text, you'll permanently bake in the corruption. Always reopen with the correct encoding first, confirm the text looks right, then save.
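The same distinction, as a minimal Python sketch: re-decoding the raw bytes is the safe diagnostic step, while re-encoding already-garbled text bakes the damage in.

```python
raw = 'café'.encode('cp1252')                    # the file's actual bytes: b'caf\xe9'

# "Reopen with Encoding": decode the SAME bytes with a different codec
garbled = raw.decode('utf-8', errors='replace')  # wrong guess: 'caf\ufffd'
correct = raw.decode('cp1252')                   # right guess: 'café'

# "Save with Encoding" on garbled text writes the corruption permanently
baked = garbled.encode('utf-8')
assert 'é' not in baked.decode('utf-8')          # the original é is unrecoverable
```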
Fix It in Excel — the Import Wizard Method
Double-clicking a CSV in Excel doesn't let you pick an encoding — Excel just guesses (usually wrong for non-ASCII files). The import wizard gives you control.
- Open Excel with a blank workbook
- Go to Data tab → From Text/CSV (or "Get Data" → "From File" → "From Text/CSV")
- Select your CSV file
- In the preview dialog, find the "File Origin" or "Encoding" dropdown
- Switch between encodings until the preview shows correct text:
- Try 65001: Unicode (UTF-8) first
- If that's garbled, try 932: Japanese (Shift-JIS) or 1252: Western European (Windows)
- Click Load
This method doesn't modify the original file — it just reads it correctly into Excel. If you want to save a properly encoded version, use "Save As" → "CSV UTF-8 (Comma delimited)".
Fix It from the Command Line — iconv and PowerShell
Command-line tools are the best option for batch processing or scripting into a data pipeline.
macOS / Linux (iconv):
# Convert Shift-JIS to UTF-8
iconv -f SHIFT_JIS -t UTF-8 input.csv > output_utf8.csv
# Convert Windows-1252 to UTF-8
iconv -f WINDOWS-1252 -t UTF-8 input.csv > output_utf8.csv
# Detect encoding first
# macOS:
file -I input.csv
# Linux:
file -i input.csv
# Output: input.csv: text/csv; charset=shift_jis
If iconv throws an "illegal input sequence" error, the source file likely contains characters that don't exist in the target encoding (common when converting UTF-8 files that include emoji or symbols to Shift-JIS). Add the -c flag to skip unmappable characters and continue:
iconv -f UTF-8 -c -t SHIFT_JIS input.csv > output_sjis.csv
Characters that are skipped will be silently dropped — always inspect the output before discarding the original.
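Python's codecs behave the same way: `errors='ignore'` is the rough equivalent of iconv's -c flag. A sketch with a made-up sample string:

```python
text = 'price: 100円 😀'   # the emoji has no Shift-JIS mapping

try:
    text.encode('shift_jis')          # strict mode raises on the emoji
except UnicodeEncodeError:
    pass

# errors='ignore' mirrors iconv -c: unmappable characters are silently dropped
lossy = text.encode('shift_jis', errors='ignore')
print(lossy.decode('shift_jis'))      # 'price: 100円 ' (the emoji is gone)
```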
Batch conversion (all CSVs in a folder):
for f in *.csv; do
  iconv -f SHIFT_JIS -t UTF-8 "$f" > "utf8_${f}"
done
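A cross-platform equivalent of the loop above, sketched as a Python helper (the function name batch_to_utf8 is ours, and it assumes every CSV in the folder really is in the source encoding):

```python
from pathlib import Path

def batch_to_utf8(folder, source_encoding='shift_jis'):
    # sorted() snapshots the listing so newly written utf8_* files aren't re-processed
    for src in sorted(Path(folder).glob('*.csv')):
        text = src.read_text(encoding=source_encoding)
        (src.parent / f'utf8_{src.name}').write_text(text, encoding='utf-8')
```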
Windows PowerShell:
# Read with one encoding, write as UTF-8
Get-Content -Path input.csv -Encoding Default |
Set-Content -Path output_utf8.csv -Encoding UTF8
PowerShell encoding caveat: In PowerShell 5.x (the default on Windows 10 and 11), -Encoding UTF8 produces UTF-8 with BOM. This is fine for Excel users, but will break Python scripts and most data pipelines. If you need BOM-free UTF-8, either use PowerShell 7+ with -Encoding utf8NoBOM, or pipe through a .NET method:
# PowerShell 7+: BOM-free UTF-8
Get-Content -Path input.csv -Encoding Default |
Set-Content -Path output_utf8.csv -Encoding utf8NoBOM
# PowerShell 5.x: BOM-free UTF-8 workaround
$content = Get-Content -Path input.csv -Encoding Default
[System.IO.File]::WriteAllLines("output_utf8.csv", $content)
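Whichever route you take, it's worth verifying the result. This hypothetical helper checks the first three bytes of the output for a UTF-8 BOM:

```python
def has_utf8_bom(path):
    # EF BB BF at the very start of the file marks a UTF-8 BOM
    with open(path, 'rb') as f:
        return f.read(3) == b'\xef\xbb\xbf'
```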
If iconv throws an error on Mac/Linux, you probably have the source encoding wrong. Try a different -f value. The file -I / file -i command can help you identify the actual encoding before converting.
Fix It in Python — pandas and codecs
For developers or anyone already working in Python, this is often the fastest path. It handles edge cases better than most GUI tools.
Basic conversion with pandas:
import pandas as pd
# Read with the source encoding
df = pd.read_csv('input.csv', encoding='shift_jis')
# Write as UTF-8
df.to_csv('output_utf8.csv', encoding='utf-8', index=False)
Auto-detect encoding with chardet:
import chardet
import pandas as pd

with open('input.csv', 'rb') as f:
    result = chardet.detect(f.read())

print(result)
# {'encoding': 'SHIFT_JIS', 'confidence': 0.99, 'language': 'Japanese'}

df = pd.read_csv('input.csv', encoding=result['encoding'])
df.to_csv('output_utf8.csv', encoding='utf-8', index=False)
Without pandas (pure Python):
import codecs

with codecs.open('input.csv', 'r', encoding='shift_jis') as source:
    with codecs.open('output_utf8.csv', 'w', encoding='utf-8') as target:
        target.write(source.read())
A word of caution on chardet: it's a statistical guesser, not a decoder. It works well on large files with lots of text, but can guess wrong on short files or files with mostly ASCII content. Always inspect the output after conversion.
The BOM Problem — When UTF-8 Isn't Enough
BOM stands for Byte Order Mark. In UTF-8 it's three invisible bytes (EF BB BF) at the very start of a file — a flag that says "this file is UTF-8." And it's the source of an annoying split in the CSV world.
BOM Makes Excel Happy but Breaks Everything Else
Here's the dilemma:
- Excel on Windows needs BOM to auto-detect UTF-8. Without it, Excel falls back to your system's default encoding (often Windows-1252 or Shift-JIS) and garbles the text.
- Python, CLI tools, and most web applications treat BOM as data. You'll see an extra invisible character (`\ufeff`) in the first cell or column name, which breaks column lookups and key matching.
| Scenario | Use BOM? | Why |
|---|---|---|
| CSV for Excel users | Yes | Excel won't auto-detect UTF-8 without it |
| CSV for Python / data pipelines | No | BOM becomes a phantom character in column headers |
| CSV for web app import | No | Most web apps don't expect BOM |
| CSV for Japanese Excel (Shift-JIS required) | N/A | Save as Shift-JIS instead of UTF-8 |
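The phantom-character failure mode is easy to demonstrate with Python's standard library:

```python
import csv
import io

data = '\ufeffid,name\n1,Alice\n'        # UTF-8 BOM left in the decoded text
header = next(csv.reader(io.StringIO(data)))
print(header[0] == 'id')                 # False: the first column is '\ufeffid'

# decoding the raw bytes with 'utf-8-sig' strips the BOM before csv sees it
clean = data.encode('utf-8').decode('utf-8-sig')
header = next(csv.reader(io.StringIO(clean)))
print(header[0] == 'id')                 # True
```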
How to Add or Remove BOM
Add BOM (for Excel compatibility):
- Notepad++: Encoding → "Encode in UTF-8-BOM" → Save
- VS Code: Click encoding → "Save with Encoding" → "UTF-8 with BOM"
- Python: `open('out.csv', 'w', encoding='utf-8-sig')`
- PowerShell: `-Encoding UTF8` in PowerShell 5.x adds BOM automatically
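To see what `utf-8-sig` actually does, compare the raw bytes it produces against plain `utf-8`:

```python
text = 'id,name\n1,café\n'
with_bom = text.encode('utf-8-sig')   # prepends EF BB BF for Excel
plain = text.encode('utf-8')          # no BOM, for code and pipelines

print(with_bom[:3])                   # b'\xef\xbb\xbf'
print(with_bom[3:] == plain)          # True: only the three-byte prefix differs
```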
Remove BOM (for programmatic use):
- VS Code: Click encoding → "Save with Encoding" → "UTF-8"
- Python: Read with `utf-8-sig` (strips BOM automatically), write with `utf-8`
- Command line: `sed -i '1s/^\xEF\xBB\xBF//' file.csv` (GNU sed; macOS's BSD sed doesn't support \x escapes)
- PowerShell 7+: `-Encoding utf8NoBOM`
# Python: remove BOM during conversion
with open('input.csv', 'r', encoding='utf-8-sig') as f:
    content = f.read()
with open('output.csv', 'w', encoding='utf-8') as f:
    f.write(content)
Which Method Should You Use?
| Method | OS | Difficulty | Batch Support | Best for |
|---|---|---|---|---|
| Text editor (VS Code / Notepad++) | Any | Easy | No | Quick one-off fix, any encoding |
| Excel Import Wizard | Windows / Mac | Easy | No | Non-technical users, preview before commit |
| iconv (command line) | Mac / Linux | Medium | Yes | Batch conversion, scripting |
| PowerShell | Windows | Medium | Yes | Windows automation |
| Python (pandas) | Any | Medium | Yes | Developers, data pipelines |
| Online tool (browser-based, local processing) | Any | Easy | Limited | No install needed, encoding handled automatically |
For a single file you need to fix right now, a text editor is the fastest. For recurring CSV exports, set up a Python script or iconv command and forget about it.
3 Mistakes That Corrupt Your Data During Conversion
Encoding conversion is simple in concept but has a few traps that can silently destroy data.
Overwriting the Original File
This is the most common and most painful mistake. If a conversion goes wrong — wrong source encoding, unmappable characters, interrupted write — and you saved over the original, that data is gone.
Always save to a new file name: output_utf8.csv, not input.csv. Conversions into narrower encodings like Shift-JIS or GBK are especially risky: characters that don't exist in the target encoding get silently replaced with ? or dropped entirely. You won't notice until someone points out the missing data.
Trusting Auto-Detection Blindly
Tools like Python's chardet, VS Code's auto-detect, and Notepad++'s encoding guess are all heuristics. They analyze byte patterns and make a statistical prediction. They're usually right, but they fail in predictable ways:
- Short files (under 100 bytes): not enough data to guess reliably
- ASCII-heavy files with a few special characters: multiple encodings produce identical byte sequences for ASCII
- Mixed-encoding files: some rows in UTF-8, others in Windows-1252 (this happens more often than you'd think with merged datasets)
Always open the converted file and scan for garbled characters before deleting the original. Spot-check rows that contain accented characters, CJK text, or currency symbols.
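One cheap safety net is a strict re-read of the converted file. This hypothetical check_utf8 helper fails loudly on the first invalid byte instead of passing garbage along:

```python
def check_utf8(path):
    # a strict decode raises on the first byte sequence that isn't valid UTF-8
    try:
        with open(path, encoding='utf-8', errors='strict') as f:
            f.read()
        return True
    except UnicodeDecodeError as err:
        print(f'not valid UTF-8: {err}')
        return False
```

Note that this only catches malformed bytes; mojibake that happens to be valid UTF-8 (like Ã©) still needs an eyeball check.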
Mixing Up "Reopen with Encoding" and "Save with Encoding"
In VS Code and similar editors, these are two different operations:
- "Reopen with Encoding": Re-reads the same bytes from disk using a different decoder. Non-destructive. Use this to figure out the correct encoding.
- "Save with Encoding": Takes the currently displayed text and writes it to disk in a new encoding. Destructive.
If the file is already showing garbled text and you hit "Save with Encoding → UTF-8", you're encoding the garbled characters as UTF-8. The corruption is now permanent. The correct sequence is: Reopen with the right encoding → verify text is correct → then save with the target encoding.
Stop Converting Files — Fix the Source Instead
If you're converting CSVs every week, the real problem isn't the file — it's the system that produced it. Here's how to eliminate encoding issues at the root.
Standardize on UTF-8 Across Your Pipeline
Most encoding problems disappear when everyone agrees on UTF-8.
- If you export CSVs from Excel: Use "Save As" → "CSV UTF-8 (Comma delimited)" instead of plain "CSV". This saves as UTF-8 with BOM, which other Excel users can open without issues.
- If you write code that produces CSVs: Always specify `encoding='utf-8'` explicitly. Don't rely on system defaults — they vary by OS and locale.
- If you receive CSVs from partners: Ask them to export in UTF-8. Most modern systems support it. For legacy systems stuck on Shift-JIS or Windows-1252, set up an automated conversion script as a preprocessing step.
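In Python, making the encoding explicit looks like this (the csv module also wants `newline=''` so it can control line endings itself; the output filename is just an example):

```python
import csv

# write with an explicit encoding; never rely on the locale default
with open('export.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'name'])
    writer.writerow([1, 'café'])
```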
Use Tools That Handle Encoding Automatically
The easiest way to deal with encoding is to not deal with it at all. Some browser-based CSV tools auto-detect encoding on upload — you drag in a Shift-JIS file, a Windows-1252 file, or a UTF-8 file, and it just works.
LeapRows (disclosure: built by the author), for example, processes files entirely in the browser without uploading them to a server. It automatically detects and handles CSV encoding regardless of whether the file is UTF-8, Shift-JIS, or another format — no manual conversion, no encoding menus, no guessing. It also handles analysis tasks like filtering, pivoting, and aggregation directly in the browser, so you can skip the Excel round-trip entirely.
For teams that regularly process CSVs from multiple sources with different encodings, a tool like this removes an entire class of errors from the workflow. The key distinction to look for is local (in-browser) processing — tools that upload your file to a server still require you to think about what data you're sharing, even if they handle encoding automatically.
Wrapping Up
CSV encoding problems boil down to one thing: the file was saved in encoding A, but your tool is reading it as encoding B. Once you identify the mismatch (use the pattern cheat sheet above), the fix takes under a minute with any of the five methods covered here.
The quick version:
- Check the garbled pattern to identify the source encoding
- Convert using whichever tool fits — text editor for one-offs, Python or iconv for batches
- Watch out for BOM: add it if Excel users need the file, remove it if code will process it; and be aware that PowerShell 5.x's `-Encoding UTF8` adds BOM by default
- Never overwrite the original until you've verified the conversion
Long-term, push for UTF-8 everywhere. The fewer encoding decisions humans have to make, the fewer encoding problems you'll have.