How to Merge Multiple CSV Files Into One — Python, CLI, and No-Code Methods Compared

You have a folder full of monthly CSV exports. Same columns, same structure, different months. You need them in one file. Copy-pasting across twelve spreadsheets is not going to cut it.

There are four practical ways to merge CSV files, and the right one depends on how often you do this and how comfortable you are with code. Here's a side-by-side comparison, followed by step-by-step instructions for each method.


4 Ways to Merge CSV Files Into One

Method | Difficulty | Handles headers | Automation | Best for
Online tools | Easy | Yes | No | Instant merge, no setup, encoding auto-detect
Bash / CMD / PowerShell | Easy–Medium | Depends on tool | .bat / .sh / Task Scheduler | Quick one-off merge on any OS
Excel Power Query | Easy | Yes | Refresh button | Excel-centric workflows, non-coders
Python pandas | Medium | Yes | Script / cron | Recurring merges, large files, custom logic

If this is a one-time job, an online tool or a CLI one-liner gets it done in under a minute. If you merge the same files every month, invest 10 minutes in a Python script or Power Query setup — it pays for itself from the second month onward.


Merge CSVs with an Online Tool

The fastest path from "I have files" to "I have one file" is a browser-based CSV merger. No installation, no code, no terminal.

The workflow is the same across most tools: drag your CSV files into the browser window, confirm the settings, and download the merged result. The entire process takes about 30 seconds.

When choosing a tool, check three things:

  1. Client-side processing — Tools that process data in your browser (not on a server) are safer for sensitive data. Your files never leave your machine.
  2. Encoding auto-detection — If your CSVs come from different systems, they may use different encodings (UTF-8, latin-1, cp1252). A tool that detects encoding automatically saves you from garbled output.
  3. Header handling — The tool should keep only one header row in the merged output, not repeat it from every file.

LeapRows (disclosure: built by the author) runs entirely in the browser, auto-detects encoding across UTF-8, Shift_JIS, and EUC-JP, and lets you preview columns before merging. merge-csv.com is another option that handles header deduplication automatically.

If you don't want to think about encoding, column order, or headers, this is the easiest starting point.


Merge CSV Files from the Command Line

A single command can merge every CSV in a folder. No libraries, no GUI, no internet connection. The tradeoff is that you get raw concatenation — there's no column validation or type checking.

Windows CMD

Open Command Prompt, navigate to the folder containing your CSVs, and run:

if exist merged.csv del merged.csv
copy *.csv merged.csv

The first line removes any previous output file; the if exist guard keeps it from printing an error when there is nothing to delete. Without it, re-running the command includes merged.csv in the *.csv glob, and your output ends up with last month's merged data concatenated in again.

The remaining catch: header rows from each file are included as-is. If you have 12 files, you get the header repeated 12 times — 11 of those are now mixed into your data rows. You'll need to clean those out manually in Excel or a text editor afterward.

Linux / macOS Bash

Bash gives you more control. This one-liner keeps the header from the first file and strips it from the rest:

first=$(ls *.csv | head -1)
head -1 "$first" > merged.csv && tail -n +2 -q *.csv >> merged.csv

Breaking it down:

  • first=$(ls *.csv | head -1) picks the first CSV in the folder dynamically, so no filename is hardcoded
  • head -1 "$first" > merged.csv writes that file's header to the output
  • tail -n +2 -q *.csv >> merged.csv appends all rows from every CSV, skipping each file's first line

Clean output, no duplicate headers. Works on any Unix-like system out of the box.

On re-runs: if merged.csv already exists in the same folder, *.csv will include it on the next run. Either delete it first (rm -f merged.csv) or write output to a different directory.
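Another way to sidestep the glob problem is to write the output under an extension the glob can't match. A sketch using hypothetical sample data in a scratch directory:

```shell
cd "$(mktemp -d)"                        # scratch dir for the demo
printf 'id,value\n1,10\n2,20\n' > a.csv  # hypothetical sample files
printf 'id,value\n3,30\n' > b.csv

first=$(ls *.csv | head -1)
head -1 "$first" > merged.out            # .out keeps the result out of the *.csv glob
tail -n +2 -q *.csv >> merged.out
wc -l < merged.out                       # 4 lines: one header + three data rows
```

Because the output never matches *.csv, the command is safe to re-run without a cleanup step.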

PowerShell

PowerShell parses CSVs natively, which means it understands headers, data types, and column names:

Remove-Item -ErrorAction SilentlyContinue merged.csv
Get-ChildItem *.csv | Import-Csv | Export-Csv merged.csv -NoTypeInformation

The Remove-Item on the first line prevents the previous merged.csv from being picked up by Get-ChildItem *.csv on re-runs — the same problem as with CMD, handled the same way. -ErrorAction SilentlyContinue suppresses the error if the file doesn't exist yet.

Unlike CMD, PowerShell handles headers correctly — one header row in the output, no duplicates. It's also aware of column names, so if one file has columns in a different order, the data still aligns properly.

The downside is speed. For hundreds of large files, PowerShell's per-row parsing is slower than raw concatenation. For a dozen monthly reports, it's fine.


Merge CSVs with Excel Power Query

Power Query is built into Excel 2016 and later. Once configured, adding a new month's file is a one-click refresh.

Step 1: Put all your CSVs in a single folder. Nothing else in that folder — Power Query will try to read everything in it.

Step 2: In Excel, go to Data → Get Data → From File → From Folder. Select the folder.

Step 3: Click Combine & Load. Excel loads every CSV into a single table on a new sheet.

Step 4: In the Power Query Editor, delete any auto-generated columns you don't need (like file path) and set the correct data types for date and number columns.

That's the setup. From now on, when next month's CSV arrives, drop it in the same folder and hit Data → Refresh All. The merged table updates automatically.

Power Query also handles headers correctly and can filter rows, rename columns, and change data types as part of the merge — all without writing a formula. For people who live in Excel, this is the most practical long-term solution.


Merge CSV Files with Python pandas

Python is where you go when you need more than concatenation — column normalization, encoding handling, data validation, or processing files too large for Excel.

The minimal script (8 lines)

This handles the most common case: a folder of CSVs with identical columns.

import glob
import pandas as pd

csv_files = glob.glob("monthly_reports/*.csv")
df = pd.concat([pd.read_csv(f) for f in csv_files], ignore_index=True)
df.to_csv("yearly_report.csv", index=False, encoding="utf-8-sig")

print(f"Merged {len(csv_files)} files → {len(df)} rows")

glob.glob() finds every CSV in the folder. pd.read_csv() reads each one into a DataFrame. pd.concat() stacks them vertically. to_csv() writes the result.

Two details worth noting:

  • ignore_index=True resets the row index so you get a clean 0, 1, 2, … sequence instead of repeating indexes from each file
  • encoding="utf-8-sig" adds a BOM marker that prevents Excel from showing garbled characters when you open the output

Handle column mismatches and mixed schemas

Real-world CSVs aren't always clean. One month's export might add a column, rename one, or change the order.

By default, pd.concat() uses join='outer' — it keeps all columns from all files and fills gaps with NaN. If you only want rows where every column is present, use join='inner':

df = pd.concat([pd.read_csv(f) for f in csv_files], join="inner", ignore_index=True)

For column name normalization (whitespace, casing), add a preprocessing step:

dfs = []
for f in csv_files:
    temp = pd.read_csv(f)
    temp.columns = temp.columns.str.strip().str.lower().str.replace(" ", "_")
    dfs.append(temp)
df = pd.concat(dfs, ignore_index=True)

This turns "Total Revenue", " total_revenue", and "TOTAL REVENUE" into the same column name before merging.

Add a source file column

When analyzing merged data, you often need to know which file each row came from — especially for monthly breakdowns.

import os

dfs = []
for f in csv_files:
    temp = pd.read_csv(f)
    temp["source_file"] = os.path.basename(f)  # works with both / and \ path separators
    dfs.append(temp)
df = pd.concat(dfs, ignore_index=True)

Now every row has a source_file column like 2024_01_sales.csv, 2024_02_sales.csv, and so on. Group by this column for per-month analysis.
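That per-month breakdown is a one-line groupby. A sketch, with a hypothetical revenue column standing in for real merged data:

```python
import pandas as pd

# stand-in for the merged result, with the source_file column already added
df = pd.DataFrame({
    "source_file": ["2024_01_sales.csv", "2024_01_sales.csv", "2024_02_sales.csv"],
    "revenue": [100, 50, 200],
})
per_file = df.groupby("source_file")["revenue"].sum()
print(per_file)  # 2024_01_sales.csv → 150, 2024_02_sales.csv → 200
```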

Large files — avoid memory crashes

pd.concat() loads every file into memory at once. With a dozen 50 MB CSVs, that's 600 MB of raw data before any processing — and pandas' in-memory representation is typically larger than the file on disk. Hundreds of large files can crash the process entirely.

Two options:

Option A: Write incrementally instead of holding everything in memory. Track the row count inside the loop to avoid re-reading files later:

import os

# start clean: append mode would otherwise double up data on re-runs
if os.path.exists("merged.csv"):
    os.remove("merged.csv")

total_rows = 0
first = True
for f in csv_files:
    chunk = pd.read_csv(f)
    total_rows += len(chunk)
    chunk.to_csv("merged.csv", mode="a", header=first, index=False)
    first = False

print(f"Merged {len(csv_files)} files → {total_rows} rows")

This appends each file to the output one at a time. Memory usage stays flat regardless of how many files you have. Tracking total_rows inside the loop also avoids re-reading the merged file for validation, which would undo the memory savings.

Option B: Use Dask for parallel, out-of-core processing:

import dask.dataframe as dd

df = dd.read_csv("monthly_reports/*.csv")
df.to_csv("merged.csv", single_file=True)

Dask reads and writes in chunks behind the scenes. It handles files that are collectively larger than your available RAM. Note that single_file=True behavior can vary across Dask versions — verify against your installed version if you run into issues.

After merging, verify the row count matches what you tracked during the loop (or recompute from individual files if you used Dask):

# Only use this if you didn't track total_rows during the loop
# (re-reading all files is memory-intensive for large datasets)
expected = sum(len(pd.read_csv(f)) for f in csv_files)
actual = len(pd.read_csv("merged.csv"))
assert expected == actual, f"Row count mismatch: expected {expected}, got {actual}"

4 Mistakes That Corrupt Your Merged CSV

The merge itself is straightforward. The problems show up afterward — garbled text, phantom rows, broken aggregations. These four issues cause most of them.

Encoding mismatches (UnicodeDecodeError)

A file saved as UTF-8 and a file saved as latin-1 look identical on the surface. But when pandas tries to read the latin-1 file as UTF-8, you get a UnicodeDecodeError or, worse, silently garbled characters that only surface when someone opens the file months later.

Detect encoding before merging:

import chardet

with open("suspect_file.csv", "rb") as f:
    result = chardet.detect(f.read(10000))
    print(result["encoding"])  # e.g., 'utf-8', 'latin-1', 'cp1252'

Then pass it explicitly:

pd.read_csv(f, encoding=result["encoding"])

If you're using an online tool, pick one that auto-detects encoding. LeapRows (disclosure: built by the author) handles this automatically across common encodings. On the CLI side, neither CMD copy nor bash cat does any encoding conversion — they just concatenate raw bytes, which works only if every file uses the same encoding.

Duplicate headers hiding in the data

When you merge CSVs with copy *.csv or cat *.csv, the header row from every file ends up in the middle of the data. Twelve files means eleven extra header rows scattered through your output.

The danger isn't just clutter. If your header row contains text like "revenue" and the column is supposed to be numeric, any aggregation (sum, average) will either error out or silently skip those rows. Pandas and Power Query handle this automatically — they only keep one header. CLI users need to strip duplicates manually.
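For CLI users, an awk filter can do that cleanup in one pass during the merge. A sketch that assumes every file shares an identical header line (sample data is hypothetical):

```shell
cd "$(mktemp -d)"                 # scratch dir for the demo
printf 'id,value\n1,10\n' > a.csv
printf 'id,value\n2,20\n' > b.csv

# print the first header, then drop any later line identical to it
awk 'NR==1 {hdr=$0; print; next} $0 != hdr' *.csv > merged.out
grep -c '^id,value$' merged.out   # 1 — only one header survives
```

The caveat: a data row that happens to match the header exactly would also be dropped, so this is a convenience for clean exports, not a general-purpose dedupe.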

Quick detection in pandas:

# After merging, check if any "data" row matches the header
header = df.columns.tolist()
mask = df.apply(lambda row: row.tolist() == header, axis=1)
print(f"Found {mask.sum()} duplicate header rows")
df = df[~mask]

Column name drift across monthly exports

You merge January through December. January's file has "Revenue". March's file has "revenue". October's has "Total Revenue". After pd.concat(), you end up with three separate columns, each mostly empty.

This happens more than you'd expect with exports from SaaS tools, CRMs, and internal dashboards where column names change with product updates.

The fix is a one-line normalization before concat:

temp.columns = temp.columns.str.strip().str.lower().str.replace(" ", "_")

Run this on every file before merging. "Revenue", " revenue", and "Total Revenue" all become "total_revenue" — or whatever convention you pick.

Running out of memory on large merges

pd.concat() loads every DataFrame into memory simultaneously. A hundred 100 MB files means 10 GB of RAM. Most laptops don't have that headroom.

Symptoms: the script hangs, your fan spins up, and eventually Python crashes with a MemoryError — or your OS kills the process silently.

The incremental write approach from the Python section avoids this entirely. Alternatively, Dask handles out-of-core processing without rewriting your logic.

As a sanity check after any large merge, compare the sum of individual file row counts against the merged file's row count. If they don't match, something was dropped or duplicated.


Automate Recurring CSV Merges

If you merge the same set of files every month, running a script manually is one step too many. Schedule it and forget about it.

Python + cron / Task Scheduler

Save your merge script as a .py file. Then schedule it.

Linux / macOS (cron):

# Run on the 2nd of every month at 9 AM
0 9 2 * * /usr/bin/python3 /home/user/scripts/merge_monthly.py >> /home/user/logs/merge.log 2>&1

Windows (Task Scheduler):

  1. Open Task Scheduler → Create Basic Task
  2. Set the trigger to monthly (e.g., 2nd of each month)
  3. Set the action to run python C:\scripts\merge_monthly.py

A useful pattern: have the script watch a specific folder, merge whatever CSVs it finds, move the processed files to an archive/ subfolder, and log the result. Next month, drop the new files in the folder and the scheduled task handles the rest.
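That pattern can be sketched like this (folder names are hypothetical; the first few lines are demo setup so the script runs standalone):

```python
import shutil
from pathlib import Path
import pandas as pd

# demo setup: an inbox/ folder with two hypothetical monthly files
inbox = Path("inbox")
archive = inbox / "archive"
archive.mkdir(parents=True, exist_ok=True)
(inbox / "2024_01.csv").write_text("id,value\n1,10\n2,20\n")
(inbox / "2024_02.csv").write_text("id,value\n3,30\n")

# merge everything currently in the inbox, then move processed files aside
files = sorted(inbox.glob("*.csv"))
df = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
df.to_csv("merged.csv", index=False, encoding="utf-8-sig")
for f in files:
    shutil.move(str(f), archive / f.name)
print(f"Merged {len(files)} files -> {len(df)} rows")
```

Because processed files leave the inbox, each scheduled run only touches new arrivals, and the archive doubles as an audit trail.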

Validate the merged output

Automation without validation is a recipe for silent data loss. Add these checks to the end of your merge script:

# 1. Row count check (use the counter tracked during the merge loop,
#    not a re-read of all source files — that defeats the point of incremental writing)
merged = pd.read_csv("yearly_report.csv")
assert len(merged) == total_rows, f"Expected {total_rows} rows, got {len(merged)}"

# 2. No fully-empty rows
assert merged.dropna(how="all").shape[0] == len(merged), "Found empty rows"

# 3. No duplicate header rows
header_as_list = merged.columns.tolist()
header_rows = merged.apply(lambda r: r.tolist() == header_as_list, axis=1).sum()
assert header_rows == 0, f"Found {header_rows} duplicate header rows"

print("Validation passed ✓")

If any assertion fails, the script exits with an error instead of producing a corrupt file. Pipe the output to a log file so you have a trail.


After the Merge — What's Next

Merging is a means to an end. Here's what typically comes right after.

Quick analysis with pandas groupby and pivot_table

With all your data in one DataFrame, aggregation is straightforward:

# Monthly revenue totals — parse dates properly first
df["date"] = pd.to_datetime(df["date"])
monthly = df.groupby(df["date"].dt.to_period("M"))["revenue"].sum()
print(monthly)

# Cross-tabulation: revenue by region and quarter
pivot = pd.pivot_table(df, values="revenue", index="region", columns="quarter", aggfunc="sum")
print(pivot)

Using pd.to_datetime() and .dt.to_period("M") is more robust than slicing date strings — it works regardless of whether dates are stored as YYYY-MM-DD, MM/DD/YYYY, or other common formats.

When to stop merging CSVs and build a pipeline

If you're merging the same CSVs every month, you've outgrown CSVs. Consider:

  • SQLite or DuckDB: Load CSVs into a local database. Query with SQL instead of concatenating files. DuckDB can even query CSV files directly without importing them — SELECT * FROM 'monthly_reports/*.csv'.
  • A proper ETL tool: If multiple people depend on this data, a scheduled pipeline (Airflow, Prefect, or even a simple cron + dbt setup) is more reliable than a Python script on someone's laptop.
  • Cloud warehouses: For larger teams, push monthly data to BigQuery, Snowflake, or Redshift. The merge happens at ingest time, not analysis time.
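The SQLite route takes only a few lines with pandas' to_sql. A sketch with hypothetical file, database, and column names:

```python
import sqlite3
import pandas as pd

# demo: a hypothetical merged CSV loaded into a local SQLite database
pd.DataFrame({"month": ["2024-01", "2024-01", "2024-02"],
              "revenue": [100, 50, 200]}).to_csv("sales.csv", index=False)

conn = sqlite3.connect("reports.db")
pd.read_csv("sales.csv").to_sql("sales", conn, if_exists="replace", index=False)
monthly = pd.read_sql("SELECT month, SUM(revenue) AS total FROM sales GROUP BY month", conn)
print(monthly)
conn.close()
```

From there, next month's data is an INSERT rather than a re-merge, and any SQL client can query the table.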

The rule of thumb: if you're writing a script to fix the same data issues every month, it's time to fix the source instead.


Conclusion

For a quick one-off merge, an online tool or a CLI one-liner gets it done in under a minute. For recurring monthly merges, a Python script with pd.concat() is the most flexible option — eight lines of code handles the common case, and you can layer on encoding detection, column normalization, and validation as needed.

Whichever method you pick, check two things before merging: that every file uses the same encoding, and that column names are consistent across files. These two issues cause the majority of post-merge headaches.
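A pre-flight check for the second issue, column-name drift, can be sketched in a few lines (folder and file names are hypothetical; the setup lines just make the demo runnable):

```python
import glob
import os
import pandas as pd

# demo setup: two hypothetical exports, one with a drifted column name
os.makedirs("monthly_reports", exist_ok=True)
with open("monthly_reports/jan.csv", "w") as f:
    f.write("id,Revenue\n1,100\n")
with open("monthly_reports/feb.csv", "w") as f:
    f.write("id,revenue\n2,200\n")

# pre-flight: read only each file's header row and compare column sets
columns = {f: tuple(pd.read_csv(f, nrows=0).columns)
           for f in sorted(glob.glob("monthly_reports/*.csv"))}
if len(set(columns.values())) > 1:
    print("Column names differ across files:", columns)
```

nrows=0 reads just the header, so the check stays fast even on large files.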

If you find yourself doing the same merge every month, automate it. A scheduled script plus a few validation assertions turns a manual chore into a background task that just works.