Merge Multiple Shapefiles in PyQGIS

Field surveys, agency downloads, and tiled datasets often arrive as dozens of separate shapefiles that you need as a single layer before you can analyze or publish them. PyQGIS handles this with the native:mergevectorlayers Processing algorithm, which stacks any number of input layers into one output, unions their attribute schemas, and can reproject everything to a common CRS in a single pass. This is a core Vector Data Manipulation task and pairs well with format conversion once the data is combined.

This page shows how to discover every shapefile in a folder with pathlib, merge them safely when their schemas or projections differ, record which source file each feature came from, and write the result to a GeoPackage rather than another fragile shapefile.

Prerequisites

  • QGIS 3.34 LTR (bundled Python 3.12) with Processing available.
  • A folder containing the shapefiles to merge — all the same geometry type (all polygons, or all lines, or all points).
  • Read access to the source folder and write access to the output location.
  • The QGIS Python Console (Plugins > Python Console).

native:mergevectorlayers only merges layers of a single geometry type per run. Do not mix polygons and lines in one call.

Discover the Shapefiles with pathlib

Hard-coding file names does not scale. Use pathlib.Path.glob to collect every .shp in a directory, sorted for deterministic ordering:

from pathlib import Path

source_dir = Path("/data/tiles")
shapefiles = sorted(source_dir.glob("*.shp"))

print(f"Found {len(shapefiles)} shapefiles")
for shp in shapefiles:
    print(" -", shp.name)

if not shapefiles:
    raise FileNotFoundError(f"No shapefiles in {source_dir}")

Breakdown: Path.glob("*.shp") returns a generator of Path objects for the top-level directory; use rglob("*.shp") to recurse into subfolders. Sorting gives a stable feature order across runs, which matters if you later assign sequential IDs. The early FileNotFoundError stops a silent no-op merge.
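To see the difference between glob and rglob without touching real data, the sketch below builds a throwaway folder tree and lists what each approach finds. The file names are invented, and find_shapefiles is a hypothetical helper, not part of PyQGIS. It also matches the extension case-insensitively, since a plain "*.shp" pattern can miss files named .SHP on case-sensitive file systems.

```python
import tempfile
from pathlib import Path

def find_shapefiles(folder, recursive=False):
    """Return sorted .shp paths, matching the extension case-insensitively."""
    folder = Path(folder)
    candidates = folder.rglob("*") if recursive else folder.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".shp")

# Build a throwaway tree: two top-level shapefiles, one text file, one nested shapefile.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "north").mkdir()
    for rel in ("tile_b.shp", "tile_a.SHP", "notes.txt", "north/tile_c.shp"):
        (root / rel).touch()

    top_level = [p.name for p in find_shapefiles(root)]
    everything = [p.relative_to(root).as_posix() for p in find_shapefiles(root, recursive=True)]

print("Top level only:", top_level)
print("Recursive:", everything)
```

The recursive call picks up the file in the north subfolder, while the top-level call does not.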

Before merging, it is worth confirming every file actually loads, because one corrupt .shp will otherwise abort the whole run:

from qgis.core import QgsVectorLayer

layers = []
for shp in shapefiles:
    layer = QgsVectorLayer(str(shp), shp.stem, "ogr")
    if layer.isValid():
        layers.append(layer)
    else:
        print(f"Skipping invalid layer: {shp.name}")

Breakdown: Each shapefile becomes a QgsVectorLayer named after the file stem (the name without extension). Validity is checked up front so the merge only receives loadable layers; the skipped ones are reported rather than crashing the job.
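A shapefile is only readable when its mandatory companion files sit next to the .shp, so a cheap file-system check can explain many invalid layers before QGIS is involved. The sketch below is one possible pre-check. missing_sidecars is a hypothetical helper name, the file names are invented, and the check assumes lower-case sidecar extensions.

```python
import tempfile
from pathlib import Path

REQUIRED_SIDECARS = (".shx", ".dbf")  # mandatory companions of every .shp

def missing_sidecars(shp_path):
    """Return the required companion extensions that are absent for one .shp."""
    shp_path = Path(shp_path)
    return [ext for ext in REQUIRED_SIDECARS if not shp_path.with_suffix(ext).exists()]

# Throwaway folder: "good" is complete, "broken" lacks its .shx index.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("good.shp", "good.shx", "good.dbf", "broken.shp", "broken.dbf"):
        (root / name).touch()

    report = {p.name: missing_sidecars(p) for p in sorted(root.glob("*.shp"))}

for name, missing in report.items():
    print(name, "OK" if not missing else f"missing {missing}")
```

A missing .prj is deliberately not treated as an error here: the layer can still load, but it arrives without a CRS, which matters once you reproject during the merge.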

Merge into a Single Layer

Pass the list of valid layers to native:mergevectorlayers. Set CRS to force a common projection — the algorithm reprojects any layer that does not already match:

import processing
from qgis.core import QgsCoordinateReferenceSystem

merged = processing.run("native:mergevectorlayers", {
    "LAYERS": layers,
    "CRS": QgsCoordinateReferenceSystem("EPSG:3857"),
    "OUTPUT": "TEMPORARY_OUTPUT",
})["OUTPUT"]

print(f"Merged feature count: {merged.featureCount()}")
print("Output CRS:", merged.crs().authid())

Breakdown: LAYERS accepts a list of QgsVectorLayer objects or file-path strings. The CRS parameter is the destination projection; supplying it guarantees a uniform output even when inputs disagree, which prevents the misalignment described in Coordinate Reference Systems. TEMPORARY_OUTPUT keeps the result in memory until you write it.
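Before forcing a destination CRS it can help to know how mixed the inputs actually are. The sketch below tallies CRS identifiers and suggests the most common one as a target. It works on plain strings so it runs outside QGIS; in the console you would build the list with layer.crs().authid() for each loaded layer. summarize_crs is a hypothetical helper and the sample values are invented.

```python
from collections import Counter

def summarize_crs(authids):
    """Count CRS identifiers and return (counts, most_common_id)."""
    counts = Counter(authids)
    most_common_id, _ = counts.most_common(1)[0]
    return counts, most_common_id

# Invented sample; in QGIS: [layer.crs().authid() for layer in layers]
sample = ["EPSG:4326", "EPSG:32633", "EPSG:4326", "EPSG:4326"]
counts, suggested = summarize_crs(sample)

for authid, n in counts.most_common():
    print(f"{authid}: {n} layer(s)")
print("Mixed CRS:", len(counts) > 1, "| suggested target:", suggested)
```

The most common CRS is only a starting point. A projected CRS suited to your study area is often the better choice for the merged output.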

Handle Differing Schemas

When inputs have different field sets, native:mergevectorlayers builds a union of all columns: a feature simply gets NULL for fields it did not originally have. It also adds two helper fields automatically:

  • layer — the source layer name each feature came from.
  • path — the full source file path.

These built-in fields are exactly the provenance you usually want, so explicit source tagging is often unnecessary. If you prefer a custom, cleaner source field (for example, just the tile code), add it before merging by editing each input's attributes — or compute it afterward with the field calculator algorithm:

import processing

tagged = processing.run("native:fieldcalculator", {
    "INPUT": merged,
    "FIELD_NAME": "source_file",
    "FIELD_TYPE": 2,            # 2 = string (text)
    "FIELD_LENGTH": 80,
    "FORMULA": "regexp_substr(\"layer\", '([^\\\\\\\\/]+)$')",
    "OUTPUT": "TEMPORARY_OUTPUT",
})["OUTPUT"]

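Breakdown: native:fieldcalculator derives source_file from the auto-generated layer field. The regex keeps only the text after the last slash or backslash, so a value that is already a bare layer name passes through unchanged and a value that includes a path is reduced to its final component. FIELD_TYPE 2 is text; the expression uses a QGIS expression-engine regex. Because the merge already records provenance, this step only changes how that provenance is presented.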
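The stack of backslashes in FORMULA exists because the pattern passes through two layers of escaping: the Python string literal first, then the quoted string inside the QGIS expression. The sketch below prints what each layer sees and uses Python's re module as a stand-in to show what the final pattern matches. It assumes the expression parser collapses each doubled backslash inside a quoted string; the sample values are invented.

```python
import re

# The literal exactly as written in the Processing call.
formula = "regexp_substr(\"layer\", '([^\\\\\\\\/]+)$')"
print(formula)  # the text handed to the QGIS expression parser

# A raw triple-quoted string holds the same text with fewer escapes.
raw_formula = r"""regexp_substr("layer", '([^\\\\/]+)$')"""
print("Same text:", raw_formula == formula)

# Assumption: the expression parser turns each doubled backslash into one,
# so the regex engine ends up with this pattern.
pattern = r"([^\\/]+)$"

def last_path_component(value):
    """Return the text after the final slash or backslash, or None if empty."""
    match = re.search(pattern, value)
    return match.group(1) if match else None

for sample in ("/data/tiles/tile_07", "C:\\data\\tiles\\tile_07", "tile_07"):
    print(sample, "->", last_path_component(sample))
```

All three samples reduce to the same final component, which is the behavior the field calculator expression relies on.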

When the same field name has different types across inputs (one file stores a code as text, another as an integer), the merge converts that field to a string type in the output. Standardize column types across inputs beforehand if downstream tools require a numeric or date column.
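One way to spot such conflicts before merging is to compare field types across inputs. The sketch below works on plain dictionaries so it runs outside QGIS; in the console you could build each schema with {f.name(): f.typeName() for f in layer.fields()}. find_type_conflicts is a hypothetical helper, and the layer names, field names, and type names are invented.

```python
from collections import defaultdict

def find_type_conflicts(schemas):
    """Return fields that appear with more than one type, with the type per layer."""
    seen = defaultdict(dict)  # field name -> {layer name: type name}
    for layer_name, fields in schemas.items():
        for field_name, type_name in fields.items():
            seen[field_name][layer_name] = type_name
    return {
        field: by_layer
        for field, by_layer in seen.items()
        if len(set(by_layer.values())) > 1
    }

# Invented schemas; in QGIS: {f.name(): f.typeName() for f in layer.fields()}
schemas = {
    "tile_a": {"id": "Integer64", "code": "String"},
    "tile_b": {"id": "Integer64", "code": "Integer", "owner": "String"},
}
conflicts = find_type_conflicts(schemas)

for field, by_layer in conflicts.items():
    print(field, "->", by_layer)
```

Fields that exist in only one input, such as owner here, are not conflicts. They simply end up NULL for features from the other layers.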

Write a GeoPackage

Shapefiles truncate field names to 10 characters, cap at 2 GB, and split across multiple sidecar files — all reasons to write the merged result as a GeoPackage instead. Give OUTPUT a .gpkg path:

import processing
from qgis.core import QgsCoordinateReferenceSystem

processing.run("native:mergevectorlayers", {
    "LAYERS": [str(p) for p in shapefiles],
    "CRS": QgsCoordinateReferenceSystem("EPSG:3857"),
    "OUTPUT": "/data/output/merged_tiles.gpkg",
})

print("Wrote /data/output/merged_tiles.gpkg")

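Breakdown: The .gpkg extension selects the GeoPackage driver automatically. Passing path strings in LAYERS lets Processing open each file itself, so the script does not have to keep a QgsVectorLayer object per input. Note that this call passes every discovered path, including any file that failed the earlier validity check; pass [layer.source() for layer in layers] instead to merge only the files that loaded. The output preserves full field names and Unicode, unlike a shapefile destination.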
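To make the 10-character limit concrete, the sketch below checks a list of field names for ones that would not survive a shapefile destination. It models the limit as plain prefix truncation, which is an assumption for illustration; the actual driver may rename colliding fields differently. truncation_report is a hypothetical helper and the field names are invented.

```python
from collections import defaultdict

SHAPEFILE_NAME_LIMIT = 10  # shapefile attribute names are capped at 10 characters

def truncation_report(field_names, limit=SHAPEFILE_NAME_LIMIT):
    """Return (too_long, collisions) for a list of field names."""
    too_long = [name for name in field_names if len(name) > limit]
    groups = defaultdict(list)
    for name in field_names:
        groups[name[:limit]].append(name)  # group by the truncated prefix
    collisions = {short: names for short, names in groups.items() if len(names) > 1}
    return too_long, collisions

# Invented field names
fields = ["id", "survey_date_start", "survey_date_end", "owner"]
too_long, collisions = truncation_report(fields)

print("Longer than 10 characters:", too_long)
print("Would share a truncated name:", collisions)
```

Both survey date fields collapse to the same 10-character prefix, which is the kind of ambiguity a GeoPackage destination avoids.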

Once merged, converting the layer for web use is a short follow-on step — see Automating Shapefile to GeoJSON Conversion in QGIS. To trim the merged layer to a study area afterward, use Clip a Vector Layer in PyQGIS.

Filter Inputs by Geometry Type

Because native:mergevectorlayers refuses to mix geometry types, a folder containing both line and polygon shapefiles needs filtering first. Load each candidate, check its geometryType, and group accordingly:

from pathlib import Path
from qgis.core import QgsVectorLayer, QgsWkbTypes

source_dir = Path("/data/mixed")
by_type = {}

for shp in sorted(source_dir.glob("*.shp")):
    layer = QgsVectorLayer(str(shp), shp.stem, "ogr")
    if not layer.isValid():
        continue
    gtype = QgsWkbTypes.geometryDisplayString(layer.geometryType())
    by_type.setdefault(gtype, []).append(layer)

for gtype, group in by_type.items():
    print(f"{gtype}: {len(group)} layers")

Breakdown: layer.geometryType() returns a QgsWkbTypes.GeometryType enum (point, line, or polygon), and geometryDisplayString turns it into a readable key. Grouping into a dict means you can run one merge per geometry type, each producing a clean single-type output. Feed each group list straight into native:mergevectorlayers.
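The one-merge-per-type idea can be sketched as a small planning step that turns each group into its own parameter dictionary and output file. The version below uses path strings in place of layer objects so it runs outside QGIS. plan_merges is a hypothetical helper, the paths and group keys are invented, and the CRS is given as an authority ID string rather than a QgsCoordinateReferenceSystem object.

```python
from pathlib import Path

def plan_merges(groups, out_dir, crs_authid):
    """Build one native:mergevectorlayers parameter dict per geometry group."""
    out_dir = Path(out_dir)
    plans = {}
    for gtype, members in groups.items():
        if not members:
            continue  # nothing to merge for this geometry type
        output = out_dir / f"merged_{gtype.lower()}.gpkg"
        plans[gtype] = {
            "LAYERS": list(members),
            "CRS": crs_authid,
            "OUTPUT": str(output),
        }
    return plans

# Invented stand-in for by_type, using path strings instead of layer objects.
groups = {
    "Polygon": ["/data/mixed/parcels_a.shp", "/data/mixed/parcels_b.shp"],
    "Line": ["/data/mixed/roads.shp"],
}
plans = plan_merges(groups, "/data/output", "EPSG:3857")

for gtype, params in plans.items():
    print(gtype, "->", params["OUTPUT"])

# Inside QGIS, each plan would then be executed with:
#     import processing
#     processing.run("native:mergevectorlayers", params)
```

Keeping the planning separate from the processing.run call makes it easy to print and review the output paths before any file is written.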

Deduplicate After Merging

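Tiled datasets frequently overlap at their edges, so the merged layer can contain duplicate features along seams. Remove features with duplicate geometries using native:deleteduplicategeometries. The algorithm compares geometry only and does not check attributes, so two features with identical shapes but different attribute values are reduced to one: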

import processing

deduped = processing.run("native:deleteduplicategeometries", {
    "INPUT": merged,
    "OUTPUT": "TEMPORARY_OUTPUT",
})["OUTPUT"]

print(f"Before: {merged.featureCount()}  After: {deduped.featureCount()}")

Breakdown: native:deleteduplicategeometries drops features whose geometry is identical to one already kept, which cleans up the overlap seams typical of tiled survey exports. Comparing feature counts before and after confirms how many duplicates were removed. Run this after the merge and before writing your final GeoPackage.

QGIS Version Compatibility

The code targets QGIS 3.34 LTR (Python 3.12).

QGIS version    Python    Notes
3.28 LTR        3.9       Same algorithm; pathlib glob unchanged. FIELD_TYPE codes identical.
3.34 LTR        3.12      Baseline for this page.
3.40 / 3.44     3.12      native:mergevectorlayers unchanged; better performance on many inputs.

The auto-generated layer and path provenance fields have been present since early 3.x, so the merge behavior is stable across every version above.

Troubleshooting

  • Layers do not have the same geometry type. One file is a different type (e.g. a points file mixed in with polygons). Filter by geometry before merging, or run separate merges per type.
  • Merge runs but a file's features are missing. That layer failed validity and was skipped — check the console for the skip message and inspect the file with ogrinfo.
  • Field names look truncated or mangled. You wrote to a shapefile output. Use a .gpkg destination to keep full names.
  • Everything reprojected unexpectedly. The CRS parameter forces a target projection. Set it to the CRS of your inputs if you do not want reprojection, or leave it out, in which case the algorithm uses the CRS of the first input layer.
  • No shapefiles found. The glob pattern or path is wrong. Confirm with list(Path(source_dir).glob("*.shp")), and use rglob if files sit in subfolders.

Conclusion

Merging shapefiles in PyQGIS comes down to three moves: glob the folder with pathlib to gather inputs reproducibly, run native:mergevectorlayers with a forced CRS so projections and schemas reconcile in one pass, and write the union to a GeoPackage to escape shapefile limits. The algorithm's automatic layer/path fields give you provenance for free, and validity checks keep one bad file from sinking the whole batch — making this pattern dependable from a handful of tiles to hundreds of survey exports.

Frequently Asked Questions

Can I merge shapefiles with different attribute columns? Yes. native:mergevectorlayers unions all columns and fills missing values with NULL. A field that has the same name but different types across inputs is converted to a string field, so standardize types first if a downstream tool needs a numeric or date column.

How do I know which file each feature came from? The algorithm adds layer and path fields automatically, recording the source layer name and full path for every feature. You can derive a cleaner source code from these with the field calculator.

Do all the shapefiles need the same CRS? No. Set the CRS parameter to your target projection and the algorithm reprojects any non-matching input during the merge, producing one consistently projected output.

Should I output a shapefile or a GeoPackage? Prefer GeoPackage. It avoids the 10-character field-name truncation, the 2 GB size cap, and the multi-file sprawl of shapefiles, and it stores everything in a single portable file.