Classify a Layer with Natural Breaks (Jenks) in PyQGIS
Natural breaks (the Jenks optimization) is the classification method that respects the shape of your data. Instead of cutting a numeric field into equal-width bins or equal-count groups, Jenks searches for class boundaries that sit in the natural gaps of the distribution, minimizing variance within each class and maximizing it between classes. For real-world data with clusters and outliers, it usually produces a more faithful thematic map than either alternative. This task-focused guide applies Jenks programmatically with QgsClassificationJenks, then compares it against the alternatives, as part of the Graduated & Categorized Renderers in PyQGIS cluster.
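To make the objective concrete, here is a minimal pure-Python sketch that finds natural breaks by brute force on a tiny list. It is only an illustration of the idea of minimizing within-class squared deviation; the function names and sample values are hypothetical, and this is not how QGIS implements QgsClassificationJenks.

```python
from itertools import combinations


def within_class_ss(group):
    """Sum of squared deviations of a group about its own mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)


def brute_force_natural_breaks(values, n_classes):
    """Try every contiguous split of the sorted values and keep the one
    with the smallest total within-class sum of squares.

    Only practical for very small inputs; shown for illustration.
    """
    data = sorted(values)
    best_cost, best_groups = float("inf"), None
    # Pick n_classes - 1 cut positions between consecutive sorted values.
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        edges = (0, *cuts, len(data))
        groups = [data[a:b] for a, b in zip(edges, edges[1:])]
        cost = sum(within_class_ss(g) for g in groups)
        if cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups


# Hypothetical sample with three visible clusters.
sample = [1, 2, 3, 20, 21, 22, 50, 51]
groups = brute_force_natural_breaks(sample, 3)
print(groups)
```

On this sample the three groups coincide with the three visible clusters, which is the behavior the rest of this guide relies on.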
Prerequisites
- QGIS 3.34 LTR (Python 3.12) with PyQGIS, or a comparable 3.x release.
- A vector layer loaded in the project with a numeric field that has a non-uniform (clustered or skewed) distribution — that is where Jenks shines.
- The QGIS Python Console open (Ctrl+Alt+P).
- Optional: familiarity with graduated renderers from the parent cluster.
Step 1: Build a Graduated Renderer with the Jenks Method
The modern PyQGIS pattern separates the renderer from the classification method. You instantiate QgsGraduatedSymbolRenderer on a field, hand it a QgsClassificationJenks instance via setClassificationMethod(), then call updateClasses() to compute the breaks.
from qgis.core import (
    QgsProject,
    QgsGraduatedSymbolRenderer,
    QgsClassificationJenks,
    QgsSymbol,
    QgsStyle,
)
layer = QgsProject.instance().mapLayersByName("counties")[0]
field_name = "median_income"
class_count = 5
renderer = QgsGraduatedSymbolRenderer(field_name)
renderer.setSourceSymbol(QgsSymbol.defaultSymbol(layer.geometryType()))
# Plug in Jenks natural breaks as the classification algorithm.
renderer.setClassificationMethod(QgsClassificationJenks())
# Compute the break values from the layer's field.
renderer.updateClasses(layer, class_count)
renderer.updateColorRamp(QgsStyle.defaultStyle().colorRamp("YlOrRd"))
layer.setRenderer(renderer)
layer.triggerRepaint()
for r in renderer.ranges():
    print(f"{r.lowerValue():.0f} – {r.upperValue():.0f}")
Breakdown: setClassificationMethod(QgsClassificationJenks()) is the line that selects natural breaks; swapping the class object is all it takes to change strategy. updateClasses(layer, class_count) runs the Jenks optimization over median_income and creates five QgsRendererRange objects whose boundaries fall in the data's natural gaps. The sequential YlOrRd ramp encodes the ordered magnitude. Printing the ranges lets you eyeball where Jenks placed the breaks before trusting the map.
Step 2: Handle Nulls and Invalid Values
Jenks operates only on valid numeric values; NULL entries are excluded from the optimization. Because they match no class range, the graduated renderer does not draw those features at all, so they vanish from the map unless you style them separately. Decide explicitly how to treat them rather than letting them disappear silently.
from qgis.core import NULL

# Count how many features carry a usable value.
# PyQGIS typically returns missing attributes as NULL (a null QVariant),
# which is not the same object as None, so compare against NULL.
total = layer.featureCount()
valid = sum(
    1 for f in layer.getFeatures()
    if f[field_name] != NULL and f[field_name] != ""
)
print(f"{valid} of {total} features have a value; {total - valid} are null")
# Option A: build a clean subset by filtering, then classify on it.
layer.setSubsetString(f'"{field_name}" IS NOT NULL')
renderer.updateClasses(layer, class_count)
layer.setSubsetString("")  # restore full layer for display
# Option B: keep nulls visible but styled distinctly.
# (Features that match no range are left undrawn by the graduated
# renderer, so give them their own symbol, for example on a second
# layer filtered to the null rows.)
Breakdown: Counting first tells you whether nulls are a handful of stray records or a data-quality problem. setSubsetString() applies a temporary filter so updateClasses() computes breaks from clean data only; clearing it afterward restores every feature to the layer. Because Jenks break positions depend on the value set it sees, the filter is also the place to exclude placeholder values: IS NOT NULL does not catch a missing value stored as 0 or -9999, and a single such value can distort the lowest class, so add a condition for it if your data uses one. To give null features their own fill, draw them with a separate symbol and set its color as shown in Set a Vector Layer Symbol Color in PyQGIS.
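The effect of a coerced placeholder is easy to see outside QGIS. The following sketch uses plain Python lists rather than a layer; the helper name split_nulls and the income values are hypothetical, chosen only to show how a null turned into 0 stretches the value range that any classifier then has to partition.

```python
import math


def split_nulls(values):
    """Separate usable numbers from missing entries (None or NaN)."""
    valid = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return valid, len(values) - len(valid)


# Hypothetical income values with two missing records.
raw = [41200, None, 38900, 52750, None, 61300, 47500]

valid, null_count = split_nulls(raw)
# What a careless import might produce: nulls silently become 0.
coerced = [0 if v is None else v for v in raw]

print(f"{len(valid)} valid, {null_count} null")
print("range on valid values:   ", min(valid), "to", max(valid))
print("range with coerced zeros:", min(coerced), "to", max(coerced))
```

With the zeros included, the minimum drops from 38900 to 0, so the lowest class would be spent covering values that do not exist in the real data.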
Step 3: Compare Jenks Against Equal Interval and Quantile
The value of Jenks is clearest when you see the alternatives on the same data. Loop the three methods and print where each puts its breaks — identical data, three very different maps.
from qgis.core import (
    QgsClassificationEqualInterval,
    QgsClassificationQuantile,
    QgsClassificationJenks,
)
methods = {
    "Equal Interval": QgsClassificationEqualInterval(),
    "Quantile": QgsClassificationQuantile(),
    "Jenks": QgsClassificationJenks(),
}
for name, method in methods.items():
    renderer.setClassificationMethod(method)
    renderer.updateClasses(layer, class_count)
    breaks = [round(r.upperValue(), 1) for r in renderer.ranges()]
    print(f"{name:>14}: upper bounds {breaks}")
# Settle on Jenks for the final map.
renderer.setClassificationMethod(QgsClassificationJenks())
renderer.updateClasses(layer, class_count)
layer.triggerRepaint()
Breakdown: Equal interval returns evenly spaced upper bounds regardless of where the data actually sits, so a skewed field leaves some classes nearly empty. Quantile returns bounds that put roughly equal feature counts in each class, often splitting near-identical values across a boundary. Jenks returns bounds placed in the data's real gaps. Comparing the printed lists of upper bounds makes the trade-off concrete before you commit.
| Method | How breaks are placed | When it wins |
|---|---|---|
| Equal interval | Range divided into equal widths | Uniform data; legends that must read as round numbers |
| Quantile | Equal feature count per class | Ranked maps; balancing map ink |
| Jenks | Minimizes within-class variance at natural gaps | Clustered or skewed real-world data |
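The first two rows of the table can be reproduced with a few lines of plain Python. This is a rough sketch on a hypothetical skewed sample: the quantile helper uses a simple nearest-rank rule, which is an assumption for illustration and may not match the interpolation QGIS applies.

```python
import math


def equal_interval_bounds(values, n):
    """Upper bounds of n equal-width classes spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    return [lo + width * i for i in range(1, n + 1)]


def quantile_bounds(values, n):
    """Upper bounds chosen so each class holds about len(values)/n items.

    Nearest-rank rule, assumed here for simplicity.
    """
    data = sorted(values)
    return [data[math.ceil(len(data) * i / n) - 1] for i in range(1, n + 1)]


def class_counts(values, bounds):
    """Count values per class, assigning each to the first bound it fits under."""
    counts = [0] * len(bounds)
    for v in values:
        for i, upper in enumerate(bounds):
            if v <= upper:
                counts[i] += 1
                break
    return counts


# Hypothetical right-skewed sample: many small values, two large ones.
skewed = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 40, 100]

ei = equal_interval_bounds(skewed, 4)
qt = quantile_bounds(skewed, 4)
print("equal interval bounds:", ei, "counts:", class_counts(skewed, ei))
print("quantile bounds:      ", qt, "counts:", class_counts(skewed, qt))
```

Equal interval puts ten of the twelve values in the first class and leaves the third class empty, while the quantile rule balances the counts at the cost of a last class that stretches from 10 to 100.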
Step 4: Choosing the Class Count for Jenks
Jenks break positions are sensitive to the class count because the optimization re-partitions the whole distribution each time, whereas equal interval simply subdivides a fixed range. Test a few counts and judge the fit informally by checking how tightly the printed ranges wrap the clusters; the goodness of variance fit (GVF) in step 5 puts a number on the same idea.
for n in (4, 5, 6, 7):
    renderer.setClassificationMethod(QgsClassificationJenks())
    renderer.updateClasses(layer, n)
    spans = [round(r.upperValue() - r.lowerValue(), 1)
             for r in renderer.ranges()]
    print(f"{n} classes -> span widths {spans}")
Breakdown: Re-running updateClasses() with different n shows how the partition shifts. Look for the smallest class count where each class still maps to a meaningful group; adding classes past that point yields diminishing returns and a harder-to-read legend. Five classes is a reliable default, but clustered data with three obvious tiers may read better with three or four.
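The diminishing returns can be quantified without QGIS. The sketch below is a small dynamic-programming partition in the spirit of Fisher-Jenks, written for illustration only; the function name, the sample values, and the use of GVF as the score are assumptions, and it is not the QGIS implementation.

```python
def gvf_by_class_count(values, max_k):
    """Best achievable goodness of variance fit for 1..max_k classes.

    Uses dynamic programming over the sorted values: cost[k][j] is the
    smallest within-class sum of squares when the first j values are
    split into k contiguous classes.
    """
    data = sorted(values)
    n = len(data)
    # Prefix sums let us get the sum of squares of data[i:j] in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in data:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def ss(i, j):
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(max_k + 1)]
    cost[0][0] = 0.0
    for k in range(1, max_k + 1):
        for j in range(k, n + 1):
            cost[k][j] = min(cost[k - 1][i] + ss(i, j) for i in range(k - 1, j))

    total = cost[1][n]  # one class: total sum of squared deviations
    return {k: (1 - cost[k][n] / total) if total else 0.0
            for k in range(1, max_k + 1)}


# Hypothetical data with three obvious tiers.
tiers = [1, 2, 3, 20, 21, 22, 50, 51, 52]
scores = gvf_by_class_count(tiers, 5)
for k, score in scores.items():
    print(f"{k} classes: GVF = {score:.4f}")
```

On this sample the score jumps sharply up to three classes and barely moves afterward, which is the "smallest count that still maps to meaningful groups" signal described above.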
Step 5: Measure the Fit Yourself
QGIS does not expose a built-in goodness of variance fit (GVF) score, but it is cheap to compute and it turns "this map looks better" into a number you can defend. GVF compares the variance the classification removes against the total variance in the data; values closer to 1 mean tighter, more faithful classes.
from qgis.core import NULL


def gvf(values, ranges):
    """Goodness of variance fit for a set of classification ranges."""
    mean = sum(values) / len(values)
    # Total sum of squared deviations about the grand mean.
    sdam = sum((v - mean) ** 2 for v in values)
    sdcm = 0.0
    for r in ranges:
        members = [v for v in values
                   if r.lowerValue() <= v <= r.upperValue()]
        if members:
            cmean = sum(members) / len(members)
            sdcm += sum((v - cmean) ** 2 for v in members)
    return 1 - (sdcm / sdam) if sdam else 0.0


# Skip missing attributes; PyQGIS typically returns them as NULL, not None.
values = [f[field_name] for f in layer.getFeatures()
          if f[field_name] != NULL]
for method_cls in (QgsClassificationEqualInterval,
                   QgsClassificationQuantile,
                   QgsClassificationJenks):
    renderer.setClassificationMethod(method_cls())
    renderer.updateClasses(layer, class_count)
    print(f"{method_cls.__name__}: GVF = {gvf(values, renderer.ranges()):.3f}")
Breakdown: sdam is the sum of squared deviations about the grand mean, the total variation to "explain." sdcm is the residual variation left inside the classes after partitioning. Their ratio subtracted from 1 is the GVF. Run it across the three methods and Jenks will almost always post the highest score on clustered data, giving you objective evidence that its breaks fit the distribution better than equal interval or quantile. Note that the inclusive comparison on both bounds can double-count a value that lands exactly on a shared boundary; for production accuracy, assign each value to exactly one class, for example the first range that contains it.
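One way to avoid the double-counting is to stop at the first class that contains a value. The variant below is a sketch under that assumption, using plain (lower, upper) tuples instead of QgsRendererRange objects so it can run outside QGIS; the function name and sample data are hypothetical.

```python
def gvf_first_match(values, bounds):
    """GVF where each value is counted in the first range that contains it.

    bounds: list of (lower, upper) tuples in ascending order.
    """
    mean = sum(values) / len(values)
    sdam = sum((v - mean) ** 2 for v in values)

    classes = [[] for _ in bounds]
    for v in values:
        for members, (lo, hi) in zip(classes, bounds):
            if lo <= v <= hi:
                members.append(v)
                break  # first match wins, so a boundary value counts once

    sdcm = 0.0
    for members in classes:
        if members:
            cmean = sum(members) / len(members)
            sdcm += sum((v - cmean) ** 2 for v in members)
    return 1 - (sdcm / sdam) if sdam else 0.0


# The value 3 sits exactly on the boundary shared by both ranges.
values = [1, 2, 3, 10, 11, 12]
bounds = [(1, 3), (3, 12)]
score = gvf_first_match(values, bounds)
print(f"GVF = {score:.3f}")
```

Here the boundary value 3 is counted only in the first class, giving a GVF of about 0.968. Counting it in both classes would drag the second class mean down and report a much lower, misleading score.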
Step 6: Apply the Final Renderer to the Map
Once Jenks is your chosen method and nulls are handled, commit the renderer and refresh the display so the canvas and legend reflect the natural-breaks classification.
from qgis.utils import iface
renderer.setClassificationMethod(QgsClassificationJenks())
renderer.updateClasses(layer, class_count)
renderer.updateColorRamp(QgsStyle.defaultStyle().colorRamp("YlOrRd"))
layer.setRenderer(renderer)
layer.triggerRepaint()
if iface is not None:
    iface.layerTreeView().refreshLayerSymbology(layer.id())
    iface.mapCanvas().refresh()
Breakdown: Re-running setClassificationMethod and updateClasses immediately before setRenderer guarantees the committed renderer uses Jenks even after the comparison loops in earlier steps changed the method. refreshLayerSymbology rebuilds the legend swatches so they match the natural-breaks ranges. The result is ready to feed into a polygon choropleth — see Create a Choropleth Map in PyQGIS for the full thematic-map workflow.
QGIS Version Compatibility
This recipe targets QGIS 3.34 LTR (Python 3.12) and relies on the method-object classification API.
| QGIS / Python | Notes |
|---|---|
| 3.34 LTR (Python 3.12) | Baseline. QgsClassificationJenks, setClassificationMethod, updateClasses all current. |
| 3.28 LTR (Python 3.9) | Fully supported with identical code. |
| 3.40 / 3.44 | Same API; Jenks performance improved for large layers. |
| Pre-3.10 | No setClassificationMethod. Legacy code used QgsGraduatedSymbolRenderer.createRenderer(layer, field, classes, QgsGraduatedSymbolRenderer.Jenks, symbol, ramp). |
The deprecated Jenks mode enum still resolves on the LTR builds, but new code should use QgsClassificationJenks() so it keeps working as the enum is phased out.
Troubleshooting
updateClasses is slow or freezes on a huge layer. Jenks is computationally heavier than equal interval or quantile. Pre-filter with setSubsetString(), sample the layer, or reduce the class count. On layers over a few hundred thousand features, consider classifying on a representative sample.
Breaks change every time I run it. Breaks should move only when the class count or the underlying values change. If they shift unexpectedly, check for an active setSubsetString filter affecting the value set. On very large layers, QGIS may also run Jenks on a random sample of the values, in which case small run-to-run differences are possible.
Some polygons are missing or unstyled. They most likely hold NULL values, which are excluded from classification and fall outside every class range. Use the null-handling pattern in step 2 to either filter them or style them deliberately.
AttributeError: colorRamp returned None. The ramp name is not installed. List valid names with QgsStyle.defaultStyle().colorRampNames().
Conclusion
Jenks natural breaks gives you a classification that mirrors the real structure of your data rather than imposing an arbitrary grid on it. The PyQGIS workflow is compact: instantiate the renderer, call setClassificationMethod(QgsClassificationJenks()), run updateClasses(), and handle nulls deliberately. Compare it against equal interval and quantile on your own field, and let the printed break values — not habit — guide the final choice. From here, apply the result to a polygon layer to produce a faithful choropleth.
Frequently Asked Questions
What exactly does Jenks optimize? It minimizes the sum of squared deviations within each class and maximizes the deviation between classes, placing boundaries in the natural gaps of the value distribution.
Is Jenks always the best choice? No. For uniform data, equal interval is simpler and yields rounder legend numbers. For ranked maps where balanced feature counts matter, quantile is better. Jenks excels specifically on clustered or skewed data.
Why do my Jenks breaks differ from QGIS 3.10? The algorithm was refined for performance and edge cases across releases, and break positions depend on the exact value set. Small differences on identical data across major versions are expected.
Does the class count affect Jenks more than other methods? Yes. Jenks re-partitions the entire distribution for each class count, so breaks move when you change n, whereas equal-interval bounds simply subdivide a fixed range.