Hierarchical clustering
Hierarchical clustering is a clustering method that creates a tree-like structure of groups. Instead of producing only one fixed set of clusters, it shows how observations can be grouped at different levels of similarity.
In machine learning, clustering means grouping similar observations without using predefined labels. Hierarchical clustering is one of the best-known clustering approaches because it produces a hierarchy, not only a flat partition.
The result is usually visualised as a dendrogram. A dendrogram is a tree-like diagram that shows which observations or clusters are merged together and at what distance. This makes hierarchical clustering useful when you want to understand relationships between groups, not only assign each point to one cluster.
Hierarchical clustering groups data into a hierarchy. The output can be shown as a dendrogram, where small clusters gradually merge into larger groups.
What hierarchical clustering means
Hierarchical clustering is an unsupervised learning method that organises data into nested groups. The word hierarchical is important. It means that the method does not only say “these are the clusters”. It also shows how smaller clusters are related to larger clusters.
Imagine a set of documents. Some documents are very similar and form small groups. Several small groups may then belong to a broader topic. Those broader topics may then belong to an even larger category. Hierarchical clustering tries to represent this layered structure.
This makes it useful when the data has natural levels. Customers can be grouped into narrow behavioural segments and broader market groups. Products can be grouped into specific subcategories and larger product families. Texts can be grouped into small topics and broader themes.
A simple example of hierarchical clustering
Imagine you have six animals: cat, dog, wolf, lion, eagle and sparrow.
A clustering method may first group cat and lion because both are felines. It may group dog and wolf because both are canines. It may group eagle and sparrow because both are birds. Then it may merge cat, lion, dog and wolf into a larger group of mammals. Finally, it may merge mammals and birds into one group of animals.
This is the idea of hierarchy. The method can show both small, specific groups and broader, more general groups.
Hierarchical clustering is useful when you do not only want to know which items belong together, but also how groups are related at several levels.
Clustering vs hierarchical clustering
Clustering is the general task of grouping similar observations. Hierarchical clustering is one specific way to do it.
Some clustering methods produce a flat result. For example, a method may divide customers into five clusters and stop there. Each customer belongs to one of those five groups.
Hierarchical clustering gives more structure. It can show two broad groups, five medium groups or twenty small groups depending on where you cut the tree. This makes it more flexible for exploration.
Flat clustering vs hierarchical clustering
Flat clustering produces one level of groups. Hierarchical clustering produces several levels.
For example, flat clustering may tell you that your products belong to four categories. Hierarchical clustering can show that those four categories also contain smaller subcategories and that some categories are closer to each other than others.
The difference is practical:
- Flat clustering – one final grouping of the data.
- Hierarchical clustering – a tree of nested groups that can be cut at different levels.
This is why hierarchical clustering is often used in exploratory analysis. It helps people inspect possible group structure before deciding how many clusters make sense.
Dendrogram
A dendrogram is the tree-like diagram produced by hierarchical clustering. It shows how observations and clusters are connected.
At the bottom of the dendrogram, each observation may start as its own small cluster. As you move upward, similar observations merge into clusters. Those clusters then merge into larger clusters. The height of the merge usually represents the distance or dissimilarity between the merged groups.
A low merge height usually means the clusters were similar. A high merge height usually means the clusters were less similar. This helps analysts see where natural group boundaries may appear.
A dendrogram is not only a chart. It is the visual representation of the hierarchy created by the clustering process.
How to read a dendrogram
Reading a dendrogram means looking at where observations merge and how far apart those merges are.
If two observations merge very low in the tree, they are very similar according to the chosen distance metric and linkage method. If two clusters merge only near the top, they are more different.
To choose a final number of clusters, analysts often cut the dendrogram at a chosen height. Each branch that the cut crosses becomes one cluster, containing all the observations below that branch. A higher cut creates fewer, broader clusters. A lower cut creates more, smaller clusters.
This is useful because the analyst can inspect the data at several levels instead of committing to one cluster count immediately.
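Cutting at a height can be sketched in a few lines, assuming SciPy is available. The six one-dimensional points are made up for illustration, and the choice of average linkage and the two cut heights are assumptions, not recommendations.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (made up): three tight pairs on a line.
X = np.array([[0.0], [1.0], [10.0], [11.0], [30.0], [31.0]])

# Build the hierarchy with average linkage on Euclidean distance.
Z = linkage(X, method="average", metric="euclidean")

# Cut the same tree at two different heights.
labels_low = fcluster(Z, t=5.0, criterion="distance")    # lower cut
labels_high = fcluster(Z, t=15.0, criterion="distance")  # higher cut

n_low = len(set(labels_low))
n_high = len(set(labels_high))
```

On this toy data the lower cut keeps the three pairs apart, while the higher cut joins the two left-hand pairs and leaves two clusters. The tree itself is built only once.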
Agglomerative hierarchical clustering
Agglomerative hierarchical clustering is the most common form of hierarchical clustering. It works from the bottom up.
At the beginning, each observation is its own cluster. The algorithm then repeatedly merges the two closest clusters. This continues until all observations are merged into one large cluster or until a chosen stopping rule is reached.
The steps are simple:
- Start with each observation as its own cluster.
- Calculate distances between clusters.
- Merge the two closest clusters.
- Recalculate distances between the new cluster and the others.
- Repeat until the hierarchy is complete.
The result can be shown as a dendrogram.
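The steps above can be sketched as a deliberately naive pure-Python loop. This is an illustration under stated assumptions, not how libraries implement it: the data is one-dimensional, single linkage is used as the cluster distance, and the helper names `single_link` and `agglomerate` are hypothetical.

```python
from itertools import combinations

def single_link(a, b, points):
    """Distance between clusters a and b: closest pair of members (1-D data)."""
    return min(abs(points[i] - points[j]) for i in a for j in b)

def agglomerate(points):
    """Naive bottom-up clustering; returns the merges in the order they happen."""
    clusters = [(i,) for i in range(len(points))]   # every point starts alone
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_link(pair[0], pair[1], points))
        merges.append((a, b, single_link(a, b, points)))
        # Replace the pair by the merged cluster. Distances are recomputed
        # from scratch on the next pass, which is simple but not efficient.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

points = [1.0, 2.0, 6.0, 7.0, 20.0]
merges = agglomerate(points)
merge_heights = [height for _, _, height in merges]
```

The list `merges` records which clusters were joined and at what distance, which is the information a dendrogram draws.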
Divisive hierarchical clustering
Divisive hierarchical clustering works in the opposite direction. It starts with all observations in one large cluster and then repeatedly splits clusters into smaller groups.
This is a top-down approach. Instead of merging small clusters, it divides a large group into smaller parts.
Divisive methods are less common in many practical machine learning workflows because they can be more computationally demanding and less directly available in common tools. Still, the idea is important because hierarchical clustering can be built from either direction: bottom-up or top-down.
Agglomerative clustering builds the hierarchy by merging. Divisive clustering builds it by splitting.
Distance metric
A distance metric defines how similarity or dissimilarity between observations is measured.
If the distance is small, the observations are considered similar. If the distance is large, they are considered different.
Common distance metrics include:
- Euclidean distance – straight-line distance, often used with numerical data.
- Manhattan distance – distance measured as the sum of absolute differences across dimensions.
- Cosine distance – one minus the cosine of the angle between two vectors, often used with high-dimensional text or embedding vectors.
- Correlation distance – useful when the shape of a pattern matters more than the absolute scale.
- Jaccard distance – useful for binary or set-based data.
The choice of distance metric matters. The same dataset can produce different clusters if the distance definition changes.
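To make the difference concrete, here is a minimal pure-Python sketch of four of the metrics listed above. The function names and the two example vectors are illustrative assumptions. In practice a library implementation would normally be used.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def jaccard_distance(a, b):
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

# Two made-up vectors that point in the same direction but differ in size.
short_doc = (1.0, 2.0)
long_doc = (10.0, 20.0)
d_euclid = euclidean(short_doc, long_doc)        # large: magnitudes differ
d_cosine = cosine_distance(short_doc, long_doc)  # close to 0: same direction
```

The same pair of observations is far apart under Euclidean distance and almost identical under cosine distance, which is why the metric has to match what "similar" means for the data.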
Linkage method
The linkage method defines how the distance between two clusters is calculated.
This is necessary because once observations merge into clusters, the algorithm must decide how far one cluster is from another. There is more than one reasonable answer.
Common linkage methods include:
- Single linkage – distance between the closest points of two clusters.
- Complete linkage – distance between the farthest points of two clusters.
- Average linkage – average distance over all pairs of points, taking one point from each cluster.
- Ward linkage – merges clusters in a way that tries to minimise within-cluster variance.
Different linkage methods can produce very different dendrograms. Choosing a linkage method is not a technical detail. It affects the meaning of the final hierarchy.
Distance tells the algorithm how to compare observations. Linkage tells it how to compare clusters.
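The first three linkage rules can be written directly from their definitions. The sketch below assumes one-dimensional points and absolute difference as the distance. The function names and the two example clusters are hypothetical.

```python
from itertools import product

def pair_distances(a, b):
    """All distances between one member of a and one member of b."""
    return [abs(x - y) for x, y in product(a, b)]

def single_linkage(a, b):
    return min(pair_distances(a, b))      # closest pair

def complete_linkage(a, b):
    return max(pair_distances(a, b))      # farthest pair

def average_linkage(a, b):
    d = pair_distances(a, b)
    return sum(d) / len(d)                # mean over all cross pairs

left, right = [0.0, 1.0, 2.0], [5.0, 9.0]
d_single = single_linkage(left, right)
d_complete = complete_linkage(left, right)
d_average = average_linkage(left, right)
```

For these two clusters the three rules give 3.0, 9.0 and 6.0, so the same pair of clusters can look close or distant depending on the linkage.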
Single linkage
Single linkage uses the closest pair of points between two clusters. If any point in one cluster is close to any point in another cluster, the clusters may be merged.
This can be useful for detecting elongated or chain-like structures. But it can also create the chaining effect, where clusters become connected through a series of close points even if the broader groups are not very compact.
For example, if customers form a long gradual line of similarity, single linkage may connect them into one large cluster even when a business analyst would prefer several separate groups.
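A small sketch of the chaining effect, assuming SciPy is available. The data is a made-up chain of ten evenly spaced points, and comparing the height of the final merge is just one simple way to see the difference.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Ten points on a line, each exactly 1 unit from its neighbour.
chain = np.arange(10, dtype=float).reshape(-1, 1)

Z_single = linkage(chain, method="single")
Z_complete = linkage(chain, method="complete")

# Column 2 of the linkage matrix holds the merge heights.
single_top = Z_single[-1, 2]      # height of the final merge, single linkage
complete_top = Z_complete[-1, 2]  # height of the final merge, complete linkage
```

With single linkage every merge happens at height 1, so the whole chain looks like one tight cluster even though its end points are 9 units apart. Complete linkage reports that spread: its final merge happens at height 9.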
Complete linkage
Complete linkage uses the farthest pair of points between two clusters. Two clusters are close only if all points in one cluster are relatively close to all points in the other.
This tends to produce more compact clusters than single linkage. It avoids some chaining problems, but it can be sensitive to outliers because one distant point can increase the distance between clusters.
Complete linkage is often useful when you want clusters where all members are reasonably close to each other.
Average linkage
Average linkage uses the average distance between all pairs of points in two clusters.
This is a compromise between single and complete linkage. It does not depend only on the closest pair or the farthest pair. It considers the overall relationship between the two clusters.
Average linkage is often used when the analyst wants a balanced approach, but it still depends on the distance metric and data preprocessing.
Ward linkage
Ward linkage merges clusters in a way that minimises the increase in within-cluster variance. In practical terms, it tends to create compact, roughly spherical clusters.
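Ward linkage is normally used with numerical data and Euclidean distance, because its variance criterion is defined through squared Euclidean distances to cluster means. It can work well when clusters are expected to be compact and variance-based grouping makes sense.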
However, Ward linkage is not always suitable. If the data contains irregular shapes, strong outliers or non-Euclidean similarity, another method may be better.
The linkage method can change the whole clustering result. Hierarchical clustering should not be interpreted without knowing which linkage was used.
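The Ward criterion can be illustrated with a short pure-Python sketch. It assumes one-dimensional data and measures within-cluster scatter as the sum of squared deviations from the cluster mean, which is the usual way the "within-cluster variance" idea is formalised. The function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (within-cluster scatter, 1-D)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def ward_cost(a, b):
    """Increase in total within-cluster scatter caused by merging a and b."""
    return sse(a + b) - sse(a) - sse(b)

def ward_cost_shortcut(a, b):
    """The same quantity computed from cluster sizes and means only."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2

a, b, c = [1.0, 2.0, 3.0], [7.0, 9.0], [4.0]
cost_ab = ward_cost(a, b)   # merging two distant groups
cost_ac = ward_cost(a, c)   # merging a with a nearby point
```

Here merging `a` with the nearby point `c` costs far less than merging `a` with `b`, so a Ward-style step would pick the first merge. The shortcut shows why the criterion favours clusters whose means are close and why it penalises merging two large clusters.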
How hierarchical clustering works in practice
A typical workflow for hierarchical clustering looks like this:
- Choose the observations – decide what should be clustered: customers, products, documents, images, genes, events or something else.
- Prepare the features – select or create variables that describe the observations.
- Scale the data if needed – make sure variables with large numerical ranges do not dominate the distance.
- Choose a distance metric – define what similarity means.
- Choose a linkage method – define how clusters are compared.
- Run the clustering algorithm – create the hierarchy.
- Inspect the dendrogram – look for meaningful levels and group structure.
- Cut the tree if needed – choose a final number of clusters for practical use.
- Interpret clusters – describe what the groups mean in the real context.
- Validate the result – check whether clusters are stable, useful and not just artefacts.
Choosing the number of clusters
Hierarchical clustering does not always require the number of clusters to be chosen before the algorithm runs. This is one of its advantages over methods such as k-means.
Instead, the hierarchy can be created first and the number of clusters can be chosen later by cutting the dendrogram at a certain height.
There is no universal correct cut. The choice depends on the data, the distance jumps in the dendrogram, the business or scientific goal and whether the resulting clusters are useful.
A common approach is to look for a large vertical gap in the dendrogram. A cut through that gap may separate groups that are clearly different. But this is still a judgement call, not a mathematical guarantee.
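The "large vertical gap" heuristic can be sketched as follows, assuming SciPy is available. The toy data, the choice of average linkage and the rule of cutting halfway through the largest gap are all assumptions made for illustration. A real analysis would still inspect the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (made up): three tight pairs on a line.
X = np.array([[0.0], [1.0], [10.0], [11.0], [21.0], [22.0]])
Z = linkage(X, method="average")

heights = Z[:, 2]            # merge heights, in the order the merges happen
gaps = np.diff(heights)      # jump between consecutive merge heights
i = int(np.argmax(gaps))     # position of the largest jump

# Cut halfway through the largest gap.
cut = (heights[i] + heights[i + 1]) / 2
labels = fcluster(Z, t=cut, criterion="distance")
n_clusters = len(set(labels))
```

On this data the largest jump lies between the within-pair merges and the first between-pair merge, so the cut returns the three pairs. With noisier data several gaps may be similar in size, which is where judgement comes in.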
Hierarchical clustering vs k-means
Hierarchical clustering and k-means are both clustering methods, but they work differently.
K-means requires the number of clusters to be chosen in advance. It assigns each observation to one of k clusters and tries to minimise the sum of squared distances between observations and their cluster centres.
Hierarchical clustering builds a tree of clusters. The analyst can inspect the hierarchy and choose a level later.
The practical differences are:
- K-means is often faster on larger datasets.
- K-means works best with compact, roughly spherical clusters.
- Hierarchical clustering gives a dendrogram and multi-level structure.
- Hierarchical clustering can be more interpretable for small and medium datasets.
- Hierarchical clustering can become expensive for very large datasets.
Neither method is always better. The right choice depends on the data and the purpose.
Hierarchical clustering and unsupervised learning
Hierarchical clustering belongs to unsupervised learning. This means the algorithm does not learn from predefined labels.
There is no column saying “this customer belongs to group A” or “this document belongs to topic B”. The algorithm tries to discover structure from the features provided.
This is useful when labels do not exist, but it also creates responsibility. The clusters are not automatically true categories. They are patterns produced by the chosen features, distance metric and linkage method.
Hierarchical clustering and embeddings
Embeddings are numerical representations of content such as text, images, products, users or documents. Hierarchical clustering can be applied to embeddings to group similar items.
For example, document embeddings can be clustered to find topic groups. Product embeddings can be clustered to identify similar product families. Customer embeddings can be clustered to explore behavioural segments.
However, the result depends on the quality of the embeddings. If the embedding model captures the wrong kind of similarity, the clusters will reflect that mistake. Good clustering cannot fix a poor representation.
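A minimal sketch of clustering embeddings, assuming SciPy is available. The three-dimensional vectors are made up. Real embeddings would come from an embedding model and have many more dimensions. Cosine distance with average linkage is one common pairing, used here as an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up "embeddings": rows 0-1, 2-3 and 4-5 point in similar directions.
emb = np.array([
    [1.0, 0.1, 0.0],
    [2.0, 0.3, 0.0],   # similar direction to row 0, larger norm
    [0.0, 1.0, 0.1],
    [0.1, 3.0, 0.2],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 2.0],
])

# Cosine distance ignores vector length and compares direction only.
Z = linkage(emb, method="average", metric="cosine")

# Ask for at most three flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
```

The vectors are grouped by direction, so rows with different norms still end up together. Whether that grouping is meaningful depends entirely on what the embedding model encodes.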
Hierarchical clustering and text documents
Text documents are a common use case for hierarchical clustering. Articles, support tickets, reviews, emails or knowledge base pages can be represented with TF-IDF vectors or embeddings and then clustered.
The benefit is that hierarchical clustering can show topic structure at different levels. For example, a company may discover that support tickets first split into technical and billing topics, then into smaller subtopics.
This can help with content organisation, topic modelling, FAQ structure, customer support routing and SEO research.
For content work, hierarchical clustering is useful because topics often naturally have levels: broad categories, subtopics and very specific questions.
Hierarchical clustering and SEO
In SEO, hierarchical clustering can help organise keywords, search queries, pages or topics into meaningful groups.
For example, a website may have thousands of keywords. Some keywords describe the same intent. Some belong to broader topical clusters. Some are close enough to be handled on the same page, while others deserve separate landing pages.
Hierarchical clustering can help reveal this structure. It can support decisions about content hubs, internal linking, taxonomy, category pages and topic coverage.
But the method should not replace editorial judgement. Search intent, SERP differences, business value and content strategy still matter.
Hierarchical clustering and customer segmentation
Customer segmentation is another common use case. Customers can be grouped by behaviour, purchase history, engagement, value, product preference, geography or lifecycle stage.
Hierarchical clustering can show both broad groups and detailed subgroups. For example, the first split may separate active and inactive customers. Later splits may reveal high-value repeat buyers, discount-sensitive users, one-time buyers or seasonal buyers.
This can help marketing, sales and product teams understand customer structure. But the clusters should be validated against real business outcomes, not only visual similarity.
Hierarchical clustering and anomaly detection
Anomalies are unusual observations or patterns that may indicate something important or abnormal. Hierarchical clustering can help detect anomalies by showing observations that do not merge naturally with other groups.
For example, one customer, transaction or document may remain far away from all other clusters until very high in the dendrogram. That can suggest that it is unusual.
However, being far from other points does not automatically mean the observation is bad or important. It may be a rare but valid case, a data quality issue, an outlier or a meaningful niche group.
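One way to look for such points is to record the height at which each observation first merges with anything else. The sketch below assumes SciPy is available. The data is made up and the "first merge height" score is a simple illustrative heuristic, not a standard anomaly score.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Two small groups plus one point far from both (made-up data).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1],
              [20.0, 0.0]])
Z = linkage(X, method="average")

n = len(X)
first_merge_height = np.zeros(n)
for left, right, height, _ in Z:
    for child in (int(left), int(right)):
        if child < n:   # indices below n are original observations
            first_merge_height[child] = height

most_isolated = int(np.argmax(first_merge_height))
```

The last row joins the tree much higher than every other observation, which flags it for review. Whether it is an error, a rare valid case or a niche group still has to be decided by looking at the record itself.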
Hierarchical clustering and PCA
PCA, or principal component analysis, is often used before clustering when the data has many numerical variables.
PCA can reduce dimensionality and remove some redundancy. This can make clustering easier to visualise and sometimes more stable.
But PCA also changes the representation of the data. Clustering on principal components is not the same as clustering on the original variables. The analyst should check whether the reduced representation still preserves the structure that matters.
Hierarchical clustering and dimensionality reduction
Dimensionality reduction can help hierarchical clustering in high-dimensional data.
High-dimensional spaces can make distance measures less intuitive. Many points may appear similarly far apart. Dimensionality reduction can make patterns easier to inspect and can reduce noise before clustering.
Methods such as PCA, UMAP or autoencoders may be used depending on the data and goal. But the reduced space should be treated carefully. A cluster that appears in a 2D projection may not always represent a robust group in the original data.
Hierarchical clustering and feature selection
Feature selection matters because clustering depends heavily on the features used.
If irrelevant features are included, they can distort distances. If important features are missing, meaningful groups may disappear. If two variables repeat the same information, they may overweight one aspect of similarity.
For example, customer clustering based only on revenue may miss behavioural differences. Clustering based on too many unrelated variables may create unstable or meaningless groups.
Good clustering begins with meaningful features.
Hierarchical clustering and scaling
Scaling is one of the most important preprocessing steps for hierarchical clustering with numerical data.
If one variable uses a much larger numerical range than others, it can dominate the distance calculation. For example, income measured in euros may dominate age measured in years if the data is not scaled.
Standardisation can help put variables on a comparable scale. The right preprocessing depends on the metric, data type and business meaning of variables.
If the data is not scaled correctly, hierarchical clustering may group observations by measurement units rather than real similarity.
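The effect can be seen on a small made-up example, assuming NumPy and SciPy are available. The six hypothetical customers, the use of z-score standardisation and the helper `same_age_group` are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical customers: columns are age (years) and income (euros).
X = np.array([
    [25, 30000], [26, 32000], [27, 34000],   # rows 0-2: younger customers
    [64, 30500], [65, 32500], [66, 34500],   # rows 3-5: older customers
], dtype=float)

# Standardise each column to mean 0 and standard deviation 1.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Row 0 of the linkage matrix holds the first pair that is merged.
first_raw = tuple(int(i) for i in linkage(X, method="average")[0, :2])
first_scaled = tuple(int(i) for i in linkage(X_scaled, method="average")[0, :2])

def same_age_group(pair):
    """True if both row indices belong to the same age group."""
    return (pair[0] < 3) == (pair[1] < 3)
```

On the raw data the first merge joins a younger and an older customer, because a 500 euro income difference is numerically small next to a 2,000 euro one and the roughly 40-year age gap barely registers. After standardisation the first merge joins two customers of similar age.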
Hierarchical clustering and categorical data
Hierarchical clustering can be used with categorical data, but it requires appropriate representation and distance measures.
Simple numerical distance may not make sense for categories such as country, product type or industry. The analyst may need one-hot encoding, similarity rules, Gower distance or another method suitable for mixed data.
This matters because clustering algorithms do not understand business meaning automatically. They only work with the representation and distance definition they are given.
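A simplified version of Gower distance for mixed records can be sketched in pure Python. This is an illustration under assumptions: numeric features are scaled by their range in the data, categorical features contribute 0 when equal and 1 otherwise, and all features are weighted equally. The function name, field names and records are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Simplified Gower distance between two records given as dicts.

    numeric_ranges maps each numeric feature to its range (max - min) in the
    data. Every other feature is treated as categorical.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            total += abs(a[key] - b[key]) / numeric_ranges[key]
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)

customer_1 = {"age": 30, "country": "DE", "plan": "pro"}
customer_2 = {"age": 40, "country": "DE", "plan": "basic"}
ranges = {"age": 40}   # assumed: ages in the data span 40 years

d = gower_distance(customer_1, customer_2, ranges)
```

Each feature contributes a value between 0 and 1, so no single variable dominates because of its units. A matrix of such distances can then be passed to a hierarchical clustering routine that accepts precomputed distances.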
Hierarchical clustering and missing values
Missing values can create problems for hierarchical clustering because distance calculations usually require complete comparable values.
Common approaches include removing rows, removing variables, imputing missing values or using distance methods that can handle missingness. The right choice depends on why the values are missing and how much information would be lost.
Missing values should not be treated casually. In some datasets, missingness itself may be meaningful. For example, missing customer information may signal a different onboarding path or a data collection problem.
Hierarchical clustering and outliers
Hierarchical clustering can be sensitive to outliers. An outlier may form its own branch in the dendrogram or distort distances between clusters.
This is not always bad. Sometimes the outlier is exactly what the analyst wants to detect. But if the outlier is a data error, it may make the hierarchy misleading.
Before clustering, analysts should inspect extreme values, impossible values and data quality anomalies. The decision should be documented: remove, transform, cap, keep or investigate.
Hierarchical clustering and model explainability
Model explainability matters because clustering results need interpretation.
A dendrogram can show that observations are grouped, but it does not automatically explain why those groups make sense. The analyst must inspect cluster features, representative examples and differences between groups.
For example, customer cluster 1 may be high-value repeat buyers. Cluster 2 may be inactive discount-sensitive buyers. Cluster 3 may be new users with high browsing activity but low purchase intent. These labels come from interpretation, not from the clustering algorithm itself.
Hierarchical clustering and data leakage
Data leakage can affect clustering workflows when clustering is used as part of a machine learning pipeline.
For example, if clustering is used to create features for a supervised model, it should be fitted only on training data inside the proper validation workflow. If clustering is fitted on the full dataset before splitting, information from the test data may influence the derived cluster features.
Data leakage is less obvious in unsupervised steps because there may be no target variable. But unsupervised preprocessing can still contaminate evaluation if it learns from data that would not be available in real use.
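A leakage-aware workflow can be sketched as follows, assuming NumPy and SciPy are available. Hierarchical clustering has no built-in way to label new rows, so assigning test rows to the nearest training centroid is an assumption of this sketch, not a standard part of the method. The data is made up.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X_train = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.6],
                    [8.0, 8.0], [8.4, 7.9], [7.8, 8.5]])
X_test = np.array([[0.3, 0.3], [8.1, 8.2]])

# Build the hierarchy from training rows only.
Z = linkage(X_train, method="ward")
train_labels = fcluster(Z, t=2, criterion="maxclust")

# Summarise each training cluster by its centroid.
cluster_ids = np.unique(train_labels)
centroids = np.array([X_train[train_labels == c].mean(axis=0)
                      for c in cluster_ids])

# Label test rows by nearest training centroid; test data never
# influences the tree or the centroids.
dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
test_labels = cluster_ids[np.argmin(dists, axis=1)]
```

The cluster feature for the test rows is derived purely from structure learned on the training rows, which mirrors what would be available when the model is used on new data.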
Hierarchical clustering and overfitting
Overfitting is usually discussed in supervised learning, but clustering can also be overinterpreted.
An analyst may find a beautiful dendrogram in one dataset and assume the groups are real, stable and useful. But if the same clustering structure disappears with slightly different data, different features or another distance metric, the interpretation may be fragile.
This is why hierarchical clustering should be checked for stability. A cluster should not be trusted only because it appears once in one dendrogram.
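A simple stability check can be sketched like this, assuming NumPy and SciPy are available. The leave-one-out scheme, the Rand-style agreement score and the helper names `rand_index` and `cut` are illustrative choices. Resampling or perturbing features are equally reasonable alternatives.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def rand_index(a, b):
    """Share of point pairs on which two labelings agree (together vs apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def cut(X, k, method="average"):
    """Cluster X hierarchically and return at most k flat clusters."""
    return fcluster(linkage(X, method=method), t=k, criterion="maxclust")

# Made-up data: two well-separated groups.
X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
              [6.0, 6.0], [6.2, 5.9], [5.9, 6.3]])
full = cut(X, k=2)

# Recluster without each point in turn and compare the labels of the
# remaining points with the full-data clustering.
scores = []
for drop in range(len(X)):
    keep = [i for i in range(len(X)) if i != drop]
    scores.append(rand_index(full[keep], cut(X[keep], k=2)))
stability = min(scores)
```

A score of 1.0 means the grouping of the remaining points never changed. On this clean toy data that is the case. On real data, scores well below 1 suggest the clusters depend on a few individual observations.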
Hierarchical clustering and AI risk management
AI risk management can use clustering to inspect system behaviour, detect unusual patterns or organise incidents.
For example, AI outputs can be clustered to find repeated failure types. User prompts can be clustered to identify common misuse patterns. Retrieval failures in a RAG system can be clustered to discover source or chunking problems.
But clustering itself can also create risk if it is used for high-impact decisions without explanation or validation. If people are segmented into groups that affect offers, prices, services or treatment, governance and fairness checks become important.
Hierarchical clustering and RAG systems
RAG systems often use document embeddings and similarity search. Hierarchical clustering can help analyse the structure of the document collection.
For example, a company knowledge base may contain product manuals, support tickets, legal documents and technical notes. Clustering can show whether these document types form clear groups or whether some content is mixed together in a way that may confuse retrieval.
Hierarchical clustering can also help review chunk groups, identify duplicated content and inspect topical coverage. But it does not replace retrieval evaluation with real user questions.
Hierarchical clustering and chunking
Chunking means splitting longer documents into smaller parts for retrieval or model context.
Hierarchical clustering can be used to inspect chunks after they are embedded. If chunks from the same topic are scattered across unrelated groups, the chunking or embedding setup may need review. If many near-duplicate chunks form tight clusters, the knowledge base may contain redundant content.
This can help improve RAG quality, content organisation and internal search.
Advantages of hierarchical clustering
Hierarchical clustering has several practical advantages.
- It creates a hierarchy – analysts can inspect groups at several levels.
- It can be visualised – dendrograms make relationships easier to explore.
- It does not always require a cluster count upfront – the tree can be cut later.
- It is useful for exploratory analysis – especially on small and medium datasets.
- It can work with different distance metrics – depending on data type and goal.
- It can reveal nested structure – broad groups and subgroups appear together.
- It supports interpretation – the merge structure can help explain group relationships.
Limitations of hierarchical clustering
Hierarchical clustering also has important limitations.
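- It can be computationally expensive – standard agglomerative algorithms work from all pairwise distances, so memory use grows roughly with the square of the number of observations.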
- It is sensitive to distance choices – different metrics can produce different trees.
- It is sensitive to linkage choices – single, complete, average and Ward linkage behave differently.
- It can be affected by scaling – variables with larger ranges may dominate.
- It can be affected by outliers – extreme values may distort the hierarchy.
- Merges are usually irreversible – once clusters are merged, the algorithm does not normally split them later.
- Dendrograms can become unreadable – large datasets can produce very complex trees.
- Clusters still need interpretation – the algorithm does not know business meaning.
Hierarchical clustering can look very convincing visually. But a dendrogram is only as reliable as the features, distance metric, linkage method and data quality behind it.
When hierarchical clustering is useful
Hierarchical clustering is useful when you want to explore structure and relationships between groups.
It is especially useful for:
- small and medium datasets – where dendrograms remain readable,
- customer segmentation – broad segments and detailed subsegments,
- document organisation – topics, subtopics and content families,
- SEO keyword grouping – keyword clusters and search intent groups,
- product taxonomy – products, variants and categories,
- bioinformatics – gene expression patterns and sample similarity,
- exploratory data analysis – understanding possible group structure,
- embedding analysis – inspecting relationships in vector representations,
- anomaly investigation – identifying points that do not merge naturally with others.
When hierarchical clustering may be a poor choice
Hierarchical clustering may be a poor choice when the dataset is very large, the dendrogram becomes unreadable or the main goal is fast production clustering at scale.
It may also be unsuitable when the distance metric is unclear, features are weak or the data is extremely noisy. In those cases, the hierarchy may reflect preprocessing artefacts rather than meaningful structure.
For large production systems, other clustering methods or approximate methods may be more practical. Hierarchical clustering is often strongest as an exploratory and interpretive tool.
Common mistakes with hierarchical clustering
Hierarchical clustering is easy to run, but easy to misread.
- Ignoring scaling – large-range variables dominate the distance.
- Using the wrong distance metric – similarity is defined poorly for the data type.
- Not reporting linkage – the dendrogram cannot be interpreted properly.
- Cutting the tree mechanically – the number of clusters is chosen without context.
- Overinterpreting small branches – not every branch is meaningful.
- Ignoring outliers – extreme values distort cluster structure.
- Using irrelevant features – clusters reflect noise or arbitrary variables.
- Assuming clusters are natural truths – clustering creates a structure, but that structure must be validated.
- Using clustering results in supervised models without proper validation – this can create leakage if the workflow is wrong.
- Forgetting the business question – a mathematically clean cluster may not be useful.
How to validate hierarchical clustering
Validation is difficult because clustering is unsupervised. There is often no correct answer label.
Useful validation approaches include:
- stability checks – do similar clusters appear if the data changes slightly?
- feature interpretation – do clusters differ in meaningful variables?
- business usefulness – can the clusters support real decisions?
- comparison of linkage methods – do conclusions change strongly?
- comparison of distance metrics – are results robust to reasonable alternatives?
- expert review – do domain specialists recognise the groups?
- external outcomes – do clusters differ in conversion, churn, risk, value or another relevant metric?
A clustering result should be treated as a hypothesis about structure. It becomes useful when it is stable, interpretable and connected to a real task.
Hierarchical clustering in everyday language
You can think of hierarchical clustering as organising a messy desk into piles, then grouping those piles into larger categories.
At first, you may put very similar documents together. Then you may group those small piles into broader themes. Then you may group those themes into a few large folders.
The final tree shows both the small groups and the large groups. That is the main value of hierarchical clustering: it does not force one level of organisation.
How to remember hierarchical clustering
Hierarchical clustering can be remembered as clustering that builds a family tree of data points.
Close observations become close relatives. Small clusters become larger families. The dendrogram shows how all groups are connected.
The method is useful for exploration, but it must be interpreted carefully. The tree depends on the features, scaling, distance metric, linkage method and data quality.
Hierarchical clustering = grouping data into a tree. It shows nested relationships between observations, from small similar groups to broader clusters.
Related terms
- Clustering – the broader task of grouping similar observations without predefined labels.
- Dendrogram – a tree-like diagram that shows how observations and clusters merge in hierarchical clustering.
- Agglomerative clustering – a bottom-up form of hierarchical clustering where each observation starts as its own cluster and clusters are merged step by step.
- Divisive clustering – a top-down form of hierarchical clustering where one large cluster is split into smaller clusters.
- Linkage – the rule used to calculate distance between clusters.
- Single linkage – linkage based on the closest points between two clusters.
- Complete linkage – linkage based on the farthest points between two clusters.
- Average linkage – linkage based on the average distance over all pairs of points, one from each cluster.
- Ward linkage – linkage that tries to minimise the increase in within-cluster variance.
- Distance metric – the rule used to measure similarity or dissimilarity between observations.
- Machine learning – the broader field in which systems learn patterns from data.
- Embedding – a numerical representation of content. Hierarchical clustering can group embeddings into topic or similarity structures.
- PCA – principal component analysis – a dimensionality reduction method often used before clustering high-dimensional numerical data.
- Dimensionality reduction – reducing data into fewer dimensions. It can help visualise or prepare data for clustering.
- Feature selection – choosing useful variables. Clustering quality depends heavily on which features are used.
- Anomaly – an unusual observation or pattern that may stand apart from the main cluster structure.
- Model explainability – the ability to understand why a model or method produced a certain result.
- Data leakage – a situation where information unavailable in real use influences training or evaluation.
- Overfitting – a model or analysis fitting one dataset too closely and failing to generalise.
- RAG – retrieval-augmented generation. Hierarchical clustering can help analyse document groups in RAG knowledge bases.
- Chunking – splitting longer content into smaller parts. Cluster analysis can help inspect chunk groups and redundancy.
- AI risk management – identifying, measuring, controlling and monitoring risks created by AI systems.
Sources and further reading
- AgglomerativeClustering – scikit-learn.org – June 2026 – technical documentation describing agglomerative clustering and linkage criteria such as ward, complete, average and single linkage.
- Clustering – scikit-learn.org – June 2026 – scikit-learn documentation explaining that AgglomerativeClustering performs hierarchical clustering with a bottom-up approach where observations start as individual clusters and are successively merged.
- Plot Hierarchical Clustering Dendrogram – scikit-learn.org – June 2026 – example showing how to plot a dendrogram for hierarchical clustering using AgglomerativeClustering and SciPy.
- linkage – docs.scipy.org – June 2026 – SciPy documentation for performing hierarchical agglomerative clustering from observation vectors or a condensed distance matrix.
- dendrogram – docs.scipy.org – June 2026 – SciPy documentation explaining how dendrograms visualise hierarchical clustering and how cluster merges are represented.
- What is Hierarchical Clustering? – ibm.com – June 2026 – explains hierarchical clustering as an unsupervised algorithm that groups data into a tree of nested clusters, usually visualised with a dendrogram.
- Hierarchical Clustering – cs.princeton.edu – June 2026 – Princeton lecture notes explaining hierarchical agglomerative clustering, dendrograms and how hierarchical methods support exploration compared with flat approaches.
- Hierarchical Clustering: Objective Functions and Algorithms – arxiv.org – June 2026 – research paper discussing hierarchical clustering as recursive partitioning of a dataset into clusters at increasingly fine granularity.
- Ward’s Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm – arxiv.org – June 2026 – paper reviewing Ward’s hierarchical clustering method and its variance-related clustering criterion.