Analyzing the internal functions of hierarchical.py
In AgglomerativeClustering, the hierarchy is built by the following call:
self.children_, self.n_components_, self.n_leaves_, parents = \
    memory.cache(tree_builder)(X, connectivity, n_clusters=n_clusters, **kwargs)
Here memory.cache wraps tree_builder so that its result can be cached, and X and connectivity are passed to tree_builder (line 739). tree_builder is chosen according to self.linkage (line 712):
tree_builder = _TREE_BUILDERS[self.linkage]
Here linkage='ward' by default, so tree_builder is ward_tree (line 86).
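The lookup by linkage name is an ordinary dictionary dispatch. The sketch below illustrates the pattern only: TREE_BUILDERS, build_tree, and the two stub builders are hypothetical stand-ins, not the library's code.

```python
# Hypothetical stand-ins for the real tree builders; each just returns its name.
def ward_tree(X, connectivity=None, n_clusters=None):
    return "ward"


def complete_tree(X, connectivity=None, n_clusters=None):
    return "complete"


# Hypothetical dispatch table, in the spirit of _TREE_BUILDERS.
TREE_BUILDERS = {"ward": ward_tree, "complete": complete_tree}


def build_tree(X, linkage="ward", **kwargs):
    """Pick a builder by linkage name and call it on X."""
    try:
        tree_builder = TREE_BUILDERS[linkage]
    except KeyError:
        raise ValueError(f"unknown linkage: {linkage!r}")
    return tree_builder(X, **kwargs)


print(build_tree([[0.0], [1.0]]))  # ward
```

Keeping the builders in a table means that supporting another linkage only needs a new entry, not another branch.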
The connectivity parameter is an $n\times n$ matrix that restricts the clustering: samples with no connection between them cannot be merged together.
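A small sketch of what this restriction means for the first round of merges. The helper allowed_pairs and the chain matrix are hypothetical illustrations, assuming a dense 0/1 matrix in place of the sparse matrix the library works with.

```python
def allowed_pairs(connectivity):
    """Return the sample pairs (i, j), i < j, that are allowed to merge."""
    n = len(connectivity)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if connectivity[i][j]
    ]


# Chain structure: each sample is connected only to its direct neighbours.
chain = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

print(allowed_pairs(chain))  # [(0, 1), (1, 2), (2, 3)]
```

Samples 0 and 3 are not connected, so they can only end up in the same cluster through intermediate merges.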
# normalize the connectivity matrix so that it is not empty; all entries are True by default
connectivity, n_components = _fix_connectivity(X, connectivity, affinity='euclidean')
Then the number of nodes is computed:
if n_clusters is None:
    n_nodes = 2 * n_samples - 1  # a full binary tree has 2*n - 1 nodes in total
else:
    n_nodes = 2 * n_samples - n_clusters  # stop once there are enough clusters; without stopping, merging ends at 1 cluster
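The node count can be checked with a small worked example. The helper count_nodes is a hypothetical name that wraps the same two formulas; the number of merges is the number of nodes minus the number of leaves.

```python
def count_nodes(n_samples, n_clusters=None):
    """Total number of tree nodes (leaves plus merge nodes)."""
    if n_clusters is None:
        return 2 * n_samples - 1
    return 2 * n_samples - n_clusters


n_samples = 5
full = count_nodes(n_samples)        # 9 nodes: 5 leaves + 4 merges
early = count_nodes(n_samples, 2)    # 8 nodes: 5 leaves + 3 merges
print(full - n_samples, early - n_samples)  # 4 3
```

With 5 samples, 3 merges leave 2 clusters, and a 4th merge would join them into the single root.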
Next, a heap is built from inertia, and entries are popped one by one to build the cluster tree:
heapify(inertia)
In the main loop (starting at line 239):
for k in range(n_samples, n_nodes):
    # identify the merge
    while True:
        inert, i, j = heappop(inertia)
        if used_node[i] and used_node[j]:
            break
    parent[i], parent[j] = k, k
    children.append((i, j))  # merge i and j, stored in children
After the merge, the new node is pushed onto the heap (line 274):
[heappush(inertia, (ini[idx], k, coord_col[idx])) for idx in range(n_additions)]
This completes one iteration.
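The whole loop can be sketched in pure Python on 1-D points. This is a simplified illustration, not the library's implementation: toy_ward_tree and ward_cost are hypothetical names, every pair of samples is assumed to be connected, and the merge cost is the plain Ward increase in within-cluster variance, which may be scaled differently from the library's inertia values.

```python
from heapq import heapify, heappop, heappush


def ward_cost(size_a, mean_a, size_b, mean_b):
    """Increase in within-cluster sum of squares when merging two clusters."""
    return size_a * size_b / (size_a + size_b) * (mean_a - mean_b) ** 2


def toy_ward_tree(points, n_clusters=1):
    n_samples = len(points)
    n_nodes = 2 * n_samples - n_clusters
    size = [1] * n_samples + [0] * (n_nodes - n_samples)
    mean = [float(p) for p in points] + [0.0] * (n_nodes - n_samples)
    used_node = [True] * n_nodes  # True means "still available for merging"
    parent = list(range(n_nodes))
    children = []

    # Initial heap: one entry per pair of samples (full connectivity assumed).
    inertia = [
        (ward_cost(1, mean[i], 1, mean[j]), i, j)
        for i in range(n_samples)
        for j in range(i + 1, n_samples)
    ]
    heapify(inertia)

    for k in range(n_samples, n_nodes):
        # Pop until both nodes are still available; stale entries are skipped.
        while True:
            inert, i, j = heappop(inertia)
            if used_node[i] and used_node[j]:
                break
        parent[i], parent[j] = k, k
        children.append((i, j))
        used_node[i] = used_node[j] = False

        # Describe the new node k, then push its cost to every available node.
        size[k] = size[i] + size[j]
        mean[k] = (size[i] * mean[i] + size[j] * mean[j]) / size[k]
        for other in range(k):
            if used_node[other]:
                cost = ward_cost(size[k], mean[k], size[other], mean[other])
                heappush(inertia, (cost, k, other))
    return children, parent


children, parent = toy_ward_tree([0.0, 1.0, 5.0, 6.0])
print(children)  # [(0, 1), (2, 3), (5, 4)]
```

Entries that refer to already merged nodes are never removed from the heap; they are simply discarded when popped, which is what the used_node check does.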