cepy package

Submodules

cepy.ce module

class cepy.ce.CE(dimensions: int = 30, walk_length: int = 20, num_walks: int = 800, permutations: int = 100, p: float = 1, q: float = 1, weight_key: str = 'weight', workers: int = 1, sampling_strategy: Optional[dict] = None, verbosity: int = 1, temp_folder: Optional[str] = None, seed: Optional[int] = None, window: int = 3, min_count: int = 0, iter: int = 1, save_walks: bool = False, word2vec_kws: dict = {}, pregenerated_walks: Optional[list] = None)

Bases: object

The main Cepy class for building and fitting the connectome embedding model.

Parameters
  • dimensions (int, optional) – Number of embedding dimensions.

  • walk_length (int, optional) – Number of nodes in each walk.

  • num_walks (int, optional) – Number of walks initiated from each node.

  • permutations (int, optional) – Number of independent fitting iterations.

  • p (float, optional) – Return hyperparameter (see 1).

  • q (float, optional) – In-out parameter (see 1).

  • weight_key (str, optional) – On weighted graphs, this is the key for the weight attribute.

  • workers (int, optional) – Number of workers for parallel execution.

  • sampling_strategy (dict, optional) – Node specific sampling strategies, supports setting node specific ‘q’, ‘p’, ‘num_walks’ and ‘walk_length’. Set to None for homogeneous sampling.

  • verbosity (int, optional) – Verbosity level from 2 (high) to 0 (low).

  • seed (int, optional) – Seed for the random number generator. Deterministic results can be obtained if seed is set and workers=1.

  • window (int, optional) – The maximum number of steps between the current and predicted node within a sequence.

  • min_count (int, optional) – Ignores all nodes with total frequency lower than this.

  • iter (int, optional) – Number of iterations (epochs) over all random walk samples.

  • save_walks (bool, optional) – Whether to save the sampled random walks. Setting this to True increases memory consumption.

  • word2vec_kws (dict, optional) – Additional parameters for gensim.models.Word2Vec. Note that window, min_count and iter must be passed as separate parameters; if given here they are ignored.

  • temp_folder (str, optional) – Path to a folder with enough space to hold the memory map of self.d_graph (for big graphs); passed to joblib.Parallel as temp_folder.

  • pregenerated_walks (list, optional) – List of lists of node names: the walks used to train the word2vec model.
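
To make the window parameter concrete, the sketch below lists the (current, predicted) node pairs that a fixed window yields from a single walk. The helper name skipgram_pairs is hypothetical and not part of cepy, and the actual training procedure inside gensim may differ in detail; this only illustrates the maximum distance between paired nodes.

```python
def skipgram_pairs(walk, window=3):
    """Return (current, predicted) pairs at most `window` steps apart.

    Hypothetical helper for illustration; not part of cepy or gensim.
    """
    pairs = []
    for i, current in enumerate(walk):
        start = max(0, i - window)
        stop = min(len(walk), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a node is never paired with its own position
                pairs.append((current, walk[j]))
    return pairs


example_pairs = skipgram_pairs([0, 1, 2, 3], window=1)
```

With window=1 each node is paired only with its immediate neighbours in the walk; larger windows add pairs between nodes that are further apart in the sequence.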

References

1

Grover, A., & Leskovec, J. (2016, August). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 855-864).
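
The role of p and q can be sketched as follows, following the second-order random walk described in the node2vec paper cited above. The function name biased_transition_probs is hypothetical and this is not cepy's own implementation; it only shows how the two hyperparameters reweight the edges leaving the current node.

```python
import numpy as np


def biased_transition_probs(adj, prev, curr, p=1.0, q=1.0):
    """Transition probabilities out of `curr` for a walk that arrived from `prev`.

    Hypothetical helper following the node2vec search bias:
    returning to `prev` is scaled by 1/p, moving to a node that is not
    connected to `prev` is scaled by 1/q, other moves keep the edge weight.
    Assumes `curr` has at least one neighbour.
    """
    adj = np.asarray(adj, dtype=float)
    weights = adj[curr].copy()
    for nxt in np.flatnonzero(weights):
        if nxt == prev:
            weights[nxt] /= p  # step back to the previous node
        elif adj[prev, nxt] == 0:
            weights[nxt] /= q  # step away from the previous node
    return weights / weights.sum()


adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 1, 0, 0]])
probs = biased_transition_probs(adj, prev=0, curr=1, p=2.0, q=0.5)
```

In this toy graph, p=2 makes returning to node 0 less likely, while q=0.5 favours node 3, which is not connected to node 0.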

Examples

>>> #Learn embeddings for a given connectome:
>>> import numpy as np
>>> import cepy as ce
>>> sc_group = ce.get_example('sc_group_matrix')
>>> ce_group = ce.CE(permutations=1, seed=1)  # initiate the connectome embedding model
>>> ce_group.fit(sc_group)  # fit the model
Start training  1  word2vec models on  1 threads.
>>> '%.2f' % ce_group.similarity()[0, 1]  # Extract the cosine similarity between node 0 and 1
'0.62'
>>> ce_group.save_model('group_ce_copy.json')  # save a model:
>>> ce_loaded_copy = ce.load_model('group_ce_copy.json')  # load it
>>> # Extract the same cosine similarity again; this should be identical apart from minor numerical differences
>>> '%.2f' % ce_loaded_copy.similarity()[0, 1]
'0.62'
class Weights

Bases: object

Stores the trained weights (W and W’ matrices) of all fitting permutations.

Extract the weights with get_w_permut(index, norm) and get_w_mean(norm), or with get_w_apos_permut(index, norm) and get_w_apos_mean(norm). If norm is set to True, L2 normalization is applied to each vector before extraction.

get_w_apos_mean(norm=True)
get_w_apos_permut(index=0, norm=True)
get_w_mean(norm=True)
get_w_permut(index=0, norm=True)
fit(X: array)

Sample random walks and fit a word2vec model.

Parameters

X (ndarray) – Input adjacency matrix, shape: (n_nodes, n_nodes)

Returns

walks – List of lists of nodes

Return type

list, optional
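
As a rough illustration of what sampling walks from an adjacency matrix involves, the sketch below draws first-order weighted random walks, which corresponds to p = q = 1. The function sample_walks is hypothetical and much simpler than cepy's parallel walk generator; its parameter names merely mirror walk_length, num_walks and seed from the constructor.

```python
import numpy as np


def sample_walks(adj, walk_length=5, num_walks=2, seed=0):
    """Sample `num_walks` weighted random walks starting at every node.

    Hypothetical helper: each step picks a neighbour with probability
    proportional to the edge weight (no p/q bias).
    """
    adj = np.asarray(adj, dtype=float)
    rng = np.random.default_rng(seed)
    n_nodes = adj.shape[0]
    walks = []
    for _ in range(num_walks):
        for start in range(n_nodes):
            walk = [start]
            while len(walk) < walk_length:
                weights = adj[walk[-1]]
                total = weights.sum()
                if total == 0:  # isolated node: the walk cannot continue
                    break
                walk.append(int(rng.choice(n_nodes, p=weights / total)))
            walks.append(walk)
    return walks


ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
walks = sample_walks(ring, walk_length=5, num_walks=2, seed=0)
```

The result is a list of lists of nodes, the same general shape as the walks that fit can return.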

pickle_model(path, compress=False)

Save a model to a pickle object.

Parameters
  • path (str) – Path to the file.

  • compress (bool) – Whether to compress the file with gzip

Examples

>>> #Load a model and save to file:
>>> import cepy as ce
>>> data_path = ce.get_examples_path()
>>> ce_subject1 = ce.load_model(data_path + '/ce_subject1.json.gz')
>>> ce_subject1.pickle_model('saved_model.pkl')
save_model(path, compress=False)

Save a model to a JSON file.

Parameters
  • path (str) – Path to the file.

  • compress (bool) – Whether to compress the file with gzip

Examples

>>> #Load a model and save to file:
>>> import cepy as ce
>>> data_path = ce.get_examples_path()
>>> ce_subject1 = ce.load_model(data_path + '/ce_subject1.json')
>>> ce_subject1.save_model('saved_model.json')
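
The effect of the compress flag can be sketched with the standard library alone. This assumes the model is serialized as JSON, as the .json and .json.gz file names in the examples suggest; save_dict and load_dict are hypothetical helpers, not cepy functions, and they operate on a plain dictionary rather than a CE object.

```python
import gzip
import json
import os
import tempfile


def save_dict(obj, path, compress=False):
    """Write `obj` as JSON, optionally gzip-compressed (hypothetical helper)."""
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as handle:
        json.dump(obj, handle)


def load_dict(path):
    """Read JSON written by save_dict, detecting gzip by its magic bytes."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


model = {"dimensions": 30, "weights": [[0.1, 0.2], [0.3, 0.4]]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json.gz")
    save_dict(model, path, compress=True)
    restored = load_dict(path)
```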
similarity(*args, **kwargs)
cepy.ce.get_example(name)

Returns an existing example file. Can be used for testing and experimenting.

Parameters

name (str) – File name (without the extension).

Returns

path – path to the file

Return type

str

Examples

>>> #Load an existing connectome embedding model:
>>> import cepy as ce
>>> ce_subject1 = ce.get_example('ce_subject1')
>>> w = ce_subject1.weights.get_w_mean()
>>> w.shape
(200, 30)
cepy.ce.get_examples_path()

Returns the file examples path.

cepy.ce.load_model(path)

Returns a model loaded from a file written by save_model.

Parameters

path (str) – Path to the file.

Returns

x

Return type

CE

Examples

>>> # Save then load a model
>>> import cepy as ce
>>> ce_subject1 = ce.get_example('ce_subject1')
>>> sim = ce_subject1.similarity()
>>> '%.2f' % sim[2,5]
'0.16'
>>> ce_subject1.save_model('ce_subject1_copy.json')
>>> ce_subject1_copy = ce.load_model('ce_subject1_copy.json')
>>> sim = ce_subject1_copy.similarity()
>>> '%.2f' % sim[2,5]
'0.16'
cepy.ce.model_from_dict(m_dict)
cepy.ce.similarity(X, Y=None, permut_indices=None, method='cosine_similarity', norm=None)

Derive several similarity measures among nodes within the same connectome embedding or among different embeddings.

Parameters
  • X (CE) – The first connectome embedding class on which we perform the similarity measurement

  • Y (CE, optional) – The second connectome embedding class on which we perform the similarity measurement. If None, then Y = X.

  • permut_indices (tuple or list of tuple, optional) – Index pairs of permutations (independent fitting iterations) of the first and second CEs. Similarity is computed for X[index1] and Y[index2]. For a list of tuples, similarity is computed for every pair in the list. If None, all possible pairs are tested.

  • method (str, optional) – The similarity measure, one of ‘cosine_similarity’ | ‘hadamard’ | ‘l1’ | ‘l2’.

  • norm (str, optional) – Which norm should be applied before the similarity measure, one of ‘l1’ | ‘l2’ | ‘max’. If None, no normalization is applied. This has no effect on cosine similarity.

Returns

x

Return type

{(num_nodes, num_nodes), (num_nodes, num_nodes, num_embedding_dim)} ndarray or list of ndarray

Examples

>>> #Load, align and measure the similarity between two connectome embeddings:
>>> import numpy as np
>>> import cepy as ce
>>> ce_subject1 = ce.get_example('ce_subject1')
>>> sim = ce.similarity(ce_subject1, ce_subject1, method='cosine_similarity')
>>> '%.2f' % sim[3,2]
'0.83'
>>> sim = ce_subject1.similarity(ce_subject1, method='cosine_similarity') # equivalent
>>> '%.2f' % sim[3,2]
'0.83'
>>> ce_subject2 = ce.get_example('ce_subject2')
>>> ce_group = ce.get_example('ce_group')
>>> # align both subjects to the group consensus space
>>> ce_subject1_aligned = ce.align(ce_group, ce_subject1)
>>> ce_subject2_aligned = ce.align(ce_group, ce_subject2)
>>> # and measure the similarity among all corresponding nodes across subjects
>>> sim = ce.similarity(ce_subject1_aligned, ce_subject2_aligned, method='cosine_similarity')
>>> diagonal_indices = np.diag_indices(sim.shape[0])
>>> '%.2f' % sim[diagonal_indices].mean()
'0.79'
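
The two output shapes listed under Return type can be illustrated with plain NumPy. The sketch below assumes that ‘hadamard’, ‘l1’ and ‘l2’ follow the element-wise operators of the node2vec paper (product, absolute difference, squared difference); that mapping and the function name pairwise_measure are assumptions, not a description of cepy's internals.

```python
import numpy as np


def pairwise_measure(wx, wy, method="cosine_similarity"):
    """Pairwise node measures between two (n_nodes, n_dims) embeddings.

    Hypothetical helper. Cosine similarity returns (n_nodes, n_nodes);
    the element-wise measures return (n_nodes, n_nodes, n_dims).
    """
    wx = np.asarray(wx, dtype=float)
    wy = np.asarray(wy, dtype=float)
    if method == "cosine_similarity":
        wx_unit = wx / np.linalg.norm(wx, axis=1, keepdims=True)
        wy_unit = wy / np.linalg.norm(wy, axis=1, keepdims=True)
        return wx_unit @ wy_unit.T
    if method == "hadamard":
        return wx[:, None, :] * wy[None, :, :]
    diff = wx[:, None, :] - wy[None, :, :]
    if method == "l1":
        return np.abs(diff)
    if method == "l2":
        return diff ** 2
    raise ValueError(f"unknown method: {method}")


w = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
cos = pairwise_measure(w, w)
had = pairwise_measure(w, w, method="hadamard")
```

Because cosine similarity divides each vector by its own length, rescaling the rows beforehand leaves it unchanged, which is why norm has no effect for that method.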

cepy.embed_align module

cepy.embed_align.align(base_ce, target_ce, base_index=0, target_indices='all')

Align connectome embeddings originating from independent fitting iterations.

Parameters
  • base_ce (CE) – Contains the latent space to which all target connectome embeddings are aligned

  • target_ce (CE) – The connectome embeddings to be aligned

  • base_index (int, optional) – The index of the connectome embedding iteration within base_ce

  • target_indices (str or list) – Indices of the connectome embeddings within target_ce to be aligned. If set to ‘all’, all available fitting iterations are aligned.

Examples

>>> #Load, align and measure the similarity between two connectome embeddings:
>>> import numpy as np
>>> import cepy as ce
>>> ce_subject1 = ce.get_example('ce_subject1')
>>> ce_subject2 = ce.get_example('ce_subject2')
>>> sim = ce.similarity(ce_subject1, ce_subject2, method='cosine_similarity')
>>> diagonal_indices = np.diag_indices(sim.shape[0])
>>> '%.2f' % sim[diagonal_indices].mean()  # measure the similarity among all corresponding nodes across subjects
'0.57'
>>> # now we repeat the process but first align the two:
>>> ce_group = ce.get_example('ce_group')
>>> ce_subject1_aligned = ce.align(ce_group, ce_subject1)
>>> ce_subject2_aligned = ce.align(ce_group, ce_subject2)
>>> sim = ce.similarity(ce_subject1_aligned,ce_subject2_aligned,method='cosine_similarity')
>>> '%.2f' % sim[diagonal_indices].mean()
'0.79'
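
One common way to bring two independently fitted embedding spaces into register is an orthogonal Procrustes rotation. The sketch below shows that technique on synthetic data; whether align uses exactly this transformation is an assumption, and procrustes_align is a hypothetical name.

```python
import numpy as np


def procrustes_align(base, target):
    """Rotate `target` so that it is as close as possible to `base`.

    Hypothetical helper: solves min_R ||target @ R - base||_F over
    orthogonal matrices R via the SVD of target.T @ base.
    """
    u, _, vt = np.linalg.svd(target.T @ base)
    rotation = u @ vt
    return target @ rotation


rng = np.random.default_rng(0)
base = rng.normal(size=(10, 3))                     # reference embedding
random_rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
target = base @ random_rotation                     # same geometry, rotated axes
aligned = procrustes_align(base, target)
```

Here target has the same pairwise geometry as base but arbitrary axes, so node-wise comparisons between the two are meaningless until the rotation is undone. This mirrors the jump in diagonal similarity seen in the example above after alignment.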

cepy.parallel module

cepy.parallel.get_hash(astring)

Returns consistent values to the word2vec model to ensure reproducibility.

Replaces Python’s built-in hashing function, which is not consistent across interpreter runs (note that this is not a real hashing function, but it suffices for the current use).
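
The sketch below shows one deterministic replacement for the built-in hash of a string, using zlib.crc32 from the standard library. It is not the function cepy ships (the body of get_hash is not shown here), and stable_hash is a hypothetical name; the point is only that the returned integer does not depend on the interpreter's hash randomization.

```python
import zlib


def stable_hash(astring):
    """Return the same integer for the same string in every interpreter run.

    Hypothetical stand-in for a reproducible hash function.
    """
    return zlib.crc32(astring.encode("utf-8"))


node_hashes = {name: stable_hash(name) for name in ["0", "1", "2"]}
```

A function of this kind could be passed to gensim's Word2Vec through its hashfxn argument so that the initial vectors are seeded identically on every run.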

cepy.parallel.parallel_generate_walks(d_graph: dict, global_walk_length: int, num_walks: int, cpu_num: int, sampling_strategy: Optional[dict] = None, num_walks_key: Optional[str] = None, walk_length_key: Optional[str] = None, neighbors_key: Optional[str] = None, probabilities_key: Optional[str] = None, first_travel_key: Optional[str] = None, seed: Optional[int] = None, verbosity: int = 1) → list

Generates the random walks which will be used as the skip-gram input.

Returns

List of walks. Each walk is a list of nodes.

cepy.parallel.parallel_learn_embeddings(walks_file, word2vec_kws, nonzero_indices, num_nodes, cpu_num, verbosity)

Fits the node2vec model on the sampled walks and returns the learned parameters.

Returns

A dictionary with the w and w’ parameters and the final training loss.

cepy.utils module

cepy.utils.check_adjacency_matrix(X)
cepy.utils.normalize(X, norm='l2', axis=1)

Scale input vectors individually to unit norm (vector length).

Parameters
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data to normalize, element by element. scipy.sparse matrices should be in CSR format to avoid an unnecessary copy.

  • norm ({'l1', 'l2', 'max'}, default='l2') – The norm to use to normalize each non-zero sample (or each non-zero feature if axis is 0).

  • axis ({0, 1}, default=1) – Axis along which to normalize the data. If 1, independently normalize each sample; otherwise (if 0) normalize each feature.

Returns

X – Normalized input X.

Return type

{ndarray, sparse matrix} of shape (n_samples, n_features)
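
For dense input, the behaviour described above can be sketched in a few lines of NumPy. The function normalize_dense is a hypothetical stand-in that ignores sparse matrices; it only illustrates how the three norms and the axis argument interact.

```python
import numpy as np


def normalize_dense(X, norm="l2", axis=1):
    """Scale rows (axis=1) or columns (axis=0) of a dense array to unit norm.

    Hypothetical helper; all-zero vectors are left unchanged.
    """
    X = np.asarray(X, dtype=float)
    if norm == "l1":
        norms = np.abs(X).sum(axis=axis, keepdims=True)
    elif norm == "l2":
        norms = np.sqrt((X ** 2).sum(axis=axis, keepdims=True))
    elif norm == "max":
        norms = np.abs(X).max(axis=axis, keepdims=True)
    else:
        raise ValueError(f"unsupported norm: {norm}")
    norms[norms == 0] = 1.0  # avoid dividing zero vectors by zero
    return X / norms


data = np.array([[3.0, 4.0], [0.0, 0.0]])
rows_l2 = normalize_dense(data, norm="l2")
```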

cepy.utils.row_norms(X, squared=False)

Row-wise (squared) Euclidean norm of X.

Equivalent to np.sqrt((X * X).sum(axis=1)), but also supports sparse matrices and does not create an X.shape-sized temporary. Performs no input validation.

Parameters
  • X (array-like) – The input array.

  • squared (bool, default=False) – If True, return squared norms.

Returns

The row-wise (squared) Euclidean norm of X.

Return type

array-like
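
A dense-only sketch of the same computation is given below. It uses np.einsum to sum the squared entries of each row without materializing X * X; row_norms_dense is a hypothetical name and sparse input is not handled.

```python
import numpy as np


def row_norms_dense(X, squared=False):
    """Row-wise Euclidean norms of a dense 2-D array (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # 'ij,ij->i' multiplies X with itself element-wise and sums over columns
    norms = np.einsum("ij,ij->i", X, X)
    return norms if squared else np.sqrt(norms)


points = np.array([[3.0, 4.0], [1.0, 0.0]])
lengths = row_norms_dense(points)
```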

Module contents