cepy package
Submodules
cepy.ce module
- class cepy.ce.CE(dimensions: int = 30, walk_length: int = 20, num_walks: int = 800, permutations: int = 100, p: float = 1, q: float = 1, weight_key: str = 'weight', workers: int = 1, sampling_strategy: Optional[dict] = None, verbosity: int = 1, temp_folder: Optional[str] = None, seed: Optional[int] = None, window: int = 3, min_count: int = 0, iter: int = 1, save_walks: bool = False, word2vec_kws: dict = {}, pregenerated_walks: Optional[list] = None)
Bases: object
The main Cepy class for building and fitting the connectome embedding model.
- Parameters
dimensions (int, optional) – Number of embedding dimensions.
walk_length (int, optional) – Number of nodes in each walk.
num_walks (int, optional) – Number of walks initiated from each node.
permutations (int, optional) – Number of independent fitting iterations.
p (float, optional) – Return hyperparameter (see [1]).
q (float, optional) – In-out hyperparameter (see [1]).
weight_key (str, optional) – On weighted graphs, this is the key for the weight attribute.
workers (int, optional) – Number of workers for parallel execution.
sampling_strategy (dict, optional) – Node specific sampling strategies, supports setting node specific ‘q’, ‘p’, ‘num_walks’ and ‘walk_length’. Set to None for homogeneous sampling.
verbosity (int, optional) – Verbosity level from 2 (high) to 0 (low).
seed (int, optional) – Seed for the random number generator. Deterministic results can be obtained if seed is set and workers=1.
window (int, optional) – The maximum number of steps between the current and predicted node within a sequence.
min_count (int, optional) – Ignores all nodes with total frequency lower than this.
iter (int, optional) – Number of iterations (epochs) over all random walk samples.
save_walks (bool, optional) – Whether to save the sampled random walks. If True, memory consumption will be larger.
word2vec_kws (dict, optional) – Additional parameters for gensim.models.Word2Vec. Note that window, min_count and iter must be given as separate parameters; if passed here, they are ignored.
temp_folder (str, optional) – Path to a folder with enough space to hold the memory map of self.d_graph (for big graphs); passed to joblib.Parallel as temp_folder.
pregenerated_walks (list, optional) – List of lists of node names: pre-generated walks used to train the word2vec model.
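The p and q parameters follow the second-order random walk of node2vec (Grover & Leskovec, 2016): after a walk has moved from node t to node v, each neighbor x of v is weighted by its edge weight times 1/p if x is t, times 1 if x is also a neighbor of t, and times 1/q otherwise. The sketch below illustrates that rule on a graph stored as a dict of dicts. The names biased_weights and biased_walk are hypothetical, and this is not cepy's internal implementation.

```python
import random
from typing import Dict, Hashable, List, Optional

# Hypothetical graph representation: node -> {neighbor: edge weight}
Graph = Dict[Hashable, Dict[Hashable, float]]


def biased_weights(graph: Graph, prev: Optional[Hashable], cur: Hashable,
                   p: float, q: float) -> Dict[Hashable, float]:
    """Unnormalized node2vec transition weights for leaving `cur`, given `prev`."""
    weights = {}
    for nxt, w in graph[cur].items():
        if prev is None:            # first step: plain weighted choice
            weights[nxt] = w
        elif nxt == prev:           # return to the previous node
            weights[nxt] = w / p
        elif nxt in graph[prev]:    # stay at distance 1 from the previous node
            weights[nxt] = w
        else:                       # move outward, distance 2 from the previous node
            weights[nxt] = w / q
    return weights


def biased_walk(graph: Graph, start: Hashable, walk_length: int,
                p: float = 1.0, q: float = 1.0,
                seed: Optional[int] = None) -> List[Hashable]:
    """Sample one walk of at most `walk_length` nodes (stops early at dead ends)."""
    rng = random.Random(seed)
    walk: List[Hashable] = [start]
    prev: Optional[Hashable] = None
    while len(walk) < walk_length and graph[walk[-1]]:
        cur = walk[-1]
        weights = biased_weights(graph, prev, cur, p, q)
        nodes = list(weights)
        nxt = rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]
        prev = cur
        walk.append(nxt)
    return walk
```

With p = q = 1 every weight reduces to the plain edge weight, which gives a first-order weighted random walk. A small p favors backtracking and keeps walks local, while a small q pushes walks outward.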
References
- [1]
Grover, A., & Leskovec, J. (2016, August). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 855-864).
Examples
>>> # Learn embeddings for a given connectome:
>>> import numpy as np
>>> import cepy as ce
>>> sc_group = ce.get_example('sc_group_matrix')
>>> ce_group = ce.CE(permutations=1, seed=1)  # initiate the connectome embedding model
>>> ce_group.fit(sc_group)  # fit the model
Start training 1 word2vec models on 1 threads.
>>> '%.2f' % ce_group.similarity()[0, 1]  # extract the cosine similarity between nodes 0 and 1
'0.62'
>>> ce_group.save_model('group_ce_copy.json')  # save the model
>>> ce_loaded_copy = ce.load_model('group_ce_copy.json')  # load it back
>>> # Extract the same cosine similarity again; it should be identical apart from minor numerical differences
>>> '%.2f' % ce_loaded_copy.similarity()[0, 1]
'0.62'
- class Weights
Bases: object
Stores the trained weights (W and W’ matrices) of all fitting permutations.
Extract the weights with get_w_permut(index, norm) and get_w_mean(norm), or get_w_apos_permut(index, norm) and get_w_apos_mean(norm). If norm is True, l2 normalization is applied to each vector before extraction.
- get_w_apos_mean(norm=True)
- get_w_apos_permut(index=0, norm=True)
- get_w_mean(norm=True)
- get_w_permut(index=0, norm=True)
- fit(X: array)
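As a rough illustration of what a mean over permutations with norm=True involves, the sketch below l2-normalizes every node vector of every permutation and then averages element-wise. The order of operations (normalize, then average) and the name w_mean are assumptions. It also assumes the permutations already share a common space, since averaging vectors from independently rotated embeddings is not meaningful.

```python
import numpy as np


def w_mean(w_permutations, norm=True):
    """Average (n_nodes, dimensions) matrices over permutations.

    If norm is True, each node vector is l2-normalized before averaging;
    all-zero vectors are left unchanged.
    """
    stacked = np.stack([np.asarray(w, dtype=float) for w in w_permutations])  # (n_perm, n_nodes, dim)
    if norm:
        lengths = np.linalg.norm(stacked, axis=2, keepdims=True)
        lengths[lengths == 0] = 1.0  # avoid dividing zero vectors by zero
        stacked = stacked / lengths
    return stacked.mean(axis=0)
```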
Sample random walks and fit a word2vec model.
- Parameters
X (ndarray) – Input adjacency matrix, shape: (n_nodes, n_nodes)
- Returns
walks – List of lists of nodes
- Return type
list, optional
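The window parameter controls which node pairs the word2vec model sees during fit. A minimal sketch of turning walks into (center, context) pairs, assuming the skip-gram formulation used by node2vec and a fixed window. skipgram_pairs is a hypothetical helper; gensim's Word2Vec, which does the actual training, may sample a smaller effective window per position.

```python
from typing import Hashable, List, Sequence, Tuple


def skipgram_pairs(walks: Sequence[Sequence[Hashable]],
                   window: int = 3) -> List[Tuple[Hashable, Hashable]]:
    """Pair each node with every node at most `window` steps away in the same walk."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a node is not its own context
                    pairs.append((center, walk[j]))
    return pairs
```

For example, skipgram_pairs([[0, 1, 2]], window=1) yields (0, 1), (1, 0), (1, 2) and (2, 1).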
- pickle_model(path, compress=False)
Save a model to a pickle file.
- Parameters
path (str) – Path to the file.
compress (bool) – Whether to compress the file with gzip
Examples
>>> # Load a model and save it to a file:
>>> import cepy as ce
>>> data_path = ce.get_examples_path()
>>> ce_subject1 = ce.load_model(data_path + '/ce_subject1.json.gz')
>>> ce_subject1.pickle_model('saved_model.pkl')
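The compress flag presumably switches between a plain and a gzip-compressed file. A minimal sketch of that pattern with the standard library, using the hypothetical helpers dump_pickle and load_pickle on an arbitrary Python object rather than a CE instance:

```python
import gzip
import pickle
from typing import Any


def dump_pickle(obj: Any, path: str, compress: bool = False) -> None:
    # gzip.open and open share the same file interface in binary mode
    opener = gzip.open if compress else open
    with opener(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path: str, compress: bool = False) -> Any:
    opener = gzip.open if compress else open
    with opener(path, "rb") as f:
        return pickle.load(f)
```

Only unpickle files from trusted sources: pickle.load can execute arbitrary code.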
- save_model(path, compress=False)
Save a model to a JSON file.
- Parameters
path (str) – Path to the file.
compress (bool) – Whether to compress the file with gzip
Examples
>>> # Load a model and save it to a file:
>>> import cepy as ce
>>> data_path = ce.get_examples_path()
>>> ce_subject1 = ce.load_model(data_path + '/ce_subject1.json')
>>> ce_subject1.save_model('saved_model.json')
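JSON has no array type, so saving embedding weights as JSON requires converting ndarrays to nested lists. The sketch below shows a round trip with optional gzip compression. save_json, load_json and the dictionary layout are hypothetical and do not reflect cepy's actual file format.

```python
import gzip
import json
from typing import Any, Dict

import numpy as np


def save_json(model_dict: Dict[str, Any], path: str, compress: bool = False) -> None:
    # Convert top-level ndarrays to nested lists so json can serialize them
    serializable = {k: v.tolist() if isinstance(v, np.ndarray) else v
                    for k, v in model_dict.items()}
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(serializable, f)


def load_json(path: str) -> Dict[str, Any]:
    # Assumption for this sketch: gzip files are recognized by the .gz extension
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Arrays come back as lists; wrap them with np.array after loading.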
- similarity(*args, **kwargs)
- cepy.ce.get_example(name)
Returns a bundled example. Can be used for testing and experimenting.
- Parameters
name (str) – Name of the example (file name without the extension).
- Returns
The loaded example, e.g. a CE model ('ce_subject1') or an adjacency matrix ('sc_group_matrix').
- Return type
CE or ndarray
Examples
>>> # Load an existing connectome embedding model:
>>> import cepy as ce
>>> ce_subject1 = ce.get_example('ce_subject1')
>>> w = ce_subject1.weights.get_w_mean()
>>> w.shape
(200, 30)
- cepy.ce.get_examples_path()
Returns the path to the folder containing the example files.
- cepy.ce.load_model(path)
Loads a model saved with save_model.
- Parameters
path (str) – Path to the file.
- Returns
x – The loaded model.
- Return type
CE
Examples
>>> # Save then load a model:
>>> import cepy as ce
>>> ce_subject1 = ce.get_example('ce_subject1')
>>> sim = ce_subject1.similarity()
>>> '%.2f' % sim[2, 5]
'0.16'
>>> ce_subject1.save_model('ce_subject1_copy.json')
>>> ce_subject1_copy = ce.load_model('ce_subject1_copy.json')
>>> sim = ce_subject1_copy.similarity()
>>> '%.2f' % sim[2, 5]
'0.16'
- cepy.ce.model_from_dict(m_dict)
- cepy.ce.similarity(X, Y=None, permut_indices=None, method='cosine_similarity', norm=None)
Derive several similarity measures among nodes within the same connectome embedding or across different embeddings.
- Parameters
X (CE) – The first connectome embedding on which the similarity is measured.
Y (CE, optional) – The second connectome embedding on which the similarity is measured. If None, then Y = X.
permut_indices (tuple or list of tuple, optional) – Index pairs of permutations (independent fitting iterations) of the first and second CEs. Similarity is taken between X[index1] and Y[index2]. For a list of tuples, similarity is taken for all listed pairs. If None, all possible pairs are tested.
method (str, optional) – The similarity measure, one of ‘cosine_similarity’ | ‘hadamard’ | ‘l1’ | ‘l2’.
norm (str, optional) – Which norm should be applied before the similarity measure, one of ‘l1’ | ‘l2’ | ‘max’. If None, no normalization is applied. This has no effect on cosine similarity.
- Returns
x
- Return type
{(num_nodes, num_nodes), (num_nodes, num_nodes, num_embedding_dim)} ndarray or list of ndarray
Examples
>>> # Load, align and measure the similarity among two connectome embeddings:
>>> import numpy as np
>>> import cepy as ce
>>> ce_subject1 = ce.get_example('ce_subject1')
>>> sim = ce.similarity(ce_subject1, ce_subject1, method='cosine_similarity')
>>> '%.2f' % sim[3, 2]
'0.83'
>>> sim = ce_subject1.similarity(ce_subject1, method='cosine_similarity')  # equivalent
>>> '%.2f' % sim[3, 2]
'0.83'
>>> ce_subject2 = ce.get_example('ce_subject2')
>>> ce_group = ce.get_example('ce_group')
>>> # align both subjects to the group consensus space
>>> ce_subject1_aligned = ce.align(ce_group, ce_subject1)
>>> ce_subject2_aligned = ce.align(ce_group, ce_subject2)
>>> # and measure the similarity among all corresponding nodes across subjects
>>> sim = ce.similarity(ce_subject1_aligned, ce_subject2_aligned, method='cosine_similarity')
>>> diagonal_indices = np.diag_indices(sim.shape[0])
>>> '%.2f' % sim[diagonal_indices].mean()
'0.79'
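To make the method options concrete, here is a NumPy sketch for a single pair of (n_nodes, dim) embedding matrices, ignoring permutations and the norm option. Cosine similarity yields an (n_nodes, n_nodes) matrix. For 'hadamard', 'l1' and 'l2' the sketch assumes the element-wise edge operators of node2vec (product, absolute difference, squared difference), which matches the (num_nodes, num_nodes, num_embedding_dim) return shape but is not confirmed by this documentation. node_similarity is a hypothetical name.

```python
import numpy as np


def node_similarity(x, y=None, method="cosine_similarity"):
    """Compare every node vector in x with every node vector in y.

    x, y: (n_nodes, dim) arrays with non-zero rows. Returns (n_nodes, n_nodes)
    for cosine similarity and (n_nodes, n_nodes, dim) for the element-wise operators.
    """
    x = np.asarray(x, dtype=float)
    y = x if y is None else np.asarray(y, dtype=float)
    if method == "cosine_similarity":
        xn = x / np.linalg.norm(x, axis=1, keepdims=True)
        yn = y / np.linalg.norm(y, axis=1, keepdims=True)
        return xn @ yn.T
    if method == "hadamard":
        return x[:, None, :] * y[None, :, :]
    diff = x[:, None, :] - y[None, :, :]  # (n_nodes, n_nodes, dim) pairwise differences
    if method == "l1":
        return np.abs(diff)
    if method == "l2":
        return diff ** 2
    raise ValueError(f"unknown method: {method!r}")
```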
cepy.embed_align module
- cepy.embed_align.align(base_ce, target_ce, base_index=0, target_indices='all')
Align connectome embeddings that originate from independent fitting iterations.
- Parameters
base_ce (CE) – Contains the latent space to which all target connectome embeddings are aligned.
target_ce (CE) – The connectome embeddings to be aligned.
base_index (int, optional) – The index of the connectome embedding iteration within base_ce.
target_indices (str or list, optional) – Indices of the connectome embedding iterations within target_ce to be aligned. If set to ‘all’, all available fitting iterations are aligned.
Examples
>>> # Load, align and measure the similarity among two connectome embeddings:
>>> import numpy as np
>>> import cepy as ce
>>> ce_subject1 = ce.get_example('ce_subject1')
>>> ce_subject2 = ce.get_example('ce_subject2')
>>> sim = ce.similarity(ce_subject1, ce_subject2, method='cosine_similarity')
>>> diagonal_indices = np.diag_indices(sim.shape[0])
>>> '%.2f' % sim[diagonal_indices].mean()  # similarity among all corresponding nodes across subjects
'0.57'
>>> # Now repeat the process, but first align the two:
>>> ce_group = ce.get_example('ce_group')
>>> ce_subject1_aligned = ce.align(ce_group, ce_subject1)
>>> ce_subject2_aligned = ce.align(ce_group, ce_subject2)
>>> sim = ce.similarity(ce_subject1_aligned, ce_subject2_aligned, method='cosine_similarity')
>>> '%.2f' % sim[diagonal_indices].mean()
'0.79'
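A common way to align two embeddings of the same nodes is orthogonal Procrustes: find the orthogonal matrix R minimizing ||target @ R - base||_F, whose solution is R = U @ Vt from the SVD of target.T @ base. Whether align uses exactly this method is an assumption here. procrustes_align is a hypothetical name; scipy.linalg.orthogonal_procrustes offers the same computation.

```python
import numpy as np


def procrustes_align(base, target):
    """Map `target` (n_nodes, dim) onto `base` with an orthogonal matrix.

    Solves min_R ||target @ R - base||_F subject to R.T @ R = I and returns
    (aligned_target, R).
    """
    base = np.asarray(base, dtype=float)
    target = np.asarray(target, dtype=float)
    u, _, vt = np.linalg.svd(target.T @ base)
    rotation = u @ vt
    return target @ rotation, rotation
```

Because R is orthogonal, cosine similarities within one embedding are unchanged by the alignment; only comparisons across embeddings, such as the diagonal similarity in the example above, are affected.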
cepy.parallel module
- cepy.parallel.get_hash(astring)
Returns consistent hash values to the word2vec model to ensure reproducibility.
Replaces Python’s built-in hash function, whose output for strings changes between interpreter sessions (note: this is not a true hash function, but it is sufficient for the current use).
- cepy.parallel.parallel_generate_walks(d_graph: dict, global_walk_length: int, num_walks: int, cpu_num: int, sampling_strategy: Optional[dict] = None, num_walks_key: Optional[str] = None, walk_length_key: Optional[str] = None, neighbors_key: Optional[str] = None, probabilities_key: Optional[str] = None, first_travel_key: Optional[str] = None, seed: Optional[int] = None, verbosity: int = 1) → list
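Python randomizes str hashes per process (controlled by PYTHONHASHSEED), so hash('node') can differ between runs. A sketch of a deterministic alternative based on zlib.crc32; stable_hash is a hypothetical name and not cepy's get_hash. gensim.models.Word2Vec accepts such a function through its hashfxn argument.

```python
import zlib


def stable_hash(astring: str) -> int:
    """Deterministic stand-in for hash() on strings: same input, same int, every run."""
    return zlib.crc32(astring.encode("utf-8"))
```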
Generates the random walks which will be used as the skip-gram input.
- Returns
List of walks. Each walk is a list of nodes.
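The cpu_num and seed arguments suggest walks are generated in chunks across workers. The sketch below shows one way to split num_walks rounds evenly across workers while keeping the result reproducible, by giving each worker its own derived seed. split_counts, worker_walks and generate_walks are hypothetical, the walks are uniform (no p/q bias) for brevity, and threads are used only to keep the example self-contained.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, List, Optional


def split_counts(num_walks: int, cpu_num: int) -> List[int]:
    """Spread num_walks rounds as evenly as possible over cpu_num workers."""
    base, extra = divmod(num_walks, cpu_num)
    return [base + (1 if i < extra else 0) for i in range(cpu_num)]


def worker_walks(graph: Dict[Hashable, List[Hashable]], n_rounds: int,
                 walk_length: int, seed: Optional[int]) -> List[List[Hashable]]:
    """Uniform random walks: n_rounds walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_rounds):
        for start in graph:
            walk = [start]
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


def generate_walks(graph, num_walks, walk_length, cpu_num=2, seed=None):
    counts = split_counts(num_walks, cpu_num)
    # One derived seed per worker makes the combined output reproducible
    seeds = [None if seed is None else seed + i for i in range(cpu_num)]
    with ThreadPoolExecutor(max_workers=cpu_num) as pool:
        chunks = pool.map(worker_walks, [graph] * cpu_num, counts,
                          [walk_length] * cpu_num, seeds)
    # pool.map preserves input order, so chunks are concatenated deterministically
    return [walk for chunk in chunks for walk in chunk]
```

Threads do not run CPU-bound Python code in parallel because of the GIL; a process-based backend such as joblib, which the temp_folder option refers to, would be needed for real speedups.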
- cepy.parallel.parallel_learn_embeddings(walks_file, word2vec_kws, nonzero_indices, num_nodes, cpu_num, verbosity)
Fits the node2vec model to the sampled walks and returns the learned parameters.
- Returns
A dictionary with the w and w’ parameters and the final training loss.
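In word2vec terms, W holds the input (center) vectors and W' the output (context) vectors. Assuming skip-gram with negative sampling, the sketch below performs one SGD step on a single (center, context) pair and shows how both matrices are updated. sgns_loss and sgns_step are hypothetical names; the actual training is delegated to gensim.models.Word2Vec.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgns_loss(w_in, w_out, center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair."""
    v = w_in[center]
    positive = np.log(sigmoid(w_out[context] @ v))
    negative = np.log(sigmoid(-(w_out[negatives] @ v))).sum()
    return -(positive + negative)


def sgns_step(w_in, w_out, center, context, negatives, lr=0.05):
    """One in-place SGD step on W (w_in) and W' (w_out).

    Assumes `negatives` holds distinct indices that differ from `context`.
    """
    v = w_in[center].copy()
    g_pos = sigmoid(w_out[context] @ v) - 1.0  # dLoss/dscore for the true context
    g_neg = sigmoid(w_out[negatives] @ v)      # dLoss/dscore for each negative sample
    # Gradient w.r.t. the center vector, computed before W' is modified
    grad_v = g_pos * w_out[context] + g_neg @ w_out[negatives]
    w_out[context] -= lr * g_pos * v
    w_out[negatives] -= lr * np.outer(g_neg, v)
    w_in[center] -= lr * grad_v
```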
cepy.utils module
- cepy.utils.check_adjacency_matrix(X)
- cepy.utils.normalize(X, norm='l2', axis=1)
Scale input vectors individually to unit norm (vector length).
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The data to normalize, element by element. scipy.sparse matrices should be in CSR format to avoid an unnecessary copy.
norm ({'l1', 'l2', 'max'}, default='l2') – The norm to use to normalize each non-zero sample (or each non-zero feature if axis is 0).
axis ({0, 1}, default=1) – Axis along which to normalize the data. If 1, independently normalize each sample; otherwise (if 0), normalize each feature.
- Returns
X – Normalized input X.
- Return type
{ndarray, sparse matrix} of shape (n_samples, n_features)
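For dense input, the three norms reduce to simple reductions along the chosen axis. A NumPy sketch (dense arrays only; normalize_dense is a hypothetical name) that, as described above, leaves all-zero vectors untouched:

```python
import numpy as np


def normalize_dense(x, norm="l2", axis=1):
    """Scale each sample (axis=1) or feature (axis=0) to unit norm; zero vectors stay zero."""
    x = np.asarray(x, dtype=float)
    if norm == "l1":
        scale = np.abs(x).sum(axis=axis, keepdims=True)
    elif norm == "l2":
        scale = np.sqrt((x * x).sum(axis=axis, keepdims=True))
    elif norm == "max":
        scale = np.abs(x).max(axis=axis, keepdims=True)
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero vectors
    return x / scale
```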
- cepy.utils.row_norms(X, squared=False)
Row-wise (squared) Euclidean norm of X. Equivalent to np.sqrt((X * X).sum(axis=1)), but also supports sparse matrices and does not create an X.shape-sized temporary. Performs no input validation.
- Parameters
X (array-like) – The input array.
squared (bool, default=False) – If True, return squared norms.
- Returns
The row-wise (squared) Euclidean norm of X.
- Return type
array-like
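For dense arrays, the squared row norms can be computed with np.einsum, which sums x_ij * x_ij per row directly instead of building the X * X temporary. A sketch under that assumption (dense input only; dense_row_norms is a hypothetical name):

```python
import numpy as np


def dense_row_norms(x, squared=False):
    """Row-wise Euclidean norms of a dense 2-D array."""
    x = np.asarray(x, dtype=float)
    sq = np.einsum("ij,ij->i", x, x)  # sum of squares per row
    return sq if squared else np.sqrt(sq)
```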