Guided Examples

Run Examples of our toolbox

0. Data

The first example is a manifold built from three substructures: two arms and one head in the middle. Each substructure has the same number of points, namely 10000. The manifold is embedded in 10000 uniformly distributed noise points. The file containing the data is called "Synthetic_Manifold_1.csv" and is located in the "Synthetic_Manifold_1" folder.

A visual representation of the data is shown below:

[Figure: visualization of the Synthetic_Manifold_1 data]
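If you want to try the pipeline on a small input of a similar kind before using the real file, the sketch below builds a hypothetical stand-in: two straight arms, a compact head in the middle, and uniformly distributed noise. The 3D embedding, the arm endpoints, and the Gaussian shape of the head are assumptions for illustration only, not the geometry of "Synthetic_Manifold_1.csv". The CSV layout (one point per row, no header) matches the `pd.read_csv(..., header=None)` calls used by the example scripts.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def _segment(rng, start, end, n, thickness):
    """Points spread uniformly along the segment start -> end, blurred by Gaussian noise."""
    t = rng.uniform(0.0, 1.0, size=(n, 1))
    pts = start + t * (end - start)
    return pts + rng.normal(scale=thickness, size=pts.shape)


def make_toy_manifold(n_per_part=10000, n_noise=10000, box=60.0, seed=0):
    """Hypothetical stand-in: two 1D 'arms' and a compact 'head', plus uniform noise, in 3D."""
    rng = np.random.default_rng(seed)
    left_arm = _segment(rng, np.array([-25.0, 0.0, 0.0]), np.array([-5.0, 0.0, 0.0]), n_per_part, 0.5)
    right_arm = _segment(rng, np.array([5.0, 0.0, 0.0]), np.array([25.0, 0.0, 0.0]), n_per_part, 0.5)
    head = rng.normal(scale=2.0, size=(n_per_part, 3))  # blob centred at the origin
    noise = rng.uniform(-box / 2, box / 2, size=(n_noise, 3))
    return np.vstack([left_arm, right_arm, head, noise])


def save_points(points, path):
    """Write a header-less CSV with one point per row."""
    pd.DataFrame(points).to_csv(path, header=False, index=False)


def load_points(path):
    """Same loading call as the examples: pd.read_csv(..., header=None).to_numpy()."""
    return pd.read_csv(path, header=None).to_numpy()


if __name__ == "__main__":
    data = make_toy_manifold(n_per_part=1000, n_noise=1000)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_manifold.csv")
        save_points(data, path)
        print(load_points(path).shape)  # (4000, 3)
```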

1. LAAT

To enter the folder, run the following command in the terminal:

cd examples/Synthetic_Manifold_1/LAAT

In the "LAAT" folder, run the script "run_LAAT.py". It generates a file called "LAAT_output_pheromone.csv" in the "Output" folder, containing the pheromone level of every particle at the end of LAAT.

python3 run_LAAT.py

The parameters for the LAAT algorithm are:

#LAAT parameters
LAAT_numberOfAnts = 5**3 # Number of ants
LAAT_numberOfIterations = 100 # Number of epochs
LAAT_numberOfSteps = 2500 # Number of steps of every ant between epochs
LAAT_th_neighb = 3 # Threshold on the number of neighbors of a point for pheromone delivery
LAAT_neighbdradii = 3.5 # Radius of the neighborhoods
LAAT_beta_antmovement = 5.0 # Probability parameter in the decision of the ants' movement
LAAT_kappa = 0.8 # Coefficient weighting the importance of reinforcement versus alignment
LAAT_beta_antinitialization = 5.0 # (float) Probability parameter in the decision of the ant initialization
LAAT_beta_pheromonedeliver = 0.0 # (float) Probability parameter in the pheromone delivered through steps
LAAT_total_pheromona_per_particle = 1.0 # (float) Final amount of pheromone expected per particle
LAAT_evapRate = 0.1 # Evaporation rate, following the formula Ph^n = (1-evapRate)*Ph^(n-1)
LAAT_lowerlimit = 0.000001 # Initial and minimum amount of pheromone in every point
LAAT_upperlimit = 10 # Maximum amount of pheromone in every point
LAAT_numberofthreads = 8 # Number of threads (cores) used to run the script
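The comment on `LAAT_evapRate` gives the evaporation rule, and `LAAT_lowerlimit` / `LAAT_upperlimit` bound the pheromone of every point. The sketch below shows that bookkeeping in NumPy, assuming the limits are enforced by clipping after each update. The `deposit` function and its visit pattern are hypothetical and do not reproduce how LAAT's ants decide where to deposit pheromone.

```python
import numpy as np


def evaporate(pheromone, evap_rate=0.1, lower=1e-6, upper=10.0):
    """One evaporation step, Ph^n = (1 - evapRate) * Ph^(n-1), kept inside [lower, upper]."""
    return np.clip((1.0 - evap_rate) * pheromone, lower, upper)


def deposit(pheromone, visited, amount, lower=1e-6, upper=10.0):
    """Hypothetical deposit step: add `amount` per visit (repeated indices add up), then clip."""
    updated = np.array(pheromone, dtype=float)  # copy, so the input is left untouched
    np.add.at(updated, visited, amount)
    return np.clip(updated, lower, upper)


if __name__ == "__main__":
    ph = np.full(5, 1e-6)  # LAAT_lowerlimit is also the initial pheromone level
    for epoch in range(3):
        ph = deposit(ph, np.array([0, 0, 1]), 1.0)  # made-up visits: point 0 twice, point 1 once
        ph = evaporate(ph)
    print(ph)
```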

When calling the function directly, the call looks like this, using the variables defined above:

pheromone = LAAT_MBMS.LAAT(data,
                           LAAT_numberOfAnts,
                           LAAT_numberOfIterations,
                           LAAT_numberOfSteps,
                           LAAT_th_neighb,
                           LAAT_neighbdradii,
                           LAAT_beta_antmovement,
                           LAAT_kappa,
                           LAAT_beta_antinitialization,
                           LAAT_beta_pheromonedeliver,
                           LAAT_total_pheromona_per_particle,
                           LAAT_evapRate,
                           LAAT_lowerlimit,
                           LAAT_upperlimit,
                           LAAT_numberofthreads)

To filter the data, run the script "pheromone_thresholding.py", located in the "LAAT" directory. It generates two files: one contains the parameters used and a summary of the results, and the other contains the data filtered by the LAAT pheromone. The latter is called "LAAT_output_selected_data.csv".

python3 pheromone_thresholding.py
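The filtering step keeps only the points with enough pheromone. A minimal sketch of that selection, assuming "LAAT_output_pheromone.csv" holds one value per data point in the same row order as the input. `select_by_pheromone` and `percentile_threshold` are hypothetical helpers; the threshold logic actually used by "pheromone_thresholding.py" may differ.

```python
import numpy as np


def select_by_pheromone(data, pheromone, threshold):
    """Keep the rows of `data` whose pheromone value is strictly above `threshold`."""
    pheromone = np.asarray(pheromone, dtype=float).ravel()
    if pheromone.shape[0] != data.shape[0]:
        raise ValueError("expected one pheromone value per data point")
    mask = pheromone > threshold
    return data[mask], mask


def percentile_threshold(pheromone, keep_fraction=0.2):
    """Hypothetical way to pick a threshold: keep roughly the top `keep_fraction` of points."""
    pheromone = np.asarray(pheromone, dtype=float).ravel()
    return float(np.quantile(pheromone, 1.0 - keep_fraction))


# Assumed usage from the "LAAT" folder:
# import pandas as pd
# data = pd.read_csv("../Synthetic_Manifold_1.csv", header=None).to_numpy()
# pheromone = pd.read_csv("../Output/LAAT_output_pheromone.csv", header=None).to_numpy()
# selected, mask = select_by_pheromone(data, pheromone, percentile_threshold(pheromone))
```

A fixed threshold is simpler to reason about; a quantile-based one adapts to the overall pheromone scale, at the cost of always keeping roughly the same fraction of points.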

2. MBMS

Now you can run MBMS on the file "LAAT_output_selected_data.csv". To do this, run the script "run_MBMS.py" in the "MBMS" folder by entering in the terminal:

cd ..

cd MBMS/

python3 run_MBMS.py

It generates a file called "MBMS_output.csv" in the "Output" folder, containing the particles displaced towards the centers of the structures in the data by the MBMS method.

The parameters for the MBMS algorithm are:

#MBMS parameters
MBMS_iter = 2 # Number of times the points are picked up and moved closer to the center
MBMS_radius = 3.5 # This radius must be bigger than or equal to the radius used in LAAT
MBMS_sigma = 1.5
MBMS_k = 3
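The displacement towards the centers can be illustrated with plain blurring mean shift, where every point moves to the Gaussian-weighted mean of its neighbors within a radius. This is a simplified stand-in, not the toolbox's MBMS: Manifold Blurring Mean Shift as published also uses local PCA on each neighborhood to remove the part of the motion along the structure, and this sketch does not model how `MBMS_k` enters that step. Only `MBMS_iter`, `MBMS_radius`, and `MBMS_sigma` have counterparts here, and their exact role inside `LAAT_MBMS.MBMS` is assumed.

```python
import numpy as np


def mean_shift_step(points, radius=3.5, sigma=1.5):
    """One blurring mean-shift step: every point moves to the Gaussian-weighted mean
    of the points within `radius` of it, all points updated at once.
    Brute-force pairwise distances, so only meant for small arrays."""
    dist2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    weights = np.exp(-dist2 / (2.0 * sigma**2))
    weights[dist2 > radius**2] = 0.0  # ignore points outside the neighborhood
    # Each point is its own neighbor (weight 1), so no row sums to zero.
    return weights @ points / weights.sum(axis=1, keepdims=True)


def blur(points, n_iter=2, radius=3.5, sigma=1.5):
    """Repeat the step `n_iter` times (the role assumed here for MBMS_iter)."""
    for _ in range(n_iter):
        points = mean_shift_step(points, radius, sigma)
    return points
```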

When calling the function directly, the call looks like this, using the variables defined above:

MBMS_data = LAAT_MBMS.MBMS(data,
                           iter=MBMS_iter,
                           radius=MBMS_radius,
                           sigma=MBMS_sigma,
                           k=MBMS_k)

3. DimIndex

Now you can run DimIndex. Go to the "DimIndex" folder and run the script "run_DimIndex.py" by entering in the terminal:

cd ..

cd DimIndex/

python3 run_DimIndex.py

The script generates two .csv files: the dimensionality indexes and the data filtered by those indexes. Index 0 corresponds to 1D structures, index 1 to 2D structures, and index 2 to 3D structures. Only the particles with index 0, i.e. those in 1D structures, are kept. The output file is located in the "Output" folder and is named "Selected_data_after_DimIndex.csv".
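The DimIndex algorithm itself (with its "Barycentric" simplex and 'l2' smoothing options) is not reproduced here. To illustrate what a per-point dimensionality index means, the sketch swaps in a common, simpler technique: local PCA, counting how many eigenvalues of each neighborhood's covariance are a non-negligible fraction of the largest one. The `ratio` cut-off is an arbitrary assumption. The sketch uses the same convention as the text (0 for 1D, 1 for 2D, 2 for 3D) and keeps only the index-0 points.

```python
import numpy as np


def local_pca_index(points, radius=3.5, ratio=0.1):
    """Stand-in dimensionality index from local PCA (not the DimIndex algorithm).

    Returns 0 for line-like, 1 for sheet-like and 2 for volume-like neighborhoods
    of 3D data, or -1 when the index cannot be estimated (fewer than 3 neighbors).
    """
    idx = np.full(len(points), -1, dtype=int)
    for i, p in enumerate(points):
        nbrs = points[np.sum((points - p) ** 2, axis=1) <= radius**2]
        if len(nbrs) < 3:
            continue  # too few neighbors to estimate a covariance
        eig = np.linalg.eigvalsh(np.cov(nbrs.T))[::-1]  # eigenvalues, largest first
        # Number of directions carrying a non-negligible share of the spread, minus one.
        idx[i] = int(np.sum(eig > ratio * eig[0])) - 1
    return idx


def select_1d(points, idx):
    """Keep only the points labelled as 1D (index 0)."""
    return points[idx == 0]
```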

The parameters for the DimIndex algorithm are:

#DimIndex parameters
radius = 3.5 # Neighborhood radius
cutoff = 5 # Neighborhoods with fewer points than this are discarded
Simplex = "Barycentric"
smooth = 'l2'

When calling the function directly, the call looks like this, using the variables defined above:

import pandas as pd
from DimIndexModule import *

# Loading the data
LAAT_data_org = pd.read_csv('../Output/LAAT_output_selected_data.csv', header=None).to_numpy()
MBMS_data_org = pd.read_csv('../Output/MBMS_output.csv', header=None).to_numpy()

# Discard any sparse neighborhoods of radius 3.5 that have fewer than 5 points
LAAT_data, MBMS_data, Labels, _, _ = Filtering(LAAT_data_org, MBMS_data_org, radius, cutoff)

# Running the DimIndex function
Struct, indexes = Dim_Index(MBMS_data, radius, Simplex, smooth)
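The `Filtering` call discards points in sparse neighborhoods. A brute-force NumPy sketch of that rule, assuming a point survives when at least `cutoff` points (itself included) lie within `radius`, and assuming the count is taken on the MBMS-shifted positions and applied to both arrays row by row. The real `Filtering` function also returns labels and two further outputs that this sketch does not reproduce.

```python
import numpy as np


def filter_sparse(points, radius=3.5, cutoff=5):
    """Keep points with at least `cutoff` points (themselves included) within `radius`.
    Brute-force pairwise distances, so only meant for small arrays."""
    dist2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    counts = np.sum(dist2 <= radius**2, axis=1)
    mask = counts >= cutoff
    return points[mask], mask


def filter_pair(original, shifted, radius=3.5, cutoff=5):
    """Count neighbors on the shifted (MBMS) positions and drop the same rows from both
    arrays; assumes row i of `original` and row i of `shifted` are the same particle."""
    _, mask = filter_sparse(shifted, radius, cutoff)
    return original[mask], shifted[mask]
```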

4. Crawling

Now you can run Crawling. Go to the "Crawling" folder and run the script "run_Crawling.py" by entering in the terminal:

cd ..

cd Crawling/

python3 run_Crawling.py

Crawling involves randomness, so its output varies from run to run. It generates two groups of files, both inside the "Output" folder. The first group of files is called "Nodeposx.csv" (x is a number from 1 to N, where N is the number of 1D structures in your data); these files contain the positions of the nodes of the graphs (i.e., skeletons: groups of nodes connected by edges) that model your structures. The second group of files is called "Subsetx.csv"; these files contain the subsets of data points surrounding each graph. There is therefore one file of each kind per skeleton found by Crawling. All of Crawling's output is also exported to the binary file "Crawling_output.pkl", so that all properties of the graphs (such as the indices of their nodes, their adjacency matrix, and the lengths of their edges) can be reused later.
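Since these outputs are plain CSV files plus a pickle, they can be read back from your own scripts. A sketch, assuming the files are literally named "Nodepos1.csv", "Nodepos2.csv", and so on (no separator before the number) and are header-less like the other outputs. The numeric sort keeps "Nodepos10.csv" from landing before "Nodepos2.csv". Unpickling "Crawling_output.pkl" may require the toolbox modules to be importable, since the pickle stores its graph objects; the SGTM step unpacks it as `FG, Subsets`.

```python
import glob
import os
import pickle
import re

import pandas as pd


def _node_file_number(path):
    """Return x for a file named Nodepos<x>.csv, or None for anything else."""
    match = re.fullmatch(r"Nodepos(\d+)\.csv", os.path.basename(path))
    return int(match.group(1)) if match else None


def load_node_positions(output_dir="../Output"):
    """Read every Nodepos<x>.csv in `output_dir`, ordered numerically by x."""
    paths = [p for p in glob.glob(os.path.join(output_dir, "Nodepos*.csv"))
             if _node_file_number(p) is not None]
    paths.sort(key=_node_file_number)
    return [pd.read_csv(p, header=None).to_numpy() for p in paths]


def load_crawling_pickle(path="../Output/Crawling_output.pkl"):
    """Unpickle the Crawling output (may need the toolbox modules to be importable)."""
    with open(path, "rb") as f:
        return pickle.load(f)
```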

The parameters for the Crawling algorithm are:

#Crawling parameters
radius = 3.5
ldim = 1
betha = 0.4

When calling the function directly, the call looks like this, using the variables defined above:

import pandas as pd
from CrawlingModule import *

# Loading the data
NoisyD = pd.read_csv('../Synthetic_Manifold_1.csv', header=None).to_numpy()
spine_data = pd.read_csv('../Output/Selected_data_after_DimIndex_original.csv', header=None).to_numpy()

# Running the Crawling function
FG, NoisyMan, _ = MultiM(spine_data, NoisyD, radius, ldim, betha)
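The Crawling output keeps each graph's adjacency matrix and edge lengths. How `FG` stores them is not documented in this guide, so the sketch below works on hypothetical inputs: an N x D array of node positions (such as the contents of one "Nodeposx.csv" file) and a symmetric 0/1 N x N adjacency matrix. It computes every edge's length and the total length of a skeleton.

```python
import numpy as np


def edge_lengths(node_pos, adjacency):
    """(i, j, length) for every edge i < j of an undirected graph.

    node_pos: N x D array of node positions; adjacency: symmetric N x N 0/1 matrix.
    """
    i, j = np.nonzero(np.triu(np.asarray(adjacency), k=1))  # each edge once, no self-loops
    lengths = np.linalg.norm(node_pos[i] - node_pos[j], axis=1)
    return list(zip(i.tolist(), j.tolist(), lengths.tolist()))


def skeleton_length(node_pos, adjacency):
    """Total length of a skeleton: the sum of its edge lengths."""
    return sum(length for _, _, length in edge_lengths(node_pos, adjacency))
```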

5. SGTM

Now you can run SGTM. Go to the "SGTM" folder and run the script "run_SGTM.py" by entering in the terminal:

cd ..

cd SGTM/

python3 run_SGTM.py

SGTM models each structure in the data as a mixture of Gaussian distributions. It trains the centers, covariance matrices, and weights of these distributions to best fit the data. It generates a file with the positions of the centers of the trained Gaussians, called "SGTM_positions_node_x.csv" (x is a number from 1 to N, where N is the number of 1D structures in your data). All of SGTM's output is also exported to the binary file "SGTM_output.pkl", so that all properties of the resulting Gaussian Mixture Model (GMM) can be reused.

For example, the plotting script "plot_SGTM.py" shows how to compute the likelihood of each data point belonging to a modeled structure.
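The likelihood computation in "plot_SGTM.py" is not reproduced here. As an illustration of the underlying idea, the sketch evaluates a Gaussian mixture density, p(x) = sum_k w_k N(x | mu_k, Sigma_k), and the share of the total density that each structure's mixture contributes at every point. The (weights, means, covariances) layout per structure is an assumption; the attribute names inside "SGTM_output.pkl" are not shown in this guide.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x | mean, cov) for every row of x."""
    d = mean.shape[0]
    diff = x - mean
    mahal = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)  # squared Mahalanobis distance
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahal) / norm


def mixture_likelihood(x, weights, means, covs):
    """p(x) = sum_k w_k N(x | mu_k, Sigma_k) for every row of x."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


def structure_membership(x, structures):
    """Share of the total density contributed by each structure's mixture, per point.

    `structures` is a list of (weights, means, covs) tuples, one per modeled structure
    (hypothetical layout). Points far from every component can underflow to 0/0 (NaN).
    """
    dens = np.stack([mixture_likelihood(x, *s) for s in structures], axis=1)
    return dens / dens.sum(axis=1, keepdims=True)
```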

The parameters for the SGTM algorithm are:

#SGTM parameters
IntDim = 1
radius = 3.5
epsilon = 2
mem = 2

When calling the function directly, the call looks like this, using the variables defined above:

import pickle
import pandas as pd
from AGTMModule import *

# Loading the data
NoisyD = pd.read_csv('../Synthetic_Manifold_1.csv', header=None).to_numpy()
with open("../Output/Crawling_output.pkl", 'rb') as f:
    FG, Subsets = pickle.load(f)

# Initializing and training SGTM
net, logL, GMDist, NoisyMan = Standardized_AGTM_InitTrain(FG, NoisyD, IntDim, radius, epsilon, mem)

6. Visualisation

Use matplotlib version 3.2.1 or later; earlier versions might not work.
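To fail early with an old matplotlib, a small version check can be run before the plotting scripts. This is a hypothetical helper, not part of the toolbox; it compares only the first three numeric components and ignores suffixes such as "rc1".

```python
import re

MIN_MATPLOTLIB = (3, 2, 1)


def version_tuple(version):
    """Turn '3.2.1' into (3, 2, 1); suffixes such as 'rc1' or '.post1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions like '3.2'
    return tuple(parts)


def matplotlib_is_recent_enough(version):
    return version_tuple(version) >= MIN_MATPLOTLIB


if __name__ == "__main__":
    try:
        import matplotlib
    except ImportError:
        print("matplotlib is not installed")
    else:
        ok = matplotlib_is_recent_enough(matplotlib.__version__)
        print(matplotlib.__version__, "OK" if ok else "too old, upgrade to 3.2.1 or later")
```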

As in the previous steps, each tool folder contains a plotting script. For example, to visualize the DimIndex results, go to the "DimIndex" folder and run "DimIndex_plot.py" (after having produced the output of "run_DimIndex.py"). The resulting image is shown on screen and also stored in the "Images" folder.

The visualisations of all the steps of the first example are shown below.

6.1 Visualisation of LAAT

6.2 Visualisation of MBMS

6.3 Visualisation of DimIndex

6.4 Visualisation of Crawling

6.5 Visualisation of SGTM

Cosmological Simulation Example

1. First, enter the directory of the cosmological simulation example with:

2. Run LAAT with:

This produces the file 'LAAT_output_pheromone.csv' in the Output directory with the pheromone values at each data point.

3. Apply a threshold cut on the pheromone values with:

You can set your own threshold value in the script pheromone_thresholding.py.

This produces the file 'LAAT_output_selected_data.csv' in the Output directory, containing only the data points whose pheromone values are above the chosen threshold.

Now plot the results with:

This outputs the following image in the Images directory: '2.-selected_data_after_LAAT.png'

4. Now change to the MBMS directory and run MBMS with:

This produces the file 'MBMS_output.csv' in the Output directory. In this step, the points remaining after the pheromone thresholding are concentrated onto the spines of the filaments by the MBMS diffusion algorithm.

Now plot the results with:

This outputs the following image in the Images directory: '3.-MBMS_output.png'

5. Now change to the DimIndex directory, and run the DimIndex tool with:

This produces the file 'Selected_data_after_DimIndex.csv' in the Output directory. Here the selected data points are separated according to their dimensional index.

Now plot the results with:

This outputs the following image in the Images directory: '4.-DimIndex_output.png'

6. Finally, change to the Crawling directory and run the Crawling tool with:

This produces multiple files in the Output directory. They describe how the data points identified as filaments in the previous step are connected into sets of filaments.

Now plot the results with:

This outputs several images in the Images directory, one for each set of filaments: '5.-Crawling_output_set_??.png'

There is also a file showing all the filament sets combined, as shown below: '5.-Crawling_output_full_filaments.png'
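Crawling builds its graphs through its own procedure. Purely to illustrate the simpler notion of splitting filament points into separate sets, the sketch below links every pair of points closer than a chosen radius and labels the connected components with a union-find structure. This is a swapped-in technique, not the Crawling algorithm, and the radius is an assumption.

```python
import numpy as np


def radius_components(points, radius):
    """Label the connected components of the graph that links points closer than `radius`.
    Brute-force pairwise distances, so only meant for small arrays."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    dist2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    close = np.triu(dist2 <= radius**2, k=1)  # each pair once, no self-loops
    for i, j in zip(*np.nonzero(close)):
        root_i, root_j = find(int(i)), find(int(j))
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two sets
    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)  # relabel as 0..n_sets-1
    return labels
```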