Telling stories [P6,P7,P8]

We have developed a linear-time delay algorithm for enumerating all directed acyclic subgraphs of a directed graph G(V,E) that have their sources and targets included in two subsets S and T of V, respectively. From these subgraphs, called pitches, the maximal ones, called stories, may be extracted in a dramatically more efficient way in relation to a previous story telling algorithm (see the paper by Acuña et al.). The improvement may even increase if a pruning technique is further applied that avoids generating many pitches which have no chance to lead to a story. We experimentally demonstrate these statements by making use of a quite large dataset of real metabolic pathways and networks.

In order to evaluate the efficiency of our new algorithm for the enumeration of stories, called Touché, we performed three experiments, two of them comparing Touché with the previous algorithm, called Gobbolino, and the third one to evaluate the effect of the pruning approach. You can get the Java code, used in our experiments, as a single zip archive. Unzipping it, you create a folder called storysoftware. Inside it, you will find two jar files, that is, ciak.jar and gobbolino.jar.

  • Experiment 1: our first experiment consisted in the enumeration of the whole set of stories using both Gobbolino and Touché (with the pruning approach implemented) and the comparison of their running time. The dataset of this experiment consists of 62 pathways with no more than 10 vertices. For each of the examined networks (see the corresponding network page) we run the following two commands (the network file has to be included into a subfolder, called input, of the folder containing the jar file): java -jar -Xmx1000M gobbolino.jar -n="" -q=all and java -jar -Xmx1000M ciak.jar -f="" -verb=0 -bc -sc. Each of these commands will return the number of stories and the execution time.
  • Experiment 2: our second experiment consisted in comparing the randomized approach of Gobbolino to Touché (with the pruning approach implemented), giving a fixed amount of time for both algorithms (1 minute, in our experiment) and checking how many stories each method produced. The dataset of the second experiment consists of 118 metabolic networks of various sizes. The dataset may be divided as follows: 8 networks (with size greater than or equal to 10) come from the same metabolic pathways considered in the first experiment; 4 networks are inputs for some experiments in which the set of black vertices came from biological experiments; the remaining 106 were metabolic networks downloaded from the public database MetExplore and with a random set of black vertices (5% of the vertices of the graph were considered to be black). For each of the examined networks (see the corresponding network page) we run the following two commands (the network file has to be included into a subfolder, called input, of the folder containing the jar file): java -jar -Xmx1500M gobbolino.jar -n="" -q=10000 -stop=TIME -stopTime=1 -stopTimeCondition=elapsed and java -jar -Xmx1500M ciak1min.jar -f="" -verb=0 -bc -sc
    (to run the second command, the time limited version of Touché is needed). Each of these commands will return the number of produced stories.
  • Experiment 3: our third experiment was designed in order to evaluate how effective is the pruning approach described in the previous section. To this aim, we collected the running time of Touché with and without the pruning. The dataset of this experiment consists of 62 pathways with no more than 10 vertices. For each of the examined networks (see the corresponding network page) we run the following two commands (the network file has to be included into a subfolder, called input, of the folder containing the jar file): java -jar -Xmx1500M ciak.jar -f="" -verb=0 and java -jar -Xmx1500M ciak.jar -f="" -verb=0 -allpitches. Each of these commands will return the number of stories and the execution time.
  • Results: in the tables contained in this PDF file the results of the three experiments are detailed.