MONGOOSE Homepage

Recent Developments: The easiest way to install MONGOOSE and its graphical user interface (GUI) is via the Docker [Boettiger, 2015] distribution available at https://hub.docker.com/r/ctlevn/mongoose/. *

MONGOOSE (MetabOlic Network GrOwth Optimization Solved Exactly) is a package for the structural analysis and refinement of constraint-based metabolic networks. Unlike other existing software, MONGOOSE uses exact rational arithmetic, which makes its results certifiably accurate. The MetaMerge algorithm (Chindelevitch et al., Genome Biology 2012, 13:r6, http://genomebiology.com/2012/13/1/r6) is based on and fully integrated with MONGOOSE. To run MONGOOSE, the esolver executable from QSopt_ex, available for download at http://www.dii.uchile.cl/~daespino/ESolver_doc/main.html, must be located in the working directory from which you run the Python script ModelProcessing.py. To check an external program's solution to a metabolic problem, please use the script ExternalChecker.py.
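A minimal illustration of why exact arithmetic matters for mass-balance checks. It uses Python's standard fractions module, not MONGOOSE's own code, and the helper net_balance is hypothetical: the same three coefficients balance exactly as rationals but leave a spurious residual in floating point.

```python
from fractions import Fraction

def net_balance(coefficients, exact=True):
    """Hypothetical helper (not part of MONGOOSE): sum signed stoichiometric
    coefficients given as decimal strings, exactly or in floating point."""
    convert = Fraction if exact else float
    return sum(convert(c) for c in coefficients)

# A metabolite produced with coefficients 0.1 and 0.2 and consumed with 0.3.
coeffs = ["0.1", "0.2", "-0.3"]
print(net_balance(coeffs, exact=False))  # a tiny nonzero residual (about 5.6e-17)
print(net_balance(coeffs, exact=True))   # 0
```

A floating-point check has to pick a tolerance for deciding when such a residual counts as zero; the rational version never needs one.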

MONGOOSE version 1.1 source code is available here.

An updated table summarizing the results of applying MONGOOSE to a number of published metabolic models is available here. We originally reported the novel finding that the biomass reaction was blocked in 44 of the 89 models we examined. Of these 89 models, 10 are curated by the BiGG database, and 3 of those were reported as blocked in our analysis but not by COBRA. We have traced this difference to parsing differences in two of them, EC3 (iAF1260) and HP2 (iIT341), and to a bug in one of our utility functions in the third, MT1 (iNJ661). Of the other 79 models, each of which is linked from the UCSD repository but not curated by it, 37 are available in SBML format. Of these, our analysis identifies 8 as still blocked under either COBRA-style or MONGOOSE-style parsing. However, the blockage we originally reported in an additional 5 models can be traced to differences in parsing. Of the remaining 42 models, which we find only in the older Excel format that, to our knowledge, COBRA cannot parse, 28 are blocked. Overall, we now find a total of 36 blocked models, even with COBRA-style parsing. Our key finding that many of the available models are blocked remains valid.

Please cite our Nature Communications 2014 paper when referencing MONGOOSE.

* Le, Christopher, and Leonid Chindelevitch. The MONGOOSE Rational Arithmetic Toolbox. Methods in Molecular Biology, Springer New York, 2017, 77–99.


Commentary

Members of the Palsson group have reported discrepancies between their results and ours on several metabolic network models. In response, we would like to clarify several aspects of our work that differentiate it from other work and that bear on the reported discrepancies. In particular, we discuss the unique capabilities of MONGOOSE to identify structural elements of metabolic networks, the constraints that metabolic network structure imposes on biomass coefficient vectors, and the differences in design between the MONGOOSE and COBRA parsers. We also provide updated code, an expanded set of results, and a tutorial for MONGOOSE.


The first key difference between MONGOOSE and other work is that the MONGOOSE toolbox is designed for the structural analysis of metabolic networks. While it can perform “standard” flux balance analysis (FBA), its main purpose is to identify structural issues in metabolic network models, such as topology-, stoichiometry- and irreversibility-blocked reactions and semi-blocked reactions, and to propose solutions to these issues. Its secondary purpose is to compress metabolic network models without loss of information, which simplifies their subsequent analysis. To our knowledge, MONGOOSE is unique in its ability to perform both of these tasks in a way consistent with the model assumptions, and it achieves this by using exact rational arithmetic.
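As a rough illustration of what "blocked" means structurally, the sketch below treats a reaction as blocked by stoichiometry when its flux is zero in every solution of S v = 0, with irreversibility ignored; that definition is an assumption made for this example. It computes an exact null-space basis with Python's standard fractions module, so no rounding tolerance is involved. The function names are hypothetical and this is not MONGOOSE's own implementation.

```python
from fractions import Fraction

def nullspace_basis(S):
    """Hypothetical helper: exact basis of {v : S v = 0} via Gauss-Jordan elimination.

    S is a list of rows (metabolites) over columns (reactions).
    """
    rows = [[Fraction(x) for x in row] for row in S]
    n = len(rows[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n):
        if r == len(rows):
            break
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column, so it is a free column
        rows[r], rows[p] = rows[p], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]  # scale the pivot to exactly 1
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -rows[i][free]
        basis.append(v)
    return basis

def stoichiometry_blocked(S):
    """Column indices of reactions whose flux is 0 in every solution of S v = 0."""
    basis = nullspace_basis(S)
    return [j for j in range(len(S[0])) if all(v[j] == 0 for v in basis)]

# Columns: R0 (uptake of A), R1: A -> B, R2: A -> B + C, R3 (export of B).
# Nothing consumes C, so R2 can never carry steady-state flux.
S = [
    [1, -1, -1, 0],  # A
    [0, 1, 1, -1],   # B
    [0, 0, 1, 0],    # C
]
print(stoichiometry_blocked(S))  # [2]
```

In this small network R2 is blocked because C is a dead end; the same null-space test also catches reactions blocked by less visible stoichiometric coupling.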


The second aspect we would like to clarify is the difference between a model’s ability to produce biomass and its ability to produce each component of the biomass individually. In toy model 1 below, these conditions are indeed equivalent because the production of D and the production of F are decoupled, so the biomass coefficients d and f can take any non-negative values. In toy model 2, however, these conditions are not equivalent: the production of D from B and of F from C are coupled through A, and are further constrained by the extra metabolite E. For this reason, biomass production is only possible if d = f and e = d + f. We emphasize that most published metabolic network models fall into the second category, in which the network structure places linear constraints on the possible coefficients of the biomass reaction, making the ability to produce biomass very sensitive to small perturbations of those coefficients. This property can cause the biomass reaction to become stoichiometry- or irreversibility-blocked.


Toy model 1:

R0:   A[e] -> A

R1:   A -> B

R2:   A -> C

R3:   B -> D

R4:   C -> F

R5:   dD + fF -> Biomass

ObjR: Biomass -> Biomass[e] 

Toy model 2:

R0:   A[e] -> A

R1:   A -> B + C

R2:   B -> D + E

R3:   C -> E + F

R4:   dD + eE + fF -> Biomass

ObjR: Biomass -> Biomass[e] 
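The claims about the two toy models can be checked directly. The sketch below uses hypothetical helpers, not part of MONGOOSE. It fixes the biomass flux at 1, derives the only flux vector the mass balances allow, and tests S v = 0 exactly with Python fractions. Only internal metabolites are balanced, so A[e] and Biomass[e] are left out. Non-negative coefficients are assumed, so that all fluxes respect the forward-only reaction directions.

```python
from fractions import Fraction

# Reaction -> {internal metabolite: coefficient}; negative = consumed, positive = produced.
def toy_model_1(d, f):
    return {
        "R0": {"A": 1},
        "R1": {"A": -1, "B": 1},
        "R2": {"A": -1, "C": 1},
        "R3": {"B": -1, "D": 1},
        "R4": {"C": -1, "F": 1},
        "R5": {"D": -d, "F": -f, "Biomass": 1},
        "ObjR": {"Biomass": -1},
    }

def toy_model_2(d, e, f):
    return {
        "R0": {"A": 1},
        "R1": {"A": -1, "B": 1, "C": 1},
        "R2": {"B": -1, "D": 1, "E": 1},
        "R3": {"C": -1, "E": 1, "F": 1},
        "R4": {"D": -d, "E": -e, "F": -f, "Biomass": 1},
        "ObjR": {"Biomass": -1},
    }

def is_steady_state(model, flux):
    """Exactly check S v = 0 for every internal metabolite."""
    balance = {}
    for rxn, coeffs in model.items():
        for met, c in coeffs.items():
            balance[met] = balance.get(met, Fraction(0)) + Fraction(c) * flux[rxn]
    return all(b == 0 for b in balance.values())

def toy1_grows(d, f):
    d, f = Fraction(d), Fraction(f)
    # With biomass flux 1, the D and F balances force R3 = d and R4 = f; then
    # R1 = d, R2 = f and R0 = d + f. This works for any d, f >= 0.
    flux = {"R0": d + f, "R1": d, "R2": f, "R3": d, "R4": f, "R5": 1, "ObjR": 1}
    return is_steady_state(toy_model_1(d, f), flux)

def toy2_grows(d, e, f):
    d, e, f = Fraction(d), Fraction(e), Fraction(f)
    # With biomass flux 1, the A, B, C and D balances force R0 = R1 = R2 = R3 = d.
    # This is the only candidate, so S v = 0 holds iff d = f and e = d + f.
    flux = {"R0": d, "R1": d, "R2": d, "R3": d, "R4": 1, "ObjR": 1}
    return is_steady_state(toy_model_2(d, e, f), flux)

print(toy1_grows(3, "1/7"))  # True
print(toy2_grows(1, 2, 1))   # True
print(toy2_grows(1, 2, 2))   # False: d != f
```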


The third issue concerns differences between MONGOOSE and COBRA in parsing the various metabolic models. Our parser flags potential errors in the models, and we made all the resulting changes listed in Supplementary Data 2. In addition, we wanted to analyze as many of the previously published models (as posted in the UCSD repository) as possible, including those in Excel format and those in SBML levels 1 through 3. This led us to make design decisions different from those behind the COBRA parser, which, to our knowledge, currently only handles SBML level 3 and above. The table below summarizes the differences between the two parsers on SBML level 3 files.


Situation/decision | MONGOOSE approach | COBRA approach | Models affected

Which metabolites will be constrained to balance (in S v = 0)? | All except those with compartment set to extracellular/external | All except those with boundaryCondition set to true or ending in _b | 20

Which reactions will be forward-only? | All those with reversible set to false | All those with at least 0 as their LOWER_BOUND | 20

Which reactions will be reverse-only? | None (except after structural analysis) | All those with at most 0 as their UPPER_BOUND | 3

Which reaction is the objective (biomass)? | The user can choose among reactions with ‘biomass’ in the name | The one with a 1 in its OBJECTIVE_COEFFICIENT | 8

A reaction with the same metabolite as reactant and product | Subtract the two coefficients (if equal, cancel them out) | Only keep the reactant or only the product in the reaction representation | 12

Use of a metabolite not in listOfSpecies | Hard fail (do not complete parsing) | Soft fail (omit all the unlisted metabolites) | 1
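Four of the rows in the table (balanced metabolites, forward-only and reverse-only reactions, and repeated metabolites) reduce to simple rules. The sketch below restates them in Python as an illustration only. The ParsedReaction container and all function names are hypothetical, not MONGOOSE's or cobrapy's API, and the order in which the COBRA-style bound checks are applied is an assumption.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class ParsedReaction:
    """Hypothetical container for the SBML fields the table mentions."""
    reversible: bool
    lower_bound: float
    upper_bound: float
    reactants: dict = field(default_factory=dict)  # species id -> coefficient
    products: dict = field(default_factory=dict)   # species id -> coefficient

def balanced_mongoose(compartment_name):
    # Row 1, MONGOOSE: balanced unless the compartment is extracellular/external.
    return compartment_name.strip().lower() not in {"extracellular", "external"}

def balanced_cobra_style(species_id, boundary_condition):
    # Row 1, COBRA: balanced unless boundaryCondition is true or the id ends in _b.
    return not (boundary_condition or species_id.endswith("_b"))

def direction_mongoose(rxn):
    # Rows 2-3, MONGOOSE: only the reversible attribute is used, so no
    # reaction is reverse-only at parse time.
    return "reversible" if rxn.reversible else "forward"

def direction_cobra_style(rxn):
    # Rows 2-3, COBRA: the bounds decide. Checking the lower bound first
    # (which matters when both bounds are 0) is an assumption of this sketch.
    if rxn.lower_bound >= 0:
        return "forward"
    if rxn.upper_bound <= 0:
        return "reverse"
    return "reversible"

def net_coefficients_mongoose(rxn):
    # Row 5, MONGOOSE: subtract reactant from product coefficients so that a
    # metabolite on both sides appears once; equal coefficients cancel out.
    net = {}
    for met, c in rxn.reactants.items():
        net[met] = net.get(met, Fraction(0)) - Fraction(c)
    for met, c in rxn.products.items():
        net[met] = net.get(met, Fraction(0)) + Fraction(c)
    return {met: c for met, c in net.items() if c != 0}

# A reaction flagged reversible but whose bounds only allow reverse flux is
# classified differently by the two rules.
rxn = ParsedReaction(reversible=True, lower_bound=-1000.0, upper_bound=0.0)
print(direction_mongoose(rxn), direction_cobra_style(rxn))  # reversible reverse
```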


Because these differences may affect the results, and to let the community benefit from both approaches, we created an extension to our SBML parser that imitates the COBRA approach. The exception is the last two situations in the table, where we believe our approach is preferable; in fact, after we pointed out this difference, the developers of cobrapy added a patch to handle this situation the way MONGOOSE does. To use the extension, set the Cobra flag to True during parsing. We additionally post all the parsed models, using both approaches whenever they produce different results (with the extension “c” marking those parsed COBRA-style). We also provide a complete collection of the modified source files, a full record of the parsing choices made for every file, and an example use of MONGOOSE on a specific model, which also serves as a MONGOOSE mini-tutorial.


For completeness, we also include the analysis of many “dual-source” models (i.e. those available in both Excel and SBML formats). In such cases, if the original parse was from the Excel source only, we use the extension “a” for the parse from the SBML source. For instance, XY1, XY1a and XY1c denote parses of the XY model from Excel, from SBML, and from SBML COBRA-style, respectively. We corrected topologically blocked models when possible by adding a biomass export reaction, but did not do so in the COBRA-style parses, to ensure consistency with our results. We did, however, add the biomass reactions found by MONGOOSE to COBRA-style parses whenever the latter could not identify a biomass reaction on their own.
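The labelling convention for the posted parses can be decoded mechanically. Below is a small hypothetical helper for doing that, assuming that the base model code is a run of capital letters followed by digits (as in EC3 or XY1). That pattern is an assumption and is not stated in the source.

```python
import re

# Suffix meanings as described in the text; the base-code pattern is assumed.
SUFFIXES = {
    "": "original parse (the Excel source for dual-source models)",
    "a": "SBML source of a model originally parsed from Excel",
    "c": "SBML source parsed COBRA-style",
}

def describe_label(label):
    """Hypothetical helper: split a label like 'XY1c' into base code and parse type."""
    match = re.fullmatch(r"([A-Z]+\d+)([ac]?)", label)
    if match is None:
        raise ValueError(f"unrecognized model label: {label!r}")
    base, suffix = match.groups()
    return base, SUFFIXES[suffix]

print(describe_label("XY1c"))  # ('XY1', 'SBML source parsed COBRA-style')
```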


We note that in many cases our results differ between the Excel and SBML representations of the same model. This may be due to changes in the model between the two formats, inconsistencies between the two representations, or differing interpretations made during parsing. We would also like to clarify that all our analyses were carried out on the models linked from the UCSD repository as of March 2013. We encourage all model authors to analyze the latest versions of their models using either MONGOOSE or cobrapy with the QSopt_ex solver.


In addition, while further checking our results we discovered a minor bug in a utility function that affects the biomass coefficients, and occasionally other coefficients, in some of the models. Interestingly, fixing this bug enables growth in 4 of the 44 models we had previously classified as initially blocked, while the other 40 remain blocked (not counting the models that are not blocked in the COBRA-style parses described above). Since most of the coefficient changes are on the order of 1e-5 or 1e-6, this result further supports our statement that metabolic network models are generally very sensitive to perturbations in the biomass coefficients, and that small changes are likely to alter their qualitative behavior, such as their ability to sustain flux through specific reactions.
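The sensitivity argument can be illustrated with toy model 2 from the commentary. With the biomass flux fixed at 1, its balances force R0 = R1 = R2 = R3 = d, which leaves the F and E balances as d - f = 0 and 2d - e = 0. The sketch below uses hypothetical function names and compares an exact check with a floating-point check that accepts residuals up to an assumed tolerance of 1e-6. That tolerance is an illustrative value, not one taken from MONGOOSE or any particular solver. A perturbation of 1e-7 in f blocks the model, but only the exact check notices.

```python
from fractions import Fraction

def toy2_residuals(d, e, f):
    # Residuals of the F and E balances in toy model 2 with biomass flux 1,
    # after the A, B, C and D balances have forced R0 = R1 = R2 = R3 = d.
    return (d - f, 2 * d - e)

def grows_exact(d, e, f):
    # Coefficients as strings (or ints) are converted to exact rationals.
    return all(r == 0 for r in toy2_residuals(Fraction(d), Fraction(e), Fraction(f)))

def grows_with_tolerance(d, e, f, tol=1e-6):
    # tol is an assumed, illustrative feasibility tolerance.
    return all(abs(r) <= tol for r in toy2_residuals(float(d), float(e), float(f)))

print(grows_exact("1", "2", "1.0000001"))           # False: the model is blocked
print(grows_with_tolerance("1", "2", "1.0000001"))  # True: the residual hides below tol
```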


The updated code, MONGOOSE version 1.1, is now available at the link above, and the updated results (with the exception of structural energy balance analysis, which we are still working on as it is more time-consuming) are available in New Supplementary Data 1 (above). We are glad to see a lively debate about our work, and we would particularly like to thank members of the Palsson group, especially Ali Ebrahim, for discussions helpful in resolving discrepancies between our work and theirs.


Questions regarding this package can be directed to the following address: leonidus at mit dot edu