
2. KnetMiner server architecture


Overview of the KnetMiner internal architecture

KnetMiner consists of several components, as shown in figure 1. This includes:

  • server-datasource-api: a set of core interfaces and components defining KnetMiner functionality as a web service. A central concept in this module is the data source, which abstracts the data backend against which functions like gene searching or knowledge-graph extraction are realised.

  • server-datasource-ondexlocal: an implementation of server-datasource-api based on data fetched from an Ondex .oxl file and kept in memory. This is currently the only available implementation of the abstract API, so, in practice, we rely on it to provide the API functions. A KnetMiner instance based on Ondex can be configured entirely via configuration files; the configuration loading starts with a web listener (ConfigFileHarvester, declared in web.xml).

  • server-base: this defines the skeleton of a .war Java web application, which mostly maps server-datasource-api functions to URL-based web API calls. Like other modules in KnetMiner, this is largely based on Spring. The resulting .war is essentially a web service, accessible programmatically and without an interface dedicated to end users.

  • client-base: similarly, this contains the skeleton of the KnetMiner application that is visible to the end user, and hence it defines most of the user interface.

  • aratiny: this project and its Maven submodules have a double function:

    • when built with the default configuration, which is defined in various Maven POMs, it produces a simple KnetMiner instance, based on a small demo OXL file (a subset sampled from our Arabidopsis dataset). This build is part of the KnetMiner Maven build, which uses the sample application to perform some integration tests by calling the corresponding API. The application is based on aratiny-ws and aratiny-client, which are concrete extensions of the server-base and client-base modules mentioned above. Such extensions depend on server-datasource-ondexlocal, ie, they use an OXL file and Ondex components as the concrete data source. Both the client and the server can be run manually, using the scripts in manual-test, which allow for running this sample application in different modes (eg, using the Neo4j graph traverser).
    • aratiny is also the template for building our KnetMiner Docker image. This, when launched with its defaults, runs the same sample application over the same small sample dataset. However, a KnetMiner Docker container can be configured with the data, configuration parameters and user interface customisations of a specific KnetMiner dataset (eg, one about a given organism), and thus run a particular KnetMiner instance. This is explained in detail by our Docker documentation. This kind of reconfiguration and instantiation of aratiny is also possible when it is run directly, without Docker (though bear in mind that we don't test this mode very often).

Figure 1: arrangement of the main KnetMiner Maven modules (POM artifactIds reported). The dependencies between the WAR modules are those established by the overlaying mechanism provided by the Maven WAR plug-in.

Data sources

Note that this part of the KnetMiner architecture is rather cumbersome for the way we now use the code. The data sources described below were designed with the idea of having a data source interface, abstracting the KnetMiner API functions, with possibly multiple implementations per server instance (ie, per WAR), and with the additional idea of having different data source implementations over different data backends (eg, OXL file, relational database). Nowadays, we only ever use the OXL-based data source (with a variant that partially uses Neo4j, see TODO), and we have constrained code extensions to support only one data source per server. The latter is because it is now much easier to distribute datasets over Docker instances, and also because we plan to switch to a graph database backend in future (very likely Neo4j); that could be implemented with a simpler architecture, where the data source implementation doesn't need to be configurable (although we will very likely keep the separation between the abstract API and its implementation).

As mentioned above, a data source is the basic unit of work in KnetMiner. A data source is an implementation of the KnetminerDataSource abstract class from the API. In order to be recognised, implementing classes must be in the package rres.knetminer.datasource.provider, or within any sub-package of it, and be annotated with @Component (ie, they're based on Spring).
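For illustration, a custom data source skeleton might look like the sketch below. The subclass name is hypothetical, the import path for KnetminerDataSource is assumed to live in server-datasource-api, and the abstract operations to override are deliberately omitted (see the API module for the real signatures), so this is a shape sketch rather than compilable code.

```java
package rres.knetminer.datasource.provider;

import org.springframework.stereotype.Component;

// Assumed location of the abstract class in server-datasource-api.
import rres.knetminer.datasource.api.KnetminerDataSource;

/**
 * A minimal, hypothetical data source skeleton. It lives under
 * rres.knetminer.datasource.provider (a sub-package would work too) and is
 * annotated with @Component, so that Spring's classpath scanning finds it
 * and the server registers it as an available data source.
 */
@Component
public class MyCustomDataSource extends KnetminerDataSource
{
	// A real implementation must override the abstract operations defined
	// by KnetminerDataSource (searching, network extraction, etc.); they
	// are omitted here, so this sketch won't compile until they're added.
}
```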

The Ondex-based implementation of this class, OndexLocalDataSource, contains the mechanism for bootstrapping the application configuration using the above-mentioned ConfigFileHarvester, which, in turn, is invoked by the web container when it is set up in the server's web.xml (see aratiny-ws).
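The web.xml wiring for this is the standard servlet listener declaration. The snippet below is a sketch only: the fully-qualified class name of ConfigFileHarvester and the context parameter pointing at the configuration file are assumptions, to be checked against the actual web.xml in aratiny-ws.

```xml
<!-- Sketch of the listener wiring. The class package and the parameter
     name below are illustrative; see aratiny-ws for the real web.xml. -->
<context-param>
  <param-name>knetminer.configFilePath</param-name>
  <param-value>/path/to/dataset-config.xml</param-value>
</context-param>

<listener>
  <listener-class>rres.knetminer.datasource.ondexlocal.ConfigFileHarvester</listener-class>
</listener>
```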

To summarise, in most cases you can develop custom instances of KnetMiner by working on the aratiny reference app, which already includes a configurable, Ondex-based data source. Custom options for the latter can be defined in its configuration file.

Deploying data sources and servers

As mentioned above, a new KnetMiner instance can be configured by means of configuration files and files that provide user interface customisations. We usually manage one instance per dataset, which means a single OXL file plus its configuration (what we call a dataset), a single pair of aratiny-ws and (customised) aratiny-client .war applications on top of the dataset, and a single Tomcat server running the two web apps inside a single Docker instance. See the Docker documentation for many more details.

While we recommend sticking to the simple arrangement above, variants are possible. For instance, the same Tomcat server could run multiple copies of the .war applications described above; in such a case, you would just need to ensure things don't conflict (eg, that different application names and URL paths are used, and that distinct data paths are used).
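As a purely hypothetical example, such a multi-dataset Tomcat could have a webapps/ layout along these lines (all names invented for illustration):

```
webapps/
  wheat-ws.war        # aratiny-ws built for a wheat dataset
  wheat-client.war    # the matching customised client
  rice-ws.war         # a second server copy, with its own name and URL path
  rice-client.war     # ...and pointing at a distinct data path
```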

KnetMiner applications architecture

To reiterate, KnetMiner makes use of a client-server model, where demanding computation is performed in the web server, while visualisation is rendered in the JavaScript-based client. The web server is based upon the Spring MVC framework, using the Servlet, Ondex and Lucene APIs to access the Genome-Scale Knowledge Network (GSKN or, simply, knowledge network). The web client makes use of a range of libraries, including jQuery, KnetMaps.js and GenoMaps.js, for interactive data visualisation.

Within a KnetMiner instance, the server is controlled via KnetminerServer in server-base. Incoming requests and outgoing responses are always JSON. Requests are modelled by KnetminerRequest in server-datasource-api: this one class contains all the possible input parameters, with suitable non-null defaults for each, regardless of which ones are used by which request. Responses are defined by implementations of KnetminerResponse, also in server-datasource-api.
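For illustration, a request body might look like the JSON below, sent to a search endpoint of the sample application. The field names shown (keyword, list) are assumptions made for the sake of the example; the actual parameter set is the one defined by KnetminerRequest.

```json
{
  "keyword": "drought AND resistance",
  "list": [ "AT1G01010", "AT1G01020" ]
}
```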

The web server requires, as arguments, the paths to the GSKN (OXL format) and to the semantic motif files (MGQE format). It pre-indexes the GSKN and then waits for requests.

New server endpoints can be added to KnetminerServer by adjusting its URL pattern matching, or by adding new methods with the appropriate Spring MVC annotations.
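A new endpoint might look like the sketch below, using standard Spring MVC annotations. The controller, the mapping path, the method name and the idea of delegating to a data source are all illustrative assumptions, not actual KnetminerServer code; the import locations of KnetminerRequest and KnetminerResponse are assumed too.

```java
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// Assumed locations in server-datasource-api.
import rres.knetminer.datasource.api.KnetminerRequest;
import rres.knetminer.datasource.api.KnetminerResponse;

/**
 * A hypothetical endpoint sketch. Spring maps the URL to the method and
 * (de)serialises the JSON request/response automatically.
 */
@RestController
public class ExampleEndpoint
{
	@PostMapping ( path = "/{ds}/countGenes", produces = MediaType.APPLICATION_JSON_VALUE )
	public KnetminerResponse countGenes (
		@PathVariable ( "ds" ) String dataSourceName, @RequestBody KnetminerRequest request
	)
	{
		// A real method would look up the data source named in the URL and
		// delegate the work to it, returning one of the KnetminerResponse
		// implementations, which Spring then serialises to JSON.
		throw new UnsupportedOperationException ( "This is a sketch only" );
	}
}
```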

An overview of this architecture is shown below in Figure 2. It represents the internal architecture of the two .war applications that a default instance of KnetMiner (eg, aratiny) consists of. This has to be coupled with the details of how such an instance is made to work with specific data and a specific user interface; as said above, this is explained in the Docker-related documentation.

Figure 2: KnetMiner architecture overview, where green processes have been created by us and dark blue processes are provided by SpringSource. The light blue process is provided by both SpringSource and our own API.