Language comparisons

Luni-4 edited this page Oct 23, 2020 · 7 revisions

Language Comparisons

We are interested in comparing some simple algorithms, each implemented in several programming languages, by means of static metrics. A static metric is obtained by parsing source code and extracting information from it, without relying on anything that can only be known at runtime.
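
To make the idea of a static metric concrete, here is a minimal sketch that uses Python's standard ast module to count functions and branching statements in a piece of source code without ever executing it. The function name and the two toy metrics are illustrative assumptions; they are not the metrics computed by rust-code-analysis.

```python
import ast


def count_functions_and_branches(source: str) -> dict:
    """Compute two toy static metrics by parsing the source, never running it."""
    tree = ast.parse(source)
    functions = 0
    branches = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions += 1
        elif isinstance(node, (ast.If, ast.For, ast.While)):
            branches += 1
    return {"functions": functions, "branches": branches}


# Hypothetical input: one function containing a single `if`.
SAMPLE = '''
def sign(x):
    if x < 0:
        return -1
    return 1
'''

print(count_functions_and_branches(SAMPLE))  # {'functions': 1, 'branches': 1}
```

The sample is only parsed, so the result would be the same even if the code could not run at all, which is the defining property of a static metric.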

All the metrics considered here have been computed with rust-code-analysis, a tool written in Rust.
It accepts either single files or entire directories as input, detects whether they contain code written in one of its supported languages, and outputs the resulting static metrics in various formats: plain text, json, yaml, toml, and a binary format called cbor.

In the next sections, we describe how the comparisons have been implemented and the choices made during development.

Why have we chosen software written in Rust?

Rust is an innovative programming language initially developed by Mozilla and currently maintained and improved by the Rust Foundation.
Its main goal is to allow everyone to build reliable and efficient software.

As for its main characteristics, Rust is a multi-paradigm programming language focused on performance and safety, especially safe concurrency.
It can be used on different architectures with little effort and provides good documentation for anyone who wants to learn the language or improve their knowledge of it.
In addition, it is widely used in industry: hundreds of companies around the world use Rust in production for fast, low-resource, cross-platform solutions. For example, Firefox, Dropbox, and Cloudflare use Rust. From startups to large corporations, from embedded devices to scalable web services, Rust is a great fit.

From our point of view, we have decided to adopt, and personally extend, a project written in Rust because of the advantages listed below:

  • Guarantees memory-safety and thread-safety, eliminating many classes of bugs at compile-time
  • Fast and memory-efficient
  • Few runtime checks
  • No garbage collector
  • Easily integrates with other programming languages
  • Useful and clear error messages
  • Good documentation

Why have we decided to modify and extend the json files produced by rust-code-analysis?

We have decided to modify the output produced by rust-code-analysis for the following reasons:

  • Rename the metrics whose names are not consistent with the ones used in the scientific literature
  • Change the type of data associated with a metric. Indeed, rust-code-analysis returns floating point values instead of integers because it aims at being as versatile as possible
  • Aggregate the metrics of every source file in a directory into a single json object that contains not only the result of this aggregation but also the metrics of each individual file. This additional data gives a more general view of the quality of a project written in a given programming language.
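
The first two points can be sketched as a small recursive pass over a parsed json document. This is only an illustration under stated assumptions: the RENAMES mapping, the metric names, and the sample values are hypothetical and are not taken from analyzer.py or from the actual output of rust-code-analysis.

```python
# Hypothetical mapping from tool-specific names to names used in the literature.
RENAMES = {"cyclomatic": "CC", "sloc": "SLOC"}


def normalize(metrics):
    """Recursively rename keys and turn whole-number floats into integers."""
    if isinstance(metrics, dict):
        return {RENAMES.get(key, key): normalize(value) for key, value in metrics.items()}
    if isinstance(metrics, list):
        return [normalize(value) for value in metrics]
    if isinstance(metrics, float) and metrics.is_integer():
        return int(metrics)
    # Non-integral floats, strings, and other values are left untouched.
    return metrics


# Made-up input shaped like a nested metrics object.
raw = {"cyclomatic": 3.0, "loc": {"sloc": 12.0}, "halstead": {"volume": 41.5}}
print(normalize(raw))  # {'CC': 3, 'loc': {'SLOC': 12}, 'halstead': {'volume': 41.5}}
```

Only floats with no fractional part are converted, so metrics that are genuinely real-valued keep their precision.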

FIXME: In our experiment, data aggregation is NOT considered, so I don't know if we should mention it among the main reasons

Compared Algorithms and Languages

We are comparing 9 simple algorithms, each written in 6 different languages. All implementations of these algorithms have been taken from the Energy-Languages repository (https://github.com/greensoftwarelab/Energy-Languages), which has been chosen because it is actively maintained and because its algorithms are adopted by a great variety of other projects for testing and benchmarking purposes.

The algorithms considered, sorted alphabetically, are:

  • binarytrees
  • fannkuchredux
  • fasta
  • knucleotide
  • mandelbrot
  • nbody
  • regexredux
  • revcomp
  • spectralnorm

All of them are contained in the Assets directory of our repository.

FIXME: pidigits can't be added because it is not implemented in Javascript and TypeScript, and the same goes for bubble_sort which is implemented in C, C++, and Rust only ---> should we remove them?

As for the programming languages, we were restricted to the 6 listed below, sorted alphabetically, because only a few languages are currently parsed by rust-code-analysis:

  • C
  • C++
  • JavaScript
  • Python
  • Rust
  • TypeScript

Json Structure of Computed Metrics

The types of metrics computed for each algorithm are described in the README of our repository (TODO: explain metrics in a detailed way within the paper).

We have set rust-code-analysis to export metrics as a json file. Then, through a Python script called analyzer.py, we have enriched the structure of each json file produced by rust-code-analysis so that it is possible to analyze the global metrics obtained by aggregating the metrics of the different files contained in the same directory.
In addition, this new json version includes a json array containing all the metrics computed for each file of the directory.
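
The enriched structure can be pictured with the following minimal sketch. The key names (global_metrics, files), the choice of summing every metric, and the sample values are assumptions made for illustration; analyzer.py may use a different layout and different aggregation rules for each metric.

```python
def aggregate(per_file):
    """Sum each metric across files and keep the per-file entries alongside."""
    totals = {}
    for entry in per_file:
        for name, value in entry["metrics"].items():
            totals[name] = totals.get(name, 0) + value
    # One object holding both the aggregate and the array of per-file metrics.
    return {"global_metrics": totals, "files": per_file}


# Made-up metrics for two files of the same directory.
files = [
    {"name": "a.c", "metrics": {"SLOC": 10, "CC": 2}},
    {"name": "b.c", "metrics": {"SLOC": 5, "CC": 1}},
]

result = aggregate(files)
print(result["global_metrics"])  # {'SLOC': 15, 'CC': 3}
print(len(result["files"]))      # 2
```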

For our comparisons, though, the additional global data computed by the analyzer.py script is not needed: the analyzed algorithms are processed one at a time and are completely independent of each other.

Output json files are contained in the Results directory of our repository.

Comparison script structure

The Python script that executes the various comparisons is called compare.py. To simplify the entire comparison process, we have introduced the concept of a configuration.
A configuration is nothing more than a pair of implementations of the same algorithm written in two different programming languages.
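
As an illustration of what a set of configurations could look like, the sketch below enumerates every unordered pair of languages for one algorithm. Whether compare.py considers all pairs or only a chosen subset is an assumption here, and the configurations helper is hypothetical.

```python
from itertools import combinations

# Languages listed on this page.
LANGUAGES = ["C", "C++", "JavaScript", "Python", "Rust", "TypeScript"]


def configurations(algorithm, languages=LANGUAGES):
    """Return (algorithm, language_a, language_b) for every unordered language pair."""
    return [(algorithm, a, b) for a, b in combinations(languages, 2)]


configs = configurations("nbody")
print(len(configs))  # 15
print(configs[0])    # ('nbody', 'C', 'C++')
```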

For each configuration, the script runs the following steps:

  1. Computes the metrics for the two files of a configuration by calling the analyzer.py script
  2. Loads the two json files from the Results directory and compares them, producing a json file of differences
  3. Deletes from the json file of differences all local metrics (the ones computed by rust-code-analysis for each subspace)
  4. Saves the json file of differences, now containing only global file metrics, in the Compare directory

The json file of differences is produced using a JavaScript program called json-diff that can be easily downloaded and built using the npm package manager.
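
Steps 2 and 3 can be sketched in pure Python as follows. This is a simplified stand-in for json-diff, not its actual algorithm or output format, and the "spaces" key used to hold the local metrics, as well as the sample values, are assumptions made for illustration.

```python
def diff(old, new):
    """Return {key: (old, new)} for values that differ, recursing into dicts."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            nested = diff(a, b)
            if nested:
                changes[key] = nested
        elif a != b:
            changes[key] = (a, b)
    return changes


def drop_local(differences):
    """Remove the per-subspace (local) entries, keeping only global file metrics."""
    return {key: value for key, value in differences.items() if key != "spaces"}


# Made-up reports for the two files of one configuration.
c_report = {"metrics": {"SLOC": 120, "CC": 14}, "spaces": [{"name": "main"}]}
rust_report = {
    "metrics": {"SLOC": 95, "CC": 14},
    "spaces": [{"name": "main"}, {"name": "run"}],
}

print(drop_local(diff(c_report, rust_report)))  # {'metrics': {'SLOC': (120, 95)}}
```

Metrics that are equal in both files, such as CC in this example, do not appear in the result, so the saved file contains only the global metrics that actually differ.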

Source Code Summary

Name: analyzer.py
Description: Runs rust-code-analysis in order to compute the various metrics, formats the output json files in a certain way, and saves them in a determined directory
Reference: https://github.com/SoftengPoliTo/SoftwareMetrics/blob/master/analyzer.py
Characteristics: Analyzes the parameters passed as input to evaluate their correctness, contains some debug code to detect implementation errors in an easier way

Name: compare.py
Description: Executes the comparisons between various language configurations
Reference: https://github.com/SoftengPoliTo/SoftwareMetrics/blob/master/compare.py
Characteristics: Makes the difference between two json files and outputs the resultant json file

Input Algorithms Summary

The implementations of the input algorithms, along with their code comments, can be found in the Energy-Languages repository, in the directories associated with the supported programming languages.

Name: binarytrees
Description: Allocate and deallocate many many binary trees
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/binarytrees.html#binarytrees

Name: fannkuch-redux
Description: Indexed-access to tiny integer-sequence
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/fannkuchredux.html#fannkuchredux

Name: fasta
Description: Generate and write random DNA sequences
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/fasta.html#fasta

Name: k-nucleotide
Description: Hashtable update and k-nucleotide strings
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/knucleotide.html#knucleotide

Name: mandelbrot
Description: Generate Mandelbrot set portable bitmap file
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/mandelbrot.html#mandelbrot

Name: n-body
Description: Double-precision N-body simulation
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/nbody.html#nbody

Name: regex-redux
Description: Match DNA 8-mers and substitute magic patterns
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/regexredux.html#regexredux

Name: reverse-complement
Description: Read DNA sequences - write their reverse-complement
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/revcomp.html#revcomp

Name: spectral-norm
Description: Eigenvalue using the power method
Reference: https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/spectralnorm.html#spectralnorm