No matter which format your tabular data is in, rows will import it,
automatically detect types and give you high-level Python objects so you can
start working with the data instead of trying to parse it. It is also
locale- and Unicode-aware. :)
Note: if you're using rows in some project please tell us! :-)
The library is composed of:
- A common interface to tabular data (the Table class)
- A set of plugins to populate Table objects (CSV, XLS, HTML, TXT, JSON -- more coming soon!)
- A set of common fields (such as BoolField, IntegerField) which know exactly how to serialize and deserialize data for each object type you'll get
- A set of utilities (such as field type recognition) to help working with tabular data
- A command-line interface so you can have easy access to the most used features: convert between formats, sum, join and sort tables.
Just import rows and relax.
- Simple, easy and flexible API
- Code quality
- Don't Repeat Yourself
Directly from PyPI:
pip install rows
But if you have a strong heart, install the bleeding edge version:
pip install git+https://github.com/turicas/rows.git@develop
or:
git clone https://github.com/turicas/rows.git
cd rows
python setup.py install
The csv, txt and json plugins are built in, but if you want to use another
one you need to explicitly install its dependencies, for example:
pip install rows[html]
pip install rows[xls]
You also need to install some dependencies to use the command-line
interface. You can do so by installing the cli extra requirement:
pip install rows[cli]
And - easily - you can install all the dependencies by using the all extra
requirement:
pip install rows[all]
You can create a Table object and populate it with some data
programmatically:
from collections import OrderedDict
from rows import fields, Table
my_fields = OrderedDict([('name', fields.UnicodeField),
                         ('age', fields.IntegerField),
                         ('married', fields.BoolField)])
table = Table(fields=my_fields)
table.append({'name': u'Álvaro Justen', 'age': 28, 'married': False})
table.append({'name': u'Another Guy', 'age': 42, 'married': True})
Then you can iterate over it:
def print_person(person):
    married = 'is married' if person.married else 'is not married'
    print u'{} is {} years old and {}'.format(person.name, person.age, married)

for person in table:
    print_person(person)  # namedtuples are returned
You'll see:
Álvaro Justen is 28 years old and is not married
Another Guy is 42 years old and is married
As you specified field types (my_fields) you don't need to insert data using
the correct types. Actually you can insert strings and the library will
automatically convert them for you:
table.append({'name': '...', 'age': '', 'married': 'false'})
print_person(table[-1]) # yes, you can index it!
And the output:
... is None years old and is not married
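To make the conversion above more concrete, here is a minimal standard-library sketch of how a field type could turn raw strings into native values. It is not the rows implementation: the helper names (deserialize_integer, deserialize_bool, deserialize_row) and the spellings accepted as true/false are assumptions made only for illustration.

```python
from collections import OrderedDict, namedtuple

# Hypothetical helpers: these names are NOT part of the rows API.
TRUE_VALUES = ('true', 'yes', '1')    # assumed spellings
FALSE_VALUES = ('false', 'no', '0')   # assumed spellings


def deserialize_integer(value):
    """Return an int, or None for empty/missing input."""
    if value is None or value == '':
        return None
    return int(value)


def deserialize_bool(value):
    """Return a bool, or None for empty/missing input."""
    if isinstance(value, bool):
        return value
    if value is None or value == '':
        return None
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError('cannot convert {!r} to bool'.format(value))


def deserialize_row(row, converters):
    """Apply one converter per field and return a namedtuple."""
    Row = namedtuple('Row', list(converters.keys()))
    return Row(*[convert(row.get(name))
                 for name, convert in converters.items()])


converters = OrderedDict([('name', str),
                          ('age', deserialize_integer),
                          ('married', deserialize_bool)])
person = deserialize_row({'name': '...', 'age': '', 'married': 'false'},
                         converters)
```

With this sketch, the empty string given for age becomes None and 'false' becomes False, which matches the behaviour shown in the output above.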
rows will help you import data: its plugins will do the hard job of
parsing each supported file format so you don't need to. They can also help
you export data. For example, let's download a CSV from the Web and import
it:
import requests
import rows
from io import BytesIO
url = 'http://unitedstates.sunlightfoundation.com/legislators/legislators.csv'
csv = requests.get(url).content # Download CSV data
legislators = rows.import_from_csv(BytesIO(csv)) # already imported!
print 'Hey, rows automatically identified the types:'
for field_name, field_type in legislators.fields.items():
    print '{} is {}'.format(field_name, field_type)
And you'll see something like this:
[...]
in_office is <class 'rows.fields.BoolField'>
gender is <class 'rows.fields.UnicodeField'>
[...]
birthdate is <class 'rows.fields.DateField'>
We can then work on this data:
women_in_office = filter(lambda row: row.in_office and row.gender == 'F',
                         legislators)
men_in_office = filter(lambda row: row.in_office and row.gender == 'M',
                       legislators)
print 'Women vs Men: {} vs {}'.format(len(women_in_office), len(men_in_office))
Then you'll see effects of our sexist society:
Women vs Men: 108 vs 432
Now, let's compare ages:
legislators.order_by('birthdate')
# ascending order: the earliest birthdate (oldest person) comes first
older, younger = legislators[0], legislators[-1]
print '{}, {} is older than {}, {}'.format(older.lastname, older.firstname,
                                           younger.lastname, younger.firstname)
The output:
Byrd, Robert is older than Stefanik, Elise
Note that native Python objects are returned for each row inside a
namedtuple! The library recognizes each field type and converts it automagically no matter which plugin you're using to import the data.
Each plugin has its own parameters (like index in import_from_html and
sheet_name in import_from_xls) but all plugins create a rows.Table object,
so they also have some common parameters you can pass to import_from_X. They
are:
- fields: an OrderedDict with field names and types (disables automatic detection of types).
- skip_header: ignore the header row. Only used if fields is not None. Default: True.
- import_fields: a list with field names to import (other fields will be ignored) -- fields will be imported in this order.
- samples: number of sample rows to use in the field type autodetection algorithm. Default: None (use all rows).
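To illustrate what the samples parameter trades off, here is a rough standard-library sketch of sample-based type detection. It is not the algorithm rows uses: the function detect_types, the type labels, the candidate order and the date format are all assumptions for illustration.

```python
from datetime import datetime
from itertools import islice


def _is_bool(value):
    return value.lower() in ('true', 'false')


def _is_int(value):
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    try:
        datetime.strptime(value, '%Y-%m-%d')  # assumed date format
        return True
    except ValueError:
        return False


# Candidate types, most specific first; 'text' is the fallback.
CHECKS = [('bool', _is_bool), ('int', _is_int), ('date', _is_date)]


def detect_types(header, rows, samples=None):
    """Guess a type label per column from the first `samples` rows.

    Hypothetical helper: samples=None inspects every row.
    """
    candidates = {name: [label for label, _ in CHECKS] for name in header}
    for row in islice(rows, samples):
        for name, value in zip(header, row):
            if value == '':
                continue  # empty cells do not rule out any type
            candidates[name] = [label for label, check in CHECKS
                                if label in candidates[name] and check(value)]
    return {name: (labels[0] if labels else 'text')
            for name, labels in candidates.items()}


# Made-up sample data
header = ['in_office', 'age', 'birthdate', 'gender']
data = [['true', '42', '1970-01-31', 'F'],
        ['false', '', '1984-07-02', 'M'],
        ['true', 'unknown', '1917-11-20', 'M']]

all_rows = detect_types(header, data)             # inspects every row
two_rows = detect_types(header, data, samples=2)  # stops after two rows
```

In this sketch, inspecting only two rows labels age as 'int' because the non-numeric value in the third row is never seen, while inspecting all rows falls back to 'text'. Fewer samples are faster but can guess wrong.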
If you have a Table object you can export it using any available plugin which
has the "export" feature. Let's use the HTML plugin:
rows.export_to_html(legislators, 'legislators.html')
And you'll get:
$ head legislators.html
<table>
<thead>
<tr>
<th> title </th>
<th> firstname </th>
<th> middlename </th>
<th> lastname </th>
<th> name_suffix </th>
<th> nickname </th>
Now you have finished the quickstart guide. See the examples folder for more
examples.
The idea behind plugins is very simple: you write a little piece of code which reads data from or writes data to some specific format and the library will do the other tasks for you. So writing a plugin is as easy as reading from/writing to the file format you want. Currently we have the following plugins:
- CSV: use rows.import_from_csv and rows.export_to_csv (dependencies are installed by default)
- TXT: use rows.export_to_txt (no dependencies)
- JSON: use rows.import_from_json and rows.export_to_json (no dependencies)
- HTML: use rows.import_from_html and rows.export_to_html (dependencies must be installed with pip install rows[html])
- XLS: use rows.import_from_xls and rows.export_to_xls (dependencies must be installed with pip install rows[xls])
More plugins are coming (like ODS, PDF, SQLite etc.) and we're going to re-design the plugin interface so you can create your own easily. Feel free to contribute. :-)
Each plugin has its own parameters (like encoding in import_from_html and
sheet_name in import_from_xls) but all plugins use the same mechanism to
prepare a rows.Table before exporting, so they also have some common
parameters you can pass to export_to_X. They are:
- export_fields: a list with field names to export (other fields will be ignored) -- fields will be exported in this order.
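The effect of export_fields can be pictured with a small standard-library sketch. It is only an illustration of "select these fields, in this order" applied to namedtuple rows, not the preparation code rows runs before exporting; select_fields and the sample data are hypothetical.

```python
from collections import namedtuple

# Made-up sample rows, shaped like the namedtuples a table yields
Person = namedtuple('Person', ['name', 'age', 'married'])
people = [Person('Another Guy', 42, True), Person('Someone Else', 28, False)]


def select_fields(rows, export_fields):
    """Hypothetical helper: keep only `export_fields`, in that order."""
    Selected = namedtuple('Selected', export_fields)
    return [Selected(*[getattr(row, name) for name in export_fields])
            for row in rows]


selected = select_fields(people, ['age', 'name'])
```

Here married is dropped and age comes before name, regardless of the original field order.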
rows exposes a command-line interface with the most common operations, such
as converting data between plugins and summing, sorting and joining Tables.
Run rows --help to see the available commands and take a look at
rows/cli.py. TODO.
Many fields inside rows.fields are locale-aware. If you have some data using
Brazilian Portuguese number formatting, for example (, as decimal separator
and . as thousands separator), you can configure this in the library and
rows will automatically understand these numbers!
Let's see it working by extracting the population of cities in Rio de Janeiro state:
import locale
import requests
import rows
from io import BytesIO
url = 'http://cidades.ibge.gov.br/comparamun/compara.php?idtema=1&codv=v01&coduf=33'
html = requests.get(url).content
with rows.locale_context(name='pt_BR.UTF-8', category=locale.LC_NUMERIC):
    rio = rows.import_from_html(BytesIO(html))
total_population = sum(city.pessoas for city in rio)
# 'pessoas' is the fieldname related to the number of people in each city
print 'Rio de Janeiro has {} inhabitants'.format(total_population)
The column pessoas will be imported as an IntegerField and the result is:
Rio de Janeiro has 15989929 inhabitants
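For intuition about what locale-aware parsing has to do, here is a small sketch that hard-codes the Brazilian Portuguese separators instead of reading them from the system locale. It is not how rows does it (the library relies on the locale configured through rows.locale_context, as shown above); the helper names and sample values are made up.

```python
def parse_localized_integer(text, thousands_separator='.'):
    """Hypothetical helper: parse '1.234.567' (pt_BR style) into 1234567."""
    return int(text.strip().replace(thousands_separator, ''))


def parse_localized_float(text, decimal_separator=',',
                          thousands_separator='.'):
    """Hypothetical helper: parse '1.234,56' (pt_BR style) into 1234.56."""
    cleaned = text.strip().replace(thousands_separator, '')
    return float(cleaned.replace(decimal_separator, '.'))


population = parse_localized_integer('1.234.567')  # made-up value
density = parse_localized_float('1.234,56')        # made-up value
```

The thousands separator has to be removed before the decimal separator is normalized; otherwise '1.234,56' would be misread.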
Available operations: join, transform and serialize.
TODO. See rows/operations.py.
Create the virtualenv:
mkvirtualenv rows
Install all plugins' dependencies:
pip install --editable .[all]
Install development dependencies:
pip install -r requirements-development.txt
Run tests:
make test
or (if you don't have make):
nosetests -dsv --with-yanc --with-coverage --cover-package rows tests/*.py
- odo
- OKFN's messytables
- OKFN's goodtables
- tablib
- pandashells (and pandas DataFrame)
- Lack of Python 3 support
- Create a better plugin interface so anyone can benefit from it
- Create TableSet
- Performance: the automatic type detection algorithm can cost time: it iterates over all rows to determine the type of each column. You can disable it by passing samples=0 to any import_from_* function, or reduce its cost by changing the number of sample rows (any positive number is accepted).
- See issue #31
rows uses semantic versioning. Note that this means we do not guarantee API
backwards compatibility on 0.x.y versions.
This library is released under the GNU General Public License version 3.