% Created 2019-01-17 Thu 09:46
% Intended LaTeX compiler: pdflatex
\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
\usepackage{grffile}
\usepackage{longtable}
\usepackage{wrapfig}
\usepackage{rotating}
\usepackage[normalem]{ulem}
\usepackage{amsmath}
\usepackage{textcomp}
\usepackage{amssymb}
\usepackage{capt-of}
\usepackage{hyperref}
\usepackage{minted}
\usepackage[ttscale=.875]{libertine}
\usepackage{sectsty}
\usepackage[authoryear]{natbib}
\sectionfont{\normalfont\scshape}
\subsectionfont{\normalfont\itshape}
\author{Sergio J. Rey}
\date{\today}
\title{PySAL: The First Ten Years}
\hypersetup{
pdfauthor={Sergio J. Rey},
pdftitle={PySAL: The First Ten Years},
pdfkeywords={},
pdfsubject={},
pdfcreator={Emacs 25.2.2 (Org mode 9.1.13)},
pdflang={English}}
\begin{document}
\maketitle
\section*{Introduction}
\label{sec:org8a5fc20}
I'm honored to have been selected to give the \emph{Spatial Economic Analysis
Plenary}.\footnote{This paper is based on the Spatial Economic Analysis Plenary Lecture
given at the 58th Congress of the European Regional Science Association, Cork,
Ireland, August 29, 2018.} I'd like to use this opportunity to reflect on my experiences as
co-founder and lead developer of the Python Spatial Analysis Library: PySAL. I
do so as it provides an interesting window on large-scale trends shaping
science that I think our community of regional scientists needs to be more
cognizant of. My goal in doing so is to uncover what I hope are useful insights
as to the importance of the role of community not only in the functioning of
academic fields but also in generating new science.

The talk will be a blend of two general themes. The first theme will focus on
PySAL and cover its history, the motivation for the project, its structure, and
how the library is used. With that point of departure, I will then
investigate some larger trends that I think are important. These
include the role of the open source movement and the opportunities and
challenges it affords regional science, the related development of the open
science concept and the challenges it is addressing, the importance of
community in fostering innovation in both science and academia, and
technological innovations that are growing out of open source and open science
which are fundamentally reshaping pedagogy as well as the way we do and report
research.

I recognize the structure of this talk is different from the normal
presentation of a new set of methods that one might expect in the literature.
But I hope that by blending the coverage of PySAL together with these
broader themes, there will be something in it for everybody.

This talk makes three contributions. First, through the lens of my
experiences on open source projects inside of academia, the underlying
institutional barriers that have isolated regional science from the open source
revolution are illuminated. Second, using the specific case of the Python
Spatial Analysis Library (PySAL), changes in the way research projects are
organized to reap the gains that the open source model offers are examined.
Finally, the paper identifies a number of opportunities for regional science to
reinvent itself to claim new territory in the modern era of big data.

I first provide an overview of PySAL in terms of the original motivation for
the project and its early history. The structure of PySAL and the key spatial
analytical functionality it offers are then discussed in Section 3. In Section
4, I shift the focus to explore some of the wider currents that my experiences
on PySAL and the engagement with open source have identified. These include the
rise of open source, open science, open education, and open community
movements. Section 5 then brings these trends home to our discipline of
regional science, where I offer some thoughts on the evolution of our community,
the generations of regional scientists within it, and the opportunities
that stand in front of the regional science community.
\section*{PySAL Origins}
\label{sec:org5c752ab}
PySAL began its life as many open source efforts do: it was scratching
an itch that its creators had. In this case the creators were myself and my
long-time colleague Luc Anselin. We had been working in spatial econometrics
using the matrix language GAUSS. Around 1993, I was exposed to the Python
language through my interactions with the Linux community. After a short span of
time with the Python language it became apparent that it offered some major
advantages over GAUSS as a language for supporting empirical research. Being a
matrix language, GAUSS was excellent for expressing the mathematical structure
of different econometric specifications, and for transcribing what one would
read in an econometrics textbook into practical code. However, in most empirical
research projects the estimation of models represents a tiny fraction of the
overall work effort. What often goes unsaid, but is painfully clear, is that
much of the effort in a research project is data gathering, munging,
harmonization, and transformation. And it was here where GAUSS fell flat on its
face and Python shone brightly.

Python excelled at string processing and in dealing with heterogeneous data.
And although, at the time, the numerical support in Python was much thinner
than it is today, there were really good third-party modules; for
example, Numeric was the precursor to today's workhorse NumPy. Numeric had a
syntax very similar to Matlab's and, in turn, to that of GAUSS, which made the
transition feasible. The gains Python offered in terms of the data integration and management
component of the research process more than offset the relatively weaker
numerical libraries. And since then the numerical stack in Python has become
excellent.

As such I jumped to Python very early. Python was not only a fantastic
replacement for GAUSS in terms of doing econometric work, but it had wide scope
for all kinds of purposes. I used it to write grading programs, generate
indices for textbooks, and even wrote an entire journal management system which
I used during my term as editor of the International Regional Science Review.
Python quickly became my tool of choice for new duties I had to carry out as an
academician.

With the move to Python and seeing the changes it led to in our research
productivity we started thinking of a collaborative project that built on the
work Luc and his students were doing in spatial regression in Python, a package
called pySpace, and my work on a package called STARS \citep{rey_stars:_2006-3}.
Although the two efforts were targeting different areas of the research stack,
spatial econometrics on one hand and space-time data analysis on the other
hand, the programs relied on some common constructs in the form of spatial
weights and certain types of representations and algorithms. We had been
implementing these separately on our two teams and we thought that, by
pooling our development resources and implementing these features in the best
way possible, to the best of our abilities, we could exploit economies of
scale. The idea was to build a library of advanced spatial analytical
functionality that each of our individual projects could then rely on to develop
specialized applications and that, more importantly, a wider community could
utilize.

And that is how PySAL was born.

A second motivation for PySAL stemmed from the recognition that Python was
starting to make inroads in other sciences, primarily bioinformatics,
astronomy, and physics, yet had seen only limited adoption in regional science
and geographic information science. Given the power of the language we had
experienced in our own use, we thought this was a missed opportunity and we
wanted to do what we could to fuel the uptake of Python in our home
disciplines. At the time, there was very limited Python code for doing
geospatial analysis. No shapefile readers were available, nor anything for
spatial data analysis or spatial econometrics.

We found Python to be an excellent language to facilitate rapid prototyping and
testing of ideas. It is a simple language to pick up but that does not mean it
is a toy language. Not only does it excel from a pedagogical perspective, but
it can be used to build applications that scale in an impressive fashion. For
example, Google makes heavy use of Python. There are also prominent scientific
projects that rely on Python. For example, the LIGO project\footnote{\url{https://www.ligo.caltech.edu/}}, which recorded
the collision of two black holes for the first time by detecting gravitational
waves, made heavy use of Python in its workflow.

The original birth of PySAL in the sense of our pooling code together to start
to build the library probably dates to sometime around 2007. But, as
usual, things took longer than planned. The first formal release of
PySAL was in July 2010, which came about as both Luc and I had moved to Arizona
State University. That move made it clear that geography matters because once
we were situated in the same institution it was much easier to organize the
project.

Initially we started with a six-month release cycle for PySAL, which aligned
very nicely with the academic calendar. We were able to keep to this schedule
for the first six years of the project. We are both very proud of that record,
which we maintained on top of
all the responsibilities that one has in academia. Looking back, I think this
was feasible because PySAL affords many opportunities for structuring
independent studies and thesis topics, as well as for organizing seminar and
studio courses. I think the same holds for open source projects in general, and
I would expect (and hope) that academia becomes home to more such projects.
We were able to leverage these opportunities to benefit our teaching and
research goals but also to help the project move forward. I'd like to think
that we saw this coming in the early days of the library, but it actually is
something that emerged with time.
\section*{PySAL Structure}
\label{sec:orgc598589}
The original design of PySAL was to have a single monolithic library with
subcomponents that addressed different types of spatial analysis. This
facilitated the easy installation of the package for end-users. Another guiding
principle to minimize the complications of the install was a fairly restrictive
use of dependencies. This ran counter to the normal development philosophy in
the open source community, where existing libraries that already provide the
needed functionality should be
relied upon. However, very often in the early days of the library
those dependencies were challenging to install, particularly for the target
audience of PySAL users who were not developers. What this meant for the
developers of PySAL is that we had to roll our own in many cases.

These two features of PySAL served us well in the early days of the project.
But as time has passed and the Python spatial analysis stack has matured, we are
now at a point where we can start to replace some of the Python implementations
that the early PySAL team did with more modern and specialized packages for
geoprocessing, file reading, and map projections.

Over time we have also come to recognize that the single monolithic
architecture of the library, while easing installation, had a number of
unintended side effects on the developers. Many of the features in the library
were buried deep in lower-level packages. This hindered discoverability of
those packages. This meant that the developers of those packages were not
getting the recognition that they deserved. This is particularly important in an
academic environment where the time dedicated to making these contributions was
essentially ignored in tenure and promotion cases. Moreover, the limited
discoverability also impacted end users who were not aware of the
functionality.

We recently decided to refactor the library to address these two limitations.
This has been a major change in the library, taking on the order of two years
to implement. The refactoring is recasting PySAL as a meta-package which brings
together a federation of spatial analytical modules. This has several
advantages. Users who may want to focus on, say, spatial econometrics
may have no need for all of PySAL, so now they can install spreg as its own
package. The refactoring also increases discoverability as spreg is its own
active stand-alone package, and is no longer buried deep inside PySAL. With
this increased visibility, adoption increases, leading to greater recognition
for the developers as well as more feedback from users and, ultimately,
improvements to the package.

From a development perspective, the refactoring also increases the speed at
which we are able to release new functionality in the individual packages.
Previously, under the monolithic model, anytime an enhancement was
added to one piece of PySAL a large number of integration tests would be run to
ensure that no side effects were triggered by the change. These tests could
take on the order of 20 minutes, which tended to be frustrating
to the developers. Now, with the new packaging model, the developers can run
tests that are focused only on their package at hand, and these run much more
quickly.\footnote{There is a meta-package that is responsible for testing the
integration of all the PySAL packages.} This increases the cadence of the development for both
the individual packages and the meta-package.

The other benefit of this model is that end users who still want everything in
the PySAL federation can install the meta-package and should notice no
difference from their use of the monolithic PySAL package. In other words, we
support two different ways for users to interface with the library: users can
get everything in one shot through the meta-package, or they can go the à la
carte route and pick specific packages to mix and match in support of a
specialized workflow.

Since adopting this model, we have also seen growth in the number
of packages coming into the system. We are pleased to see that the
refactoring has lowered the onboarding cost for new developers.

Prior to the refactoring there was another major shift in the PySAL library. We
converted from Python 2 to Python 3 over the course of about a year. Earlier the
Python programming language had released a 3.0 version which was not backwards
compatible with Python 2. Our approach was to develop in Python 2 but write
converter scripts which would automatically refactor the codebase to Python 3
if a user required Python 3. This was a major effort to implement, and was
actually a short-term solution, and a painful one at that. While it supported
users who switched to Python 3, it did not allow us to fully exploit the new
features in Python 3, as the code converted from Python 2 had to remain backwards
compatible. In other words, there are things that one can do in Python 3 that
one cannot do in Python 2, so in order to maintain Python 2 backwards compatibility
we were not able to take advantage of these Python 3 enhancements. With the
refactoring, we have decided to make future versions of PySAL Python 3 only. Users
requiring support for Python 2 will still be able to use legacy PySAL, which
will be supported, but only for bug fix releases.
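
To make this constraint concrete, here is a small illustrative sketch (not taken from the PySAL codebase) of a few constructs that are valid in Python 3 but rejected by a Python 2 interpreter. The function \texttt{describe} and its arguments are hypothetical and exist only for this example.

\begin{minted}{python}
def describe(values, *, digits=2):
    """Summarize a sequence using constructs that Python 2 cannot parse."""
    # The bare * above makes digits a keyword-only argument (Python 3 only).
    first, *rest = values  # extended iterable unpacking (Python 3 only)
    mean = sum(values) / len(values)  # / is true division in Python 3
    # f-strings require Python 3.6 or later.
    return f"first={first}, n_rest={len(rest)}, mean={mean:.{digits}f}"


print(describe([1, 2, 4]))
print(describe([1, 2], digits=1))
\end{minted}

Source code that had to remain valid Python 2 before conversion could not be written this way, which is the cost described above.
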

The reorganization of PySAL is along four groups of packages that each address a
certain type of spatial analysis: explore, model, viz, and lib. Lib is the core
package and it is here where we handle file I/O, spatial weights, and
geoprocessing. All of the other packages in the PySAL ecosystem import, and
are thus dependent upon, lib.
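
To fix ideas about what a spatial weights object encodes, the following is a minimal sketch in plain Python of rook contiguity on a regular lattice, followed by row standardization. It is an illustration under simplifying assumptions rather than the implementation in lib; the names \texttt{rook\_neighbors} and \texttt{row\_standardize} are hypothetical.

\begin{minted}{python}
def rook_neighbors(n_rows, n_cols):
    """Map each lattice cell id to the ids of the cells sharing an edge with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            adjacent = []
            if r > 0:
                adjacent.append(cell - n_cols)  # cell above
            if c > 0:
                adjacent.append(cell - 1)  # cell to the left
            if c < n_cols - 1:
                adjacent.append(cell + 1)  # cell to the right
            if r < n_rows - 1:
                adjacent.append(cell + n_cols)  # cell below
            neighbors[cell] = adjacent
    return neighbors


def row_standardize(neighbors):
    """Weight each neighbor of i by 1 / (number of neighbors of i)."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items()}


neighbors = rook_neighbors(3, 3)
weights = row_standardize(neighbors)
print(neighbors[4])  # the center cell of a 3 x 3 lattice
print(weights[0])  # a corner cell
\end{minted}

Storing only the neighbors of each observation, rather than a dense matrix, keeps the structure sparse, which matters because most pairs of observations are not neighbors.
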

Under the explore family of packages we have ESDA, which supports exploratory
spatial data analysis in the form of global and local tests for spatial
autocorrelation as well as rate smoothing. GIDDY, for GeoSpatial Distribution
Dynamics, implements classic Markov and spatial Markov models for longitudinal
spatial data along with measures for spatial income mobility and other types of
intra-distributional change. In addition, explore includes spaghetti, which is
for spatial analysis on networks, and pointpats, which supports the
statistical analysis of planar point patterns.
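
As an illustration of the kind of global statistic mentioned above, the sketch below computes Moran's $I$,
\[
I = \frac{n}{S_0} \, \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2},
\]
where $z_i = y_i - \bar{y}$ and $S_0 = \sum_i \sum_j w_{ij}$. It is written in plain Python for exposition, is not the ESDA implementation, and omits inference entirely; the function name \texttt{morans\_i} and the dictionary layout of the weights are assumptions made for this example.

\begin{minted}{python}
def morans_i(y, weights):
    """Global Moran's I for values y and weights given as {i: {j: w_ij}}."""
    n = len(y)
    mean = sum(y) / n
    z = [value - mean for value in y]  # deviations from the mean
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(
        w * z[i] * z[j] for i, row in weights.items() for j, w in row.items()
    )
    return (n / s0) * cross / sum(d * d for d in z)


# Four regions arranged along a line, with binary contiguity weights.
chain = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}

print(morans_i([1, 2, 3, 4], chain))  # smooth trend: positive value
print(morans_i([1, 4, 1, 4], chain))  # alternating values: negative value
\end{minted}

Similar neighboring values push the statistic above its expectation, while a checkerboard-like alternation pushes it below.
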

The viz group of packages includes splot, a new package providing a common
application programming interface (API) for lightweight visualization
functionality on top of the other PySAL packages. mapclassify is a second
component of the visualization layer that implements a large number of
classification schemes for choropleth mapping, and also supports updating and
streaming type data. Rounding out the viz group is legendgram, a novel approach
to displaying and representing the classification underlying a choropleth
map.
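
To show what a classification scheme for choropleth mapping does, here is a minimal sketch of quantile classification in plain Python. It uses a nearest-rank rule as a simplifying assumption, so the break values may differ from those that mapclassify reports; the names \texttt{quantile\_breaks} and \texttt{classify} are hypothetical.

\begin{minted}{python}
import bisect


def quantile_breaks(values, k):
    """Upper bounds of k classes holding roughly equal numbers of observations."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for q in range(1, k + 1):
        rank = -(-q * n // k)  # ceil(q * n / k) using integer arithmetic
        breaks.append(ordered[rank - 1])
    return breaks


def classify(values, breaks):
    """Assign each value the index of the first class whose upper bound covers it."""
    return [bisect.bisect_left(breaks, v) for v in values]


rates = [3, 1, 8, 5, 2, 7, 4, 6]
breaks = quantile_breaks(rates, 4)
print(breaks)
print(classify(rates, breaks))
\end{minted}

Each class index would then be mapped to a color, so the choice of breaks directly shapes the pattern a reader sees in the map.
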

The third cluster of packages falls under the model heading. The workhorse here
is spreg, which implements modern methods of spatial econometrics and has been a
key part of PySAL from day one. As part of the refactoring we have seen much
growth in the model space, as newly added packages include mgwr,
implementing multiscale geographically weighted regression; spint, for
estimating spatial interaction models, such as the production-constrained or
consumption-constrained gravity models; spvcm, for spatially-correlated
multilevel models; and spglm, a package for fitting sparse generalized linear models
(GLMs).
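
For readers unfamiliar with the production-constrained gravity model mentioned above, one common form is
\[
T_{ij} = A_i O_i D_j d_{ij}^{-\beta}, \qquad A_i = \Bigl( \sum_j D_j d_{ij}^{-\beta} \Bigr)^{-1},
\]
where $O_i$ is the outflow from origin $i$, $D_j$ the attractiveness of destination $j$, and $d_{ij}$ the separation between them. The sketch below only evaluates this identity for a given $\beta$; it does not perform the estimation that spint carries out, and the power form of distance decay and the function name are assumptions made for illustration.

\begin{minted}{python}
def production_constrained_flows(outflows, attractiveness, distance, beta):
    """Predicted flows T[i][j] whose row sums reproduce the given outflows."""
    flows = []
    for i, o_i in enumerate(outflows):
        # Pull of each destination on origin i under power distance decay.
        pull = [d_j * distance[i][j] ** -beta
                for j, d_j in enumerate(attractiveness)]
        a_i = 1.0 / sum(pull)  # balancing factor for origin i
        flows.append([a_i * o_i * p for p in pull])
    return flows


outflows = [100.0, 50.0]
attractiveness = [1.0, 2.0, 1.0]
distance = [[1.0, 2.0, 3.0], [2.0, 1.0, 2.0]]
flows = production_constrained_flows(outflows, attractiveness, distance, 1.0)
for row in flows:
    print([round(t, 2) for t in row])
\end{minted}

The balancing factor guarantees that the predicted flows leaving each origin add up to its observed outflow, which is what ``production-constrained'' refers to.
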

Downstream packages that want to use pieces, but not all, of PySAL now have much
more flexibility. The most prominent case of this is geopandas\footnote{\url{https://geopandas.org}},
which, prior to the refactoring, would import all of PySAL to have access to
the map classification routines. Now, as part of the refactoring, the larger
import is no longer necessary and geopandas can instead import mapclassify
directly so that the dependency footprint is much thinner.

The refactoring has been largely successful, but there are some changes that
longtime users of PySAL should be aware of. First, the region module, which
implemented classical and spatially constrained clustering, is no longer part of
the meta-package. This is due to the development of the standalone package now
called region, which has a heavy set of dependencies that were produced as part
of a Google Summer of Code project. For the first meta release, we have not
included region, but users can still install it separately. We have plans to
refactor region so that it can be integrated into the PySAL meta-package more
easily.
\section*{Wider Currents}
\label{sec:org701c4d3}
PySAL has reached the state that it has because of being embedded in a wider set
of developments. There are three currents that have benefited the project. These
pertain to the rise of the open-source movement, the development of the open
science movement, and the increasing recognition of the importance of scholarly
community.

The open source revolution has fundamentally impacted not only science but most
aspects of society. Although we may not directly recognize it, the regional
science community has benefited from the open source movement. Two
freedoms underlie the notion of free software. The first is the so-called ``free
beer'' freedom. This means that there is no monetary cost involved in acquiring
software: it is available for anybody who can download it. This is
particularly important to universities given tight budgets. But this also has
profound pedagogical benefits in that students are now no longer tethered to a
lab computer holding licensed software. They can now install the software on
their own personal computers and time-shift their activity which facilitates
greater engagement.

The second, and arguably the more important, freedom is the ``free as in free
speech'' freedom. In general terms, open source licenses allow users to
modify the code directly. From a scientific perspective, this is critically
important: as we will see later, the open science movement stresses
the importance of replication and reproducibility, which become all but
impossible without access to the scientific source code. The free speech aspect
also has important implications for pedagogy in that now users can inspect the
source code and demystify the operation of an algorithm. This form of learning
provides for a deeper engagement of a student with the underlying computational
concepts.

The ability to replicate and reproduce previous research is fundamental to
the advancement of science. But building on the shoulders of giants is not
possible unless we have access to the shoulders. A slight variation on the
theme is that open science, by providing access to the source code and data
underlying previous studies, can accelerate scientific discovery. Those
source materials can now be acquired in a much more expeditious fashion, which fuels
subsequent studies. This does require a mind shift on the part of
the scientist, who must take the extra steps to release their software
and data under open source terms.

It is not only our research production functions that can benefit from adopting
open science practices, but our educational efforts can also be enhanced if we
borrow from open science and open education developments. In teaching regional
science, there is so much duplication as individual scholars produce their
courses as part of their teaching mission. Everyone goes it alone and there
is limited sharing of materials. At best, perhaps syllabi are exchanged and
maybe the occasional PowerPoint is borrowed, but there are no formal mechanisms
or any sense of infrastructure to facilitate the sharing. This is changing in
other disciplines, where entire courses, from lecture notes to problem sets, are
increasingly being posted on open source GitHub repositories. Releasing these
materials under Creative Commons licenses works to protect the intellectual
contributions of the original authors, and these are very flexible licenses in
the sense that they allow for mashing up of the materials with new materials
and derivative works.

This type of model is very exciting if one thinks about being able to spend
time on an enhancement and building upon the shoulders of a great teacher
rather than having to reinvent many teaching wheels. Our courses would be much
better if we could start to think about community-based educational
materials.

The third larger current that PySAL has swum in reflects the growing emphasis
placed on the health of a community associated with a project. Here questions
about the exclusionary nature of disciplines have been at the forefront of many of
the open-source meetings that I've attended in the past. This has been a highly
educational process for me, as I was largely ignorant about the cost to our
science of explicit and implicit biases. These biases can lead to different
types of barriers to potential community membership. Some of these barriers
have been long-standing and are not easily removed, but with sincere and
prolonged effort, I have seen other communities make major strides in
redressing these barriers.
\section*{Bringing it home to regional science}
\label{sec:org2bffd42}
The academic world that I grew up in as a young regional scientist is
substantially different than what is emerging now. And part of that emergence
is due to the rise of the open-source movement and the changes it has induced in
the way science is being organized. There are some opportunities here for
regional science.

One key distinction between academia when I was a junior professor and now is
that the reward structure is changing. It was very difficult to get recognition
for software development contributions. What mattered for promotion cases were
journal articles, grants, and contracts. As such there was no incentive to
pursue those activities and, unsurprisingly, scientific software for regional
science (and all science, for that matter) was underprovided. All the same, I
worked on PySAL and related open source projects because I saw the benefits
for my own research agenda that these allowed for. And I was
convinced that these were important activities for me to spend my time on.

Others at the time felt the same way. Jim LeSage was actually doing open
source before the term was coined. By releasing his spatial econometrics
toolbox\footnote{\url{https://www.spatial-econometrics.com/}} openly to researchers, Jim played a major role in stimulating the growth
of spatial econometrics.

Paul Waddell's work on the UrbanSim project \citep{waddell_urbansim:_2002} is
another exemplar of first generation open source regional science. I remember
meeting with Paul at the 2001 WRSA meeting in Palm Springs and discussing the
issues involved in moving UrbanSim from Java to Python. That switch to
Python and the explicit open source model for UrbanSim have been major
contributors to the project's success. Clearly, it is an excellent modeling
system which is important for its scientific application, but the open source
dimension has allowed others to be engaged in its enhancement and evolution, as
well as to help drive the adoption of UrbanSim throughout the world.

It is interesting to contrast the environments this first generation of open
source regional scientists faced with those that are emerging now. To do so, I
highlight the work of three members of this new generation: Dani Arribas-Bel,
Levi Wolf, and Geoff Boeing. These individuals are prominent developers on
high-profile open source projects and have been very creative in positioning
their open source contributions into their formal academic profiles.

Dani has been very generous in posting his Geographic Data Science course\footnote{\url{https://darribas.org/gds16/}}
materials on his website and releasing them under Creative Commons
licenses\footnote{\url{https://creativecommons.org/}}. This is incredibly helpful to individuals who are developing
similar courses in that those materials are available and do not have to be
reinvented. Moreover, it is possible to contribute enhancements back to Dani's
courses, resulting in a stronger set of materials for future iterations of the
course. Dani has been a core developer on the PySAL project, where he was
introduced to open source practices, which he creatively adapted to his teaching
duties.

Levi Wolf is also a core member of the PySAL development project and has made
major contributions not just to PySAL but to other packages in the urban and
regional software ecosystem. Chief among these is CenPy\footnote{\url{https://github.com/ljwolf/cenpy}}, which is an
open-source package that allows a researcher to interface with the Census API.

Geoff Boeing is a third prominent member of this new generation who has
developed the impressive package OSMnx\footnote{\url{https://github.com/gboeing/osmnx}} that facilitates the
construction, analysis, and visualization of street networks from
OpenStreetMap.\footnote{\url{https://www.openstreetmap.org}} It is interesting to note that Geoff worked with Paul
Waddell at Berkeley.

In all three cases, we see examples of young regional and urban scientists
being exposed to first generation open source projects and then blazing new
paths by placing their open source contributions as first class citizens in
their evaluation and tenure cases. I think we are fortunate that these
individuals are doing this, and that academic institutions are starting to
recognize and reward these contributions. As this continues to grow, I think we
as a community can only benefit as it will bring more members into the
discipline as well as improve existing packages and lead to new tools.

These high-profile packages and contributions have brought Dani, Levi, and
Geoff increasing recognition as emerging leaders in open source spatial and
urban analysis. I'm very happy to see these developments as it was never
apparent to me that open source would actually succeed in the way it has inside
academia. I distinctly remember being told by senior colleagues when I was
working on earlier versions of PySAL and the package STARS, that developing
tools that are used for research is not research. ``You need to be writing
papers.'' My colleagues were being brutally honest with me and were trying to
rein in my idealism so that my efforts were more aligned with the realities of
promotion and tenure cases at the time. And it is important to note here that
I was really fortunate to be at places where most colleagues were supportive of
this work. I often wonder how many of my generation were not so fortunate and
did not have the possibility of using some of their research time to do this
kind of work.

Moreover, the climate surrounding open source has radically changed since I was
starting in academia. Back then, Microsoft was openly hostile to open
source.\footnote{See the so-called Halloween Documents \url{http://www.catb.org/esr/halloween/}.} Contrast this with Microsoft's recent development of the
Windows Subsystem for Linux, which allows users to run native Linux command-line tools directly
on Windows.

Indeed, I'm optimistic that the tide has turned and we will see more open
source regional scientists as we move forward. Being a geographer, however, I
can't fail to notice that there is spatial heterogeneity in this uptake. I was
struck by the reception of this talk at the ERSA conference. There was a
genuine enthusiasm in the audience for these ideas. While it is the first time
I've given this talk, I certainly have mentioned some of these themes elsewhere
in papers \citep{Rey17_codeastext,rey_open_2014,Rey_2009} and at conferences in the
States. It could be that the difference between the excited response at ERSA
and the more subdued response in the States reflects differences in the
level of adoption of open source practices in the two regions, with adoption
being relatively more advanced in the States and thus the ideas more widely
accepted. If true, this would suggest European regional science is ripe for an
enhanced engagement with open source practices.

I also want to point out that PySAL itself was first announced to the formal
academic world in a regional science journal \citep{rey_pysal:_2007-3}. Yet,
the uptake of the library has been much more widespread in the GIScience world.
I think this reflects the latter being more engaged with developments in
machine learning and data science more broadly, while regional science as a
field has been fairly slow to explore these areas.
What can we as an academic community do to enhance the adoption of open source
practices? We can do a lot. Some of it we already are doing, and I think we
should simply continue and enhance these efforts. For example, the NARSC
meetings have been offering regular workshops on PySAL and other packages
for the past five years. The number of people taking these has
continued to expand and, increasingly, the participants are asking for
multi-day workshops. So the demand is clearly there. This suggests that we
should be thinking about more offerings at the regional and international
regional science meetings.
The second thing that we can do is to be more welcoming to software development
pieces in our regional science journals. As I said, we have already been doing
this; in fact, the first paper describing PySAL was published in the Review of
Regional Studies in 2007. But this has been a rare exception, and since then
academicians working on open source software have been looking to different
outlets to report these contributions, such as the Journal of Open Source
Software.\footnote{\url{https://joss.theoj.org/}} While these outlets do provide the authors with academic credit for
their contributions, their impact on the field of regional science is limited
since these journals are not widely read by academic regional scientists. I
would think that our home journals could see this as an opportunity for new
types of materials and for reinventing their branding in the new era of
data science and machine learning.
It is clear that the phrase ``data is the new oil'' has captured the imagination
and spirit of the data science era. And while it is true that data is incredibly
valuable to internet companies, I would argue that it is analytics that
increases the value of that data. Put another way, if data are the new oil then
analytics are the new refineries. And it is here where regional science has huge
opportunities. We are all about analytics in the form of models. But I think we
need to re-brand ourselves: we bring increasing rigor to the analysis of data in
the urban and regional problem domain. Companies are starting to rush in
to address this market. However, their underlying analytic frameworks are often
proprietary (and therefore of unknown scientific validity), simplistic, or both.
A prominent example of where we are missing opportunities is the
fantastic visibility of the Gapminder\footnote{\url{https://www.gapminder.org}} project by
Hans Rosling and colleagues, which came up with innovative visualizations of
international inequality at the country scale. Contrast this with the massive
amount of work that has been done on the question of regional inequality but
the complete lack of any high-profile visualization drawing the public's
attention to the critical nature of this issue. I think this is low-hanging
fruit that could be grasped by a group of regional scientists to help put
us back on the radar screen.
I'm glad to report that I am not Don Quixote here when it comes to the notion
of the importance of analytics. My colleague Alan Murray in his 2017 WRSA
presidential address \citep{Murray_2017_ars} actually spoke about the need for
regional analytics. We have the raw material; it is a matter of organizing the
community around these initiatives. I'm fully confident we are capable of doing
this and I'm very optimistic that we will do so and create an enhanced and
more relevant regional science.
\bibliography{../../bibliography/references}
\bibliographystyle{apa}
\end{document}