Replies: 2 comments
-
I'm happy to (discuss how to) align my implementation with existing ones.
Aleph will pick this up someday.
On Fri, Oct 7, 2022 at 6:17 PM Friedrich Lindenberg <***@***.***> wrote:
In some scenarios where followthemoney is used, there's a need to store
contextual information at a per-value level. For example, in OpenSanctions,
we want to record at what time we first saw an entity, and in which dataset
it was observed.
This has led me to adopt a statement-based data model as the core for
OpenSanctions, in which each row looks like this
<https://github.com/opensanctions/opensanctions/blob/main/opensanctions/core/statements.py#L22-L44>.
These statements can be made by breaking down an entity proxy, and can then
be re-constituted into an entity proxy.
Now, there's been a bit of a proliferation of this, in that @dchaplinsky
<https://github.com/dchaplinsky> has adopted it for thebeast
<https://github.com/dchaplinsky/thebeast/blob/main/thebeast/dump/statements.py>
and @simonwoerpel <https://github.com/simonwoerpel> has adopted it for FtG
<https://www.followthegrant.org/>, where he's got 500mn statements in
Clickhouse using this code
<https://github.com/simonwoerpel/ftm-columnstore/blob/main/ftm_columnstore/statements.py>.
Meanwhile I'm in the process of generalising this inside of nomenklatura
<opensanctions/nomenklatura#75>.
In short: this is turning into a mess. So I'm wondering if we should have
a followthemoney.statement module which contains a baseline
implementation of a statement data model and an EntityProxy based
internally on statements. It might go a long way in keeping these branches
of FtM interoperable and also provide extra functionality to other future
users.
Two concerns with this:
- It is of absolutely no use to Aleph, unless Aleph radically alters
its ways and decides to do data integration across collections
- It will introduce some sort of notion of a dataset (that's a key
column for any statement), which so far FtM has always avoided.
-
FWIW I've now properly productised an implementation of this in nomenklatura: https://github.com/opensanctions/nomenklatura/blob/master/tests/statement/test_entity.py
-
In some scenarios where followthemoney is used, there's a need to store contextual information at a per-value level. For example, in OpenSanctions, we want to record at what time we first saw an entity, and in which dataset it was observed.
This has led me to adopt a statement-based data model as the core for OpenSanctions, in which each row looks like this: https://github.com/opensanctions/opensanctions/blob/main/opensanctions/core/statements.py#L22-L44. These statements can be made by breaking down an entity proxy, and can then be re-constituted into an entity proxy.
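To make the round trip concrete, here is a minimal sketch in plain Python. It is not the OpenSanctions or nomenklatura implementation: the `Statement` fields and the `to_statements` / `from_statements` helpers are hypothetical names chosen for illustration, and the entity is a plain dict in the id/schema/properties shape rather than a real entity proxy.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Statement:
    """One property value plus per-value context (hypothetical field set)."""

    entity_id: str
    schema: str
    prop: str
    value: str
    dataset: str
    first_seen: str  # ISO 8601 timestamp, kept as a string for simplicity


def to_statements(entity: dict, dataset: str, seen: str) -> List[Statement]:
    """Break an entity dict into one statement per property value."""
    return [
        Statement(entity["id"], entity["schema"], prop, value, dataset, seen)
        for prop, values in entity["properties"].items()
        for value in values
    ]


def from_statements(statements: Iterable[Statement]) -> dict:
    """Re-assemble one entity dict from statements that share an entity ID."""
    stmts = list(statements)
    if not stmts:
        raise ValueError("no statements given")
    if len({s.entity_id for s in stmts}) != 1:
        raise ValueError("statements belong to more than one entity")
    properties = defaultdict(list)
    for stmt in stmts:
        # The same value may be asserted by several datasets; keep it once.
        if stmt.value not in properties[stmt.prop]:
            properties[stmt.prop].append(stmt.value)
    return {
        "id": stmts[0].entity_id,
        "schema": stmts[0].schema,
        "properties": dict(properties),
    }


entity = {
    "id": "p1",
    "schema": "Person",
    "properties": {"name": ["Jane Doe"], "nationality": ["de", "fr"]},
}
stmts = to_statements(entity, dataset="list_a", seen="2022-10-07T18:17:00")
rebuilt = from_statements(stmts)

# A second (made-up) dataset describing the same entity merges cleanly.
other = to_statements(
    {
        "id": "p1",
        "schema": "Person",
        "properties": {"name": ["Jane Doe"], "birthDate": ["1970-01-01"]},
    },
    dataset="list_b",
    seen="2022-10-08T09:00:00",
)
merged = from_statements(stmts + other)
name_sources = sorted({s.dataset for s in stmts + other if s.prop == "name"})
```

Because every value keeps its own `dataset` and `first_seen`, a question like "which datasets assert this name?" becomes a filter over the statements, which a flat entity cannot answer.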
Now, there's been a bit of a proliferation of this, in that @dchaplinsky has adopted it for thebeast and @simonwoerpel has adopted it for FtG, where he's got 500mn statements in Clickhouse using this code (https://github.com/simonwoerpel/ftm-columnstore/blob/main/ftm_columnstore/statements.py). Meanwhile I'm in the process of generalising this inside of nomenklatura.
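In short: this is turning into a mess. So I'm wondering if we should have a followthemoney.statement module which contains a baseline implementation of a statement data model and an EntityProxy based internally on statements. It might go a long way in keeping these branches of FtM interoperable and also provide extra functionality to other future users.
Two concerns with this:
- It is of absolutely no use to Aleph, unless Aleph radically alters its ways and decides to do data integration across collections
- It will introduce some sort of notion of a dataset (that's a key column for any statement), which so far FtM has always avoided.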