Discussion: include statement data model in core ftm? #882

pudo · 2022-10-07T15:17:38Z

pudo
Oct 7, 2022
Maintainer

In some scenarios where followthemoney is used, there's a need to store contextual information at a per-value level. For example, in OpenSanctions, we want to record at what time we first saw an entity, and in which dataset it was observed.

This has led me to adopt a statement-based data model as the core for OpenSanctions, in which each row looks like this (https://github.com/opensanctions/opensanctions/blob/main/opensanctions/core/statements.py#L22-L44). These statements can be made by breaking down an entity proxy, and can then be re-constituted into an entity proxy.
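To make the round trip concrete, here is a minimal sketch in plain Python. It is not the OpenSanctions or nomenklatura implementation: the `Statement` fields and the helper names `to_statements` and `to_entity` are assumptions chosen for illustration, and the entity is a plain dict with `id`, `schema` and `properties` keys rather than a real entity proxy.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Statement:
    # Hypothetical columns for illustration, not the real OpenSanctions schema.
    entity_id: str
    schema: str
    prop: str
    value: str
    dataset: str      # which dataset the value was observed in
    first_seen: str   # when the value was first observed


def to_statements(entity: dict, dataset: str, seen: str) -> Iterator[Statement]:
    """Break an entity dict into one statement per property value."""
    for prop, values in entity["properties"].items():
        for value in values:
            yield Statement(entity["id"], entity["schema"], prop, value, dataset, seen)


def to_entity(statements: Iterable[Statement]) -> dict:
    """Re-assemble statements that share an entity ID into one entity dict."""
    stmts = list(statements)
    if not stmts:
        raise ValueError("cannot build an entity from zero statements")
    properties: Dict[str, List[str]] = {}
    for stmt in stmts:
        values = properties.setdefault(stmt.prop, [])
        if stmt.value not in values:  # the same value may come from several datasets
            values.append(stmt.value)
    return {"id": stmts[0].entity_id, "schema": stmts[0].schema, "properties": properties}


entity = {
    "id": "person-1",
    "schema": "Person",
    "properties": {"name": ["Jane Doe"], "nationality": ["de", "fr"]},
}
statements = list(to_statements(entity, dataset="example_list", seen="2022-10-07"))
rebuilt = to_entity(statements)
```

Each property value becomes its own row carrying the dataset and first-seen context, which is the per-value information an entity on its own cannot hold.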

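Now, there's been a bit of a proliferation of this, in that @dchaplinsky has adopted it for thebeast (https://github.com/dchaplinsky/thebeast/blob/main/thebeast/dump/statements.py) and @simonwoerpel has adopted it for FtG (https://www.followthegrant.org/), where he's got 500 million statements in ClickHouse using this code (https://github.com/simonwoerpel/ftm-columnstore/blob/main/ftm_columnstore/statements.py). Meanwhile I'm in the process of generalising this inside of nomenklatura (opensanctions/nomenklatura#75).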

In short: this is turning into a mess. So I'm wondering if we should have a followthemoney.statement module which contains a baseline implementation of a statement data model and an EntityProxy based internally on statements. It might go a long way in keeping these branches of FtM interoperable and also provide extra functionality to other future users.

Two concerns with this:

- It is of absolutely no use to Aleph, unless Aleph radically alters its ways and decides to do data integration across collections.
- It will introduce some sort of notion of a dataset (that's a key column for any statement), which so far FtM has always avoided.

dchaplinsky · 2022-10-07T20:17:22Z

dchaplinsky
Oct 7, 2022

I'm happy to (discuss how to) align my implementation with existing ones. Aleph will pick this up someday.



pudo · 2022-11-10T10:23:45Z

pudo
Nov 10, 2022
Maintainer Author

FWIW I've now properly productised an implementation of this in nomenklatura: https://github.com/opensanctions/nomenklatura/blob/master/tests/statement/test_entity.py
