Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StatementListChanger #157

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add StatementListChanger #157

wants to merge 1 commit into from

Conversation

thiemowmde
Copy link
Contributor

@thiemowmde thiemowmde commented Nov 7, 2016

  • This wraps the last remaining usage of the ByPropertyIdArray class. This is meant as a migration path. First, all relevant tests from ByPropertyIdArrayTest should be carried over. Then the implementation can be changed to not rely on ByPropertyIdArray any more. ByPropertyIdArray can be removed then.
  • I decided to not reimplement certain features of ByPropertyIdArray, especially it's magic behavior of reordering groups.
  • This also adds a few service methods that are already in use in different places in the Wikibase.git code base, but with individual private implementations.

@thiemowmde thiemowmde force-pushed the statementListChanger branch 2 times, most recently from fdfbd55 to a743d5c Compare November 8, 2016 11:20
@thiemowmde thiemowmde changed the title [WIP] Add StatementListChanger Add StatementListChanger Nov 8, 2016
* @license GPL-2.0+
* @author Thiemo Mättig
*/
class StatementListChanger {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like this should not be a service, but rather a builder or cursor style object: it should wrap a StatementList it gets in the constructor, and offer specialized operations on that StatementList. Perhaps the name should indicate what kind of extra operations these are.

Copy link
Contributor Author

@thiemowmde thiemowmde Nov 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very open for suggestions.

I definitely do not want this to be a set of static methods. This is something I would have done a few years ago. I learned. :-)

A builder does not work because I definitely want this to manipulate the original object, not create new ones.

Passing the StatementList via the constructor instead of each method does not change much, on the one hand. On the other hand it makes it much harder to replace this implementation with an other one. You can not simply make this an interface and inject an other implementation. You do not know the StatementList object on construction time. This approach would need a factory.

In conclusion I still think my current approach is the best.

the name should indicate what kind of extra operations these are.

What do you mean with "extra operations"? What is not obvious enough from the method names?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method names are fine, but the class name is very generic. "Changer" could be anything. The methods mainly relate to groups, so maybe the class should have "Groups" in the name. But I can't think of anything nice offhand. So whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a StatementGrouper interface with multiple implementations (including a ByPropertyIdStatementGrouper), as well as a ByPropertyIdArray (that's what we need to get rid of) and a ByPropertyIdGrouper. I wanted this here to be different from these.

This is a layer of code that works with a StatementList while being aware of the concept of "grouping by property ID". ByPropertyIdAwareStatementListChanger?

@thiemowmde
Copy link
Contributor Author

Pinging @Benestar for the sake of completeness, because he spend a lot of time working on this, but we never managed to use one of his approaches. See wmde/WikibaseDataModel#479 and wmde/WikibaseDataModel#507, to name the most relevant ones.

@JeroenDeDauw
Copy link
Contributor

I'd put clear in StatementList. While StatementList states "Does not provide complex modification functionality.", clear is hardly complex.

If you change groupByProperty to a getGroupedByProperty, then it also seems to fit into StatementList.

That just leaves the group related stuff. I remember that how to best implement this functionality was an old open question already two years ago, at which point there was some investigation into introducing some kind of StatementGroup class. ... found it :)

sg

Start of that thread: https://lists.wikimedia.org/pipermail/wikidata-tech/2014-October/000660.html

Can't find anything about the discussion I mentioned in that mail though; too ancient history... Anyway, something like StatementGroup is what I'd look into for implementing such behaviour.

@thiemowmde
Copy link
Contributor Author

We had a longer discussion about clear methods a while ago. My main argument was and still is: Wikibase is all about stable identifiers. It should not be possible to "re-purpose" existing Entities by cleaning and replacing it's contents. Such code should not exist. What we do with Entities is to create them, add, edit and remove individual elements. But we never replace or remove entire sets of Labels, Descriptions, Aliases, Fingerprints, SiteLinks or Statements. The only reason to do this is when test code is written in a way that it reuses existing Entities (which should not be done), or as a temporary workaround for something.

At the moment there is not a single clear in the entire DataModel because of this.

Personally I do not like the idea of replacing the wonderful trivial StatementList with something more complex, because this complexity is almost never needed.

@JeroenDeDauw
Copy link
Contributor

JeroenDeDauw commented Nov 15, 2016

I see your point about clearing entities being weird. That does not automatically mean having such a method in StatementList, which does not know about entities, is also weird, or as weird. [*] I'm wondering what the clear method being added to this changer class is for now (this is a question). And I'm suspecting that if it being in StatementList is weird, that it being in StatementListChanger is going to be just as weird, plus the feature envy.

[*] Having such a method in StatementList and that being accessible in entities does not mean we think it often makes sense for people to use this. Something making sense depends on the context, and should typically be judged there. We can't really prevent users of code doing insensible things with it. For clearing statement lists, this can be done regardless of having the clear method, as this changer class demonstrates.

Personally I do not like the idea of replacing the wonderful trivial StatementList with something more complex, because this complexity is almost never needed.

I guess you are talking about the StatementGroup stuff here. Is that correct? If so, I'm not proposing to replace StatementList. It does its job well, and I agree with you that adding something like the group stuff to it increases complexity while most consumers of this code don't care about this functionality. Hence the suggestion for a new thing that holds the group related code.

@thiemowmde
Copy link
Contributor Author

thiemowmde commented Nov 15, 2016

[…] being in StatementListChanger is going to be just as weird […]

Indeed. Can't argue with that. ;-) The only difference is that it is in an other component (Services instead of DataModel itself) and therefore less visible, which makes what I just explained above so much more obvious. For example, nobody will start using this in test setups.

PS: I'm fine with removing clear entirely, if it helps getting approval.

I guess you are talking about the StatementGroup stuff here.

Yes.

Hence the suggestion for a new thing that holds the group related code.

Isn't this what I'm doing here? A new thing that knows how statements are grouped by property? What do you suggest instead?

If you change groupByProperty to a getGroupedByProperty […]

Turning the code I'm proposing here into getters would not help much. We are usually passing Entities around, not StatementLists. We would need to call StatementListHolder::setStatements, which is deprecated because of what I explained above, and the main reason why I came up with a "changer" that just does what needs to be done.

@JeroenDeDauw
Copy link
Contributor

For example, nobody will start using this in test setups.

What kind of bad test code do you image would be written if this method was readily available? If I'm not mistaken it was used in some really shitty tests, though those where not shitty because of that method, and I really hope no one is still writing tests in that manner.

Isn't this what I'm doing here? A new thing that knows how statements are grouped by property? What do you suggest instead?

I was thinking of an approach different in two ways:

  • A class responsible only for the grouping stuff (so you could have a StatementListGrouper if you only want this part)
  • A class that holds data and associated behavior. So for instance an actual StatementGroup and StatementGroupList. Perhaps exposed in entities, perhaps not. Perhaps used by them internally, perhaps not.

Turning the code I'm proposing here into getters would not help much. We are usually passing Entities around, not StatementLists. We would need to call StatementListHolder::setStatements

Thanks for explaining, I was not considering you'd want to use this for modifying entities.

We would need to call StatementListHolder::setStatements, which is deprecated because of what I explained above

What explanation are you referring to? Is the explanation that you don't want people to be able to clear the statements by using setStatements?

I'm very dubious about this whole Holder stuff. In the scenario you where thinking off, what would the structure of the code be? Something like this?

function doSomeStuff( StatementListHolder $holder ) {
    // stuff?
    $this->statementChanger->groupByProperty( $holder->getStatements() );
    // stuff?
}

What's the problem with

function doSomeStuff( StatementListHolder $holder ) {
    // stuff?
    $holder->setStatements( $holder->getStatements()->getGroupedByProperty() );
    // stuff?
}

While that still has the weird Holder stuff, it's clearer what's going on here (in the former case it's not obvious the $holder ends up with a new state), and the property grouping code does not need to suffer from feature envy.

@thiemowmde
Copy link
Contributor Author

thiemowmde commented Nov 15, 2016

$changer = new StatementListChanger( $entity->getStatements() );
$changer->groupByProperty();

Done. No setter. No "holder".

@JeroenDeDauw
Copy link
Contributor

JeroenDeDauw commented Nov 16, 2016

I was asking specifically about the holder scenario that you brought up. You have not answered my questions about that, so now I still don't know what's going on there. I went through the trouble of outlining stuff about the grouper approach and am a disappointed you also did not reply to that.

As for the code you provided now, I certainly agree it is more straightforward. That is due to the lack of the holder, not because of the StatementListChanger. My concern about the readability remains, though only partially. You can still do this instead, though the need is no longer as big (readability wise).

$entity->setStatements( $entity->getStatements()->getGroupedByProperty() );

However in the way you wrote that code, another issue which I was wondering about manifests itself. If you construct the StatementListChanger directly, you have static binding on it and its methods. It's not really different than calling a bunch of namespaced global functions, apart from it superficially looking OO. You can inject the StatementListChanger, though then the readability (which is already not good) suffers.

@thiemowmde
Copy link
Contributor Author

I don't understand what you want to know.

You small example looks nice, but does have the same problem as almost every other possible approach (including what you wrote about "groupers"): it calls a setter that should not be there, and is already deprecated. I don't know what else to say about this.

I went through multiple stages with the code in this patch:

  • Methods that don't manipulate the original object but return new ones. We already tried this multiple times (see the linked patches above). It never worked out.
  • A set of static methods? Sure, this is possible, but I really want to avoid this.
  • No constructor, but have a $statementList in every method call? I like this because it makes it possible to inject the service via an interface, and replace the implementation with an other one. But what "other one" would that be? This will never happen.
  • So I made it stateful, as it is now.

@thiemowmde
Copy link
Contributor Author

This is still waiting for a review. We want to get rid of the overengineered cruft that is ByPropertyIdArray for a looong, looong time now, and I still believe this here is a wonderful solution. @manicki maybe?

@thiemowmde thiemowmde modified the milestones: 3.7.0, 3.8.0 Feb 27, 2017
Copy link

@brightbyte brightbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions, but looks fine overall.

* @license GPL-2.0+
* @author Thiemo Mättig
*/
class StatementListChanger {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method names are fine, but the class name is very generic. "Changer" could be anything. The methods mainly relate to groups, so maybe the class should have "Groups" in the name. But I can't think of anything nice offhand. So whatever.

* Removes all statements from the $statementList object given in the constructor.
*/
public function clear() {
foreach ( $this->statementList->toArray() as $statement ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me like clear() should really be part of the container object. And should probably have a much more simple implementations...

public function groupByProperty() {
$byId = [];

foreach ( $this->statementList->toArray() as $statement ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is toArray needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is optional because StatementList itself is Traversable. But leaving the ->toArray() out will require an additional /** @var Statement $statement */ for type safety, and in most places we decided to go with ->toArray() instead.

/**
* @param Statement $newStatement
* @param int|null $index An absolute index in the list. If the index is not next to a statement
* with the same property, the closest possible position is used instead. Default is null,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "possible" mean here? Does "closest" check in both directions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both directions. This only matters if the list was not grouped by property ID before.

*/
private $statementList;

public function __construct( StatementList $statementList ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

if ( $index === null ) {
$index = $this->getClosestIndexAtGroupBorder( $statements, $validIndex );
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh This makes it really obvious that we should model groups directly as nested arrays, and get rid of the whole index crap completely.

Copy link
Contributor Author

@thiemowmde thiemowmde Feb 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean with "crap". Can you please avoid using such offensive language?

@Benestar suggested to replace our naive StatementList with one that's aware of PropertyId groups a long time ago. I'm not opposing such a change.

I still believe the code proposed here is a helpful step in a longer migration path that might end with replacing StatementList.

Note that the code I'm proposing here does have a very specific advantage: It allows us to add new statements where they belong, without the need to reorder existing statement lists. Not that this is of much relevance, because users usually do not see diffs on the JSON level. But still. This smooth migration is not possible with a StatementList that enforces grouping on construction time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was not a complaint about this patch. It's just one more place where it becomes obvious that insertion by index into a grouped list is broken by design. It's just bad interface design. And that bad design (aka "crap") is my fault; I remember the session with Denny where we designed this interface. So, sigh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the proposals we discussed had two numeric indexes: one for a relative position in the group, and one for moving the group to an other position. I found this harder to use than the current absolute index.

I'm still very happy with the super-trivial StatementList we have. Only very few places in our application need statements to be grouped, and the requirements are always a bit different. A "baked in" grouping on the DataModel level may help in some cases, but will not solve all use-cases, and will make others more complicated.

private function isWithinGroup( array $statements, PropertyId $id, $index ) {
$count = count( $statements );

// Valid if the index either prepends ot appends a statement with the same property

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"precedes or succeeds"?

public function testConstructor() {
$instance = new StatementListChanger( new StatementList() );

$this->assertInstanceOf( StatementListChanger::class, $instance );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TDD. This is one of the first tests I wrote. It became obsolete later because all other tests are covering the same feature. I just forgot to remove it.

@thiemowmde thiemowmde force-pushed the statementListChanger branch from 7ca9951 to 4a5c0e0 Compare March 7, 2017 16:06
@thiemowmde thiemowmde mentioned this pull request Mar 16, 2017
@thiemowmde thiemowmde removed this from the 3.8.0 milestone Mar 17, 2017
@JeroenDeDauw
Copy link
Contributor

No changes since 2016. Please reopen if needed for current goals

@JeroenDeDauw JeroenDeDauw deleted the statementListChanger branch August 8, 2018 15:15
@thiemowmde thiemowmde restored the statementListChanger branch August 13, 2018 11:27
@thiemowmde thiemowmde reopened this Aug 13, 2018
@JeroenDeDauw
Copy link
Contributor

JeroenDeDauw commented Sep 17, 2018

@thiemowmde it is cool you put this effort into getting rid of ByPropertyIdArray and it is too bad this stalled.

I looked at this now for 15ish minutes (which means I did not read all comments) and these are my thoughts:

  • To me it makes more sense to put this kind of code in WikibaseDataModel. Having gone though this component I spotted some other things I now think belong in there. Keeping all the logic operations on the data structures gives you an Anemic Domain model like we currently have.
  • Having a StatementGroup class to represent the groups and host the code dealing with stuff within a group is still a good idea to me. It seems like no comments where made on that suggestion.
  • StatementGroup is likely a good place to start to put this code and then the remainder likely fits in StatementList.

Edit: there is a PR relevant to the StatementGroup topic here: wmde/WikibaseDataModel#507

Copy link
Contributor

@JeroenDeDauw JeroenDeDauw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think responsibilities are misplaced as per previous comment

@JeroenDeDauw
Copy link
Contributor

Let's do a fresh PR that is more focused and does not contain no longer relevant discussion that scares everyone away if someone want to actually create that.

@JeroenDeDauw JeroenDeDauw deleted the statementListChanger branch September 17, 2018 13:53
@thiemowmde thiemowmde restored the statementListChanger branch September 17, 2018 14:16
@thiemowmde
Copy link
Contributor Author

This is not meant as "the one" solution to all the year-long "statement (re)ordering" discussions, but merely as a way forward to get rid of ByPropertyIdArray and it's scary capabilities to move statement groups around. Why should I open a new pull request with the same code if it is already here? How are the discussion not relevant?

The concern I raised in wmde/WikibaseDataModel#507 (comment) still holds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants