Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

translation map, any object really? #2

Open
jrochkind opened this issue Oct 1, 2013 · 11 comments
Open

translation map, any object really? #2

jrochkind opened this issue Oct 1, 2013 · 11 comments

Comments

@jrochkind
Copy link
Contributor

"Since loading a ruby file (with a .rb suffix) only requires that said file evaluates to an object with a # method, there are a variety of objects you can use:"

I did not write it with this intention, and the traject documentation says different.

But if this works now, that's cool. But I'm not comfortable including this statement in a guide we link to from traject itself (or traject release announcement) unless there's a traject CI test that ensures this is true and stays true.

@jrochkind
Copy link
Contributor Author

Actually, as I think about it more --

what is the advantage of using an arbitrary ruby object with a #[] method wrapped in a TranslationMap -- instead of just using that ruby object itself? What do you get by wrapping it in a TranslationMap, other than another level of indirection to make the code harder to understand?

The way I see it, the point of a TranslationMap is basically to let you easily load a Hash from a config file on disk.

While I provided for loading that config file from a live .rb file: A) I've never actually used this, and B) I anticipated that mostly being used for just putting a ruby Hash literal in a config file, essentially just an alternate syntax to yaml or properties, if you prefer ruby Hash literal syntax. And sure, it also maybe lets you do things like change the default value or default proc of the hash.

But if you're going all the way to using a 'MatchMap' or something -- what are you gaining by wrapping the MatchMap in a TranslationMap, why not just use the MatchMap?

I think it's a mistake to encourage people to use unneeded abstraction instead of just writing ruby.

@billdueber
Copy link
Member

I love the open-ended nature of TranslationMap -- apparently I gave you more credit than I should have. :-)

I see the tmap as offering two important advantages to just using the object bare.

  • The whole find it on the load path thing
  • The default value / passthrough handling

For example, it's not hard to imagine things like:

  • Well, MatchMap, obviously, for people coming from solrmarc
  • A trimming hash for fixed field values like language, so you could pass it 'io ' and get "Indonesia" back.
  • Case-insensitive has lookups, like a hashie or something

I'm not against pulling it out of the documentation, but I am against enforcing a "simple hash only" thing. I mean, why have duck typing if you're not going to let anyone quack?

I'll change the docs and we'll go from there?

@jrochkind
Copy link
Contributor Author

You can already find plain old ruby code on the load-path. And you can easily write your ruby code to do the default value/passthrough handling.

Some simplified ruby outline:

In a gem somewhere, or in your local project:

class MyGem::MyMap
   def initialize(default = nil)
      @default = default
   end
   def [](key)
      value = _something(key)
      return key if value.nil? && @default == :passthrough
   end
end

in a config file:

# ordinary ruby require can already be used to find ordinary
# ruby classes on the load path, that's what it's for!
require 'my_gem/my_map'  

to_field("whatever") do |record, acc|
    acc MyGem::MyMap.new[  get_key_from(record) ]
end

Or whatever. I am not seeing why you need a TranslationMap to a) look things up on the load path, or b) provide default/passthrough handling. These are already both simple things to do in ordinary ruby. )

I dislike providing or encouraging levels-of-indirection/abstraction complex solutions to what could be done by Just Writing Ruby. I think it leads to harder to understand, debug, and maintain code. I think we should be encouraging people to Just Write Ruby.

I sense this is to some extent a philosophical difference between us -- we have had other discussions where you wanted to add an option or feature to some traject method, when to me it would only take 1 or 2 lines of Just Writing Ruby to accomplish without adding the feature explicitly.

That was sort of the basis behind my brainstorm to make traject DSL based on blocks -- in SolrMarc, either an option had to be supported explicitly and seperately by the DSL, or you had to resort to writing custom logic with boilerplate in another file. With traject and ruby blocks, you can Just Write Ruby, and it often takes no more lines to Just Write Ruby than it would to call the hypothetical option or feature hypothetically added to traject.

(And personally, I wouldn't even use a MatchMap, I'd just write a block with a ruby case/when statement using regexes! SolrMarc needed to give you a weird map-like way of doing regexen, cause there was no good way to put ordinary code into a config file, everything had to be shoehorned into one of a few devices supported by it's limited DSL)

Anyhow

Anyhow. While I am not in favor of recommending that people wrap things in TranslationMaps when they could be Just Writing Ruby -- regardless of what the guide/docs say, if it is important to you that you be able to, then you should add CI tests to trjaject ensuring it. Just go ahead and commit em. If it's going to be in official or semi-official docs, then it def needs to be in CI tests, but even if it not, if it's important to you, then put it in CI tests.

As far as the docs go -- it's not entirely clear to me why we should split docs in these two places. Unless it's just cause you want to write docs that accord to your philosophy of traject, rather than mine? Which is potentially fine. Or, do you have other reasons you prefer to keep these docs separated from the traject docs?

But if not that, then I'd rather see us improving the traject docs by taking pieces of what you've wrote and moving them into the traject docs (README or ./docs), rather than splitting docs between two places for people to look, with some info duplicated between both.

I'd much rather fix up the traject docs for any insufficiencies you perceive.

If there are topics that you think need separate fleshed-out pages in traject docs, let's make such pages in ./docs, and then the traject README section that summarizes that topic can link out to them -- in some cases, after adding a seperate page on the topic, the README section summarizing that topic can be made even shorter than it is now, with just a few examples.

To begin with, do you mind if I copy in your 'motivation' page? Beyond that, can I copy stuff from your docs into traject docs, merging them, when it seems an improvement to traject docs?

@billdueber
Copy link
Member

It seems kinda funny to me that you'd argue that it's so easy to load stuff off the path and provide defaults that it's not worth supporting, when in fact you decided to write Traject::TranslationMap to take care of just that kind of bookkeeping.

I think this must, in fact, be a philosophical difference between us. I feel like you're constantly saying, "Don't over-abstract; it's easy to do it inline" and I'm saying, "Why on earth would I write this code over and over again when there's already an implementation of it sitting in the lib directory?" I also imagine part of it in this case is that you plan on using something other than a dumb hash (at least based on how you've talked about your solrmarc code) exactly zero times, so any support for it is going to seem like overkill. I've got at least three use cases (MatchMap itself is extracted from code I use to deal with this stuff in my current marc2solr setup). Most places won't have the bureaucratic nightmare that is the Michigan library (most of my MatchMap use is figuring out combinations of campus/library/building/floor and our crazy format determination stuff). Obviously I can use the stuff standalone, but then again, so could every current use of TranslationMap. It's just easier to use the abstraction.

But whatever. I'll change the documentation to "must evaluate to a hash" so people don't get confused, and you agree to not explicitly test to make sure it's just a dumb hash :-)

The perceived documentation hole I was trying to address (mostly by the use of the sample index file, but also with the guide docs) is "how the hell do I get started on this if I'm just a normal solrmarc user and/or maybe more of a python person?" I'm a huge player in the "give them something to massage into what they need" camp. Going through the docs myself, it wasn't at all clear to me how someone would get to "OK, now I'm writing my indexing file" based on the stuff currently in the docs directory and the README. It's all there; it's just not super-linear.

The docs are where there are for two reasons. First, I just wanted them in a different git repo while traject was churning so much; it allowed me to work on them without have a special "docs" branch that wasn't necessarily up to date and remembering to merge so I wasn't writing docs based on out-of-date code and such. Second was that I originally thought I'd be referring a lot more often to the code; that turned out to not be the case.

And actually, now that I'm thinking about it; I need to make sure it's clear that I contributed to this stuff in some way, so when I put myself up for promotion next year I have something to hang my hat on.

I'm not at all against moving the docs I've written in to the traject repo proper; I was under the impression that you thought I was wasting my time with the more explicit docs and probably didn't want them. If they can be massaged so they're not so distasteful to you, that'd be preferable all around.

[I hope you're enjoying this process. I know we're butting heads a lot, but I at least feel like I'm coming out smarter.]

@jrochkind
Copy link
Contributor Author

Hey, I'm pretty sure it's not true that you can pass in any object that implements #[].

At minimum, it needs to also implement has_key?. Note:

That latter one is quite intentional. The idea is that you might define some keys that intentionally map to nil, which would not trigger the default behavior. (And I should add a test for this intentional behavior!)

This is why CI with good coverage is needed for the claim that you can use 'any object that duck-types like so'. Not just to ensure it's true now, but to ensure it stays true -- what if it hadn't yet used has_key?, cause I hadn't thought of that case with intentionally mapping to nil yet, but then I later added it?

Without CI, I woudn't have realized I broke the thing your docs promissed -- or your code that relied upon that promise!

Of course, with CI, in the hypothetical case there was CI that confirmed all the object had to do was #[] -- when I tried to 'fix' what I saw as a bug around nil vs has_key, the CI would tell me I wasn't allowed to while still keeping it backwards compat!

Which is an illustration of why I want to keep the number of different traject features and devices and 'contracts' with a small surface area.

On the process

I am fine, I am glad you are too! I feel bad being so contrary sometimes.

On the docs and the repo

I think we should move em into traject itself where possible.

In some cases, I think your docs definitely are good and useful and an improvement.

In some cases, I actively disagree with what your docs say. :)

In other cases, I do think your docs are overkill -- the downside of too much docs (says the guy who can never be concise, this guy!), is that when things change, there are so many doc places to change to match. It can be easy to start leaving out of date wrong docs around. I've already run into this with traject just so far, parts of docs I hadn't noticed had become wrong until long after the commit that made them wrong!

But if there's going to be docs that someone can google that appears to be about traject, or even more so docs that we point to in our announcements, I'd just as soon they part of the actual traject repo.

I'm also fine with creating a ./samples directory in traject repo itself, if you think it's neccesary. The thing I was mainly opposed to was generation, not including samples. Although again the downside of the samples, is that you've got so many things that can become wrong (wont' work or no longer best practices) as traject changes.

So I don't think we need as much docs as you do. But I could be wrong. If you do, I'd rather they be in the traject repo (where you are of course a committer too!)

on attribution

No problem there. I've already added your name to to the gemspec, last week. https://github.com/jrochkind/traject/blob/master/traject.gemspec#L9

I'm fine with adding "by Bill Dueber and Jonathan Rochkind" to the README.

If part of the problem is that the repo is the jrochkind github area, I'm fine with moving it to a github 'team' area.

Or anything else you can think of to make it clear that you're an author and committer. Whatever you want for your promotion process or whatever, fine with me. Don't need to keep a seperate docs/sample repo for that purpose, it should be a non-issue there.

@jrochkind
Copy link
Contributor Author

(and i'm sorry, I"m totally going to go add (or make sure it's already added) the test case now that will require has_key? to be in the implementation in order to pass. . It really was my intention all along)

@jrochkind
Copy link
Contributor Author

PPS: You could be right that our disagreements are just based on us running into different problems in our local projects -- maybe they aren't philosophical at all, just a result of different use cases.

I would never have TranslationMap intentionally do a kind_of an reject non-Hashes. Duck-typing is good. It's just that duck typing is tricky when you can't be sure what methods might be called on the duck! For instance, the has_key? example.

But the other thing I tried to do is make it as low-friction as possible to extend traject with ruby code. If TranslationMap doesn't do what you want, it is absolutely expected and encouraged to provide your own MyTranslationMap class that does. (Likewise, if TranslationMap now seems to do what you want, but a future version broke it, you could always provide your own MyTranslationMap as a replacement, I guess).

Of course, it woudn't be built into extract_marc then, you'd have to call it explicitly in a block, yeah. Which is perhaps related to why I keep going back to "It doesn't need to be provided in extract_marc, it's okay if you need to Just Write Ruby in a block", heh, because it's that ethos that will make that kind of extension possible. In some cases, we may need to provide tools to make "just write ruby in a block" more convenient -- if the developers have that as a goal, they'll provide those tools when needed, providing for more flexible extnesion than "just add it as an option to extract_marc, or to TranslationMap".

Is the idea anyway.

If we're having any differences that you think are due to our use cases differing and me not understanding yours -- feel free to show me your actual code in your acutal config files, and "See how hard this would be to do if TranslationMap didn't do X?", or "See how hard this is since extract_marc doesn't do Y?"

@jrochkind
Copy link
Contributor Author

I don't know why I'm still thinking about this, but I'm afraid a MatchMap is actually going to break with a TranslationMap with a default.

While MatchMap does have has_key?, it's implementation means it will return false even if the key matched one of the regexes in the MatchMap. (Unless the key is one of the regexes in the MatchMap).

Which means in current TranslationMap implementation, if you set a default and give it a MatchMap, then it'll return the default for most everything, even if something matches a regex in the MatchMap.

So I don't think it'll work.

So, yeah, A) claims like this in docs need CI testing, and B) Just use the MatchMap directly, don't wrap it in TranslationMap, or if you really want to abstract/DRY some MatchMap details, then make a separate class MatchTranslationMap with appropriate semantics and implementation for a MatchMap!

@jrochkind
Copy link
Contributor Author

Oops, and actually it is already hard-coded to kind_of?(Hash), if it's not a Hash it assumes the arg can be treated as a string and used to look up the config file in the filesystem. (Cause String is the default expected typical case).

irb(main):009:0> t = Traject::TranslationMap.new(match_map, :default => "DEFAULT")
Traject::TranslationMap::NotFound: No translation map definition file found at '#<MatchMap:0x26114629>[.rb|.yaml]' in load path

@jrochkind
Copy link
Contributor Author

Or (I'm having some kind of manic obsession with TranslationMap now), if we really do want TranslationMap to be flexible, I'd say provide another optional argument, something that just implements #call. Like a lambda/proc. Document that if you pass one of those in, #default settings will be ignored, it'll just what's returned by the proc.call(key) -- and it's easy to do CI coverage for a lambda arg.

Then you can use any class you write or obtain that just implements #call, or you can just pass in a one-line lambda that calls your MatchMap or whatever the heck else you want in whatever way you want.

Nearly as convenient as hypothetical "anything that implements #[]" case, but more testable/enforecable as a contract.

I am still not in favor of adding this additional feature to TMap, but if you think you've got use cases (which won't be unique to you) for passing in 'hash-like' objects, then rather than try to make 'hash-like' a contract (it's too broad and ambiguous a target, "hash-like"!), I'd potentially rather this lambda/anything-with-call alternative instead.

@billdueber
Copy link
Member

"OK, buddy. Let's all be calm here. Just put the TranslationMap code on the
floor and we'll all walk out of here alive."

On Tue, Oct 1, 2013 at 11:12 PM, Jonathan Rochkind <[email protected]

wrote:

Or (I'm having some kind of manic obsession with TranslationMap now), if
we really do want TranslationMap to be flexible, I'd say provide another
optional argument, something that just implements #call. Like a
lambda/proc. Document that if you pass one of those in, #default settings
will be ignored, it'll just what's returned by the proc.call(key) -- and
it's easy to do CI coverage for a lambda arg.

Then you can use any class you write or obtain that just implements #call,
or you can just pass in a one-line lambda that calls your MatchMap or
whatever the heck else you want in whatever way you want.

Nearly as convenient as hypothetical "anything that implements #[]" case,
but more testable/enforecable as a contract.

I am still not in favor of adding this additional feature to TMap, but if
you think you've got use cases (which won't be unique to you) for passing
in 'hash-like' objects, then rather than try to make 'hash-like' a contract
(it's too broad and ambiguous a target, "hash-like"!), I'd potentially
rather this lambda/anything-with-call alternative instead.


Reply to this email directly or view it on GitHubhttps://github.com//issues/2#issuecomment-25510807
.

Bill Dueber
Library Systems Programmer
University of Michigan Library

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants