Relocate dependencies #39

Open
timrobertson100 opened this issue Oct 5, 2017 · 9 comments

Comments

@timrobertson100
Member

I suggest this project keep its dependencies to a minimum and relocate them, to avoid errors such as:

Caused by: java.lang.NoSuchMethodError: org.apache.commons.lang3.StringUtils.countMatches(Ljava/lang/CharSequence;C)I
	at org.gbif.utils.file.tabular.TabularFileMetadataExtractor.lambda$lineToLineDelimiterStats$13(TabularFileMetadataExtractor.java:280)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at org.gbif.utils.file.tabular.TabularFileMetadataExtractor.lineToLineDelimiterStats(TabularFileMetadataExtractor.java:281)
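
When an error like this appears, it helps to confirm which copy of a class actually won dependency resolution. A minimal diagnostic sketch: it reports the JAR (or directory) a class was loaded from and checks whether a method with a given signature exists at runtime. `ClassOrigin`, `locate` and `hasMethod` are hypothetical names, not part of dwca-io or gbif-common.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper (not part of dwca-io or gbif-common).
 * Reports where a class was loaded from and whether a method exists at runtime.
 */
final class ClassOrigin {

    private ClassOrigin() {}

    /** Returns the code-source location of the class, or a marker for JDK/bootstrap classes. */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes from the bootstrap loader (e.g. java.lang.String) have no code source.
            return "<bootstrap/JDK>";
        }
        URL location = source.getLocation();
        return location.toString();
    }

    /** True if a public method with exactly this name and parameter list is present. */
    static boolean hasMethod(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            type.getMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

With commons-lang3 on the classpath, one could call `ClassOrigin.locate(org.apache.commons.lang3.StringUtils.class)` and `ClassOrigin.hasMethod(StringUtils.class, "countMatches", CharSequence.class, char.class)`; a `false` from the latter would suggest an older commons-lang3 is shadowing the version gbif-common was compiled against.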
@cgendreau
Contributor

cgendreau commented Oct 5, 2017

Seems reasonable, but this specific issue (with lineToLineDelimiterStats) comes from gbif-common.
If we start relocating, should we do it only for dwca-io, or apply the same logic to the other libraries too?

@timrobertson100
Member Author

I'd encourage any project that is expected to be widely used as a library to keep its dependencies to a minimum and to consider relocating them (especially things like Jackson, Guava and commons-lang, which are really volatile across versions). Where only a few basic utilities are needed (e.g. checking whether a string is null or empty), it might even be worth implementing that code natively.
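
As an illustration of the "implement it natively" option, a sketch of a tiny internal helper covering the commons-lang3 calls mentioned in this thread: null/empty/blank checks and the `countMatches(CharSequence, char)` from the stack trace above. The class name `MiniStrings` is hypothetical; the behaviour is intended to mirror `StringUtils.isEmpty`, `isBlank` and `countMatches` for these cases, not the full commons-lang3 API.

```java
/**
 * Hypothetical internal replacement for a few commons-lang3 StringUtils methods,
 * so the library does not impose a commons-lang3 version on its users.
 */
final class MiniStrings {

    private MiniStrings() {}

    /** True if the sequence is null or has length zero (like StringUtils.isEmpty). */
    static boolean isEmpty(CharSequence cs) {
        return cs == null || cs.length() == 0;
    }

    /** True if the sequence is null, empty, or whitespace only (like StringUtils.isBlank). */
    static boolean isBlank(CharSequence cs) {
        if (isEmpty(cs)) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Counts occurrences of ch; returns 0 for null or empty input (like StringUtils.countMatches). */
    static int countMatches(CharSequence cs, char ch) {
        if (isEmpty(cs)) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) == ch) {
                count++;
            }
        }
        return count;
    }
}
```

Copying a handful of methods like these removes the version clash entirely, at the cost of maintaining them locally.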

@mdoering
Member

mdoering commented Oct 5, 2017

Fully agree with the goal to keep dependencies minimal.
But applying that to dwca-io and gbif-common is not simple, given the rather large dependency tree. We might need to make some drastic cuts and do some refactoring:

org.gbif:dwca-io:jar:1.32-SNAPSHOT
+- org.gbif:dwc-api:jar:1.17:compile
|  +- (org.slf4j:slf4j-api:jar:1.7.12:compile - omitted for conflict with 1.7.21)
|  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.12:compile
|  \- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.12:compile
|     \- (org.codehaus.jackson:jackson-core-asl:jar:1.9.12:compile - omitted for duplicate)
+- org.gbif:gbif-common:jar:0.36:compile
|  +- (com.google.guava:guava:jar:18.0:compile - omitted for duplicate)
|  +- (commons-io:commons-io:jar:2.4:compile - omitted for conflict with 2.5)
|  +- (org.apache.commons:commons-lang3:jar:3.4:compile - omitted for duplicate)
|  +- org.apache.commons:commons-compress:jar:1.10:compile
|  +- com.googlecode.owasp-java-html-sanitizer:owasp-java-html-sanitizer:jar:20160924.1:compile
|  +- commons-beanutils:commons-beanutils:jar:1.9.2:compile
|  |  +- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for duplicate)
|  |  \- commons-collections:commons-collections:jar:3.2.1:compile
|  \- (org.slf4j:slf4j-api:jar:1.7.16:compile - omitted for conflict with 1.7.12)
+- org.gbif.registry:registry-metadata:jar:2.59:compile
|  +- (com.google.guava:guava:jar:18.0:compile - omitted for duplicate)
|  +- org.gbif:gbif-parsers:jar:0.28:compile
|  |  +- (com.google.guava:guava:jar:18.0:compile - omitted for duplicate)
|  |  +- com.google.code.findbugs:jsr305:jar:3.0.1:compile
|  |  +- (org.apache.commons:commons-lang3:jar:3.4:compile - omitted for duplicate)
|  |  +- org.apache.commons:commons-math3:jar:3.6.1:compile
|  |  +- org.apache.tika:tika-core:jar:1.13:compile
|  |  +- (org.gbif:gbif-api:jar:0.42:compile - omitted for conflict with 0.46)
|  |  +- org.gbif:name-parser:jar:2.18:compile
|  |  |  +- (org.gbif:gbif-api:jar:0.41:compile - omitted for conflict with 0.42)
|  |  |  +- (org.gbif:gbif-common:jar:0.28:compile - omitted for conflict with 0.36)
|  |  |  +- (org.slf4j:slf4j-api:jar:1.7.21:compile - omitted for conflict with 1.7.12)
|  |  |  +- (commons-io:commons-io:jar:2.5:compile - omitted for conflict with 2.4)
|  |  |  +- (org.apache.commons:commons-lang3:jar:3.4:compile - omitted for duplicate)
|  |  |  \- (com.google.guava:guava:jar:18.0:compile - omitted for duplicate)
|  |  +- (org.gbif:gbif-common:jar:0.28:compile - omitted for conflict with 0.36)
|  |  +- org.threeten:threetenbp:jar:1.3.2:compile
|  |  \- (org.slf4j:slf4j-api:jar:1.7.21:compile - omitted for conflict with 1.7.12)
|  +- org.gbif:gbif-api:jar:0.46:compile
|  |  +- (org.gbif:dwc-api:jar:1.16:compile - omitted for conflict with 1.17)
|  |  +- (com.google.code.findbugs:jsr305:jar:3.0.1:compile - omitted for duplicate)
|  |  +- (com.google.guava:guava:jar:18.0:compile - omitted for duplicate)
|  |  +- javax.validation:validation-api:jar:1.1.0.Final:compile
|  |  +- (org.apache.commons:commons-lang3:jar:3.4:compile - omitted for duplicate)
|  |  +- com.vividsolutions:jts:jar:1.13:compile
|  |  +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.12:compile - omitted for duplicate)
|  |  +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.12:compile - omitted for duplicate)
|  |  \- (org.slf4j:slf4j-api:jar:1.7.12:compile - omitted for duplicate)
|  +- (org.slf4j:slf4j-api:jar:1.7.12:compile - omitted for duplicate)
|  +- (commons-beanutils:commons-beanutils:jar:1.9.2:compile - omitted for duplicate)
|  +- (org.gbif:gbif-common:jar:0.28:compile - omitted for conflict with 0.36)
|  +- (org.apache.commons:commons-digester3:jar:3.2:compile - omitted for duplicate)
|  +- (org.freemarker:freemarker:jar:2.3.25-incubating:compile - omitted for duplicate)
|  \- (commons-io:commons-io:jar:2.5:compile - omitted for conflict with 2.4)
+- org.mockito:mockito-core:jar:2.8.9:test
|  +- net.bytebuddy:byte-buddy:jar:1.6.14:test
|  +- net.bytebuddy:byte-buddy-agent:jar:1.6.14:test
|  \- org.objenesis:objenesis:jar:2.5:test
+- commons-io:commons-io:jar:2.5:compile
+- org.apache.commons:commons-lang3:jar:3.4:compile
+- org.apache.commons:commons-digester3:jar:3.2:compile
|  +- cglib:cglib:jar:2.2.2:compile
|  |  \- asm:asm:jar:3.3.1:compile
|  +- (commons-beanutils:commons-beanutils:jar:1.8.3:compile - omitted for conflict with 1.9.2)
|  \- commons-logging:commons-logging:jar:1.1.1:compile
+- com.google.guava:guava:jar:18.0:compile
+- org.freemarker:freemarker:jar:2.3.25-incubating:compile
+- org.slf4j:slf4j-api:jar:1.7.21:compile
+- junit:junit:jar:4.12:test
|  \- org.hamcrest:hamcrest-core:jar:1.3:test
\- ch.qos.logback:logback-classic:jar:1.1.7:test
   +- ch.qos.logback:logback-core:jar:1.1.7:test
   \- (org.slf4j:slf4j-api:jar:1.7.20:test - omitted for conflict with 1.7.21)
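
Trees like this are easier to review once the "omitted for conflict" entries are pulled out. A rough sketch, assuming the verbose dependency-tree text in exactly the format shown above (five-part `groupId:artifactId:type:version:scope` coordinates); `TreeConflicts` is a hypothetical helper, not a Maven API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: collects, per groupId:artifactId, every version involved in an
 * "omitted for conflict with" entry of a verbose dependency tree. Assumes the five-part
 * coordinate form groupId:artifactId:type:version:scope (coordinates with a classifier
 * are not matched). Versions are sorted as plain strings, not by version semantics.
 */
final class TreeConflicts {

    private static final Pattern CONFLICT = Pattern.compile(
        "\\(([^:\\s()]+):([^:\\s()]+):[^:\\s()]+:([^:\\s()]+):[^:\\s()]+"
            + " - omitted for conflict with ([^)\\s]+)\\)");

    private TreeConflicts() {}

    static Map<String, Set<String>> find(List<String> treeLines) {
        Map<String, Set<String>> conflicts = new TreeMap<>();
        for (String line : treeLines) {
            Matcher m = CONFLICT.matcher(line);
            if (m.find()) {
                String key = m.group(1) + ":" + m.group(2);
                Set<String> versions = conflicts.computeIfAbsent(key, k -> new TreeSet<>());
                versions.add(m.group(3)); // version requested along this path
                versions.add(m.group(4)); // version it lost to
            }
        }
        return conflicts;
    }
}
```

Fed the tree above, this would report among others `org.slf4j:slf4j-api` with versions 1.7.12, 1.7.16, 1.7.20 and 1.7.21, and `commons-io:commons-io` with 2.4 and 2.5.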

@mdoering
Member

mdoering commented Oct 5, 2017

registry-metadata is particularly bad (it pulls in gbif-parsers, gbif-api, name-parser and more) and intuitively should not be necessary.

@MattBlissett
Member

The DatasetParser is used to return a GBIF-API Dataset from an Archive:

https://github.com/gbif/dwca-io/blob/master/src/main/java/org/gbif/dwca/io/Archive.java#L232

@MattBlissett
Member

After removing the DatasetParser and Dataset, we have this (test dependencies omitted):

org.gbif:dwca-io:jar:2.0-SNAPSHOT
+- org.gbif:dwc-api:jar:1.18-SNAPSHOT:compile
|  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.12:compile
|  \- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.12:compile
+- org.gbif:gbif-common:jar:0.38-SNAPSHOT:compile
|  +- org.apache.commons:commons-compress:jar:1.14:compile
|  +- com.googlecode.owasp-java-html-sanitizer:owasp-java-html-sanitizer:jar:20160924.1:compile
|  \- commons-beanutils:commons-beanutils:jar:1.9.3:compile
|     \- commons-collections:commons-collections:jar:3.2.2:compile
+- commons-io:commons-io:jar:2.5:compile
+- org.apache.commons:commons-lang3:jar:3.4:compile
+- org.apache.commons:commons-digester3:jar:3.2:compile
|  +- cglib:cglib:jar:2.2.2:compile
|  |  \- asm:asm:jar:3.3.1:compile
|  \- commons-logging:commons-logging:jar:1.1.1:compile
+- com.google.guava:guava:jar:18.0:compile
+- com.google.code.findbugs:jsr305:jar:3.0.2:compile
+- javax.validation:validation-api:jar:1.1.0.Final:compile
+- org.freemarker:freemarker:jar:2.3.25-incubating:compile
+- org.slf4j:slf4j-api:jar:1.7.21:compile

MattBlissett added a commit that referenced this issue Mar 27, 2018
@mdoering
Member

mdoering commented Sep 28, 2018

Unfortunately, this has grown considerably again, as noticed in #47, mostly via the gbif-common dependency:

 org.gbif:dwca-io:jar:2.3-SNAPSHOT
 +- org.gbif:dwc-api:jar:1.19:compile
 |  +- (org.slf4j:slf4j-api:jar:1.7.12:compile - omitted for conflict with 1.7.25)
 |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.12:compile
 |  \- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.12:compile
 |     \- (org.codehaus.jackson:jackson-core-asl:jar:1.9.12:compile - omitted for duplicate)
 +- org.gbif:gbif-common:jar:0.41:compile
 |  +- (commons-io:commons-io:jar:2.5:compile - omitted for duplicate)
 |  +- (org.apache.commons:commons-lang3:jar:3.6:compile - omitted for duplicate)
 |  +- org.apache.commons:commons-compress:jar:1.14:compile
 |  +- com.googlecode.owasp-java-html-sanitizer:owasp-java-html-sanitizer:jar:20160924.1:compile
 |  +- commons-beanutils:commons-beanutils:jar:1.9.3:compile
 |  |  +- (commons-logging:commons-logging:jar:1.2:compile - omitted for conflict with 1.1.1)
 |  |  \- commons-collections:commons-collections:jar:3.2.2:compile
 |  +- org.apache.poi:poi:jar:3.15:compile
 |  |  +- commons-codec:commons-codec:jar:1.10:compile
 |  |  \- org.apache.commons:commons-collections4:jar:4.1:compile
 |  +- org.apache.poi:poi-ooxml:jar:3.15:compile
 |  |  +- (org.apache.poi:poi:jar:3.15:compile - omitted for duplicate)
 |  |  +- org.apache.poi:poi-ooxml-schemas:jar:3.15:compile
 |  |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.6.0:compile
 |  |  |     \- stax:stax-api:jar:1.0.1:compile
 |  |  \- com.github.virtuald:curvesapi:jar:1.04:compile
 |  +- org.apache.odftoolkit:simple-odf:jar:0.8.2-incubating:compile
 |  |  +- org.apache.odftoolkit:odfdom-java:jar:0.8.11-incubating:compile
 |  |  |  +- org.apache.odftoolkit:taglets:jar:0.8.11-incubating:compile
 |  |  |  |  \- com.sun:tools:jar:1.7.0:system
 |  |  |  +- (xerces:xercesImpl:jar:2.9.1:compile - omitted for duplicate)
 |  |  |  +- (xml-apis:xml-apis:jar:1.3.04:compile - omitted for duplicate)
 |  |  |  +- org.apache.jena:jena-core:jar:2.11.2:compile
 |  |  |  |  +- (org.slf4j:slf4j-api:jar:1.7.6:compile - omitted for conflict with 1.7.12)
 |  |  |  |  +- org.apache.jena:jena-iri:jar:1.0.2:compile
 |  |  |  |  |  \- (org.slf4j:slf4j-api:jar:1.7.6:compile - omitted for conflict with 1.7.12)
 |  |  |  |  \- (xerces:xercesImpl:jar:2.11.0:compile - omitted for conflict with 2.9.1)
 |  |  |  +- net.rootdev:java-rdfa:jar:0.4.2:compile
 |  |  |  |  +- (org.apache.jena:jena-iri:jar:0.9.1:compile - omitted for conflict with 1.0.2)
 |  |  |  |  \- (org.slf4j:slf4j-api:jar:1.5.6:compile - omitted for conflict with 1.7.12)
 |  |  |  \- commons-validator:commons-validator:jar:1.5.0:compile
 |  |  |     +- (commons-beanutils:commons-beanutils:jar:1.9.2:compile - omitted for conflict with 1.9.3)
 |  |  |     +- commons-digester:commons-digester:jar:1.8.1:compile
 |  |  |     +- (commons-logging:commons-logging:jar:1.2:compile - omitted for duplicate)
 |  |  |     \- (commons-collections:commons-collections:jar:3.2.2:compile - omitted for duplicate)
 |  |  +- xerces:xercesImpl:jar:2.9.1:compile
 |  |  |  \- (xml-apis:xml-apis:jar:1.3.04:compile - omitted for duplicate)
 |  |  \- xml-apis:xml-apis:jar:1.3.04:compile
 |  \- (org.slf4j:slf4j-api:jar:1.7.25:compile - omitted for conflict with 1.7.12)
 +- commons-io:commons-io:jar:2.5:compile
 +- org.apache.commons:commons-lang3:jar:3.6:compile
 +- org.apache.commons:commons-digester3:jar:3.2:compile
 |  +- cglib:cglib:jar:2.2.2:compile
 |  |  \- asm:asm:jar:3.3.1:compile
 |  +- (commons-beanutils:commons-beanutils:jar:1.8.3:compile - omitted for conflict with 1.9.3)
 |  \- commons-logging:commons-logging:jar:1.1.1:compile
 +- com.google.guava:guava:jar:23.0:compile
 |  +- (com.google.code.findbugs:jsr305:jar:1.3.9:compile - omitted for conflict with 3.0.2)
 |  +- com.google.errorprone:error_prone_annotations:jar:2.0.18:compile
 |  +- com.google.j2objc:j2objc-annotations:jar:1.1:compile
 |  \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.14:compile
 +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
 +- javax.validation:validation-api:jar:1.1.0.Final:compile
 +- org.freemarker:freemarker:jar:2.3.26-incubating:compile
 +- org.slf4j:slf4j-api:jar:1.7.25:compile

Maybe we can exclude all of simple-odf and poi-ooxml? Or do we need to be able to parse Excel sheets in dwca-io? Replacing Apache Digester with some lighter-weight XML parsing would also be fairly simple for the meta.xml part. For example, I have used StAX in the CoL+ project, which pulls in the Woodstox parser.
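
For the meta.xml part, a sketch of what Digester-free parsing could look like with the StAX API that ships with the JDK (`javax.xml.stream`); an implementation such as Woodstox is picked up by `XMLInputFactory.newInstance()` if it is on the classpath. It assumes the usual meta.xml layout (a `<core rowType=…>` element with `<files><location>`, `<id>` and `<field index=… term=…>` children) and reads only the core's rowType, file locations and indexed fields. `MetaXmlCore` is a hypothetical class, not dwca-io's actual parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical sketch: reads the core section of a Darwin Core meta.xml with StAX.
 * Extensions, default-only fields, delimiters and encodings are ignored.
 */
final class MetaXmlCore {
    String rowType;
    final List<String> locations = new ArrayList<>();
    final Map<Integer, String> fieldsByIndex = new TreeMap<>();

    static MetaXmlCore parse(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Archives may come from untrusted sources: no DTDs, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        MetaXmlCore core = new MetaXmlCore();
        boolean inCore = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("core".equals(name)) {
                        inCore = true;
                        core.rowType = reader.getAttributeValue(null, "rowType");
                    } else if (inCore && "location".equals(name)) {
                        core.locations.add(reader.getElementText().trim());
                    } else if (inCore && "field".equals(name)) {
                        String index = reader.getAttributeValue(null, "index");
                        String term = reader.getAttributeValue(null, "term");
                        if (index != null && term != null) {
                            core.fieldsByIndex.put(Integer.parseInt(index), term);
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "core".equals(reader.getLocalName())) {
                    inCore = false; // stop collecting once the core section ends
                }
            }
        } finally {
            reader.close();
        }
        return core;
    }
}
```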

@MattBlissett
Member

gbif-common's text handling can parse Excel and OpenDocument spreadsheets. Most users can safely exclude this; it's probably appropriate to refactor it into a submodule of gbif-common.

However, I think we ourselves do use it to gracefully accept a CSV or spreadsheet with DWC headers.

@MattBlissett
Member

> However, I think we ourselves do use it to gracefully accept a CSV or spreadsheet with DWC headers.

In fact, we don't, at least not here: the tools that do that convert the spreadsheets to CSV before passing them on to dwca-io.

I've moved the spreadsheet handling classes to a new module, so this is now back down to where it was in March.

  • org.gbif:dwc-api — our DarwinCore terms API
    • org.codehaus.jackson:jackson-core-asl, jackson-mapper-asl — JSON serialization of terms
  • org.gbif:gbif-common — only the file part is used, decompressing ZIPs and reading CSVs.
    • org.apache.commons:commons-compress — decompress ZIP
    • com.googlecode.owasp-java-html-sanitizer:owasp-java-html-sanitizer — unused
    • commons-beanutils:commons-beanutils — unused
      • commons-collections:commons-collections — unused
  • commons-io:commons-io — listing DarwinCore archives etc
  • org.apache.commons:commons-lang3 — basic string handling and HTML/XML/Unicode entity/escape handling
  • org.apache.commons:commons-digester3 — reading XML
    • cglib:cglib — …
      • asm:asm — …
    • commons-logging:commons-logging — …
  • com.google.guava:guava — preconditions, iterator utilities, visible-for-testing. Could be replaced by the shaded one included in gbif-common?
    • com.google.errorprone:error_prone_annotations — …
    • com.google.j2objc:j2objc-annotations — …
    • org.codehaus.mojo:animal-sniffer-annotations — …
  • com.google.code.findbugs — not used, will remove.
  • javax.validation:validation-api — @NotNull only used on ArchiveField.
  • org.freemarker:freemarker — generating meta.xml. Perhaps overkill?
  • org.slf4j:slf4j-api — logging.
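
On whether FreeMarker is overkill for generating meta.xml: a sketch of writing a minimal core section with the JDK's `XMLStreamWriter` instead of a template. `MetaXmlWriter.writeCore` and its parameters are hypothetical; a real replacement would also have to cover delimiters, encodings, default values and extensions.

```java
import java.io.StringWriter;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/**
 * Hypothetical sketch: writes a minimal Darwin Core meta.xml (core only) with StAX.
 */
final class MetaXmlWriter {
    private static final String DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/";

    private MetaXmlWriter() {}

    static String writeCore(String rowType, String location, int idIndex,
                            Map<Integer, String> fieldsByIndex) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        try {
            xml.writeStartDocument("1.0");
            xml.writeStartElement("archive");
            xml.writeDefaultNamespace(DWC_TEXT_NS);
            xml.writeStartElement("core");
            xml.writeAttribute("rowType", rowType);
            xml.writeStartElement("files");
            xml.writeStartElement("location");
            xml.writeCharacters(location); // the writer escapes &, < and >
            xml.writeEndElement(); // location
            xml.writeEndElement(); // files
            xml.writeEmptyElement("id");
            xml.writeAttribute("index", Integer.toString(idIndex));
            // Sort by column index so the output is deterministic.
            for (Map.Entry<Integer, String> field : new TreeMap<>(fieldsByIndex).entrySet()) {
                xml.writeEmptyElement("field");
                xml.writeAttribute("index", field.getKey().toString());
                xml.writeAttribute("term", field.getValue());
            }
            xml.writeEndElement(); // core
            xml.writeEndElement(); // archive
            xml.writeEndDocument();
            xml.flush();
        } finally {
            xml.close();
        }
        return out.toString();
    }
}
```

The writer takes care of escaping text and attribute values, which is one of the main things a template would otherwise have to handle by hand.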


4 participants