Key projection #336

Open
NicoLaval opened this issue May 20, 2024 · 1 comment
Comments

@NicoLaval
Collaborator

@noahboerger you reported that when a dataset is projected onto part of its key, the operation fails in Trevas because of a duplicate key column.

Could you provide an example please?

NicoLaval added the "question" and "Needs more information" labels and removed the "question" label on May 20, 2024.
@noahboerger
Collaborator

This issue is related to the membership operator (#). The reference manual explains the behaviour in the following part (p. 56):

The membership operator returns a Data Set having the same Identifier Components of ds and a single Measure. If comp is a Measure in ds, then comp is maintained in the result while all other Measures are dropped. If comp is an Identifier or an Attribute Component in ds, then all the existing Measures of ds are dropped in the result and a new Measure is added. A default conventional name is assigned to the new Measure depending on its type: for example num_var if the Measure is numeric, string_var if it is string and so on (the default name can be renamed through the rename operator if needed).

When ds1 is

id_1        id_2        val
IDENTIFIER  IDENTIFIER  MEASURE
INTEGER     INTEGER     INTEGER
1           2           3
4           5           6

From my point of view, the result of ds2 := ds1#id_2; should be:

id_1        id_2        num_var
IDENTIFIER  IDENTIFIER  MEASURE
INTEGER     INTEGER     INTEGER
1           2           2
4           5           5

Another example of this is the BdI test case "general/membership_1".
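
To make the expected behaviour concrete, here is a minimal, self-contained Java sketch of the rule quoted from the reference manual. It is not Trevas code: the class MembershipSketch, its Component and Role types, and all method names are hypothetical, and only the num_var and string_var default names mentioned in the quoted passage are handled. The point it illustrates is that when comp is an Identifier, the identifier column stays in the result once and its values are copied into a new, differently named Measure, so no duplicate column should arise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MembershipSketch {

    enum Role { IDENTIFIER, MEASURE, ATTRIBUTE }

    // Hypothetical stand-in for a dataset component; not Trevas's own Component class.
    static final class Component {
        final String name;
        final Class<?> type;
        final Role role;

        Component(String name, Class<?> type, Role role) {
            this.name = name;
            this.type = type;
            this.role = role;
        }
    }

    // Default measure name by type. Assumption: only the two names cited in the
    // quoted manual passage are covered; other types are rejected in this sketch.
    static String defaultMeasureName(Class<?> type) {
        if (Number.class.isAssignableFrom(type)) return "num_var";
        if (type == String.class) return "string_var";
        throw new IllegalArgumentException("no default measure name in this sketch for " + type);
    }

    // Result structure of ds#comp: all identifiers of ds, followed by exactly one measure.
    static List<Component> membershipStructure(List<Component> ds, String comp) {
        Component target = null;
        for (Component c : ds) {
            if (c.name.equals(comp)) target = c;
        }
        if (target == null) throw new IllegalArgumentException("unknown component " + comp);

        List<Component> result = new ArrayList<>();
        for (Component c : ds) {
            if (c.role == Role.IDENTIFIER) result.add(c);
        }
        if (target.role == Role.MEASURE) {
            result.add(target); // comp is a measure: keep it, drop the other measures
        } else {
            // comp is an identifier or attribute: add a new measure under a default name
            result.add(new Component(defaultMeasureName(target.type), target.type, Role.MEASURE));
        }
        return result;
    }

    // Applies ds#comp to the rows: identifiers are kept, the single measure takes comp's values.
    static List<Map<String, Object>> membership(
            List<Component> ds, List<Map<String, Object>> rows, String comp) {
        List<Component> structure = membershipStructure(ds, comp);
        String measureName = structure.get(structure.size() - 1).name;
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (Component c : structure) {
                if (c.role == Role.IDENTIFIER) projected.put(c.name, row.get(c.name));
            }
            projected.put(measureName, row.get(comp));
            out.add(projected);
        }
        return out;
    }

    // Structure of ds1 from the example above (INTEGER modelled as Long).
    static List<Component> exampleStructure() {
        return Arrays.asList(
                new Component("id_1", Long.class, Role.IDENTIFIER),
                new Component("id_2", Long.class, Role.IDENTIFIER),
                new Component("val", Long.class, Role.MEASURE));
    }

    static Map<String, Object> row(long id1, long id2, long val) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id_1", id1);
        r.put("id_2", id2);
        r.put("val", val);
        return r;
    }

    // Data points of ds1 from the example above.
    static List<Map<String, Object>> exampleRows() {
        return Arrays.asList(row(1, 2, 3), row(4, 5, 6));
    }

    public static void main(String[] args) {
        for (Map<String, Object> r : membership(exampleStructure(), exampleRows(), "id_2")) {
            System.out.println(r);
        }
        // prints:
        // {id_1=1, id_2=2, num_var=2}
        // {id_1=4, id_2=5, num_var=5}
    }
}
```

Calling membership(exampleStructure(), exampleRows(), "val") instead keeps val as the single Measure, which corresponds to the first case in the quoted passage.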

Trevas currently raises the following error:

Occured error

Exception

java.lang.IllegalArgumentException: duplicate column [Component{id_2, type=class java.lang.Long, role=IDENTIFIER}]
  at fr.insee.vtl.model.Structured$DataStructure.<init>(Structured.java:275)
  at fr.insee.vtl.spark.SparkDataset.fromSparkSchema(SparkDataset.java:158)
  at fr.insee.vtl.spark.SparkDataset.<init>(SparkDataset.java:54)
  at fr.insee.vtl.spark.SparkProcessingEngine.executeProject(SparkProcessingEngine.java:298)
  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitMembershipExpr(ExpressionVisitor.java:140)
  at fr.insee.vtl.engine.visitors.expression.ExpressionVisitor.visitMembershipExpr(ExpressionVisitor.java:41)
  at fr.insee.vtl.parser.VtlParser$MembershipExprContext.accept(VtlParser.java:501)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitAssignment(AssignmentVisitor.java:51)
  at fr.insee.vtl.engine.visitors.AssignmentVisitor.visitTemporaryAssignment(AssignmentVisitor.java:59)
  at fr.insee.vtl.parser.VtlParser$TemporaryAssignmentContext.accept(VtlParser.java:372)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at fr.insee.vtl.engine.VtlScriptEngine.evalStream(VtlScriptEngine.java:263)
  at fr.insee.vtl.engine.VtlScriptEngine.eval(VtlScriptEngine.java:282)
  at java.scripting/javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:262)
  at fr.insee.trevas.jupyter.VtlKernel.eval(VtlKernel.java:305)
  at io.github.spencerpark.jupyter.kernel.BaseKernel.handleExecuteRequest(BaseKernel.java:334)
  at io.github.spencerpark.jupyter.channels.ShellChannel.lambda$bind$0(ShellChannel.java:64)
  at io.github.spencerpark.jupyter.channels.Loop.lambda$new$0(Loop.java:21)
  at io.github.spencerpark.jupyter.channels.Loop.run(Loop.java:78)
