[NU-1823] Kafka source without topic schema #7066
base: staging
Conversation
17b9df4 to 6ec935e (compare)
  )
}).map(getVersionParam)
} else {
  val versionValues = List(
I'm not sure if we need two options, as I think we currently try to process every message as a byte array or a string.
At least the source with the "Json" option chosen seems to work with both normal JSON and a byte array, as I'm testing it locally with a Kafka producer.
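For context, that kind of local check can be done with the plain Kafka producer API; a minimal sketch (the topic name and bootstrap address are illustrative, not taken from this PR):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object SendRawJsonMessage extends App {
  // Illustrative local setup: a raw JSON string is sent as the record value,
  // with no schema registry involved at any point.
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", classOf[StringSerializer].getName)
  props.put("value.serializer", classOf[StringSerializer].getName)

  val producer = new KafkaProducer[String, String](props)
  producer.send(new ProducerRecord[String, String]("input-topic-without-schema", """{"first":"Jan","last":"Kowalski"}"""))
  producer.close()
}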
I left a few comments.
Let's start with a test using the scenario runner which shows that the source works in the desired configuration.
After that we can figure out whether we can render a "content type" field instead of the "version" field for non-schemed topics.
Resolved review threads (outdated):
.../main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala
.../main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala
...a/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala
.../main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala
case class DynamicSchemaVersion(typ: JsonTypes) extends SchemaVersionOption

sealed abstract class JsonTypes(val value: Int) extends IntEnumEntry
Do we need value here? Can we use just Enum[JsonTypes] instead of IntEnum[JsonTypes]?
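For reference, a minimal sketch of the plain enumeratum variant (the member names here are illustrative, not necessarily the ones used in the PR):

import enumeratum._

// Without IntEnum there is no value: Int to maintain; findValues still
// provides the list of members for lookups and serialization.
sealed trait JsonTypes extends EnumEntry

object JsonTypes extends Enum[JsonTypes] {
  override val values: IndexedSeq[JsonTypes] = findValues

  case object Json  extends JsonTypes
  case object Plain extends JsonTypes
}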
As I wrote in previous comments, IMO we should separate the selected-version representation from the content type.
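To make that concrete, a rough sketch of keeping the two concepts apart (an illustrative shape, not the PR's final API; the version options mirror ones that already exist in the codebase, the content type side is hypothetical):

// Version selection stays strictly about schema versions...
sealed trait SchemaVersionOption
case object LatestSchemaVersion extends SchemaVersionOption
final case class ExistingSchemaVersion(version: Int) extends SchemaVersionOption

// ...while "topic without a schema" is described by a separate concept.
sealed trait ContentType
object ContentType {
  case object Json  extends ContentType
  case object Plain extends ContentType
}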
Resolved review threads (outdated):
.../main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaVersionOption.scala
.../main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala
TODO: ad-hoc test and schema in sink
Added comments where not sure
…andling when couldn't list kafka topics (previously it wasn't needed)
…compiles Also a workaround in the same file for test, it will now fetch all topics from schema registry as before and from kafka, so tests should pass
When a topic without a schema is selected, instead of a version the user can now choose a ContentType (which for now doesn't change anything; handling byte arrays still needs to be implemented)
cd3c797 to 4ef3b59 (compare)
class KafkaJsonItSpec extends FlinkWithKafkaSuite with PatientScalaFutures with LazyLogging {

  private val givenMatchingAvroObjV2 = avroEncoder.encodeRecordOrError(
IMO we don't need anything Avro-related in this test class, including encoding the record with avroEncoder, using RecordSchemaV2, and decoding JSON with UniversalSchemaSupportDispatcher/RuntimeSchemaData/SchemaWithMetadata.
All of that can be done based on the plain circe JSON parser and JSON AST, e.g.:
package pl.touk.nussknacker.defaultmodel

import io.circe.{Json, parser}
import pl.touk.nussknacker.engine.api.process.TopicName.ForSource
import pl.touk.nussknacker.engine.api.validation.ValidationMode
import pl.touk.nussknacker.engine.build.ScenarioBuilder
import pl.touk.nussknacker.engine.graph.expression.Expression
import pl.touk.nussknacker.engine.kafka.KafkaTestUtils.richConsumer
import pl.touk.nussknacker.engine.schemedkafka.KafkaUniversalComponentTransformer
import pl.touk.nussknacker.engine.schemedkafka.schemaregistry.ContentTypes
import pl.touk.nussknacker.engine.spel.SpelExtension.SpelExpresion

import java.nio.charset.StandardCharsets
import java.time.Instant

class KafkaJsonItSpec extends FlinkWithKafkaSuite {

  test("should round-trip json message without provided schema") {
    val jsonRecord = Json.obj(
      "first"  -> Json.fromString("Jan"),
      "middle" -> Json.fromString("Tomek"),
      "last"   -> Json.fromString("Kowalski")
    )

    val inputTopic  = "input-topic-without-schema"
    val outputTopic = "output-topic-without-schema"

    kafkaClient.createTopic(inputTopic, 1)
    kafkaClient.createTopic(outputTopic, 1)
    sendAsJson(jsonRecord.toString(), ForSource(inputTopic), Instant.now.toEpochMilli)

    val process =
      ScenarioBuilder
        .streaming("without-schema")
        .parallelism(1)
        .source(
          "start",
          "kafka",
          KafkaUniversalComponentTransformer.topicParamName.value       -> Expression.spel(s"'$inputTopic'"),
          KafkaUniversalComponentTransformer.contentTypeParamName.value -> s"'${ContentTypes.JSON.toString}'".spel
        )
        .emptySink(
          "end",
          "kafka",
          KafkaUniversalComponentTransformer.sinkKeyParamName.value            -> "".spel,
          KafkaUniversalComponentTransformer.sinkRawEditorParamName.value      -> "true".spel,
          KafkaUniversalComponentTransformer.sinkValueParamName.value          -> "#input".spel,
          KafkaUniversalComponentTransformer.topicParamName.value              -> s"'$outputTopic'".spel,
          KafkaUniversalComponentTransformer.contentTypeParamName.value        -> s"'${ContentTypes.JSON.toString}'".spel,
          KafkaUniversalComponentTransformer.sinkValidationModeParamName.value -> s"'${ValidationMode.lax.name}'".spel
        )

    run(process) {
      val outputRecord = kafkaClient.createConsumer().consumeWithConsumerRecord(outputTopic).take(1).head
      val parsedOutput = parser
        .parse(new String(outputRecord.value(), StandardCharsets.UTF_8))
        .fold(throw _, identity)

      parsedOutput shouldBe jsonRecord
    }
  }

}
import java.time.Instant
import java.util

class KafkaJsonItSpec extends FlinkWithKafkaSuite with PatientScalaFutures with LazyLogging {
AFAIK we don't need LazyLogging here, because we don't use loggers directly in the class.
PatientScalaFutures is also not necessary as long as we don't have a place in the class that needs the PatienceConfig for eventually/futureValue/whenReady usages (at least I can't find one).
@@ -0,0 +1,83 @@
package pl.touk.nussknacker.defaultmodel |
IMO we need a test case or test class for the PLAIN content type.
  )
  .asInstanceOf[util.HashMap[String, String]]

response.forEach((key, value) => givenMatchingAvroObjV2.get(key) shouldBe value)
The assertion inside forEach is tricky: it won't fail if the map is empty.
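One possible fix, as a sketch only, reusing the names from the snippet above (and assuming ScalaTest matchers are in scope in the spec):

import scala.jdk.CollectionConverters._

// Fail fast when the consumed record yields no entries at all, instead of
// passing vacuously through an empty forEach.
val responseEntries = response.asScala.toMap
responseEntries should not be empty
responseEntries.foreach { case (key, value) =>
  givenMatchingAvroObjV2.get(key) shouldBe value
}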
  )
}).map(getVersionParam)
val topicsWithSchema = topicSelectionStrategy.getTopics(schemaRegistryClient)
if (topicsWithSchema.exists(_.contains(preparedTopic.prepared.topicName.toUnspecialized))) {
Maybe let's extract the condition and the topicsWithSchema computation into some method like isTopicWithoutSchema(topic, strategy), WDYT?
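Something along these lines, as a sketch only (the parameter types mirror what the snippet above already uses; the exact signature and placement are open):

// Hypothetical helper: true when the selected topic is not known to the schema registry.
private def isTopicWithoutSchema(
    topicName: UnspecializedTopicName,
    strategy: TopicSelectionStrategy,
    schemaRegistryClient: SchemaRegistryClient
): Boolean =
  !strategy.getTopics(schemaRegistryClient).exists(_.contains(topicName))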
val valueBytes = readValueMessage(valueSchemaOpt, topic, value)
(keyBytes, valueBytes)

if (schemaRegistryClient.getAllTopics.exists(_.contains(UnspecializedTopicName(topic.name)))) {
I saw a similar condition in a few places in the change 😄 maybe we can extract it somewhere?
val valueSchemaOpt =
  Option(
    SchemaWithMetadata(
      OpenAPIJsonSchema("""{"type": "object"}"""),
Does this value actually have to be a JSON object at the root? What about other valid JSON elements (strings, numbers, arrays, ... null)?
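If the intent is to accept any JSON element at the root, an empty JSON Schema is one option worth considering; this is only a sketch, and how it interacts with deserialization further down is an open question (the in-code comment in the next snippet raises the same doubt):

// An empty JSON Schema places no constraints, so strings, numbers, booleans,
// arrays and null are accepted at the root, not only objects.
val anyJsonSchema = OpenAPIJsonSchema("""{}""")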
} else {
  SchemaWithMetadata(
    // I don't know how these schemas affect deserialization later
    OpenAPIJsonSchema("""{"type": "object"}"""),
Like above: IMO we can have other element types as the root element.
      _
    ) =>
  val preparedTopic = prepareTopic(topic)
  val valueValidationResult = if (contentType.equals("JSON")) {
Please use some constant/enum here.
Valid(
  (
    Some(
      RuntimeSchemaData[ParsedSchema](
How does it work with non-JSON values? E.g. an unescaped string (someString) as the root element of the Kafka record value.
- created a function for repeated code
- removed use of avro in test for json
- used more constants where applicable
Walkthrough
The changes introduce a comprehensive set of modifications across multiple files to enhance Kafka message processing capabilities, particularly focusing on JSON and plain message formats. A new integration test file, KafkaJsonItSpec.scala, is introduced.
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (14)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalSchemaPayloadDeserializer.scala (1)
85-89: Document the plain content type handling behavior
The special handling for plain content type should be documented to explain when raw bytes are returned versus when deserialization occurs. This aligns with the PR objective of supporting topics without schemas.
+ /**
+  * Deserializes the payload based on the schema type:
+  * - For plain content type (no schema): returns raw bytes without transformation
+  * - For other JSON schemas: uses the schema's deserializer
+  */
  override def deserialize(
engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala (3)
73-74: Remove unused variable
The longJsonInHex variable is defined but never used in the test.
- val longJsonInHex =
-   "227b226669727374223a2022546f6d656b222c20226d6964646c65223a20224a616e222c20226c617374223a20224b6f77616c736b69227d22"
76-82: Remove debug code
Remove the debug-related code, including the unused BigInteger conversion and print statements.
- val big = new BigInteger(shortJsonInHex, 16).toByteArray
-
- val str = new String(byteString)
- println(str)
- println(byteString.mkString("Array(", ", ", ")"))
- println(big.mkString("Array(", ", ", ")"))
66-114: Consider adding validation for message content
While the test verifies the byte-level equality, it would be valuable to also validate that the message content is a valid JSON string as expected.
run(process) {
  val outputRecord = kafkaClient
    .createConsumer()
    .consumeWithConsumerRecord(outputTopic)
    .take(1)
    .head
  outputRecord.value() shouldBe byteString
+ // Verify the content is valid JSON
+ val outputJson = parser.parse(new String(outputRecord.value()))
+ outputJson.isRight shouldBe true
}
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (3)
75-101: Reduce code duplication in schema handling
The JSON and plain cases have similar structure. Consider extracting the common logic into a helper method.
private def createRuntimeSchemaData(contentType: ContentTypes, typingResult: TypingResult): (Option[RuntimeSchemaData[ParsedSchema]], TypingResult) = {
  val schema = contentType match {
    case ContentTypes.JSON  => ContentTypesSchemas.schemaForJson
    case ContentTypes.PLAIN => ContentTypesSchemas.schemaForPlain
  }
  (
    Some(
      RuntimeSchemaData[ParsedSchema](
        new NkSerializableParsedSchema[ParsedSchema](schema),
        Some(SchemaId.fromString(contentType.toString))
      )
    ),
    typingResult
  )
}
97-98: Address TODO comment about Array[Byte] handling
The comment indicates incomplete implementation for plain content type handling. Please clarify the implementation timeline or requirements.
Would you like me to help implement the Array[Byte] handling or create a GitHub issue to track this task?
80-80: Avoid using content type strings as schema IDs
Using content type strings as schema IDs could lead to conflicts if the schema registry is used in the future. Consider using dedicated schema identifiers.
Consider:
- Using a dedicated schema ID generation mechanism
- Adding a prefix to distinguish these special cases (e.g., "internal-json-schema")
- Documenting the schema ID allocation strategy
Also applies to: 93-93
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala (2)
56-56: Improve error message with actual content type
When throwing the IllegalStateException, it's helpful to include the actual content type that was received. This provides clearer context for debugging.
Apply this diff to enhance the error message:
- throw new IllegalStateException("Topic without schema should have ContentType Json or Plain, was neither")
+ throw new IllegalStateException(s"Topic without schema should have ContentType JSON or PLAIN, but was [$writerSchemaId]")
Line range hint 61-68: Add tests for mismatched schema types
The current code throws a MismatchReaderWriterSchemaException when the reader and writer schema types do not match. However, there is a TODO comment indicating that this case needs testing, especially when supporting JSON schema.
Consider adding unit tests to cover scenarios where the reader and writer schema types differ, to ensure the deserializer handles these cases correctly. Would you like assistance in creating these test cases?
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (1)
120-139: Consider adding unit tests for schemaless topic handling
It's important to ensure that the new logic for handling schemaless topics is thoroughly tested. Adding unit tests covering scenarios for JSON and plain content types can help prevent future regressions.
Would you like assistance in creating unit tests for these cases?
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala (4)
171-175: Use named parameters in the copy method to enhance clarity
In line 174, when calling the copy method on jsonSchema, it's advisable to use named parameters to explicitly indicate which field is being modified. This improves code readability and reduces potential errors if the class structure changes in the future.
Apply this diff to use named parameters:
- jsonSchema.copy(new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain))
+ jsonSchema.copy(
+   schema = new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain)
+ )
174-174: Consider defining a separate plainSchema variable for better clarity
Reusing jsonSchema.copy to represent a plain schema can be confusing. Defining a separate variable plainSchema enhances code clarity and maintainability.
Define plainSchema:
private val plainSchema = RuntimeSchemaData[ParsedSchema](
  new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain),
  None
)
Update the code to use plainSchema:
val runtimeSchemaData = if (contentType.equals(ContentTypes.JSON.toString)) {
  jsonSchema
} else {
- jsonSchema.copy(
-   schema = new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain)
- )
+ plainSchema
}
171-172: Use case-insensitive comparison for content type
When comparing strings like contentType, it's safer to perform a case-insensitive comparison to handle input variations gracefully.
Apply this diff:
- val runtimeSchemaData = if (contentType.equals(ContentTypes.JSON.toString)) {
+ val runtimeSchemaData = if (contentType.equalsIgnoreCase(ContentTypes.JSON.toString)) {
254-255: Use case-insensitive comparison for contentType in valueEditorParamStep
Ensure that the comparison for contentType is case-insensitive to handle different input cases.
Apply this diff:
- (`contentTypeParamName`, DefinedEagerParameter(contentType: String, _)) ::
+ (`contentTypeParamName`, DefinedEagerParameter(contentTypeRaw: String, _)) ::
  ...
- if (contentType.equals(ContentTypes.JSON.toString)) {
+ val contentType = contentTypeRaw.toUpperCase
+ if (contentType == ContentTypes.JSON.toString) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala (1 hunk)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (6 hunks)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala (1 hunk)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala (2 hunks)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (2 hunks)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala (2 hunks)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/RecordFormatterSupport.scala (2 hunks)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala (2 hunks)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalSchemaPayloadDeserializer.scala (2 hunks)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala (6 hunks)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (2 hunks)
🔇 Additional comments (17)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala (3)
1-4
: LGTM: Package structure and imports are well-organized
The package hierarchy and import statement are appropriate for schema registry functionality.
11-14
: Add documentation and consider schema validation implications
The current implementation uses extremely permissive schemas (empty object/string) which effectively bypass schema validation. While this aligns with the PR's goal of handling schema-less topics, it has several implications:
- There's no documentation explaining when these schemas should be used
- The empty schemas might mask potential data quality issues
- It's unclear how this relates to the metadata handling mentioned in the PR objectives
Let's check if these schemas are used appropriately:
#!/bin/bash
# Search for usages of these schemas to understand the context
rg -l "ContentTypesSchemas" | xargs rg "schemaFor(Json|Plain)"
Consider:
- Adding comprehensive documentation explaining the use cases and implications
- Implementing basic structural validation even for schema-less topics
- Adding logging when these permissive schemas are used
1-14
: Verify handling of hermes metadata
The PR objectives mention issues with hermes metadata handling, but this implementation doesn't explicitly address how metadata fields will be handled with these content types.
Let's check how metadata is handled:
✅ Verification successful
Let me gather more specific information about metadata handling.
Let me check one more specific aspect of the implementation to understand how the content types interact with message handling.
Based on the gathered information, I can now provide the final response:
Implementation correctly handles metadata through content type distinction
The implementation properly addresses metadata handling through the content type system:
- Plain and JSON content types are handled separately in UniversalKafkaDeserializer with appropriate schema assignments
- The deserializer implementation maintains metadata through SchemaWithMetadata wrapper
- The system correctly handles both schema-based and schema-less messages with proper content type detection
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for hermes metadata handling in the codebase
rg -A 5 "hermes.*metadata"
# Look for any message transformation logic
ast-grep --pattern 'def transform$_($$$) {
$$$
}'
Length of output: 2630
Script:
#!/bin/bash
# Look for metadata handling in UniversalKafkaDeserializer
rg -A 10 "SchemaWithMetadata" utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala
# Check for any specific hermes metadata handling
rg -A 5 "hermes" utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/
# Look for metadata-related schema definitions
ast-grep --pattern 'case class $_Metadata($$$)'
Length of output: 3538
Script:
#!/bin/bash
# Check UniversalSchemaPayloadDeserializer for message handling
rg -A 10 "deserialize" utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalSchemaPayloadDeserializer.scala
# Look for any schema-less topic handling
rg -A 5 "schemaless" utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/
Length of output: 2433
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala (1)
43-46
: 🛠️ Refactor suggestion
Consider improving error handling and performance
The implementation has several areas that could be enhanced:
- Error Handling: The method silently handles potential errors from strategy.getTopics. Consider propagating errors to allow proper handling by callers.
- Performance: Fetching all topics for each check could be inefficient. Consider caching results if this method is called frequently.
- Input Validation: Add parameter validation for null safety.
Consider this improved implementation:
- def isTopicWithSchema(topic: String, strategy: TopicSelectionStrategy): Boolean = {
- val topicsWithSchema = strategy.getTopics(this)
- topicsWithSchema.exists(_.map(_.name).contains(topic))
- }
+ def isTopicWithSchema(topic: String, strategy: TopicSelectionStrategy): Validated[SchemaRegistryError, Boolean] = {
+ require(topic != null, "Topic cannot be null")
+ require(strategy != null, "Strategy cannot be null")
+ strategy.getTopics(this).map(topics => topics.exists(_.name == topic))
+ }
Let's verify the usage of this method to understand the impact:
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalSchemaPayloadDeserializer.scala (2)
10-10
: LGTM: Import statement is correctly placed
The new import for ContentTypesSchemas is appropriately placed with other schema-related imports and is necessary for the plain content type handling.
85-89
: Verify plain content type handling across the codebase
The introduction of special handling for plain content type may affect other components that consume the deserialized data. We should verify that consumers can handle raw byte arrays.
✅ Verification successful
The previous search didn't yield results because we were looking for the wrong class name. Let's broaden our search to understand the content type handling and byte array processing in the codebase.
Based on the search results, I can now make a determination about the plain content type handling. The codebase shows that:
- The plain content type handling is properly integrated in the UniversalSchemaPayloadDeserializer with ContentTypesSchemas.schemaForPlain
- The byte array handling is widely supported throughout the codebase:
  - Kafka sources and sinks work with Array[Byte] as their base type
  - The deserialization/serialization infrastructure has proper byte array support
  - The Lite engine and other components handle raw byte arrays correctly
Raw byte array handling is properly supported across the codebase
The special handling for plain content type is consistent with the overall architecture where raw byte arrays are a fundamental data type. Consumers are already equipped to handle raw byte arrays as this is the base type for Kafka integration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for places where JsonSchemaPayloadDeserializer.deserialize is used
rg -l "JsonSchemaPayloadDeserializer.*deserialize" | while read -r file; do
echo "=== $file ==="
rg -A 5 "JsonSchemaPayloadDeserializer.*deserialize" "$file"
done
Length of output: 67
Script:
#!/bin/bash
# Let's search for any usage of UniversalSchemaPayloadDeserializer and content type handling
rg -A 5 "ContentTypesSchemas.schemaForPlain|UniversalSchemaPayloadDeserializer"
# Also search for byte array handling in the codebase
rg -A 5 "Array\[Byte\]|deserialize.*bytes"
Length of output: 235407
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/RecordFormatterSupport.scala (2)
8-8
: LGTM: Import statement is correctly updated
The addition of ContentTypesSchemas
to the import statement is necessary for the schema-less topic handling functionality.
53-59
: Verify string content validation requirements
The implementation allows any string content when no schema is provided. Based on the PR objectives, this aligns with the requirement to treat input as Unknown. However, please verify if any basic validation or sanitization is needed for security or data quality purposes.
Run the following script to check for any validation patterns in the codebase:
✅ Verification successful
Let me analyze the findings and generate additional verification scripts to better understand the validation context.
Let me generate one final verification to check the schema validation context:
Based on the analysis of the codebase, I can now generate the final response:
String content validation is appropriately handled through schema validation
The implementation is correct as-is. When no schema is provided, treating input as raw bytes is the intended behavior. For cases with schemas:
- JSON content is validated against ContentTypesSchemas.schemaForJson
- Plain text content is validated against ContentTypesSchemas.schemaForPlain
- Schema validation is enforced through UniversalSchemaValidator and type-specific validators (AvroSchemaValidator, JsonSchemaValidator)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for string validation patterns in Kafka-related files
rg -t scala "validate.*String|sanitize.*String"
# Search for similar message handling patterns
ast-grep --pattern 'case j if j.isString => $$$'
Length of output: 12377
Script:
#!/bin/bash
# Check for schema validation patterns in the relevant files
rg -A 5 "schemaOpt match" utils/schemed-kafka-components-utils/
# Look for any string content validation in the schema registry related files
rg -t scala "validate.*content|validate.*message" utils/schemed-kafka-components-utils/
# Check the parent class/trait for any validation requirements
ast-grep --pattern 'trait RecordFormatterSupport {
$$$
}'
Length of output: 2530
Script:
#!/bin/bash
# Check how schema validation is handled in the broader context
rg -t scala -B 5 -A 10 "ParsedSchema" utils/schemed-kafka-components-utils/
# Look for any content validation in the parent directories
rg -t scala "validate.*ParsedSchema|validate.*Schema" utils/
Length of output: 294921
engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala (1)
20-64
: LGTM! Clean and focused test implementation
The JSON round-trip test is well-structured and follows best practices:
- Clear test data setup
- Proper topic naming
- Explicit content type specification
- Robust error handling in JSON parsing
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala (2)
34-34
: LGTM!
The import is necessary for the Try-based error handling in the formValueEncoder implementation.
161-176: Several issues need to be addressed in the formValueEncoder implementation.
- The code uses unsafe tuple accessors (_1, _2) which should be replaced with pattern matching as suggested in previous reviews.
- The type casting with asInstanceOf is unsafe and should include better error handling.
- Using "Failure" as a value is incorrect - it should be a Failure instance.
- The special case handling for "Value" key needs better documentation.
Here's a suggested refactoring that addresses these issues:
(value: Any) => {
- // In ad-hoc test without schema we create object `{ "Value" = userInputInAdHoc }`, so if present we should just take the input
- Try {
- val temp = value.asInstanceOf[Map[String, Map[String, Any]]].head
- val key = temp._1
- // Any try to create a variable with value temp._2 fails
- if (key.equals("Value")) {
- temp._2
- } else Failure
- } match {
- // For normal usage
- case Failure(_) => encoder.encodeOrError(value, rawSchema)
- // If source with topic without schema
- case Success(objectInside) => encoder.encodeOrError(objectInside, rawSchema)
- }
+ /**
+ * Special handling for ad-hoc testing without schema:
+ * When testing topics without schema, the input is wrapped in {"Value": actualMessage}.
+ * This unwraps such messages to ensure consistent behavior with direct topic publishing.
+ */
+ value match {
+ case map: Map[String, Map[String, Any]] =>
+ map.headOption.flatMap {
+ case ("Value", innerMap) => Some(innerMap)
+ case _ => None
+ }.map(encoder.encodeOrError(_, rawSchema))
+ .getOrElse(encoder.encodeOrError(value, rawSchema))
+ case _ => encoder.encodeOrError(value, rawSchema)
+ }
}
The refactored version:
- Uses pattern matching instead of unsafe casting
- Provides clear documentation for the special case
- Handles the "Value" key case more safely
- Removes the incorrect usage of Failure as a value
- Simplifies the control flow
Let's verify if this pattern is used consistently across the codebase:
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (1)
9-9
: LGTM: Import for serializable schema wrapper
The added import for NkSerializableParsedSchema is appropriate for making schemas serializable, which is necessary for Flink's distributed execution.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (2)
115-119
: Logic for handling topics with schemas looks good
The implementation correctly processes topics that have associated schemas.
120-139
: Proper handling of schemaless topics based on content type
The code appropriately handles schemaless topics by checking the content type and processing JSON and plain text messages accordingly.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (3)
39-39: Addition of contentTypeParamName aligns with existing conventions
The introduction of contentTypeParamName as a constant maintains consistency with the parameter naming strategy used throughout the transformer, facilitating clear and maintainable code.
117-146: Effective handling of schema-less topics in getVersionOrContentTypeParam
The getVersionOrContentTypeParam method correctly differentiates between topics with and without schemas. By providing content type options when a schema is absent, it enhances the component's flexibility and user experience.
229-235: Seamless integration of the content type parameter in schemaParamStep
The updates in the schemaParamStep method efficiently incorporate getVersionOrContentTypeParam, ensuring that the appropriate parameters are generated based on the topic's schema presence without disrupting the existing flow.
object ContentTypes extends Enumeration {
  type ContentType = Value

  val JSON, PLAIN = Value
}
🛠️ Refactor suggestion
Consider using sealed trait pattern instead of Enumeration
Scala's Enumeration has several limitations that could impact maintainability and type safety. Consider using a sealed trait/case object pattern instead, which provides better type safety, pattern matching, and IDE support.
Additionally, consider adding scaladoc comments to document the purpose of each content type, especially since they're part of a public API.
Here's a suggested implementation:
-object ContentTypes extends Enumeration {
- type ContentType = Value
-
- val JSON, PLAIN = Value
-}
+sealed trait ContentType
+
+object ContentType {
+ /** Represents JSON formatted content */
+ case object JSON extends ContentType
+
+ /** Represents plain text content */
+ case object PLAIN extends ContentType
+}
expectedSchemaData
  .getOrElse(writerSchemaData)
  .asInstanceOf[RuntimeSchemaData[OpenAPIJsonSchema]]
  .schema
Add error handling for schema type casting
The type casting to RuntimeSchemaData[OpenAPIJsonSchema] could throw a ClassCastException if the schema is of a different type. Consider adding proper error handling to provide a more informative error message.
- expectedSchemaData
- .getOrElse(writerSchemaData)
- .asInstanceOf[RuntimeSchemaData[OpenAPIJsonSchema]]
- .schema
+ try {
+ expectedSchemaData
+ .getOrElse(writerSchemaData)
+ .asInstanceOf[RuntimeSchemaData[OpenAPIJsonSchema]]
+ .schema
+ } catch {
+ case e: ClassCastException =>
+ throw new IllegalArgumentException(s"Expected OpenAPIJsonSchema but got ${writerSchemaData.schema.getClass.getSimpleName}", e)
+ }
case j if j.isString =>
  schemaOpt match {
    case None => j.asString.get.getBytes()
    case Some(ContentTypesSchemas.schemaForJson) => j.asString.get.getBytes(StandardCharsets.UTF_8)
    case _ => j.asString.get.getBytes(StandardCharsets.UTF_8)
  }
🛠️ Refactor suggestion
Simplify identical cases and consider charset handling
The current implementation has two concerns:
- Lines 56 and 57 perform identical operations, which can be simplified.
- When no schema is provided (line 55), using getBytes() without charset specification can lead to platform-dependent encoding issues.
Consider applying this refactoring:
case j if j.isString =>
schemaOpt match {
- case None => j.asString.get.getBytes()
- case Some(ContentTypesSchemas.schemaForJson) => j.asString.get.getBytes(StandardCharsets.UTF_8)
- case _ => j.asString.get.getBytes(StandardCharsets.UTF_8)
+ case None => j.asString.get.getBytes(StandardCharsets.UTF_8)
+ case Some(ContentTypesSchemas.schemaForJson) | _ => j.asString.get.getBytes(StandardCharsets.UTF_8)
}
package pl.touk.nussknacker.defaultmodel

import io.circe.{Json, parser}
import org.apache.kafka.shaded.com.google.protobuf.ByteString
🛠️ Refactor suggestion
Avoid using shaded dependencies directly
Using shaded dependencies directly (org.apache.kafka.shaded) can lead to versioning issues. Consider using the non-shaded protobuf library instead.
-import org.apache.kafka.shaded.com.google.protobuf.ByteString
+import com.google.protobuf.ByteString
      _
    ) =>
  val preparedTopic = prepareTopic(topic)
  val valueValidationResult = if (contentType.equals(ContentTypes.JSON.toString)) {
🛠️ Refactor suggestion
Replace string comparison with enum/constant
Using string literals for content type comparison is error-prone. Consider using the ContentTypes enum values directly.
-if (contentType.equals(ContentTypes.JSON.toString)) {
+if (ContentTypes.JSON.toString.equals(contentType)) {
case step @ TransformationStep(
      (`topicParamName`, DefinedEagerParameter(topic: String, _)) ::
      (`contentTypeParamName`, DefinedEagerParameter(contentType: String, _)) :: _,
      _
    ) =>
  val preparedTopic = prepareTopic(topic)
  val valueValidationResult = if (contentType.equals(ContentTypes.JSON.toString)) {
    Valid(
      (
        Some(
          RuntimeSchemaData[ParsedSchema](
            new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForJson),
            Some(SchemaId.fromString(ContentTypes.JSON.toString))
          )
        ),
        // This is the type after it leaves source
        Unknown
      )
    )
  } else {
    Valid(
      (
        Some(
          RuntimeSchemaData[ParsedSchema](
            new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain),
            Some(SchemaId.fromString(ContentTypes.PLAIN.toString))
          )
        ),
        // This is the type after it leaves source
        // TODO: Should be Array[Byte] when handling is implemented
        Typed[Array[java.lang.Byte]]
      )
    )
  }
  prepareSourceFinalResults(preparedTopic, valueValidationResult, context, dependencies, step.parameters, Nil)
💡 Codebase verification
Additional error handling and test coverage needed
The verification reveals several gaps that need to be addressed:
- No explicit error handling for invalid content types is implemented. The code only handles JSON and PLAIN types without validation for unsupported values.
- While integration tests exist for both JSON and PLAIN content types (KafkaJsonItSpec), they only cover the "happy path" with round-trip scenarios.
- Schema evolution handling is present in the codebase but lacks dedicated test coverage.
Required changes:
- Add validation and error handling for unsupported content types in UniversalKafkaSourceFactory
- Extend KafkaJsonItSpec with negative test cases for invalid content types
- Add test coverage for schema evolution scenarios
🔗 Analysis chain
Verify impact on error handling and testing
The new content type handling introduces a significant change in message processing. Please ensure:
- Error cases are properly handled (e.g., invalid content types)
- Integration tests cover both content types
- Schema evolution scenarios are tested
Let me check the content of these files to better understand the test coverage and error handling.
Let me check one more aspect to verify error handling for invalid content types.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for existing error handling tests
rg -l "contentType.*test" --type scala
# Check for integration tests covering both content types
rg -l "ContentTypes.(JSON|PLAIN)" --type scala
# Look for schema evolution tests
rg -l "schema.*evolution.*test" --type scala
Length of output: 1061
Script:
#!/bin/bash
# Check content of the Kafka JSON integration test
rg "ContentTypes" -A 5 -B 5 engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala
# Check error handling in UniversalKafkaSourceFactory
rg "contentType" -A 5 -B 5 utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala
# Check schema evolution handling
rg "schema.*evolution" -A 5 -B 5 utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala
Length of output: 4773
Script:
#!/bin/bash
# Check for content type validation or error handling
ast-grep --pattern 'case $_ if !contentType.equals(ContentTypes.JSON.toString) && !contentType.equals(ContentTypes.PLAIN.toString) => $_'
# Check for additional test cases in KafkaJsonItSpec
rg "test|should" engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala
Length of output: 468
val schemaWithMetadata = {
  if (schemaRegistryClient.isTopicWithSchema(topic, new AllTopicsSelectionStrategy)) {
    schemaRegistryClient.getSchemaById(writerSchemaId.value)
  } else {
    writerSchemaId.value match {
      case StringSchemaId(value) =>
        if (value.equals(ContentTypes.PLAIN.toString)) {
          SchemaWithMetadata(ContentTypesSchemas.schemaForPlain, SchemaId.fromString(ContentTypes.PLAIN.toString))
        } else {
          SchemaWithMetadata(ContentTypesSchemas.schemaForJson, SchemaId.fromString(ContentTypes.JSON.toString))
        }
      case _ =>
        throw new IllegalStateException("Topic without schema should have ContentType Json or Plain, was neither")
    }
  }
}
Fix Pattern Matching on writerSchemaId
The pattern matching on writerSchemaId.value is incorrect because writerSchemaId.value is likely a String, whereas StringSchemaId is a case class expected to match against writerSchemaId itself. This will result in a match failure.
Apply this diff to correct the pattern matching:
- writerSchemaId.value match {
+ writerSchemaId match {
case StringSchemaId(value) =>
if (value.equals(ContentTypes.PLAIN.toString)) {
SchemaWithMetadata(ContentTypesSchemas.schemaForPlain, SchemaId.fromString(ContentTypes.PLAIN.toString))
} else {
SchemaWithMetadata(ContentTypesSchemas.schemaForJson, SchemaId.fromString(ContentTypes.JSON.toString))
}
case _ =>
throw new IllegalStateException("Topic without schema should have ContentType Json or Plain, was neither")
}
case TransformationStep(
      (`topicParamName`, DefinedEagerParameter(_: String, _)) ::
      (`contentTypeParamName`, DefinedEagerParameter(contentType: String, _)) ::
      (`sinkKeyParamName`, _) ::
      (`sinkRawEditorParamName`, DefinedEagerParameter(false, _)) :: Nil,
      _
    ) =>
  val schemaData = jsonSchema

  schemaSupportDispatcher
    .forSchemaType(schemaData.schema.schemaType())
    .extractParameter(
      schemaData.schema,
      rawMode = false,
      validationMode = ValidationMode.lax,
      rawValueParamDeclaration.createParameter(),
      restrictedParamNames
    )
    .map[TransformationStepResult] { valueParam =>
      val state = TransformationState(schemaData, valueParam)
      // shouldn't happen except for empty schema, but it can lead to infinite loop...
      if (valueParam.toParameters.isEmpty) {
        FinalResults(context, Nil, Some(state))
      } else {
        NextParameters(valueParam.toParameters, state = Some(state))
      }
    }
    .valueOr { errors =>
      FinalResults(context, errors.toList)
    }
Ensure the PLAIN content type is handled in valueEditorParamStep
In the valueEditorParamStep method, the contentType parameter is used, but only the JSON content type seems to be handled. This might lead to issues if the PLAIN content type is selected, as schemaData would incorrectly use the JSON schema.
Update the code to handle the PLAIN content type similarly to JSON:
val schemaData = if (contentType.equalsIgnoreCase(ContentTypes.JSON.toString)) {
jsonSchema
+ } else if (contentType.equalsIgnoreCase(ContentTypes.PLAIN.toString)) {
+ plainSchema
} else {
jsonSchema
}
Committable suggestion skipped: line range outside the PR's diff.
val schemaData = jsonSchema
Avoid hardcoding jsonSchema for all content types
Currently, schemaData is assigned jsonSchema regardless of the contentType. This may cause incorrect schema usage for other content types.
Assign schemaData based on contentType:
- val schemaData = jsonSchema
+ val schemaData = if (contentType.equalsIgnoreCase(ContentTypes.JSON.toString)) {
+ jsonSchema
+ } else if (contentType.equalsIgnoreCase(ContentTypes.PLAIN.toString)) {
+ plainSchema
+ } else {
+ // Handle other content types or throw an error
+ }
Committable suggestion skipped: line range outside the PR's diff.
created: #7163
Describe your changes
We want to enable users to choose and use topics without a schema present in the schema registry.
We won't validate the input; it will be passed as Unknown.
Problems
Currently, using the ad-hoc test creates the JSON object {"Value": message}, so it works differently compared to when the message is put directly on the topic.
Similarly, I'm not sure if we should handle metadata from hermes here, or whether that is managed on the cloud side. If we create a topic locally and send messages through hermes, we get the message together with hermes metadata,
when we only need the message -> this can be worked around for now by using dynamic access
Checklist before merge
Summary by CodeRabbit
New Features
Bug Fixes
Documentation