
Active Issue: Relating Source

terrymacdonald edited this page Feb 3, 2016 · 14 revisions

Consensus

Information identifying and characterizing sources of CTI information should be broken out into a separate "top level" Source construct rather than embedded within each "top level" construct.

Open Questions

How should the relationship between a "top level" construct instance and its Source be asserted?

Questions to consider

  1. Should Source follow the "one way to do things" with relationships or should it be an exception to the rule?
  2. Is Source a key CTI object or only metadata?
  3. Should there be a distinction between the producer of the STIX and the source of the content?
  4. If so, how should that distinction be conveyed?
  5. How do we deal with anonymous sources?
  6. Separate Source object each time an anonymous source is asserted or one general anonymous Source object that is related to for each anonymous source assertion?
  7. How do we deal with deanonymizing an anonymous source?
  8. How do we deal with third party source assertions?
  9. How do we deal with complex source chains (e.g., Z sends me STIX that is a translation of STIX produced by Y that was a STIX codification of information created by X)?
  10. How do we deal with uncertainty/confidence on source assertions?
  11. How important is bandwidth efficiency?
  12. What are the best approaches for dealing with the issue?

Proposal #1

Follow the "one way of doing things" for relationships and assert source relationships for all "top level" construct instances using the Relationship object with a relationship nature of "Has Source".

Strong assertion that Source is a key CTI object and not simply metadata.

Advantages:

  • Consistency (one way of doing relationships)
  • Treats Source as a key CTI object and allows its characterization and correlation like any other CTI object
  • Inherently graph-based to support analysis
  • Enables assertions for both the producer of the STIX itself and the creator of the content
    • In the large majority of cases they will be the same, and this approach allows them to be asserted consistently
  • Enables support for anonymous sources and for deanonymizing sources
  • Supports third party source assertions
  • Inherently supports complex source chains in a consistent fashion
  • Allows assertion of confidence for any source assertion
  • When the exact same content is received from multiple sources, allows you to characterize each source separately (with confidence)
  • Supports more flexible pivoting on Source

Disadvantages:

  • Could result in more verbose content (a few extra lines for the "Has Source" relationship of each construct).
    • Can be mitigated by a many-to-one "Has Source" relationship (many constructs, one Source), which would offer the most efficient representation available.
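To make the verbosity trade-off concrete, the sketch below counts the relationship objects a producer would emit to attribute N constructs to one Source, first with one "Has Source" relationship per construct and then with a single many-to-one relationship. The many-to-one form, where "from" holds a list of IDs, is a hypothetical serialization assumed only for illustration; this page does not define one. The helper names are also hypothetical.

```python
import uuid


def new_id(prefix: str) -> str:
    """Build an example ID in the style used on this page."""
    return f"example:{prefix}-{uuid.uuid4()}"


def attribute_one_to_one(construct_ids: list, source_id: str) -> list:
    """One "Has Source" relationship per construct."""
    return [
        {"id": new_id("rel"), "type": "relationship",
         "from": cid, "to": source_id, "relationship_nature": "Has Source"}
        for cid in construct_ids
    ]


def attribute_many_to_one(construct_ids: list, source_id: str) -> list:
    """A single relationship whose "from" lists every construct (hypothetical form)."""
    return [{"id": new_id("rel"), "type": "relationship",
             "from": list(construct_ids), "to": source_id,
             "relationship_nature": "Has Source"}]


constructs = [new_id("ind") for _ in range(100)]
source = new_id("src")
print(len(attribute_one_to_one(constructs, source)))   # 100
print(len(attribute_many_to_one(constructs, source)))  # 1
```

The saving grows linearly with the number of constructs sharing a Source, at the cost of a relationship whose "from" may be either a single ID or a list.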

Examples

Example #1: simple indicator with attributed source for the information

{
	"id": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
	"type": "source",
	"timestamp": "2015-12-21T19:59:11Z",
	"name": "US-CERT"
}

{
	"id": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
	"type": "indicator",
	"timestamp": "2015-12-21T19:59:11Z",
	"title": "Sakurel Malware",
	"indicator_expression": "this would be an observable pattern for a particular file hash using the new CybOX patterning language under consideration",
	"indicator_type": ["File Hash Watchlist"]
}

{
	"id": "example:rel-9d0c539e-a874-42c7-a055-3e900b98724f",
	"type": "relationship",
	"timestamp": "2015-12-21T19:59:12Z",
	"from": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
	"to": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
	"relationship_nature": "Has Source"
}
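A consumer of Proposal #1 content finds a construct's sources by following its "Has Source" relationships. The sketch below assumes the three objects above have already been parsed into Python dicts; the index names are hypothetical. Because Source is an ordinary node, the same pass also supports pivoting in the other direction, from a Source to every construct attributed to it.

```python
from collections import defaultdict

objects = [
    {"id": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
     "type": "source", "name": "US-CERT"},
    {"id": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
     "type": "indicator", "title": "Sakurel Malware"},
    {"id": "example:rel-9d0c539e-a874-42c7-a055-3e900b98724f",
     "type": "relationship",
     "from": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
     "to": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
     "relationship_nature": "Has Source"},
]

by_id = {obj["id"]: obj for obj in objects}
sources_of = defaultdict(list)      # construct ID -> its Source objects
attributed_to = defaultdict(list)   # Source ID -> constructs citing it

for obj in objects:
    if obj["type"] == "relationship" and obj["relationship_nature"] == "Has Source":
        sources_of[obj["from"]].append(by_id[obj["to"]])
        attributed_to[obj["to"]].append(by_id[obj["from"]])

ind = "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432"
src = "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e"
print([s["name"] for s in sources_of[ind]])       # ['US-CERT']
print([c["title"] for c in attributed_to[src]])   # ['Sakurel Malware']
```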

Proposal #2 (from TWIGS proposal)

This proposal treats the "producer" or "creator" of a STIX object as distinct from, and tracked separately from, the sources of information used in that analysis. We believe there is value in knowing which STIX producer created an object as well as where its information came from. This link is especially important as a key part of the object lookup process within the TWIGS proposal.

Approach

  • The TWIGS-based approach proposed a new object called Identity, which represents an Organization or Individual. This object is used in many places within the TWIGS proposal and is a key part of the simplification we will be proposing in later proposals.

  • To record who created the STIX Object, we propose a created_by_ref field (the name is certainly debatable; it could be information_source_ref) that identifies which STIX producer created the object. Note that this does not refer to who produced the information used in the object, only to who produced and published the STIX object itself. The created_by_ref field points directly to the Identity Object of the Organization that created the STIX Object.

e.g.

    {
      "type": "indicator",
      "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "title": "Some indicator",
      "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
      "otherindicatorthings":"..."
    }  

In the example above, if this Indicator object was created by MITRE, then the created_by_ref field would point to the Identity Object with object ID 'identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1', which would represent MITRE.
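Resolving created_by_ref is a direct lookup rather than a graph traversal. The sketch below assumes the consumer keeps received Identity objects in a dict keyed by ID; the function name and the prefix check are illustrative assumptions, not part of the proposal. Since the field is optional (see "Object Creator can be anonymous if they want" below), a missing value is treated as an anonymous creator.

```python
from typing import Optional

identities = {
    "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1": {
        "type": "identity",
        "id": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
        "name": "mitre.org",
    },
}

indicator = {
    "type": "indicator",
    "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "title": "Some indicator",
    "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
}


def creator_of(obj: dict, known: dict) -> Optional[dict]:
    """Return the Identity that published obj, or None if the creator is anonymous."""
    ref = obj.get("created_by_ref")
    if ref is None:
        return None  # field omitted: anonymous creator
    if not ref.startswith("identity--"):
        raise ValueError(f"created_by_ref must point to an Identity: {ref}")
    return known[ref]  # KeyError if that Identity has not been received yet


print(creator_of(indicator, identities)["name"])  # mitre.org
```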

  • To record the underlying sources of information used within the STIX Object, we propose an embedded list of references that the object creator uses to describe the sources they relied on while creating the STIX Object.

e.g.

    {
      "type": "indicator",
      "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "title": "Some indicator",
      "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
      "references": [
         {
           "url": "http://fireeye.com/APT1",
           "analysis_by_ref": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56",
         }
      ],
      "otherindicatorthings":"..."
    }  

In the example above, the Indicator Object has a reference whose url points to the location of the source document and whose analysis_by_ref field points directly to the Identity Object 'identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56', which would represent FireEye.
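The point of keeping the two fields apart is that a consumer can answer "who published this object?" and "whose analysis is it based on?" separately. The sketch below assumes Identities are held in a dict keyed by ID; the provenance helper is hypothetical.

```python
identities = {
    "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1": {"name": "mitre.org"},
    "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56": {"name": "fireeye.com"},
}

indicator = {
    "type": "indicator",
    "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
    "references": [
        {"url": "http://fireeye.com/APT1",
         "analysis_by_ref": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56"},
    ],
}


def provenance(obj: dict, known: dict) -> dict:
    """Separate who published the STIX object from who did the underlying analysis."""
    creator = obj.get("created_by_ref")
    return {
        "published_by": known[creator]["name"] if creator else None,
        "analysis_by": [
            (ref.get("url"), known[ref["analysis_by_ref"]]["name"])
            for ref in obj.get("references", [])
            if "analysis_by_ref" in ref
        ],
    }


print(provenance(indicator, identities))
# {'published_by': 'mitre.org', 'analysis_by': [('http://fireeye.com/APT1', 'fireeye.com')]}
```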

The exact approach to storing references still needs to be discussed and confirmed, though we do have a ROUGH diagram showing how the relationships would work when tracking who made the object and where the information used in the object came from: https://docs.google.com/drawings/d/1IfU0u_5y2ZbyEbmrLIo5nXgcBX-wSP3ssQjAB8-9iks/edit?usp=sharing

We are open to expressing the reference producers via relationships instead if the community feels that is better.

Goals

Assure non-ambiguity

Using a created_by_ref direct reference is a non-ambiguous way of saying who is responsible for publishing and updating that STIX object. With the relationship approach, that immediately becomes ambiguous and complicated. While the real source of information may be ambiguous and complicated, the STIX object creator should not be. This way we can do things like mandate that only STIX Object creators can update content.
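With a direct reference, the rule "only the STIX Object creator can update content" reduces to an equality check. The sketch below assumes the consumer already knows which Identity submitted an update (for example, from an authenticated channel), which is outside this proposal. Rejecting all updates to anonymous objects is also an assumption made here for illustration; the function name is hypothetical.

```python
MITRE = "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1"
FIREEYE = "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56"


def accept_update(current: dict, update: dict, submitter_ref: str) -> bool:
    """Accept an update only from the Identity named in the existing object."""
    if update["id"] != current["id"]:
        return False  # not an update to this object
    owner = current.get("created_by_ref")
    if owner is None:
        return False  # assumption: anonymous objects have no owner to match
    # The submitter must be the owner, and the update must not reassign ownership.
    return submitter_ref == owner and update.get("created_by_ref") == owner


current = {"id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
           "created_by_ref": MITRE, "title": "Some indicator"}
update = dict(current, title="Some indicator (revised)")
print(accept_update(current, update, MITRE))    # True
print(accept_update(current, update, FIREEYE))  # False
```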

Object Creator can be anonymous if they want

The created_by_ref field is optional. This was done on purpose to allow Organizations or Individuals to remain anonymous if they wish. This was an important use case requested by governmental organizations and some more secretive groups who wished to provide information but didn't want the general populace to know who they are.

Easier to record who created relationships

A direct reference also avoids chains of "source" relationships. For example, if I issue an indicator and then issue a relationship object saying that I created the indicator, how do I indicate that I also created the relationship object? Do I need to have another "source" relationship saying that I'm the source of the first source relationship? Or do we assume that "source" relationships have a source of whoever they point to, which is inconsistent?

Having it as a direct reference in the top-level object (TLO) ensures there's a single, concise way to do that across all TLOs, even relationships.

One Identity

Having one identity representing an Organization or Individual and reusing that throughout STIX allows analysts to quickly and easily find objects and information associated with that entity.

In addition, if the company that creates the STIX Objects moves, or is bought by a larger company, all it needs to do is reissue its single Identity Object to the community. At that point, all past, present and future STIX Objects it has created point to the latest information about the Organization.
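Because objects hold only the Identity's ID, a reissued Identity takes effect everywhere as soon as the consumer stores the newer version. The sketch below assumes each Identity carries a timestamp in the format used elsewhere on this page (the Identity examples here omit one) and that the newest timestamp wins; the store class and the new organization name are hypothetical.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # Timestamps on this page use the form 2015-12-21T19:59:11Z.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


class IdentityStore:
    """Keep only the newest version of each Identity, keyed by its ID."""

    def __init__(self) -> None:
        self._latest = {}

    def ingest(self, identity: dict) -> None:
        held = self._latest.get(identity["id"])
        if held is None or parse_ts(identity["timestamp"]) > parse_ts(held["timestamp"]):
            self._latest[identity["id"]] = identity

    def resolve(self, ref: str) -> dict:
        return self._latest[ref]


store = IdentityStore()
store.ingest({"type": "identity", "id": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
              "timestamp": "2015-12-21T19:59:11Z", "name": "mitre.org"})
# A later reissue (hypothetical new name) replaces the stored version.
store.ingest({"type": "identity", "id": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
              "timestamp": "2016-02-01T00:00:00Z", "name": "MITRE Corporation"})

indicator = {"created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1"}
print(store.resolve(indicator["created_by_ref"])["name"])  # MITRE Corporation
```

The indicator itself is never reissued; only the Identity changes.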

Allows for lookup by Object ID

A single Identity object for each Organization or Individual also allows us to store lookup information in their Identity object, telling consumers how to contact them for more information about objects they've created. As an example, we could put information about the Organization's TAXII server into its Identity object, so that consumers who only have an Object ID will know how to contact the Organization to ask for a copy of the object.
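A minimal sketch of that lookup, under two assumptions not defined in this proposal: the Identity carries a hypothetical "taxii_server" property, and the consumer knows which Identity to ask (for example, from the created_by_ref of an object that mentioned the missing ID). The helper only decides where to send the request; it does not model any TAXII API call.

```python
from typing import Optional

identities = {
    "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56": {
        "type": "identity",
        "id": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56",
        "name": "fireeye.com",
        # Hypothetical property; not defined by this proposal.
        "taxii_server": "https://taxii.example.com/",
    },
}


def where_to_ask(object_id: str, creator_ref: str, known: dict) -> Optional[tuple]:
    """Return (server, object_id) for requesting a copy, if the creator advertises a server."""
    server = known.get(creator_ref, {}).get("taxii_server")
    return (server, object_id) if server else None


print(where_to_ask("indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
                   "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56", identities))
# ('https://taxii.example.com/', 'indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2')
```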

More compatible with future digital signatures

This proposal will help when we start to cryptographically check STIX Objects for tampering in the future. Using a created_by_ref direct reference ensures that the creator of a STIX Object is included in the HMAC for the Object, so we will be able to tell if someone has tampered with that attribution. When the content is passed around as a block, you can tell who created it and, when it is signed by the creator, be assured that this is accurate.
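The sketch below shows why embedding the creator matters: created_by_ref is part of the bytes the MAC covers, so swapping it invalidates the tag. The canonical form used here (sorted keys, compact separators) is an assumption for illustration, not a STIX-defined canonicalization, and the key is a placeholder. Note that an HMAC relies on a shared secret, so it only reveals tampering to parties holding the key; proving authorship to everyone would need a public-key signature, but the coverage argument is the same.

```python
import hashlib
import hmac
import json


def canonical_bytes(obj: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def tag(obj: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(obj), hashlib.sha256).hexdigest()


def verify(obj: dict, key: bytes, expected: str) -> bool:
    # Constant-time comparison of the recomputed and received tags.
    return hmac.compare_digest(tag(obj, key), expected)


key = b"shared-secret-for-illustration-only"
indicator = {
    "type": "indicator",
    "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
}
mac = tag(indicator, key)

# Swapping the creator changes the protected bytes, so verification fails.
forged = dict(indicator, created_by_ref="identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56")
print(verify(indicator, key, mac))  # True
print(verify(forged, key, mac))     # False
```

A "Has Source" relationship, by contrast, lives in a separate object, so the indicator's tag says nothing about who created it.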

Avoids superfluous relationship objects

We feel that the relationship object should be reserved for relationships between objects in the cyber threat domain. Just because you can use relationships to represent everything doesn't mean you should, and using them to represent who created a given STIX construct is beyond that purpose.

It also simply avoids either a high volume of extra relationships (an additional one for each TLO) or having a relationship with multiple target nodes. While a relationship with multiple targets is easy to represent in a serialization, handling that in code can become very tricky and should be avoided.
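The coding cost of multi-target relationships is that every consumer must handle both shapes. The sketch below normalizes a relationship whose "from" and "to" may each be a single ID or a list (a hypothetical serialization, with IDs shortened for readability) into plain pairwise edges. Every downstream graph query would need a step like this, and changing one target means reissuing the whole relationship.

```python
from itertools import product
from typing import Union

Ref = Union[str, list]


def as_list(value: Ref) -> list:
    """Wrap a single ID in a list; copy a list of IDs as-is."""
    return [value] if isinstance(value, str) else list(value)


def edges(rel: dict) -> list:
    """Expand one relationship into (from, nature, to) triples."""
    return [
        (src, rel["relationship_nature"], dst)
        for src, dst in product(as_list(rel["from"]), as_list(rel["to"]))
    ]


rel = {"type": "relationship", "from": ["ind-1", "ind-2"], "to": "src-1",
       "relationship_nature": "Has Source"}
print(edges(rel))
# [('ind-1', 'Has Source', 'src-1'), ('ind-2', 'Has Source', 'src-1')]
```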

Helps prevent false ownership claims

This approach also makes it harder for another party to claim ownership of an existing construct. For example, under the relationship approach, if I issue an indicator I would say that I created it by issuing a relationship. What if you issue another relationship saying that you actually created that indicator? How should a consumer evaluate that? Having the source embedded directly in the object mitigates this, because changing the source requires an update to the object itself, which can more easily be detected and evaluated.
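The difference shows up in what a consumer can check. With creation asserted through Proposal #1's "Has Source" relationships, two conflicting claims are just two objects and nothing in the data ranks them; with an embedded created_by_ref, a change of creator can only appear as a new version of the object. The sketch assumes the consumer keeps prior versions of each object; IDs are shortened and the helper names are hypothetical.

```python
def creation_claims(target_id: str, relationships: list) -> set:
    """Relationship approach: every identity claiming to be the source of target_id."""
    return {
        r["to"] for r in relationships
        if r["from"] == target_id and r["relationship_nature"] == "Has Source"
    }


def creator_changed(previous: dict, current: dict) -> bool:
    """Embedded approach: flag a new version that rewrites the object's creator."""
    return previous.get("created_by_ref") != current.get("created_by_ref")


rels = [
    {"from": "indicator--1", "to": "identity--me", "relationship_nature": "Has Source"},
    {"from": "indicator--1", "to": "identity--you", "relationship_nature": "Has Source"},
]
print(sorted(creation_claims("indicator--1", rels)))  # ['identity--me', 'identity--you']

v1 = {"id": "indicator--1", "created_by_ref": "identity--me"}
v2 = {"id": "indicator--1", "created_by_ref": "identity--you"}
print(creator_changed(v1, v2))  # True
```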

Advantages:

  • Simpler
  • Direct
  • Only uses relationships when there is a requirement to assert a confidence level.
  • Avoids the need for messy many-to-one relationships
  • Treats Identity as a key CTI object and allows its characterization and correlation like any other CTI object
  • Enables assertions for both producer of the STIX object itself and the creator of the content the object is based on
    • In large majority of cases they will be the same
  • Enables support for anonymous object creators and for object creators to identify themselves
  • Supports third party assertions of content source (via references). Doesn't support third party assertions of the STIX Object creator, since the object creator always knows that it created the object and there is no need for anyone else to assert it.
  • When same content is received from multiple sources, but each is contained within a STIX object from different producers, it allows you to characterize them separately and maintain the fact that they are different.
  • Supports more flexible pivoting on Identity.

Disadvantages:

  • Less flexibility in asserting confidence for the references to the sources of information used within the STIX object, under the current proposal.
    • Can be mitigated by replacing the references array with relationship based model if the community decides it requires the confidence.
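If confidence turns out to be required, the references array could be mechanically converted into relationships while created_by_ref stays embedded. The sketch below reuses Proposal #1's "Has Source" nature; the "confidence" field, its "Medium"/"High" values, and keeping the url on the relationship are hypothetical choices made only for illustration.

```python
import uuid


def references_to_relationships(obj: dict, confidence: str = "Medium") -> list:
    """Turn each embedded reference into a relationship that can carry a confidence."""
    rels = []
    for ref in obj.get("references", []):
        if "analysis_by_ref" not in ref:
            continue  # nothing to relate to without an Identity
        rels.append({
            "type": "relationship",
            "id": f"relationship--{uuid.uuid4()}",
            "from": obj["id"],
            "to": ref["analysis_by_ref"],
            "relationship_nature": "Has Source",
            "confidence": confidence,  # hypothetical field
            "url": ref.get("url"),     # keep the link to the source document
        })
    return rels


indicator = {
    "type": "indicator",
    "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
    "references": [{"url": "http://fireeye.com/APT1",
                    "analysis_by_ref": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56"}],
}
rels = references_to_relationships(indicator, confidence="High")
print(len(rels), rels[0]["to"], rels[0]["confidence"])
# 1 identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56 High
```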

Example

    {
      "identities": [
        {
          "type": "identity",
          "id": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
          "name": "mitre.org"
        },
        {
          "type": "identity",
          "id": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56",
          "name": "fireeye.com"
        }
      ],
      "indicators": [
        {
          "type": "indicator",
          "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
          "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
          "timestamp": "2015-12-21T19:59:11Z",
          "references": [
            {
              "url": "http://fireeye.com/APT1",
              "analysis_by_ref": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56"
            }
          ],
          "title": "Sakurel Malware",
          "indicator_expression": "this would be an observable pattern for a particular file hash using the new CybOX patterning language under consideration",
          "indicator_type": ["File Hash Watchlist"]
        }
      ]
    }
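A consumer can check that every created_by_ref and analysis_by_ref in a bundle like this resolves to an Identity the bundle contains. The sketch below assumes the layout of this example (top-level "identities" and "indicators" arrays); the helper name is hypothetical. A check like this would flag a reference whose ID uses the wrong prefix, such as "information-source--" instead of "identity--".

```python
import json

bundle_text = """
{
  "identities": [
    {"type": "identity", "id": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1", "name": "mitre.org"},
    {"type": "identity", "id": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56", "name": "fireeye.com"}
  ],
  "indicators": [
    {"type": "indicator",
     "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
     "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
     "references": [{"url": "http://fireeye.com/APT1",
                     "analysis_by_ref": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56"}]}
  ]
}
"""


def dangling_refs(bundle: dict) -> list:
    """List (object id, field, ref) for references that match no Identity in the bundle."""
    known = {ident["id"] for ident in bundle.get("identities", [])}
    problems = []
    for obj in bundle.get("indicators", []):
        refs = [("created_by_ref", obj.get("created_by_ref"))]
        refs += [("analysis_by_ref", r.get("analysis_by_ref")) for r in obj.get("references", [])]
        for field, ref in refs:
            if ref is not None and ref not in known:
                problems.append((obj["id"], field, ref))
    return problems


bundle = json.loads(bundle_text)
print(dangling_refs(bundle))  # []
```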