Using the Swagger Normalizer GenTemplate

Swagger Normalizer is a core component of the Swagger Multi-File Support in RepreZen API Studio. As such, it is used by each of the three “live” views (Documentation View, Diagram View, and Swagger UI View) that appear by default in the right-hand pane of the RepreZen API Studio GUI, as well as by all Swagger GenTemplates. You can also use it directly as a GenTemplate in its own right, named “Swagger [Normalized YAML]” in the GenTarget Wizard.

Swagger Normalizer

The primary function of the normalizer is to render a multi-file Swagger spec as a functionally equivalent single-file spec. In this way it can simplify the use of other tools and libraries in the evolving Swagger ecosystem, where external references are not always handled consistently.

Additionally, the normalizer can perform other transformations of the Swagger spec, which may be helpful in some circumstances, especially when feeding the spec to downstream systems.

Basic Use

The Normalizer is used like any other Swagger GenTemplate:

  1. Create a GenTarget (a .gen file) in your model folder, linking your Swagger model file (.yaml file) to the GenTemplate. The internal id of this GenTemplate is:

    com.modelsolv.reprezen.gentemplates.swaggernorm.SwaggerNormalizerGenTemplate

    The name listed in the drop-down list in the GenTarget wizard is “Swagger [Normalized YAML].”

  2. Configure the GenTarget as desired (see below).

  3. Execute the GenTarget.

  4. Find the generated YAML file in the generated folder that appears in the GenTarget folder.

Multifile Processing

The one thing that the normalizer will always do is resolve external references and leave you with a single-file Swagger spec. The other things it may do depend on options, described in Normalizer Options.

References in Swagger Specs

Here’s an example of what a typical reference might look like in a Swagger spec:

responses:
  200:
    description: Default Response
    schema:
      $ref: "#/definitions/Pet"

This is part of the definition of an operation whose normal response will contain data about a pet. That information will be structured according to a schema named Pet defined elsewhere in this same Swagger spec, in the definitions section of the spec.

The reference appears as the value of the schema property in the response. That property could hold an "in-line" schema definition, but in this case the designer has opted to define the schema elsewhere in the file and reference it here by name. The reference takes the form of an object with a string-valued property named $ref. [1]

If the definition of the Pet schema physically appeared in some other Swagger spec, the reference would need to include a URL to retrieve that spec, with a fragment identical to the reference string shown above:

responses:
  200:
    description: Default Response
    schema:
      $ref: "http://models.example.com/petstore-schemas.yaml#/definitions/Pet"

Swagger’s $ref syntax conforms to a separate standard known as "JSON Reference."

Conforming and Non-Conforming References

References in a Swagger spec should all be of the variety specifically endorsed by the Swagger Specification, namely those with URI fragments that begin with #/paths/ or #/parameters/ or #/responses/ or #/definitions/. We’ll call those conforming references. All other references will be called non-conforming references. Swagger Normalizer does not treat these two varieties of reference identically.

Non-conforming references are not officially allowed in Swagger specs, but some tooling permits their use, and there are, confusingly, posted examples and tutorials from Swagger project contributors and others that feature them.
Note

The document identified by the pre-fragment portion of an external conforming reference must be a valid Swagger spec. At a minimum, this means that it must include: (1) a string-valued swagger property whose value is "2.0"; (2) an object-valued info property that includes (3) a string-valued title property and (4) a string-valued version property; and (5) an object-valued paths property, which may be empty ({}). A minimal compliant Swagger spec [2] might look like this:

---
swagger: "2.0"     # (1)
info:              # (2)
  title: My title  # (3)
  version: "1.0"   # (4)
paths: {}          # (5)

What Swagger Normalizer Does with References

When the normalizer encounters any reference, there are two ways it may process the reference:

Inline

The normalizer retrieves the referenced value (e.g. the Pet schema definition object) and replaces the reference itself with that value.

Localize

The normalizer first adds the referenced object to the normalized spec that it is creating, if it is not already present, and then replaces the reference with a local reference to that object. So in the external reference example shown above, the Pet schema definition would appear directly in the Swagger spec produced by the normalizer, and references that were formerly external references would become local references.

The normalizer always inlines non-conforming references. Any given conforming reference might be inlined or localized, depending on options in effect.

Name Collisions

Localization of a conforming reference may lead to a name collision. For example, imagine the following excerpts from two Swagger specs:

main.yaml
definitions:
  Address:
    description: An address given by a speaker
    type: object
    properties:
      speaker:
        $ref: "external.yaml#/definitions/Person"
      title:
        type: string
      ...
external.yaml
definitions:
  Person:
    type: object
    properties:
      name:
        type: string
      address:
        $ref: "#/definitions/Address"
  Address:
    description: A postal address
    type: object
    properties:
      street:
        type: string
      ...

The main spec is apparently describing APIs related to events where speakers deliver addresses. The speakers themselves are represented using an externally referenced Person schema which itself makes use of a locally referenced Address schema.

In a localizing scenario, the normalized spec created by the normalizer would look something like this:

main-normalized.yaml
definitions:
  Address:
    description: An address given by a speaker
    type: object
    properties:
      speaker:
        $ref: "#/definitions/Person"     # (1)
      title:
        type: string
      ...
  Person:
    type: object
    properties:
      name:
        type: string
      address:
        $ref: "#/definitions/Address_1"  # (2)
  Address_1:
    description: A postal address
    type: object
    properties:
      street:
        type: string
      ...

The two Address schemas originally in main.yaml and external.yaml are both needed in the normalized spec, but their names collide. Therefore, the schema definition originally in external.yaml is renamed to Address_1.

All references have been adjusted as required:

1 The former external reference to the Person schema is now a local reference.
2 The Person schema’s Address reference now reflects the renaming that occurred.

Renaming is done only where necessary due to a conflict, and the names appearing in the top-level spec are always preserved as-is; that is, if there is a colliding externally referenced object that needs to be localized, that object will be renamed, not the top-level object with which it collided. In the above example, the Address schema occurring in main.yaml will always retain its original name, forcing any colliding objects to be renamed.
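The renaming behavior described above can be illustrated with a short Python sketch. It is not RepreZen's implementation: it assumes specs have already been parsed into plain dicts, it copies every definition from the external file rather than only the referenced ones, and it does not rewrite the external references in the top-level file (such as "external.yaml#/definitions/Person") into local form. The helper names unique_name, rewrite_refs, and localize_definitions are hypothetical, and the _1, _2, ... numbering scheme is assumed from the Address_1 example.

```python
import copy

DEFS = "#/definitions/"


def unique_name(name, taken):
    """Return name if it is free, else the first free name_N for N = 1, 2, ..."""
    if name not in taken:
        return name
    n = 1
    while f"{name}_{n}" in taken:
        n += 1
    return f"{name}_{n}"


def rewrite_refs(node, renames):
    """Recursively retarget local definition refs according to renames."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(DEFS):
            old = ref[len(DEFS):]
            if old in renames:
                node["$ref"] = DEFS + renames[old]
        for value in node.values():
            rewrite_refs(value, renames)
    elif isinstance(node, list):
        for item in node:
            rewrite_refs(item, renames)


def localize_definitions(top_defs, external_defs):
    """Copy external definitions into top_defs, renaming on collision.

    Names already in top_defs are never changed (top-level names win).
    Returns the old-name -> new-name map used for the external file.
    """
    taken = set(top_defs)
    renames = {}
    for name in external_defs:
        renames[name] = unique_name(name, taken)
        taken.add(renames[name])
    for name, schema in external_defs.items():
        schema = copy.deepcopy(schema)  # leave the external spec untouched
        rewrite_refs(schema, renames)   # fix refs internal to the external file
        top_defs[renames[name]] = schema
    return renames


# The main.yaml / external.yaml example, reduced to plain dicts.
main_defs = {"Address": {"description": "An address given by a speaker", "type": "object"}}
external_defs = {
    "Person": {"type": "object",
               "properties": {"address": {"$ref": "#/definitions/Address"}}},
    "Address": {"description": "A postal address", "type": "object"},
}
renames = localize_definitions(main_defs, external_defs)
print(renames)  # {'Person': 'Person', 'Address': 'Address_1'}
print(main_defs["Person"]["properties"]["address"]["$ref"])  # #/definitions/Address_1
```

Because the set of taken names is seeded from the top-level definitions before any external name is considered, a top-level object can never be the one that gets renamed.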

Recursive References

It is possible to set up recursive schema definitions in Swagger specs, through the use of references. For example, consider the following schema:

definitions:
  Person:
    type: object
    properties:
      name:
        type: string
      children:
        $ref: "#/definitions/People"  # (1)
  People:
    type: array
    items:
      $ref: "#/definitions/Person"    # (2)
1 The Person schema has a children property of type People, and
2 the People schema defines an array of Person objects.

Naively attempting to inline a reference to a Person object would lead to a never-ending expansion like this:

original
matriarch:
  $ref: "#/definitions/Person"
inlined
matriarch:
  type: object                 # inline Person
  properties:
    name:
      type: string
    children:
      type: array              # inline People
      items:
        type: object           # inline Person
        properties:
          name:
            type: string
          children:
            type: array        # inline People
            items:
              type: object     # inline Person
              ...              # inlining never ends

We have cut off the inlining above with an ellipsis, but in reality it would never stop.

To handle recursive references encountered during inlining, the normalizer stops inlining whenever a reference is encountered that is fully contained within another (inlined) instance of the referenced object. That recursive reference is localized rather than being inlined.

In the above example, we would end up with something like this:

partially-inlined
matriarch:
  type: object                          # (1)
  properties:
    name:
      type: string
    children:
      type: array
      items:
        $ref: "#/definitions/Person"    # (2)
...
definitions:
  Person:
    type: object
    properties:
      name:
        type: string
      children:
        type: array
        items:
          $ref: "#/definitions/Person"  # (3)
  People:
    type: array
    items:
      type: object
      properties:
        name:
          type: string
        children:
          $ref: "#/definitions/People"  # (4)

Here we see:

1 that the top-level reference to Person as the type of the matriarch property was inlined;
2 that the recursive reference to Person encountered while performing this inlining has been localized;
3 that the Person schema itself was subjected to inlining, with localization of its recursive reference;
4 and likewise for the People schema.

When an object is inlined without encountering a recursive reference (so that the object is not also localized), we say that it is fully inlined.

For non-conforming references, recursion is not currently permitted and will cause the normalizer to fail.
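The recursion rule can be sketched as a recursive walk that keeps track of the references currently being inlined: a reference whose target is already on that stack is kept as a local reference instead of being expanded again. This Python sketch is an illustration under simplifying assumptions (a single spec held as plain dicts, only local references, sibling keys of $ref ignored); resolve_pointer and inline are hypothetical names, not part of any RepreZen API.

```python
def resolve_pointer(spec, ref):
    """Follow a local JSON Pointer such as '#/definitions/Person'."""
    node = spec
    for part in ref[2:].split("/"):
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def inline(node, spec, active, localized):
    """Return a copy of node with local refs inlined.

    active:    tuple of refs currently being expanded (the inlining stack).
    localized: set that receives every ref kept because it was recursive.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if ref in active:
                # Recursive reference: localize it instead of inlining forever.
                localized.add(ref)
                return {"$ref": ref}
            return inline(resolve_pointer(spec, ref), spec, active + (ref,), localized)
        return {key: inline(value, spec, active, localized) for key, value in node.items()}
    if isinstance(node, list):
        return [inline(item, spec, active, localized) for item in node]
    return node


spec = {"definitions": {
    "Person": {"type": "object", "properties": {
        "name": {"type": "string"},
        "children": {"$ref": "#/definitions/People"}}},
    "People": {"type": "array", "items": {"$ref": "#/definitions/Person"}},
}}

localized = set()
matriarch = inline({"$ref": "#/definitions/Person"}, spec, (), localized)
print(matriarch["properties"]["children"]["items"])  # {'$ref': '#/definitions/Person'}
print(localized)                                     # {'#/definitions/Person'}
```

To process a definition itself, as in callouts (3) and (4), the walk starts with that definition's own reference already on the stack, for example inline(spec["definitions"]["People"], spec, ("#/definitions/People",), set()). Every reference collected in localized points at an object that must then be retained.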

Object Retention

Some of the normalizer options pertain to object retention policy: rules that decide which objects from the multifile spec will appear in the normalized output.

The Completeness Rule

In all cases, the normalized spec must be complete, in the sense that all references appearing in the spec resolve to objects defined in the spec.[3] Thus, any object that is referenced in the normalized spec is also retained in the normalized spec.

Objects that are fully inlined are not covered by the completeness rule and might not be retained, depending on options in effect. An object that is only partially inlined because of recursive references remains referenced, since its recursive references are localized. It is therefore covered by completeness and must be retained.

All other retention policy is subordinate to completeness: every referenced object is retained, even if other retention policy would cause it to be dropped.

Root Objects

Completeness presupposes a starting point: some set of objects that are retained for other reasons. References appearing in those objects are processed for completeness, and then objects that are retained for completeness are themselves processed for completeness, and so on.

We call the objects that are retained for reasons other than completeness root objects. Root objects are determined according to retention policy and retention scope, as governed by options.
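Computing the objects retained for completeness amounts to a reachability traversal starting from the root objects. The following Python sketch assumes a single spec held as plain dicts, identifies objects by local JSON Pointers, and, in line with note [3], leaves unresolvable references alone. The names retained_for_completeness, collect_refs, and resolve_pointer are hypothetical.

```python
def resolve_pointer(spec, ref):
    """Follow a local JSON Pointer such as '#/definitions/Pet'."""
    node = spec
    for part in ref[2:].split("/"):
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def collect_refs(node, found):
    """Add every local $ref string found anywhere inside node to found."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            found.add(ref)
        for value in node.values():
            collect_refs(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_refs(item, found)


def retained_for_completeness(spec, roots):
    """Return the roots plus every object transitively referenced from them."""
    retained = set()
    worklist = list(roots)
    while worklist:
        pointer = worklist.pop()
        if pointer in retained:
            continue
        try:
            target = resolve_pointer(spec, pointer)
        except (KeyError, IndexError, TypeError):
            continue  # unresolvable: the reference is simply copied as-is
        retained.add(pointer)
        found = set()
        collect_refs(target, found)
        worklist.extend(found - retained)
    return retained


spec = {
    "paths": {"/pets": {"get": {"responses": {"200": {
        "description": "OK", "schema": {"$ref": "#/definitions/Pet"}}}}}},
    "definitions": {
        "Pet": {"type": "object", "properties": {"owner": {"$ref": "#/definitions/Owner"}}},
        "Owner": {"type": "object"},
        "Unused": {"type": "string"},
    },
}
print(sorted(retained_for_completeness(spec, ["#/paths/~1pets"])))
# ['#/definitions/Owner', '#/definitions/Pet', '#/paths/~1pets']
```

Owner is retained only because Pet references it, which is the "and so on" step described above; Unused is not reachable from the root and is dropped.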

Retention Policy

Retention policy is determined according to RETAIN and DROP rules that select and reject individual objects. An object is retained if it matches at least one RETAIN rule and does not match any DROP rule.

Currently, there is only one RETAIN rule, which specifies which object types - paths, definitions, parameters, and responses - are to be retained. There are not currently any DROP rules implemented. We anticipate implementing additional RETAIN and DROP rules in the future to provide additional flexibility.

Object-type-based retention policy is specified with the RETAIN option.
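The RETAIN/DROP combination rule is simple enough to state as a predicate. In this Python sketch, rules are modeled as functions of an object's type and name; that representation, the function names, and the DROP rule used in the example are all assumptions for illustration, since no DROP rules are currently implemented.

```python
def retain_types(*object_types):
    """A RETAIN rule that selects objects by type (PATH, DEFINITION, ...)."""
    allowed = set(object_types)
    return lambda obj_type, name: obj_type in allowed


def is_root(obj_type, name, retain_rules, drop_rules):
    """Retained if at least one RETAIN rule matches and no DROP rule does."""
    return (any(rule(obj_type, name) for rule in retain_rules)
            and not any(rule(obj_type, name) for rule in drop_rules))


def drop_internal(obj_type, name):
    """Hypothetical DROP rule; the normalizer does not implement any today."""
    return name.startswith("Internal")


retain_rules = [retain_types("PATH", "DEFINITION")]
print(is_root("DEFINITION", "Pet", retain_rules, []))                         # True
print(is_root("RESPONSE", "NotFound", retain_rules, []))                      # False
print(is_root("DEFINITION", "InternalAudit", retain_rules, [drop_internal]))  # False
```

Objects selected this way in in-scope files become the root objects; completeness may still pull in objects that this predicate rejects.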

Retention Scope

Retention policy is applied only to objects that appear in files that are considered in scope for retention. The top-level file is always in scope.

When processing a Swagger spec, other Swagger specs may be loaded in order to satisfy references. By default, those other specs are not in scope. However, if the RETENTION_SCOPE option is set to ALL, specs that are loaded solely to resolve references will also be considered in scope, so that other objects in those files may be retained, even if they are not needed for completeness.

It is also possible to identify other files to be treated as top-level, by listing them in the ADDITIONAL_FILES option.[4] All such files will be loaded and will be in scope for retention, regardless of whether any objects they contain are otherwise required for completeness. As with any root file, references in the objects retained from those files will be processed for completeness.

One important use case for "additional files" involves allOf schema definitions. These are commonly used to express type hierarchies, and in such cases it is common for a supertype to be referenced from the top-level spec (e.g. a list of Animal objects). The subtypes also reference the supertype in their allOf property (e.g. Dog and Cat both reference Animal). However, the subtypes themselves are often not referenced anywhere in the Swagger spec; in particular, the supertype does not reference them (Dog references Animal, but not vice versa).

If the subtypes are defined in a separate file, that file will not be loaded for reference resolution, so the subtypes will not be loaded, let alone retained, by the normalizer. Configuring the file as an "additional file" causes it to be loaded, making the subtype definitions eligible for retention.

Ordering of Properties in Normalized Model

The normalizer includes an option, ORDERING, that provides some control over the ordering of elements in the normalized Swagger spec. However, the normalizer uses the Swagger project’s SwaggerParser class for a number of its operations. This class, and the Swagger class that it produces to represent a Swagger spec, cannot maintain the ordering of many model elements, due to internal design decisions. In fairness, any reordering caused by this software is semantically meaningless: it does not change the content of the model. In some cases, however, a particular ordering is important for presentation.

Because of the limitations of the software on which Swagger Normalizer depends, it records its ordering decisions in a vendor extension named x-reprezen-normalization that is attached to affected elements.

When the normalizer is used internally by RepreZen software, the normalized model is in the form of a Swagger object, with all the position information intact. When it is executed as a GenTarget, the YAML file that is produced will reflect the calculated positions, but the position values themselves will, by default, be removed.

If you intend to feed the YAML output created by the normalizer to a downstream process that expects position indicators, set the RETAIN_POSITION_VALUES parameter to true in your GenTarget file. The resulting YAML file will be unchanged except that position information will be present within x-reprezen-normalization vendor extension properties.

Normalizer Options

When the normalizer is used through its GenTemplate ("Swagger [Normalized YAML]"), options are configured in the GenTarget file, i.e. the .gen file created by the GenTarget wizard. Each option can take various values, as detailed below.

Options are as follows:

INLINE

Specify which objects are inlined by the normalizer. The value of this option can be:

  • A list of object types, drawn from DEFINITION, PARAMETER, RESPONSE.[5]

  • The value ALL, meaning that all objects are inlined.

  • The value COMPONENT, meaning that all objects except paths are inlined.[6][7]

  • The value NONE, meaning that no objects are inlined.

RETAIN

Specify which object types will be retained from in-scope files. The value of this option can be:

  • A list of object types, drawn from PATH, DEFINITION, PARAMETER, and RESPONSE.

  • The value ALL, meaning that all objects are retained.

  • The value COMPONENT, meaning that all objects except paths are retained.

  • The value PATH_OR_COMPONENT [8], meaning that:

    • If the top-level spec defines at least one path, then the PATH option will be in effect.

    • Otherwise, the COMPONENT option will be in effect.
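The following Python sketch shows one way to interpret the RETAIN values listed above, including the PATH_OR_COMPONENT fallback. The function name and the use of plain strings, lists, and dicts are assumptions for illustration; they do not describe the GenTarget file format.

```python
ALL_TYPES = {"PATH", "DEFINITION", "PARAMETER", "RESPONSE"}
COMPONENT_TYPES = ALL_TYPES - {"PATH"}


def effective_retain_types(option, top_level_spec):
    """Translate a RETAIN option value into the set of object types retained."""
    if option == "ALL":
        return set(ALL_TYPES)
    if option == "COMPONENT":
        return set(COMPONENT_TYPES)
    if option == "PATH_OR_COMPONENT":
        # PATH if the top-level spec defines at least one path, else COMPONENT.
        return {"PATH"} if top_level_spec.get("paths") else set(COMPONENT_TYPES)
    if isinstance(option, str):
        raise ValueError(f"unknown RETAIN value: {option!r}")
    types = set(option)  # an explicit list of object types
    unknown = types - ALL_TYPES
    if unknown:
        raise ValueError(f"unknown object types: {sorted(unknown)}")
    return types


print(effective_retain_types("PATH_OR_COMPONENT", {"paths": {"/pets": {}}}))  # {'PATH'}
print(sorted(effective_retain_types("PATH_OR_COMPONENT", {"paths": {}})))
# ['DEFINITION', 'PARAMETER', 'RESPONSE']
```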

RETENTION_SCOPE

Determines which Swagger specs are considered in-scope for retention rules. Value is either:

  • ROOTS, meaning that only the top-level file and any files specified in ADDITIONAL_FILES will be in scope; or

  • ALL, meaning that files loaded in order to resolve references will also be considered in scope.

ADDITIONAL_FILES

Specifies additional files that should be treated as top-level, and are therefore always loaded and always in-scope. The value is a list of file names, or more generally URLs. Each URL, if it is relative, is resolved based on the URL that specifies the top-level file.
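Relative URL resolution of this kind follows the standard rules implemented by Python's urllib.parse.urljoin. The sketch below assumes the normalizer resolves ADDITIONAL_FILES entries the same way, which the description above implies but does not spell out; the function name and URLs are hypothetical. The list order is preserved because, per note [4], earlier-loaded files win naming conflicts.

```python
from urllib.parse import urljoin


def resolve_additional_files(top_level_url, additional_files):
    """Resolve ADDITIONAL_FILES entries against the top-level file's URL.

    Relative entries are resolved; absolute URLs pass through unchanged.
    """
    return [urljoin(top_level_url, entry) for entry in additional_files]


print(resolve_additional_files(
    "http://models.example.com/petstore/main.yaml",
    ["subtypes.yaml", "../shared/animals.yaml", "http://other.example.com/x.yaml"],
))
# ['http://models.example.com/petstore/subtypes.yaml',
#  'http://models.example.com/shared/animals.yaml',
#  'http://other.example.com/x.yaml']
```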

HOIST

Enables some or all of the hoisting operations that the normalizer can perform. Hoisting refers to extrapolating certain items that are declared globally or at path level in a Swagger spec into the operations to which they apply. The option value is a list of hoistable items, drawn from:

  • MEDIA_TYPE: Global consumes and produces declarations are extrapolated into all operations that do not contain their own declarations.

  • PARAMETER: Parameters defined at path-level are extrapolated into every operation appearing in the path that does not already define a parameter with the same name and the same in value.

  • SECURITY_REQUIREMENT: The global security requirements array is extrapolated into every operation that does not define its own.

The HOIST option value may also be ALL or NONE.
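The three hoisting operations can be sketched in Python as follows. This is an illustration, not RepreZen's implementation: it assumes path-level parameters are already inlined (so each has a name and an in value), appends hoisted parameters after the operation's own, and leaves the global and path-level declarations in place, none of which is specified above. The function name hoist and its items argument are hypothetical.

```python
import copy

HTTP_METHODS = ("get", "head", "post", "put", "delete", "options", "patch")


def hoist(spec, items=("MEDIA_TYPE", "PARAMETER", "SECURITY_REQUIREMENT")):
    """Copy global and path-level declarations into operations, in place."""
    for path_item in spec.get("paths", {}).values():
        path_params = path_item.get("parameters", [])
        for method in HTTP_METHODS:
            op = path_item.get(method)
            if op is None:
                continue
            if "MEDIA_TYPE" in items:
                for key in ("consumes", "produces"):
                    if key in spec and key not in op:
                        op[key] = list(spec[key])
            if "PARAMETER" in items and path_params:
                params = op.setdefault("parameters", [])
                # An operation parameter overrides a path parameter with the
                # same (name, in) pair.
                declared = {(p.get("name"), p.get("in")) for p in params}
                for p in path_params:
                    if (p.get("name"), p.get("in")) not in declared:
                        params.append(copy.deepcopy(p))
            if "SECURITY_REQUIREMENT" in items:
                # An explicit operation-level security array (even []) wins.
                if "security" in spec and "security" not in op:
                    op["security"] = copy.deepcopy(spec["security"])
    return spec


spec = {
    "consumes": ["application/json"],
    "security": [{"api_key": []}],
    "paths": {"/pets/{id}": {
        "parameters": [{"name": "id", "in": "path", "required": True, "type": "string"}],
        "get": {"responses": {}},
        "delete": {"security": [], "responses": {},
                   "parameters": [{"name": "id", "in": "path", "required": True, "type": "integer"}]},
    }},
}
hoist(spec)
get_op = spec["paths"]["/pets/{id}"]["get"]
print(get_op["consumes"], get_op["security"])  # ['application/json'] [{'api_key': []}]
```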

REWRITE_SIMPLE_REFS

In former versions of the Swagger specification, reference strings were allowed to take a simple form like Pet. These were treated as internal references based on the context in which the reference appears. For example, in old pet-store examples, references to the Pet schema appeared simply as $ref: Pet, which was equivalent to $ref: "#/definitions/Pet".

While these “simple references” are no longer supported by the Swagger specification, they are still processed by some existing tools. Enabling this option will cause the normalizer to rewrite simple references to fully compliant internal references.[9]

The REWRITE_SIMPLE_REFS option value should be either true or false.
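Recognizing and rewriting simple references is a small string transformation. The Python sketch below applies the recognition rule from note [9], assuming ASCII letters, and handles only the schema context, where a simple name refers to an entry under definitions. How other contexts (parameters, responses) are handled, and the function names, are assumptions.

```python
import re

# Per note [9]: starts with a letter or "_", then only letters, digits, "_".
SIMPLE_REF = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rewrite_simple_ref(ref, section):
    """Expand a simple ref like 'Pet' to '#/<section>/Pet'; leave others alone."""
    return f"#/{section}/{ref}" if SIMPLE_REF.fullmatch(ref) else ref


def rewrite_schema_refs(schema):
    """Within a schema tree, every $ref names a schema, i.e. a definition."""
    if isinstance(schema, dict):
        if isinstance(schema.get("$ref"), str):
            schema["$ref"] = rewrite_simple_ref(schema["$ref"], "definitions")
        for value in schema.values():
            rewrite_schema_refs(value)
    elif isinstance(schema, list):
        for item in schema:
            rewrite_schema_refs(item)


pets = {"type": "array", "items": {"$ref": "Pet"}}
rewrite_schema_refs(pets)
print(pets["items"]["$ref"])                          # #/definitions/Pet
print(rewrite_simple_ref("pet.yaml", "definitions"))  # pet.yaml
```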

CREATE_DEF_TITLES

This option causes the normalizer to add title properties to definitions that do not already have them. The title for such a definition is set to its property name in the definitions object of its containing Swagger spec.

This is particularly helpful when name collisions occur during localization, as the titles then reflect the original names of the definitions, prior to renaming.

The CREATE_DEF_TITLES option value should be either true or false.
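A Python sketch of CREATE_DEF_TITLES, run after localization, shows why the titles are useful after a collision. The original_names mapping is a hypothetical stand-in for whatever bookkeeping the normalizer uses to remember names from before renaming.

```python
def create_def_titles(definitions, original_names=None):
    """Give each untitled definition a title equal to its original name.

    original_names maps a (possibly renamed) definition name back to the
    name it had in its own file; unlisted names map to themselves.
    """
    original_names = original_names or {}
    for name, schema in definitions.items():
        if isinstance(schema, dict) and "title" not in schema:
            schema["title"] = original_names.get(name, name)


defs = {"Address": {"type": "object"},
        "Address_1": {"type": "object"},
        "Pet": {"type": "object", "title": "A pet"}}
create_def_titles(defs, {"Address_1": "Address"})
print([d["title"] for d in defs.values()])  # ['Address', 'Address', 'A pet']
```

Both Address schemas end up titled Address, so tools that display titles show the names the model author actually wrote, while existing titles are left untouched.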

INSTANTIATE_NULL_COLLECTIONS

There are many optional properties in the Swagger specification, and the Swagger Java parser creates structures in which omitted properties generally appear with null values. This forces a great deal of null-checking in Java code that processes parsed Swagger specs. The INSTANTIATE_NULL_COLLECTIONS option causes such null values for either array-valued or object-valued properties to be replaced with empty arrays and objects, respectively, where doing so would not alter the meaning of the spec.[10]

The INSTANTIATE_NULL_COLLECTIONS option value should be either true or false.

FIX_MISSING_TYPES

The Swagger Java parser accepts Swagger specs in which some object schemas are missing their type property. This is allowed when the schema contains either a properties or additionalProperties property, and the parser treats the schema as if it contained type: object. This option causes the normalizer to explicitly add type: object in these schemas.

The FIX_MISSING_TYPES option value should be either true or false.
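The FIX_MISSING_TYPES rule can be expressed as a recursive pass over a schema. This Python sketch walks properties, items, additionalProperties, and allOf; which nested locations the normalizer actually visits is not stated above, so that traversal is an assumption, as is skipping schemas that contain a $ref. The function name is hypothetical.

```python
def fix_missing_types(schema):
    """Add type: object where properties/additionalProperties imply it."""
    if not isinstance(schema, dict):
        return  # e.g. additionalProperties: true
    implies_object = "properties" in schema or "additionalProperties" in schema
    if implies_object and "type" not in schema and "$ref" not in schema:
        schema["type"] = "object"
    for prop in schema.get("properties", {}).values():
        fix_missing_types(prop)
    for key in ("items", "additionalProperties"):
        fix_missing_types(schema.get(key))
    for sub in schema.get("allOf", []):
        fix_missing_types(sub)


person = {"properties": {
    "name": {"type": "string"},
    "address": {"properties": {"street": {"type": "string"}}},
    "tags": {"additionalProperties": {"type": "string"}},
}}
fix_missing_types(person)
print(person["type"], person["properties"]["address"]["type"])  # object object
```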

ORDERING

This option gives you some control over the order in which objects appear in the Swagger spec produced by the normalizer. Permitted values include:

  • AS_DECLARED, meaning that there should be no reordering of the model elements by the normalizer. This applies only to objects declared in the top-level and other root files; objects localized or retained from other files will appear after all root file objects, but not in a predictable order.

  • SORTED, meaning that a mostly-alphabetical ordering is imposed within the output model. In this case, all objects from all files participate, not just those from root files. The details of this ordering are as follows:

    • Paths, global parameters, global responses, and schema definitions are all ordered in a quasi-alphabetic order based on their names in the normalized spec. This is a case-insensitive ordering, except that names of the form Xxx_nnn are treated specially, where nnn is a numeric suffix. Such names are typically the result of disambiguation when collisions occur through localization. However, if your models use such names on their own, they will be treated the same way by the ordering algorithm.

      When such names occur, ordering is such that all names with the same root - including the unadorned root itself - appear together, and with numerically increasing suffixes. This is the case even when two roots differ only by letter case.

      For example, you would always see the following names in the indicated order:

      FOO, FOO_1, FOO_2, …, FOO_10, Foo, Foo_1, Foo_2, …, Foo_10

    • Operations within a path are ordered in the standard sequence defined by the Swagger project’s Swagger class: get, head, post, put, delete, options, patch

    • Responses defined within an operation are sorted numerically by response code, with a default entry, if any, following all numeric entries.
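The SORTED rules can be sketched as Python sort keys. The name key below splits off a trailing _nnn suffix, compares roots case-insensitively, and breaks ties between roots that differ only in letter case by plain code-point order; that tie-break is an assumption chosen to reproduce the FOO/Foo example above, not a documented detail. Comparing suffixes numerically is what keeps Foo_10 after Foo_2, where a plain string sort would not. The function and constant names are hypothetical.

```python
import re

SUFFIXED = re.compile(r"(.*)_(\d+)")
OPERATION_ORDER = ("get", "head", "post", "put", "delete", "options", "patch")


def name_sort_key(name):
    """Group Xxx, Xxx_1, Xxx_2, ... together in numeric suffix order."""
    match = SUFFIXED.fullmatch(name)
    root, suffix = (match.group(1), int(match.group(2))) if match else (name, -1)
    return (root.lower(), root, suffix)


def response_sort_key(code):
    """Numeric response codes ascending, then 'default'."""
    return (1, 0) if code == "default" else (0, int(code))


names = ["Foo_10", "FOO_2", "Foo", "FOO", "Foo_1", "FOO_10", "FOO_1", "Foo_2"]
print(sorted(names, key=name_sort_key))
# ['FOO', 'FOO_1', 'FOO_2', 'FOO_10', 'Foo', 'Foo_1', 'Foo_2', 'Foo_10']
print(sorted(["default", "404", "200"], key=response_sort_key))
# ['200', '404', 'default']
print(sorted(["patch", "get", "post"], key=OPERATION_ORDER.index))
# ['get', 'post', 'patch']
```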

With both treatments - even SORTED - ordering is restricted to the model contents specifically mentioned above. So, for example, tags, operation parameters, object schema property lists, and the top-level structure of the swagger spec should mostly be as they are in the source spec under both ordering treatments, except where Swagger project software may disrupt things (e.g. in the ordering of top-level model sections).

The way to interpret the above paragraph in the case of AS_DECLARED ordering is that the Normalizer will not record positional information for items not explicitly mentioned in the details of the SORTED ordering. Therefore, if these items are reorganized by Swagger software, it will not be possible to reconstruct the original ordering.

In some cases these unaddressed orderings are likely to become addressed by the normalizer in a future release, but we have explicitly chosen not to reorder parameter lists in operations, since doing so could cause incompatible changes in the output of certain code generators (e.g. in generated method signatures).

Option Defaults

The normalizer is used in RepreZen API Studio in the following scenarios:

  • Loading a Swagger spec for display in one of the live views: Diagram, Documentation, and Swagger UI.

  • Loading a Swagger spec for processing by a GenTemplate other than the "Swagger [Normalized YAML]" GenTemplate.

  • Loading a Swagger spec for processing by the "Swagger [Normalized YAML]" GenTemplate.

The following table specifies the option settings that are used in each case:

Option                         Documentation Live View   All Other Scenarios
----------------------------   -----------------------   -------------------
INLINE                         PARAMETER, RESPONSE       PARAMETER, RESPONSE
RETAIN                         PATH_OR_COMPONENT         ALL
RETENTION_SCOPE                ROOTS                     ROOTS
ADDITIONAL_FILES               empty                     empty
HOIST                          ALL                       ALL
REWRITE_SIMPLE_REFS            true                      true
CREATE_DEF_TITLES              true                      false
INSTANTIATE_NULL_COLLECTIONS   true                      true
FIX_MISSING_TYPES              true                      true
ORDERING                       AS_DECLARED               SORTED

Note that the Documentation Live View defaults differ from those of all other scenarios, including the other live views.

There is currently no way to alter the option settings for any scenario except the "Swagger [Normalized YAML]" GenTemplate, where the GenTarget file explicitly sets all option values. The New GenTarget wizard in RepreZen API Studio creates a GenTarget with option values initially set according to the "All Other Scenarios" column above, and you may edit those options as desired.


1. Local references like this one (that is, references to an object in the same file) always start with a pound sign: "#". This happens to be the comment character in YAML syntax, so a common error is to omit the quotes around the reference string. The rest of the line is then treated as a comment, leaving $ref with a null value, which can lead to a variety of problems with consumers of the model. Be careful to always use quotes around your reference strings!
2. The RepreZen API Studio New Model Wizard offers a "Minimal" option that will create a (nearly) minimal Swagger spec as a starting point.
3. The only exception to this is references that could not be resolved in the original spec; these will be copied as-is into the normalized spec.
4. The only difference between these files and the actual top-level file has to do with object renaming. As stated earlier, objects appearing in the top-level spec will never be renamed. However, it is possible for a name collision to occur when loading "additional" files, and such collisions will trigger object renaming. Additional files are loaded immediately after the top-level file, in the order in which they are specified, and naming priority always favors the earlier-loaded files.
5. PATH is not an option because paths are always inlined; local path references are disallowed in Swagger specs.
6. The term "component object" is used in the forthcoming OpenAPI v3.0 specification to denote non-path objects.
7. This option is really equivalent to ALL, since paths are always inlined anyway; no other treatment is sensible since local path references are not allowed in a Swagger spec.
8. This option is needed for our RepreZen HTML Documentation GenTarget, which inlines everything by default and retains only top-level paths, except when there are no paths; in that case it still inlines everything, but it also retains everything. Note that due to a bug in the Swagger Parser from swagger.io, inlining of definitions is not performed by the normalizer in this case, but rather by the documentation generator itself.
9. Simple reference strings are recognized only if they start with an alphabetic character or “_” and consist solely of alphanumeric characters and “_”.
10. An example of where such replacement would change the spec is the consumes and produces arrays in operation definitions. For these, an empty array would prevent inheriting the corresponding global defaults, while a null value would not.