DSL of endpoint descriptions

Johan Maasing cf3ff3f982 Documentation and merge behaviour for state files		2025-04-19 19:24:45 +02:00

endgen

This is a converter tool that reads a DSL and generates output files.

Endgen is open source software licensed under the Apache License, Version 2.0.

Motivation

The motivation behind this tool is that I wanted to generate boilerplate code for handling HTTP Endpoints (hence the endgen name).

I had one project written in Scala using the Tapir library and another Java project using Spring Boot. Both of these projects had very simple endpoints that only supported POST, took some data type as payload and used the same response type.

That's a lot of boilerplate, especially in the Scala case where the payload datatype had to be written in several separate files (for endpoint-definitions, circe serializer support etc).

So I wrote a DSL parsed by an ANTLR4 parser and a code generator using freemarker.

                                 | Endgen |
                  -------------------------------------------------
-----------       ||--------|      |-----------|     |------------||
| endpoint |      || Parser |      | In-Memory |     | Freemarker ||     ------------------
| file     | -->  ||        | -->  | AST       | --> | engine     || --> | Output file    |
\__________\      ||--------|      |-----------|     |------------||     | mytemplate.xxx |
                  --------------------------------------------------     \________________\
                                                           ^
                                                           |
                                                    ----------------------
                                                    | mytemplate.xxx.ftl |
                                                    \____________________\

Endgen currently contains two separate parsers:

endpoint - A DSL for expressing HTTP endpoints.
state - A DSL for expressing state and transitions.

The parser used to read the input file is determined by the file name ending ('.endpoints' or '.states') or by a command line argument.
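The selection rule can be sketched in plain Java as follows. This is a minimal illustration and not the actual Endgen code: the class name ParserSelection, the enum ParserKind and the choice to fail on an unknown file ending are assumptions made for the example.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of parser selection, not taken from the Endgen sources.
final class ParserSelection {
    enum ParserKind { ENDPOINTS, STATES }

    // A parser forced on the command line wins over the file name ending.
    static ParserKind select(String fileName, Optional<String> forced) {
        if (forced.isPresent()) {
            // Accepts the documented values "Endpoints" and "States".
            return ParserKind.valueOf(forced.get().toUpperCase(Locale.ROOT));
        }
        if (fileName.endsWith(".endpoints")) {
            return ParserKind.ENDPOINTS;
        }
        if (fileName.endsWith(".states")) {
            return ParserKind.STATES;
        }
        // Assumption: an unknown ending without a forced parser is an error.
        throw new IllegalArgumentException("Cannot determine parser for: " + fileName);
    }

    public static void main(String[] args) {
        System.out.println(select("test01.states", Optional.empty()));       // prints STATES
        System.out.println(select("test01.states", Optional.of("Endpoints"))); // prints ENDPOINTS
    }
}
```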

The endpoint DSL and the state DSL share the grammar for expressing configuration and data types; see below for details.

How to Run

You need a Java 21 (or later) runtime, with java on the path. A very convenient way to install a Java runtime is SDKMAN.

Unpack the archive and run the provided shell script.

Usage

Usage: run.sh [-hvV] [-o=<outputDir>] [-p=<parser>] [-t=<templateDir>] <file>
Generate source code from an endpoints specification file.
      <file>                 The source endpoints DSL file.
  -h, --help                 Show this help message and exit.
  -o, --output=<outputDir>   The directory to write the generated code to.
                               Default is endpoints-output
  -p, --parser=<parser>      Force use of a specific parser instead of
                               determining from filename. Valid values:
                               Endpoints, States.
  -t, --template=<templateDir>
                             The template directory. Default is
                               endpoints-template
  -v, --verbose              Print verbose debug messages.
  -V, --version              Print version information and exit.

Endpoint DSL example

In the simplest form the DSL looks like this

/some/endpoint <- SomeType(foo:String)

This gets parsed into an AST which in this case holds a list of path segments and a data structure representing the input/body type.

Code generation example

When the parser is done reading the DSL it will look in a directory for freemarker templates. For each template it finds it sends in the AST. The resulting file (per template) is written to an output directory.

The templates must have the file ending .ftl - this ending is stripped when generating the output file. So a template called types.java.ftl will generate a file called types.java.
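As a small illustration of the naming rule, here is a sketch in plain Java. The class TemplateNames and its methods are hypothetical names for this example, and rejecting files without the .ftl ending is an assumption.

```java
// Hypothetical sketch of the template naming rule, not taken from the Endgen sources.
final class TemplateNames {
    static final String TEMPLATE_SUFFIX = ".ftl";

    // A file counts as a template if it ends with .ftl and has a name before the ending.
    static boolean isTemplate(String fileName) {
        return fileName.endsWith(TEMPLATE_SUFFIX)
                && fileName.length() > TEMPLATE_SUFFIX.length();
    }

    // The output file is named as the template with the .ftl ending stripped.
    static String outputName(String templateFileName) {
        if (!isTemplate(templateFileName)) {
            throw new IllegalArgumentException("Not a template: " + templateFileName);
        }
        return templateFileName.substring(
                0, templateFileName.length() - TEMPLATE_SUFFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(outputName("types.java.ftl")); // prints types.java
    }
}
```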

The idea is that you take these generated files and adapt them as needed before checking them into your project. Endgen does not aim to be a round-trip tool (i.e. it does not read the generated source or try to be smart about updating it). The DSL is also very limited: you cannot, for example, express which HTTP verb to use or declare response codes. There are no plans to extend the DSL to do that either.

DSL

This is the ANTLR grammar for the root of the DSL

document                : generatorconfig? (namedTypeDeclaration|endpoint)* ;

This means that a DSL file has an optional generatorconfig block at the top, followed by any number of type definitions and endpoint declarations in any order.

Configuration block

Here is an example:

{
    package: se.rutdev.senash,
    mykey: myvalue
}

/some/endpoint <- SomeType(foo:String)

Embedded(foo:Bar)
/some/other/endpoint <- (bar:Seq[Embedded])

The example starts with a config block with two items, the 'package' and the 'mykey' definitions. These are available in the freemarker template as a Map of String keys to String values.

Endpoint definition

/some/endpoint <- SomeType(foo:String) is an endpoint definition. It declares one endpoint that has a request body data type called SomeType, which has a field called foo of the type String.

Data types

The DSL uses the Scala convention of writing the data type after the field name, separated by a colon. The DSL parser does not know anything about Java or Scala types; as far as it is concerned these are two strings, the first named field-name and the second named field-type.

Embedded(foo:Bar) is a namedTypeDeclaration, which is parsed the same way as the request type above but isn't tied to a specific endpoint.

Automatically named endpoint data types

/some/other/endpoint <- (bar:Seq[Embedded]) is another endpoint declaration. This time the request body is not named in the DSL. Since all data types must have a name, the parser names it after the last path segment and appends the string 'Request'. So the AST will contain a data type named endpointRequest with a field named bar and a type-field with the value Seq[Embedded].

Again, the parser does not know or care what code is generated, so it has no semantic knowledge of whether these are actual data types in the generated code or whether they make sense in Java, Scala, Lua, Rust or whatever you decide to generate from the templates.

The only 'semantic' validation the parser performs is to check that no two types have the same name.
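The uniqueness check could look roughly like this in Java. It is only a sketch: the records are reduced copies of the AST records shown later in this document, while the class UniqueTypeNames, the method name and the exception type are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the duplicate name check, not taken from the Endgen sources.
final class UniqueTypeNames {
    record FieldNode(String name, String type) { }
    record TypeNode(String name, List<FieldNode> fields) { }

    // Fails on the first type name that has already been seen.
    static void requireUniqueNames(List<TypeNode> types) {
        Set<String> seen = new HashSet<>();
        for (TypeNode type : types) {
            if (!seen.add(type.name())) {
                throw new IllegalStateException("Duplicate type name: " + type.name());
            }
        }
    }

    public static void main(String[] args) {
        // Passes silently: SomeType and Embedded have different names.
        requireUniqueNames(List.of(
                new TypeNode("SomeType", List.of(new FieldNode("foo", "String"))),
                new TypeNode("Embedded", List.of(new FieldNode("foo", "Bar")))));
        System.out.println("No duplicate type names");
    }
}
```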

Endpoint response data type

It is possible to have an optional response data type declared like so:

/some/other/endpoint <- (bar:Seq[Embedded]) -> ResponseType(foo: Bar)

The right-pointing arrow -> denotes a response type. It can be an anonymous data type, in which case the parser will name it after the last path segment and append 'Response' to the data type name.
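The naming rule for anonymous request and response types can be sketched like this. The class AnonymousTypeNames and its method names are made up for the example; only the rule itself (last path segment plus 'Request' or 'Response') comes from the description above.

```java
import java.util.List;

// Hypothetical sketch of the naming of anonymous data types, not taken from the Endgen sources.
final class AnonymousTypeNames {
    // Anonymous request types are named after the last path segment plus "Request".
    static String requestTypeName(List<String> pathSegments) {
        return lastSegment(pathSegments) + "Request";
    }

    // Anonymous response types are named after the last path segment plus "Response".
    static String responseTypeName(List<String> pathSegments) {
        return lastSegment(pathSegments) + "Response";
    }

    private static String lastSegment(List<String> pathSegments) {
        if (pathSegments.isEmpty()) {
            throw new IllegalArgumentException("An endpoint needs at least one path segment");
        }
        return pathSegments.get(pathSegments.size() - 1);
    }

    public static void main(String[] args) {
        List<String> path = List.of("some", "other", "endpoint");
        System.out.println(requestTypeName(path));  // prints endpointRequest
        System.out.println(responseTypeName(path)); // prints endpointResponse
    }
}
```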

State grammar

This is an example of a state file:

start -> middle: message,
middle -> middle: selfmessage,
middle -> end: endmessage

It contains 3 state definitions start, middle and end. A state definition will be parsed as a data type with the name of the state as the type name.

It also contains 3 message definitions message, selfmessage and endmessage. Message definitions will also be parsed as data types.

Since the parser will extract datatypes it is possible to define the fields of the data types. This is a slightly more complicated example:

start(foo:Foo) -> middle: message(a: String),
middle(bar:Bar) -> middle: selfmessage,
middle -> end: endmessage

Here, for example, the data type for middle will have a field declaration with the name bar and the type Bar.

Fields for the same state data type, or message data type, will be merged. Here is a complex example:

start(s:S) -> middle(foo:foo): message(foo:foo),
middle -> middle(bar:bar): selfmessage(bar:bar),
middle -> end: message(bar:baz)

Note that we can declare fields on both the from and to state declarations. The middle data type will have field definitions for foo and bar.

The data type for message will have fields for foo and bar.

One restriction is that states and messages may not have the same name, i.e. be parsed as the same data type.
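One way to picture the merge is the following Java sketch. It is not the actual parser code: the class FieldMerging and the method merge are hypothetical, the records are reduced copies of the AST records shown later in this document, and the sketch simply takes the union of the declared fields in declaration order. What happens when the same field name is declared with two different types is not covered here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of merging field declarations, not taken from the Endgen sources.
final class FieldMerging {
    record FieldNode(String name, String type) { }
    record TypeNode(String name, List<FieldNode> fields) { }

    // Collects every declaration of the same type name into one TypeNode,
    // keeping each distinct field once and preserving declaration order.
    static List<TypeNode> merge(List<TypeNode> declarations) {
        Map<String, Set<FieldNode>> fieldsByName = new LinkedHashMap<>();
        for (TypeNode declaration : declarations) {
            fieldsByName
                    .computeIfAbsent(declaration.name(), key -> new LinkedHashSet<>())
                    .addAll(declaration.fields());
        }
        List<TypeNode> merged = new ArrayList<>();
        fieldsByName.forEach((name, fields) ->
                merged.add(new TypeNode(name, List.copyOf(fields))));
        return merged;
    }

    public static void main(String[] args) {
        // The declarations of 'middle' from the complex example above.
        List<TypeNode> merged = merge(List.of(
                new TypeNode("middle", List.of(new FieldNode("foo", "foo"))),
                new TypeNode("middle", List.of()),
                new TypeNode("middle", List.of(new FieldNode("bar", "bar")))));
        System.out.println(merged);
    }
}
```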

Generating

If the parser is successful it will hold the following data in the AST

public record DocumentNode(
        Map<String, String> config,
        Set<TypeNode> typeDefinitions,
        List<EndpointNode> endpoints,
        Set<StateNode> states) {
}

Depending on the parser used, either endpoints or states will be null, but config and typeDefinitions are populated the same way for both parsers.

This will be passed to the freemarker engine as the 'root' data object, meaning you have access to the parts in your freemarker template like this:

<#list typeDefinitions as type>
    This is the data type name: ${type.name?cap_first} with the first letter capitalized.
</#list>

That is, you can directly reference typeDefinitions, endpoints, states or config.

Config

The config object is simply a String-map with the keys and values unfiltered from the input file. Here is an example that writes the value for a config key called 'package'.

package ${config.package}

Data types

These are all the data types the parser has collected from explicit declarations, request payloads and response bodies.

public record TypeNode(String name, List<FieldNode> fields) { }
public record FieldNode(String name, String type) { }

Here is an example template that writes the data types as Scala case classes

object Protocol:
<#list typeDefinitions?sort as type>
    case class ${type.name?cap_first}(
    <#list type.fields as field>
        ${field.name} : ${field.type},
    </#list>
    )
</#list>

Endpoints

The parser will collect the following data for endpoint declarations

public record EndpointNode(
	PathsNode paths,
	String inputType,
	Optional<String> outputType) {}

public record PathsNode(List<String> paths) {}

This is an example that will write out the endpoints with the path first, then the Input data type, then the optional Output data type.

<#list endpoints as endpoint>
    <#list endpoint.paths.paths>
    <#items as segment>/${segment}</#items>
    Input:
        ${endpoint.inputType?cap_first}
    Output:
    <#if endpoint.outputType.isPresent()>
        ${endpoint.outputType.get()?cap_first}
    <#else>
        Not specified
    </#if>
    </#list>

</#list>
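To make the shape of the data more concrete, here is a plain Java sketch that builds an EndpointNode by hand for /some/other/endpoint <- (bar:Seq[Embedded]) -> ResponseType(foo: Bar) and walks it the same way the template does. The records are copied from above. The class EndpointExample and the method describe are hypothetical, and I assume here that inputType and outputType hold the names of the data types and that the path segments are stored without slashes.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of walking the endpoint part of the AST, not taken from the Endgen sources.
final class EndpointExample {
    record PathsNode(List<String> paths) { }
    record EndpointNode(PathsNode paths, String inputType, Optional<String> outputType) { }

    // Renders the path with a slash before each segment, then the input type
    // and, if one is declared, the output type.
    static String describe(EndpointNode endpoint) {
        String path = "/" + String.join("/", endpoint.paths().paths());
        String output = endpoint.outputType().map(type -> " -> " + type).orElse("");
        return path + " <- " + endpoint.inputType() + output;
    }

    public static void main(String[] args) {
        EndpointNode endpoint = new EndpointNode(
                new PathsNode(List.of("some", "other", "endpoint")),
                "endpointRequest",             // assumed: the generated request type name
                Optional.of("ResponseType"));  // assumed: the declared response type name
        System.out.println(describe(endpoint));
    }
}
```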

States

The set of states will hold items of this shape:

public record StateNode(String name, String data, Set<TransitionNode> transitions) {
}

Each transition has this structure:

public record TransitionNode(String message, String toState) {
}