Parse Algorithm

The lexing algorithm and parse approach used by Æsthetic derive from a design by Austin Cheney, first introduced in his project Sparser. The parse table produced by Sparser is a simple yet powerful data format designed to describe any language with a single uniform structure. Before its adoption in Æsthetic, Sparser served as the parser behind the language-aware diffing and beautification tool PrettyDiff.

Most parsers produce an abstract syntax tree (AST). Æsthetic does not: its implementation of Sparser produces a uniform, table-like structure instead. Other algorithms can reach similar results, and each comes with its own tradeoffs. Many parsing tools in the development ecosystem are built on technologies such as ANTLR, PEG, or the incremental parser generator Tree-sitter.

Generators such as ANTLR and Tree-sitter require a grammar for every language and can impose a steep learning curve on users. PEG parsers are less prone to grammar ambiguity than LR parsers, but they may produce less helpful error messages and consume more memory. Hand-written recursive-descent parsers may be slower than those produced by parser generators, but their behavior is explicit and unambiguous.

Because Æsthetic’s primary task is to make sense of documents that combine several languages, the parse table was chosen as its data structure for flexibility, ease of reasoning, and extensibility. It lets Æsthetic handle the intricacies of formatting code across different languages with one consistent representation.
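To make the idea of a table-like structure concrete, here is a minimal TypeScript sketch of a parse table stored as parallel arrays, where one index across all columns forms a record. The column names (begin, lexer, stack, token, types), the helper functions, and the category labels are assumptions made for illustration. They are not taken from this text and should not be read as Æsthetic’s actual schema.

```typescript
// Illustrative only: column names and category labels are assumptions,
// not a confirmed description of the Æsthetic or Sparser schema.
interface ParseTable {
  begin: number[]; // index of the record that opened the enclosing structure (-1 at the root)
  lexer: string[]; // which lexer produced the record, e.g. "markup"
  stack: string[]; // name of the enclosing structure, e.g. "ul" or "global"
  token: string[]; // the token text as it appeared in the source
  types: string[]; // a coarse category for the token
}

interface Row {
  begin: number;
  lexer: string;
  stack: string;
  token: string;
  types: string;
}

function createTable(): ParseTable {
  return { begin: [], lexer: [], stack: [], token: [], types: [] };
}

// Append one record to every column and return its index.
function push(table: ParseTable, row: Row): number {
  table.begin.push(row.begin);
  table.lexer.push(row.lexer);
  table.stack.push(row.stack);
  table.token.push(row.token);
  table.types.push(row.types);
  return table.token.length - 1;
}

// Read one record back by collecting the same index from every column.
function rowAt(table: ParseTable, index: number): Row {
  return {
    begin: table.begin[index],
    lexer: table.lexer[index],
    stack: table.stack[index],
    token: table.token[index],
    types: table.types[index],
  };
}

// <ul class="list"><li>{{ item }}</li></ul> expressed as flat records.
const table = createTable();
const ul = push(table, { begin: -1, lexer: "markup", stack: "global", token: '<ul class="list">', types: "start" });
const li = push(table, { begin: ul, lexer: "markup", stack: "ul", token: "<li>", types: "start" });
push(table, { begin: li, lexer: "markup", stack: "li", token: "{{ item }}", types: "liquid" });
push(table, { begin: li, lexer: "markup", stack: "li", token: "</li>", types: "end" });
push(table, { begin: ul, lexer: "markup", stack: "ul", token: "</ul>", types: "end" });
```

Nesting is kept without a tree: each record points back to the record that opened its parent, so a formatter can walk the table with a plain loop and still recover depth when it needs to.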

Parse Table

Consider the following code sample, which combines HTML (markup) with embedded Liquid, CSS, JavaScript, and JSON. Each embedded language sits inside the tags that delimit its region: CSS in <style>, JavaScript in <script>, and JSON in {% schema %}. Liquid tokens also appear within the CSS, JavaScript, and HTML.

Code Example

<style>
  .list { background-color: {{ bg.color }}; }
</style>

<script>
  {% if condition %} fn('hello world!') {% endif %}
</script>

<main id="{{ object.prop }}">
  <ul class="list">
    {% for item in arr %}
      <li>{{ item }}</li>
    {% endfor %}
  </ul>
</main>

{% schema %}
{
  "prop": []
}
{% endschema %}
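One way to read the sample above is to let the opening delimiter of each region decide which lexer takes over. The TypeScript sketch below shows that idea only. The function name lexerFor and the lexer labels are hypothetical, and a real implementation would also need to consider attributes such as a script type.

```typescript
// Hypothetical lexer labels for the regions that appear in the sample.
type LexerName = "markup" | "style" | "script" | "json";

// Decide which lexer handles the content that follows an opening delimiter.
// Anything that is not a known embedded region stays with the markup lexer.
function lexerFor(opener: string): LexerName {
  const html = /^<\s*([a-zA-Z][\w-]*)/.exec(opener);
  const liquid = /^\{%-?\s*([a-zA-Z_]\w*)/.exec(opener);

  if (html !== null) {
    const name = html[1].toLowerCase();
    if (name === "style") return "style";
    if (name === "script") return "script";
  }
  // In the sample, the Liquid schema tag wraps a JSON object.
  if (liquid !== null && liquid[1] === "schema") return "json";

  return "markup";
}

const regions = ["<style>", "<script>", '<main id="{{ object.prop }}">', "{% schema %}"];
const assigned = regions.map((opener) => lexerFor(opener));
```

Under these assumptions the four openers from the sample resolve to the style, script, markup, and json lexers in that order, while Liquid tags such as {% for item in arr %} remain part of the markup region.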

Lexical Interpretation

The difficulty with a sample like this is that there is no definitive “right way,” and no consensus, on how a parser should interpret such a mixture of languages. Establishing lexical context in these edge cases poses unique challenges, and the problem has not been extensively studied in academia.

The Sparser parsing algorithm lets Æsthetic traverse and interpret these otherwise complex structures without making compromises or relying on additional resources to compensate for weaknesses. Because every language in a document is described by the same parse table, Æsthetic can handle these combinations consistently, giving developers a reliable and flexible tool for multi-language codebases.
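As a rough illustration of keeping lexical context, the TypeScript sketch below separates Liquid tokens from the host-language text around them while remembering which region they were found in. It is a simplification built on a regular expression, not the scanning approach Æsthetic uses, and the names splitLiquid, Chunk, and the host labels are hypothetical.

```typescript
// A piece of source text, labeled with the host region it was found in.
// Field names and labels are hypothetical.
interface Chunk {
  host: string;
  kind: "liquid" | "content";
  text: string;
}

function splitLiquid(source: string, host: string): Chunk[] {
  const chunks: Chunk[] = [];
  // Matches {{ ... }} and {% ... %} non-greedily. A real lexer would also
  // account for quoted strings that contain "}}" or "%}".
  const liquid = /\{\{[\s\S]*?\}\}|\{%[\s\S]*?%\}/g;
  let last = 0;
  let match: RegExpExecArray | null;

  while ((match = liquid.exec(source)) !== null) {
    // Host-language text that precedes this Liquid token.
    if (match.index > last) {
      chunks.push({ host, kind: "content", text: source.slice(last, match.index) });
    }
    chunks.push({ host, kind: "liquid", text: match[0] });
    last = match.index + match[0].length;
  }
  // Host-language text after the final Liquid token.
  if (last < source.length) {
    chunks.push({ host, kind: "content", text: source.slice(last) });
  }
  return chunks;
}

const cssChunks = splitLiquid(".list { background-color: {{ bg.color }}; }", "style");
const jsChunks = splitLiquid("{% if condition %} fn('hello world!') {% endif %}", "script");
```

Each chunk carries its host label, so a Liquid token found inside CSS is still known to belong to the style region. Concatenating the chunk texts in order reproduces the original input, which is a useful property for a formatter that must not lose content.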