The emergence of local-first web applications is challenging conventional client-server architectures, pushing developers to rethink how applications function. Users now demand a near-native experience, even in offline scenarios, which has sparked a need for efficient client-side data processing, particularly for search functionalities. This article delves into a type-driven approach that incorporates a Domain-Specific Language (DSL) to construct an effective and maintainable search system.

For those interested, the complete code devoid of dependencies is available on GitHub, providing a practical resource for implementing the discussed techniques.

Understanding Domain-Specific Languages (DSLs)

At the core of our exploration lies a DSL specifically designed for searching issues, a common element in project management and bug tracking systems. A DSL is a specialized language that articulates search intents, offering several notable advantages. Here are a few example queries that highlight its functionality:

  • is:open label:bug
  • author:alice (type:feature type:enhancement)
  • is:closed milestone:v1.0 assignee:bob

The clarity and expressiveness that a well-structured DSL provides can be observed in various successful systems. Consider Lucene and Elasticsearch, which use a query string DSL for full-text searches, or SQL, which utilizes its WHERE clause as a DSL for data filtering. Similarly, GraphQL defines a specific query language for fetching data from APIs. Users may find the proposed DSL reminiscent of GitHub's issue search functionality.

The implementation of a DSL offers multiple benefits, including controlled complexity through a limited scope of possible queries, and domain alignment by reflecting the domain's terminology in its vocabulary. This enhances usability, allowing users to formulate queries in familiar terms, while a formal grammar simplifies maintenance and extensibility, making upgrades and modifications more manageable.

Defining the Domain

To effectively structure the dataset, it is crucial to consider the business domain being modeled. For this discussion, an issue tracking system has been selected, with the primary value represented by the Issue interface.

type IssueStatus = "open" | "closed";

type IssueType = "bug" | "feature" | "docs" | "enhancement";

interface Issue {
    id: string;
    title: string;
    status: IssueStatus;
    author: string;
    labels: string[];
    milestone: string | null;
    assignee: string | null;
    type: IssueType;
    updatedAt: number;
}

This model serves as a foundational example, with the same principles applicable to various other domains, such as document management, customer relationship management, product inventories, and application logging.

Error Handling Mechanism

Before moving to the parsing process, it is imperative to have a robust error handling strategy in place. The Either type, a fundamental concept in functional programming, offers a solid mechanism for this purpose. It represents values that can either be a success (the Right case) or a failure (the Left case) resulting from an operation.

type Either<L, R> = Left<L> | Right<R>;

class Left<L> {
    constructor(readonly value: L) {}
    isLeft(): this is Left<L> { return true; }
    isRight(): this is Right<never> { return false; }
}

class Right<R> {
    constructor(readonly value: R) {}
    isLeft(): this is Left<never> { return false; }
    isRight(): this is Right<R> { return true; }
}

const success = <T>(value: T) => new Right(value);
const failure = <E>(error: E) => new Left(error);

This mechanism will be utilized throughout the program to propagate results efficiently. When a parsing step succeeds, it returns a Right with the parsed value and any remaining input. Conversely, if it fails, it returns a Left containing an error object, which is critical for establishing a reliable system.

Precise Query Parsing

To process queries effectively, we employ parser combinators, an established technique from functional programming often used in compiler design. Parser combinators are higher-order functions that take one or more parsers as input and return a new parser. Each individual parser attempts to match a portion of the input string and, if successful, returns a Right with the parsed value and remaining input. If it fails, it will yield a Left indicating an error.

type ParserError = {
    code:
        | "INVALID_TOKEN"
        | "MISSING_VALUE"
        | "INVALID_STATUS"
        | "INVALID_TYPE"
        | "INVALID_TIME_FILTER";
    message: string;
    position: number;
    input: string;
};

type ParserResult<T> = Either<ParserError, [T, string]>;

type Parser<T> = (input: string) => ParserResult<T>;

We implement several fundamental combinators in our parser:

  • lit: Matches a literal string.
  • word: Parses a sequence of alphanumeric characters.
  • alt: Tries multiple parsers in sequence, returning the result of the first that succeeds.
  • seq: Applies multiple parsers sequentially, succeeding only if all of them succeed.
  • many: Applies a parser zero or more times, collecting results into an array.
  • map: Transforms the result of a successful parse using a provided function.

This layered approach, starting with simple parsers like lit and word, allows us to combine them into more complex structures using combinators like seq, alt, and many. The map combinator plays a pivotal role in converting raw parsed strings into a structured representation known as the Abstract Syntax Tree (AST). This recursive method of building complex structures from simpler ones showcases the elegance inherent in functional programming.
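As an illustration, a few of these primitives might be sketched as follows. To keep the example self-contained, it uses a simplified parser type where `null` signals failure rather than the full Either-based `ParserResult` described above; the real implementation carries structured errors.

```typescript
// Simplified for the sketch: null = failure (the real code returns a Left).
type Parser<T> = (input: string) => [T, string] | null;

// lit: match an exact literal string at the start of the input.
const lit = (expected: string): Parser<string> => (input) =>
    input.startsWith(expected) ? [expected, input.slice(expected.length)] : null;

// word: match a run of alphanumeric characters.
const word: Parser<string> = (input) => {
    const m = input.match(/^[A-Za-z0-9]+/);
    return m ? [m[0], input.slice(m[0].length)] : null;
};

// alt: return the result of the first parser that succeeds.
const alt = <T>(...parsers: Parser<T>[]): Parser<T> => (input) => {
    for (const p of parsers) {
        const r = p(input);
        if (r !== null) return r;
    }
    return null;
};

// map: transform the value of a successful parse.
const map = <A, B>(p: Parser<A>, f: (a: A) => B): Parser<B> => (input) => {
    const r = p(input);
    return r === null ? null : [f(r[0]), r[1]];
};

const upperWord = map(word, (w) => w.toUpperCase());
```

Each combinator threads the remaining input along, which is what makes sequencing and backtracking compose cleanly.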

Structuring the Query with AST

The primary role of the parser is not to execute queries directly but to convert the input string into an Abstract Syntax Tree (AST). The AST serves as a structured and hierarchical representation of the query, independent of its specific syntax. This separation is essential for several reasons:

  • Separation of Concerns: Parsing (syntax) remains distinct from evaluation (semantics).
  • Optimization: The AST can be analyzed and optimized prior to execution.
  • Flexibility: The AST can be repurposed for various uses, such as generating queries for different backends.

Below is a representation of the AST structure:

type FilterNodeType =
    | "status"
    | "author"
    | "label"
    | "type"
    | "updated"
    | "and"
    | "or";

type LeafFilterNode = {
    readonly type: FilterNodeType;
    readonly value: string;
};

type FilterNode =
    | LeafFilterNode
    | { type: "and" | "or"; value: FilterNode[] };

The AST consists of FilterNode objects where leaf nodes represent individual filters (like status:open), while and and or nodes represent boolean combinations. The final searchQueryParser is achieved by composing combinators effectively.
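To show how a combinator can emit leaf nodes, here is a hedged sketch of a `fieldParser` helper (a hypothetical name, not the article's actual composition) that turns a `field:value` token into a `LeafFilterNode`, again using the simplified nullable parser type for brevity:

```typescript
type LeafFilterNode = { readonly type: string; readonly value: string };
type Parser<T> = (input: string) => [T, string] | null;

// Hypothetical helper: parse "field:value" into a leaf AST node.
const fieldParser = (field: string): Parser<LeafFilterNode> => (input) => {
    const prefix = `${field}:`;
    if (!input.startsWith(prefix)) return null;
    const rest = input.slice(prefix.length);
    const m = rest.match(/^[A-Za-z0-9.-]+/);
    if (!m) return null;
    return [{ type: field, value: m[0] }, rest.slice(m[0].length)];
};

const statusParser = fieldParser("status");
const node = statusParser("status:open label:bug");
// node holds the leaf { type: "status", value: "open" } plus " label:bug"
```

The full `searchQueryParser` would combine many such field parsers with `alt`, `seq`, and `many` to build nested `and`/`or` nodes.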

Evaluating the Query

With the AST ready, we can transform it into a predicate function. The predicate function, IssuePredicate, evaluates whether a given issue meets the defined criteria.

type IssuePredicate = (issue: Issue) => boolean;

const isValidIssueStatus = (value: string): value is IssueStatus => {
    return ["open", "closed"].includes(value);
};

const matchStatus = (status: string): IssuePredicate => {
    if (isValidIssueStatus(status)) {
        return (issue) => issue.status === status;
    }
    console.error(`Invalid status: ${status}`);
    return matchNone();
};

The createPredicate function navigates through the AST, generating and combining predicates effectively.

const createPredicate = (node: FilterNode): IssuePredicate => {
    switch (node.type) {
        case "status":
            return isValidIssueStatus(node.value) ? matchStatus(node.value) : matchNone();
        case "and":
            return and(...(node.value as FilterNode[]).map(createPredicate));
        // ... remaining filter and boolean cases follow the same pattern
    }
};

const and = (...preds: IssuePredicate[]): IssuePredicate =>
    (issue) => preds.every((p) => p(issue));

This recursive design elegantly addresses the complex boolean logic necessary for applying the filters to our dataset.
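The snippets above reference `matchNone`, and an `or` combinator would mirror `and`. A minimal sketch of both, with the issue shape narrowed to just `status` so the example is self-contained:

```typescript
// Narrowed issue shape for this sketch only.
type IssuePredicate = (issue: { status: string }) => boolean;

// matchNone: a predicate that rejects everything; used for invalid filters.
const matchNone = (): IssuePredicate => () => false;

// or: an issue matches if any sub-predicate matches (mirrors `and`).
const or = (...preds: IssuePredicate[]): IssuePredicate =>
    (issue) => preds.some((p) => p(issue));

const isOpen: IssuePredicate = (i) => i.status === "open";
const combined = or(isOpen, matchNone());
```

Because `matchNone` always returns false, an invalid filter safely yields an empty result under `and`, and is simply ignored under `or`.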

Executing the Query

Finally, we define an executeQuery function to manage the entire querying process.

const executeQuery = (query: string, issues: Issue[]): Issue[] => {
    const parsedQuery = searchQueryParser(query);
    if (parsedQuery.isLeft()) {
        console.error(`Parse Error: ${parsedQuery.value.message}`);
        return [];
    }
    const ast = parsedQuery.value[0];
    const predicate = createPredicate(ast);
    return issues.filter(predicate);
};

This function efficiently manages parsing errors, returning an empty result set while logging any issues encountered. In a production environment, such errors could be logged in various ways depending on the context.
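End-to-end usage would then be a single call. Since the full parser is not reproduced here, this sketch substitutes a hand-built predicate for what `createPredicate` would produce from the parsed AST of `"is:open label:bug"`:

```typescript
interface Issue { id: string; status: "open" | "closed"; labels: string[] }

const issues: Issue[] = [
    { id: "1", status: "open", labels: ["bug"] },
    { id: "2", status: "closed", labels: ["bug"] },
    { id: "3", status: "open", labels: ["docs"] },
];

// Stand-in for createPredicate(ast) given the query "is:open label:bug".
const predicate = (issue: Issue) =>
    issue.status === "open" && issue.labels.includes("bug");

const results = issues.filter(predicate);
// results contains only issue "1"
```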

Looking Ahead

While it might be argued that other solutions, such as IndexedDB or SQLite, are better suited to storing and querying large datasets, the techniques discussed still hold significant value. A database could serve as the storage layer, while the methods articulated in this article construct queries from user input.

Future considerations should include evaluating the performance of this system and exploring enhancements. A preliminary performance assessment, using a test machine with a simulated dataset containing 1 million issues, yielded the following results:

Search Results for: "is:open"
----------------------------------------
Matching issues: 499344   Search time: 20.77ms

Search Results for: "is:closed"
----------------------------------------
Matching issues: 500656   Search time: 20.06ms

Search Results for: "is:open label:bug"
----------------------------------------
Matching issues: 210988   Search time: 76.44ms

Search Results for: "author:alice type:feature"
----------------------------------------
Matching issues: 63052   Search time: 67.80ms

Search Results for: "label:documentation type:docs"
----------------------------------------
Matching issues: 105336   Search time: 52.06ms

Search Results for: "is:open (author:charlie author:alice) label:enhancement"
----------------------------------------
Matching issues: 105326   Search time: 81.30ms

These outcomes suggest that the system can manage a substantial volume of records and execute queries with reasonable performance. However, it is essential to acknowledge that the current implementation relies on linear scans, which can be inefficient for larger datasets.

For real-world applications, indexing becomes crucial to maintain acceptable performance levels. A common approach for text fields, like titles, is to employ an inverted index, mapping words to the IDs of the issues that contain them. For simpler fields, basic map-based indexes may suffice.
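As a sketch of the map-based variant, an index over the status field could be built once and then consulted instead of scanning every issue (`buildStatusIndex` is an illustrative name, not part of the article's code):

```typescript
interface Issue { id: string; status: string }

// Build a simple map-based index: status value -> set of issue IDs.
const buildStatusIndex = (issues: Issue[]): Map<string, Set<string>> => {
    const index = new Map<string, Set<string>>();
    for (const issue of issues) {
        const bucket = index.get(issue.status) ?? new Set<string>();
        bucket.add(issue.id);
        index.set(issue.status, bucket);
    }
    return index;
};

const index = buildStatusIndex([
    { id: "1", status: "open" },
    { id: "2", status: "closed" },
    { id: "3", status: "open" },
]);
// index.get("open") holds the IDs of all open issues
```

A filter like `is:open` then becomes a constant-time map lookup followed by a set intersection with the other filters' results, rather than a linear scan.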

Beyond indexing, other optimizations could be introduced:

  • Query Optimization: Analyze the AST to determine the most efficient order for applying filters, prioritizing the most selective filters first to reduce the overall workload.
  • Query Planning: In more complex scenarios, a query planner could opt for different indexes and execution strategies based on the query structure and data statistics.
  • Caching: This could be implemented at various levels: caching parsed queries, predicate functions, or even entire query results to improve performance when the data remains unchanged.
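The first of these caching levels, memoizing parsed queries, is particularly cheap to add. A sketch, where `parse` is a hypothetical stand-in for `searchQueryParser` (a counter tracks how often real parsing work happens):

```typescript
type Ast = { type: string; value: string };

let parseCount = 0;

// Hypothetical stand-in for searchQueryParser.
const parse = (query: string): Ast => {
    parseCount++; // real parsing work happens here
    return { type: "raw", value: query };
};

// Memoize the parse function keyed by the raw query string.
const memoizeParse = (fn: (q: string) => Ast) => {
    const cache = new Map<string, Ast>();
    return (query: string): Ast => {
        const hit = cache.get(query);
        if (hit) return hit;
        const ast = fn(query);
        cache.set(query, ast);
        return ast;
    };
};

const cachedParse = memoizeParse(parse);
cachedParse("is:open");
cachedParse("is:open"); // served from the cache; parse ran only once
```

Since users often re-run the same queries (saved filters, back navigation), even this simple cache can eliminate most parsing work.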

These optimization strategies can provide a roadmap for scaling the system to manage larger datasets and intricate queries, with their implementation left as an exercise for the reader.

Conclusion

This article has presented a structured method for querying data, constructing a comprehensive system that hinges on type safety, functional programming principles, and a clear separation of concerns. By leveraging TypeScript, parser combinators, and an AST, we have created a search DSL that is functional, robust, maintainable, and extensible.

It is also worth noting that the Either type serves as an example of a monad, a core concept within functional programming. Monads help structure computations that involve sequencing operations and managing potential failures or side effects. While a deeper exploration of monad theory is beyond the bounds of this article, recognizing this connection can pave the way for a richer understanding of functional programming principles, potentially easing the barrier to entry for new learners.

The techniques shared in this article offer a solid foundation for developing advanced search capabilities relevant to both local-first web applications and extensive server-side systems. By adopting the principles outlined here, developers can craft search experiences that are not only powerful but also user-friendly.

References and Further Reading