Add test suite, CI workflow, and technical documentation #7
Merged

`@@ -0,0 +1,48 @@`

```yaml
name: Tests

on:
  push:
    branches: [ lwt, test-suite ]
  pull_request:
    branches: [ lwt ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up OCaml
        uses: ocaml/setup-ocaml@v3
        with:
          ocaml-compiler: 5.1.1
          dune-cache: true
          opam-repositories: |
            default: https://github.com/ocaml/opam-repository.git

      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y npm xz-utils libomp-dev llvm-dev
          opam install . --deps-only --update-invariant
          npm install --no-save typescript browserify pug-lexer pug-parser pug-walk

      - name: Install QuickJS
        run: |
          curl -fsSL https://bellard.org/quickjs/quickjs-2021-03-27.tar.xz -o quickjs.tar.xz
          tar xvf quickjs.tar.xz && rm quickjs.tar.xz
          mv quickjs-2021-03-27 quickjs
          cd quickjs && make

      - name: Install Flow
        run: |
          git clone --branch v0.183.1 --depth 1 https://github.com/facebook/flow.git flow
          ln -s "$(pwd)/flow/src/parser" src/flow_parser
          ln -s "$(pwd)/flow/src/third-party/sedlex" src/sedlex
          ln -s "$(pwd)/flow/src/hack_forked/utils/collections" src/collections

      - name: Run tests
        run: |
          mkdir -p strings
          opam exec -- dune runtest tests/
```

`@@ -22,3 +22,7 @@ bad/`

```text
src/flow_parser
src/sedlex
src/collections

tests/integration_test_run/
.agents/
ignored/
```

`@@ -0,0 +1,88 @@`

# Agent Information - String Extractor

This repository contains an OCaml-based internationalization (i18n) string extraction tool. It parses source files (JS, TS, Vue, Pug, HTML) and extracts strings for translation management.

## Documentation

- **[ARCHITECTURE.md](ARCHITECTURE.md)**: A deep dive into the codebase layout, directory structure, and a comprehensive API reference. **Read this file first** when:
  - Starting a new task, to understand which files are relevant.
  - Investigating the impact of changes across the system.
  - Looking for specific functionality or function definitions before searching.

- **[DEVELOPMENT.md](DEVELOPMENT.md)**: Instructions for environment setup, build processes for various platforms, and release workflows. **Read this file first** when:
  - Setting up the development environment or installing dependencies (OCaml, JS, QuickJS).
  - Building the project for development or release.
  - Running the tool for manual verification or testing.
  - Managing version numbers or release artifacts.

## Project Overview

- **Language**: OCaml (5.1.1) with some C++ (QuickJS bridge) and JavaScript (parsers via Browserify).
- **Architecture**:
  - `src/cli/`: Main entry point, command-line interface, and output generation logic.
  - `src/parsing/`: OCaml parsers using `Angstrom` for custom formats and `Flow_parser` for JS.
  - `src/quickjs/`: Bridge to QuickJS to run JavaScript-based parsers (TypeScript/Pug) from OCaml.
  - `src/utils/`: Common utilities for collection, timing, and I/O.
- **Key Libraries**: `Core`, `Lwt` (concurrency), `Angstrom` (parsing), `Yojson`, `Ppx_jane`.

## Essential Commands

### Build
- **Development build**: `dune build src/cli/strings.exe`
- **Watch mode**: `dune build src/cli/strings.exe -w`
- **Release build (macOS)**: `DUNE_PROFILE=release dune build src/cli/strings.exe`
- **Full release cycle**: See `DEVELOPMENT.md` for the `cp`, `strip`, and Docker commands.

### Run
- After building: `./_build/default/src/cli/strings.exe [directory-to-extract-from]`
- The CLI expects to be run from the root of a project that contains a `strings/` directory; if that directory is missing, it creates one when a `.git` folder is present.

### Installation (Dev Setup)
Refer to `DEVELOPMENT.md` for the specific `opam` and `npm` setup steps, as the project has several external dependencies (Flow, QuickJS, pug-lexer, etc.).

## Code Conventions & Patterns

### Parsing Strategy
1. **Direct Parsers**: Simple formats like `.strings`, `HTML`, and basic `Vue` tags are parsed using `Angstrom` in `src/parsing/`.
2. **JS/TS Parsing**:
   - JavaScript uses `Flow_parser` and a custom AST walker in `src/parsing/js_ast.ml`.
   - TypeScript uses the official TS parser running inside QuickJS (`src/quickjs/`).
3. **Pug Parsing**: Has a "fast" OCaml implementation (`src/parsing/pug.ml`) and a "slow" implementation that runs the official Pug parser via QuickJS (`src/quickjs/`).
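
To make the routing concrete, here is a minimal OCaml sketch of an extension-based dispatch that mirrors the strategy above. The `backend` type, the `backend_of_path` function, the `slow_pug` switch, and the exact set of extensions are hypothetical illustrations, not the project's actual API. `.strings` files are left out because they hold existing translations rather than source code to extract from.

```ocaml
(* Hypothetical routing of source files to parser backends (illustration only). *)
type backend =
  | Native_html                (* HTML template parser in src/parsing/html.ml *)
  | Flow_js                    (* JavaScript: Flow_parser + custom AST walker *)
  | Quickjs_typescript         (* TypeScript: official parser inside QuickJS *)
  | Pug of [ `Fast | `Slow ]   (* native OCaml Pug, or official Pug via QuickJS *)
  | Vue_blocks                 (* split into blocks, then each block re-dispatched *)

(* Pick a backend from the file extension; [slow_pug] selects the QuickJS Pug path. *)
let backend_of_path ?(slow_pug = false) path =
  match String.lowercase_ascii (Filename.extension path) with
  | ".html" -> Some Native_html
  | ".js" -> Some Flow_js
  | ".ts" -> Some Quickjs_typescript
  | ".pug" -> Some (Pug (if slow_pug then `Slow else `Fast))
  | ".vue" -> Some Vue_blocks
  | _ -> None (* unsupported extension: skipped during traversal *)
```

Keeping the choice in a single variant makes it easy to see, in one `match`, which files go through QuickJS and which stay in native OCaml.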

### Extraction Pattern
- Content is extracted into a `Utils.Collector.t`.
- The collector tracks found strings, potential scripts (to be parsed further), and file errors.
- **Convention**: In JS/TS, string literals passed to `L("...")` calls are extracted as translatable strings.
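
The `L(...)` convention can be illustrated with a deliberately naive scanner that pushes the literal argument of each `L('...')` or `L("...")` call onto a stdlib `Queue`. This is only a sketch: the real extractor walks the Flow AST in `src/parsing/js_ast.ml`, whereas the hypothetical `scan_l_calls` works on raw text and ignores comments, escape sequences, and template literals.

```ocaml
(* Characters that may appear in a JS identifier (ASCII subset only). *)
let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '$' -> true
  | _ -> false

(* Naive text scan for L('...') / L("...") calls; not the Flow-based walker. *)
let scan_l_calls (src : string) : string Queue.t =
  let found = Queue.create () in
  let n = String.length src in
  let rec go i =
    if i + 2 < n then
      if src.[i] = 'L' && src.[i + 1] = '('
         && (i = 0 || not (is_ident_char src.[i - 1]))  (* skip e.g. URL(...) *)
         && (src.[i + 2] = '"' || src.[i + 2] = '\'')
      then begin
        let quote = src.[i + 2] in
        match String.index_from_opt src (i + 3) quote with
        | Some j ->
          (* Collect the text between the quotes, then resume after it. *)
          Queue.push (String.sub src (i + 3) (j - i - 3)) found;
          go (j + 1)
        | None -> () (* unterminated literal: stop scanning *)
      end
      else go (i + 1)
  in
  go 0;
  found
```

On the JS fixture in this PR, `L('Hello from JS');`, the queue ends up holding the single string `Hello from JS`.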

### Concurrency
- Uses `Lwt` for cooperative concurrency.
- Concurrent traversal of directories is handled in `src/cli/strings.ml` via `Lwt_list` and `Lwt_pool`.
- JS workers (QuickJS) are managed via `Lwt_pool` and `Lwt_preemptive` in `src/quickjs/quickjs.ml`.

## Important Gotchas

- **QuickJS Dependency**: Building requires a compiled `quickjs` directory at the project root. The `dune` rules in `src/quickjs/dune` copy headers and libraries from there.
- **Generated Headers**: `src/quickjs/runtime.h` is generated from `src/quickjs/parsers.js` using `browserify` and `qjsc`.
- **Linking**: macOS builds use specific link flags (e.g., `ld64.lld`) defined in `src/cli/link_flags.*`.
- **OCamlFormat**: `.ocamlformat` is present; format OCaml code before submitting.
- **Memory Safety**: Be cautious with the C++ FFI code in `src/quickjs/quickjs.cpp`, particularly its interaction with OCaml's GC and runtime lock (`CAMLparam`, `CAMLreturn`, `caml_release_runtime_system`).

## Testing Approach

- **Inline Tests**: The project uses `ppx_inline_test`. Parsers in `src/parsing/` can be tested directly within the OCaml source files or in the `tests/` directory.
- **Test Suite**: A standard test suite is located in `tests/test_runner.ml`. It covers JS, HTML, Pug, and `.strings` file parsing.
- **Integration Tests**: End-to-end behavior is verified by running the built binary against the fixtures in `tests/fixtures/` and checking the generated output in the `strings/` directory; the `runtest` rule in `tests/dune` automates this check for French translations.
- **Debug Flags**: Use `--show-debugging` or `--debug-pug` / `--debug-html` flags in the CLI to inspect internal parsing results.

## Troubleshooting

### "File modified since last read"
If you receive an error stating that a file was **"modified since it was last read"**, it usually indicates a mismatch between the file's filesystem modification time and the editing tool's record of when it last read the file.

**Example Error:**
> `Edit failed: The file '/path/to/file' was modified since it was last read. Please read the file again before trying to edit it.`

**Recommended Fix:**
1. Run `touch filename` to reset the file's modification time to the current system time.
2. Re-read the file using the `view` tool.
3. Attempt the edit again.

`@@ -0,0 +1,80 @@`

# Architecture Documentation - String Extractor

This document provides a high-level overview of the String Extractor's architecture, directory structure, and internal APIs.

## Project Entry Point

The main entry point of the application is **`src/cli/strings.ml`**. It handles command-line argument parsing using `Core.Command`, sets up the `Lwt` runtime, and initiates the file traversal process.

## Directory Structure

```text
/
├── src/
│ ├── cli/ # Main CLI application logic
│ │ ├── strings.ml # CLI entry point, traversal coordination
│ │ ├── vue.ml # Vue-specific parsing and extraction logic
│ │ └── generate.ml # Localization file generation (.strings, .json)
│ ├── parsing/ # Core parsers using Angstrom and Flow
│ │ ├── basic.ml # Common parsing utilities and combinators
│ │ ├── js_ast.ml # Flow AST walker for string extraction
│ │ ├── js.ml # JavaScript string extraction entry point
│ │ ├── pug.ml # Native Pug template parsing
│ │ ├── html.ml # HTML template parsing
│ │ ├── strings.ml # .strings file parsing logic
│ │ └── ... # Other specialized parsers (vue blocks, styles)
│ ├── quickjs/ # Interface to QuickJS for JS/TS/Pug parsing
│ │ ├── quickjs.ml # OCaml FFI to QuickJS
│ │ ├── quickjs.cpp # C++ implementation of the bridge
│ │ └── parsers.js # JS-based parsers running in QuickJS
│ └── utils/ # Shared utility modules
│ ├── collector.ml # State container for collected strings/errors
│ ├── io.ml # I/O helpers
│ ├── timing.ml # Performance measurement
│ └── exception.ml # Exception handling
├── strings/ # Directory where .strings files are managed
├── dune-project # Dune build system configuration
└── README.md # Project overview and usage instructions
```

## Core API Reference

### `src/cli/`
- **`Strings.main`**: Coordinates the entire run, including directory traversal and result generation.
- **`Vue.parse`**: Splits a `.vue` file into its constituent parts (template, script, style).
- **`Generate.write_english`**: Creates `english.strings` and `english.json` from the collected strings.
- **`Generate.write_other`**: Updates existing translations for other languages.
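
As a rough picture of what splitting a `.vue` file involves, the sketch below pulls out the first `<template>`, `<script>`, and `<style>` blocks by plain substring search using only the OCaml stdlib. `find_sub`, `block`, `split_vue`, and the `vue_parts` record are hypothetical names; `Vue.parse` itself may work quite differently, and this version does not handle nested `<template>` tags or tags that appear inside comments or strings.

```ocaml
(* Index of the first occurrence of [sub] in [s] at or after [from], if any. *)
let find_sub s sub from =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go from

(* Text between the first <tag ...> and the following </tag>, if both exist. *)
let block ~tag src =
  match find_sub src ("<" ^ tag) 0 with
  | None -> None
  | Some start -> (
      match String.index_from_opt src start '>' with
      | None -> None
      | Some open_end -> (
          let body = open_end + 1 in
          match find_sub src ("</" ^ tag ^ ">") body with
          | None -> None
          | Some close -> Some (String.sub src body (close - body))))

type vue_parts = {
  template : string option;
  script : string option;
  style : string option;
}

(* Each block can then be handed to the matching extractor. *)
let split_vue src =
  {
    template = block ~tag:"template" src;
    script = block ~tag:"script" src;
    style = block ~tag:"style" src;
  }
```

Applied to the Vue fixture in this PR, `split_vue` would return the `<i18n>` markup as the template body, the `export default { ... }` object as the script body, and `None` for the style.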

### `src/parsing/`
- **`Parsing.Basic`**: Provides foundational Angstrom parsers for whitespace, strings, and standard error handling.
- **`Parsing.Js.extract_to_collector`**: Entry point for scanning JavaScript source code.
- **`Parsing.Js_ast.extract`**: A comprehensive walker for the Flow AST that identifies and extracts strings from `L("...")` calls.
- **`Parsing.Pug.collect`**: Traverses the native Pug AST to extract strings.
- **`Parsing.Strings.parse`**: Parses existing `.strings` files into a lookup table. Takes a `Lwt_io.input_channel` and returns a `string Core.String.Table.t Lwt.t`.
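
For readers unfamiliar with Apple's `.strings` format, the following sketch parses one `"key" = "value";` entry per line into a table. It is a simplified, synchronous stand-in for `Parsing.Strings.parse`, not its implementation: it uses the stdlib `Hashtbl` instead of `Core.String.Table`, reads from a string rather than an `Lwt_io.input_channel`, skips any line that does not start with a quoted key (such as comment lines), and decodes only the `\"` and `\\` escapes. All function names are hypothetical.

```ocaml
(* Read a double-quoted literal starting at index [i] of [line].
   Decodes only \" and \\; returns the contents and the index after the closing quote. *)
let read_quoted line i =
  let n = String.length line in
  let buf = Buffer.create 32 in
  let rec go j =
    if j >= n then None (* no closing quote on this line *)
    else
      match line.[j] with
      | '"' -> Some (Buffer.contents buf, j + 1)
      | '\\' when j + 1 < n && (line.[j + 1] = '"' || line.[j + 1] = '\\') ->
          Buffer.add_char buf line.[j + 1];
          go (j + 2)
      | c ->
          Buffer.add_char buf c;
          go (j + 1)
  in
  if i < n && line.[i] = '"' then go (i + 1) else None

let skip_blanks line i =
  let n = String.length line in
  let rec go j = if j < n && (line.[j] = ' ' || line.[j] = '\t') then go (j + 1) else j in
  go i

(* Parse one "key" = "value" entry; the trailing ';' is not checked. *)
let parse_entry line =
  match read_quoted line (skip_blanks line 0) with
  | None -> None
  | Some (key, i) ->
      let i = skip_blanks line i in
      if i < String.length line && line.[i] = '=' then
        (match read_quoted line (skip_blanks line (i + 1)) with
         | Some (value, _) -> Some (key, value)
         | None -> None)
      else None

let parse_strings (contents : string) : (string, string) Hashtbl.t =
  let table = Hashtbl.create 64 in
  String.split_on_char '\n' contents
  |> List.iter (fun line ->
         match parse_entry line with
         | Some (key, value) -> Hashtbl.replace table key value
         | None -> ());
  table
```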

### `src/quickjs/`
- **`Quickjs.extract_to_collector`**: Offloads extraction to QuickJS for TypeScript and advanced Pug templates.

### `src/utils/`
- **`Utils.Collector.create`**: Initializes a new string collection state for a specific file (type `t = { path: string; strings: string Queue.t; ... }`).
- **`Utils.Collector.blit_transfer`**: Merges results from one collector into another.
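
The collector API can be pictured with a cut-down record that holds only the two documented fields. In this sketch, `create ~path` and `blit_transfer ~src ~dst` are hypothetical signatures, and `blit_transfer` is assumed to behave like the stdlib's `Queue.transfer` (append everything from `src` to the end of `dst`, then empty `src`). The real function presumably also merges the elided fields, such as scripts and errors.

```ocaml
(* Hypothetical miniature of Utils.Collector: only the documented fields. *)
type t = { path : string; strings : string Queue.t }

(* Fresh, empty collector for one file. *)
let create ~path = { path; strings = Queue.create () }

(* Append all strings of [src] to [dst] and clear [src] (Queue.transfer semantics). *)
let blit_transfer ~src ~dst = Queue.transfer src.strings dst.strings
```

Using a `Queue` keeps insertion order, so strings merged from per-file collectors stay in the order they were found.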

## Control Flow
1. **Initiation**: `strings.exe` starts, parses CLI flags, and identifies the target directory.
2. **Traversal**: Uses `Lwt` to cooperatively walk the directory tree via `Lwt_list` and `Lwt_pool`.
3. **Dispatch**: For each supported file extension, the corresponding parser in `src/parsing` is invoked.
4. **Collection**: Parsers find strings (usually inside `L()`) and add them to a `Collector.t`.
5. **Generation**: The `Generate` module (`src/cli/generate.ml`) aggregates strings from all collectors and updates the `strings/` directory.
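
Step 5 can be sketched for a non-English language as follows: keep any translation already present in the existing `.strings` table and flag everything else. The `MISSING TRANSLATION - <file>` text mirrors what the integration test in `tests/dune` greps for, but the surrounding `/* ... */` comment, the fallback to the English text, and the `render_other` and `escape` names are assumptions, not the actual output of `Generate.write_other`.

```ocaml
(* Escape a string for use inside a double-quoted .strings literal. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if c = '"' || c = '\\' then Buffer.add_char b '\\';
      Buffer.add_char b c)
    s;
  Buffer.contents b

(* [english]: (source string, file it was found in) pairs, in output order.
   [existing]: translations already present in <language>.strings. *)
let render_other ~(english : (string * string) list)
    ~(existing : (string, string) Hashtbl.t) : string =
  english
  |> List.map (fun (key, file) ->
         match Hashtbl.find_opt existing key with
         | Some translated ->
             (* Keep the existing translation untouched. *)
             Printf.sprintf "\"%s\" = \"%s\";" (escape key) (escape translated)
         | None ->
             (* Flag the gap and fall back to the English text (assumption). *)
             Printf.sprintf "/* MISSING TRANSLATION - %s */\n\"%s\" = \"%s\";" file
               (escape key) (escape key))
  |> String.concat "\n"
```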

## Testing Setup

The project implements a multi-layered testing strategy:

1. **Inline Tests**: Using `ppx_inline_test` (e.g. `let%test_unit`) together with `ppx_assert` (e.g. `[%test_eq]`), logic can be tested directly within the source files. This is primarily used for parser validation in `src/parsing/`.
2. **Standard Test Suite**: Located in `tests/test_runner.ml`, this suite runs as inline tests via `ppx_inline_test` and uses `ppx_assert` to verify:
   - JavaScript string extraction via `Flow_parser`.
   - HTML extraction via `SZXX` and Pug extraction via `Angstrom`.
   - Apple-style `.strings` file parsing (via `Lwt_main.run` and `Lwt_io`).
3. **Integration Testing**: The `tests/fixtures/` directory contains sample files of all supported types. The CLI can be run against these fixtures to verify end-to-end extraction and output generation (`.strings` and `.json` files); the `runtest` rule in `tests/dune` does this automatically, checking that an existing French translation is preserved and that a missing-translation marker is emitted.

The `tests/dune` file configures the test library and enables inline tests for it.

`@@ -0,0 +1 @@`

```text
()
```

`@@ -0,0 +1,40 @@`

```dune
(library
 (name parsing_tests)
 (inline_tests)
 (libraries parsing utils core lwt lwt.unix angstrom-lwt-unix)
 (preprocess (pps ppx_jane ppx_inline_test))
)

(rule
 (alias runtest)
 (deps
  ../src/cli/strings.exe
  (source_tree fixtures))
 (action
  (bash "
TMP_DIR=\"integration_test_run\"
rm -rf $TMP_DIR
mkdir -p $TMP_DIR/strings
mkdir -p $TMP_DIR/.git
printf '\"Hello from HTML\" = \"Bonjour de HTML\";\n' > $TMP_DIR/strings/french.strings
cp -r fixtures $TMP_DIR/
cd $TMP_DIR
../../src/cli/strings.exe fixtures --output strings &> /dev/null
cd ..

if ! grep -q \"Bonjour de HTML\" $TMP_DIR/strings/french.strings; then
  echo \"Error: French translation lost in .strings\"
  exit 1
fi
if ! grep -q \"Bonjour de HTML\" $TMP_DIR/strings/french.json; then
  echo \"Error: French translation lost in .json\"
  exit 1
fi
if ! grep -q \"MISSING TRANSLATION - demo.pug\" $TMP_DIR/strings/french.strings; then
  echo \"Error: Missing translation marker not found in .strings\"
  exit 1
fi

echo \"✅ French integration test passed\"
rm -rf $TMP_DIR
")))
```

`@@ -0,0 +1 @@`

```html
<i18n>Hello from HTML</i18n>
```

`@@ -0,0 +1 @@`

```js
L('Hello from JS');
```

`@@ -0,0 +1 @@`

```pug
i18n Hello from Pug
```

`@@ -0,0 +1,13 @@`

```vue
<template>
  <i18n>Hello from Vue Template</i18n>
</template>

<script>
export default {
  data() {
    return {
      msg: L('Hello from Vue Script')
    }
  }
}
</script>
```