[Monarch] Notes on building asm/MIPS syntax highlighting - Monaco Editor

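Monarch is the syntax highlighting library that ships with Monaco Editor. It lets you implement highlighting for a custom language with a JSON-like declarative syntax. This article introduces Monarch by writing a simple custom highlighter for MIPS assembly.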

1. Initialization

First, define a language. Here we name it asm.

// Register a new language
// ignoreCase is not a field of ILanguageExtensionPoint; it belongs to the Monarch definition in step 2
monaco.languages.register({ id: "asm" });

The official Monaco documentation reads:

### register
register(language: ILanguageExtensionPoint): void

Defined in monaco.d.ts:4659
Register information about a new language.

#### Parameters
* language: ILanguageExtensionPoint

#### Returns void

Here ILanguageExtensionPoint is the following object:

{
    aliases?: string[],
    configuration?: Uri,
    extensions?: string[], // source file extensions
    filenamePatterns?: string[],
    filenames?: string[],
    firstLine?: string,
    id: string, // name of the language
    mimetypes?: string[]
}
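
As a slightly fuller illustration of the fields above, the sketch below registers asm together with file extensions and aliases. The extension and alias values are assumptions chosen for illustration and do not come from the original project. The call is guarded so the snippet also runs where the global monaco object is not loaded.

```javascript
// Hypothetical descriptor: the extensions and aliases are illustrative assumptions.
const asmLanguage = {
    id: "asm",                          // language name used by every later API call
    extensions: [".asm", ".s"],         // assumed source file extensions
    aliases: ["MIPS Assembly", "asm"]   // assumed display names
};

// Only call into Monaco when the global `monaco` object is actually available.
if (typeof monaco !== "undefined") {
    monaco.languages.register(asmLanguage);
}
```

Only id is required; the other fields are optional hints for associating files with the language.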

2. Monarch Tokens Provider

Next, register the tokens provider for the language. Here we make the language case-sensitive and give it a tokenizer.

// Register a tokens provider for the language
monaco.languages.setMonarchTokensProvider("asm", {
    ignoreCase: false,
    tokenizer: {...}
});

Tokenizer

The official documentation describes it as follows:

(object with states) This defines the tokenization rules. The tokenizer attribute describes how lexical analysis takes place, and how the input is divided into tokens. Each token is given a CSS class name which is used to render each token in the editor.

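In other words, the tokenizer is the set of rules that turns source code into tokens (keywords, strings, comments). Concretely, it describes a set of states and the rules attached to each one, so it can be viewed as a lexing state machine. Each rule specifies what to match in that state, the action to take, and the next state (next).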
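
To make the state-machine idea concrete, here is a minimal sketch of a state-driven tokenizer in plain JavaScript. It is a simplified illustration and not Monarch's actual implementation. The names miniRules and tokenizeLine, and the rule set itself, are assumptions made up for this example.

```javascript
// Simplified illustration of a state-driven tokenizer; NOT Monarch's real implementation.
// Each state owns a list of rules: [pattern, token, optional next state].
const miniRules = {
    root: [
        [/#.*/, "comment"],
        [/"/, "string.quote", "string"],   // an opening quote switches to the string state
        [/\s+/, "white"],
        [/[^\s"#]+/, "source"]
    ],
    string: [
        [/[^"]+/, "string"],
        [/"/, "string.quote", "root"]      // the closing quote returns to root
    ]
};

function tokenizeLine(line, rules, startState = "root") {
    const tokens = [];
    let state = startState;
    let pos = 0;
    while (pos < line.length) {
        let matched = false;
        for (const [pattern, token, next] of rules[state]) {
            // Sticky copy of the pattern so it can only match exactly at `pos`.
            const sticky = new RegExp(pattern.source, "y");
            sticky.lastIndex = pos;
            const m = sticky.exec(line);
            if (m && m[0].length > 0) {
                tokens.push({ text: m[0], token, state });
                pos += m[0].length;
                if (next) state = next;
                matched = true;
                break; // the first matching rule wins
            }
        }
        if (!matched) {
            // No rule applies: emit one character as plain source to guarantee progress.
            tokens.push({ text: line[pos], token: "source", state });
            pos += 1;
        }
    }
    return tokens;
}

const demoTokens = tokenizeLine('la $a0 "hi" # load', miniRules);
```

The loop tries the rules of the current state in order, emits a token for the first match, and then moves to the rule's next state if one is given. That is the same match, action, next triple described above.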

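https://microsoft.github.io/monaco-editor/monarch.html has many examples, so the meaning of each option is not explained in detail here. Instead, the tokenizer for the asm language serves as a direct example.

Without further ado, here is the final code: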

{
storage_type_kw: /\.(ascii|asciiz|byte|data|double|float|half|kdata|ktext|space|text|word|set\s*(noat|at|noreorder|reorder))\b/,
function_normal: ["abs.d", "abs.s", "add", "add.d", "add.s", ..., "xor", "xori"],
function_pseudo: ["mul", "abs", "div", "divu", ..., "sd", "ush", "usw", "move", "mfc1.d", "l.d", "l.s", "s.d", "s.s"],

tokenizer: {
    root: [
        [/^\s*?/, "line.line", "@line_pre"],
        { include: "@normal" }
    ],
    normal: [
        [/#.*$/, "comment", "@popall"],
        [/"/, { token: "string.quote", bracket: "@open", next: "@string" }],
        [/[\w\.\-]+/, {
            cases: {
                "-?\\d+": { token: "number", next: "@popall" },
                "-?\\d+\\.\\d+": { token: "number.float", next: "@popall" },
                "0[xX]([0-9a-fA-F]*)": { token: "number.hex", next: "@popall" },
                "0[bB]([01]*)": { token: "number.binary", next: "@popall" },
                "@default": { token: "source", next: "@popall" },
                "@eos": { token: "line.line", next: "@popall" }
            }
        }],
        { include: "@register" }
    ],

    line_pre: [
        [/([a-zA-Z_]\w*):/, "tag.label.$1", "@line_fun"],
        { include: "@line_fun" },
        { include: "@normal" },
    ],

    line_fun: [
        [/[a-z][\w\.]*/, {
            cases: {
                "@function_normal": { token: "function.normal.$0", next: "@popall" },
                "@function_pseudo": { token: "function.pseudo.$0", next: "@popall" },
                "@default": { token: "source", next: "@popall" },
                "@eos": { token: "line.line", next: "@popall" }
            }
        }],
        [/@storage_type_kw/, "constructor.storage.type", "@popall"],
        [/\.(align|extern|globl)\b/, "constructor.storage.modifier", "@popall"],
        { include: "@normal" },
    ],

    register: [
        [/(\$)(0|[2-9]|1[0-9]|2[0-589]|3[0-1])\b/, "variable.register.by-number", "@popall"],
        [/(\$)(zero|v[01]|a[0-3]|t[0-9]|s[0-7]|gp|sp|fp|ra)\b/, "variable.register.by-name", "@popall"],
        [/(\$)(at|k[01]|1|2[67])\b/, "variable.register.reserved", "@popall"],
        [/(\$)f([0-9]|1[0-9]|2[0-9]|3[0-1])\b/, "variable.register.floating-point", "@popall"]
    ],

    string: [
        [/[^\\"&]+/, "string"],
        { include: "@string_common" },
        [/"/, { token: 'string.quote', bracket: '@close', next: '@popall' }]
    ],

    string_common: [
        [/\\[rnt\\']/, "string.escape"],
        [/&\w+;/, 'string.escape'],
        [/[\\&]/, 'string']
    ]
}
}

The entry point of the rules is tokenizer.root. The keyword tables sit at the same level as tokenizer, and the children of tokenizer are the rule tables, one per state.
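
The link between a cases guard such as "@function_normal" and the keyword table of the same name can be sketched as follows. This is a simplified illustration of the lookup, not Monarch's own code. The tables contain only entries visible in the listing above rather than the full lists, and resolveCase is a hypothetical helper.

```javascript
// Simplified illustration of resolving "@name" guards in a `cases` block.
// The tables hold only entries visible in the listing above, not the full lists.
const asmDefinition = {
    function_normal: ["abs.d", "abs.s", "add", "add.d", "add.s", "xor", "xori"],
    function_pseudo: ["mul", "abs", "div", "divu", "move"]
};

const lineFunCases = {
    "@function_normal": "function.normal.$0",
    "@function_pseudo": "function.pseudo.$0",
    "@default": "source"
};

function resolveCase(word, cases, definition) {
    for (const [guard, token] of Object.entries(cases)) {
        if (guard === "@default") continue; // the fallback is handled after the loop
        const table = definition[guard.slice(1)]; // "@function_normal" -> function_normal
        if (Array.isArray(table) && table.includes(word)) {
            return token.replace("$0", () => word); // $0 stands for the matched text
        }
    }
    return cases["@default"];
}

const addToken = resolveCase("add", lineFunCases, asmDefinition);
const moveToken = resolveCase("move", lineFunCases, asmDefinition);
const unknownToken = resolveCase("foo", lineFunCases, asmDefinition);
```

Keeping the mnemonics in top-level arrays keeps the rule itself short: one broad regular expression matches any word, and the cases block decides the token type by table membership.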

include

Pulls in the rules of another state defined under tokenizer, for example:

root: [ { include: "@normal" } ]
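
One way to picture include is as splicing the referenced state's rules into the current list at that position. The sketch below flattens includes recursively. It only mirrors the idea and is not Monarch's compiler. The sampleTokenizer rules and the expandIncludes helper are assumptions made up for this example.

```javascript
// Simplified illustration of how `include` splices another state's rules in place.
const sampleTokenizer = {
    root: [
        [/^\s+/, "white"],
        { include: "@normal" }
    ],
    normal: [
        [/#.*$/, "comment"],
        { include: "@register" }
    ],
    register: [
        [/\$\w+/, "variable.register"]
    ]
};

function expandIncludes(stateName, tokenizer, path = []) {
    if (path.includes(stateName)) return []; // break include cycles
    const expanded = [];
    for (const rule of tokenizer[stateName]) {
        if (!Array.isArray(rule) && rule.include) {
            const target = rule.include.replace(/^@/, ""); // "@normal" -> "normal"
            expanded.push(...expandIncludes(target, tokenizer, [...path, stateName]));
        } else {
            expanded.push(rule);
        }
    }
    return expanded;
}

const flatRoot = expandIncludes("root", sampleTokenizer);
const flatRootTokens = flatRoot.map((rule) => rule[1]);
```

Because included rules land where the include entry sits, rule order still matters: rules listed before the include are tried first.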

Inspecting Tokens

Monaco provides an Inspect Tokens tool in browsers to help identify the tokens parsed from source code.

To activate:

  • Press F1 while focused on a Monaco instance (or right-click and choose Command Palette).
  • Trigger the Developer: Inspect Tokens option.

This will show a display over the currently selected token for its language, token type, basic font style and colors, and selector you can target in your editor themes.

For example, the inspector reports the token for beq as function.normal.beq.asm.
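
The sketch below shows how that name is likely assembled: the rule's token template "function.normal.$0" has $0 replaced by the matched text, and a postfix is appended. It assumes Monarch's default tokenPostfix of "." plus the language id, which applies when the definition does not set tokenPostfix. The helper inspectedTokenName is hypothetical.

```javascript
// Sketch of how the inspected name is assembled, assuming the default
// tokenPostfix of "." + language id (no tokenPostfix set in the definition).
function inspectedTokenName(tokenTemplate, matchedText, languageId) {
    const token = tokenTemplate.replace("$0", () => matchedText);
    return token + "." + languageId;
}

const beqTokenName = inspectedTokenName("function.normal.$0", "beq", "asm");
```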

3. Theme

4. Completion Item Provider

[To be continued]
