Version | 46 (draft) |
---|---|
Editors | Addison Phillips and other CLDR committee members |
For the full header, summary, and status, see Part 1: Core.
This specification defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages.
This is a partial document, describing only those parts of the LDML that are relevant for message format. For the other parts of the LDML see the main LDML document and the links above.
This is a draft document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress.
A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.
Please submit corrigenda and other comments with the CLDR bug reporting form [Bugs]. Related information that is useful in understanding this document is found in the References. For the latest version of the Unicode Standard see [Unicode]. For a list of current Unicode Technical Reports see [Reports]. For more information about versions of the Unicode Standard, see [Versions].
The LDML specification is divided into the following parts:
One of the challenges in adapting software to work for users with different languages and cultures is the need for dynamic messages. Whenever a user interface needs to present data as part of a larger string, that data needs to be formatted (and the message may need to be altered) to make it culturally accepted and grammatically correct.
For example, if your US English (en-US) interface has a message like:
Your item had 1,023 views on April 3, 2023
You want the translated message to be appropriately formatted into French:
Votre article a eu 1 023 vues le 3 avril 2023
Or Japanese:
あなたのアイテムは 2023 年 4 月 3 日に 1,023 回閲覧されました。
This specification defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages and APIs. This will enable the integration of existing internationalization APIs (such as the date and number formats shown above), grammatical matching (such as plurals or genders), as well as user-defined formats and message selectors.
The document is the successor to ICU MessageFormat, henceforth called ICU MessageFormat 1.0.
Everything in this specification is normative except for: sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
A term looks like this when it is defined in this specification.
A reference to a term looks like this.
Examples are non-normative and styled like this.
Important
The provisions of the stability policy are not in effect until the conclusion of the technical preview and adoption of this specification.
Updates to this specification will not change the syntactical meaning, the runtime output, or other behaviour of valid messages written for earlier versions of this specification that only use functions defined in this specification. Updates to this specification will not remove any syntax provided in this version. Future versions MAY add additional structure or meaning to existing syntax.
Updates to this specification will not remove any reserved keywords or sigils.
Note
Future versions may define new keywords.
Updates to this specification will not reserve or assign meaning to any character "sigils" except for those in the reserved production.
Updates to this specification will not remove any functions defined in the default registry nor will they remove any options or option values. Additional options or option values MAY be defined.
Note
This does not guarantee that the results of formatting will never change. Even when the specification doesn't change, the functions for date formatting, number formatting and so on will change their results over time.
Later specification versions MAY make previously invalid messages valid.
Updates to this specification will not introduce message syntax that, when parsed according to earlier versions of this specification, would produce syntax or data model errors. Such messages MAY produce errors when formatted according to an earlier version of this specification.
From version 2.0, MessageFormat will only reserve, define, or require function names or function option names consisting of characters in the ranges a-z, A-Z, and 0-9. All other names in these categories are reserved for the use of implementations or users.
Note
Users defining custom names SHOULD include at least one character outside these ranges to ensure that they will be compatible with future versions of this specification.
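As a rough illustration of the note above, the following Python sketch flags custom function or option names that consist only of a-z, A-Z, and 0-9, since those are the names a future version of the specification could claim. The helper name may_collide_with_future_spec is hypothetical and is not part of any MessageFormat API.

```python
import re

# Names made only of a-z, A-Z, and 0-9 are the ones the specification
# says it may reserve, define, or require in the future.
_SPEC_NAME_RANGE = re.compile(r"[a-zA-Z0-9]+")


def may_collide_with_future_spec(name: str) -> bool:
    """Return True if a custom name uses only characters in a-z, A-Z, 0-9."""
    return _SPEC_NAME_RANGE.fullmatch(name) is not None


for candidate in ("currency", "my-currency", "acme_style"):
    verdict = "may collide" if may_collide_with_future_spec(candidate) else "safe"
    print(f"{candidate}: {verdict}")
```

A name such as my-currency contains a character outside the listed ranges, so under this check it would be treated as safe for private use.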
Later versions of this specification will not introduce changes to the data model that would result in a data model representation based on this version being invalid.
For example, existing interfaces or fields will not be removed.
Later versions of this specification MAY introduce changes to the data model that would result in future data model representations not being valid for implementations of this version of the data model.
For example, a future version could introduce a new keyword, whose data model representation would be a new interface that is not recognized by this version's data model.
Later specification versions will not introduce syntax that cannot be represented by this version of the data model.
For example, a future version could introduce a new keyword. The future version's data model would provide an interface for that keyword while this version of the data model would parse the value into the interface UnsupportedStatement. Both data models would be "valid" in their context, but this version's would be missing any functionality for the new statement type.
This section defines the formal grammar describing the syntax of a single message.
This section is non-normative.
The design goals of the syntax specification are as follows:
The syntax should leverage the familiarity with ICU MessageFormat 1.0 in order to lower the barrier to entry and increase the chance of adoption. At the same time, the syntax should fix the pain points of ICU MessageFormat 1.0.
The syntax inside translatable content should be easy to understand for humans. This includes making it clear which parts of the message body are translatable content, which parts inside it are placeholders for expressions, as well as making the selection logic predictable and easy to reason about.
The syntax surrounding translatable content should be easy to write and edit for developers, localization engineers, and easy to parse by machines.
The syntax should make a single message easily embeddable inside many container formats: .properties, YAML, XML, inlined as string literals in programming languages, etc. This includes a future MessageResource specification.
Note that a message may be subject to the escaping rules of its container format, so that a line feed might be written as \n, \012, \x0A, \u000A, \U0000000A, &#xA;, &#10;, %0A, <LF>, or something else entirely.
This section is non-normative.
The syntax specification takes into account the following design restrictions:
Whitespace outside the translatable content should be insignificant. It should be possible to define a message entirely on a single line with no ambiguity, as well as to format it over multiple lines for clarity.
The syntax should define as few special characters and sigils as possible. Note that this necessitates extra care when presenting messages for human consumption, because they may contain invisible characters such as U+200B ZERO WIDTH SPACE, control characters such as U+0000 NULL and U+0009 TAB, permanently reserved noncharacters (U+FDD0 through U+FDEF and U+nFFFE and U+nFFFF where n is 0x0 through 0x10), private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and U+100000 through U+10FFFD), unassigned code points, and other potentially confusing content.
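One way to take the "extra care" mentioned above when showing a message to a human is to make hidden code points visible. The sketch below is only an assumption about how a tool might do this: it uses Python's unicodedata module and replaces control, format, private-use, surrogate, and unassigned code points (general categories Cc, Cf, Co, Cs, Cn) with a U+XXXX marker. The function name reveal_hidden is hypothetical.

```python
import unicodedata

# General categories that are usually invisible or confusing on screen:
# control, format, private use, surrogate, and unassigned (which includes
# the permanently reserved noncharacters).
_HIDDEN_CATEGORIES = {"Cc", "Cf", "Co", "Cs", "Cn"}


def reveal_hidden(message: str) -> str:
    """Replace hard-to-see code points with a visible <U+XXXX> marker."""
    out = []
    for ch in message:
        if unicodedata.category(ch) in _HIDDEN_CATEGORIES:
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


print(reveal_hidden("You have\u200b {$count} items"))
```

Which code points count as unassigned depends on the Unicode version bundled with the Python runtime, so results for recently assigned characters can differ between environments.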
The purpose of MessageFormat is to allow content to vary at runtime. This variation might be due to placing a value into the content or it might be due to selecting a different bit of content based on some data value or it might be due to a combination of the two.
MessageFormat calls the template for a given formatting operation a message.
The values passed in at runtime (which are to be placed into the content or used to select between different content items) are called external variables. The author of a message can also assign local variables, including variables that modify external variables.
This part of the MessageFormat specification defines the syntax for a message, along with the concepts and terminology needed when processing a message during the formatting of a message at runtime.
The complete formal syntax of a message is described by the ABNF.
A message is well-formed if it satisfies all the rules of the grammar. Attempting to parse a message that is not well-formed will result in a Syntax Error.
A message is valid if it is well-formed and also meets the additional content restrictions and semantic requirements about its structure defined below for declarations, matcher and options. Attempting to parse a message that is not valid will result in a Data Model Error.
A message is the complete template for a specific message formatting request.
Note
This syntax is designed to be embeddable into many different programming languages and formats. As such, it avoids constructs, such as character escapes, that are specific to any given file format or processor. In particular, it avoids using quote characters common to many file formats and formal languages so that these do not need to be escaped in the body of a message.
Note
In general (and except where required by the syntax), whitespace carries no meaning in the structure of a message. While many of the examples in this spec are written on multiple lines, the formatting shown is primarily for readability.
Example This message:
.local $foo = { |horse| } {{You have a {$foo}!}}
Can also be written as:
.local $foo={|horse|}{{You have a {$foo}!}}
An exception to this is: whitespace inside a pattern is always significant.
Note
The syntax assumes that each message will be displayed with a left-to-right display order and be processed in the logical character order. The syntax also permits the use of right-to-left characters in identifiers, literals, and other values. This can result in confusion when viewing the message.
Additional restrictions or requirements, such as permitting the use of certain bidirectional control characters in the syntax, might be added during the Tech Preview to better manage bidirectional text. Feedback on the creation and management of messages containing bidirectional tokens is strongly desired.
A message can be a simple message or it can be a complex message.
message = simple-message / complex-message
A simple message contains a single pattern, with restrictions on its first character. An empty string is a valid simple message.
simple-message = [simple-start pattern]
simple-start = simple-start-char / text-escape / placeholder
A complex message is any message that contains declarations, a matcher, or both. A complex message always begins with either a keyword that has a . prefix or a quoted pattern and consists of an optional list of declarations followed by a complex body:
complex-message = *(declaration [s]) complex-body
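The distinction between simple and complex messages can be sketched from the first characters of the source string. The Python helper below is a minimal illustration under that assumption; the name message_kind is hypothetical, and the function does not check that the rest of the message is well-formed.

```python
def message_kind(source: str) -> str:
    """Classify a message as 'simple' or 'complex' from how it begins.

    A complex message begins with a keyword that has a "." prefix or with a
    quoted pattern ("{{"). Anything else, including the empty string, is
    treated as a simple message. No further syntax checking is done.
    """
    if source.startswith(".") or source.startswith("{{"):
        return "complex"
    return "simple"


print(message_kind("Hello, {$name}!"))
print(message_kind(".local $foo = {|horse|} {{You have a {$foo}!}}"))
```

Note that a simple message can start with a placeholder such as {$name}, which begins with a single curly bracket, so the check looks for two opening curly brackets rather than one.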
A declaration binds a variable identifier to a value within the scope of a message. This variable can then be used in other expressions within the same message. Declarations are optional: many messages will not contain any declarations.
An input-declaration binds a variable to an external input value. The variable-expression of an input-declaration MAY include an annotation that is applied to the external value.
A local-declaration binds a variable to the resolved value of an expression.
For compatibility with later MessageFormat 2 specification versions, declarations MAY also include reserved statements.
declaration = input-declaration / local-declaration / reserved-statement
input-declaration = input [s] variable-expression
local-declaration = local s variable [s] "=" [s] expression
Variables, once declared, MUST NOT be redeclared. A message that does any of the following is not valid and will produce a Duplicate Declaration error during processing:
A local-declaration MAY overwrite an external input value as long as the external input value does not appear in a previous declaration.
Note
These restrictions only apply to declarations. A placeholder or selector can apply a different annotation to a variable than one applied to the same variable named in a declaration. For example, this message is valid:
.input {$var :number maximumFractionDigits=0}
.match {$var :number maximumFractionDigits=2}
0 {{The selector can apply a different annotation to {$var} for the purposes of selection}}
* {{A placeholder in a pattern can apply a different annotation to {$var :number maximumFractionDigits=3}}}
(See the Errors section for examples of invalid messages)
A reserved statement reserves additional .keywords for use by future versions of this specification. Any such future keyword must start with ., followed by two or more lower-case ASCII characters.
The rest of the statement supports a similarly wide range of content as reserved annotations, but it MUST end with one or more expressions.
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
reserved-keyword = "." name
Note
The reserved-keyword ABNF rule is a simplification, as it MUST NOT be considered to match any of the existing keywords .input, .local, or .match.
This allows flexibility in future standardization, as future definitions MAY define additional semantics and constraints on the contents of these reserved statements.
Implementations MUST NOT assign meaning or semantics to a reserved statement: these are reserved for future standardization. Implementations MUST NOT remove or alter the contents of a reserved statement.
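A parser might recognize the start of a reserved statement with a check like the one below. This Python sketch follows the prose description (a . followed by two or more lower-case ASCII letters), which is narrower than the name production used in the ABNF, and it excludes the three existing keywords as the note requires. The function name is hypothetical.

```python
import re

# Keywords that already have a defined meaning and therefore must not be
# treated as reserved keywords.
EXISTING_KEYWORDS = frozenset({".input", ".local", ".match"})

# "." followed by two or more lower-case ASCII letters, per the prose rule.
_FUTURE_KEYWORD = re.compile(r"\.[a-z]{2,}")


def is_reserved_keyword(token: str) -> bool:
    """Return True if token looks like a keyword reserved for future use."""
    if token in EXISTING_KEYWORDS:
        return False
    return _FUTURE_KEYWORD.fullmatch(token) is not None


print(is_reserved_keyword(".when"))
print(is_reserved_keyword(".match"))
```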
The complex body of a complex message is the part that will be formatted. The complex body consists of either a quoted pattern or a matcher.
complex-body = quoted-pattern / matcher
A pattern contains a sequence of text and placeholders to be formatted as a unit. Unless there is an error, resolving a message always results in the formatting of a single pattern.
pattern = *(text-char / text-escape / placeholder)
A pattern MAY be empty.
A pattern MAY contain an arbitrary number of placeholders to be evaluated during the formatting process.
A quoted pattern is a pattern that is "quoted" to prevent interference with other parts of the message. A quoted pattern starts with a sequence of two U+007B LEFT CURLY BRACKET {{ and ends with a sequence of two U+007D RIGHT CURLY BRACKET }}.
quoted-pattern = "{{" pattern "}}"
A quoted pattern MAY be empty.
An empty quoted pattern:
{{}}
text is the translatable content of a pattern. Any Unicode code point is allowed, except for U+0000 NULL and the surrogate code points U+D800 through U+DFFF inclusive. The characters U+005C REVERSE SOLIDUS \, U+007B LEFT CURLY BRACKET {, and U+007D RIGHT CURLY BRACKET } MUST be escaped as \\, \{, and \} respectively.
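The escaping rule for text can be sketched as a small helper that a tool might use when generating a pattern from arbitrary content. This is an illustrative Python sketch, not an official API: the name escape_text is hypothetical, and rejecting NULL and surrogates with ValueError is an assumption about how a tool might report disallowed code points.

```python
def escape_text(raw: str) -> str:
    """Escape raw content so it can be used as text inside a pattern.

    Backslash and curly brackets are prefixed with a backslash. NULL and
    surrogate code points are not allowed in text, so they are rejected.
    """
    out = []
    for ch in raw:
        cp = ord(ch)
        if cp == 0x0000 or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"code point U+{cp:04X} is not allowed in text")
        if ch in "\\{}":
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


# Wrap the escaped text in a quoted pattern.
print("{{" + escape_text("Use {braces} and \\ freely") + "}}")
```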
In the ABNF, text is represented by non-empty sequences of simple-start-char, text-char, and text-escape. The first of these is used at the start of a simple message, and matches text-char except that it does not allow the U+002E FULL STOP . character. The ABNF uses content-char as a shared base for text and quoted literal characters.
Whitespace in text, including tabs, spaces, and newlines is significant and MUST be preserved during formatting.
simple-start-char = content-char / s / "@" / "|"
text-char = content-char / s / "." / "@" / "|"
quoted-char = content-char / s / "." / "@" / "{" / "}"
reserved-char = content-char / "."
content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
/ %x0B-0C ; omit CR (%x0D)
/ %x0E-1F ; omit SP (%x20)
/ %x21-2D ; omit . (%x2E)
/ %x2F-3F ; omit @ (%x40)
/ %x41-5B ; omit \ (%x5C)
/ %x5D-7A ; omit { | } (%x7B-7D)
/ %x7E-2FFF ; omit IDEOGRAPHIC SPACE (%x3000)
/ %x3001-D7FF ; omit surrogates
/ %xE000-10FFFF
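The character classes above translate directly into range checks. The Python sketch below mirrors the content-char ranges and builds the other classes on top of it. One assumption is labeled in the code: the s rule is not shown in this section, so the whitespace set is inferred from the code points that the content-char comments list as omitted.

```python
# Code point ranges copied from the content-char rule.
CONTENT_CHAR_RANGES = (
    (0x01, 0x08), (0x0B, 0x0C), (0x0E, 0x1F), (0x21, 0x2D), (0x2F, 0x3F),
    (0x41, 0x5B), (0x5D, 0x7A), (0x7E, 0x2FFF), (0x3001, 0xD7FF),
    (0xE000, 0x10FFFF),
)

# Assumption: whitespace (the `s` rule) is HTAB, LF, CR, SP, and
# IDEOGRAPHIC SPACE, the code points the comments mark as omitted.
WHITESPACE = frozenset({0x09, 0x0A, 0x0D, 0x20, 0x3000})

FULL_STOP, AT, BAR, LBRACE, RBRACE = 0x2E, 0x40, 0x7C, 0x7B, 0x7D


def is_content_char(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in CONTENT_CHAR_RANGES)


def is_simple_start_char(cp: int) -> bool:
    return is_content_char(cp) or cp in WHITESPACE or cp in (AT, BAR)


def is_text_char(cp: int) -> bool:
    return is_content_char(cp) or cp in WHITESPACE or cp in (FULL_STOP, AT, BAR)


def is_quoted_char(cp: int) -> bool:
    return is_content_char(cp) or cp in WHITESPACE or cp in (FULL_STOP, AT, LBRACE, RBRACE)


print(is_text_char(ord(".")), is_simple_start_char(ord(".")))
```

The final line shows the one difference between text-char and simple-start-char: a full stop is accepted inside text but not as the first character of a simple message.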
When a pattern is quoted by embedding the pattern in curly brackets, the resulting message can be embedded into various formats regardless of the container's whitespace trimming rules. Otherwise, care must be taken to ensure that pattern-significant whitespace is preserved.
Example In a Java .properties file, the values hello and hello2 both contain an identical message which consists of a single pattern. This pattern consists of text with exactly three spaces before and after the word "Hello":
hello = {{   Hello   }}
hello2=\   Hello  \ 
A placeholder is an expression or markup that appears inside of a pattern and which will be replaced during the formatting of a message.
placeholder = expression / markup
A matcher is the complex body of a message that allows runtime selection of the pattern to use for formatting. This allows the form or content of a message to vary based on values determined at runtime.
A matcher consists of the keyword .match followed by at least one selector and at least one variant.
When the matcher is processed, the result will be a single pattern that serves as the template for the formatting process.
A message can only be considered valid if the following requirements are satisfied:
matcher = match-statement 1*([s] variant)
match-statement = match 1*([s] selector)
A message with a matcher:
.input {$count :number}
.match {$count}
one {{You have {$count} notification.}}
*   {{You have {$count} notifications.}}
A message containing a matcher formatted on a single line:
.match {:platform} windows {{Settings}} * {{Preferences}}
A selector is an expression that ranks or excludes the variants based on the value of the corresponding key in each variant. The combination of selectors in a matcher thus determines which pattern will be used during formatting.
selector = expression
There MUST be at least one selector in a matcher. There MAY be any number of additional selectors.
A message with a single selector that uses a custom function :hasCase which is a selector that allows the message to choose a pattern based on grammatical case:
.match {$userName :hasCase}
vocative {{Hello, {$userName :person case=vocative}!}}
accusative {{Please welcome {$userName :person case=accusative}!}}
* {{Hello!}}
A message with two selectors:
.input {$numLikes :integer}
.input {$numShares :integer}
.match {$numLikes} {$numShares}
0   0   {{Your item has no likes and has not been shared.}}
0   one {{Your item has no likes and has been shared {$numShares} time.}}
0   *   {{Your item has no likes and has been shared {$numShares} times.}}
one 0   {{Your item has {$numLikes} like and has not been shared.}}
one one {{Your item has {$numLikes} like and has been shared {$numShares} time.}}
one *   {{Your item has {$numLikes} like and has been shared {$numShares} times.}}
*   0   {{Your item has {$numLikes} likes and has not been shared.}}
*   one {{Your item has {$numLikes} likes and has been shared {$numShares} time.}}
*   *   {{Your item has {$numLikes} likes and has been shared {$numShares} times.}}
A variant is a quoted pattern associated with a set of keys in a matcher. Each variant MUST begin with a sequence of keys, and terminate with a valid quoted pattern. The number of keys in each variant MUST match the number of selectors in the matcher.
Each key is separated from each other by whitespace. Whitespace is permitted but not required between the last key and the quoted pattern.
variant = key *(s key) [s] quoted-pattern
key = literal / "*"
A key is a value in a variant for use by a selector when ranking or excluding variants during the matcher process. A key can be either a literal value or the "catch-all" key *.
The catch-all key is a special key, represented by *, that matches all values for a given selector.
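The requirement that each variant has as many keys as the matcher has selectors is easy to check once a message has been parsed. The Python sketch below assumes a hypothetical in-memory shape, a list of (keys, pattern) pairs, rather than any standardized data model, and reports the positions of variants whose key count is wrong.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: each variant is (keys, pattern source).
Variant = Tuple[Sequence[str], str]


def variants_with_wrong_key_count(selector_count: int,
                                  variants: List[Variant]) -> List[int]:
    """Return indexes of variants whose number of keys differs from the
    number of selectors in the matcher."""
    return [
        index
        for index, (keys, _pattern) in enumerate(variants)
        if len(keys) != selector_count
    ]


variants = [
    (("0", "0"), "{{No likes, not shared.}}"),
    (("one",), "{{One like.}}"),        # only one key for two selectors
    (("*", "*"), "{{Many of both.}}"),
]
print(variants_with_wrong_key_count(2, variants))
```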
An expression is a part of a message that will be determined during the message's formatting.
An expression MUST begin with U+007B LEFT CURLY BRACKET { and end with U+007D RIGHT CURLY BRACKET }.
An expression MUST NOT be empty.
An expression cannot contain another expression.
An expression MAY contain one or more attributes.
A literal-expression contains a literal, optionally followed by an annotation.
A variable-expression contains a variable, optionally followed by an annotation.
An annotation-expression contains an annotation without an operand.
expression = literal-expression
/ variable-expression
/ annotation-expression
literal-expression = "{" [s] literal [s annotation] *(s attribute) [s] "}"
variable-expression = "{" [s] variable [s annotation] *(s attribute) [s] "}"
annotation-expression = "{" [s] annotation *(s attribute) [s] "}"
There are several types of expression that can appear in a message. All expressions share a common syntax. The types of expression are: the value of a local-declaration, a selector, and a kind of placeholder in a pattern.
Additionally, an input-declaration can contain a variable-expression.
Examples of different types of expression
Declarations:
.input {$x :function option=value}
.local $y = {|This is an expression|}
Selectors:
.match {$selector :functionRequired}
Placeholders:
This placeholder contains a literal expression: {|literal|}
This placeholder contains a variable expression: {$variable}
This placeholder references a function on a variable: {$variable :function with=options}
This placeholder contains a function expression with a variable-valued option: {:function option=$variable}
An annotation is part of an expression containing either a function together with its associated options, or a private-use annotation or a reserved annotation.
annotation = function
/ private-use-annotation
/ reserved-annotation
An operand is the literal of a literal-expression or the variable of a variable-expression.
An annotation can appear in an expression by itself or following a single operand. When following an operand, the operand serves as input to the annotation.
A function is named functionality in an annotation. Functions are used to evaluate, format, select, or otherwise process data values during formatting.
Each function is defined by the runtime's function registry. A function's entry in the function registry will define whether the function is a selector or formatter (or both), whether an operand is required, what form the values of an operand can take, what options and option values are valid, and what outputs might result. See function registry for more information.
A function starts with a prefix sigil : followed by an identifier. The identifier MAY be followed by one or more options. Options are not required.
function = ":" identifier *(s option)
A message with a function operating on the variable $now:
It is now {$now :datetime}.
An option is a key-value pair containing a named argument that is passed to a function.
An option has an identifier and a value. The identifier is separated from the value by a U+003D EQUALS SIGN = along with optional whitespace.
The value of an option can be either a literal or a variable.
Multiple options are permitted in an annotation. Options are separated from the preceding function identifier and from each other by whitespace. Each option's identifier MUST be unique within the annotation: an annotation with duplicate option identifiers is not valid.
The order of options is not significant.
option = identifier [s] "=" [s] (literal / variable)
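Because duplicate option identifiers make an annotation invalid, a processor needs to detect them after parsing. A minimal Python sketch follows, assuming options have already been parsed into (identifier, value) pairs. That representation and the function name are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def duplicate_option_identifiers(options: Sequence[Tuple[str, str]]) -> List[str]:
    """Return the option identifiers that occur more than once, sorted."""
    counts = Counter(identifier for identifier, _value in options)
    return sorted(name for name, seen in counts.items() if seen > 1)


# Roughly corresponds to {$date :datetime weekday=long month=short weekday=short}
parsed = [("weekday", "long"), ("month", "short"), ("weekday", "short")]
print(duplicate_option_identifiers(parsed))
```

Since the order of options is not significant, the check only counts identifiers and ignores their positions.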
Examples of functions with options
A message using the :datetime function. The option weekday has the literal long as its value:
Today is {$date :datetime weekday=long}!
A message using the :datetime function. The option weekday has a variable $dateStyle as its value:
Today is {$date :datetime weekday=$dateStyle}!
A private-use annotation is an annotation whose syntax is reserved for use by a specific implementation or by private agreement between multiple implementations. Implementations MAY define their own meaning and semantics for private-use annotations.
A private-use annotation starts with either U+0026 AMPERSAND & or U+005E CIRCUMFLEX ACCENT ^.
Characters, including whitespace, are assigned meaning by the implementation. The definition of escapes in the reserved-body production, used for the body of a private-use annotation, is an affordance to implementations that wish to use a syntax exactly like other functions. Specifically:
The characters \, {, and } MUST be escaped as \\, \{, and \} respectively when they appear in the body of a private-use annotation.
The character | is special: it SHOULD be escaped as \| in a private-use annotation, but can appear unescaped as long as it is paired with another |. This is an affordance to allow literals to appear in the private use syntax.
A private-use annotation MAY be empty after its introducing sigil.
private-use-annotation = private-start [[s] reserved-body]
private-start = "^" / "&"
Note
Users are cautioned that private-use annotations cannot be reliably exchanged and can result in errors during formatting. It is generally a better idea to use the function registry to define additional formatting or annotation options.
Here are some examples of what private-use sequences might look like:
Here's private use with an operand: {$foo &bar}
Here's a placeholder that is entirely private-use: {&anything here}
Here's a private-use function that uses normal function syntax: {$operand ^foo option=|literal|}
The character \| has to be paired or escaped: {&private || |something between| or isolated: \| }
Stop {& "translate 'stop' as a verb" might be a translator instruction or comment }
Protect stuff in {^ph}<a>{^/ph}private use{^ph}</a>{^/ph}
A reserved annotation is an annotation whose syntax is reserved for future standardization.
A reserved annotation starts with a reserved character. The remaining part of a reserved annotation, called a reserved body, MAY be empty or contain arbitrary text that starts and ends with a non-whitespace character.
This allows maximum flexibility in future standardization, as future definitions MAY define additional semantics and constraints on the contents of these annotations.
Implementations MUST NOT assign meaning or semantics to an annotation starting with reserved-annotation-start: these are reserved for future standardization. Whitespace before or after a reserved body is not part of the reserved body. Implementations MUST NOT remove or alter the contents of a reserved body, including any interior whitespace, but MAY remove or alter whitespace before or after the reserved body.
While a reserved sequence is technically "well-formed", unrecognized reserved-annotations or private-use-annotations have no meaning.
reserved-annotation = reserved-annotation-start [[s] reserved-body]
reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~"
reserved-body = reserved-body-part *([s] reserved-body-part)
reserved-body-part = reserved-char / reserved-escape / quoted
Markup placeholders are pattern parts that can be used to represent non-language parts of a message, such as inline elements or styling that should apply to a span of parts.
Markup MUST begin with U+007B LEFT CURLY BRACKET { and end with U+007D RIGHT CURLY BRACKET }. Markup MAY contain one or more attributes.
Markup comes in three forms:
Markup-open starts with U+0023 NUMBER SIGN # and represents an opening element within the message, such as markup used to start a span. It MAY include options.
Markup-standalone starts with U+0023 NUMBER SIGN # and has a U+002F SOLIDUS / immediately before its closing } representing a self-closing or standalone element within the message. It MAY include options.
Markup-close starts with U+002F SOLIDUS / and is a pattern part ending a span.
markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone
/ "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close
A message with one button markup span and a standalone img markup element:
{#button}Submit{/button} or {#img alt=|Cancel| /}.
A message with attributes in the closing tag:
{#ansi attr=|bold,italic|}Bold and italic{/ansi attr=|bold|} italic only {/ansi attr=|italic|} no formatting.
A markup-open can appear without a corresponding markup-close. A markup-close can appear without a corresponding markup-open. Markup placeholders can appear in any order without making the message invalid. However, specifications or implementations defining markup might impose requirements on the pairing, ordering, or contents of markup during formatting.
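As an illustration of the kind of pairing requirement an implementation might choose to impose, the Python sketch below reports unpaired markup under a strict nesting policy. The policy, the (kind, name) representation, and the function name are all assumptions made for this example. The specification itself treats unpaired markup as valid.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of markup placeholders in pattern order:
# kind is "open", "standalone", or "close"; name is the markup identifier.
MarkupPart = Tuple[str, str]


def unpaired_markup(parts: Sequence[MarkupPart]) -> Tuple[List[str], List[str]]:
    """Return (opens never closed, closes with no matching open) under a
    strict nesting policy. Standalone markup needs no partner."""
    open_stack: List[str] = []
    unmatched_closes: List[str] = []
    for kind, name in parts:
        if kind == "open":
            open_stack.append(name)
        elif kind == "close":
            if open_stack and open_stack[-1] == name:
                open_stack.pop()
            else:
                unmatched_closes.append(name)
    return open_stack, unmatched_closes


# {#button}Submit{/button} or {#img alt=|Cancel| /}.
print(unpaired_markup([("open", "button"), ("close", "button"), ("standalone", "img")]))
```

A formatter that emits HTML might use a report like this to warn authors, while a formatter for a looser target could ignore it.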
Attributes are reserved for standardization by future versions of this specification. Examples in this section are meant to be illustrative and might not match future requirements or usage.
Note
The Tech Preview does not provide a built-in mechanism for overriding values in the formatting context (most notably the locale). Nor does it provide a mechanism for identifying specific expressions such as by assigning a name or id. The utility of these types of mechanisms has been debated. There are at least two proposed mechanisms for implementing support for these. Specifically, one mechanism would be to reserve specifically-named options, possibly using a Unicode namespace (i.e. locale=xxx or u:locale=xxx). Such options would be reserved for use in any and all functions or markup. The other mechanism would be to use the reserved "expression attribute" syntax for this purpose (i.e. @locale=xxx or @id=foo). Neither mechanism was included in this Tech Preview. Feedback on the preferred mechanism for managing these features is strongly desired.
In the meantime, function authors and other implementers are cautioned to avoid creating function-specific or implementation-specific option values for this purpose. One workaround would be to use the implementation's namespace for these features to ensure later interoperability when such a mechanism is finalized during the Tech Preview period.
An attribute is an identifier with an optional value that appears in an expression or in markup.
Attributes are prefixed by a U+0040 COMMERCIAL AT @ sign, followed by an identifier. An attribute MAY have a value which is separated from the identifier by a U+003D EQUALS SIGN = along with optional whitespace.
The value of an attribute can be either a literal or a variable.
Multiple attributes are permitted in an expression or markup. Each attribute is separated by whitespace.
The order of attributes is not significant.
attribute = "@" identifier [[s] "=" [s] (literal / variable)]
Examples of expressions and markup with attributes:
A message including a literal that should not be translated:
In French, "{|bonjour| @translate=no}" is a greeting
A message with markup that should not be copied:
Have a {#span @can-copy}great and wonderful{/span @can-copy} birthday!
This section defines common elements used to construct messages.
A keyword is a reserved token that has a unique meaning in the message syntax.
The following three keywords are defined: .input, .local, and .match. Keywords are always lowercase and start with a U+002E FULL STOP . character.
input = %s".input"
local = %s".local"
match = %s".match"
A literal is a character sequence that appears outside of text in various parts of a message. A literal can appear as a key value, as the operand of a literal-expression, or in the value of an option. A literal MAY include any Unicode code point except for U+0000 NULL or the surrogate code points U+D800 through U+DFFF.
All code points are preserved.
A quoted literal begins and ends with U+007C VERTICAL LINE |. The characters \ and | within a quoted literal MUST be escaped as \\ and \|.
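Quoting and unquoting a literal can be sketched as a pair of helpers. This Python example assumes that \\ and \| are the only escapes accepted inside a quoted literal, which is what the prose above describes. The function names are hypothetical.

```python
def quote_literal(value: str) -> str:
    """Wrap a value in | quotes, escaping backslash and vertical line."""
    escaped = value.replace("\\", "\\\\").replace("|", "\\|")
    return "|" + escaped + "|"


def unquote_literal(quoted: str) -> str:
    """Recover the code point sequence of a quoted literal."""
    if len(quoted) < 2 or quoted[0] != "|" or quoted[-1] != "|":
        raise ValueError("a quoted literal must begin and end with |")
    body = quoted[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Assumption: only \\ and \| are valid escapes here.
            if i + 1 >= len(body) or body[i + 1] not in "\\|":
                raise ValueError("invalid escape in quoted literal")
            out.append(body[i + 1])
            i += 2
        elif ch == "|":
            raise ValueError("unescaped | inside quoted literal")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(quote_literal("bold|italic"))
print(unquote_literal("|horse|"))
```

Because all code points are preserved, unquoting the result of quote_literal gives back the original value, including any whitespace.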
An unquoted literal is a literal that does not require the | quotes around it to be distinct from the rest of the message syntax. An unquoted literal MAY be used when the content of the literal contains no whitespace and otherwise matches the unquoted production. Any unquoted literal MAY be quoted. Implementations MUST NOT distinguish between quoted and unquoted literals that have the same sequence of code points.
Unquoted literals can contain a name or consist of a number-literal. A number-literal uses the same syntax as JSON and is intended for the encoding of number values in operands or options, or as keys for variants.
literal = quoted / unquoted
quoted = "|" *(quoted-char / quoted-escape) "|"
unquoted = name / number-literal
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]
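As an illustration of the number-literal production, the following Python sketch transcribes the rule into a regular expression. The function name is_number_literal is hypothetical and is not defined by this specification.

```python
import re

# Transcription of:
# ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]
# ABNF DIGIT is ASCII only, so [0-9] is used rather than \d.
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


def is_number_literal(text: str) -> bool:
    """Return True when the whole string matches the number-literal rule."""
    return NUMBER_LITERAL.fullmatch(text) is not None


for candidate in ["0", "-1.5e+3", "1E10", "01", "+1", ".5", "1."]:
    print(candidate, is_number_literal(candidate))
```

As in JSON, a leading plus sign, leading zeros, and a bare decimal point are rejected.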
An identifier is a character sequence that identifies a function, markup, or option.
Each identifier consists of a name optionally preceded by a namespace.
When present, the namespace is separated from the name by a U+003A COLON :.
Built-in functions and their options do not have a namespace identifier.
The namespace u (U+0075 LATIN SMALL LETTER U) is reserved for future standardization.
Function identifiers are prefixed with :.
Markup identifiers are prefixed with # or /.
Option identifiers have no prefix.
A name is a character sequence used in an identifier or as the name for a variable or the value of an unquoted literal.
Variable names are prefixed with $.
Valid content for names is based on Namespaces in XML 1.0's NCName.
This is different from XML's Name in that it MUST NOT contain a U+003A COLON :.
Otherwise, the set of characters allowed in a name is large.
Note
External variables can be passed in that are not valid names. Such variables cannot be referenced in a message, but are not otherwise errors.
Examples:
A variable:
This has a {$variable}
A function:
This has a {:function}
An add-on function from the icu namespace:
This has a {:icu:function}
An option and an add-on option:
This has {:options option=value icu:option=add_on}
Support for namespaces and their interpretation is implementation-defined in this release.
variable = "$" name
option = identifier [s] "=" [s] (literal / variable)
identifier = [namespace ":"] name
namespace = name
name = name-start *name-char
name-start = ALPHA / "_"
/ %xC0-D6 / %xD8-F6 / %xF8-2FF
/ %x370-37D / %x37F-1FFF / %x200C-200D
/ %x2070-218F / %x2C00-2FEF / %x3001-D7FF
/ %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "."
/ %xB7 / %x300-36F / %x203F-2040
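The name and identifier productions can be checked with a regular expression built from the same code point ranges. This is a minimal Python sketch; split_identifier is a hypothetical helper, not part of this specification.

```python
import re

# Code point ranges copied from the name-start and name-char productions.
NAME_START = (
    "A-Za-z_"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFC\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + r"0-9\-." + "\u00B7\u0300-\u036F\u203F-\u2040"
NAME = f"[{NAME_START}][{NAME_CHAR}]*"
# identifier = [namespace ":"] name
IDENTIFIER = re.compile(f"(?:(?P<namespace>{NAME}):)?(?P<name>{NAME})")


def split_identifier(text):
    """Return (namespace, name), or None when text is not an identifier."""
    match = IDENTIFIER.fullmatch(text)
    if match is None:
        return None
    return match.group("namespace"), match.group("name")


print(split_identifier("icu:function"))
print(split_identifier("function"))
print(split_identifier("1abc"))
```

Because a name cannot contain a colon, an input with two colons such as a:b:c is rejected rather than being split ambiguously.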
An escape sequence is a two-character sequence starting with U+005C REVERSE SOLIDUS \.
An escape sequence allows the appearance of lexically meaningful characters in the body of text, quoted, or reserved (which includes, in this case, private-use) sequences respectively:
text-escape = backslash ( backslash / "{" / "}" )
quoted-escape = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
backslash = %x5C ; U+005C REVERSE SOLIDUS "\"
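To show how quoted-escape interacts with the rule that quoted and unquoted literals with the same code points are not distinguished, here is a minimal Python sketch. The function parse_literal is hypothetical, and it assumes that an unquoted input has already been validated against the unquoted production.

```python
def parse_literal(source: str) -> str:
    """Return the code point sequence denoted by a quoted or unquoted literal."""
    if not source.startswith("|"):
        # Unquoted literal: the content is used as-is (assumed pre-validated).
        return source
    if len(source) < 2 or not source.endswith("|"):
        raise ValueError("unterminated quoted literal")
    body = source[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # quoted-escape = backslash ( backslash / "|" )
            if i + 1 >= len(body) or body[i + 1] not in "\\|":
                raise ValueError("invalid escape in quoted literal")
            out.append(body[i + 1])
            i += 2
        elif ch == "|":
            raise ValueError("unescaped | inside quoted literal")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(parse_literal("|bonjour|"))
print(parse_literal("|1|") == parse_literal("1"))
```

Comparing the parsed values rather than the source text is what makes |1| and 1 equivalent as keys or operands.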
Whitespace is defined as one or more of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (new line), U+000D CARRIAGE RETURN, U+3000 IDEOGRAPHIC SPACE, or U+0020 SPACE.
Inside patterns and quoted literals, whitespace is part of the content and is recorded and stored verbatim. Whitespace is not significant outside translatable text, except where required by the syntax.
Note
The character U+3000 IDEOGRAPHIC SPACE is included in whitespace for compatibility with certain East Asian keyboards and input methods, in which users might accidentally create these characters in a message.
s = 1*( SP / HTAB / CR / LF / %x3000 )
The grammar below uses the ABNF notation [STD68], including the modifications found in RFC 7405.
RFC 7405 extends ABNF with case-sensitive string literals (the %s prefix).
Some ABNF tools are only compatible with the specification found in RFC 5234.
To make message.abnf compatible with that version of ABNF, replace the rules of the same name with this block:
input = %x2E.69.6E.70.75.74 ; ".input"
local = %x2E.6C.6F.63.61.6C ; ".local"
match = %x2E.6D.61.74.63.68 ; ".match"
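The replacement rules above spell each keyword as a sequence of hexadecimal code point values. As a sketch, the same terminals can be derived mechanically in Python; to_rfc5234_terminal is a hypothetical helper name.

```python
def to_rfc5234_terminal(keyword: str) -> str:
    """Spell a case-sensitive string as an RFC 5234 %x terminal sequence."""
    return "%x" + ".".join(f"{ord(ch):02X}" for ch in keyword)


for keyword in (".input", ".local", ".match"):
    print(f"{keyword[1:]} = {to_rfc5234_terminal(keyword)} ; \"{keyword}\"")
```

This is only useful when regenerating the block for tools that lack RFC 7405 support; the grammar itself is unchanged.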
message.abnf
message = simple-message / complex-message
simple-message = [simple-start pattern]
simple-start = simple-start-char / text-escape / placeholder
pattern = *(text-char / text-escape / placeholder)
placeholder = expression / markup
complex-message = *(declaration [s]) complex-body
declaration = input-declaration / local-declaration / reserved-statement
complex-body = quoted-pattern / matcher
input-declaration = input [s] variable-expression
local-declaration = local s variable [s] "=" [s] expression
quoted-pattern = "{{" pattern "}}"
matcher = match-statement 1*([s] variant)
match-statement = match 1*([s] selector)
selector = expression
variant = key *(s key) [s] quoted-pattern
key = literal / "*"
; Expressions
expression = literal-expression
/ variable-expression
/ annotation-expression
literal-expression = "{" [s] literal [s annotation] *(s attribute) [s] "}"
variable-expression = "{" [s] variable [s annotation] *(s attribute) [s] "}"
annotation-expression = "{" [s] annotation *(s attribute) [s] "}"
annotation = function
/ private-use-annotation
/ reserved-annotation
markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone
/ "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close
; Expression and literal parts
function = ":" identifier *(s option)
option = identifier [s] "=" [s] (literal / variable)
; Attributes are reserved for future standardization
attribute = "@" identifier [[s] "=" [s] (literal / variable)]
variable = "$" name
literal = quoted / unquoted
quoted = "|" *(quoted-char / quoted-escape) "|"
unquoted = name / number-literal
; number-literal matches JSON number (https://www.rfc-editor.org/rfc/rfc8259#section-6)
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]
; Keywords; Note that these are case-sensitive
input = %s".input"
local = %s".local"
match = %s".match"
; Reserve additional .keywords for use by future versions of this specification.
reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
; Note that the following production is a simplification,
; as this rule MUST NOT be considered to match existing keywords
; (`.input`, `.local`, and `.match`).
reserved-keyword = "." name
; Reserve additional sigils for use by future versions of this specification.
reserved-annotation = reserved-annotation-start [[s] reserved-body]
reserved-annotation-start = "!" / "%" / "*" / "+" / "<" / ">" / "?" / "~"
; Reserve sigils for private-use by implementations.
private-use-annotation = private-start [[s] reserved-body]
private-start = "^" / "&"
reserved-body = reserved-body-part *([s] reserved-body-part)
reserved-body-part = reserved-char / reserved-escape / quoted
; Names and identifiers
; identifier matches https://www.w3.org/TR/REC-xml-names/#NT-QName
; name matches https://www.w3.org/TR/REC-xml-names/#NT-NCName
identifier = [namespace ":"] name
namespace = name
name = name-start *name-char
name-start = ALPHA / "_"
/ %xC0-D6 / %xD8-F6 / %xF8-2FF
/ %x370-37D / %x37F-1FFF / %x200C-200D
/ %x2070-218F / %x2C00-2FEF / %x3001-D7FF
/ %xF900-FDCF / %xFDF0-FFFC / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "."
/ %xB7 / %x300-36F / %x203F-2040
; Restrictions on characters in various contexts
simple-start-char = content-char / s / "@" / "|"
text-char = content-char / s / "." / "@" / "|"
quoted-char = content-char / s / "." / "@" / "{" / "}"
reserved-char = content-char / "."
content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
/ %x0B-0C ; omit CR (%x0D)
/ %x0E-1F ; omit SP (%x20)
/ %x21-2D ; omit . (%x2E)
/ %x2F-3F ; omit @ (%x40)
/ %x41-5B ; omit \ (%x5C)
/ %x5D-7A ; omit { | } (%x7B-7D)
/ %x7E-2FFF ; omit IDEOGRAPHIC SPACE (%x3000)
/ %x3001-D7FF ; omit surrogates
/ %xE000-10FFFF
; Character escapes
text-escape = backslash ( backslash / "{" / "}" )
quoted-escape = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
backslash = %x5C ; U+005C REVERSE SOLIDUS "\"
; Whitespace
s = 1*( SP / HTAB / CR / LF / %x3000 )
Errors in messages and their formatting MAY occur and be detected at different stages of processing. Where available, the use of validation tools is recommended, as early detection of errors makes their correction easier.
Syntax Errors and Data Model Errors apply to all message processors, and MUST be emitted as soon as possible. The other error categories are only emitted during formatting, but it might be possible to detect them with validation tools.
During selection, an expression handler MUST only emit Resolution Errors and Selection Errors. During formatting, an expression handler MUST only emit Resolution Errors and Formatting Errors.
Resolution Errors and Formatting Errors in expressions that are not used in pattern selection or formatting MAY be ignored, as they do not affect the output of the formatter.
In all cases, when encountering a runtime error, a message formatter MUST provide some representation of the message. An informative error or errors MUST also be separately provided.
When a message contains more than one error, or contains some error which leads to further errors, an implementation which does not emit all of the errors SHOULD prioritise Syntax Errors and Data Model Errors over others.
When an error occurs within a selector, the selector MUST NOT match any variant key other than the catch-all * and a Resolution Error or a Selection Error MUST be emitted.
Syntax Errors occur when the syntax representation of a message is not well-formed.
Example invalid messages resulting in a Syntax Error:
{{Missing end braces
{{Missing one end brace}
Unknown {{expression}}
.local $var = {|no message body|}
Data Model Errors occur when a message is invalid due to violating one of the semantic requirements on its structure.
A Variant Key Mismatch occurs when the number of keys on a variant does not equal the number of selectors.
Example invalid messages resulting in a Variant Key Mismatch error:
.match {$one :func} 1 2 {{Too many}} * {{Otherwise}}
.match {$one :func} {$two :func} 1 2 {{Two keys}} * {{Missing a key}} * * {{Otherwise}}
A Missing Fallback Variant error occurs when the message does not include a variant with only catch-all keys.
Example invalid messages resulting in a Missing Fallback Variant error:
.match {$one :func} 1 {{Value is one}} 2 {{Value is two}}
.match {$one :func} {$two :func} 1 * {{First is one}} * 1 {{Second is one}}
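The Variant Key Mismatch and Missing Fallback Variant checks described above can be sketched over a simplified data model in Python. The representation is an assumption made for illustration: each variant is a list of keys, and the sentinel CATCH_ALL stands for the catch-all key so that it stays distinct from the literal |*|.

```python
CATCH_ALL = object()  # hypothetical sentinel for the catch-all key *


def check_variants(selector_count, variants):
    """Return the names of the Data Model Errors found in the variant keys."""
    errors = []
    # Every variant needs exactly one key per selector.
    if any(len(keys) != selector_count for keys in variants):
        errors.append("Variant Key Mismatch")
    # At least one variant must consist only of catch-all keys.
    has_fallback = any(
        len(keys) == selector_count and all(key is CATCH_ALL for key in keys)
        for keys in variants
    )
    if not has_fallback:
        errors.append("Missing Fallback Variant")
    return errors


# .match {$one :func} 1 2 {{Too many}} * {{Otherwise}}
print(check_variants(1, [["1", "2"], [CATCH_ALL]]))
# .match {$one :func} 1 {{Value is one}} 2 {{Value is two}}
print(check_variants(1, [["1"], ["2"]]))
```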
A Missing Selector Annotation error occurs when the message contains a selector that does not have an annotation, or contains a variable that does not directly or indirectly reference a declaration with an annotation.
Examples of invalid messages resulting in a Missing Selector Annotation error:
.match {$one} 1 {{Value is one}} * {{Value is not one}}
.local $one = {|The one|} .match {$one} 1 {{Value is one}} * {{Value is not one}}
.input {$one} .match {$one} 1 {{Value is one}} * {{Value is not one}}
A Duplicate Declaration error occurs when a variable is declared more than once. Note that an input variable is implicitly declared when it is first used, so explicitly declaring it after such use is also an error.
Examples of invalid messages resulting in a Duplicate Declaration error:
.input {$var :number maximumFractionDigits=0} .input {$var :number minimumFractionDigits=0} {{Redeclaration of the same variable}}
.local $var = {$ext :number maximumFractionDigits=0} .input {$var :number minimumFractionDigits=0} {{Redeclaration of a local variable}}
.input {$var :number minimumFractionDigits=0} .local $var = {$ext :number maximumFractionDigits=0} {{Redeclaration of an input variable}}
.input {$var :number minimumFractionDigits=$var2} .input {$var2 :number} {{Redeclaration of the implicit input variable $var2}}
.local $var = {$ext :someFunction} .local $var = {$error} .local $var2 = {$var2 :error} {{{$var} cannot be redefined. {$var2} cannot refer to itself}}
A Duplicate Option Name error occurs when the same identifier appears on the left-hand side of more than one option in the same expression.
Examples of invalid messages resulting in a Duplicate Option Name error:
Value is {42 :number style=percent style=decimal}
.local $foo = {horse :func one=1 two=2 one=1} {{This is {$foo}}}
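A Duplicate Option Name check only needs the option identifiers of a single expression. The Python sketch below assumes the options are available as (identifier, value) pairs in source order; duplicate_option_names is a hypothetical helper.

```python
from collections import Counter


def duplicate_option_names(options):
    """Return identifiers that appear on the left-hand side more than once."""
    counts = Counter(name for name, _ in options)
    return [name for name, count in counts.items() if count > 1]


# {42 :number style=percent style=decimal}
print(duplicate_option_names([("style", "percent"), ("style", "decimal")]))
# {horse :func one=1 two=2 one=1}
print(duplicate_option_names([("one", "1"), ("two", "2"), ("one", "1")]))
```

The second example shows that repeating an option is an error even when the repeated value is identical.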
Resolution Errors occur when the runtime value of a part of a message cannot be determined.
An Unresolved Variable error occurs when a variable reference cannot be resolved.
For example, attempting to format either of the following messages would result in an Unresolved Variable error if done within a context that does not provide for the variable reference $var to be successfully resolved:
The value is {$var}.
.match {$var :func} 1 {{The value is one.}} * {{The value is not one.}}
An Unknown Function error occurs when an expression includes a reference to a function which cannot be resolved.
For example, attempting to format either of the following messages would result in an Unknown Function error if done within a context that does not provide for the function :func to be successfully resolved:
The value is {horse :func}.
.match {|horse| :func} 1 {{The value is one.}} * {{The value is not one.}}
An Unsupported Expression error occurs when an expression uses syntax reserved for future standardization, or for private implementation use that is not supported by the current implementation.
For example, attempting to format this message would always result in an Unsupported Expression error:
The value is {!horse}.
Attempting to format this message would result in an Unsupported Expression error if done within a context that does not support the ^ private use sigil:
.match {|horse| ^private} 1 {{The value is one.}} * {{The value is not one.}}
An Invalid Expression error occurs when a message includes an expression whose implementation-defined internal requirements produce an error during function resolution or when a function returns a value (such as null) that the implementation does not support.
An Operand Mismatch Error is an Invalid Expression error that occurs when an operand provided to a function during function resolution does not match one of the expected implementation-defined types for that function; or in which a literal operand value does not have the required format and thus cannot be processed into one of the expected implementation-defined types for that specific function.
For example, the following message produces an Operand Mismatch Error (a type of Invalid Expression error) because the literal |horse| does not match the production number-literal, which is a requirement of the function :number for its operand:
.local $horse = {horse :number} {{You have a {$horse}.}}
The following message might produce an Invalid Expression error if the function :function threw an exception or otherwise emitted an error rather than returning a valid value:
{{This has an invalid expression {$var :function} because it has a bug in it.}}
An Unsupported Statement error occurs when a message includes a reserved statement.
For example, attempting to format this message would always result in an Unsupported Statement error:
.some {|horse|} {{The message body}}
Selection Errors occur when message selection fails.
For example, attempting to format either of the following messages might result in a Selection Error if done within a context that uses a :number selector function which requires its input to be numeric:
.match {|horse| :number} 1 {{The value is one.}} * {{The value is not one.}}
.local $sel = {|horse| :number} .match {$sel} 1 {{The value is one.}} * {{The value is not one.}}
Formatting Errors occur during the formatting of a resolved value, for example when encountering a value with an unsupported type or an internally inconsistent set of options.
For example, attempting to format any of the following messages might result in a Formatting Error if done within a context that
- provides for the variable reference $user to resolve to an object { name: 'Kat', id: 1234 },
- provides for the variable reference $field to resolve to a string 'address', and
- uses a :get formatting function which requires its argument to be an object and an option field to be provided with a string value,
Hello, {horse :get field=name}!
Hello, {$user :get}!
.local $id = {$user :get field=id} {{Hello, {$id :get field=name}!}}
Your {$field} is {$id :get field=$field}
Implementations and tooling can greatly benefit from a structured definition of formatting and matching functions available to messages at runtime. This specification is intended to provide a mechanism for storing such declarations in a portable manner.
This section is non-normative.
The registry provides a machine-readable description of MessageFormat 2 extensions (custom functions), in order to support the following goals and use-cases:
This section is normative.
To be conformant with MessageFormat 2.0, an implementation MUST implement the functions, options and option values, operands and outputs described in the section Default Registry below.
Implementations MAY implement additional functions or additional options. In particular, implementations are encouraged to provide feedback on proposed options and their values.
Important
In the Tech Preview, the registry data model should be regarded as experimental. Changes to the format are expected during this period. Feedback on the registry's format and implementation is encouraged!
Implementations are not required to provide a machine-readable registry nor to read or interpret the registry data model in order to be conformant.
The MessageFormat 2.0 Registry was created to describe the core set of formatting and selection functions, including operands, options, and option values. This is the minimum set of functionality needed for conformance. By using the same names and values, messages can be used interchangeably by different implementations, regardless of programming language or runtime environment. This ensures that developers do not have to relearn core MessageFormat syntax and functionality when moving between platforms and that translators do not need to know about the runtime environment for most selection or formatting operations.
The registry provides a machine-readable description of functions suitable for tools, such as those used in translation automation, so that variant expansion and information about available options and their effects are available in the translation ecosystem. To that end, implementations are strongly encouraged to provide appropriately tailored versions of the registry for consumption by tools (even if not included in software distributions) and to encourage any add-on or plug-in functionality to provide a registry to support localization tooling.
This section is non-normative.
Important
This part of the specification is not part of the Tech Preview.
The registry contains descriptions of function signatures.
The main building block of the registry is the <function> element.
It represents an implementation of a custom function available to translation at runtime.
A function defines a human-readable <description> of its behavior and one or more machine-readable signatures of how to call it.
Named <validationRule> elements can optionally define regex validation rules for literals, option values, and variant keys.
MessageFormat 2 functions can be invoked in two contexts:
- inside placeholders, to format a value: for example, |1.5| may be formatted to 1,5 in a language which uses commas as decimal separators,
- inside selectors, to select the appropriate variant.
A single function name may be used in both contexts, regardless of whether it's implemented as one or multiple functions.
A signature defines one particular set of at most one argument and any number of named options that can be used together in a single call to the function.
<formatSignature> corresponds to a function call inside a placeholder inside translatable text.
<matchSignature> corresponds to a function call inside a selector.
A signature may define the positional argument of the function with the <input> element.
If the <input> element is not present, the function is defined as a nullary function.
A signature may also define one or more <option> elements representing named options to the function.
An option can be omitted in a call to the function, unless the required attribute is present.
Options accept either a finite enumeration of values (the values attribute) or validate their input with a regular expression (the validationRule attribute).
Read-only options (the readonly attribute) can be displayed to translators in CAT tools, but may not be edited.
As the <input> and <option> rules may be locale-dependent, each signature can include an <override locales="..."> that extends and overrides the corresponding input and options rules.
If multiple <override> elements would match the current locale, only the first one is used.
Matching-function signatures additionally include one or more <match> elements to define the keys against which they can match when used as selectors.
Functions may also include <alias> definitions, which provide shorthands for commonly used option baskets.
An alias name may be used equivalently to a function name in messages.
Its <setOption> values are always set, and may not be overridden in message annotations.
If a <function>, <input> or <option> includes multiple <description> elements, each SHOULD have a different xml:lang attribute value.
This allows for the descriptions of these elements to be themselves localized according to the preferred locale of the message authors and editors.
The following registry.xml is an example of a registry file which may be provided by an implementation to describe its built-in functions.
For the sake of brevity, only locales="en" is considered.
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">
<registry xml:lang="en">
<function name="platform">
<description>Match the current OS.</description>
<matchSignature>
<match values="windows linux macos android ios"/>
</matchSignature>
</function>
<validationRule id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
<validationRule id="positiveInteger" regex="[0-9]+"/>
<validationRule id="currencyCode" regex="[A-Z]{3}"/>
<function name="number">
<description>
Format a number.
Match a **formatted** numerical value against CLDR plural categories or against a number literal.
</description>
<matchSignature>
<input validationRule="anyNumber"/>
<option name="type" values="cardinal ordinal"/>
<option name="minimumIntegerDigits" validationRule="positiveInteger"/>
<option name="minimumFractionDigits" validationRule="positiveInteger"/>
<option name="maximumFractionDigits" validationRule="positiveInteger"/>
<option name="minimumSignificantDigits" validationRule="positiveInteger"/>
<option name="maximumSignificantDigits" validationRule="positiveInteger"/>
<!-- Since this applies to both cardinal and ordinal, all plural options are valid. -->
<match locales="en" values="one two few other" validationRule="anyNumber"/>
<match values="zero one two few many other" validationRule="anyNumber"/>
</matchSignature>
<formatSignature>
<input validationRule="anyNumber"/>
<option name="minimumIntegerDigits" validationRule="positiveInteger"/>
<option name="minimumFractionDigits" validationRule="positiveInteger"/>
<option name="maximumFractionDigits" validationRule="positiveInteger"/>
<option name="minimumSignificantDigits" validationRule="positiveInteger"/>
<option name="maximumSignificantDigits" validationRule="positiveInteger"/>
<option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/>
<option name="currency" readonly="true" validationRule="currencyCode"/>
</formatSignature>
<alias name="integer">
<description>Locale-sensitive integral number formatting</description>
<setOption name="maximumFractionDigits" value="0" />
<setOption name="style" value="decimal" />
</alias>
</function>
</registry>
Given the above description, the :number function is defined to work both in a selector and a placeholder:
.match {$count :number}
1 {{One new message}}
* {{{$count :number} new messages}}
Furthermore, :number's <matchSignature> contains two <match> elements which allow the validation of variant keys.
The element whose locales best matches the current locale using resource item lookup from LDML is used.
An element with no locales attribute is the default (and is considered equivalent to the root locale).
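The choice between <match> elements can be sketched in Python as follows. This is a simplification: it assumes that lookup only truncates subtags from the end of the locale identifier before falling back to root, whereas the full LDML lookup has additional rules. The dictionaries stand in for parsed <match> elements, and pick_match is a hypothetical helper.

```python
def fallback_chain(locale):
    """Simplified lookup chain, for example en-GB, then en, then root."""
    parts = locale.split("-")
    chain = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    return chain + ["root"]


def pick_match(match_elements, locale):
    """Return the <match> element that applies to the given locale, if any."""
    by_locale = {}
    for element in match_elements:
        # An element without a locales attribute is treated as root.
        for tag in element.get("locales", "root").split():
            by_locale.setdefault(tag, element)
    for candidate in fallback_chain(locale):
        if candidate in by_locale:
            return by_locale[candidate]
    return None


elements = [
    {"locales": "en", "values": "one two few other"},
    {"values": "zero one two few many other"},
]
print(pick_match(elements, "en-GB")["values"])
print(pick_match(elements, "fr")["values"])
```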
- <match locales="en" values="one two few other" .../> can be used in locales like en and en-GB to validate the when other variant by verifying that the other key is present in the list of enumerated values: one two few other.
- <match ... validationRule="anyNumber"/> can be used to validate the when 1 variant by testing the 1 key against the anyNumber regular expression defined in the registry file.
A localization engineer can then extend the registry by defining the following customRegistry.xml file.
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">
<registry xml:lang="en">
<function name="noun">
<description>Handle the grammar of a noun.</description>
<formatSignature>
<override locales="en">
<input/>
<option name="article" values="definite indefinite"/>
<option name="plural" values="one other"/>
<option name="case" values="nominative genitive" default="nominative"/>
</override>
</formatSignature>
</function>
<function name="adjective">
<description>Handle the grammar of an adjective.</description>
<formatSignature>
<override locales="en">
<input/>
<option name="article" values="definite indefinite"/>
<option name="plural" values="one other"/>
<option name="case" values="nominative genitive" default="nominative"/>
</override>
</formatSignature>
<formatSignature>
<override locales="en">
<input/>
<option name="article" values="definite indefinite"/>
<option name="accord"/>
</override>
</formatSignature>
</function>
</registry>
Messages can now use the :noun and the :adjective functions.
The following message references the first signature of :adjective, which expects the plural and case options:
You see {$color :adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!
The following message references the second signature of :adjective, which only expects the accord option:
.input {$object :noun case=nominative} {{You see {$color :adjective article=indefinite accord=$object} {$object}!}}
Important
This part of the specification is part of the Tech Preview and is NORMATIVE.
This section describes the functions which each implementation MUST provide to be conformant with this specification.
:string function
The function :string provides string selection and formatting.
The operand of :string is either any implementation-defined type that is a string or for which conversion to a string is supported, or any literal value.
All other values produce an Invalid Expression error.
For example, in Java, implementations of the java.lang.CharSequence interface (such as java.lang.String or java.lang.StringBuilder), the type char, or the class java.lang.Character might be considered as the "implementation-defined types". Such an implementation might also support other classes via the method toString(). This might be used to enable selection of an enum value by name, for example.
Other programming languages would define string and character sequence types or classes according to their local needs, including, where appropriate, coercion to string.
The function :string has no options.
Note
Proposals for string transformation options or implementation experience with user requirements are desired during the Tech Preview.
When implementing MatchSelectorKeys(resolvedSelector, keys) where resolvedSelector is the resolved value of a selector expression and keys is a list of strings, the :string selector performs as described below.
1. Let compare be the string value of resolvedSelector.
2. Let result be a new empty list of strings.
3. For each string key in keys:
   1. If key and compare consist of the same sequence of Unicode code points, then
      1. Append key as the last element of the list result.
4. Return result.
Note
Matching of key and compare values is sensitive to the sequence of code points in each string.
As a result, variations in how text can be encoded can affect the performance of matching.
The function :string does not perform case folding or Unicode Normalization of string values.
Users SHOULD encode messages and their parts (such as keys and operands), in Unicode Normalization Form C (NFC) unless there is a very good reason not to.
See also: String Matching
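The :string key matching described above, and its sensitivity to normalization, can be sketched in Python. The function name match_selector_keys mirrors MatchSelectorKeys but is otherwise hypothetical, and the resolved selector is assumed to convert to a string with str().

```python
import unicodedata


def match_selector_keys(resolved_selector, keys):
    """Return the keys whose code point sequence equals the selector's string value."""
    compare = str(resolved_selector)
    result = []
    for key in keys:
        # Python compares str values code point by code point,
        # with no case folding and no normalization.
        if key == compare:
            result.append(key)
    return result


nfc = "\u00e9"   # e with acute accent as a single code point
nfd = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

# Without normalization, only the key with identical code points matches.
print(len(match_selector_keys(nfc, [nfd, nfc])))

# After normalizing the keys to NFC, both keys match.
normalized = [unicodedata.normalize("NFC", key) for key in (nfd, nfc)]
print(len(match_selector_keys(nfc, normalized)))
```

The two keys look identical when rendered, which is why encoding messages and operands consistently in NFC avoids surprising mismatches.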
Note
Unquoted string literals in a variant do not include spaces.
If users wish to match strings that include whitespace (including U+3000 IDEOGRAPHIC SPACE) to a key, the key needs to be quoted.
For example:
.match {$string :string}
| space key | {{Matches the string " space key "}}
* {{Matches the string "space key"}}
The :string function returns the string value of the resolved value of the operand.
:number function
The function :number is a selector and formatter for numeric values.
The function :number requires a Number Operand as its operand.
Some options do not have default values defined in this specification. The defaults for these options are implementation-dependent. In general, the default values for such options depend on the locale, the value of other options, or both.
The following options and their values are required to be available on the function :number:
- select
  - plural (default; see Default Value of select Option below)
  - ordinal
  - exact
- compactDisplay (this option only has meaning when combined with the option notation=compact)
  - short (default)
  - long
- notation
  - standard (default)
  - scientific
  - engineering
  - compact
- numberingSystem
- signDisplay
  - auto (default)
  - always
  - exceptZero
  - negative
  - never
- style
  - decimal (default)
  - percent (see Percent Style below)
- useGrouping
  - auto (default)
  - always
  - never
  - min2
- minimumIntegerDigits (default: 1)
- minimumFractionDigits
- maximumFractionDigits
- minimumSignificantDigits
- maximumSignificantDigits
Note
The following options and option values are being developed during the Technical Preview period.
The following values for the option style are not part of the default registry.
Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:
- currency
- unit
The following options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:
- currency
- currencyDisplay
  - symbol (default)
  - narrowSymbol
  - code
  - name
- currencySign
  - accounting
  - standard (default)
- unit
- unitDisplay
  - long
  - short (default)
  - narrow
Default Value of select Option
The value plural is the default for the option select because it is the most common use case for numeric selection.
It can be used for exact value matches but also allows for the grammatical needs of languages using CLDR's plural rules.
This might not be noticeable in the source language (particularly English), but can cause problems in target locales that the original developer is not considering.
For example, a naive developer might use a special message for the value 1 without considering a locale's need for a one plural:
.match {$var :number} 1 {{You have one last chance}} one {{You have {$var} chance remaining}} * {{You have {$var} chances remaining}}
The one variant is needed by languages such as Polish or Russian. Such locales typically also require other keywords such as two, few, and many.
When implementing style=percent, the numeric value of the operand MUST be multiplied by 100 for the purposes of formatting.
For example,
The total was {0.5 :number style=percent}.
should format in a manner similar to:
The total was 50%.
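The scaling step of style=percent can be sketched in Python. The helper format_percent is hypothetical and deliberately not locale-aware: a real implementation would also apply locale-specific digits, grouping, and percent sign placement. Decimal is used so that the multiplication by 100 does not introduce binary floating-point artifacts.

```python
from decimal import Decimal


def format_percent(operand: str) -> str:
    """Multiply the operand by 100 and append a percent sign (no locale handling)."""
    scaled = Decimal(operand) * 100
    # normalize() drops trailing zeros; the "f" format avoids exponent notation.
    return f"{scaled.normalize():f}%"


print("The total was " + format_percent("0.5") + ".")
```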
The function :number performs selection as described in Number Selection below.
:integer function
The function :integer is a selector and formatter for matching or formatting numeric values as integers.
The function :integer requires a Number Operand as its operand.
Some options do not have default values defined in this specification. The defaults for these options are implementation-dependent. In general, the default values for such options depend on the locale, the value of other options, or both.
The following options and their values are required in the default registry to be available on the function :integer:
select
  plural (default)
  ordinal
  exact
numberingSystem
signDisplay
  auto (default)
  always
  exceptZero
  negative
  never
style
  decimal (default)
  percent (see Percent Style below)
useGrouping
  auto (default)
  always
  min2
minimumIntegerDigits (default: 1)
maximumSignificantDigits
Note
The following options and option values are being developed during the Technical Preview period.
The following values for the option style are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:
currency
unit
The following options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:
currency
currencyDisplay
  symbol (default)
  narrowSymbol
  code
  name
currencySign
  accounting
  standard (default)
unit
unitDisplay
  long
  short (default)
  narrow
Default Value of select Option
The value plural is the default for the option select because it is the most common use case for numeric selection.
It can be used for exact value matches but also allows for the grammatical needs of
languages using CLDR's plural rules.
This might not be noticeable in the source language (particularly English),
but can cause problems in target locales that the original developer is not considering.
For example, a naive developer might use a special message for the value 1 without considering a locale's need for a one plural:
.match {$var :integer}
1   {{You have one last chance}}
one {{You have {$var} chance remaining}}
*   {{You have {$var} chances remaining}}
The one variant is needed by languages such as Polish or Russian. Such locales typically also require other keywords such as two, few, and many.
When implementing style=percent, the numeric value of the operand MUST be multiplied by 100 for the purposes of formatting.
For example,
The total was {0.5 :number style=percent}.
should format in a manner similar to:
The total was 50%.
The function :integer performs selection as described in Number Selection below.
The operand of a number function is either an implementation-defined type or a literal whose contents match the number-literal production in the ABNF. All other values produce an Invalid Expression error.
For example, in Java, any subclass of java.lang.Number plus the primitive types (byte, short, int, long, float, double, etc.) might be considered as the "implementation-defined numeric types". Implementations in other programming languages would define different types or classes according to their local needs.
Note
String values passed as variables in the formatting context's input mapping can be formatted as numeric values as long as their contents match the number-literal production in the ABNF.
For example, if the value of the variable num were the string -1234.567, it would behave identically to the local variable in this example:
.local $example = {|-1234.567| :number}
{{{$num :number} == {$example}}}
Note
Implementations are encouraged to provide support for compound types or data structures that provide additional semantic meaning to the formatting of number-like values. For example, in ICU4J, the type com.ibm.icu.util.Measure can be used to communicate a value that includes a unit, or the type com.ibm.icu.util.CurrencyAmount can be used to set the currency and related options (such as the number of fraction digits).
Some options of number functions are defined to take a "digit size option". Implementations of number functions use these options to control aspects of numeric display such as the number of fraction, integer, or significant digits.
A "digit size option" is an option value that the function interprets as a small integer value greater than or equal to zero. Implementations MAY define an upper limit on the resolved value of a digit size option consistent with that implementation's practical limits.
In most cases, the value of a digit size option will be a string that encodes the value as a decimal integer. Implementations MAY also accept implementation-defined types as the value. When provided as a string, the representation of a digit size option matches the following ABNF:
digit-size-option = "0" / (%x31-39 [DIGIT])
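As an illustration only, the grammar above can be checked with a short regular expression. The function name `parse_digit_size_option` and the `upper_limit` parameter are hypothetical and not part of this specification. The sketch assumes an implementation that also accepts native integers as an implementation-defined type.

```python
import re

# Mirrors the grammar above: "0", or a non-zero digit optionally
# followed by one more digit (so the string form covers 0 through 99).
_DIGIT_SIZE = re.compile(r"0|[1-9][0-9]?")


def parse_digit_size_option(value, upper_limit=99):
    """Return the option as an int, or None if it is not usable.

    `upper_limit` stands in for an implementation's practical limit;
    the value 99 is an assumption made for this sketch.
    """
    if isinstance(value, bool):
        return None  # bool is an int subclass in Python; reject it explicitly
    if isinstance(value, int):
        return value if 0 <= value <= upper_limit else None
    if isinstance(value, str) and _DIGIT_SIZE.fullmatch(value):
        number = int(value)
        return number if number <= upper_limit else None
    return None
```

A caller would typically emit an error and fall back to the option's default when this returns `None`.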
Number selection has three modes:
exact selection matches the operand to explicit numeric keys exactly
plural selection matches the operand to explicit numeric keys exactly or to plural rule categories if there is no explicit match
ordinal selection matches the operand to explicit numeric keys exactly or to ordinal rule categories if there is no explicit match
When implementing MatchSelectorKeys(resolvedSelector, keys) where resolvedSelector is the resolved value of a selector expression and keys is a list of strings, numeric selectors perform as described below.
Let exact be the JSON string representation of the numeric value of resolvedSelector. (See Determining Exact Literal Match for details)
Let keyword be a string which is the result of rule selection on resolvedSelector.
Let resultExact be a new empty list of strings.
Let resultKeyword be a new empty list of strings.
For each string key in keys:
  If the value of key matches the production number-literal, then
    If key and exact consist of the same sequence of Unicode code points, then
      Append key as the last element of the list resultExact.
  Else if key is one of the keywords zero, one, two, few, many, or other, then
    If key and keyword consist of the same sequence of Unicode code points, then
      Append key as the last element of the list resultKeyword.
Return a new list whose elements are the elements of resultExact followed by the elements (in order) of resultKeyword.
Note
Implementations are not required to implement this exactly as written. However, the observed behavior must be consistent with what is described here.
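The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: `match_selector_keys`, `select_rule`, and `english_cardinal` are hypothetical names, the regular expression is an assumed transcription of the number-literal production, and the toy English rule only handles integers.

```python
import json
import re

# Assumed transcription of number-literal (the JSON number format).
NUMBER_LITERAL = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?")
PLURAL_KEYWORDS = {"zero", "one", "two", "few", "many", "other"}


def match_selector_keys(value, keys, select_rule):
    """Return the matching keys, exact matches first, then keyword matches."""
    exact = json.dumps(value)      # JSON string representation of the number
    keyword = select_rule(value)   # result of rule selection ("" disables it)
    result_exact, result_keyword = [], []
    for key in keys:
        if NUMBER_LITERAL.fullmatch(key):
            if key == exact:
                result_exact.append(key)
        elif key in PLURAL_KEYWORDS:
            if key == keyword:
                result_keyword.append(key)
    return result_exact + result_keyword


def english_cardinal(value):
    """Toy stand-in for CLDR cardinal rules for English integers."""
    return "one" if value == 1 else "other"


print(match_selector_keys(1, ["one", "1"], english_cardinal))
print(match_selector_keys(1, ["one", "1"], lambda v: ""))  # select=exact
```

Passing a rule that returns the empty string models select=exact, because no key can equal the empty string.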
If the option select is set to exact, rule-based selection is not used. Return the empty string.
Note
Since valid keys cannot be the empty string in a numeric expression, returning the empty string disables keyword selection.
If the option select is set to plural, selection should be based on CLDR plural rule data of type cardinal. See charts for examples.
If the option select is set to ordinal, selection should be based on CLDR plural rule data of type ordinal. See charts for examples.
Apply the rules defined by CLDR to the resolved value of the operand and the function options, and return the resulting keyword. If no rules match, return other.
Example. In CLDR 44, the Czech (cs) plural rule set can be found here.
A message in Czech might be:
.match {$numDays :number}
one  {{{$numDays} den}}
few  {{{$numDays} dny}}
many {{{$numDays} dne}}
*    {{{$numDays} dní}}
Using the rules found above, the results of various operand values might look like:
Operand value | Keyword | Formatted Message |
---|---|---|
1 | one | 1 den |
2 | few | 2 dny |
5 | other | 5 dní |
22 | other | 22 dní |
27 | other | 27 dní |
2.4 | many | 2,4 dne |
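To make the rule evaluation concrete, here is a small sketch of cardinal rule selection for Czech. It assumes the CLDR rules are one: i = 1 and v = 0; few: i = 2..4 and v = 0; many: v != 0; and other for everything else. Verify this against the CLDR charts before relying on it. The function name `czech_cardinal` is hypothetical. The operand is taken as a decimal string so that visible fraction digits (v) are preserved.

```python
from decimal import Decimal


def czech_cardinal(number_text):
    """Return the assumed CLDR cardinal category for a decimal string."""
    value = Decimal(number_text)
    exponent = value.as_tuple().exponent
    v = -exponent if exponent < 0 else 0  # count of visible fraction digits
    i = int(abs(value))                   # integer digits of the operand
    if i == 1 and v == 0:
        return "one"
    if 2 <= i <= 4 and v == 0:
        return "few"
    if v != 0:
        return "many"
    return "other"


for sample in ["1", "2", "5", "27", "2.4"]:
    print(sample, czech_cardinal(sample))
```

The returned keyword is what rule selection would hand to the key-matching steps. Formatting the number itself (for example, rendering 2.4 as 2,4) is a separate, locale-dependent step.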
Important
The exact behavior of exact literal match is only defined for non-zero-filled integer values. Annotations that use fraction digits or significant digits might work in specific implementation-defined ways. Users should avoid depending on these types of keys in message selection.
Number literals in the MessageFormat 2 syntax use the format defined for a JSON number. A resolvedSelector exactly matches a numeric literal key if, when the numeric value of resolvedSelector is serialized using the format for a JSON number, the two strings are equal.
Note
Only integer matching is required in the Technical Preview. Feedback describing use cases for fractional and significant digits-based selection would be helpful. Otherwise, users should avoid using matching with fractional numbers or significant digits.
This subsection describes the functions and options for date/time formatting. Selection based on date and time values is not required in this release.
Note
Selection based on date/time types is not required by MF2. Implementations should use care when defining selectors based on date/time types. The types of queries found in implementations such as java.time.TemporalAccessor are complex and user expectations may be inconsistent with good I18N practices.
The :datetime function
The function :datetime is used to format date/time values, including the ability to compose user-specified combinations of fields.
If no options are specified, this function defaults to the following:
{$d :datetime}
is the same as {$d :datetime dateStyle=short timeStyle=short}
Note
The default formatting behavior of :datetime is inconsistent with Intl.DateTimeFormat in JavaScript and with {d,date} in ICU MessageFormat 1.0. This is because, unlike those implementations, :datetime is distinct from :date and :time.
The operand of the :datetime function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operand. All other operand values produce an Invalid Expression error.
The :datetime function can use either the style options or a collection of field options to control the formatted output, but not both.
If both are specified, an Invalid Expression error MUST be emitted and a fallback value used as the resolved value of the expression.
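The mutual exclusion between style options and field options could be checked along these lines. This is a sketch under assumptions: the function name `classify_datetime_options` is hypothetical, a `ValueError` stands in for emitting an Invalid Expression error, and the option names are those listed in this section.

```python
STYLE_OPTIONS = {"dateStyle", "timeStyle"}
FIELD_OPTIONS = {
    "weekday", "era", "year", "month", "day", "hour", "minute", "second",
    "fractionalSecondDigits", "hourCycle", "timeZoneName",
}


def classify_datetime_options(options):
    """Return ("style" | "fields", effective options) for a :datetime call."""
    has_style = any(name in STYLE_OPTIONS for name in options)
    has_field = any(name in FIELD_OPTIONS for name in options)
    if has_style and has_field:
        # Stand-in for emitting an Invalid Expression error.
        raise ValueError("style options and field options cannot be combined")
    if has_field:
        return "fields", dict(options)
    if not has_style:
        # With no options at all, :datetime behaves as short date and time.
        return "style", {"dateStyle": "short", "timeStyle": "short"}
    return "style", dict(options)
```

A caller that catches the error would use the fallback value as the resolved value of the expression.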
Style Options
The function :datetime has these style options.
dateStyle
  full
  long
  medium
  short
timeStyle
  full
  long
  medium
  short
Field Options
Field options describe which fields to include in the formatted output and what format to use for that field. The implementation may use this annotation to configure which fields appear in the formatted output.
Note
Field options do not have default values because they are only to be used to compose the formatter.
The field options are defined as follows:
Important
The value 2-digit for some field options must be quoted in the MessageFormat syntax because it starts with a digit but does not match the number-literal production in the ABNF.
.local $correct = {$someDate :datetime year=|2-digit|}
.local $syntaxError = {$someDate :datetime year=2-digit}
The function :datetime has the following options:
weekday
  long
  short
  narrow
era
  long
  short
  narrow
year
  numeric
  2-digit
month
  numeric
  2-digit
  long
  short
  narrow
day
  numeric
  2-digit
hour
  numeric
  2-digit
minute
  numeric
  2-digit
second
  numeric
  2-digit
fractionalSecondDigits
  1
  2
  3
hourCycle (default is locale-specific)
  h11
  h12
  h23
  h24
timeZoneName
  long
  short
  shortOffset
  longOffset
  shortGeneric
  longGeneric
Note
The following options do not have default values because they are only to be used as overrides for locale-and-value dependent implementation-defined defaults.
The following date/time options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:
calendar (default is locale-specific)
numberingSystem (default is locale-specific)
timeZone (default is system default time zone or UTC)
The :date function
The function :date is used to format the date portion of date/time values.
If no options are specified, this function defaults to the following:
{$d :date}
is the same as {$d :date style=short}
The operand of the :date function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operand. All other operand values produce an Invalid Expression error.
The function :date has these options:
style
  full
  long
  medium
  short (default)
The :time function
The function :time is used to format the time portion of date/time values.
If no options are specified, this function defaults to the following:
{$t :time}
is the same as {$t :time style=short}
The operand of the :time function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operand. All other operand values produce an Invalid Expression error.
The function :time has these options:
style
  full
  long
  medium
  short (default)
The operand of a date/time function is either an implementation-defined date/time type or a date/time literal value, as defined below. All other operand values produce an Invalid Expression error.
A date/time literal value is a non-empty string consisting of an ISO 8601 date, or an ISO 8601 datetime optionally followed by a timezone offset. As implementations differ slightly in their parsing of such strings, ISO 8601 date and datetime values not matching the following regular expression MAY also be supported. Furthermore, matching this regular expression does not guarantee validity, given the variable number of days in each month.
(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?
When the time is not present, implementations SHOULD use 00:00:00 as the time. When the offset is not present, implementations SHOULD use a floating time type (such as Java's java.time.LocalDateTime) to represent the time value. For more information, see Working with Timezones.
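The following sketch shows one way to apply the regular expression and the two defaults described above. The name `parse_date_time_literal` is hypothetical. Returning `None` stands in for whatever error an implementation would emit. A naive `datetime` (no `tzinfo`) is used here as the floating time type.

```python
import re
from datetime import datetime, timedelta, timezone

DATE_TIME_LITERAL = re.compile(
    r"(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
    r"(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?"
)


def parse_date_time_literal(text):
    """Return a datetime for a date/time literal, or None if it is unusable."""
    if DATE_TIME_LITERAL.fullmatch(text) is None:
        return None
    year, month, day = int(text[0:4]), int(text[5:7]), int(text[8:10])
    hour = minute = second = micro = 0   # missing time defaults to 00:00:00
    tz = None                            # missing offset gives a floating time
    if len(text) > 10:
        hour, minute, second = int(text[11:13]), int(text[14:16]), int(text[17:19])
        rest = text[19:]
        if rest.startswith("."):
            digits = re.match(r"\.([0-9]{1,3})", rest).group(1)
            micro = int(digits.ljust(6, "0"))
            rest = rest[1 + len(digits):]
        if rest == "Z":
            tz = timezone.utc
        elif rest:
            sign = 1 if rest[0] == "+" else -1
            tz = timezone(sign * timedelta(hours=int(rest[1:3]), minutes=int(rest[4:6])))
    try:
        return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)
    except ValueError:
        return None  # matches the pattern but is not a real date, e.g. February 30
```

The final `try` covers the case noted above: a string can match the regular expression and still not be a valid calendar date.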
Important
The ABNF and syntax of MF2 do not formally define date/time literals. This means that a message can be syntactically valid but produce an Operand Mismatch Error at runtime.
Note
String values passed as variables in the formatting context's input mapping can be formatted as date/time values as long as their contents are date/time literals.
For example, if the value of the variable now were the string 2024-02-06T16:40:00Z, it would behave identically to the local variable in this example:
.local $example = {|2024-02-06T16:40:00Z| :datetime}
{{{$now :datetime} == {$example}}}
Note
True time zone support in serializations is expected to coincide with the adoption of Temporal in JavaScript. The form of these serializations is known and is a de facto standard. Support for these extensions is expected to be required in the post-tech preview. See: https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/
This section defines the behavior of a MessageFormat 2.0 implementation when formatting a message for display in a user interface, or for some later processing.
To start, we presume that a message has either been parsed from its syntax or created from a data model description. If this construction has encountered any Syntax Errors or Data Model Errors, an appropriate error MUST be emitted and a fallback value MAY be used as the formatting result.
Formatting of a message is defined by the following operations:
Expression and Markup Resolution determines the value of an expression or markup, with reference to the current formatting context. This can include multiple steps, such as looking up the value of a variable and calling formatting functions. The form of the resolved value is implementation defined and the value might not be evaluated or formatted yet. However, it needs to be "formattable", i.e. it contains everything required by the eventual formatting.
The resolution of text is rather straightforward, and is detailed under literal resolution.
Important
This specification does not require either eager or lazy expression resolution of message parts; do not construe any requirement in this document as requiring either.
Implementations are not required to evaluate all parts of a message when parsing, processing, or formatting. In particular, an implementation MAY choose not to evaluate or resolve the value of a given expression until it is actually used by a selection or formatting process. However, when an expression is resolved, it MUST behave as if all preceding declarations and selectors affecting variables referenced by that expression have already been evaluated in the order in which the relevant declarations and selectors appear in the message.
Pattern Selection determines which of a message's patterns is formatted. For a message with no selectors, this is simple as there is only one pattern. With selectors, this will depend on their resolution.
At the start of pattern selection, if the message contains any reserved statements, emit an Unsupported Statement error.
Formatting takes the resolved values of the selected pattern, and produces the formatted result for the message. Depending on the implementation, this result could be a single concatenated string, an array of objects, an attributed string, or some other locally appropriate data type.
Formatter implementations are not required to expose the expression resolution and pattern selection operations to their users, or even use them in their internal processing, as long as the final formatting result is made available to users and the observable behavior of the formatter matches that described here.
A message's formatting context represents the data and procedures that are required for the message's expression resolution, pattern selection and formatting.
At a minimum, it includes:
Information on the current locale, potentially including a fallback chain of locales. This will be passed on to formatting functions.
Information on the base directionality of the message and its text tokens. This will be used by strategies for bidirectional isolation, and can be used to set the base direction of the message upon display.
An input mapping of string identifiers to values, defining variable values that are available during variable resolution. This is often determined by a user-provided argument of a formatting function call.
The function registry, providing the implementations of the functions referred to by message functions.
Optionally, a fallback string to use for the message if it contains any Syntax Errors or Data Model Errors.
Implementations MAY include additional fields in their formatting context.
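One possible shape for such a context is a small immutable record. This is only a sketch: the class and field names below are hypothetical, and the specification does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class FormattingContext:
    # Current locale first, followed by any fallback chain.
    locales: Sequence[str]
    # Base directionality of the message; "auto" is an assumed placeholder.
    base_direction: str = "auto"
    # Input mapping of variable names to externally supplied values.
    input_mapping: Mapping[str, Any] = field(default_factory=dict)
    # Function registry: function identifier to implementation.
    function_registry: Mapping[str, Callable[..., Any]] = field(default_factory=dict)
    # Optional fallback string for messages with Syntax or Data Model Errors.
    fallback_string: Optional[str] = None


context = FormattingContext(locales=["fr-CA", "fr"], input_mapping={"count": 3})
```

Freezing the record is one way to keep function access to the context read-only.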
Expressions are used in declarations, selectors, and patterns. Markup is only used in patterns.
In a declaration, the resolved value of the expression is bound to a variable, which is available for use by later expressions. Since a variable can be referenced in different ways later, implementations SHOULD NOT immediately fully format the value for output.
In an input-declaration, the variable operand of the variable-expression identifies not only the name of the external input value, but also the variable to which the resolved value of the variable-expression is bound.
In selectors, the resolved value of an expression is used for pattern selection.
In a pattern, the resolved value of an expression or markup is used in its formatting.
The form that resolved values take is implementation-dependent, and different implementations MAY choose to perform different levels of resolution.
For example, the resolved value of the expression {|0.40| :number style=percent} could be an object such as
{ value: Number('0.40'), formatter: NumberFormat(locale, { style: 'percent' }) }
Alternatively, it could be an instance of an ICU4J FormattedNumber, or some other locally appropriate value.
Depending on the presence or absence of a variable or literal operand and a function, private-use annotation, or reserved annotation, the resolved value of the expression is determined as follows:
If the expression contains a reserved annotation, an Unsupported Expression error is emitted and a fallback value is used as the resolved value of the expression.
Else, if the expression contains a private-use annotation, its resolved value is defined according to the implementation's specification.
Else, if the expression contains an annotation, its resolved value is defined by function resolution.
Else, if the expression consists of a variable, its resolved value is defined by variable resolution. An implementation MAY perform additional processing when resolving the value of an expression that consists only of a variable.
For example, it could apply function resolution using a function and a set of options chosen based on the value or type of the variable. So, given a message like this:
Today is {$date}
If the value passed in the variable were a date object, such as a JavaScript Date or a Java java.util.Date or java.time.Temporal, the implementation could interpret the placeholder {$date} as if the pattern included the function :datetime with some set of default options.
Else, the expression consists of a literal. Its resolved value is defined by literal resolution.
Note
This means that a literal value with no annotation is always treated as a string. To represent values that are not strings as a literal, an annotation needs to be provided:
.local $aNumber = {1234 :number}
.local $aDate = {|2023-08-30| :datetime}
.local $aFoo = {|some foo| :foo}
{{You have {42 :number}}}
The resolved value of a text or a literal is the character sequence of the text or literal after any character escape has been converted to the escaped character.
When a literal is used as an operand or on the right-hand side of an option, the formatting function MUST treat its resolved value the same whether its value was originally quoted or unquoted.
For example, the option foo=42 and the option foo=|42| are treated as identical.
The resolution of a text or literal MUST resolve to a string.
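A minimal sketch of literal resolution follows. The name `resolve_literal` is hypothetical, and the escape set (backslash followed by a backslash, `{`, `|`, or `}`) is an assumption about the syntax defined elsewhere in this specification.

```python
import re

# Assumed escape set: a backslash followed by one of \ { | }
_ESCAPE = re.compile(r"\\([\\{|}])")


def resolve_literal(source):
    """Return the string value of a literal as written in the message source."""
    if len(source) >= 2 and source.startswith("|") and source.endswith("|"):
        source = source[1:-1]            # drop the quoting delimiters
    return _ESCAPE.sub(r"\1", source)    # replace each escape with its character


# Quoted and unquoted forms of the same literal resolve to the same string.
print(resolve_literal("42") == resolve_literal("|42|"))
```

Because the result is always a string, a function that receives it cannot tell whether the literal was quoted in the source.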
To resolve the value of a variable, its name is used to identify either a local variable or an input variable. If a declaration exists for the variable, its resolved value is used. Otherwise, the variable is an implicit reference to an input value, and its value is looked up from the formatting context input mapping.
The resolution of a variable MAY fail if no value is identified for its name. If this happens, an Unresolved Variable error MUST be emitted. If a variable would resolve to a fallback value, this MUST also be considered a failure.
To resolve an expression with a function annotation, the following steps are taken:
If the expression includes an operand, resolve its value. If this fails, use a fallback value for the expression.
Resolve the identifier of the function and, based on the starting sigil, find the appropriate function implementation to call. If the implementation cannot find the function, or if the identifier includes a namespace that the implementation does not support, emit an Unknown Function error and use a fallback value for the expression.
Implementations are not required to implement namespaces or installable function registries.
Perform option resolution.
Call the function implementation with the following arguments:
The form that resolved operand and option values take is implementation-defined.
A declaration binds the resolved value of an expression to a variable. Thus, the result of one function is potentially the operand of another function, or the value of one of the options for another function. For example, in
.input {$n :number minIntegerDigits=3}
.local $n1 = {$n :number maxFractionDigits=3}
the value bound to $n is the resolved value used as the operand of the :number function when resolving the value of the variable $n1.
Implementations that provide a means for defining custom functions SHOULD provide a means for function implementations to return values that contain enough information (e.g. a representation of the resolved operand and option values that the function was called with) to be used as arguments to subsequent calls to the function implementations. For example, an implementation might define an interface that allows custom functions to be implemented. Such an interface SHOULD define an implementation-specific argument type T and return type U for implementations of functions such that U can be coerced to T.
Implementations of a function SHOULD emit an Invalid Expression error for operands whose resolved value or type is not supported.
Note
The behavior of the previous example is currently implementation-dependent. Suppose that the external input variable n is bound to the string "1", and that the implementation formats to a string. The formatted result of the following message:
.input {$n :number minIntegerDigits=3}
.local $n1 = {$n :number maxFractionDigits=3}
{{{$n1}}}
depends on whether the options are preserved between the resolution of the first :number annotation and the resolution of the second :number annotation: a conformant implementation could produce either "001.000" or "1.000".
Each function specification MAY have its own rules to preserve some options in the returned structure and discard others. In instances where a function specification does not determine whether an option is preserved or discarded, each function implementation of that specification MAY have its own rules to preserve some options in the returned structure and discard others.
Note
During the Technical Preview, feedback on how the registry describes the flow of resolved values and options from one function to another, and on what requirements this specification should impose, is highly desired.
An implementation MAY pass additional arguments to the function, as long as reasonable precautions are taken to keep the function interface simple and minimal, and avoid introducing potential security vulnerabilities.
An implementation MAY define its own functions. An implementation MAY allow custom functions to be defined by users.
Function access to the formatting context MUST be minimal and read-only, and execution time SHOULD be limited.
Implementation-defined functions SHOULD use an implementation-defined namespace.
If the call succeeds, resolve the value of the expression as the result of that function call.
If the call fails or does not return a valid value, emit an Invalid Expression error.
Implementations MAY provide a mechanism for the function to provide additional detail about internal failures. Specifically, if the cause of the failure was that the datatype, value, or format of the operand did not match that expected by the function, the function might cause an Operand Mismatch Error to be emitted.
In all failure cases, use the fallback value for the expression as the resolved value.
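The control flow of function resolution might look like the sketch below. Everything here is illustrative: the name `resolve_function_expression`, the shape of the registry (a plain mapping), the call signature `(locale, options, operand)`, and the use of a list to collect emitted errors are all assumptions, not requirements.

```python
def resolve_function_expression(name, operand, options, registry, locale,
                                fallback, errors):
    """Resolve a function call, returning `fallback` on any failure.

    `operand` is the already-resolved operand (or None if absent) and
    `errors` is a list that collects the names of emitted errors.
    """
    function = registry.get(name)
    if function is None:
        errors.append("Unknown Function")
        return fallback
    try:
        result = function(locale, options, operand)
    except Exception:  # broad on purpose: any failure yields the fallback
        errors.append("Invalid Expression")
        return fallback
    if result is None:  # treated here as "did not return a valid value"
        errors.append("Invalid Expression")
        return fallback
    return result


registry = {"upper": lambda locale, options, operand: operand.upper()}
errors = []
print(resolve_function_expression("upper", "hi", {}, registry, "en", "|hi|", errors))
```

An implementation that distinguishes Operand Mismatch Errors would inspect the failure before choosing which error to emit.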
The result of resolving option values is an unordered mapping of string identifiers to values.
For each option:
Errors MAY be emitted during option resolution, but it always resolves to some mapping of string identifiers to values. This mapping can be empty.
Unlike functions, the resolution of markup is not customizable.
The resolved value of markup includes the following fields:
The resolution of markup MUST always succeed.
A fallback value is the resolved value for an expression that fails to resolve.
An expression fails to resolve when:
The fallback value depends on the contents of the expression:
expression with literal operand (quoted or unquoted): U+007C VERTICAL LINE | followed by the value of the literal with escaping applied to U+005C REVERSE SOLIDUS \ and U+007C VERTICAL LINE |, and then by U+007C VERTICAL LINE |.
Examples: In a context where :func fails to resolve, {42 :func} resolves to the fallback value |42| and {|C:\\| :func} resolves to the fallback value |C:\\|. In any context, {|| @reserved} resolves to the fallback value ||.
expression with variable operand referring to a local declaration (with or without an annotation): the value to which it resolves (which may already be a fallback value)
Examples: In a context where :func fails to resolve, the pattern's expression in .local $var = {|val|} {{{$var :func}}} resolves to the fallback value |val| and the message formats to {|val|}. In a context where :now fails to resolve but :datetime does not, the pattern's expression in
.local $t = {:now format=iso8601}
.local $pretty_t = {$t :datetime}
{{{$pretty_t}}}
(transitively) resolves to the fallback value :now and the message formats to {:now}.
expression with variable operand not referring to a local declaration (with or without an annotation): U+0024 DOLLAR SIGN $ followed by the name of the variable
Examples: In a context where $var fails to resolve, {$var} and {$var :number} and {$var @reserved} all resolve to the fallback value $var. In a context where :func fails to resolve, the pattern's expression in .input {$arg} {{{$arg :func}}} resolves to the fallback value $arg and the message formats to {$arg}.
function expression with no operand: U+003A COLON : followed by the function identifier
Examples: In a context where :func fails to resolve, {:func} resolves to the fallback value :func. In a context where :ns:func fails to resolve, {:ns:func} resolves to the fallback value :ns:func.
unsupported private-use annotation or reserved annotation with no operand: the annotation starting sigil
Examples: In any context, {@reserved} and {@reserved |...|} both resolve to the fallback value @.
supported private-use annotation with no operand: the annotation starting sigil, optionally followed by implementation-defined details conforming with patterns in the other cases (such as quoting literals). If details are provided, they SHOULD NOT leak potentially private information.
Examples: In a context where ^ expressions are used for comments, {^▽^} might resolve to the fallback value ^. In a context where & expressions are function-like macro invocations, {&foo |...|} might resolve to the fallback value &foo.
Otherwise: the U+FFFD REPLACEMENT CHARACTER �
This is not currently used by any expression, but may apply in future revisions.
Option identifiers and values are not included in the fallback value.
Pattern selection is not supported for fallback values.
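The cases above can be condensed into a small helper. This is a sketch with hypothetical names. It covers literal operands, variables that do not refer to a local declaration, functions without an operand, and annotation sigils. For a variable that refers to a local declaration, the caller is assumed to reuse the value that the declaration resolved to.

```python
def fallback_value(literal=None, variable=None, function=None, sigil=None):
    """Build the fallback string for a failed expression."""
    if literal is not None:
        # Escape backslashes first, then vertical lines, and re-quote.
        escaped = literal.replace("\\", "\\\\").replace("|", "\\|")
        return "|" + escaped + "|"
    if variable is not None:
        return "$" + variable
    if function is not None:
        return ":" + function
    if sigil is not None:
        return sigil
    return "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER for any other case


print(fallback_value(literal="42"))
print(fallback_value(variable="var"))
print(fallback_value(function="ns:func"))
```

Options are deliberately not parameters of the helper, since option identifiers and values never appear in a fallback value.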
When a message contains a matcher with one or more selectors, the implementation needs to determine which variant will be used to provide the pattern for the formatting operation. This is done by ordering and filtering the available variant statements according to their key values and selecting the first one.
Note
At least one variant is required to have all of its keys consist of the fallback value *. Some selectors might be implemented in a way that the key value * cannot be selected in a valid message. In other cases, this key value might be unreachable only in certain locales. This could result in the need in some locales to create one or more variants that do not make sense grammatically for that language.
For example, in the pl (Polish) locale, this message cannot reach the * variant:
.match {$num :integer}
0    {{ }}
one  {{ }}
few  {{ }}
many {{ }}
*    {{Only used by fractions in Polish.}}
In the Tech Preview, feedback from users and implementers is desired about whether to relax the requirement that such a "fallback variant" appear in every message, versus the potential for a message to fail at runtime because no matching variant is available.
The number of keys in each variant MUST equal the number of selectors.
Each key corresponds to a selector by its position in the variant.
For example, in this message:
.match {:one} {:two} {:three}
1 2 3 {{ ... }}
The first key 1 corresponds to the first selector ({:one}), the second key 2 to the second selector ({:two}), and the third key 3 to the third selector ({:three}).
To determine which variant best matches a given set of inputs, each selector is used in turn to order and filter the list of variants.
Each variant with a key that does not match its corresponding selector is omitted from the list of variants. The remaining variants are sorted according to the selector's key-ordering preference. Earlier selectors in the matcher's list of selectors have a higher priority than later ones.
When all of the selectors have been processed, the earliest-sorted variant in the remaining list of variants is selected.
Note
A selector is not a declaration. Even when the same function can be used for both formatting and selection of a given operand, the annotation that appears in a selector has no effect on subsequent selectors nor on the formatting used in placeholders. To use the same value for selection and formatting, set its value with a .input or .local declaration.
This selection method is defined in more detail below. An implementation MAY use any pattern selection method, as long as its observable behavior matches the results of the method defined here.
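One way to obtain the described behavior is sketched below. It is a simplification for illustration, not the normative method. `select_variant` is a hypothetical name, variants are modeled as `(keys, pattern)` pairs, and `match_selector_keys` is any callable that returns the matching keys in preferential order. Ranking each variant by a tuple of per-selector preference positions gives earlier selectors higher priority, and the catch-all key ranks after every explicit match.

```python
def select_variant(variants, selector_values, match_selector_keys):
    """Pick the pattern of the best-matching variant, or None if none match."""
    # Preferential key order for each selector position.
    pref = []
    for i, value in enumerate(selector_values):
        keys = list(dict.fromkeys(k[i] for k, _ in variants if k[i] != "*"))
        pref.append(match_selector_keys(value, keys))

    # Keep variants whose every key is "*" or matched; compute their rank.
    candidates = []
    for keys, pattern in variants:
        if all(k == "*" or k in pref[i] for i, k in enumerate(keys)):
            rank = tuple(
                len(pref[i]) if k == "*" else pref[i].index(k)
                for i, k in enumerate(keys)
            )
            candidates.append((rank, pattern))
    if not candidates:
        return None
    # min() returns the first of equally ranked candidates (source order).
    return min(candidates, key=lambda candidate: candidate[0])[1]


def string_equality(value, keys):
    """Toy matcher: a key matches when it equals the value's string form."""
    return [key for key in keys if key == str(value)]


variants = [(["1", "*"], "A"), (["*", "x"], "B"), (["*", "*"], "C")]
print(select_variant(variants, [1, "x"], string_equality))
```

In the usage example, the first selector's exact match outranks the second selector's match, so the variant with keys `1 *` is preferred over `* x`.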
If the message being formatted has any Syntax Errors or Data Model Errors, the result of pattern selection MUST be a pattern resolving to a single fallback value using the message's fallback string defined in the formatting context or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER �.
First, resolve the values of each selector:
1. Let res be a new empty list of resolved values that support selection.
2. For each selector sel, in source order,
   1. Let rv be the resolved value of sel.
   2. If selection is supported for rv:
      1. Append rv as the last element of the list res.
   3. Else:
      1. Let nomatch be a resolved value for which selection always fails.
      2. Append nomatch as the last element of the list res.
The form of the resolved values is determined by each implementation, along with the manner of determining their support for selection.
Next, using res, resolve the preferential order for all message keys:
1. Let pref be a new empty list of lists of strings.
2. For each index i in res:
   1. Let keys be a new empty list of strings.
   2. For each variant var of the message:
      1. Let key be the var key at position i.
      2. If key is not the catch-all key '*':
         1. Assert that key is a literal.
         2. Let ks be the resolved value of key.
         3. Append ks as the last element of the list keys.
   3. Let rv be the resolved value at index i of res.
   4. Let matches be the result of calling the method MatchSelectorKeys(rv, keys).
   5. Append matches as the last element of the list pref.
The method MatchSelectorKeys is determined by the implementation. It takes as arguments a resolved selector value rv and a list of string keys keys, and returns a list of string keys in preferential order. The returned list MUST contain only unique elements of the input list keys. The returned list MAY be empty. The most-preferred key is first, with each successive key appearing in order by decreasing preference.
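As a non-normative illustration, two possible MatchSelectorKeys behaviours are sketched below. The type MatchKeys and the functions matchStringKeys and matchEnglishNumberKeys are hypothetical names. The number matcher assumes a heavily simplified English rule (only the value 1 belongs to the one category); a real implementation would take plural categories from CLDR data.

```typescript
// Hypothetical signature for an implementation-defined MatchSelectorKeys.
type MatchKeys<RV> = (rv: RV, keys: string[]) => string[];

// Assumed :string-like behaviour: a key matches only if it equals the value.
const matchStringKeys: MatchKeys<string> = (rv, keys) =>
  keys.indexOf(rv) !== -1 ? [rv] : [];

// Assumed :number-like behaviour for English only: an exact numeric key
// is preferred over the plural category key.
const matchEnglishNumberKeys: MatchKeys<number> = (rv, keys) => {
  const category = rv === 1 ? "one" : "other"; // simplified English rule
  const exact: string[] = [];
  const categorical: string[] = [];
  for (const key of keys) {
    if (key === String(rv)) {
      // Keep each returned key unique, as the specification requires.
      if (exact.indexOf(key) === -1) exact.push(key);
    } else if (key === category) {
      if (categorical.indexOf(key) === -1) categorical.push(key);
    }
  }
  // Most-preferred keys first: exact matches, then the category.
  return exact.concat(categorical);
};

const stringResult = matchStringKeys("foo", ["bar", "foo"]);
const numberResult = matchEnglishNumberKeys(1, ["one", "1"]);
```

Both functions return only keys taken from the input list, without duplicates, and may return an empty list when nothing matches.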
Then, using the preferential key orders pref, filter the list of variants to the ones that match with some preference:
1. Let vars be a new empty list of variants.
2. For each variant var of the message:
   1. For each index i in pref:
      1. Let key be the var key at position i.
      2. If key is the catch-all key '*':
         1. Continue the inner loop on pref.
      3. Assert that key is a literal.
      4. Let ks be the resolved value of key.
      5. Let matches be the list of strings at index i of pref.
      6. If matches includes ks:
         1. Continue the inner loop on pref.
      7. Else:
         1. Continue the outer loop on message variants.
   2. Append var as the last element of the list vars.
Finally, sort the list of variants vars and select the pattern:
1. Let sortable be a new empty list of (integer, variant) tuples.
2. For each variant var of vars:
   1. Let tuple be a new tuple (-1, var).
   2. Append tuple as the last element of the list sortable.
3. Let len be the integer count of items in pref.
4. Let i be len - 1.
5. While i >= 0:
   1. Let matches be the list of strings at index i of pref.
   2. Let minpref be the integer count of items in matches.
   3. For each tuple tuple of sortable:
      1. Let matchpref be an integer with the value minpref.
      2. Let key be the tuple variant key at position i.
      3. If key is not the catch-all key '*':
         1. Assert that key is a literal.
         2. Let ks be the resolved value of key.
         3. Let matchpref be the integer position of ks in matches.
      4. Set the tuple integer value as matchpref.
   4. Set sortable to be the result of calling the method SortVariants(sortable).
   5. Set i to be i - 1.
6. Let var be the variant element of the first element of sortable.
7. Select the pattern of var.
SortVariants is a method whose single argument is a list of (integer, variant) tuples. It returns a list of (integer, variant) tuples. Any implementation of SortVariants is acceptable as long as it satisfies the following requirements:
1. Let sortable be an arbitrary list of (integer, variant) tuples.
2. Let sorted be SortVariants(sortable).
3. sorted is the result of sorting sortable using the following comparator:
   1. (i1, v1) <= (i2, v2) if and only if i1 <= i2.
4. The sort is stable (pairs of tuples from sortable that are equal in their first element have the same relative order in sorted).
This section is non-normative.
Presuming a minimal implementation which only supports :string annotation which matches keys by using string comparison, and a formatting context in which the variable reference $foo resolves to the string 'foo' and the variable reference $bar resolves to the string 'bar', pattern selection proceeds as follows for this message:
.match {$foo :string} {$bar :string}
bar bar {{All bar}}
foo foo {{All foo}}
* * {{Otherwise}}
1. For the first selector:
   The value of the selector is resolved to be 'foo'.
   The available keys « 'bar', 'foo' » are compared to 'foo', resulting in a list « 'foo' » of matching keys.
2. For the second selector:
   The value of the selector is resolved to be 'bar'.
   The available keys « 'bar', 'foo' » are compared to 'bar', resulting in a list « 'bar' » of matching keys.
3. Creating the list vars of variants matching all keys:
   The first variant bar bar is discarded as its first key does not match the first selector.
   The second variant foo foo is discarded as its second key does not match the second selector.
   The catch-all keys of the third variant * * always match, and this is added to vars, resulting in a list « * * » of variants.
4. As the list vars only has one entry, it does not need to be sorted.
   The pattern Otherwise of the third variant is selected.
Alternatively, with the same implementation and formatting context as in Example 1, pattern selection would proceed as follows for this message:
.match {$foo :string} {$bar :string}
* bar {{Any and bar}}
foo * {{Foo and any}}
foo bar {{Foo and bar}}
* * {{Otherwise}}
1. For the first selector:
   The value of the selector is resolved to be 'foo'.
   The available keys « 'foo' » are compared to 'foo', resulting in a list « 'foo' » of matching keys.
2. For the second selector:
   The value of the selector is resolved to be 'bar'.
   The available keys « 'bar' » are compared to 'bar', resulting in a list « 'bar' » of matching keys.
3. Creating the list vars of variants matching all keys:
   The keys of all variants either match each selector exactly, or via the catch-all key, resulting in a list « * bar, foo *, foo bar, * * » of variants.
4. Sorting the variants:
   The list sortable is first set with the variants in their source order and scores determined by the second selector:
   « ( 0, * bar ), ( 1, foo * ), ( 0, foo bar ), ( 1, * * ) »
   This is then sorted as:
   « ( 0, * bar ), ( 0, foo bar ), ( 1, foo * ), ( 1, * * ) ».
   To sort according to the first selector, the scores are updated to:
   « ( 1, * bar ), ( 0, foo bar ), ( 0, foo * ), ( 1, * * ) ».
   This is then sorted as:
   « ( 0, foo bar ), ( 0, foo * ), ( 1, * bar ), ( 1, * * ) ».
5. The pattern Foo and bar of the most preferred foo bar variant is selected.
A more-complex example is the matching found in selection APIs such as ICU's PluralFormat. Suppose that this API is represented here by the function :number. This :number function can match a given numeric value to a specific number literal and also to a plural category (zero, one, two, few, many, other) according to locale rules defined in CLDR.
Given a variable reference $count whose value resolves to the number 1 and an en (English) locale, the pattern selection proceeds as follows for this message:
.input {$count :number}
.match {$count}
one {{Category match for {$count}}}
1 {{Exact match for {$count}}}
* {{Other match for {$count}}}
1. For the selector:
   The value of the selector is resolved to an implementation-defined value that is capable of performing English plural category selection on the value 1.
   The available keys « 'one', '1' » are passed to the implementation's MatchSelectorKeys method, resulting in a list « '1', 'one' » of matching keys.
2. Creating the list vars of variants matching all keys:
   The keys of all variants are included in the list of matching keys, or use the catch-all key, resulting in a list « one, 1, * » of variants.
3. Sorting the variants:
   The list sortable is first set with the variants in their source order and scores determined by the selector key order:
   « ( 1, one ), ( 0, 1 ), ( 2, * ) »
   This is then sorted as:
   « ( 0, 1 ), ( 1, one ), ( 2, * ) »
4. The pattern Exact match for {$count} of the most preferred 1 variant is selected.
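The selection method and the three examples above can be approximated by the following non-normative sketch. It assumes simplified, hypothetical shapes: keys are plain strings with "*" as the catch-all, each selector is represented by a Matcher callback that plays the role of MatchSelectorKeys for an already resolved value, and patterns are plain strings. The names SimpleVariant, Matcher, stableSort, and selectPattern are illustrative only.

```typescript
interface SimpleVariant {
  keys: string[];
  pattern: string;
}
// Stands in for MatchSelectorKeys bound to one resolved selector value.
type Matcher = (keys: string[]) => string[];

// Stable sort by score; ties keep their previous relative order.
function stableSort<T extends { score: number }>(items: T[]): T[] {
  return items
    .map((item, index) => ({ item, index }))
    .sort((a, b) => a.item.score - b.item.score || a.index - b.index)
    .map((entry) => entry.item);
}

function selectPattern(matchers: Matcher[], variants: SimpleVariant[]): string {
  // Resolve preferences: one ordered list of matching keys per selector.
  const pref: string[][] = matchers.map((match, i) =>
    match(variants.map((v) => v.keys[i]).filter((k) => k !== "*")),
  );
  // Filter variants: every key must be "*" or appear in the matching list.
  const vars = variants.filter((v) =>
    pref.every((matches, i) => v.keys[i] === "*" || matches.indexOf(v.keys[i]) !== -1),
  );
  // Sort variants: last selector first, so earlier selectors take priority.
  let sortable = vars.map((v) => ({ score: -1, variant: v }));
  for (let i = pref.length - 1; i >= 0; i--) {
    const matches = pref[i];
    for (const tuple of sortable) {
      const key = tuple.variant.keys[i];
      tuple.score = key === "*" ? matches.length : matches.indexOf(key);
    }
    sortable = stableSort(sortable);
  }
  if (sortable.length === 0) throw new Error("no matching variant");
  return sortable[0].variant.pattern;
}

// Assumed :string behaviour: exact string comparison.
const stringMatcher = (value: string): Matcher => (keys) =>
  keys.indexOf(value) !== -1 ? [value] : [];
// Assumed :number behaviour for the value 1 in English: exact key, then "one".
const countIsOne: Matcher = (keys) => ["1", "one"].filter((k) => keys.indexOf(k) !== -1);

const example1 = selectPattern([stringMatcher("foo"), stringMatcher("bar")], [
  { keys: ["bar", "bar"], pattern: "All bar" },
  { keys: ["foo", "foo"], pattern: "All foo" },
  { keys: ["*", "*"], pattern: "Otherwise" },
]);
const example2 = selectPattern([stringMatcher("foo"), stringMatcher("bar")], [
  { keys: ["*", "bar"], pattern: "Any and bar" },
  { keys: ["foo", "*"], pattern: "Foo and any" },
  { keys: ["foo", "bar"], pattern: "Foo and bar" },
  { keys: ["*", "*"], pattern: "Otherwise" },
]);
const example3 = selectPattern([countIsOne], [
  { keys: ["one"], pattern: "Category match" },
  { keys: ["1"], pattern: "Exact match" },
  { keys: ["*"], pattern: "Other match" },
]);
```

Under these assumptions the three calls select Otherwise, Foo and bar, and Exact match respectively, mirroring the worked examples. The sketch sorts with an explicit index tie-break so that it does not depend on the stability of the host environment's sort.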
After pattern selection, each text and placeholder part of the selected pattern is resolved and formatted.
Resolved values cannot always be formatted by a given implementation. When such an error occurs during formatting, an implementation SHOULD emit a Formatting Error and produce a fallback value for the placeholder that produced the error. A formatting function MAY substitute a value to use instead of a fallback value.
Implementations MAY represent the result of formatting using the most appropriate data type or structure. Some examples of these include:
Implementations SHOULD provide formatting result types that match user needs, including situations that require further processing of formatted messages. Implementations SHOULD encourage users to consider a formatted localised string as an opaque data structure, suitable only for presentation.
When formatting to a string, the default representation of all markup MUST be an empty string. Implementations MAY offer functionality for customizing this, such as by emitting XML-ish tags for each markup.
Attributes are reserved for future standardization. Other than checking for valid syntax, they SHOULD NOT affect the processing or output of a message.
This section is non-normative.
An implementation might choose to return an interstitial object so that the caller can "decorate" portions of the formatted value. In ICU4J, the NumberFormatter class returns a FormattedNumber object, so a pattern such as This is my number {42 :number} might return the character sequence This is my number followed by a FormattedNumber object representing the value 42 in the current locale.
A formatter in a web browser could format a message as a DOM fragment rather than as a representation of its HTML source.
If the resolved pattern includes any fallback values and the formatting result is a concatenated string or a sequence of strings, the string representation of each fallback value MUST be the concatenation of a U+007B LEFT CURLY BRACKET {, the fallback value as a string, and a U+007D RIGHT CURLY BRACKET }.
For example, a message with a Syntax Error and no fallback string defined in the formatting context would format to a string as {�}.
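The string-formatting rules above (markup formats to an empty string by default, and fallback values are wrapped in curly brackets) might be sketched as follows. This is a non-normative illustration: the Part shapes and the partsToString function are hypothetical, since this specification does not define a formatted-parts interface.

```typescript
// Hypothetical formatted-part shapes, assumed for this sketch only.
type Part =
  | { type: "text"; value: string }
  | { type: "markup"; kind: "open" | "standalone" | "close"; name: string }
  | { type: "fallback"; source: string };

function partsToString(parts: Part[]): string {
  return parts
    .map((part) => {
      switch (part.type) {
        case "text":
          return part.value;
        case "markup":
          return ""; // default string representation of markup is empty
        case "fallback":
          return "{" + part.source + "}"; // U+007B, fallback value, U+007D
      }
    })
    .join("");
}

// A message with a Syntax Error and no fallback string in the context.
const brokenMessage = partsToString([{ type: "fallback", source: "\uFFFD" }]);

// Markup disappears; an unresolved placeholder keeps its braces.
const withMarkup = partsToString([
  { type: "markup", kind: "open", name: "b" },
  { type: "text", value: "Hello " },
  { type: "fallback", source: "$user" },
  { type: "markup", kind: "close", name: "b" },
]);
```

An implementation that lets callers customize markup output could accept a callback in place of the hard-coded empty string.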
Messages contain text. Any text can be bidirectional text. That is, the text can consist of a mixture of left-to-right and right-to-left spans of text. The display of bidirectional text is defined by the Unicode Bidirectional Algorithm [UAX9].
The directionality of the message as a whole is provided by the formatting context.
When a message is formatted, placeholders are replaced with their formatted representation. Applying the Unicode Bidirectional Algorithm to the text of a formatted message (including its formatted parts) can result in unexpected or undesirable spillover effects. Applying bidi isolation to each affected formatted value helps avoid this spillover in a formatted message.
Note that both the message and, separately, each placeholder need to have direction metadata for this to work. If an implementation supports formatting to something other than a string (such as a sequence of parts), the directionality of each formatted placeholder needs to be available to the caller.
If a formatted expression itself contains spans with differing directionality, its formatter SHOULD perform any necessary processing, such as inserting controls or isolating such parts to ensure that the formatted value displays correctly in a plain text context.
For example, an implementation could provide a :currency formatting function which inserts strongly directional characters, such as U+200F RIGHT-TO-LEFT MARK (RLM), U+200E LEFT-TO-RIGHT MARK (LRM), or U+061C ARABIC LETTER MARK (ALM), to coerce proper display of the sign and currency symbol next to a formatted number. An example of this is formatting the value -1234.56 as the currency AED in the ar-AE locale. The formatted value appears like this:
-1,234.56 د.إ.
The code point sequence for this string, as produced by the ICU4J NumberFormat function, includes U+200F U+200E at the start and U+200F at the end of the string. Without these characters, the sign and currency symbol in the same string could be displayed in the wrong position relative to the number.
A bidirectional isolation strategy is functionality in the formatter's processing of a message that produces bidirectional output text that is ready for display.
The Default Bidi Strategy is a bidirectional isolation strategy that uses isolating Unicode control characters around each placeholder's formatted value. It is primarily intended for use in plain-text strings, where markup or other mechanisms are not available. Implementations MUST provide the Default Bidi Strategy as one of the bidirectional isolation strategies.
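As a non-normative sketch, the isolation that this strategy applies to a single formatted placeholder could look like the function below; the normative steps are given later in this section. The Direction type and the isolate function are hypothetical names, and the sketch assumes that the directionality of both the message and the formatted value has already been determined.

```typescript
type Direction = "LTR" | "RTL" | "unknown";

const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// fmt: formatted placeholder value; dir: its directionality;
// msgdir: directionality of the whole message.
function isolate(fmt: string, dir: Direction, msgdir: Direction): string {
  if (dir === "LTR") {
    // An LTR value inside an LTR message needs no isolation.
    return msgdir === "LTR" ? fmt : LRI + fmt + PDI;
  }
  if (dir === "RTL") {
    return RLI + fmt + PDI;
  }
  // Unknown direction: let the first strong character decide.
  return FSI + fmt + PDI;
}

const plain = isolate("42", "LTR", "LTR");
const isolatedLtr = isolate("42", "LTR", "RTL");
const isolatedUnknown = isolate("user input", "unknown", "LTR");
```

Only the combination of a left-to-right value in a left-to-right message is left unwrapped; every other combination is wrapped in an isolate and terminated with U+2069.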
Implementations MAY provide other bidirectional isolation strategies.
Implementations MAY supply a bidirectional isolation strategy that performs no processing.
The Default Bidi Strategy is defined as follows:
1. Let msgdir be the directionality of the whole message, one of « 'LTR', 'RTL', 'unknown' ». These correspond to the message having left-to-right directionality, right-to-left directionality, and to the message's directionality not being known.
2. For each expression exp in pattern:
   1. Let fmt be the formatted string representation of the resolved value of exp.
   2. Let dir be the directionality of fmt, one of « 'LTR', 'RTL', 'unknown' », with the same meanings as for msgdir.
   3. If dir is 'LTR':
      1. If msgdir is 'LTR', in the formatted output, let fmt be itself.
      2. Else, in the formatted output, prefix fmt with U+2066 LEFT-TO-RIGHT ISOLATE and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
   4. Else, if dir is 'RTL':
      1. In the formatted output, prefix fmt with U+2067 RIGHT-TO-LEFT ISOLATE and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
   5. Else:
      1. In the formatted output, prefix fmt with U+2068 FIRST STRONG ISOLATE and postfix it with U+2069 POP DIRECTIONAL ISOLATE.
This section defines a data model representation of MessageFormat 2 messages.
Implementations are not required to use this data model for their internal representation of messages. Neither are they required to provide an interface that accepts or produces representations of this data model.
The major reason this specification provides a data model is to allow interchange of the logical representation of a message between different implementations. This includes mapping legacy formatting syntaxes (such as MessageFormat 1) to a MessageFormat 2 implementation. Another use would be in converting to or from translation formats without the need to continually parse and serialize all or part of a message.
Implementations that expose APIs supporting the production, consumption, or transformation of a message as a data structure are encouraged to use this data model.
This data model provides these capabilities:
This data model might also be used to:
To ensure compatibility across all platforms, this interchange data model is defined here using TypeScript notation. Two equivalent definitions of the data model are also provided:
- common/dtd/messageFormat/message.json is a JSON Schema definition, for use with message data encoded as JSON or compatible formats, such as YAML.
- common/dtd/messageFormat/message.dtd is a document type definition (DTD), for use with message data encoded as XML.
Note that while the data model description below is the canonical one, the JSON and DTD definitions are intended for interchange between systems and processors. To that end, they relax some aspects of the data model, such as allowing declarations, options, and attributes to be optional rather than required properties.
Note
Users relying on XML representations of messages should note that XML 1.0 does not allow for the representation of all C0 control characters (U+0000-U+001F). Except for U+0000 NULL, these characters are allowed in MessageFormat 2 messages, so systems and users relying on this XML representation for interchange might need to supply an alternate escape mechanism to support messages that contain these characters.
Important
The data model uses the field name name to denote various interface identifiers. In the MessageFormat 2 syntax, the source for these name fields sometimes uses the production identifier. This happens when the named item, such as a function, supports namespacing.
In the Tech Preview, feedback is desired on whether to separate the namespace from the name and represent both separately or, as here, to use a single opaque name field.
A SelectMessage corresponds to a syntax message that includes selectors. A message without selectors and with a single pattern is represented by a PatternMessage.
In the syntax, a PatternMessage may be represented either as a simple message or as a complex message, depending on whether it has declarations and if its pattern is allowed in a simple message.
type Message = PatternMessage | SelectMessage;
interface PatternMessage {
  type: "message";
  declarations: Declaration[];
  pattern: Pattern;
}
interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: Expression[];
  variants: Variant[];
}
Each message declaration is represented by a Declaration, which connects the name of a variable with its expression value. The name does not include the initial $ of the variable.
The name of an InputDeclaration MUST be the same as the name in the VariableRef of its VariableExpression value.
An UnsupportedStatement represents a statement not supported by the implementation. Its keyword is a non-empty string name (i.e. not including the initial .). If not empty, the body is the "raw" value (i.e. escape sequences are not processed) starting after the keyword and up to the first expression, not including leading or trailing whitespace. The non-empty expressions correspond to the trailing expressions of the reserved statement.
Note
Be aware that future versions of this specification might assign meaning to reserved statement values. This would result in new interfaces being added to this data model.
type Declaration = InputDeclaration | LocalDeclaration | UnsupportedStatement;
interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}
interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}
interface UnsupportedStatement {
  type: "unsupported-statement";
  keyword: string;
  body?: string;
  expressions: Expression[];
}
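The naming requirement on InputDeclaration can be checked mechanically. The following non-normative sketch uses trimmed copies of the interfaces (annotation and attributes are omitted for brevity) and a hypothetical helper named hasConsistentName.

```typescript
// Trimmed copies of the data model interfaces, for this sketch only.
interface VariableRef {
  type: "variable";
  name: string;
}
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
}
interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

// True when the declared name equals the name of the referenced variable.
function hasConsistentName(decl: InputDeclaration): boolean {
  return decl.name === decl.value.arg.name;
}

// Shape of `.input {$count :number}` with the annotation left out.
// Names never include the "$" sigil.
const consistent: InputDeclaration = {
  type: "input",
  name: "count",
  value: { type: "expression", arg: { type: "variable", name: "count" } },
};

// Not expressible in the syntax, but possible in hand-built data.
const inconsistent: InputDeclaration = {
  type: "input",
  name: "total",
  value: { type: "expression", arg: { type: "variable", name: "count" } },
};
```

A check like this is mainly useful for data produced by tools other than a parser, since the syntax itself cannot express a mismatched .input declaration.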
In a SelectMessage, the keys and value of each variant are represented as an array of Variant. For the CatchallKey, a string value may be provided to retain an identifier. This is always '*' in MessageFormat 2 syntax, but may vary in other formats.
interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: Pattern;
}
interface CatchallKey {
  type: "*";
  value?: string;
}
Each Pattern contains a linear sequence of text and placeholders corresponding to potential output of a message. Each element of the Pattern MUST either be a non-empty string, an Expression, or a Markup object. String values represent literal text. String values include all processing of the underlying text values, including escape sequence processing. Expression wraps each of the potential expression shapes. Markup wraps each of the potential markup shapes.
Implementations MUST NOT rely on the set of Expression and Markup interfaces defined in this document being exhaustive. Future versions of this specification might define additional expressions or markup.
type Pattern = Array<string | Expression | Markup>;
type Expression =
  | LiteralExpression
  | VariableExpression
  | FunctionExpression
  | UnsupportedExpression;
interface LiteralExpression {
  type: "expression";
  arg: Literal;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}
interface FunctionExpression {
  type: "expression";
  arg?: never;
  annotation: FunctionAnnotation;
  attributes: Attribute[];
}
interface UnsupportedExpression {
  type: "expression";
  arg?: never;
  annotation: UnsupportedAnnotation;
  attributes: Attribute[];
}
interface Attribute {
  name: string;
  value?: Literal | VariableRef;
}
The Literal and VariableRef correspond to the literal and variable syntax rules. When they are used as the arg of an Expression, they represent expression values with no annotation.
Literal represents all literal values, both quoted and unquoted. The presence or absence of quotes is not preserved by the data model. The value of Literal is the "cooked" value (i.e. escape sequences are processed).
In a VariableRef, the name does not include the initial $ of the variable.
interface Literal {
  type: "literal";
  value: string;
}
interface VariableRef {
  type: "variable";
  name: string;
}
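Because quoting is not preserved and the value is stored cooked, a tool that writes a Literal back to syntax has to choose its own quoting. The non-normative sketch below takes the conservative route of always emitting a quoted literal, escaping the backslash and the vertical line; serializeLiteral is a hypothetical helper, not an interface defined by this specification.

```typescript
interface Literal {
  type: "literal";
  value: string;
}

// Always quote: wrap the cooked value in | ... | and escape "\" and "|".
function serializeLiteral(lit: Literal): string {
  const escaped = lit.value.replace(/[\\|]/g, (ch) => "\\" + ch);
  return "|" + escaped + "|";
}

const simple = serializeLiteral({ type: "literal", value: "42" });
const tricky = serializeLiteral({ type: "literal", value: "a|b\\c" });
```

Always quoting produces output such as |42| where the original source may have used the unquoted form 42; both forms yield the same Literal in the data model.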
A FunctionAnnotation represents a function annotation. The name does not include the : starting sigil.
Each option is represented by an Option.
interface FunctionAnnotation {
  type: "function";
  name: string;
  options: Option[];
}
interface Option {
  name: string;
  value: Literal | VariableRef;
}
An UnsupportedAnnotation represents a private-use annotation not supported by the implementation or a reserved annotation. The source is the "raw" value (i.e. escape sequences are not processed), including the starting sigil.
When parsing the syntax of a message that includes a private-use annotation supported by the implementation, the implementation SHOULD represent it in the data model using an interface appropriate for the semantics and meaning that the implementation attaches to that annotation.
interface UnsupportedAnnotation {
  type: "unsupported-annotation";
  source: string;
}
A Markup object has a kind of either "open", "standalone", or "close", each corresponding to open, standalone, and close markup. The name in these does not include the starting sigils # and / or the ending sigil /. The optional options for markup use the same Option as FunctionAnnotation.
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Option[];
  attributes: Attribute[];
}
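To show how these interfaces fit together, the non-normative sketch below builds the data model for the simple message Hello, {#b}{$user}{/b}! and serializes its pattern back to syntax. The interfaces are trimmed copies (options, attributes, and declarations are kept only as untyped arrays), and patternToSyntax is a hypothetical helper that ignores options and attributes.

```typescript
// Trimmed copies of the data model interfaces, for this sketch only.
interface VariableRef {
  type: "variable";
  name: string;
}
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  attributes: unknown[];
}
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: unknown[];
  attributes: unknown[];
}
type Pattern = Array<string | VariableExpression | Markup>;
interface PatternMessage {
  type: "message";
  declarations: unknown[];
  pattern: Pattern;
}

// Data model for the simple message: Hello, {#b}{$user}{/b}!
const greeting: PatternMessage = {
  type: "message",
  declarations: [],
  pattern: [
    "Hello, ",
    { type: "markup", kind: "open", name: "b", options: [], attributes: [] },
    { type: "expression", arg: { type: "variable", name: "user" }, attributes: [] },
    { type: "markup", kind: "close", name: "b", options: [], attributes: [] },
    "!",
  ],
};

// Hypothetical serializer back to syntax; sigils are re-added here
// because the data model stores names without them.
function patternToSyntax(pattern: Pattern): string {
  return pattern
    .map((el) => {
      if (typeof el === "string") {
        return el.replace(/[\\{}]/g, (ch) => "\\" + ch); // escape \ { }
      }
      if (el.type === "markup") {
        const open = el.kind === "close" ? "{/" : "{#";
        const close = el.kind === "standalone" ? "/}" : "}";
        return open + el.name + close;
      }
      return "{$" + el.arg.name + "}";
    })
    .join("");
}

const greetingSource = patternToSyntax(greeting.pattern);
```

Note that text elements are stored unescaped in the data model, so the serializer has to re-apply escaping when producing syntax.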
Implementations MAY extend this data model with additional interfaces, as well as adding new fields to existing interfaces. When encountering an unfamiliar field, an implementation MUST ignore it. For example, an implementation could include a span field on all interfaces encoding the corresponding start and end positions in its source syntax.
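A non-normative sketch of such an extension follows. The Span shape, the LiteralWithSpan interface, and the readLiteral function are hypothetical; the specification says only that a span field could be added, not what it would contain.

```typescript
interface Literal {
  type: "literal";
  value: string;
}

// Hypothetical extension: start and end offsets in the source syntax.
interface Span {
  start: number;
  end: number;
}
interface LiteralWithSpan extends Literal {
  span: Span;
}

// A consumer written against the base model reads only the fields it knows,
// so extra fields such as span are ignored rather than rejected.
function readLiteral(data: Literal): string {
  return data.value;
}

// Data as it might arrive from an implementation that records spans.
const parsedLiteral: LiteralWithSpan = JSON.parse(
  '{"type":"literal","value":"42","span":{"start":1,"end":3}}',
);
const literalText = readLiteral(parsedLiteral);
```

Because TypeScript interfaces are structurally typed, the extended object can be passed wherever the base interface is expected, which matches the requirement to ignore unfamiliar fields.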
In general, implementations MUST NOT extend the sets of values for any defined field or type when representing a valid message. However, when using this data model to represent an invalid message, an implementation MAY do so. This is intended to allow for the representation of "junk" or invalid content within messages.
MessageFormat 2.0 patterns are meant to allow a message to include any string value which users might normally wish to use in their environment. Programming languages and other environments vary in what characters are permitted to appear in a valid string. In many cases, certain types of characters, such as invisible control characters, require escaping by these host formats. In other cases, strings are not permitted to contain certain characters at all. Since messages are subject to the restrictions and limitations of their host environments, their serializations and resource formats, that might be sufficient to prevent most problems. However, MessageFormat itself does not supply such a restriction.
MessageFormat messages permit nearly all Unicode code points, with the exception of surrogates, to appear in literals, including the text portions of a pattern. This means that it can be possible for a message to contain invisible characters (such as bidirectional controls, ASCII control characters in the range U+0000 to U+001F, or characters that might be interpreted as escapes or syntax in the host format) that abnormally affect the display of the message when viewed as source code, or in resource formats or translation tools, but do not generate errors from MessageFormat parsers or processing APIs.
Bidirectional text containing right-to-left characters (such as used for Arabic or Hebrew) also poses a potential source of confusion for users. Since MessageFormat 2.0's syntax makes use of keywords and symbols that are left-to-right or consist of neutral characters (including characters subject to mirroring under the Unicode Bidirectional Algorithm), it is possible to create messages that, when displayed in source code, or in resource formats or translation tools, have a misleading appearance or are difficult to parse visually.
For more information, see [UTS#55] Unicode Source Code Handling.
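Since MessageFormat itself does not restrict these characters, tooling around messages might flag them for review. The non-normative sketch below lists bidirectional controls and C0 control characters (other than tab, line feed, and carriage return) found in a message source string. The character set chosen and the findInvisibleCharacters function are assumptions made for illustration; they are not a complete implementation of the guidance in [UTS#55].

```typescript
// Bidi controls (U+061C, U+200E, U+200F, U+202A-U+202E, U+2066-U+2069)
// and C0 controls other than tab, line feed, and carriage return.
const SUSPICIOUS =
  /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069\u0000-\u0008\u000B\u000C\u000E-\u001F]/g;

interface Finding {
  index: number; // UTF-16 code unit offset in the source string
  codePoint: string; // formatted as U+XXXX
}

function findInvisibleCharacters(source: string): Finding[] {
  const found: Finding[] = [];
  SUSPICIOUS.lastIndex = 0; // the regex is global, so reset its state
  let m: RegExpExecArray | null = SUSPICIOUS.exec(source);
  while (m !== null) {
    const hex = m[0].charCodeAt(0).toString(16).toUpperCase();
    found.push({ index: m.index, codePoint: "U+" + ("0000" + hex).slice(-4) });
    m = SUSPICIOUS.exec(source);
  }
  return found;
}

// A message whose placeholder is wrapped in RLI ... PDI.
const findings = findInvisibleCharacters("Hello \u2067{$name}\u2069");
```

A tool could report such findings as warnings rather than errors, because these characters can be legitimate in right-to-left messages.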
MessageFormat 2.0 implementations might allow end-users to install selectors, functions, or markup from third-party sources. Such functionality can be a vector for various exploits, including buffer overflow, code injection, user tracking, fingerprinting, and other types of bad behavior. Any installed code needs to be appropriately sandboxed. In addition, end-users need to be aware of the risks involved.
Special thanks to the following people for their contributions to making MessageFormat v2. The following people contributed to our GitHub repository and are listed in order by contribution size:
Addison Phillips, Eemeli Aro, Romulo Cintra, Stanisław Małolepszy, Elango Cheran, Richard Gibson, Tim Chevalier, Mihai Niță, Shane F. Carr, Mark Davis, Steven R. Loomis, Caleb Maclennan, David Filip, Daniel Minor, Christopher Dieringer, George Rhoten, Ujjwal Sharma, Daniel Ehrenberg, Markus Scherer, Zibi Braniecki, Matt Radbourne, Bruno Haible, and Rafael Xavier de Souza.
Addison Phillips was chair of the working group from January 2023. Prior to 2023, the group was governed by a chair group, consisting of Romulo Cintra, Elango Cheran, Mihai Niță, David Filip, Nicolas Bouvrette, Stanisław Małolepszy, Rafael Xavier de Souza, Addison Phillips, and Daniel Minor. Romulo Cintra chaired the chair group.
Copyright © 2001–2024 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode Terms of Use apply.
Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.