RFC 3987 - Internationalized Resource Identifiers (IRIs)

URL: https://tools.ietf.org/html/rfc3987

Network Working Group                                          M. Duerst
Request for Comments: 3987                                           W3C
Category: Standards Track                                    M. Suignard
                                                   Microsoft Corporation
                                                            January 2005

             Internationalized Resource Identifiers (IRIs)

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

This document defines a new protocol element, the Internationalized Resource Identifier (IRI), as a complement to the Uniform Resource Identifier (URI). An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to URIs is defined, which means that IRIs can be used instead of URIs, where appropriate, to identify resources.

The approach of defining a new protocol element was chosen instead of extending or changing the definition of URIs. This was done in order to allow a clear distinction and to avoid incompatibilities with existing software. Guidelines are provided for the use and deployment of IRIs in various protocols, formats, and software components that currently deal with URIs.

Table of Contents

   1.  Introduction
       1.1.  Overview and Motivation
       1.2.  Applicability
       1.3.  Definitions
       1.4.  Notation
   2.  IRI Syntax
       2.1.  Summary of IRI Syntax
       2.2.  ABNF for IRI References and IRIs
   3.  Relationship between IRIs and URIs
       3.1.  Mapping of IRIs to URIs
       3.2.  Converting URIs to IRIs
             3.2.1.  Examples
   4.  Bidirectional IRIs for Right-to-Left Languages
       4.1.  Logical Storage and Visual Presentation
       4.2.  Bidi IRI Structure
       4.3.  Input of Bidi IRIs
       4.4.  Examples
   5.  Normalization and Comparison
       5.1.  Equivalence
       5.2.  Preparation for Comparison
       5.3.  Comparison Ladder
             5.3.1.  Simple String Comparison
             5.3.2.  Syntax-Based Normalization
             5.3.3.  Scheme-Based Normalization
             5.3.4.  Protocol-Based Normalization
   6.  Use of IRIs
       6.1.  Limitations on UCS Characters Allowed in IRIs
       6.2.  Software Interfaces and Protocols
       6.3.  Format of URIs and IRIs in Documents and Protocols
       6.4.  Use of UTF-8 for Encoding Original Characters
       6.5.  Relative IRI References
   7.  URI/IRI Processing Guidelines (informative)
       7.1.  URI/IRI Software Interfaces
       7.2.  URI/IRI Entry
       7.3.  URI/IRI Transfer between Applications
       7.4.  URI/IRI Generation
       7.5.  URI/IRI Selection
       7.6.  Display of URIs/IRIs
       7.7.  Interpretation of URIs and IRIs
       7.8.  Upgrading Strategy
   8.  Security Considerations
   9.  Acknowledgements
   10. References
       10.1. Normative References
       10.2. Informative References
   A.  Design Alternatives
       A.1.  New Scheme(s)
       A.2.  Character Encodings Other Than UTF-8
       A.3.  New Encoding Convention
       A.4.  Indicating Character Encodings in the URI/IRI
   Authors' Addresses
   Full Copyright Statement

1. Introduction

1.1. Overview and Motivation

A Uniform Resource Identifier (URI) is defined in [RFC3986] as a sequence of characters chosen from a limited subset of the repertoire of US-ASCII [ASCII] characters.

The characters in URIs are frequently used for representing words of natural languages. This usage has many advantages: Such URIs are easier to memorize, easier to interpret, easier to transcribe, easier to create, and easier to guess. For most languages other than English, however, the natural script uses characters other than A - Z. For many people, handling Latin characters is as difficult as handling the characters of other scripts is for those who use only the Latin alphabet. Many languages with non-Latin scripts are transcribed with Latin letters. These transcriptions are now often used in URIs, but they introduce additional ambiguities.

The infrastructure for the appropriate handling of characters from local scripts is now widely deployed in local versions of operating system and application software. Software that can handle a wide variety of scripts and languages at the same time is increasingly common. Also, increasing numbers of protocols and formats can carry a wide range of characters.

This document defines a new protocol element called Internationalized Resource Identifier (IRI) by extending the syntax of URIs to a much wider repertoire of characters. It also defines "internationalized" versions corresponding to other constructs from [RFC3986], such as URI references. The syntax of IRIs is defined in section 2, and the relationship between IRIs and URIs in section 3.

Using characters outside of A - Z in IRIs brings some difficulties. Section 4 discusses the special case of bidirectional IRIs, section 5 various forms of equivalence between IRIs, and section 6 the use of IRIs in different situations. Section 7 gives additional informative guidelines, and section 8 security considerations.

1.2. Applicability

IRIs are designed to be compatible with recommendations for new URI schemes [RFC2718]. The compatibility is provided by specifying a well-defined and deterministic mapping from the IRI character sequence to the functionally equivalent URI character sequence. Practical use of IRIs (or IRI references) in place of URIs (or URI references) depends on the following conditions being met:

a. A protocol or format element should be explicitly designated to be able to carry IRIs. The intent is not to introduce IRIs into contexts that are not defined to accept them. For example, XML schema [XMLSchema] has an explicit type "anyURI" that includes IRIs and IRI references. Therefore, IRIs and IRI references can be in attributes and elements of type "anyURI". On the other hand, in the HTTP protocol [RFC2616], the Request URI is defined as a URI, which means that direct use of IRIs is not allowed in HTTP requests.

b. The protocol or format carrying the IRIs should have a mechanism to represent the wide range of characters used in IRIs, either natively or by some protocol- or format-specific escaping mechanism (for example, numeric character references in [XML1]).

c. The URI corresponding to the IRI in question has to encode original characters into octets using UTF-8. For new URI schemes, this is recommended in [RFC2718]. It can apply to a whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384], or the URN syntax [RFC2141]). It can apply to a specific part of a URI, such as the fragment identifier (e.g., [XPointer]). It can apply to a specific URI or part(s) thereof. For details, please see section 6.4.

1.3. Definitions

The following definitions are used in this document; they follow the terms in [RFC2130], [RFC2277], and [ISO10646].

character: A member of a set of elements used for the organization, control, or representation of data. For example, "LATIN CAPITAL LETTER A" names a character.

octet: An ordered sequence of eight bits considered as a unit.

character repertoire: A set of characters (in the mathematical sense).

sequence of characters: A sequence of characters (one after another).

sequence of octets: A sequence of octets (one after another).

character encoding: A method of representing a sequence of characters as a sequence of octets (maybe with variants). Also, a method of (unambiguously) converting a sequence of octets into a sequence of characters.

charset: The name of a parameter or attribute used to identify a character encoding.

UCS: Universal Character Set. The coded character set defined by ISO/IEC 10646 [ISO10646] and the Unicode Standard [UNIV4].

IRI reference: Denotes the common usage of an Internationalized Resource Identifier. An IRI reference may be absolute or relative. However, the "IRI" that results from such a reference only includes absolute IRIs; any relative IRI references are resolved to their absolute form. Note that in [RFC2396] URIs did not include fragment identifiers, but in [RFC3986] fragment identifiers are part of URIs.

running text: Human text (paragraphs, sentences, phrases) with syntax according to orthographic conventions of a natural language, as opposed to syntax defined for ease of processing by machines (e.g., markup, programming languages).

protocol element: Any portion of a message that affects processing of that message by the protocol in question.

presentation element: A presentation form corresponding to a protocol element; for example, using a wider range of characters.

create (a URI or IRI): With respect to URIs and IRIs, the term is used for the initial creation. This may be the initial creation of a resource with a certain identifier, or the initial exposition of a resource under a particular identifier.

generate (a URI or IRI): With respect to URIs and IRIs, the term is used when the IRI is generated by derivation from other information.

1.4. Notation

RFCs and Internet Drafts currently do not allow any characters outside the US-ASCII repertoire. Therefore, this document uses various special notations to denote such characters in examples.

In text, characters outside US-ASCII are sometimes referenced by using a prefix of 'U+', followed by four to six hexadecimal digits.

To represent characters outside US-ASCII in examples, this document uses two notations: 'XML Notation' and 'Bidi Notation'.

XML Notation uses a leading '&#x', a trailing ';', and the hexadecimal number of the character in the UCS in between. For example, &#x44F; stands for CYRILLIC SMALL LETTER YA. In this notation, an actual '&' is denoted by '&amp;'.
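
The XML Notation described above can be illustrated with a short Python sketch. The function names to_xml_notation and from_xml_notation are hypothetical helpers introduced only for this illustration; they are not part of this specification or of any library.

```python
import re

def to_xml_notation(text):
    """Write non-ASCII characters as &#xHEX; and a literal '&' as '&amp;'."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ord(ch) > 0x7F:
            # Hexadecimal number of the character in the UCS.
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)

def from_xml_notation(text):
    """Reverse to_xml_notation: expand &#xHEX; references and '&amp;'."""
    def expand(match):
        if match.group(0) == "&amp;":
            return "&"
        return chr(int(match.group(1), 16))
    return re.sub(r"&amp;|&#x([0-9A-Fa-f]+);", expand, text)
```

Under these assumptions, to_xml_notation maps U+044F to the text "&#x44F;", and from_xml_notation restores the original character sequence.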

Bidi Notation is used for bidirectional examples: Lowercase letters stand for Latin letters or other letters that are written left to right, whereas uppercase letters represent Arabic or Hebrew letters that are written right to left.

To denote actual octets in examples (as opposed to percent-encoded octets), the two hex digits denoting the octet are enclosed in "<" and ">". For example, the octet often denoted as 0xc9 is denoted here as <c9>.
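
As a small illustration of this octet notation, the following Python sketch renders a byte string in the "<hh>" form. The helper name octet_notation is an assumption made for this example.

```python
def octet_notation(data):
    """Render each octet as two lowercase hex digits enclosed in '<' and '>'."""
    return "".join(f"<{octet:02x}>" for octet in data)

# U+00C9 encoded in ISO-8859-1 is the single octet 0xc9.
latin1_form = octet_notation("\u00c9".encode("iso-8859-1"))
# The same character encoded in UTF-8 takes two octets.
utf8_form = octet_notation("\u00c9".encode("utf-8"))
```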

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in [RFC2119].

2. IRI Syntax

This section defines the syntax of Internationalized Resource Identifiers (IRIs).

As with URIs, an IRI is defined as a sequence of characters, not as a sequence of octets. This definition accommodates the fact that IRIs may be written on paper or read over the radio as well as stored or transmitted digitally. The same IRI may be represented as different sequences of octets in different protocols or documents if these protocols or documents use different character encodings (and/or transfer encodings). Using the same character encoding as the containing protocol or document ensures that the characters in the IRI can be handled (e.g., searched, converted, displayed) in the same way as the rest of the protocol or document.
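
The distinction between an IRI as a character sequence and its octet representations can be made concrete with a minimal Python sketch. The IRI used here is an arbitrary example value, not one taken from this specification.

```python
# One IRI, as a sequence of characters (U+00E9 is LATIN SMALL LETTER E WITH ACUTE).
iri = "http://example.org/caf\u00e9"

# Two documents using different character encodings carry different octets.
as_utf8 = iri.encode("utf-8")
as_latin1 = iri.encode("iso-8859-1")

# Decoding each octet sequence with its own encoding yields the same characters.
same_iri = as_utf8.decode("utf-8") == as_latin1.decode("iso-8859-1") == iri
```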

2.1. Summary of IRI Syntax

IRIs are defined similarly to URIs in [RFC3986], but the class of unreserved characters is extended by adding the characters of the UCS (Universal Character Set, [ISO10646]) beyond U+007F, subject to the limitations given in the syntax rules below and in section 6.1.

Otherwise, the syntax and use of components and reserved characters is the same as that in [RFC3986]. All the operations defined in [RFC3986], such as the resolution of relative references, can be applied to IRIs by IRI-processing software in exactly the same way as they are for URIs by URI-processing software.
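
As a sketch of relative reference resolution applied to an IRI, the example below uses urljoin from Python's standard urllib.parse module. That function is written for URIs; it is used here only on the assumption that it passes non-ASCII path characters through unchanged, which is sufficient for this illustration but does not make it an IRI-aware API. The base IRI and the reference are made-up values.

```python
from urllib.parse import urljoin

# Base IRI and relative IRI reference, both containing non-ASCII characters.
base = "http://example.org/r\u00e9sum\u00e9/index.html"
reference = "../caf\u00e9/menu.html"

# Resolution proceeds as for URIs: merge the paths, then remove dot segments.
resolved = urljoin(base, reference)
```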

Characters outside the US-ASCII repertoire are not reserved and therefore MUST NOT be used for syntactical purposes, such as to delimit components in newly defined schemes. For example, U+00A2, CENT SIGN, is not allowed as a delimiter in IRIs, because it is in the 'iunreserved' category. This is similar to the fact that it is not possible to use '-' as a delimiter in URIs, because it is in the 'unreserved' category.

2.2. ABNF for IRI References and IRIs

Although it might be possible to define IRI references and IRIs merely by their transformation to URI references and URIs, they can also be accepted and processed directly. Therefore, an ABNF definition for IRI references (which are the most general concept and the start of the grammar) and IRIs is given here. The syntax of this ABNF is described in [RFC2234]. Character numbers are taken from the UCS, without implying any actual binary encoding. Terminals in the ABNF are characters, not bytes.

The following grammar closely follows the URI grammar in [RFC3986], except that the range of unreserved characters is expanded to include UCS characters, with the restriction that private UCS characters can occur only in query parts. The grammar is split into two parts: Rules that differ from [RFC3986] because of the above-mentioned expansion, and rules that are the same as those in [RFC3986]. For rules that are different than those in [RFC3986], the names of the non-terminals have been changed as follows. If the non-terminal contains 'URI', this has been changed to 'IRI'. Otherwise, an 'i' has been prefixed.

The following rules are different from those in [RFC3986]:

   IRI            = scheme ":" ihier-part [ "?" iquery ]
                         [ "#" ifragment ]

   ihier-part     = "//" iauthority ipath-abempty
                  / ipath-absolute
                  / ipath-rootless
                  / ipath-empty

   IRI-reference  = IRI / irelative-ref

   absolute-IRI   = scheme ":" ihier-part [ "?" iquery ]

   irelative-ref  = irelative-part [ "?" iquery ] [ "#" ifragment ]

   irelative-part = "//" iauthority ipath-abempty
                  / ipath-absolute
                  / ipath-noscheme
                  / ipath-empty

   iauthority     = [ iuserinfo "@" ] ihost [ ":" port ]
   iuserinfo      = *( iunreserved / pct-encoded / sub-delims / ":" )
   ihost          = IP-literal / IPv4address / ireg-name

   ireg-name      = *( iunreserved / pct-encoded / sub-delims )

   ipath          = ipath-abempty   ; begins with "/" or is empty
                  / ipath-absolute  ; begins with "/" but not "//"
                  / ipath-noscheme  ; begins with a non-colon segment
                  / ipath-rootless  ; begins with a segment
                  / ipath-empty     ; zero characters

   ipath-abempty  = *( "/" isegment )
   ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
   ipath-noscheme = isegment-nz-nc *( "/" isegment )
   ipath-rootless = isegment-nz *( "/" isegment )
   ipath-empty    = 0<ipchar>

   isegment       = *ipchar
   isegment-nz    = 1*ipchar
   isegment-nz-nc = 1*( iunreserved / pct-encoded / sub-delims
                        / "@" )
                  ; non-zero-length segment without any colon ":"

   ipchar         = iunreserved / pct-encoded / sub-delims / ":"
                  / "@"

   iquery         = *( ipchar / iprivate / "/" / "?" )

   ifragment      = *( ipchar / "/" / "?" )

   iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

   ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD

   iprivate       = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD

Some productions are ambiguous. The "first-match-wins" (a.k.a. "greedy") algorithm applies. For details, see [RFC3986].
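
The 'ucschar' and 'iprivate' productions above are plain code point ranges, so membership can be tested directly. The following Python sketch transcribes the ranges from the grammar; the names UCSCHAR_RANGES, IPRIVATE_RANGES, is_ucschar, and is_iprivate are hypothetical and chosen only for this illustration.

```python
# Code point ranges transcribed from the ucschar production.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13 each contribute the range %xP0000-PFFFD.
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
# Plane 14 starts at E1000; the block E0000-E0FFF is not included.
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))

# Code point ranges transcribed from the iprivate production.
IPRIVATE_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_ucschar(ch):
    """True if the single character ch matches the ucschar production."""
    return _in_ranges(ch, UCSCHAR_RANGES)

def is_iprivate(ch):
    """True if the single character ch matches the iprivate production."""
    return _in_ranges(ch, IPRIVATE_RANGES)
```

Note that private use characters match only 'iprivate', which the grammar admits in 'iquery' but not in paths or fragments.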

The following rules are the same as those in [RFC3986]:

   scheme         = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

   port           = *DIGIT

   IP-literal     = "[" ( IPv6address / IPvFuture  ) "]"

   IPvFuture      = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

   IPv6address    =                            6( h16 ":" ) ls32
                  /                       "::" 5( h16 ":" ) ls32
                  / [               h16 ] "::" 4( h16 ":" ) ls32
                  / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
                  / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
                  / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
                  / [ *4( h16 ":" ) h16 ] "::"              ls32
                  / [ *5( h16 ":" ) h16 ] "::"              h16
                  / [ *6( h16 ":" ) h16 ] "::"

   h16            = 1*4HEXDIG
   ls32           = ( h16 ":" h16 ) / IPv4address

   IPv4address    = dec-octet "." dec-octet "." dec-octet "." dec-octet

   dec-octet      = DIGIT                 ; 0-9
                  / %x31-39 DIGIT         ; 10-99
                  / "1" 2DIGIT            ; 100-199
                  / "2" %x30-34 DIGIT     ; 200-249
                  / "25" %x30-35          ; 250-255

   pct-encoded    = "%" HEXDIG HEXDIG

   unreserved     = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved       = gen-delims / sub-delims
   gen-delims     = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

This syntax does not support IPv6 scoped addressing zone identifiers.
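
A few of the simpler rules shared with [RFC3986] translate directly into regular expressions. The Python sketch below covers only 'scheme', 'pct-encoded', and 'IPv4address' (via 'dec-octet'); the constant names are assumptions made for this example, and the patterns are meant to be used with fullmatch so that the whole input must match the rule.

```python
import re

# dec-octet: 0-9, 10-99, 100-199, 200-249, 250-255 (no leading zeros).
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
IPV4ADDRESS = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

# pct-encoded = "%" HEXDIG HEXDIG (ABNF strings are case-insensitive)
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")

def is_ipv4address(text):
    """True if text as a whole matches the IPv4address rule."""
    return IPV4ADDRESS.fullmatch(text) is not None
```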

3. Relationship between IRIs and URIs

IRIs are meant to replace URIs in identifying resources for protocols, formats, and software components that use a UCS-based character repertoire. These protocols and components may never need to use URIs directly, especially when the resource identifier is used simply for identification purposes. However, when the resource identifier is used for resource retrieval, it is in many cases necessary to determine the associated URI, because currently most retrieval mechanisms are only defined for URIs. In this case, IRIs can serve as presentation elements for URI protocol elements. An example would be an address bar in a Web user agent. (Additional rationale is given in section 3.1.)

3.1. Mapping of IRIs to URIs

This section defines how to map an IRI to a URI. Everything in this section also applies to IRI references and URI references, as well as to components thereof (for example, fragment identifiers).

This mapping has two purposes:

Syntactical. Many URI schemes and components define additional syntactical restrictions not captured in section 2.2. Scheme-specific restrictions are applied to IRIs by converting IRIs to URIs and checking the URIs against the scheme-specific restrictions.

Interpretational. URIs identify resources in various ways. IRIs also identify resources. When the IRI is used solely for identification purposes, it is not necessary to map the IRI to a URI (see section 5). However, when an IRI is used for resource retrieval, the resource that the IRI locates is the same as the one located by the URI obtained after converting the IRI according to the procedure defined here. This means that there is no need to define resolution separately on the IRI level.

Applications MUST map IRIs to URIs by using the following two steps.

Step 1. Generate a UCS character sequence from the original IRI format. This step has the following three variants, depending on the form of the input:

a. If the IRI is written on paper, read aloud, or otherwise represented as a sequence of characters independent of any character encoding, represent the IRI as a sequence of characters from the UCS normalized according to Normalization Form C (NFC, [UTR15]).

b. If the IRI is in some digital representation (e.g., an octet stream) in some known non-Unicode character encoding, convert the IRI to a sequence of characters from the UCS normalized according to NFC.

c. If the IRI is in a Unicode-based character encoding (for example, UTF-8 or UTF-16), do not normalize (see section 5.3.2.2 for details). Apply step 2 directly to the encoded Unicode character sequence.
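
The three variants of step 1 can be sketched in Python with the standard unicodedata module. The function names below are hypothetical; the sketch assumes that the caller already knows which variant applies and, for variants b and c, which character encoding the octets use.

```python
import unicodedata

def step1_transcribed(characters):
    """Variant a: characters taken from paper or speech are normalized to NFC."""
    return unicodedata.normalize("NFC", characters)

def step1_from_legacy_octets(data, encoding):
    """Variant b: decode a known non-Unicode encoding, then normalize to NFC."""
    return unicodedata.normalize("NFC", data.decode(encoding))

def step1_from_unicode_octets(data, encoding="utf-8"):
    """Variant c: decode a Unicode-based encoding and do not normalize."""
    return data.decode(encoding)
```

The difference shows up with decomposed input: "e" followed by U+0301 becomes the single character U+00E9 under variants a and b, while variant c leaves the two characters as they are.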

Step 2. For each character in 'ucschar' or 'iprivate', apply steps 2.1 through 2.3 below.

2.1. Convert the character to a sequence of one or more octets using UTF-8 [RFC3629].

2.2. Convert each octet to %HH, where HH is the hexadecimal notation of the octet value. Note that this is identical to the percent-encoding mechanism in section 2.1 of [RFC3986]. To reduce variability, the hexadecimal notation SHOULD use uppercase letters.

2.3. Replace the original character with the resulting character sequence (i.e., a sequence of %HH triplets).

The above mapping from IRIs to URIs produces URIs fully conforming to [RFC3986]. The mapping is also an identity transformation for URIs and is idempotent; applying the mapping a second time will not change anything. Every URI is by definition an IRI.
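
Step 2 of the mapping, together with the identity and idempotence properties just stated, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input is already a UCS character sequence produced by step 1, the function name iri_to_uri is hypothetical, and the sketch neither validates the input against the ABNF of section 2.2 nor performs any scheme-specific handling of ireg-name.

```python
# Code point ranges of the ucschar and iprivate productions (section 2.2).
_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
_RANGES += [(0xE1000, 0xEFFFD)]
_RANGES += [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def _needs_encoding(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _RANGES)

def iri_to_uri(iri):
    """Percent-encode every ucschar or iprivate character of iri."""
    out = []
    for ch in iri:
        if _needs_encoding(ch):
            # Step 2.1: UTF-8 octets; step 2.2: %HH with uppercase hex digits.
            out.append("".join(f"%{octet:02X}" for octet in ch.encode("utf-8")))
        else:
            # All other characters, including existing %HH triplets, are kept.
            out.append(ch)
    # Step 2.3: the triplets replace the original characters in place.
    return "".join(out)
```

Because the output contains only characters outside 'ucschar' and 'iprivate', applying iri_to_uri to its own result changes nothing, and a string that is already a URI passes through unchanged.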

Systems accepting IRIs MAY convert the ireg-name component of an IRI as follows (before step 2 above) for schemes known to use domain names in ireg-name, if the scheme definition does not allow percent-encoding for ireg-name:

スキーム定義がIREG-nameのパーセントエンコーディングを許可しない場合、IREG-名にドメイン名を使用することが知られているスキームのために（上記ステップ2の前に）次のように虹彩IRIのIREG-名コンポーネントを変換することができる受付システム。

Replace the ireg-name part of the IRI by the part converted using the ToASCII operation specified in section 4.1 of [RFC3490] on each dot-separated label, and by using U+002E (FULL STOP) as a label separator, with the flag UseSTD3ASCIIRules set to TRUE, and with the flag AllowUnassigned set to FALSE for creating IRIs and set to TRUE otherwise.

一部によってIRIのIREG名の部品を交換フラグで、各ドットで区切られたラベルに、ラベルセパレーターとしてU + 002E（FULL STOP）を使用して、[RFC3490]のセクション4.1で指定されたもしToASCII操作を使用して変換UseSTD3ASCIIRulesはTRUEに設定され、フラグでアイリスを作成するためにFALSEに設定され、そうでない場合はTRUEに設定さAllowUnassigned。

The ToASCII operation may fail, but this would mean that the IRI cannot be resolved. This conversion SHOULD be used when the goal is to maximize interoperability with legacy URI resolvers. For example, the IRI

もしToASCII操作は失敗する可能性がありますが、これはIRIは解決できないことを意味します。目標は、レガシーURIリゾルバとの相互運用性を最大化することであるとき、この変換を使用する必要があります。例えば、IRI

"http://résumé.example.org"

may be converted to

に変換することができます。

"http://xn--rsum-bpad.example.org"

instead of

の代わりに

"http://r%C3%A9sum%C3%A9.example.org".

An IRI with a scheme that is known to use domain names in ireg-name, but where the scheme definition does not allow percent-encoding for ireg-name, meets scheme-specific restrictions if either the straightforward conversion or the conversion using the ToASCII operation on ireg-name result in an URI that meets the scheme-specific restrictions.

IREG-名にドメイン名を使用することが知られているスキームとIRIが、どこスキームの定義は、IREG-nameのパーセントエンコーディングを許可しない、単純な変換やもしToASCII操作を使用して変換のいずれかの場合のスキーム固有の制限を満たしていますスキーム固有の制限を満たしているURIでIREG名の結果に。

Such an IRI resolves to the URI obtained by converting the IRI with the ToASCII operation applied to ireg-name. Implementations do not have to perform this conversion as long as they produce the same result.

そのようなIRIは、URIに解決IRIを変換した後に得られたとIREG名にもしToASCII操作を使用します。実装は、限り、彼らは同じ結果を生成し、この変換を行う必要はありません。
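
As a rough illustration of the ireg-name conversion, Python's built-in `idna` codec performs the nameprep and ToASCII processing of [RFC3490] on each label. The helper name `ireg_name_to_ascii` is hypothetical, and the codec does not expose the UseSTD3ASCIIRules or AllowUnassigned flags, so its settings may not match those required above; treat this as a sketch only.

```python
def ireg_name_to_ascii(ireg_name):
    """Apply ToASCII to each label, using U+002E as the only separator."""
    labels = ireg_name.split(".")
    # The 'idna' codec raises UnicodeError when ToASCII fails for a label.
    return ".".join(label.encode("idna").decode("ascii") for label in labels)


print(ireg_name_to_ascii("r\u00e9sum\u00e9.example.org"))
# xn--rsum-bpad.example.org
```

A caller would apply this to the ireg-name component only, before running step 2 on the rest of the IRI.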

Note: The difference between variants b and c in step 1 (using normalization with NFC, versus not using any normalization) accounts for the fact that in many non-Unicode character encodings, some text cannot be represented directly. For example, the word "Vietnam" is natively written "Vi&#x1EC7;t Nam" in XML Notation (containing a LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW) in NFC, but a direct transcoding from the windows-1258 character encoding leads to "Vi&#xEA;&#x323;t Nam" (containing a LATIN SMALL LETTER E WITH CIRCUMFLEX followed by a COMBINING DOT BELOW). Direct transcoding of other 8-bit encodings of Vietnamese may lead to other representations.

注意：との違いは、ステップ1でbとcをバリアント（NFCで正規化を使用して、任意の正規化を使用していない対）多くの非Unicode文字エンコーディングでは、いくつかのテキストを直接表すことができないという事実を占めています。例えば、単語「ベトナムは」ネイティブ「Viの＆＃x1EC7;トンのナム」書かれているNFC中（LATIN SMALL LETTER曲折以下DOT WITH Eを含む）を、が、Windows-1258文字エンコーディングからの直接のトランスコードは「Viの＆につながります#xEA;＆＃1 x323; Tのナム」（下記COMBINING DOT続い回旋付きラテン小文字Eを含みます）。ベトナムの他の8ビットエンコーディングの直接トランスコーディングは、他の表現につながる可能性があります。
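
The difference between direct transcoding and NFC normalization can be observed with Python's standard library. The octet string below is an assumed windows-1258 encoding of the word, chosen for illustration; `cp1258` is Python's name for that encoding.

```python
import unicodedata

raw = b"Vi\xea\xf2t Nam"                    # assumed windows-1258 octets
transcoded = raw.decode("cp1258")           # direct transcoding, not normalized
normalized = unicodedata.normalize("NFC", transcoded)

print([hex(ord(c)) for c in transcoded[2:4]])  # ['0xea', '0x323']
print(hex(ord(normalized[2])))                 # 0x1ec7
```

Variant b of step 1 corresponds to using `normalized`; the two strings render identically but map to different URIs.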

Note: The uniform treatment of the whole IRI in step 2 is important to make processing independent of URI scheme. See [Gettys] for an in-depth discussion.

注：ステップ2で全体IRIの均一な処理は、URIスキームとは無関係に処理することが重要です。徹底的な議論のための[ゲティス]を参照してください。

Note: In practice, whether the general mapping (steps 1 and 2) or the ToASCII operation of [RFC3490] is used for ireg-name will not be noticed if mapping from IRI to URI and resolution is tightly integrated (e.g., carried out in the same user agent). But conversion using [RFC3490] may be able to better deal with backwards compatibility issues in case mapping and resolution are separated, as in the case of using an HTTP proxy.

注：実際には、一般的なマッピングはIRIからURIへのマッピングと解像度が緊密に統合されている場合、または[RFC3490]のもしToASCII操作がIREG-名のために使用されるで行わ、例えば（気付かれないであろう（図1および2ステップ）か否か同じユーザーエージェント）。しかし、変換[RFC3490]を使用して、HTTPプロキシを使用した場合と同様に、分離されている場合のマッピングと解像度の後方互換性の問題とのより良い対処することができるかもしれません。

Note: Internationalized Domain Names may be contained in parts of an IRI other than the ireg-name part. It is the responsibility of scheme-specific implementations (if the Internationalized Domain Name is part of the scheme syntax) or of server-side implementations (if the Internationalized Domain Name is part of 'iquery') to apply the necessary conversions at the appropriate point. Example: Trying to validate the Web page at http://résumé.example.org would lead to an IRI of http://validator.w3.org/check?uri=http%3A%2F%2Frésumé.example.org, which would convert to a URI of http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.example.org. The server-side implementation would be responsible for making the necessary conversions to be able to retrieve the Web page.

注：国際化ドメイン名は、IREG-名部分以外のIRIの部分に含まれていてもよいです。（国際化ドメイン名が「IQUERY」の一部である場合）、適切な時点で必要な変換を適用するために（国際化ドメイン名スキーム構文の一部である場合）方式固有の実装の責任であるか、またはサーバ側の実装の。例：HTTPでWebページを検証しようとすると：// R＆＃XE9を、サム＆＃XE9; .example.orgはhttp://validator.w3.org/check?uri=http%3A%2FのIRIにつながります％2FR＆＃XE9;サム＆＃XE9 ;. http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9のURIに変換しますexample.org、。 example.org。サーバー側の実装は、Webページを取得できるように、必要な変換を行うための責任を負うことになります。

Systems accepting IRIs MAY also deal with the printable characters in US-ASCII that are not allowed in URIs, namely "<", ">", '"', space, "{", "}", "|", "\", "^", and "`", in step 2 above. If these characters are found but are not converted, then the conversion SHOULD fail. Please note that the number sign ("#"), the percent sign ("%"), and the square bracket characters ("[", "]") are not part of the above list and MUST NOT be converted. Protocols and formats that have used earlier definitions of IRIs including these characters MAY require percent-encoding of these characters as a preprocessing step to extract the actual IRI from a given field. This preprocessing MAY also be used by applications allowing the user to enter an IRI.

アイリスも「<」、「>」、「"」、スペース、『{』、『}すなわち、URIの中で許可されていないUS-ASCIIで印刷可能な文字を扱うかもしれない受諾システム』、『|』、" \ 「『^』、および『`』、上記のステップ2インチこれらの文字が発見されたが変換されない場合、変換は失敗するはずです。注意してくださいその番号記号（『＃』）、パーセント記号（」％「）、および角括弧文字（」『）上記リストの一部ではなく、変換されてはいけません。これらのパーセントエンコーディングを必要とする。これらの文字を含む虹彩の以前の定義を使用しているプロトコルとフォーマット[」、』]指定されたフィールドからの実際のIRIを抽出する前処理ステップとして文字。この前処理は、ユーザがIRIを入力することを可能にするアプリケーションによって使用されてもよいです。

Note: In this process (in step 2.3), characters allowed in URI references and existing percent-encoded sequences are not encoded further. (This mapping is similar to, but different from, the encoding applied when arbitrary content is included in some part of a URI.) For example, an IRI of "http://www.example.org/red%09ros&#xE9;#red" (in XML Notation) is converted to "http://www.example.org/red%09ros%C3%A9#red", not to something like "http%3A%2F%2Fwww.example.org%2Fred%2509ros%C3%A9%23red".

注：このプロセス（ステップ2.3）で、文字がURI参照にさせ、既存のパーセントエンコード配列は、さらに、符号化されていません。（このマッピングは同様であるが、任意のコンテンツをURIの一部に含まれている場合とは異なる、符号化が適用される。）例えば、「http://www.example.org/red%09ros&＃XE9のIRI。 "（XML表記）に変換され、 "http://www.example.org/red%09ros%C3%A9#redのhttp％3A％2F％2Fwww.example.org％" ではないようなものに" #red 2Fred％2509ros％C3％A9％23red」。

Note: Some older software transcoding to UTF-8 may produce illegal output for some input, in particular for characters outside the BMP (Basic Multilingual Plane). As an example, for the IRI with non-BMP characters (in XML Notation): "http://example.com/&#x10300;&#x10301;&#x10302;"

注意：UTF-8に一部の古いソフトウェアトランスコーディングは、BMP（基本多言語面）外の文字のために、特に、いくつかの入力のための違法な出力を生成することができます。非BMP文字（XML表記）とIRIのために、一例として、 "＆＃x10302;＆＃x10301 http://example.com/&＃x10300"。

which contains the first three letters of the Old Italic alphabet, the correct conversion to a URI is "http://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C%82"

旧斜体アルファベットの最初の3つの文字が含まれている、URIへの正しい変換がhttp://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C」です％82"
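
The correct conversion can be reproduced in Python, which encodes non-BMP characters as four-octet UTF-8 sequences. The second half of the sketch shows one way illegal output can arise, namely encoding each UTF-16 surrogate separately; that this is the failure mode of the older software is an assumption made for illustration, and the helper name `pct` is hypothetical.

```python
def pct(octets):
    """Render octets as uppercase %HH triplets."""
    return "".join("%{:02X}".format(b) for b in octets)


old_italic = "\U00010300\U00010301\U00010302"
correct = "http://example.com/" + pct(old_italic.encode("utf-8"))
print(correct)
# http://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C%82

# Assumed failure mode: split into UTF-16 code units, then encode each
# surrogate as if it were a character (not legal UTF-8).
units = old_italic.encode("utf-16-be")
surrogates = "".join(
    chr(int.from_bytes(units[i:i + 2], "big")) for i in range(0, len(units), 2)
)
illegal = pct(surrogates.encode("utf-8", "surrogatepass"))
print(illegal[:18])  # %ED%A0%80%ED%BC%80
```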

3.2. Converting URIs to IRIs

3.2. アイリスURIを変換します

In some situations, converting a URI into an equivalent IRI may be desirable. This section gives a procedure for this conversion. The conversion described in this section will always result in an IRI that maps back to the URI used as an input for the conversion (except for potential case differences in percent-encoding and for potential percent-encoded unreserved characters). However, the IRI resulting from this conversion may not be exactly the same as the original IRI (if there ever was one).

いくつかの状況では、同等のIRIにURIを変換することが望ましい場合があります。このセクションでは、この変換のための手順を説明します。このセクションで説明する変換が常にバックURIにマッピングIRIになり（パーセントエンコーディングにおける電位場合の違いを除き、そして潜在的なパーセントエンコードされた非予約文字の）変換のための入力として使用されます。（これまでのものがあった場合）しかし、この変換されたIRIは、元のIRIとまったく同じではないかもしれません。

URI-to-IRI conversion removes percent-encodings, but not all percent-encodings can be eliminated. There are several reasons for this:

URIツーIRI変換パーセントエンコーディングを削除し、すべてではなくパーセントエンコーディングをなくすことができます。これにはいくつかの理由があります。

1. Some percent-encodings are necessary to distinguish percent-encoded and unencoded uses of reserved characters.

1.一部のパーセントエンコーディングは予約文字のパーセントエンコードおよびエンコードされていない用途を区別する必要があります。

2. Some percent-encodings cannot be interpreted as sequences of UTF-8 octets.

2.いくつかのパーセントエンコーディングはUTF-8オクテットのシーケンスとして解釈することはできません。

(Note: The octet patterns of UTF-8 are highly regular. Therefore, there is a very high probability, but no guarantee, that percent-encodings that can be interpreted as sequences of UTF-8 octets actually originated from UTF-8. For a detailed discussion, see [Duerst97].)

3. The conversion may result in a character that is not appropriate in an IRI. See sections 2.2, 4.1, and 6.1 for further details.

前記変換は、IRIに適切でない文字をもたらすことができます。セクション2.2、4.1、および詳細については6.1を参照してください。

Conversion from a URI to an IRI is done by using the following steps (or any other algorithm that produces the same result):

IRIのURIからの変換は、以下のステップ（又は同じ結果を生成する任意の他のアルゴリズム）を使用することによって行われます。

1. Represent the URI as a sequence of octets in US-ASCII.

1. US-ASCIIのオクテットのシーケンスとしてURIを表します。

2. Convert all percent-encodings ("%" followed by two hexadecimal digits) to the corresponding octets, except those corresponding to "%", characters in "reserved", and characters in US-ASCII not allowed in URIs.

2.すべての「％」に対応するものを除いて、対応するオクテットにパーセントエンコーディング（「％」は2桁の16進数に続く）、「予約」の文字と、URIの中で許可されていないUS-ASCIIの文字を変換します。

3. Re-percent-encode any octet produced in step 2 that is not part of a strictly legal UTF-8 octet sequence.

3.再パーセントエンコード厳密法的UTF-8オクテット配列の一部ではないステップ2で生成される任意のオクテット。

4. Re-percent-encode all octets produced in step 3 that in UTF-8 represent characters that are not appropriate according to sections 2.2, 4.1, and 6.1.

4.再パーセントエンコードは、ステップ3で生成された全てのオクテットはことでUTF-8は、セクション2.2、4.1、及び6.1に従って適切でない文字を表します。

5. Interpret the resulting octet sequence as a sequence of characters encoded in UTF-8.

5. UTF-8でエンコードされた文字のシーケンスとして得られたオクテットシーケンスを解釈。

This procedure will convert as many percent-encoded characters as possible to characters in an IRI. Because there are some choices when step 4 is applied (see section 6.1), results may vary.

この手順は、IRIの文字にできるだけ多くのパーセントエンコードされた文字を変換します。ステップ4は（セクション6.1を参照）が適用されたときに、いくつかの選択肢があるため、結果が変化してもよいです。

Conversions from URIs to IRIs MUST NOT use any character encoding other than UTF-8 in steps 3 and 4, even if it might be possible to guess from the context that another character encoding than UTF-8 was used in the URI. For example, the URI "http://www.example.org/r%E9sum%E9.html" might with some guessing be interpreted to contain two e-acute characters encoded as iso-8859-1. It must not be converted to an IRI containing these e-acute characters. Otherwise, in the future the IRI will be mapped to "http://www.example.org/r%C3%A9sum%C3%A9.html", which is a different URI from "http://www.example.org/r%E9sum%E9.html".

URIから虹彩の変換は、UTF-8以外の文字エンコーディングをURIで使用されたことを文脈から推測することは可能かもしれない場合でも、手順3と4にUTF-8以外の文字エンコーディングを使用してはなりません。たとえば、URI「http://www.example.org/r%E9sum%E9.htmlは」いくつかの推測ではISO-8859-1としてエンコードされた2つの電子急性文字を含むように解釈される可能性があります。これは、これらの電子急性文字を含むIRIに変換してはいけません。 //www.example：それ以外の場合は、将来的にIRIは、http」から別のURIである、「http://www.example.org/r%C3%A9sum%C3%A9.html」にマップされます。 ORG / R％E9sum％E9.html」。
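
The five conversion steps can be sketched in Python as shown below. This is a simplified illustration under stated assumptions, not a definitive implementation: all function names are hypothetical, the step 4 test treats a character as appropriate only if it is in 'ucschar' and is not a bidirectional formatting character (so 'iprivate' in queries and the choices left open by section 6.1 are not modeled), and only UTF-8 is ever used to interpret octets.

```python
import re

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" [RFC3986]
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
BIDI_FORMATTING = frozenset("\u200e\u200f\u202a\u202b\u202c\u202d\u202e")
UCSCHAR = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
UCSCHAR += [(p << 16, (p << 16) + 0xFFFD) for p in range(1, 14)]
UCSCHAR += [(0xE1000, 0xEFFFD)]
PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def is_appropriate(ch):
    """Simplified step 4 test (an assumption, see the lead-in)."""
    cp = ord(ch)
    return ch not in BIDI_FORMATTING and any(lo <= cp <= hi for lo, hi in UCSCHAR)


def utf8_length(lead):
    """Sequence length announced by a UTF-8 lead octet, or 0 if invalid."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def convert_run(match):
    """Convert one run of consecutive percent-encodings."""
    run = match.group(0)
    octets = bytes.fromhex(run.replace("%", ""))
    out, i = [], 0
    while i < len(octets):
        octet = octets[i]
        if octet < 0x80:
            # Step 2: decode unreserved ASCII only; "%", reserved, and
            # disallowed ASCII keep their original percent-encoding.
            out.append(chr(octet) if octet in UNRESERVED else run[3 * i:3 * i + 3])
            i += 1
            continue
        chunk = octets[i:i + utf8_length(octet)]
        try:
            ch = chunk.decode("utf-8") if chunk else None
        except UnicodeDecodeError:
            ch = None
        if ch is None:
            # Step 3: not part of a strictly legal UTF-8 sequence.
            out.append("%{:02X}".format(octet))
            i += 1
        elif is_appropriate(ch):
            out.append(ch)  # step 5: interpret as a character
            i += len(chunk)
        else:
            # Step 4: legal UTF-8, but the character is not appropriate.
            out.extend("%{:02X}".format(o) for o in chunk)
            i += len(chunk)
    return "".join(out)


def uri_to_iri(uri):
    uri.encode("ascii")  # step 1: raises UnicodeEncodeError if not US-ASCII
    return PCT_RUN.sub(convert_run, uri)


print(uri_to_iri("http://www.example.org/r%E9sum%E9.html"))
# http://www.example.org/r%E9sum%E9.html
```

The last call shows the rule stated above: the octet `E9` is not legal UTF-8 on its own, so it stays percent-encoded instead of being guessed to be iso-8859-1.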

3.2.1. Examples

3.2.1. 例

This section shows various examples of converting URIs to IRIs. Each example shows the result after each of the steps 1 through 5 is applied. XML Notation is used for the final result. Octets are denoted by "<" followed by two hexadecimal digits followed by ">".

このセクションでは、虹彩にURIを変換する様々な例を示しています。各例は、1〜5が適用された工程の各々後の結果を示しています。 XML表記は、最終的な結果のために使用されています。オクテットは「<」に続く2桁の16進数が続く「>」によって示されています。

The following example contains the sequence "%C3%BC", which is a strictly legal UTF-8 sequence, and which is converted into the actual character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (also known as u-umlaut).

次の例は、厳密に法的なUTF-8配列である配列の「％C3の％のBC」を、含まれており、その（また、U-ウムラウトとしても知られる）は、実際の文字U + 00FC、分音記号付きラテン小文字Uに変換されます。

1. http://www.example.org/D%C3%BCrst

2. http://www.example.org/D<c3><bc>rst

3. http://www.example.org/D<c3><bc>rst

4. http://www.example.org/D<c3><bc>rst

5. http://www.example.org/D&#xFC;rst

The following example contains the sequence "%FC", which might represent U+00FC, LATIN SMALL LETTER U WITH DIAERESIS, in the iso-8859-1 character encoding. (It might represent other characters in other character encodings. For example, the octet <fc> in iso-8859-5 represents U+045C, CYRILLIC SMALL LETTER KJE.) Because <fc> is not part of a strictly legal UTF-8 sequence, it is re-percent-encoded in step 3.

次の例は、ISO-8859-1文字エンコーディングで、U + 00FCを表すかもしれないシーケンス「％のFC」、分音記号付きラテン小文字Uを含んでいます。（これは、他の文字エンコーディングで他の文字を表すことができます。たとえば、ISO-8859-5でオクテット<FC>はU + 045C、CYRILLIC SMALL LETTER KJEを表します。）<FC>は厳密に法的なUTF-8の一部ではないのでシーケンスは、ステップ3で再パーセントエンコードされています。

1. http://www.example.org/D%FCrst

2. http://www.example.org/D<fc>rst

3. http://www.example.org/D%FCrst

4. http://www.example.org/D%FCrst

5. http://www.example.org/D%FCrst

The following example contains "%e2%80%ae", which is the percent-encoded UTF-8 character encoding of U+202E, RIGHT-TO-LEFT OVERRIDE. Section 4.1 forbids the direct use of this character in an IRI. Therefore, the corresponding octets are re-percent-encoded in step 4. This example shows that the case (upper- or lowercase) of letters used in percent-encodings may not be preserved. The example also contains a punycode-encoded domain name label (xn--99zt52a), which is not converted.

次の例では、U + 202E、右から左へのOVERRIDEのパーセントエンコードされたUTF-8文字エンコーディングでは "％E2％80％のAE" を含んでいます。 4.1節は、IRIでこの文字を直接使用することを禁止します。したがって、対応するオクテットは、この例では、パーセントエンコーディングに使用される文字の場合（大文字または小文字）が保存されなくてもよいことを示すステップ4で再パーセントエンコードされています。変換されず、 - 例では、Punycodeで符号化されたドメイン名ラベル（99zt52a XN）を含みます。

1. http://xn--99zt52a.example.org/%e2%80%ae

2. http://xn--99zt52a.example.org/<e2><80><ae>

3. http://xn--99zt52a.example.org/<e2><80><ae>

4. http://xn--99zt52a.example.org/%E2%80%AE

5. http://xn--99zt52a.example.org/%E2%80%AE

Implementations with scheme-specific knowledge MAY convert punycode-encoded domain name labels to the corresponding characters by using the ToUnicode procedure. Thus, for the example above, the label "xn--99zt52a" may be converted to U+7D0D U+8C46 (Japanese Natto), leading to the overall IRI of "http://納豆.example.org/%E2%80%AE".

スキーム固有の知識を持つ実装はのToUnicodeプロシージャを使用して、対応する文字にPunycodeでエンコードされたドメイン名のラベルを変換することができます。したがって、上記の例では、ラベル "XN - 99zt52aは" HTTP」の全体的なIRIをもたらす、U + 7D0D U + 8C46（日本納豆）に変換することができる：//＆＃x7D0D;＆＃x8C46 ;. example.org/%E2%80%AE」。

4. Bidirectional IRIs for Right-to-Left Languages

右から左の言語4.双方向のIRI

Some UCS characters, such as those used in the Arabic and Hebrew scripts, have an inherent right-to-left (rtl) writing direction. IRIs containing these characters (called bidirectional IRIs or Bidi IRIs) require additional attention because of the non-trivial relation between logical representation (used for digital representation and for reading/spelling) and visual representation (used for display/printing).

アラビア語やヘブライ語のスクリプトで使用されるもの、のようないくつかのUCS文字は、本来の右から左（RTL）書き込み方向を持っています。（双方向のIRIまたは双方向のIRIと呼ばれる）は、これらの文字を含む虹彩ため（デジタル表現および読み取り/スペリングに使用される）論理表現と（表示/印刷に使用される）視覚表現との間の非自明な関係の更なる注意が必要です。

Because of the complex interaction between the logical representation, the visual representation, and the syntax of a Bidi IRI, a balance is needed between various requirements. The main requirements are

そのための論理的な表現、視覚的表現、および双方向IRIの構文との間の複雑な相互作用のため、バランスは様々な要件の間で必要とされています。主な要件は、

1. user-predictable conversion between visual and logical representation;

視覚的及び論理的な表現との間の1ユーザー予測可能変換。

2. the ability to include a wide range of characters in various parts of the IRI; and

IRIの様々な部分内の文字の広い範囲を含む2.能力。そして

3. minor or no changes or restrictions for implementations.

3.未成年や実装の変更なしまたは制限。

4.1. Logical Storage and Visual Presentation

4.1. 論理ストレージおよびVisualプレゼンテーション

When stored or transmitted in digital representation, bidirectional IRIs MUST be in full logical order and MUST conform to the IRI syntax rules (which include the rules relevant to their scheme). This ensures that bidirectional IRIs can be processed in the same way as other IRIs.

デジタル表現に格納されている、または送信する際、双方向アイリスフル論理的な順序でなければならず、（そのスキームに関連するルールを含む）IRIの構文規則に従わなければなりません。これは、双方向のアイリス他のIRIと同じように処理できることを保証します。

Bidirectional IRIs MUST be rendered by using the Unicode Bidirectional Algorithm [UNIV4], [UNI9]. Bidirectional IRIs MUST be rendered in the same way as they would be if they were in a left-to-right embedding; i.e., as if they were preceded by U+202A, LEFT-TO-RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL FORMATTING (PDF). Setting the embedding direction can also be done in a higher-level protocol (e.g., the dir='ltr' attribute in HTML).

双方向アイリスユニコード双方向アルゴリズム[UNIV4]、[UNI9]を使用してレンダリングされなければなりません。それらは左から右への埋め込みにあった場合、彼らはであるように、双方向虹彩同じ方法でレンダリングされなければなりません。すなわち、それらは、U + 202A、左から右への埋め込み（LRE）が先行し、U + 202C、POP指向フォーマット（PDF）続いたかのように。埋め込み方向を設定することは、より高いレベルのプロトコルで行うことができる（例えば、HTMLの属性「LTR」DIR =）。

There is no requirement to use the above embedding if the display is still the same without the embedding. For example, a bidirectional IRI in a text with left-to-right base directionality (such as used for English or Cyrillic) that is preceded and followed by whitespace and strong left-to-right characters does not need an embedding. Also, a bidirectional relative IRI reference that only contains strong right-to-left characters and weak characters and that starts and ends with a strong right-to-left character and appears in a text with right-to-left base directionality (such as used for Arabic or Hebrew) and is preceded and followed by whitespace and strong characters does not need an embedding.

ディスプレイが埋め込まずにまだ同じである場合は、上記の埋め込みを使用する必要はありません。例えば、空白と強い左から右への文字が先行し、続いて（英語またはキリル文字のために使用されるような）は、左から右への塩基指向のテキストにおける双方向IRIは埋め込みを必要としません。強いだけ右から左文字と弱いの文字が含まれており、開始され、強力な右から左文字で終了し、右から左ベース方向とテキストで表示されます（のようなものということ。また、双方向の相対IRI参照アラビア語やヘブライ語のために使用される）と埋め込みを必要としない空白や強力な文字が先行し、続いています。

In some other cases, using U+200E, LEFT-TO-RIGHT MARK (LRM), may be sufficient to force the correct display behavior. However, the details of the Unicode Bidirectional algorithm are not always easy to understand. Implementers are strongly advised to err on the side of caution and to use embedding in all cases where they are not completely sure that the display behavior is unaffected without the embedding.

いくつかの他の場合には、U + 200Eを用いて、左から右へMARK（LRM）、正しい表示動作を強制するのに十分であり得ます。しかし、Unicodeの双方向アルゴリズムの詳細は、常に理解することは簡単ではありません。実装者は、強く注意の側に誤るために、彼らは表示動作が埋め込まず影響を受けないことを完全に確認されていませんすべてのケースに埋め込む使用することをお勧めします。

The Unicode Bidirectional Algorithm ([UNI9], section 4.3) permits higher-level protocols to influence bidirectional rendering. Such changes by higher-level protocols MUST NOT be used if they change the rendering of IRIs.

ユニコード双方向アルゴリズム（[UNI9]、セクション4.3）双方向のレンダリングに影響を与えるために、より高いレベルのプロトコルを可能にします。彼らは虹彩のレンダリングを変更した場合、より高いレベルのプロトコルによって、このような変更を使用してはいけません。

The bidirectional formatting characters that may be used before or after the IRI to ensure correct display are not themselves part of the IRI. IRIs MUST NOT contain bidirectional formatting characters (LRM, RLM, LRE, RLE, LRO, RLO, and PDF). They affect the visual rendering of the IRI but do not appear themselves. It would therefore not be possible to input an IRI with such characters correctly.

正しい表示を確保するために、IRIの前または後に使用することができる双方向の書式文字自体はIRIの一部ではありません。アイリス双方向の書式文字（LRM、RLM、LRE、RLE、LRO、RLO、およびPDF）を含めることはできません。彼らは、IRIの視覚的レンダリングに影響を与えるが、自らを表示されません。したがって、正確な文字で入力IRIすることはできません。
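
The two rules of this section, that an IRI itself contains no bidirectional formatting characters and that it is displayed as if in a left-to-right embedding, can be sketched as follows. The function names are hypothetical, and the wrapper always adds the embedding, which is the cautious choice recommended above even though it is not always required.

```python
# Bidirectional formatting characters that MUST NOT appear inside an IRI.
BIDI_FORMATTING = {
    "\u200e": "LRM", "\u200f": "RLM",
    "\u202a": "LRE", "\u202b": "RLE",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u202c": "PDF",
}


def find_bidi_formatting(iri):
    """Return (index, name) pairs for formatting characters found in the IRI."""
    return [(i, BIDI_FORMATTING[ch]) for i, ch in enumerate(iri) if ch in BIDI_FORMATTING]


def wrap_for_display(iri):
    """Surround the IRI with LRE ... PDF; the marks are not part of the IRI."""
    if find_bidi_formatting(iri):
        raise ValueError("IRI contains bidirectional formatting characters")
    return "\u202a" + iri + "\u202c"
```

Code that later extracts the IRI from displayed text has to strip the surrounding LRE and PDF again, since they belong to the presentation and not to the identifier.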

4.2. Bidi IRI Structure

4.2. 双方向IRIの構造

The Unicode Bidirectional Algorithm is designed mainly for running text. To make sure that it does not affect the rendering of bidirectional IRIs too much, some restrictions on bidirectional IRIs are necessary. These restrictions are given in terms of delimiters (structural characters, mostly punctuation such as "@", ".", ":", and "/") and components (usually consisting mostly of letters and digits).

Unicodeの双方向アルゴリズムは、テキストを実行するために主に設計されています。それはあまりにも多くの双方向の虹彩のレンダリングには影響しないことを確認するために、双方向のIRI上のいくつかの制限が必要です。これらの制限は、区切り文字の面で（「」『：』など、「@」、などの構造的な文字、ほとんど句読点を、そして『/』）が与えられ、コンポーネント（通常は主に文字と数字からなります）。

The following syntax rules from section 2.2 correspond to components for the purpose of Bidi behavior: iuserinfo, ireg-name, isegment, isegment-nz, isegment-nz-nc, iquery, and ifragment.

iuserinfo、IREG-名、isegment、isegment-NZ、isegment-NZ-NC、IREG-名、IQUERY、及びifragment：セクション2.2から次の構文規則は、双方向の動作のために構成要素に対応します。

Specifications that define the syntax of any of the above components MAY divide them further and define smaller parts to be components according to this document. As an example, the restrictions of [RFC3490] on bidirectional domain names correspond to treating each label of a domain name as a component for schemes with ireg-name as a domain name. Even where the components are not defined formally, it may be helpful to think about some syntax in terms of components and to apply the relevant restrictions. For example, for the usual name/value syntax in query parts, it is convenient to treat each name and each value as a component. As another example, the extensions in a resource name can be treated as separate components.

上記成分のいずれかの構文を定義する仕様は、それらを分割し、この文書に係る構成要素であることが小さい部分を定義することができます。一例として、双方向のドメイン名に[RFC3490]の制限は、ドメイン名とIREG-名前のスキームのための成分として、ドメイン名の各ラベルの治療に対応します。コンポーネントは正式に定義されていない場合でも、コンポーネントの面でいくつかの構文を考えるようにし、関連する制限を適用すると便利かもしれません。たとえば、クエリの部分で通常の名前/値の構文については、構成要素としてそれぞれの名前とそれぞれの値を処理することが便利です。別の例として、リソース名に拡張子は、別々の構成要素として扱うことができます。

For each component, the following restrictions apply:

各コンポーネントについて、以下の制限が適用されます。

1. A component SHOULD NOT use both right-to-left and left-to-right characters.

1.コンポーネントは、左へ右と左から右への文字の両方を使用しないでください。

2. A component using right-to-left characters SHOULD start and end with right-to-left characters.

2.右から左への文字を使用してコンポーネントを起動し、右から左への文字で終わるべきです。

The above restrictions are given as shoulds, rather than as musts. For IRIs that are never presented visually, they are not relevant. However, for IRIs in general, they are very important to ensure consistent conversion between visual presentation and logical representation, in both directions.

上記の制限はshouldsとしてではなく、マストとして与えられています。視覚的に提示されることはありませんIRIをするために、彼らは関係ありません。しかし、一般的な虹彩のために、彼らは両方の方向で、視覚的なプレゼンテーションと論理的な表現の間で一貫性の転換を確実にするために非常に重要です。
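
A check of the two component restrictions can be sketched with Python's `unicodedata` module. The function name is hypothetical, and the sketch assumes that "right-to-left characters" means characters of Unicode bidirectional class R or AL and "left-to-right characters" means class L; digits and punctuation fall into neither group.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}
LTR_CLASSES = {"L"}


def check_bidi_component(component):
    """Return a list of violated restrictions (empty if none)."""
    classes = [unicodedata.bidirectional(ch) for ch in component]
    has_rtl = any(c in RTL_CLASSES for c in classes)
    has_ltr = any(c in LTR_CLASSES for c in classes)
    problems = []
    if has_rtl and has_ltr:
        problems.append("mixes right-to-left and left-to-right characters")
    if has_rtl and not (classes[0] in RTL_CLASSES and classes[-1] in RTL_CLASSES):
        problems.append("does not start and end with right-to-left characters")
    return problems


print(check_bidi_component("\u05d0\u05d1\u05d2"))  # []
print(check_bidi_component("\u05d0\u05d11"))
# ['does not start and end with right-to-left characters']
```

Since the restrictions are SHOULDs, a caller would typically report the returned problems as warnings rather than reject the IRI.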

Note: In some components, the above restrictions may actually be strictly enforced. For example, [RFC3490] requires that these restrictions apply to the labels of a host name for those schemes where ireg-name is a host name. In some other components (for example, path components) following these restrictions may not be too difficult. For other components, such as parts of the query part, it may be very difficult to enforce the restrictions because the values of query parameters may be arbitrary character sequences.

注：一部のコンポーネントでは、上記の制限は、実際に厳密に適用することができます。例えば、[RFC3490]は、これらの制限は、IREG-nameはホスト名で、これらのスキームのホスト名のラベルに適用されることが必要です。いくつかの他のコンポーネントに（例えば、パス成分）これらの制限以下はあまりにも困難ではないかもしれません。クエリパラメータの値は任意の文字列であってもよいため、このようなクエリ一部の部品のような他の成分については、制限を適用することは非常に困難であってもよいです。

If the above restrictions cannot be satisfied otherwise, the affected component can always be mapped to URI notation as described in section 3.1. Please note that the whole component has to be mapped (see also Example 9 below).

上記の制限は、そうでなければ満足できない場合、セクション3.1で説明したように、影響を受けるコンポーネントは、常にURI表記にマッピングすることができます。全成分がマッピングされなければならないことに注意してください（以下の実施例9も参照）。

4.3. Input of Bidi IRIs

4.3. 双方向虹彩の入力

Bidi input methods MUST generate Bidi IRIs in logical order while rendering them according to section 4.1. During input, rendering SHOULD be updated after every new character is input to avoid end-user confusion.

セクション4.1に従ってそれらをレンダリングしながら双方向入力方法は、論理的な順序で双方向アイリスを生成しなければなりません。すべての新しい文字がエンドユーザーの混乱を避けるために、入力された後の入力時には、レンダリングを更新する必要があります。

4.4. Examples

4.4. 例

This section gives examples of bidirectional IRIs, in Bidi Notation. It shows legal IRIs with the relationship between logical and visual representation and explains how certain phenomena in this relationship may look strange to somebody not familiar with bidirectional behavior, but familiar to users of Arabic and Hebrew. It also shows what happens if the restrictions given in section 4.2 are not followed. The examples below can be seen at [BidiEx], in Arabic, Hebrew, and Bidi Notation variants.

このセクションでは、双方向表記で、双方向の虹彩の例を示します。これは、論理的かつ視覚的な表現との関係で法的アイリスを示しており、この関係では、特定の現象が双方向動作に精通したが、アラビア語とヘブライ語のユーザーにはなじみのない誰かには奇妙に見えるかもしれ方法を説明します。また、セクション4.2で与えられた制限が守られていない場合、何が起こるかを示しています。以下の例は、アラビア語、ヘブライ語、および双方向表記変種で、[BidiEx]で見ることができます。

To read the bidi text in the examples, read the visual representation from left to right until you encounter a block of rtl text. Read the rtl block (including slashes and other special characters) from right to left, then continue at the next unread ltr character.

あなたはRTLテキストのブロックに遭遇するまでの例では双方向テキストを読むには、左から右への視覚的な表現をお読みください。次の未読LTRの文字で続行し、右から左に（スラッシュや他の特殊文字を含む）RTLブロックをお読みください。

Example 1: A single component with rtl characters is inverted:
Logical representation: "http://ab.CDEFGH.ij/kl/mn/op.html"
Visual representation: "http://ab.HGFEDC.ij/kl/mn/op.html"
Components can be read one by one, and each component can be read in its natural direction.

例1：RTL文字を単一の成分が反転します。論理表現：「のhttp：//ab.CDEFGH.ij/kl/mn/op.html」ビジュアル表現：「のhttp：//ab.HGFEDC.ij/kl/ MN / op.html」コンポーネントは、一つ一つを読み取ることができ、各成分は、その天然の方向に読み取ることができます。

Example 2: More than one consecutive component with rtl characters is inverted as a whole:
Logical representation: "http://ab.CDE.FGH/ij/kl/mn/op.html"
Visual representation: "http://ab.HGF.EDC/ij/kl/mn/op.html"
A sequence of rtl components is read rtl, in the same way as a sequence of rtl words is read rtl in a bidi text.

例2：RTL文字を含む複数の連続した部品は、全体として反転されます。論理表現：「のhttp：//ab.CDE.FGH/ij/kl/mn/op.html」ビジュアル表現：「のhttp：// AB .HGF.EDC / IJ / KL / MN / op.html」RTL要素の配列は、RTL単語のシーケンスは双方向テキストでRTLを読み取られると同様に、RTL読み出されます。

Example 3: All components of an IRI (except for the scheme) are rtl. All rtl components are inverted overall:
Logical representation: "http://AB.CD.EF/GH/IJ/KL?MN=OP;QR=ST#UV"
Visual representation: "http://VU#TS=RQ;PO=NM?LK/JI/HG/FE.DC.BA"
The whole IRI (except the scheme) is read rtl. Delimiters between rtl components stay between the respective components; delimiters between ltr and rtl components don't move.

実施例3：（スキームを除く）IRIのすべてのコンポーネントは、RTLです。すべてのRTLコンポーネントは、全体的に反転されます。論理表現： "のhttp：//AB.CD.EF/GH/IJ/KL MN = OP、QR = ST＃UV？" ビジュアル表現：「のhttp：// VU番号のTS = RQ ;？PO = NM LK / JI / HG / FE.DC.BA」（スキームを除く）全体のIRIが読み込まれるRTL。 RTLコンポーネント間の区切り文字は、各コンポーネント間滞在します。 LTRとRTLコンポーネント間の区切り文字は移動しません。

Example 4: Each of several sequences of rtl components is inverted on its own:
Logical representation: "http://AB.CD.ef/gh/IJ/KL.html"
Visual representation: "http://DC.BA.ef/gh/LK/JI.html"
Each sequence of rtl components is read rtl, in the same way as each sequence of rtl words in an ltr text is read rtl.

例4：RTLコンポーネントのいくつかの配列の各々が独自に反転されます。論理表現：「のhttp：//AB.CD.ef/gh/IJ/KL.html」ビジュアル表現：「http://DC.BA。 EF / GH / LK / JI.html」RTL要素の各シーケンスは、LTRテキストが読まれるRTLにRTLの単語のそれぞれの配列と同様に、読み取りRTLあります。

Example 5: Example 2, applied to components of different kinds:
Logical representation: "http://ab.cd.EF/GH/ij/kl.html"
Visual representation: "http://ab.cd.HG/FE/ij/kl.html"
The inversion of the domain name label and the path component may be unexpected, but it is consistent with other bidi behavior. For reassurance that the domain component really is "ab.cd.EF", it may be helpful to read aloud the visual representation following the bidi algorithm. After "http://ab.cd." one reads the RTL block "E-F-slash-G-H", which corresponds to the logical representation.

例5：実施例2は、異なる種類のコンポーネントに適用されます。論理表現： "のhttp：//ab.cd.EF/GH/ij/kl.html" ビジュアル表現：「のhttp：//ab.cd.HG/FE /ij/kl.html」ドメイン名ラベルの反転およびパス・コンポーネントは、予期しないかもしれないが、それは他の双方向の挙動と一致しています。ドメインコンポーネントが本当に「ab.cd.EF」で安心のために、声を出して、双方向アルゴリズム以下の視覚的な表現を読むことが役立つかもしれません。 "http://ab.cd。" 後一つは論理的表現に対応するRTLブロック「E-F-スラッシュ-G-H」を読み出します。

Example 6: Same as Example 5, with more rtl components:
Logical representation: "http://ab.CD.EF/GH/IJ/kl.html"
Visual representation: "http://ab.JI/HG/FE.DC/kl.html"
The inversion of the domain name labels and the path components may be easier to identify because the delimiters also move.

例6：論理的な表現： "のhttp：//ab.CD.EF/GH/IJ/kl.html" ビジュアル表現：「のhttp：//ab.JI/HG/FE以上のRTLコンポーネントは、実施例5と同じ.DC / kl.html」区切り文字も移動するため、ドメイン名ラベルとパス成分の反転が識別が容易であってもよいです。

Example 7: A single rtl component includes digits:
Logical representation: "http://ab.CDE123FGH.ij/kl/mn/op.html"
Visual representation: "http://ab.HGF123EDC.ij/kl/mn/op.html"
Numbers are written ltr in all cases but are treated as an additional embedding inside a run of rtl characters. This is completely consistent with usual bidirectional text.

実施例7：単一のRTLコンポーネントは、数字を含む：論理的表現： "HTTP：//ab.CDE123FGH.ij/kl/mn/op.html" 視覚的表現：「HTTP：//ab.HGF123EDC.ij/kl/mn/ op.html」数字はすべての場合にLTRを書かれているが、RTL文字の実行内側に追加埋め込みとして扱われます。これは、通常の双方向テキストと完全に一致しています。

Example 8 (not allowed): Numbers are at the start or end of an rtl component:
Logical representation: "http://ab.cd.ef/GH1/2IJ/KL.html"
Visual representation: "http://ab.cd.ef/LK/JI1/2HG.html"
The sequence "1/2" is interpreted by the bidi algorithm as a fraction, fragmenting the components and leading to confusion. There are other characters that are interpreted in a special way close to numbers; in particular, "+", "-", "#", "$", "%", ",", ".", and ":".

実施例8（許可されていません）：論理的な表現：「のhttp：//ab.cd.ef/GH1/2IJ/KL.html」ビジュアル表現：「のhttp：// AB数値は、RTLコンポーネントの開始や終了にあり.cd.ef / LK / JI1 / 2HG.html」配列『1/2』のコンポーネントを断片化し、混乱につながる、分数としてBIDIアルゴリズムによって解釈されます。数値に近い特別な方法で解釈されている他の文字があります。具体的には、 "+"、 " - "、 "＃"、 "$"、 "％"、 " "および"。" "："。

Example 9 (not allowed): The numbers in the previous example are percent-encoded:
Logical representation: "http://ab.cd.ef/GH%31/%32IJ/KL.html"
Visual representation (Hebrew): "http://ab.cd.ef/%31HG/LK/JI%32.html"
Visual representation (Arabic): "http://ab.cd.ef/31%HG/%LK/JI32.html"
Depending on whether the uppercase letters represent Arabic or Hebrew, the visual representation is different.

実施例9（許可されていない）：論理的な表現：「HTTP：//ab.cd.ef/GH%31/%32IJ/KL.html」、視覚的表現（ヘブライ語）前の例の数字はパーセントエンコードされています。 "のhttp：//ab.cd.ef/%31HG/LK/JI%32.html" ビジュアル表現（アラビア語）： "のhttp：//ab.cd.ef/31%HG/%LK/JI32.html"大文字は、アラビア語やヘブライ語を表すかどうかに応じて、視覚的な表現が異なっています。

Example 10 (allowed but not recommended):
Logical representation: "http://ab.CDEFGH.123/kl/mn/op.html"
Visual representation: "http://ab.123.HGFEDC/kl/mn/op.html"
Components consisting of only numbers are allowed (it would be rather difficult to prohibit them), but these may interact with adjacent RTL components in ways that are not easy to predict.

実施例10（許可されますが推奨されません）：論理的な表現： "のhttp：//ab.CDEFGH.123/kl/mn/op.html" ビジュアル表現：「のhttp：//ab.123.HGFEDC/kl/mn/op .htmlを」数字だけからなるコンポーネントは、（それらを禁止するということは困難であろう）許可されているが、これらは予測することは容易ではない方法で、隣接するRTLコンポーネントと相互作用することができます。

5. Normalization and Comparison

5.正規化との比較

      Note: The structure and much of the material for this section is
      taken from section 6 of [RFC3986]; the differences are due to the
      specifics of IRIs.

One of the most common operations on IRIs is simple comparison: Determining whether two IRIs are equivalent without using the IRIs or the mapped URIs to access their respective resource(s). A comparison is performed whenever a response cache is accessed, a browser checks its history to color a link, or an XML parser processes tags within a namespace. Extensive normalization prior to comparison of IRIs may be used by spiders and indexing engines to prune a search space or reduce duplication of request actions and response storage.

2つのアイリスそれぞれのリソースをアクセスするためのIRIまたはマップされたURIを使用せずに等価であるかどうかを決定する虹彩上の最も一般的な操作の一つは、単純な比較です。応答キャッシュがアクセスされるたびに比較が行われ、ブラウザがリンクを着色するためのその履歴をチェックし、またはXMLパーサは、名前空間内のタグを処理します。前虹彩の比較に大規模な正規化は、探索空間をプルーニングまたは要求アクションと応答ストレージの重複を減らすためにスパイダーとインデックスエンジンで使用することができます。

IRI comparison is performed for some particular purpose. Protocols or implementations that compare IRIs for different purposes will often be subject to differing design trade-offs in regards to how much effort should be spent in reducing aliased identifiers. This section describes various methods that may be used to compare IRIs, the trade-offs between them, and the types of applications that might use them.

IRI比較は、いくつかの特定の目的のために行われます。異なる目的のためにアイリスを比較するプロトコルや実装は、多くの場合、エイリアス識別子を減らすのに費やされるべきでどれだけの労力に関しては、設計上のトレードオフが異なるの対象となります。このセクションでは、アイリスを比較するために使用することができる様々な方法、それらの間のトレードオフ、およびそれらを使用する可能性があるアプリケーションの種類について説明します。

5.1. Equivalence

5.1. 等価

Because IRIs exist to identify resources, presumably they should be considered equivalent when they identify the same resource. However, this definition of equivalence is not of much practical use, as there is no way for an implementation to compare two resources unless it has full knowledge or control of them. For this reason, determination of equivalence or difference of IRIs is based on string comparison, perhaps augmented by reference to additional rules provided by URI scheme definitions. We use the terms "different" and "equivalent" to describe the possible outcomes of such comparisons, but there are many application-dependent versions of equivalence.

アイリスリソースを識別するために存在しているので、彼らが同じリソースを識別するとき、おそらく彼らは同等とみなされるべきです。それは完全な知識やそれらのコントロールを持っていない限り、2つのリソースを比較するための実装方法がないようしかし、同値のこの定義は、はるかに実用化ではありません。この理由のために、同等または虹彩の差の決意は、恐らくURIスキームの定義によって提供される追加のルールを参照することによって拡張文字列の比較に基づいています。私たちは、このような比較の可能な結果を記述するために「異なる」と「同等」という用語を使用しますが、同等の多くのアプリケーションに依存するバージョンがあります。

Even though it is possible to determine that two IRIs are equivalent, IRI comparison is not sufficient to determine whether two IRIs identify different resources. For example, an owner of two different domain names could decide to serve the same resource from both, resulting in two different IRIs. Therefore, comparison methods are designed to minimize false negatives while strictly avoiding false positives.

それは、2つの虹彩等価であると判断することができるにもかかわらず、IRI比較は、二つの虹彩異なるリソースを識別するかどうかを決定するために十分ではありません。例えば、2つの異なるドメイン名の所有者は、二つの異なる虹彩その結果、両方から同じリソースを提供することを決定することもできます。そのため、比較方法は、厳密に偽陽性を回避しながら、偽陰性を最小限に抑えるように設計されています。

In testing for equivalence, applications should not directly compare relative references; the references should be converted to their respective target IRIs before comparison. When IRIs are compared to select (or avoid) a network action, such as retrieval of a representation, fragment components (if any) should be excluded from the comparison.

等価性のテストでは、アプリケーションは直接相対参照を比較するべきではありません。参照は、比較の前に、それぞれのターゲットアイリスに変換する必要があります。アイリスそのような表現の検索として、ネットワークアクションを選択（または回避）するために比較される場合、断片のコンポーネントは、（もしあれば）の比較から除外されるべきです。
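
A minimal Python sketch of the preparation described above: each reference is resolved against its base to obtain the target IRI, and the fragment is dropped before the comparison selects a network action. The helper name `comparison_key` and the base IRI are illustrative assumptions; `urljoin` and `urldefrag` are from the standard `urllib.parse` module, whose resolution behaviour is assumed here to be close enough to section 5.2 of [RFC3986] for this purpose.

```python
from urllib.parse import urldefrag, urljoin


def comparison_key(reference: str, base: str) -> str:
    """Resolve a (possibly relative) reference and drop its fragment."""
    target = urljoin(base, reference)            # relative reference -> target IRI
    without_fragment, _fragment = urldefrag(target)
    return without_fragment


# Hypothetical base IRI, used only for illustration.
base = "http://example.org/a/b/index.html"

left = comparison_key("../c/page.html#top", base)
right = comparison_key("/a/c/page.html", base)
print(left == right)
```

Comparing the two relative references directly as strings would report them as different, although both name the same target once resolved.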

Applications using IRIs as identity tokens with no relationship to a protocol MUST use the Simple String Comparison (see section 5.3.1). All other applications MUST select one of the comparison practices from the Comparison Ladder (see section 5.3) or, after IRI-to-URI conversion, select one of the comparison practices from the URI comparison ladder in [RFC3986], section 6.2.

プロトコルとは無関係とアイデンティティトークンとしてアイリスを使用するアプリケーションは、単純な文字列の比較を（セクション5.3.1を参照）を使用しなければなりません。比較ラダーから比較慣行のいずれかを選択しなければならない他のすべてのアプリケーションは、（中URI比較ラダーから比較慣行のいずれかを選択し、IRI対URI変換した後、セクション5.3を参照するか、[RFC3986]、セクション6.2）

5.2. Preparation for Comparison

5.2. 比較のための準備

Any kind of IRI comparison REQUIRES that all escapings or encodings in the protocol or format that carries an IRI are resolved. This is usually done when the protocol or format is parsed. Examples of such escapings or encodings are entities and numeric character references in [HTML4] and [XML1]. As an example, "http://example.org/ros&eacute;" (in HTML), "http://example.org/ros&#233;" (in HTML or XML), and "http://example.org/ros&#xE9;" (in HTML or XML) are all resolved into what is denoted in this document (see section 1.4) as "http://example.org/ros&#xE9;" (the "&#xE9;" here standing for the actual e-acute character, to compensate for the fact that this document cannot contain non-ASCII characters).

IRI比較任意の種類は、IRIを運ぶプロトコルまたはフォーマット内のすべてescapingsまたはエンコーディングが解決されることを必要とします。プロトコルやフォーマットが解析されたときにこれは通常行われています。このようescapingsやエンコーディングの例としては、[XML1]エンティティと[HTML4]で数値文字参照としています。一例として、 "http://example.org/ros& eacute;" （HTML）で、 "http://example.org/ros&#233"。（HTMLまたはXMLで）、および "http://example.org/ros&#xE9"。（HTMLまたはXMLで）すべての「http://example.org/ros&#xE9」として（1.4節を参照）、この文書に示されているものに分解されています。（「＆＃XE9;」ここにこの文書は非ASCII文字を含めることができないという事実を補うために、実際の電子急性文字を表します）。
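
As a sketch of this preparation step, the following Python fragment resolves HTML entity and numeric character references before any comparison takes place. It relies on `html.unescape` from the standard library; the list name `carried` is an assumption made for illustration, and a real parser would normally perform this step while reading the carrying format.

```python
import html

# Three ways a carrying format might spell the same IRI.
carried = [
    "http://example.org/ros&eacute;",   # named entity (HTML only)
    "http://example.org/ros&#233;",     # decimal numeric character reference
    "http://example.org/ros&#xE9;",     # hexadecimal numeric character reference
]

# Resolve the escapes of the carrying format; only then compare.
resolved = {html.unescape(text) for text in carried}
print(len(resolved))
```

After resolution all three spellings collapse to the single IRI ending in the actual e-acute character (U+00E9).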

Similar considerations apply to encodings such as Transfer Codings in HTTP (see [RFC2616]) and Content Transfer Encodings in MIME ([RFC2045]), although in these cases, the encoding is based not on characters but on octets, and additional care is required to make sure that characters, and not just arbitrary octets, are compared (see section 5.3.1).

これらのケースでは、エンコーディングがない文字ではなくオクテットに基づいており、さらに注意が必要ですが、同様の考察は、MIMEでこのようなHTTPで転送コーディングとしてエンコード（[RFC2616]を参照）、コンテンツ転送エンコーディング（[RFC2045]）に適用されます文字だけではなく、任意のオクテットが、比較されていることを確認する（セクション5.3.1を参照してください）。

5.3. Comparison Ladder

5.3. 比較ラダー

In practice, a variety of methods are used to test IRI equivalence. These methods fall into a range distinguished by the amount of processing required and the degree to which the probability of false negatives is reduced. As noted above, false negatives cannot be eliminated. In practice, their probability can be reduced, but this reduction requires more processing and is not cost-effective for all applications.

実際には、種々の方法は、IRIの等価性をテストするために、使用されています。これらの方法は、必要な処理の量および偽陰性の確率が低減される程度によって区別範囲に入ります。上述したように、偽陰性を排除することはできません。実際には、その確率を低減することができますが、この減少は、より多くの処理を必要とし、費用対効果のすべてのアプリケーションではありません。

If this range of comparison practices is considered as a ladder, the following discussion will climb the ladder, starting with practices that are cheap but have a relatively higher chance of producing false negatives, and proceeding to those that have higher computational cost and lower risk of false negatives.

比較の実践のこの範囲は、ラダーとみなされた場合は、以下の議論は安いですが、偽陰性を生産する比較的高い可能性を持っている慣行で始まる、はしごを登ると、高い計算コストとのリスクが低いものに進めます偽陰性。

5.3.1. Simple String Comparison

5.3.1. 単純な文字列比較

If two IRIs, when considered as character strings, are identical, then it is safe to conclude that they are equivalent. This type of equivalence test has very low computational cost and is in wide use in a variety of applications, particularly in the domain of parsing. It is also used when a definitive answer to the question of IRI equivalence is needed that is independent of the scheme used and that can be calculated quickly and without accessing a network. An example of such a case is XML Namespaces ([XMLNamespace]).

文字列として考えるときに、2つのアイリス、同一であれば、彼らが等価であると結論しても安全です。等価テストのこのタイプは非常に低い計算コストを持っており、特に構文解析のドメインでは、様々な用途に広く用いられています。 IRIの等価の質問に対する明確な答えは使用スキームに依存しない必要とされている場合にも使用され、そのは、迅速かつネットワークにアクセスせずに計算することができます。そのような場合の例は、XML名前空間（[XMLNamespace]）です。

Testing strings for equivalence requires some basic precautions. This procedure is often referred to as "bit-for-bit" or "byte-for-byte" comparison, which is potentially misleading. Testing strings for equality is normally based on pair comparison of the characters that make up the strings, starting from the first and proceeding until both strings are exhausted and all characters are found to be equal, until a pair of characters compares unequal, or until one of the strings is exhausted before the other.

等価のための文字列をテストするいくつかの基本的な予防措置を必要とします。この手順は、しばしば「のためのビットのビット」または潜在的に誤解を招くおそれがあり、「バイト単位」の比較、と呼ばれています。等価のテストストリングは、通常、両方の文字列が排出されると文字のペアが等しくない比較するまで、すべての文字は、同じであることが見出されるまで、第1及び進行から出発して、文字列を構成する文字の対比較に、または1つまで基づいています文字列の他の前に排出されます。

This character comparison requires that each pair of characters be put in comparable encoding form. For example, should one IRI be stored in a byte array in UTF-8 encoding form and the second in a UTF-16 encoding form, bit-for-bit comparisons applied naively will produce errors. It is better to speak of equality on a character-for-character rather than on a byte-for-byte or bit-for-bit basis. In practical terms, character-by-character comparisons should be done codepoint by codepoint after conversion to a common character encoding form. When comparing character by character, the comparison function MUST NOT map IRIs to URIs, because such a mapping would create additional spurious equivalences. It follows that an IRI SHOULD NOT be modified when being transported if there is any chance that this IRI might be used as an identifier.

この文字の比較は、文字の各ペアが同程度のエンコード形式で置かれている必要があります。 1 IRIは、UTF-8エンコーディング形式およびUTF-16符号化形態の第2のバイト配列に格納する必要があり、例えば、ビット単位の比較は単純にエラーを生成する適用しました。文字と文字の上ではなく、バイト単位またはビットごとのベースで平等の話をすることをお勧めします。実際には、文字単位の比較は、共通の文字コード形式に変換した後、コードポイントによってコードポイント行われるべきです。文字ずつ比較する場合、このようなマッピングが追加スプリアス等価を作成しますので、比較関数は、URIにアイリスをマッピングしてはなりません。このIRIが識別子として使用されるかもしれないというチャンスがある場合は搬送されたときにIRIは変更しないでくださいということになります。
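
The difference between byte-for-byte and character-for-character comparison can be illustrated with a short Python sketch. The function name `same_characters` is hypothetical; the point is only that both inputs are converted to a common form (here, Python strings, which compare code point by code point) before they are compared, and that no IRI-to-URI mapping or other modification is applied.

```python
def same_characters(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> bool:
    """Compare two stored IRIs code point by code point."""
    # Decode both into a common form; Python str equality is per code point.
    return a.decode(a_encoding) == b.decode(b_encoding)


iri = "http://example.org/ros\u00e9"
stored_utf8 = iri.encode("utf-8")
stored_utf16 = iri.encode("utf-16-le")

print(stored_utf8 == stored_utf16)   # naive byte-for-byte comparison
print(same_characters(stored_utf8, "utf-8", stored_utf16, "utf-16-le"))
```

The naive byte comparison reports a difference that does not exist at the character level.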

False negatives are caused by the production and use of IRI aliases. Unnecessary aliases can be reduced, regardless of the comparison method, by consistently providing IRI references in an already normalized form (i.e., a form identical to what would be produced after normalization is applied, as described below). Protocols and data formats often limit some IRI comparisons to simple string comparison, based on the theory that people and implementations will, in their own best interest, be consistent in providing IRI references, or at least be consistent enough to negate any efficiency that might be obtained from further normalization.

偽陰性は、IRIの別名の生産と使用によって引き起こされます。不要なエイリアスが一貫して既に正規化された形式でIRI参照を提供することにより、関係なく、比較方法の、低減することができる（すなわち、正規化が適用された後、以下に記載されるように、生成されるものと同じ形式）。プロトコルとデータ形式は、多くの場合、または少なくともかもしれない任意の効率を否定するのに十分な一貫性がある人と実装は、自分の最善の利益に、IRI参照を提供する上で一貫していることを理論に基づいて、単純な文字列の比較にはいくつかのIRI比較を制限しますさらに正規化から得られました。

5.3.2. Syntax-Based Normalization

5.3.2. 構文ベースの正規化

Implementations may use logic based on the definitions provided by this specification to reduce the probability of false negatives. This processing is moderately higher in cost than character-for-character string comparison. For example, an application using this approach could reasonably consider the following two IRIs equivalent:

実装は、偽陰性の確率を減らすためにこの仕様によって提供された定義に基づいてロジックを使用することができます。この処理は、文字のための文字列比較よりもコストが適度に高くなっています。例えば、このアプローチを使用するアプリケーションは、合理的に同等以下の二つのアイリスを検討することができます：

      example://a/b/c/%7Bfoo%7D/ros&#xE9;
      eXAMPLE://a/./b/../b/%63/%7bfoo%7d/ros%C3%A9

Web user agents, such as browsers, typically apply this type of IRI normalization when determining whether a cached response is available. Syntax-based normalization includes such techniques as case normalization, character normalization, percent-encoding normalization, and removal of dot-segments.

キャッシュされた応答が利用可能であるかどうかを決定する際にブラウザなどのウェブ・ユーザー・エージェントは、通常、IRIの正規化のこのタイプを適用します。構文ベース正規化は、ケースの正規化、文字の正規化、パーセントエンコーディングの正規化、及びドットセグメントの除去などの技術を含みます。

5.3.2.1. Case Normalization

5.3.2.1。ケースの正規化

For all IRIs, the hexadecimal digits within a percent-encoding triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore should be normalized to use uppercase letters for the digits A-F.

F. - 全て虹彩、パーセントコードトリプレット内の16進数字（例えば、「％3Aの」対「％3aは」）大文字と小文字が区別され、したがって、数字Aの大文字を使用するように正規化されなければならないため

When an IRI uses components of the generic syntax, the component syntax equivalence rules always apply; namely, that the scheme and US-ASCII only host are case insensitive and therefore should be normalized to lowercase. For example, the URI "HTTP://www.EXAMPLE.com/" is equivalent to "http://www.example.com/". Case equivalence for non-ASCII characters in IRI components that are IDNs is discussed in section 5.3.3. The other generic syntax components are assumed to be case sensitive unless specifically defined otherwise by the scheme.

IRIは、一般的な構文のコンポーネントを使用すると、コンポーネント構文等価性規則は常に適用されます。すなわち、スキームとUS-ASCIIのみのホストは大文字小文字を区別しないこと、したがって、小文字に正規化されなければなりません。たとえば、URI「HTTP://www.EXAMPLE.com/は、」「http://www.example.com/」に相当します。 IDNのあるIRIコンポーネントに非ASCII文字の場合の等価は、セクション5.3.3で説明されています。具体的スキームによって別段の定義がない限り、他の一般的な構文要素は、大文字と小文字を区別であると仮定されます。
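
The two case rules above (uppercase hexadecimal digits in percent-encoding triplets, lowercase scheme and US-ASCII host letters) might be sketched in Python as follows. The component-splitting regular expression is the one given in Appendix B of [RFC3986]; the function names are assumptions, and non-ASCII host characters are deliberately left untouched because their handling belongs to section 5.3.3.

```python
import re

# Component-splitting expression from RFC 3986, Appendix B.
_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def _lower_ascii(text: str) -> str:
    """Lowercase US-ASCII letters only; leave other characters alone."""
    return "".join(c.lower() if c.isascii() else c for c in text)


def normalize_case(iri: str) -> str:
    m = _SPLIT.match(iri)
    scheme = (m.group(1) or "").lower()          # "scheme:" or empty
    authority = m.group(3)                       # "//authority" or None
    if authority is None:
        authority = ""
    else:
        # Userinfo is case sensitive; only host (and numeric port) is lowered.
        userinfo, at, hostport = authority.rpartition("@")
        authority = userinfo + at + _lower_ascii(hostport)
    rest = m.group(5) + (m.group(6) or "") + (m.group(8) or "")
    # Uppercase the hex digits of every percent-encoding triplet.
    return _TRIPLET.sub(lambda t: t.group(0).upper(), scheme + authority + rest)


print(normalize_case("HTTP://www.EXAMPLE.com/%7efoo"))
```

Path, query, and fragment keep their case, since they are assumed to be case sensitive unless the scheme says otherwise.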

Creating schemes that allow case-insensitive syntax components containing non-ASCII characters should be avoided. Case normalization of non-ASCII characters can be culturally dependent and is always a complex operation. The only exception concerns non-ASCII host names for which the character normalization includes a mapping step derived from case folding.

非ASCII文字を含む、大文字と小文字を区別しない構文コンポーネントは避けるべき許可スキームを作成します。非ASCII文字の場合、正規化は、文化的に依存すると常に複雑な操作であることができます。唯一の例外への懸念文字の正規化は、ケースの折りたたみ由来マッピングステップを含んでいるため、ASCII以外のホスト名。

5.3.2.2. Character Normalization

5.3.2.2。文字の正規化

The Unicode Standard [UNIV4] defines various equivalences between sequences of characters for various purposes. Unicode Standard Annex #15 [UTR15] defines various Normalization Forms for these equivalences, in particular Normalization Form C (NFC, Canonical Decomposition, followed by Canonical Composition) and Normalization Form KC (NFKC, Compatibility Decomposition, followed by Canonical Composition).

Unicode標準[UNIV4]は、様々な目的のために文字の配列の間の様々な等価性を定義します。 Unicode規格付属書＃15は、[UTR15これらの等価のための様々な正規化フォーム、特に正規化形式C（NFC、正準分解、正規組成が続く）及び（正規組成続いNFKC、互換分解）正規化形式KCを定義します。

Equivalence of IRIs MUST rely on the assumption that IRIs are appropriately pre-character-normalized rather than apply character normalization when comparing two IRIs. The exceptions are conversion from a non-digital form, and conversion from a non-UCS-based character encoding to a UCS-based character encoding. In these cases, NFC or a normalizing transcoder using NFC MUST be used for interoperability. To avoid false negatives and problems with transcoding, IRIs SHOULD be created by using NFC. Using NFKC may avoid even more problems; for example, by choosing half-width Latin letters instead of full-width ones, and full-width instead of half-width Katakana.

虹彩の等価性は、虹彩は、2つのアイリスを比較した場合、文字の正規化を適用するのではなく、適切に事前文字正規化されているという仮定に頼らなければなりません。例外は、非デジタル形式から変換、およびUCSベースの文字エンコーディング非UCSベースの文字エンコーディングからの変換です。これらのケースでは、NFCまたはNFCを使用して正規化トランスコーダは、相互運用性のために使用されなければなりません。トランスコーディングと偽陰性との問題を回避するために、アイリスは、NFCを使用して作成する必要があります。 NFKCを使用すると、さらに多くの問題を回避することができます。例えば、代わりにフル幅のものの半値幅ラテン文字、そして代わりに半角カタカナの全角を選択することもできます。

As an example, "http://www.example.org/r&#xE9;sum&#xE9;.html" (in XML Notation) is in NFC. On the other hand, "http://www.example.org/re&#x301;sume&#x301;.html" is not in NFC.

例えば、 ";和＆＃XE9; http://www.example.org/r&＃XE9の.html" として（XML表記）は、NFCです。一方、 ";スメ＆＃X301、X301 http://www.example.org/re&＃.htmlのは、" NFCではありません。

The former uses precombined e-acute characters, and the latter uses "e" characters followed by combining acute accents. Both usages are defined as canonically equivalent in [UNIV4].

電子急性文字予備結合前者の用途、及び後者の用途「E」の文字は、急性アクセントを組み合わせて行きました。両方の用途は、[UNIV4]における正準等価であると定義されます。
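
The distinction between the precombined and the decomposed spelling can be checked with Python's standard `unicodedata` module. This is only a sketch: the helper `is_nfc` is a name chosen for illustration, and the code is meant for use when an IRI is created, not for normalizing IRIs received from third parties.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if applying NFC would leave the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


precombined = "http://www.example.org/r\u00e9sum\u00e9.html"    # e-acute as U+00E9
decomposed = "http://www.example.org/re\u0301sume\u0301.html"   # "e" + U+0301

print(is_nfc(precombined), is_nfc(decomposed))
print(precombined == decomposed)                                 # simple string comparison
print(unicodedata.normalize("NFC", decomposed) == precombined)

# NFKC additionally folds width variants: full-width Latin letters become
# ordinary ones, and half-width Katakana becomes full-width Katakana.
print(unicodedata.normalize("NFKC", "\uff52\uff45") == "re")
print(unicodedata.normalize("NFKC", "\uff76\uff85") == "\u30ab\u30ca")
```

The two spellings are canonically equivalent, yet a simple string comparison treats them as different, which is why IRIs should already be in NFC when they are created.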

Note: Because it is unknown how a particular sequence of characters is being treated with respect to character normalization, it would be inappropriate to allow third parties to normalize an IRI arbitrarily. This does not contradict the recommendation that when a resource is created, its IRI should be as character normalized as possible (i.e., NFC or even NFKC). This is similar to the uppercase/lowercase problems. Some parts of a URI are case insensitive (domain name). For others, it is unclear whether they are case sensitive, case insensitive, or something in between (e.g., case sensitive, but with a multiple choice selection if the wrong case is used, instead of a direct negative result). The best recipe is that the creator use a reasonable capitalization and, when transferring the URI, capitalization never be changed.

注：これは、文字の特定のシーケンスは、文字の正規化に対して治療されている方法は不明であるので、第三者が任意にIRIを正規化することを可能にするためには不適切であろう。これは、リソースが作成されるとき、そのIRIを可能として、正規化文字（すなわち、NFCあるいはNFKC）などであることが推奨と矛盾しません。これは、大文字/小文字の問題に似ています。 URIの一部は大文字小文字を区別しない（ドメイン名）です。他の人のために、彼らが（例えば、大文字と小文字を区別し、代わりに直接否定的結果の間違ったケースが使用されている場合、複数の選択肢の選択、との）間での大文字と小文字を区別し、大文字小文字を区別しない、または何かしているかどうかは不明です。最高のレシピは、作成者が合理的な大文字小文字を使用して、URIを転送するときに、大文字と小文字を変更することはないということです。

Various IRI schemes may allow the usage of Internationalized Domain Names (IDN) [RFC3490] either in the ireg-name part or elsewhere. Character Normalization also applies to IDNs, as discussed in section 5.3.3.

様々なIRIスキームは、国際化ドメイン名（IDN）[RFC3490] IREG名の一部または他の場所のいずれかの使用を可能にすることができます。セクション5.3.3で説明したように文字の正規化はまた、IDNのに適用されます。

5.3.2.3. Percent-Encoding Normalization

5.3.2.3。パーセントエンコーディングの正規化

The percent-encoding mechanism (section 2.1 of [RFC3986]) is a frequent source of variance among otherwise identical IRIs. In addition to the case normalization issue noted above, some IRI producers percent-encode octets that do not require percent-encoding, resulting in IRIs that are equivalent to their non-encoded counterparts. These IRIs should be normalized by decoding any percent-encoded octet sequence that corresponds to an unreserved character, as described in section 2.3 of [RFC3986].

パーセントエンコーディング機構（[RFC3986]のセクション2.1）は、他の点では同一のIRIの間で分散の頻繁な源です。ケースの正規化の問題に加えて、上記のように、それらの非符号化対応物に相当する虹彩得パーセントエンコーディングを必要としないいくつかのIRI生産パーセントエンコードオクテット。 [RFC3986]のセクション2.3に記載されているように、これらの虹彩、非予約文字に対応する任意のパーセントエンコードオクテットシーケンスをデコードすることにより正規化されるべきです。
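
A possible Python sketch of this normalization is shown below. It decodes only triplets that stand for the US-ASCII unreserved characters of section 2.3 of [RFC3986] (letters, digits, "-", ".", "_", "~") and uppercases the hexadecimal digits of every other triplet. The function name is hypothetical, and percent-encoded UTF-8 sequences for non-ASCII characters are intentionally left encoded in this simplified version.

```python
import re
import string

# Unreserved characters per RFC 3986, section 2.3 (US-ASCII only).
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def decode_unreserved(iri: str) -> str:
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                      # needless escape: decode it
        return match.group(0).upper()        # keep it, but normalize hex case

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, iri)


for text in ("http://example.org/%7euser", "http://example.org/%7Euser"):
    print(decode_unreserved(text))
```

Reserved characters such as "/" stay percent-encoded, because decoding them could change how the IRI is parsed.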

For actual resolution, differences in percent-encoding (except for the percent-encoding of reserved characters) MUST always result in the same resource. For example, "http://example.org/~user", "http://example.org/%7euser", and "http://example.org/%7Euser" must resolve to the same resource.

実際の解決のために、（予約文字のパーセントエンコーディングを除く）パーセントエンコーディングの違いは、常に同じリソースをもたらさなければなりません。たとえば、「http://example.org/~user」、「http://example.org/%7euser」、および「http://example.org/%7Euser」、同じリソースに解決される必要があります。

If this kind of equivalence is to be tested, the percent-encoding of both IRIs to be compared has to be aligned; for example, by converting both IRIs to URIs (see section 3.1), eliminating escape differences in the resulting URIs, and making sure that the case of the hexadecimal characters in the percent-encoding is always the same (preferably uppercase). If the IRI is to be passed to another application or used further in some other way, its original form MUST be preserved. The conversion described here should be performed only for local comparison.

同等のこの種の試験される場合、比較する両方の虹彩のパーセントエンコーディングは、整列されなければなりません。例えば、URIに両方のアイリスを変換することにより得られたURIのエスケープ差をなくし、そしてパーセントエンコーディングの16進文字の場合は、常に同じ（好ましくは大文字）であることを確認し、（セクション3.1を参照）。 IRIは、別のアプリケーションに渡されるか、他の方法でさらに使用する場合、その元の形態が保存されなければなりません。ここで説明する変換は、ローカル比較のために実施すべきです。

5.3.2.4. Path Segment Normalization

5.3.2.4。パスセグメントの正規化

The complete path segments "." and ".." are intended only for use within relative references (section 4.1 of [RFC3986]) and are removed as part of the reference resolution process (section 5.2 of [RFC3986]). However, some implementations may incorrectly assume that reference resolution is not necessary when the reference is already an IRI, and thus fail to remove dot-segments when they occur in non-relative paths. IRI normalizers should remove dot-segments by applying the remove_dot_segments algorithm to the path, as described in section 5.2.4 of [RFC3986].

完全なパスセグメント「」そして、「..」相対参照（[RFC3986]のセクション4.1）内でのみ使用することを意図しており、基準解像度処理（[RFC3986]のセクション5.2）の一部として除去されます。しかし、いくつかの実装が誤って参照が既にIRIである場合、基準解像度が必要でないと仮定することができるので、それらは、非相対パスに発生したときのドットセグメントを削除することができません。 [RFC3986]のセクション5.2.4に記載したようにIRIの正規化は、パスにremove_dot_segmentsアルゴリズムを適用することによってドットセグメントを削除しなければなりません。
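
The remove_dot_segments algorithm referenced above can be transcribed into Python roughly as follows. This is a sketch written from the step list in section 5.2.4 of [RFC3986], operating on the path component only; splitting the IRI into components is assumed to have happened already.

```python
def remove_dot_segments(path: str) -> str:
    output = []                                  # completed segments
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]                      # "/./" -> "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]                      # "/../" -> "/"
            if output:
                output.pop()                     # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


print(remove_dot_segments("/a/./b/../b/%63"))
```

The first two assertions used to check this sketch are the examples given in [RFC3986] itself ("/a/b/c/./../../g" and "mid/content=5/../6").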

5.3.3. Scheme-Based Normalization

5.3.3. スキームに基づく正規化

The syntax and semantics of IRIs vary from scheme to scheme, as described by the defining specification for each scheme. Implementations may use scheme-specific rules, at further processing cost, to reduce the probability of false negatives. For example, because the "http" scheme makes use of an authority component, has a default port of "80", and defines an empty path to be equivalent to "/", the following four IRIs are equivalent:

各スキームの定義書によって記載されるように虹彩の構文と意味論は、スキームの方式に変わります。実装は、偽陰性の確率を減少させるために、更なる処理コストで、スキーム固有のルールを使用してもよいです。例えば、「HTTP」方式は、権限コンポーネントを利用するので、「80」のデフォルトのポートを有し、「/」と等価であることが空のパスを定義し、次の4つの虹彩等価です。

      http://example.com
      http://example.com/
      http://example.com:/
      http://example.com:80/

In general, an IRI that uses the generic syntax for authority with an empty path should be normalized to a path of "/". Likewise, an explicit ":port", for which the port is empty or the default for the scheme, is equivalent to one where the port and its ":" delimiter are elided and thus should be removed by scheme-based normalization. For example, the second IRI above is the normal form for the "http" scheme.

一般的には、空のパスと権威のための一般的な構文を使用していますIRIは、「/」のパスに正規化する必要があります。同様に、明示的な「：」区切り省略され、従ってスキームに基づく正規化によって除去されなければならないポートが空またはスキームのデフォルトであるため、「ポート」は、ポートとその一つに相当します。例えば、上述した第2のIRIは、「HTTP」スキームの正規形です。
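
For the "http" scheme, the two rules above (an empty path becomes "/", and an empty or default port is removed together with its ":") might look like this in Python. The table `DEFAULT_PORTS` and the function name are assumptions made for this sketch, which covers only "http"; the splitting expression is the one from Appendix B of [RFC3986].

```python
import re

# Component-splitting expression from RFC 3986, Appendix B.
_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Assumption for this sketch: only "http" is known.
DEFAULT_PORTS = {"http": "80"}


def normalize_http(iri: str) -> str:
    m = _SPLIT.match(iri)
    scheme = (m.group(2) or "").lower()
    authority = m.group(4)
    if scheme not in DEFAULT_PORTS or authority is None:
        return iri                               # nothing scheme-specific to do
    host, colon, port = authority.rpartition(":")
    if colon and port in ("", DEFAULT_PORTS[scheme]):
        authority = host                         # drop ":" or ":80"
    path = m.group(5) or "/"                     # empty path -> "/"
    # Query and fragment, including bare "?" or "#", are kept untouched.
    return scheme + "://" + authority + path + (m.group(6) or "") + (m.group(8) or "")


forms = ["http://example.com", "http://example.com/",
         "http://example.com:/", "http://example.com:80/"]
print({normalize_http(form) for form in forms})
```

All four forms reduce to the second one, while "http://example.com/?" keeps its "?" delimiter and therefore remains a distinct string.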

Another case where normalization varies by scheme is in the handling of an empty authority component or empty host subcomponent. For many scheme specifications, an empty authority or host is considered an error; for others, it is considered equivalent to "localhost" or the end-user's host. When a scheme defines a default for authority and an IRI reference to that default is desired, the reference should be normalized to an empty authority for the sake of uniformity, brevity, and internationalization. If, however, either the userinfo or port subcomponents are non-empty, then the host should be given explicitly even if it matches the default.

正規化スキームによって異なる別の場合には、空の権限コンポーネントまたは空のホストサブコンポーネントの取り扱いです。多くのスキームの仕様については、空の権限またはホストはエラーとみなされます。他人のために、それは「localhost」をまたはエンドユーザのホストと等価であると考えています。スキームは、権限のデフォルトを定義し、そのデフォルトのIRI参照が所望される場合、参照は均一、簡潔、及び国際化のために空の権限に正規化されるべきです。しかし、いずれかのuserinfoまたはポートサブコンポーネントが空でない場合、ホストはそれがデフォルトと一致した場合でも、明示的に指定する必要があります。

Normalization should not remove delimiters when their associated component is empty unless it is licensed to do so by the scheme specification. For example, the IRI "http://example.com/?" cannot be assumed to be equivalent to any of the examples above. Likewise, the presence or absence of delimiters within a userinfo subcomponent is usually significant to its interpretation. The fragment component is not subject to any scheme-based normalization; thus, two IRIs that differ only by the suffix "#" are considered different regardless of the scheme.

スキームの仕様によってそうするためにライセンスされていない限り、それに関連するコンポーネントが空のときに正規化は、区切り文字を削除しないでください。例えば、IRIは、 "http://example.com/？"上記の例のいずれかと等価であると仮定することはできません。同様に、サブコンポーネントのUserInfo内の区切り文字の有無がその解釈に通常は重要です。フラグメントコンポーネントは、任意のスキームに基づく正規化を受けません。このように、接尾辞「＃」だけが異なる2つのアイリスかかわらず、スキームの異なると考えられています。

Some IRI schemes may allow the usage of Internationalized Domain Names (IDN) [RFC3490] either in their ireg-name part or elsewhere. When in use in IRIs, those names SHOULD be validated by using the ToASCII operation defined in [RFC3490], with the flags "UseSTD3ASCIIRules" and "AllowUnassigned". An IRI containing an invalid IDN cannot successfully be resolved. Validated IDN components of IRIs SHOULD be character normalized by using the Nameprep process [RFC3491]; however, for legibility purposes, they SHOULD NOT be converted into ASCII Compatible Encoding (ACE).

いくつかのIRIスキームは、国際化ドメイン名（IDN）[RFC3490]自分のIREG名の一部または他の場所のいずれかの使用を可能にすることができます。虹彩使用中、これらの名前は、フラグ「UseSTD3ASCIIRules」および「AllowUnassigned」と、[RFC3490]で定義されたもしToASCII操作を使用して検証する必要があるとき。無効なIDNを含むIRIは正常に解決することはできません。虹彩の検証済みIDN成分は、文字がNAMEPREPプロセス[RFC3491]を使用することによって正規化されるべきです。しかし、読みやすさのために、彼らは、ASCII互換エンコーディング（ACE）に変換されるべきではありません。

Scheme-based normalization may also consider IDN components and their conversions to punycode as equivalent. As an example, "http://r&#xE9;sum&#xE9;.example.org" may be considered equivalent to "http://xn--rsum-bpad.example.org".

スキームベース正規化はまた、等価物としてPUNYCODEするIDNコンポーネントとその変換を考慮してもよいです。例として、 "HTTP：// R＆＃XE9;サム＆＃XE9は、.example.org" 相当 "http://xn--rsum-bpad.example.org" と見なすことができます。
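
The equivalence between an IDN and its ASCII Compatible Encoding can be tried out with Python's built-in "idna" codec, which the Python documentation describes as implementing [RFC3490]. The helper `same_host` is a hypothetical name, and which ToASCII flags the codec applies is not examined in this sketch.

```python
def same_host(a: str, b: str) -> bool:
    """Compare two host names after converting both to ACE form."""
    return a.encode("idna") == b.encode("idna")


unicode_host = "r\u00e9sum\u00e9.example.org"
ace_host = unicode_host.encode("idna").decode("ascii")

print(ace_host)
print(same_host(unicode_host, "xn--rsum-bpad.example.org"))
```

The ACE form is used here for local comparison only; for legibility, the IRI itself keeps the non-ASCII host name.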

Other scheme-specific normalizations are possible.

その他の方式固有の正規化が可能です。

5.3.4. Protocol-Based Normalization

5.3.4. プロトコルベースの正規化

Substantial effort to reduce the incidence of false negatives is often cost-effective for web spiders. Consequently, they implement even more aggressive techniques in IRI comparison. For example, if they observe that an IRI such as

偽陰性の発生率を減らすために相当な努力は、多くの場合、費用対効果の高いウェブスパイダーのためです。その結果、彼らは、IRIの比較においても、より積極的な技術を実装します。例えば、彼らは観察する場合にはそのようなIRI

      http://example.com/data

redirects to an IRI differing only in the trailing slash

末尾のスラッシュのみが異なるIRIにリダイレクト

      http://example.com/data/

they will likely regard the two as equivalent in the future. This kind of technique is only appropriate when equivalence is clearly indicated by both the result of accessing the resources and the common conventions of their scheme's dereference algorithm (in this case, use of redirection by HTTP origin servers to avoid problems with relative references).

彼らは、おそらく将来的には同等のように2つを考えてます。等価が明確に（この場合には、HTTPのオリジンサーバによってリダイレクトの使用は相対参照の問題を回避するため）リソースとそのスキームの逆参照アルゴリズムの一般的な慣習にアクセスした結果の両方で表示されたときにこの種の技術は適切です。

6. Use of IRIs

6. IRIの使用

6.1. Limitations on UCS Characters Allowed in IRIs

6.1. UCS文字の制限は、虹彩可

This section discusses limitations on characters and character sequences usable for IRIs beyond those given in section 2.2 and section 4.1. The considerations in this section are relevant when IRIs are created and when URIs are converted to IRIs.

このセクションでは、文字とセクション2.2と4.1節で与えられたもの以外のIRIのために使用可能な文字列の制限について説明します。 IRIは、作成時とURIの虹彩に変換されるとき、このセクションの考慮事項が関連しています。

a. The repertoire of characters allowed in each IRI component is limited by the definition of that component. For example, the definition of the scheme component does not allow characters beyond US-ASCII.

A。各IRIコンポーネントで使用できる文字のレパートリーは、そのコンポーネントの定義によって制限されます。例えば、スキーム構成要素の定義は、US-ASCII文字を越えては使用できません。

       (Note: In accordance with URI practice, generic IRI software
       cannot and should not check for such limitations.)

b. The UCS contains many areas of characters for which there are strong visual look-alikes. Because of the likelihood of transcription errors, these also should be avoided. This includes the full-width equivalents of Latin characters, half-width Katakana characters for Japanese, and many others. It also includes many look-alikes of "space", "delims", and "unwise", characters excluded in [RFC3491].

B。 UCSは、強力な視覚的なそっくりサイトがある対象の文字の多くの分野が含まれています。そのため、転写エラーの可能性のため、これらも避けるべきです。これはラテン文字、日本語半角カタカナ文字、および他の多くの全幅等価物を含みます。また、「空間」、「delims」、および「愚か」、[RFC3491]で除外文字の多くのそっくりサイトを含んでいます。

Additional information is available from [UNIXML]. [UNIXML] is written in the context of running text rather than in that of identifiers. Nevertheless, it discusses many of the categories of characters not appropriate for IRIs.

追加情報は[UNIXML]から入手可能です。【UNIXML]むしろ識別子のことよりもテキストの実行コンテキストで記述されています。それにもかかわらず、アイリスのために適切でない文字のカテゴリの多くを説明します。

6.2. Software Interfaces and Protocols

6.2. ソフトウェアインタフェースとプロトコル

Although an IRI is defined as a sequence of characters, software interfaces for URIs typically function on sequences of octets or other kinds of code units. Thus, software interfaces and protocols MUST define which character encoding is used.

IRIは、文字のシーケンスとして定義されているが、URIのソフトウェアインターフェースは、典型的には、オクテットまたはコード単位の他の種類の配列に機能します。したがって、ソフトウェアインタフェース及びプロトコルが使用される文字符号化を定義しなければなりません。

Intermediate software interfaces between IRI-capable components and URI-only components MUST map the IRIs per section 3.1, when transferring from IRI-capable to URI-only components. This mapping SHOULD be applied as late as possible. It SHOULD NOT be applied between components that are known to be able to handle IRIs.

URIのみコンポーネントにIRI-できるから転送時IRI-可能な成分とURIのみの構成要素間の中間ソフトウェアインターフェイスは、セクション3.1あたりの絞りをマップする必要があります。このマッピングは、できるだけ遅く適用されるべきです。これは、アイリスを扱うことができることが知られているコンポーネントの間で適用されるべきではありません。
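
A simplified Python sketch of the mapping applied at such a boundary follows. It covers only the central step of section 3.1, converting each non-ASCII character to UTF-8 and percent-encoding the resulting octets; the function name is an assumption, and the other provisions of that section (for example, the optional ToASCII treatment of host names) are left out.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character."""
    pieces = []
    for char in iri:
        if ord(char) < 0x80:
            pieces.append(char)                  # US-ASCII passes through
        else:
            pieces.extend(f"%{octet:02X}" for octet in char.encode("utf-8"))
    return "".join(pieces)


print(iri_to_uri("http://www.example.org/r\u00e9sum\u00e9.html"))
```

Applying the function as late as possible, just before handing the identifier to a URI-only component, keeps the original IRI available to every component that can handle it.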

6.3. Format of URIs and IRIs in Documents and Protocols

6.3. ドキュメントとプロトコルにおけるURIと虹彩のフォーマット

Document formats that transport URIs may have to be upgraded to allow the transport of IRIs. In cases where the document as a whole has a native character encoding, IRIs MUST also be encoded in this character encoding and converted accordingly by a parser or interpreter. IRI characters not expressible in the native character encoding SHOULD be escaped by using the escaping conventions of the document format if such conventions are available. Alternatively, they MAY be percent-encoded according to section 3.1. For example, in HTML or XML, numeric character references SHOULD be used. If a document as a whole has a native character encoding and that character encoding is not UTF-8, then IRIs MUST NOT be placed into the document in the UTF-8 character encoding.

トランスポートのURIを持っていることが文書フォーマットは、虹彩の輸送を可能にするためにアップグレードします。全体として文書がネイティブ文字コードを有する場合には、アイリスはまた、この文字エンコーディングでエンコードしなければならなくて、パーサまたはインタプリタによって相応変換します。ネイティブの文字エンコーディングで表現できないIRIの文字は、そのような規則が使用可能な場合、文書形式のエスケープ規則を使用してエスケープする必要があります。あるいは、それらはセクション3.1に従ってパーセント符号化することができます。例えば、HTMLやXMLで、数値文字参照を使用する必要があります。全体として、文書がネイティブ文字エンコーディングを持ち、その文字エンコーディングがUTF-8でない場合、虹彩がUTF-8文字エンコーディングでの文書の中に入れてはなりません。
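
As an illustration of the escaping rule for HTML or XML, the following Python sketch writes an IRI into a document whose native character encoding is assumed to be ISO-8859-1. Characters that the encoding can express are written directly; the others become numeric character references through the standard `xmlcharrefreplace` error handler. The sample IRI is made up for this example.

```python
import html

# Hypothetical IRI: e-acute fits in ISO-8859-1, the two Kanji do not.
iri = "http://example.org/ros\u00e9/\u65e5\u672c"

# Encode in the document's native encoding, escaping what does not fit.
in_document = iri.encode("iso-8859-1", errors="xmlcharrefreplace")
print(in_document)

# A parser reverses both steps: decode the bytes, then resolve the references.
recovered = html.unescape(in_document.decode("iso-8859-1"))
print(recovered == iri)
```

The document never contains UTF-8 octets of the IRI, which matches the requirement that IRIs not be placed into a non-UTF-8 document in UTF-8.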

Note: Some formats already accommodate IRIs, although they use different terminology. HTML 4.0 [HTML4] defines the conversion from IRIs to URIs as error-avoiding behavior. XML 1.0 [XML1], XLink [XLink], XML Schema [XMLSchema], and specifications based upon them allow IRIs. Also, it is expected that all relevant new W3C formats and protocols will be required to handle IRIs [CharMod].

注意：一部の形式は、すでにアイリスを収容し、彼らは異なる用語を使用していますが。 HTML 4.0 [HTML4]はエラー回避行動としてのURI虹彩からの変換を定義します。 XML 1.0 [XML1]は、XLinkのは、[XLinkの]、それらに基づいてXMLスキーマ[XMLスキーマ]、および仕様は、アイリスを許可します。また、関連するすべての新しいW3Cフォーマットとプロトコル虹彩[CHARMOD]を処理するために必要とされることが予想されます。

6.4. Use of UTF-8 for Encoding Original Characters

6.4. オリジナルキャラクターをエンコードするためのUTF-8の使用

This section discusses details and gives examples for point c) in section 1.2. To be able to use IRIs, the URI corresponding to the IRI in question has to encode original characters into octets by using UTF-8. This can be specified for all URIs of a URI scheme or can apply to individual URIs for schemes that do not specify how to encode original characters. It can apply to the whole URI, or only to some part. For background information on encoding characters into URIs, see also section 2.5 of [RFC3986].

このセクションでは、詳細について説明し、セクション1.2に）点cの例を与えます。アイリスを使用できるようにするには、問題のIRIに対応するURIはUTF-8を使用してオクテットに、元の文字をエンコードする必要があります。これは、URIスキームのすべてのURIを指定することができたり、元の文字をエンコードする方法を指定していないスキームのために、個々のURIに適用することができます。それは全体のURIに、または一部だけに適用することができます。 URIの中に文字をコードに関する背景情報のために、また、[RFC3986]のセクション2.5を参照。

For new URI schemes, using UTF-8 is recommended in [RFC2718]. Examples where UTF-8 is already used are the URN syntax [RFC2141], IMAP URLs [RFC2192], and POP URLs [RFC2384]. On the other hand, because the HTTP URL scheme does not specify how to encode original characters, only some HTTP URLs can have corresponding but different IRIs.

新しいURIスキームの場合は、UTF-8を使用すると、[RFC2718]で推奨されています。 UTF-8は、すでに使用されている例は、URN構文[RFC2141]、IMAPのURL [RFC2192]、およびPOPのURL [RFC2384]です。 HTTP URLスキームは、元の文字をエンコードする方法を指定していないため、一方で、唯一のいくつかのHTTP URLは対応が異なるアイリスを持つことができます。

For example, for a document with a URI of "http://www.example.org/r%C3%A9sum%C3%A9.html", it is possible to construct a corresponding IRI (in XML notation, see section 1.4): "http://www.example.org/r&#xE9;sum&#xE9;.html" ("&#xE9;" stands for the e-acute character, and "%C3%A9" is the UTF-8 encoded and percent-encoded representation of that character). On the other hand, for a document with a URI of "http://www.example.org/r%E9sum%E9.html", the percent-encoding octets cannot be converted to actual characters in an IRI, as the percent-encoding is not based on UTF-8.

例えば、URIが"http://www.example.org/r%C3%A9sum%C3%A9.html"である文書については、対応するIRIを構成することができます（XML表記。セクション1.4を参照）："http://www.example.org/r&#xE9;sum&#xE9;.html"（"&#xE9;"はアキュートアクセント付きのeを表し、"%C3%A9"はその文字をUTF-8で符号化してパーセントエンコードした表現です）。一方、URIが"http://www.example.org/r%E9sum%E9.html"である文書については、パーセントエンコードがUTF-8に基づいていないため、パーセントエンコードされたオクテットをIRIの実際の文字に変換することはできません。
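The distinction between the two example URIs can be checked mechanically. The following Python sketch (the function name is illustrative and not part of this specification) percent-decodes a URI with the standard library and tests whether the resulting octets form valid UTF-8. It looks at the URI as a whole, which is a simplification; a real converter would also have to follow the remaining rules of section 3.2.

```python
from urllib.parse import unquote_to_bytes


def percent_octets_are_utf8(uri: str) -> bool:
    """Return True if the percent-decoded octets of `uri` are valid UTF-8."""
    try:
        unquote_to_bytes(uri).decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# UTF-8 based percent-encoding: a corresponding IRI can be constructed.
print(percent_octets_are_utf8("http://www.example.org/r%C3%A9sum%C3%A9.html"))  # True
# iso-8859-1 based percent-encoding: the octets must stay percent-encoded.
print(percent_octets_are_utf8("http://www.example.org/r%E9sum%E9.html"))  # False
```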

This means that for most URI schemes, there is no need to upgrade their scheme definition in order for them to work with IRIs. The main case where upgrading makes sense is when a scheme definition, or a particular component of a scheme, is strictly limited to the use of US-ASCII characters with no provision to include non-ASCII characters/octets via percent-encoding, or if a scheme definition currently uses highly scheme-specific provisions for the encoding of non-ASCII characters. An example of this is the mailto: scheme [RFC2368].

これは、ほとんどのURIスキームについて、IRIとともに機能させるためにスキーム定義を更新する必要がないことを意味します。更新が意味を持つ主なケースは、スキーム定義またはスキームの特定の構成要素が、パーセントエンコードによって非ASCII文字／オクテットを含める規定を持たず、US-ASCII文字の使用に厳密に限定されている場合、あるいはスキーム定義が現在、非ASCII文字の符号化にスキーム固有性の高い規定を使用している場合です。その一例がmailto:スキーム[RFC2368]です。

This specification does not upgrade any scheme specifications in any way; this has to be done separately. Also, note that there is no such thing as an "IRI scheme"; all IRIs use URI schemes, and all URI schemes can be used with IRIs, even though in some cases only by using URIs directly as IRIs, without any conversion.

この仕様は、いかなるスキーム仕様も一切更新しません。更新は別途行う必要があります。また、「IRIスキーム」というものは存在しないことに注意してください。すべてのIRIはURIスキームを使用し、すべてのURIスキームはIRIとともに使用できます。ただし、場合によっては、変換を行わずにURIをそのままIRIとして使用することによってのみ可能です。

URI schemes can impose restrictions on the syntax of scheme-specific URIs; i.e., URIs that are admissible under the generic URI syntax [RFC3986] may not be admissible due to narrower syntactic constraints imposed by a URI scheme specification. URI scheme definitions cannot broaden the syntactic restrictions of the generic URI syntax; otherwise, it would be possible to generate URIs that satisfied the scheme-specific syntactic constraints without satisfying the syntactic constraints of the generic URI syntax. However, additional syntactic constraints imposed by URI scheme specifications are applicable to IRI, as the corresponding URI resulting from the mapping defined in section 3.1 MUST be a valid URI under the syntactic restrictions of generic URI syntax and any narrower restrictions imposed by the corresponding URI scheme specification.

URIスキームは、スキーム固有のURIの構文に制限を課すことができます。すなわち、汎用URI構文[RFC3986]では許容されるURIが、URIスキーム仕様によって課されるより狭い構文上の制約のために許容されない場合があります。URIスキームの定義は、汎用URI構文の構文上の制限を広げることはできません。さもなければ、汎用URI構文の構文上の制約を満たさずに、スキーム固有の構文上の制約を満たすURIを生成できてしまいます。ただし、URIスキーム仕様によって課される追加の構文上の制約はIRIにも適用されます。セクション3.1で定義されたマッピングから得られる対応するURIは、汎用URI構文の構文上の制限と、対応するURIスキーム仕様によって課されるより狭い制限の両方のもとで有効なURIでなければならない（MUST）からです。

The requirement for the use of UTF-8 applies to all parts of a URI (with the potential exception of the ireg-name part; see section 3.1). However, it is possible that the capability of IRIs to represent a wide range of characters directly is used just in some parts of the IRI (or IRI reference). The other parts of the IRI may only contain US-ASCII characters, or they may not be based on UTF-8. They may be based on another character encoding, or they may directly encode raw binary data (see also [RFC2397]).

UTF-8の使用に関する要件は、URIのすべての部分に適用されます（ireg-name部分は例外となる可能性があります。セクション3.1を参照）。しかし、広範囲の文字を直接表現できるというIRIの能力が、IRI（またはIRI参照）の一部でのみ使用されることもあり得ます。IRIの他の部分は、US-ASCII文字だけを含んでいるかもしれませんし、UTF-8に基づいていないかもしれません。それらは別の文字エンコーディングに基づいている場合もあれば、生のバイナリデータを直接符号化している場合もあります（[RFC2397]も参照）。

For example, it is possible to have a URI reference of "http://www.example.org/r%E9sum%E9.xml#r%C3%A9sum%C3%A9", where the document name is encoded in iso-8859-1 based on server settings, but where the fragment identifier is encoded in UTF-8 according to [XPointer]. The IRI corresponding to the above URI would be (in XML notation) "http://www.example.org/r%E9sum%E9.xml#r&#xE9;sum&#xE9;".

例えば、"http://www.example.org/r%E9sum%E9.xml#r%C3%A9sum%C3%A9"というURI参照があり得ます。ここで、文書名はサーバの設定に基づいてiso-8859-1で符号化されていますが、フラグメント識別子は[XPointer]に従ってUTF-8で符号化されています。上記のURIに対応するIRIは（XML表記で）"http://www.example.org/r%E9sum%E9.xml#r&#xE9;sum&#xE9;"となります。
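A partial conversion of this kind can be sketched in Python as follows. The helper names are hypothetical, and the code is a deliberately reduced version of the URI-to-IRI conversion in section 3.2: it unescapes a run of percent-encoded octets only if the whole run is valid UTF-8 and contains no US-ASCII characters, and it does not filter out characters that are inappropriate in IRIs (for example bidirectional formatting characters).

```python
import re
from urllib.parse import unquote_to_bytes

# One or more consecutive percent-encoded octets.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    run = match.group(0)
    try:
        text = unquote_to_bytes(run).decode("utf-8")
    except UnicodeDecodeError:
        return run  # not UTF-8 (for example iso-8859-1): keep the escapes
    if any(ord(ch) < 0x80 for ch in text):
        return run  # never unescape US-ASCII, which may be reserved
    return text


def uri_to_iri_sketch(uri: str) -> str:
    """Unescape only those percent-encoded runs that are valid UTF-8."""
    return _ESCAPE_RUN.sub(_decode_run, uri)


print(uri_to_iri_sketch("http://www.example.org/r%E9sum%E9.xml#r%C3%A9sum%C3%A9"))
```

With the URI reference from the example, the iso-8859-1 based escapes in the path are left alone and only the fragment identifier is turned into actual characters.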

Similar considerations apply to query parts. The functionality of IRIs (namely, to be able to include non-ASCII characters) can only be used if the query part is encoded in UTF-8.

同様の考慮事項がクエリ部分にも適用されます。IRIの機能（すなわち、非ASCII文字を含められること）は、クエリ部分がUTF-8で符号化されている場合にのみ使用できます。

6.5. Relative IRI References

6.5. 相対IRI参照

Processing of relative IRI references against a base is handled straightforwardly; the algorithms of [RFC3986] can be applied directly, treating the characters additionally allowed in IRI references in the same way that unreserved characters are in URI references.

基底に対する相対IRI参照の処理は、単純に扱えます。[RFC3986]のアルゴリズムをそのまま適用でき、IRI参照で追加的に許可されている文字は、URI参照における非予約文字と同じように扱います。
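As an illustration, Python's standard `urllib.parse.urljoin`, which follows the reference resolution rules of [RFC3986] closely, can be applied to strings containing non-ASCII characters without any special handling. The example IRIs below are made up for this sketch.

```python
from urllib.parse import urljoin

base = "http://www.example.org/dir/r\u00e9sum\u00e9.html"

# A relative path reference with non-ASCII segments resolves like any other path.
print(urljoin(base, "../\u65e5\u672c\u8a9e/"))

# A fragment-only reference keeps the base path and replaces the fragment.
print(urljoin(base, "#\u8981\u7d04"))
```

The non-ASCII characters are carried through untouched, in the same way as unreserved characters in a URI reference.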

7. URI/IRI Processing Guidelines (Informative)

7. URI / IRI処理ガイドライン（参考情報）

This informative section provides guidelines for supporting IRIs in the same software components and operations that currently process URIs: Software interfaces that handle URIs, software that allows users to enter URIs, software that creates or generates URIs, software that displays URIs, formats and protocols that transport URIs, and software that interprets URIs. These may all require modification before functioning properly with IRIs. The considerations in this section also apply to URI references and IRI references.

この参考セクションでは、現在URIを処理しているのと同じソフトウェアコンポーネントおよび操作でIRIをサポートするためのガイドラインを示します。対象は、URIを扱うソフトウェアインタフェース、ユーザがURIを入力できるソフトウェア、URIを作成または生成するソフトウェア、URIを表示するソフトウェア、URIを運ぶフォーマットおよびプロトコル、ならびにURIを解釈するソフトウェアです。これらはすべて、IRIで正しく機能するためには修正が必要になる場合があります。このセクションの考慮事項は、URI参照およびIRI参照にも適用されます。

7.1. URI/IRI Software Interfaces

7.1. URI / IRIソフトウェアインタフェース

Software interfaces that handle URIs, such as URI-handling APIs and protocols transferring URIs, need interfaces and protocol elements that are designed to carry IRIs.

URIを扱うAPIやURIを転送するプロトコルなど、URIを扱うソフトウェアインタフェースには、IRIを運べるように設計されたインタフェースおよびプロトコル要素が必要です。

In case the current handling in an API or protocol is based on US-ASCII, UTF-8 is recommended as the character encoding for IRIs, as it is compatible with US-ASCII, is in accordance with the recommendations of [RFC2277], and makes converting to URIs easy. In any case, the API or protocol definition must clearly define the character encoding to be used.

APIまたはプロトコルにおける現在の処理がUS-ASCIIに基づいている場合、IRIの文字エンコーディングとしてUTF-8が推奨されます。UTF-8はUS-ASCIIと互換性があり、[RFC2277]の推奨に沿っており、URIへの変換も容易だからです。いずれの場合も、APIまたはプロトコルの定義は、使用する文字エンコーディングを明確に定義しなければなりません。

The transfer from URI-only to IRI-capable components requires no mapping, although the conversion described in section 3.2 above may be performed. It is preferable not to perform this inverse conversion when there is a chance that this cannot be done correctly.

URI専用のコンポーネントからIRI対応のコンポーネントへの受け渡しには、マッピングは必要ありませんが、上記のセクション3.2で説明した変換を行ってもかまいません。正しく行えない可能性がある場合は、この逆変換を行わない方が望ましいです。

7.2. URI/IRI Entry

7.2. URI / IRIの入力

Some components allow users to enter URIs into the system by typing or dictation, for example. This software must be updated to allow for IRI entry.

一部のコンポーネントは、ユーザーが、たとえば、タイピングやディクテーションすることにより、システムへのURIを入力することができます。このソフトウェアは、IRIエントリーを可能にするために更新する必要があります。

A person viewing a visual representation of an IRI (as a sequence of glyphs, in some order, in some visual display) or hearing an IRI will use an entry method for characters in the user's language to input the IRI. Depending on the script and the input method used, this may be a more or less complicated process.

IRIの視覚的表現（何らかの視覚的表示における、ある順序のグリフの並び）を見ている人、またはIRIを聞いている人は、ユーザの言語の文字入力方法を使ってIRIを入力します。用字系や使用する入力方法によって、これは多かれ少なかれ複雑な作業になることがあります。

The process of IRI entry must ensure, as much as possible, that the restrictions defined in section 2.2 are met. This may be done by choosing appropriate input methods or variants/settings thereof, by appropriately converting the characters being input, by eliminating characters that cannot be converted, and/or by issuing a warning or error message to the user.

IRIエントリーのプロセスは、セクション2.2で定義された制限が満たされていることを、可能な限り、確認する必要があります。これは、それらの適切な入力方法または変異体/設定を選択することにより、適切に入力された文字を変換し、変換できない文字を除去することによって、および/またはユーザに警告またはエラーメッセージを発行することによって行われてもよいです。

As an example of variant settings, input method editors for East Asian Languages usually allow the input of Latin letters and related characters in full-width or half-width versions. For IRI input, the input method editor should be set so that it produces half-width Latin letters and punctuation and full-width Katakana.

バリアント設定の例として、東アジア言語のインプットメソッドエディタは通常、ラテン文字および関連する文字を全角または半角で入力できます。IRIを入力する場合は、半角のラテン文字および句読点と、全角のカタカナを生成するようにインプットメソッドエディタを設定すべきです。
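Where the input method cannot be configured, the same effect can be approximated after the fact. The sketch below is an assumption of this example, not a procedure defined by this specification: it applies Unicode normalization form NFKC, which maps full-width Latin letters and punctuation to their half-width forms and half-width Katakana to full-width Katakana. NFKC also folds other compatibility characters, so it does more than adjust widths.

```python
import unicodedata


def normalize_widths(text: str) -> str:
    """Fold width variants (and other compatibility characters) via NFKC."""
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters and punctuation, followed by half-width Katakana.
typed = "\uff48\uff54\uff54\uff50\uff1a\uff0f\uff0f" + "example.org/" + "\uff76\uff80\uff76\uff85"
print(normalize_widths(typed))
```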

An input field primarily or solely used for the input of URIs/IRIs may allow the user to view an IRI as it is mapped to a URI. Places where the input of IRIs is frequent may provide the possibility for viewing an IRI as mapped to a URI. This will help users when some of the software they use does not yet accept IRIs.

主にまたは専らURI/IRIの入力に使用される入力フィールドでは、IRIをURIにマッピングした形でユーザが確認できるようにしてもかまいません。IRIの入力が頻繁に行われる場所では、URIにマッピングされたIRIを表示する手段を提供してもよいでしょう。これは、ユーザが使用するソフトウェアの一部がまだIRIを受け付けない場合に役立ちます。

An IRI input component interfacing to components that handle URIs, but not IRIs, must map the IRI to a URI before passing it to these components.

URIは扱えるがIRIは扱えないコンポーネントと接続するIRI入力コンポーネントは、それらのコンポーネントに渡す前にIRIをURIにマッピングしなければなりません。
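The core of that mapping can be sketched in a few lines of Python. This is a minimal illustration of the percent-encoding step of section 3.1 only: every non-ASCII character is converted to its UTF-8 octets and each octet is written as a percent-encoded triplet. The function name is hypothetical, and the sketch assumes the IRI is already a normalized Unicode string; it does not apply the optional ToASCII conversion for the ireg-name part.

```python
def iri_to_uri_sketch(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # US-ASCII, including existing "%XX" escapes, is kept
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri_sketch("http://www.example.org/r\u00e9sum\u00e9.html"))
```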

For the input of IRIs with right-to-left characters, please see section 4.3.

右から左に書く文字を含むIRIの入力については、セクション4.3を参照してください。

7.3. URI/IRI Transfer between Applications

7.3. アプリケーション間のURI / IRI転送

Many applications, particularly mail user agents, try to detect URIs appearing in plain text. For this, they use some heuristics based on URI syntax. They then allow the user to click on such URIs and retrieve the corresponding resource in an appropriate (usually scheme-dependent) application.

多くのアプリケーション、特にメールユーザエージェントは、プレーンテキスト中に現れるURIを検出しようとします。そのために、URI構文に基づくヒューリスティックを使用します。そして、ユーザがそのようなURIをクリックし、適切な（通常はスキームに依存する）アプリケーションで対応するリソースを取得できるようにします。

Such applications have to be upgraded to use the IRI syntax as a base for heuristics. In particular, a non-ASCII character should not be taken as the indication of the end of an IRI. Such applications also have to make sure that they correctly convert the detected IRI from the character encoding of the document or application where the IRI appears to the character encoding used by the system-wide IRI invocation mechanism, or to a URI (according to section 3.1) if the system-wide invocation mechanism only accepts URIs.

このようなアプリケーションは、ヒューリスティックの基礎としてIRI構文を使用するよう更新する必要があります。特に、非ASCII文字をIRIの終わりを示すものとして扱うべきではありません。また、このようなアプリケーションは、検出したIRIを、それが現れた文書またはアプリケーションの文字エンコーディングから、システム全体のIRI呼び出し機構が使用する文字エンコーディングへ正しく変換しなければなりません。システム全体の呼び出し機構がURIしか受け付けない場合は、（セクション3.1に従って）URIへ変換しなければなりません。
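The difference between a URI-only heuristic and an IRI-aware one can be illustrated with two regular expressions. Both patterns are simplistic assumptions made for this sketch (real detectors use more elaborate rules, for example for trailing punctuation); the point is only that the second one does not treat a non-ASCII character as the end of the identifier.

```python
import re

# URI-only heuristic: accepts printable US-ASCII only, so it stops at "é".
ASCII_ONLY = re.compile(r'https?://[!-~]+')
# IRI-aware heuristic: stops at whitespace and a few delimiters instead.
IRI_AWARE = re.compile(r'https?://[^\s<>"]+')

text = "See http://www.example.org/r\u00e9sum\u00e9.html for details."

print(ASCII_ONLY.search(text).group(0))  # truncated at the first non-ASCII character
print(IRI_AWARE.search(text).group(0))   # the complete IRI
```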

The clipboard is another frequently used way to transfer URIs and IRIs from one application to another. On most platforms, the clipboard is able to store and transfer text in many languages and scripts. Correctly used, the clipboard transfers characters, not bytes, which will do the right thing with IRIs.

クリップボードは、あるアプリケーションから別のアプリケーションへURIやIRIを受け渡すために頻繁に使用される、もう一つの手段です。ほとんどのプラットフォームでは、クリップボードは多くの言語および用字系のテキストを格納し、受け渡すことができます。正しく使用すれば、クリップボードはバイトではなく文字を受け渡すので、IRIについても正しく動作します。

7.4. URI/IRI Generation

7.4. URI / IRIの生成

Systems that offer resources through the Internet, where those resources have logical names, sometimes automatically generate URIs for the resources they offer. For example, some HTTP servers can generate a directory listing for a file directory and then respond to the generated URIs with the files.

これらのリソースは、論理名を持っているインターネットを介したリソースを提供するシステムは、時々自動的に彼らが提供するリソースのURIを生成します。例えば、いくつかのHTTPサーバは、ファイルディレクトリのためのディレクトリのリストを生成することができますし、ファイルと生成されたURIに対応しています。

Many legacy character encodings are in use in various file systems. Many currently deployed systems do not transform the local character representation of the underlying system before generating URIs.

多くのレガシー文字エンコーディングは、さまざまなファイル・システムで使用されています。多くの現在展開システムでは、URIを生成する前に、基盤となるシステムのローカル文字表現を変換しません。

For maximum interoperability, systems that generate resource identifiers should make the appropriate transformations. For example, if a file system contains a file named "résumé.html", a server should expose this as "r%C3%A9sum%C3%A9.html" in a URI, which allows use of "résumé.html" in an IRI, even if locally the file name is kept in a character encoding other than UTF-8.

最大限の相互運用性のために、リソース識別子を生成するシステムは適切な変換を行うべきです。例えば、ファイルシステムに"résumé.html"という名前のファイルがある場合、そのファイル名がローカルではUTF-8以外の文字エンコーディングで保持されていても、サーバはこれをURIでは"r%C3%A9sum%C3%A9.html"として公開すべきです。これにより、IRIでは"résumé.html"を使用できます。
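A server-side transformation of this kind might look like the following Python sketch. The function name and the choice of iso-8859-1 as the local file system encoding are assumptions of the example; the NFC normalization step is included because names transcoded from a legacy encoding should be normalized before the UTF-8 octets are percent-encoded.

```python
import unicodedata
from urllib.parse import quote


def file_name_to_uri_segment(raw_name: bytes, fs_encoding: str) -> str:
    """Turn a file name stored in a legacy encoding into a URI path segment."""
    name = raw_name.decode(fs_encoding)        # local octets -> characters
    name = unicodedata.normalize("NFC", name)  # canonical normalization
    return quote(name, safe="")                # characters -> UTF-8 -> %XX


# "résumé.html" as stored by a file system that uses iso-8859-1.
stored = b"r\xe9sum\xe9.html"
print(file_name_to_uri_segment(stored, "iso-8859-1"))  # r%C3%A9sum%C3%A9.html
```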

This recommendation particularly applies to HTTP servers. For FTP servers, similar considerations apply; see [RFC2640].

この勧告は、特にHTTPサーバに適用されます。 FTPサーバの場合、同様の考慮事項が適用されます。 [RFC2640]を参照してください。

7.5. URI/IRI Selection

7.5. URI / IRIの選択

In some cases, resource owners and publishers have control over the IRIs used to identify their resources. This control is mostly executed by controlling the resource names, such as file names, directly.

いくつかのケースでは、リソースの所有者と出版社がそのリソースを識別するために使用されるのIRIを制御することができます。この制御は主に直接、ファイル名などのリソース名を制御することによって実行されます。

In these cases, it is recommended to avoid choosing IRIs that are easily confused. For example, for US-ASCII, the lower-case ell ("l") is easily confused with the digit one ("1"), and the upper-case oh ("O") is easily confused with the digit zero ("0"). Publishers should avoid confusing users with "br0ken" or "1ame" identifiers.

これらの場合、混同しやすいIRIを選ぶことは避けることが推奨されます。例えばUS-ASCIIでは、小文字のエル（"l"）は数字の1（"1"）と混同しやすく、大文字のオー（"O"）は数字のゼロ（"0"）と混同しやすいです。発行者は、"br0ken"や"1ame"のような識別子でユーザを混乱させることは避けるべきです。

Outside the US-ASCII repertoire, there are many more opportunities for confusion; a complete set of guidelines is too lengthy to include here. As long as names are limited to characters from a single script, native writers of a given script or language will know best when ambiguities can appear, and how they can be avoided. What may look ambiguous to a stranger may be completely obvious to the average native user. On the other hand, in some cases, the UCS contains variants for compatibility reasons; for example, for typographic purposes. These should be avoided wherever possible. Although there may be exceptions, newly created resource names should generally be in NFKC [UTR15] (which means that they are also in NFC).

US-ASCIIレパートリー以外では、混乱のためのより多くの機会があります。ガイドラインの完全なセットは、ここに含めるにはあまりにも長いです。限り名前が単一のスクリプトから文字に制限されているとして、指定したスクリプトまたは言語のネイティブの作家は曖昧さが現れることができたときに最高の知っている、そしてそれらがどのように回避することができます。何見知らぬ人に曖昧に見えるかもしれは平均ネイティブユーザーに対して完全に明白であります。一方、いくつかのケースでは、UCSは互換性の理由のためのバリアントが含まれています。例えば、活版印刷の目的のために。これらは、可能な限り避けるべきです。例外もありますが、新しく作成されたリソース名は、一般的に（彼らはNFCにもあることを意味します）NFKC [UTR15]にする必要があります。

As an example, the UCS contains the "fi" ligature at U+FB01 for compatibility reasons. Wherever possible, IRIs should use the two letters "f" and "i" rather than the "fi" ligature. An example where the latter may be used is in the query part of an IRI for an explicit search for a word written containing the "fi" ligature.

一例として、UCSには互換性のためにU+FB01に「fi」の合字が含まれています。IRIでは、可能な限り「fi」の合字ではなく「f」と「i」の2文字を使用すべきです。合字を使用してもよい例としては、「fi」の合字を含めて書かれた単語を明示的に検索する場合のIRIのクエリ部分があります。
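Whether a newly chosen resource name is already in NFKC can be tested by normalizing it and comparing the result with the original. The helper below is a hypothetical sketch using Python's `unicodedata` module; the file names are invented for the example.

```python
import unicodedata


def is_nfkc(name: str) -> bool:
    """Return True if `name` is unchanged by NFKC normalization."""
    return unicodedata.normalize("NFKC", name) == name


with_ligature = "\ufb01le.html"  # starts with the U+FB01 ligature
print(is_nfkc(with_ligature))                         # False
print(unicodedata.normalize("NFKC", with_ligature))   # file.html
print(is_nfkc("file.html"))                           # True
```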

In certain cases, there is a chance that characters from different scripts look the same. The best known example is the similarity of the Latin "A", the Greek "Alpha", and the Cyrillic "A". To avoid such cases, only IRIs should be created where all the characters in a single component are used together in a given language. This usually means that all of these characters will be from the same script, but there are languages that mix characters from different scripts (such as Japanese). This is similar to the heuristics used to distinguish between letters and numbers in the examples above. Also, for Latin, Greek, and Cyrillic, using lowercase letters results in fewer ambiguities than using uppercase letters would.

場合によっては、異なる用字系の文字が同じに見える可能性があります。最もよく知られた例は、ラテン文字の「A」、ギリシャ文字の「アルファ」、キリル文字の「A」の類似です。このような事態を避けるため、IRIは、単一の構成要素内のすべての文字がある言語で一緒に使用されるものである場合にのみ作成すべきです。これは通常、すべての文字が同じ用字系に属することを意味しますが、（日本語のように）異なる用字系の文字を混在させる言語もあります。これは、上記の例で文字と数字を区別するために使用されるヒューリスティックに似ています。また、ラテン文字、ギリシャ文字、キリル文字については、小文字を使用する方が大文字を使用するよりも曖昧さが少なくなります。
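A rough check for mixed scripts within one component can be sketched as follows. Python's standard library does not expose the Unicode Script property, so this example falls back on the first word of each character name ("LATIN", "GREEK", "CYRILLIC", and so on), which is a crude approximation chosen for illustration. As the text notes, some languages legitimately mix scripts, so a real check would need a language-aware list of allowed combinations.

```python
import unicodedata


def scripts_in(component: str) -> set:
    """Approximate the set of scripts used by the letters of `component`."""
    scripts = set()
    for ch in component:
        if ch.isalpha():
            # First word of the character name, e.g. "CYRILLIC" for U+0430.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


print(scripts_in("paypal"))       # one script
print(scripts_in("p\u0430ypal"))  # U+0430 is CYRILLIC SMALL LETTER A: two scripts
```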

7.6. Display of URIs/IRIs

7.6. URI / IRIの表示

In situations where the rendering software is not expected to display non-ASCII parts of the IRI correctly using the available layout and font resources, these parts should be percent-encoded before being displayed.

レンダリングソフトウェアが正しく利用でき、レイアウトやフォントのリソースを使用してIRIの非ASCII部分を表示することが期待されていない状況では、これらの部品が表示される前にパーセントエンコードしなければなりません。

For display of Bidi IRIs, please see section 4.1.

双方向（Bidi）IRIの表示については、セクション4.1を参照してください。

7.7. Interpretation of URIs and IRIs

7.7. URIとIRIの解釈

Software that interprets IRIs as the names of local resources should accept IRIs in multiple forms and convert and match them with the appropriate local resource names.

IRIをローカルリソースの名前として解釈するソフトウェアは、複数の形式のIRIを受け付け、変換したうえで適切なローカルリソース名と照合すべきです。

First, multiple representations include both IRIs in the native character encoding of the protocol and also their URI counterparts.

第一に、複数の表現には、プロトコル本来の文字エンコーディングによるIRIと、それに対応するURIの両方が含まれます。

Second, it may include URIs constructed based on character encodings other than UTF-8. These URIs may be produced by user agents that do not conform to this specification and that use legacy character encodings to convert non-ASCII characters to URIs. Whether this is necessary, and what character encodings to cover, depends on a number of factors, such as the legacy character encodings used locally and the distribution of various versions of user agents. For example, software for Japanese may accept URIs in Shift_JIS and/or EUC-JP in addition to UTF-8.

第二に、UTF-8以外の文字エンコーディングに基づいて構成されたURIが含まれる場合があります。これらのURIは、この仕様に準拠せず、非ASCII文字をURIに変換する際にレガシー文字エンコーディングを使用するユーザエージェントによって生成されることがあります。これが必要かどうか、またどの文字エンコーディングを対象とするかは、ローカルで使用されているレガシー文字エンコーディングや、さまざまなバージョンのユーザエージェントの普及状況など、多くの要因に依存します。例えば、日本語向けのソフトウェアは、UTF-8に加えてShift_JISやEUC-JPによるURIを受け付けてもよいでしょう。
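One way to accept such URIs is to percent-decode the received name, try each supported character encoding in turn, and match the resulting candidates against the known local resource names. The Python sketch below assumes the three encodings named in the example; the function names and the file name are hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes


def candidate_names(escaped, encodings=("utf-8", "shift_jis", "euc_jp")):
    """Return every distinct decoding of a percent-encoded name."""
    raw = unquote_to_bytes(escaped)
    names = []
    for encoding in encodings:
        try:
            name = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # the octets are not valid in this encoding
        if name not in names:
            names.append(name)
    return names


def match_local(escaped, local_names):
    """Return the first candidate that names an existing local resource."""
    for name in candidate_names(escaped):
        if name in local_names:
            return name
    return None


local = {"\u65e5\u672c.html"}
for encoding in ("utf-8", "shift_jis", "euc_jp"):
    # Simulate what a user agent using this encoding would send.
    received = quote("\u65e5\u672c.html".encode(encoding))
    print(encoding, received, match_local(received, local))
```

Matching against existing names, rather than trusting the first encoding that happens to decode, reduces the risk of picking a wrong interpretation when the octets are valid in more than one encoding.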

Third, it may include additional mappings to be more user-friendly and robust against transmission errors. These would be similar to how some servers currently treat URIs as case insensitive or perform additional matching to account for spelling errors. For characters beyond the US-ASCII repertoire, this may, for example, include ignoring the accents on received IRIs or resource names. Please note that such mappings, including case mappings, are language dependent.

第三に、それは、伝送エラーに対して、よりユーザーフレンドリーかつ堅牢に追加のマッピングを含むことができます。これらは、いくつかのサーバが現在大文字と小文字を区別としてURIを扱うかのスペルミスを考慮するために、追加のマッチングを実行する方法と同様です。 US-ASCIIレパートリーを越えた文字については、これは、例えば、受信のIRIやリソース名にアクセントを無視して含むことができます。ケースのマッピングを含む、このようなマッピングは、言語に依存していることに注意してください。

It can be difficult to identify a resource unambiguously if too many mappings are taken into consideration. However, percent-encoded and not percent-encoded parts of IRIs can always be clearly distinguished. Also, the regularity of UTF-8 (see [Duerst97]) makes the potential for collisions lower than it may seem at first.

あまりに多くのマッピングを考慮すると、リソースを一意に識別することが難しくなる場合があります。しかし、IRIのパーセントエンコードされた部分とされていない部分は、常に明確に区別できます。また、UTF-8の規則性（[Duerst97]を参照）により、衝突の可能性は一見して思われるよりも低くなります。

7.8. Upgrading Strategy

7.8. アップグレード戦略

Where this recommendation places further constraints on software for which many instances are already deployed, it is important to introduce upgrades carefully and to be aware of the various interdependencies.

この勧告は、多くのインスタンスがすでに展開されているソフトウェアにさらに制約を課す場合には、慎重にアップグレードを導入し、様々な相互依存性を認識することが重要です。

If IRIs cannot be interpreted correctly, they should not be created, generated, or transported. This suggests that upgrading URI interpreting software to accept IRIs should have highest priority.

IRIを正しく解釈できないのであれば、IRIを作成、生成、転送すべきではありません。このことは、URIを解釈するソフトウェアをIRIを受け付けるよう更新することが最優先であることを示唆しています。

On the other hand, a single IRI is interpreted only by a single or very few interpreters that are known in advance, although it may be entered and transported very widely.

それが入力され、非常に広範囲に搬送されてもよい一方で、単一のIRIは、唯一の事前に知られている単一又は非常に少数のインタプリタによって解釈されます。

Therefore, IRIs benefit most from a broad upgrade of software to be able to enter and transport IRIs. However, before an individual IRI is published, care should be taken to upgrade the corresponding interpreting software in order to cover the forms expected to be received by various versions of entry and transport software.

したがって、IRIは、IRIを入力・転送できるようにするソフトウェアの広範な更新から最も恩恵を受けます。ただし、個々のIRIを公開する前に、さまざまなバージョンの入力・転送ソフトウェアから受け取ると予想される形式に対応できるよう、対応する解釈ソフトウェアを更新するよう注意すべきです。

The upgrade of generating software to generate IRIs instead of using a local character encoding should happen only after the service is upgraded to accept IRIs. Similarly, IRIs should only be generated when the service accepts IRIs and the intervening infrastructure and protocol is known to transport them safely.

ローカルの文字エンコーディングを使用する代わりにIRIを生成するように生成ソフトウェアを更新するのは、サービスがIRIを受け付けるよう更新された後にのみ行うべきです。同様に、IRIは、サービスがIRIを受け付け、かつ介在するインフラストラクチャおよびプロトコルがIRIを安全に転送できることが分かっている場合にのみ生成すべきです。

Software converting from URIs to IRIs for display should be upgraded only after upgraded entry software has been widely deployed to the population that will see the displayed result.

表示のためにURIからIRIへ変換するソフトウェアは、表示結果を目にする利用者に、更新された入力ソフトウェアが広く行き渡った後にのみ更新すべきです。

Where there is a free choice of character encodings, it is often possible to reduce the effort and dependencies for upgrading to IRIs by using UTF-8 rather than another encoding. For example, when a new file-based Web server is set up, using UTF-8 as the character encoding for file names will make the transition to IRIs easier. Likewise, when a new Web form is set up using UTF-8 as the character encoding of the form page, the returned query URIs will use UTF-8 as the character encoding (unless the user, for whatever reason, changes the character encoding) and will therefore be compatible with IRIs.

文字エンコーディングを自由に選択できる場合は、他のエンコーディングではなくUTF-8を使用することで、IRIへの移行に必要な労力と依存関係を減らせることがよくあります。例えば、新しいファイルベースのWebサーバを構築する際に、ファイル名の文字エンコーディングとしてUTF-8を使用すれば、IRIへの移行が容易になります。同様に、フォームページの文字エンコーディングとしてUTF-8を使用して新しいWebフォームを構築すれば、返されるクエリURIは（ユーザが何らかの理由で文字エンコーディングを変更しない限り）文字エンコーディングとしてUTF-8を使用するので、IRIと互換性を持ちます。
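The effect of the form page's character encoding on the returned query can be imitated with Python's `urllib.parse.urlencode`. The field name "q" and the two encodings are assumptions of this sketch; it only shows which octets end up percent-encoded in the query part.

```python
from urllib.parse import urlencode

fields = {"q": "r\u00e9sum\u00e9"}

# Form page served as UTF-8: the query is compatible with IRIs.
utf8_query = urlencode(fields, encoding="utf-8")
# Form page served as iso-8859-1: the query cannot be shown as an IRI.
latin1_query = urlencode(fields, encoding="iso-8859-1")

print(utf8_query)    # q=r%C3%A9sum%C3%A9
print(latin1_query)  # q=r%E9sum%E9
```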

These recommendations, when taken together, will allow for the extension from URIs to IRIs in order to handle characters other than US-ASCII while minimizing interoperability problems. For considerations regarding the upgrade of URI scheme definitions, see section 6.4.

これらの推奨事項を総合すれば、相互運用性の問題を最小限に抑えつつ、US-ASCII以外の文字を扱うためにURIからIRIへ拡張することが可能になります。URIスキーム定義の更新に関する考慮事項については、セクション6.4を参照してください。

8. Security Considerations

8.セキュリティの考慮事項

The security considerations discussed in [RFC3986] also apply to IRIs. In addition, the following issues require particular care for IRIs.

[RFC3986]で論じられているセキュリティ上の考慮事項は、IRIにも適用されます。さらに、以下の問題はIRIについて特に注意が必要です。

Incorrect encoding or decoding can lead to security problems. In particular, some UTF-8 decoders do not check against overlong byte sequences. As an example, a "/" is encoded with the byte 0x2F both in UTF-8 and in US-ASCII, but some UTF-8 decoders also wrongly interpret the sequence 0xC0 0xAF as a "/". A sequence such as "%C0%AF.." may pass some security tests and then be interpreted as "/.." in a path if UTF-8 decoders are fault-tolerant, if conversion and checking are not done in the right order, and/or if reserved characters and unreserved characters are not clearly distinguished.

誤った符号化または復号は、セキュリティ上の問題につながる可能性があります。特に、一部のUTF-8デコーダは冗長な（overlong）バイト列を検査しません。例えば、"/"はUTF-8でもUS-ASCIIでもバイト0x2Fで符号化されますが、一部のUTF-8デコーダは0xC0 0xAFという列も誤って"/"として解釈します。UTF-8デコーダが誤りに寛容である場合、変換と検査が正しい順序で行われない場合、あるいは予約文字と非予約文字が明確に区別されていない場合、"%C0%AF.."のような列が一部のセキュリティ検査を通過し、その後パス中で"/.."として解釈されるおそれがあります。
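A defensive ordering of these steps can be sketched in Python: percent-decode first, decode the octets with a strict UTF-8 decoder (Python's built-in decoder rejects overlong sequences such as 0xC0 0xAF), and only then look for ".." segments. The function name is hypothetical. Note that the sketch treats a percent-encoded "/" like a literal one for the purpose of the check, which is a conservative choice made for this example rather than a rule from this specification.

```python
from urllib.parse import unquote_to_bytes


def checked_path(escaped_path: str) -> str:
    """Strictly decode a percent-encoded path, then reject '..' segments."""
    raw = unquote_to_bytes(escaped_path)
    # Raises UnicodeDecodeError for overlong or otherwise invalid UTF-8.
    path = raw.decode("utf-8", errors="strict")
    # The traversal check runs on the fully decoded form.
    if ".." in path.split("/"):
        raise ValueError("path contains a '..' segment")
    return path


for candidate in ("/docs/r%C3%A9sum%C3%A9.html", "/docs%C0%AF../secret", "/docs/%2E%2E/secret"):
    try:
        print("accepted:", checked_path(candidate))
    except ValueError as error:  # UnicodeDecodeError is a subclass of ValueError
        print("rejected:", candidate, type(error).__name__)
```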

There are various ways in which "spoofing" can occur with IRIs. "Spoofing" means that somebody may add a resource name that looks the same or similar to the user, but that points to a different resource. The added resource may pretend to be the real resource by looking very similar but may contain all kinds of changes that may be difficult to spot and that can cause all kinds of problems. Most spoofing possibilities for IRIs are extensions of those for URIs.

IRIでは、さまざまな形で「なりすまし」が起こり得ます。「なりすまし」とは、ユーザには同じ、またはよく似て見えるが、別のリソースを指すリソース名を、誰かが追加できることを意味します。追加されたリソースは、非常によく似た外見によって本物のリソースを装いつつ、見つけにくく、あらゆる種類の問題を引き起こし得る、さまざまな変更を含んでいる可能性があります。IRIにおけるなりすましの可能性のほとんどは、URIにおけるそれの延長です。

Spoofing can occur for various reasons. First, a user's normalization expectations or actual normalization when entering an IRI or transcoding an IRI from a legacy character encoding do not match the normalization used on the server side. Conceptually, this is no different from the problems surrounding the use of case-insensitive web servers. For example, a popular web page with a mixed-case name ("http://big.example.com/PopularPage.html") might be "spoofed" by someone who is able to create "http://big.example.com/popularpage.html". However, the use of unnormalized character sequences, and of additional mappings for user convenience, may increase the chance for spoofing. Protocols and servers that allow the creation of resources with names that are not normalized are particularly vulnerable to such attacks. This is an inherent security problem of the relevant protocol, server, or resource and is not specific to IRIs, but it is mentioned here for completeness.

なりすましは、さまざまな理由で起こり得ます。第一に、IRIを入力する際、またはレガシー文字エンコーディングからIRIを変換する際の、ユーザが期待する正規化または実際の正規化が、サーバ側で使用される正規化と一致しない場合です。概念的には、これは大文字と小文字を区別しないWebサーバの使用にまつわる問題と変わりません。例えば、大文字と小文字が混在した名前を持つ人気のあるWebページ（"http://big.example.com/PopularPage.html"）は、"http://big.example.com/popularpage.html"を作成できる人物によって「なりすまし」される可能性があります。しかし、正規化されていない文字列の使用や、ユーザの利便性のための追加マッピングの使用は、なりすましの可能性を高めるおそれがあります。正規化されていない名前を持つリソースの作成を許すプロトコルやサーバは、このような攻撃に対して特に脆弱です。これは、当該のプロトコル、サーバ、またはリソースに固有のセキュリティ問題であり、IRIに特有のものではありませんが、完全を期すためにここで言及します。

Spoofing can occur in various IRI components, such as the domain name part or a path part. For considerations specific to the domain name part, see [RFC3491]. For the path part, administrators of sites that allow independent users to create resources in the same sub area may have to be careful to check for spoofing.

スプーフィングは、ドメイン名部分又は経路部分のような種々のIRIコンポーネントで起こり得ます。ドメイン名の一部に固有の考慮事項については、[RFC3491]を参照してください。パスの部分については、独立したユーザが同じサブエリア内のリソースを作成できるようにサイトの管理者は、スプーフィングをチェックするために注意する必要があります。

Spoofing can occur because in the UCS many characters look very similar. Details are discussed in Section 7.5. Again, this is very similar to spoofing possibilities on US-ASCII, e.g., using "br0ken" or "1ame" URIs.

UCSに多くの文字が非常によく似ているのでなりすましが発生する可能性があります。詳細は7.5節で議論されています。繰り返しますが、これは「br0ken」または「1ame」URIを使用して、例えば、US-ASCIIのなりすましの可能性に非常によく似ています。

Spoofing can occur when URIs with percent-encodings based on various character encodings are accepted to deal with older user agents. In some cases, particularly for Latin-based resource names, this is usually easy to detect because UTF-8-encoded names, when interpreted and viewed as legacy character encodings, produce mostly garbage.

さまざまな文字エンコーディングに基づくパーセントエンコードを含むURIを、古いユーザエージェントに対応するために受け付ける場合、なりすましが起こり得ます。場合によっては、特にラテン文字ベースのリソース名では、UTF-8で符号化された名前をレガシー文字エンコーディングとして解釈して表示するとほとんどが無意味な文字列になるため、通常は容易に検出できます。

When concurrently used character encodings have a similar structure but there are no characters that have exactly the same encoding, detection is more difficult.

同時に使用される文字エンコーディングは、同様の構造を持っていますが、まったく同じエンコーディングを持つ文字がない場合は、検出がより困難です。

Spoofing can occur with bidirectional IRIs, if the restrictions in section 4.2 are not followed. The same visual representation may be interpreted as different logical representations, and vice versa. It is also very important that a correct Unicode bidirectional implementation be used.

セクション4.2の制限に従わない場合、双方向IRIでなりすましが起こり得ます。同じ視覚的表現が異なる論理的表現として解釈されることがあり、その逆もあります。正しいUnicode双方向アルゴリズムの実装を使用することも非常に重要です。

9. Acknowledgements

9.謝辞

We would like to thank Larry Masinter for his work as coauthor of many earlier versions of this document (draft-masinter-url-i18n-xx).

私たちは、この文書の多くの以前のバージョン（ドラフト-masinter-URL-I18N-XX）の共著者として彼の仕事のためにラリーMasinterに感謝したいと思います。

The discussion on the issue addressed here started a long time ago. There was a thread in the HTML working group in August 1995 (under the topic of "Globalizing URIs") and in the www-international mailing list in July 1996 (under the topic of "Internationalization and URLs"), and there were ad-hoc meetings at the Unicode conferences in September 1995 and September 1997.

ここで扱っている問題についての議論は、ずっと以前に始まりました。1995年8月にはHTMLワーキンググループで（「Globalizing URIs」という話題で）、1996年7月にはwww-internationalメーリングリストで（「Internationalization and URLs」という話題で）スレッドがあり、1995年9月と1997年9月のUnicode会議ではアドホックな会合がありました。

Many thanks go to Francois Yergeau, Matitiahu Allouche, Roy Fielding, Tim Berners-Lee, Mark Davis, M.T. Carrasco Benitez, James Clark, Tim Bray, Chris Wendt, Yaron Goland, Andrea Vine, Misha Wolf, Leslie Daigle, Ted Hardie, Bill Fenner, Margaret Wasserman, Russ Housley, Makoto MURATA, Steven Atkin, Ryan Stansifer, Tex Texin, Graham Klyne, Bjoern Hoehrmann, Chris Lilley, Ian Jacobs, Adam Costello, Dan Oscarson, Elliotte Rusty Harold, Mike J. Brown, Roy Badami, Jonathan Rosenne, Asmus Freytag, Simon Josefsson, Carlos Viegas Damasio, Chris Haynes, Walter Underwood, and many others for help with understanding the issues and possible solutions, and with getting the details right.

問題と考えられる解決策の理解、および細部の正確さについて助力いただいた、Francois Yergeau、Matitiahu Allouche、Roy Fielding、Tim Berners-Lee、Mark Davis、M.T. Carrasco Benitez、James Clark、Tim Bray、Chris Wendt、Yaron Goland、Andrea Vine、Misha Wolf、Leslie Daigle、Ted Hardie、Bill Fenner、Margaret Wasserman、Russ Housley、Makoto MURATA、Steven Atkin、Ryan Stansifer、Tex Texin、Graham Klyne、Bjoern Hoehrmann、Chris Lilley、Ian Jacobs、Adam Costello、Dan Oscarson、Elliotte Rusty Harold、Mike J. Brown、Roy Badami、Jonathan Rosenne、Asmus Freytag、Simon Josefsson、Carlos Viegas Damasio、Chris Haynes、Walter Underwood、およびその他多くの方々に深く感謝します。

This document is a product of the Internationalization Working Group (I18N WG) of the World Wide Web Consortium (W3C). Thanks to the members of the W3C I18N Working Group and Interest Group for their contributions and their work on [CharMod]. Thanks also go to the members of many other W3C Working Groups for adopting IRIs, and to the members of the Montreal IAB Workshop on Internationalization and Localization for their review.

この文書は、World Wide Web Consortium（W3C）の国際化ワーキンググループ（I18N WG）の成果物です。W3C I18Nワーキンググループおよびインタレストグループのメンバーの貢献と、[CharMod]に関する作業に感謝します。また、IRIを採用した他の多くのW3Cワーキンググループのメンバー、およびレビューをいただいたMontreal IAB Workshop on Internationalization and Localizationのメンバーにも感謝します。

10. References

10.参考文献

10.1. Normative References

10.1. 引用規格

[ASCII] American National Standards Institute, "Coded Character Set -- 7-bit American Standard Code for Information Interchange", ANSI X3.4, 1986.

[ASCII] 米国規格協会（American National Standards Institute）、"Coded Character Set -- 7-bit American Standard Code for Information Interchange"、ANSI X3.4、1986年。

[ISO10646] International Organization for Standardization, "ISO/IEC 10646:2003: Information Technology - Universal Multiple-Octet Coded Character Set (UCS)", ISO Standard 10646, December 2003.

[ISO10646]国際標準化機構、 "ISO / IEC 10646：2003：情報技術 - ユニバーサルマルチオクテット符号化文字集合（UCS）"、ISO規格10646、2003年12月。

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2119]ブラドナーの、S.、 "要件レベルを示すためにRFCsにおける使用のためのキーワード"、BCP 14、RFC 2119、1997年3月。

[RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.

[RFC2234]クロッカー、D.、およびP. Overell、 "構文仕様のための増大しているBNF：ABNF"、RFC 2234、1997年11月。

[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.

[RFC3490] Faltstrom、P.、ホフマン、P.、およびA.コステロ、 "アプリケーションにおける国際化ドメイン名（IDNA）"、RFC 3490、2003年3月。

[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.

[RFC3491]ホフマン、P.とM.ブランシェ、 "NAMEPREP：国際化ドメイン名のためのstringprepプロフィール（IDN）"、RFC 3491、2003年3月。

[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.

[RFC3629] Yergeau、F.、 "UTF-8、ISO 10646の変換フォーマット"、STD 63、RFC 3629、2003年11月。

[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

[RFC3986]バーナーズ - リー、T.、フィールディング、R.、およびL. Masinter、 "ユニフォームリソース識別子（URI）：汎用構文"、STD 66、RFC 3986、2005年1月。

[UNI9] Davis, M., "The Bidirectional Algorithm", Unicode Standard Annex #9, March 2004, <http://www.unicode.org/reports/tr9/tr9-13.html>.

[UNI9]デイビス、M.、 "双方向アルゴリズム"、Unicode規格附属書＃9、2004年3月、<http://www.unicode.org/reports/tr9/tr9-13.html>。

[UNIV4] The Unicode Consortium, "The Unicode Standard, Version 4.0.1, defined by: The Unicode Standard, Version 4.0 (Reading, MA, Addison-Wesley, 2003. ISBN 0-321-18578-1), as amended by Unicode 4.0.1 (http://www.unicode.org/versions/Unicode4.0.1/)", March 2004.

【UNIV4]ユニコードコンソーシアムは、「Unicode標準、バージョン4.0.1は、によって定義される：により修正されUnicode規格、バージョン4.0は、（マサチューセッツ州、アディソン・ウェズリー、2003 ISBN 0-321-18578-1を読みます）ユニコード4.0.1（http://www.unicode.org/versions/Unicode4.0.1/）」、2004年3月。

[UTR15] Davis, M. and M. Duerst, "Unicode Normalization Forms", Unicode Standard Annex #15, April 2003, <http://www.unicode.org/unicode/reports/ tr15/tr15-23.html>.

【UTR15]デイビス、M.およびM. Duerst、 "Unicode正規化フォームの" Unicode標準附属書＃15、2003年4月、<TR15 http://www.unicode.org/unicode/reports/ / tr15-23.html> 、

10.2. Informative References

[BidiEx] "Examples of bidirectional IRIs", <http://www.w3.org/International/iri-edit/BidiExamples>.

[CharMod] Duerst, M., Yergeau, F., Ishida, R., Wolf, M., and T. Texin, "Character Model for the World Wide Web: Resource Identifiers", World Wide Web Consortium Candidate Recommendation, November 2004, <http://www.w3.org/TR/charmod-resid>.

[Duerst97] Duerst, M., "The Properties and Promises of UTF-8", Proc. 11th International Unicode Conference, San Jose, September 1997, <http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf>.

[Gettys] Gettys, J., "URI Model Consequences", <http://www.w3.org/DesignIssues/ModelConsequences>.

[HTML4] Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01 Specification", World Wide Web Consortium Recommendation, December 1999, <http://www.w3.org/TR/html401/appendix/notes.html#h-B.2>.

[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson, R., Crispin, M., and P. Svanberg, "The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996", RFC 2130, April 1997.

[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.

[RFC2192] Newman, C., "IMAP URL Scheme", RFC 2192, September 1997.

[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and Languages", BCP 18, RFC 2277, January 1998.

[RFC2368] Hoffman, P., Masinter, L., and J. Zawinski, "The mailto URL scheme", RFC 2368, July 1998.

[RFC2384] Gellens, R., "POP URL Scheme", RFC 2384, August 1998.

[RFC2396] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

[RFC2397] Masinter, L., "The "data" URL scheme", RFC 2397, August 1998.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC2640] Curtin, B., "Internationalization of the File Transfer Protocol", RFC 2640, July 1999.

[RFC2718] Masinter, L., Alvestrand, H., Zigmond, D., and R. Petke, "Guidelines for new URL Schemes", RFC 2718, November 1999.

[UNIXML] Duerst, M. and A. Freytag, "Unicode in XML and other Markup Languages", Unicode Technical Report #20, World Wide Web Consortium Note, June 2003, <http://www.w3.org/TR/unicode-xml/>.

[XLink] DeRose, S., Maler, E., and D. Orchard, "XML Linking Language (XLink) Version 1.0", World Wide Web Consortium Recommendation, June 2001, <http://www.w3.org/TR/xlink/#link-locators>.

[XML1] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., and F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third Edition)", World Wide Web Consortium Recommendation, February 2004, <http://www.w3.org/TR/REC-xml#sec-external-ent>.

[XMLNamespace] Bray, T., Hollander, D., and A. Layman, "Namespaces in XML", World Wide Web Consortium Recommendation, January 1999, <http://www.w3.org/TR/REC-xml-names>.

[XMLSchema] Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes", World Wide Web Consortium Recommendation, May 2001, <http://www.w3.org/TR/xmlschema-2/#anyURI>.

[XPointer] Grosso, P., Maler, E., Marsh, J., and N. Walsh, "XPointer Framework", World Wide Web Consortium Recommendation, March 2003, <http://www.w3.org/TR/xptr-framework/#escaping>.

Appendix A. Design Alternatives

This section briefly summarizes major design alternatives and the reasons why they were not chosen.

Appendix A.1. New Scheme(s)

Introducing new schemes (for example, httpi:, ftpi:,...) or a new metascheme (e.g., i:, leading to URI/IRI prefixes such as i:http:, i:ftp:,...) was proposed to make IRI-to-URI conversion scheme-dependent or to distinguish between percent-encodings resulting from IRI-to-URI conversion and percent-encodings from legacy character encodings.

New schemes are not needed to distinguish URIs from true IRIs (i.e., IRIs that contain non-ASCII characters). The benefit of being able to detect the origin of percent-encodings is marginal, as UTF-8 can be detected with very high reliability. Deploying new schemes is extremely hard, so not requiring new schemes for IRIs makes deployment of IRIs vastly easier. Making conversion scheme-dependent is highly inadvisable and would be encouraged by separate schemes for IRIs. Using a uniform convention for conversion from IRIs to URIs makes IRI implementation orthogonal to the introduction of actual new schemes.
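
The reliability claim above can be illustrated with a short Python sketch. It percent-decodes a component and checks whether the resulting octets form well-formed UTF-8; octets produced from a legacy encoding such as iso-8859-1 rarely pass this check by accident. The function name `percent_octets_look_like_utf8` is hypothetical and not part of this specification, and the check is a heuristic under these assumptions, not a proof of where the percent-encoding came from.

```python
from urllib.parse import unquote_to_bytes


def percent_octets_look_like_utf8(component: str) -> bool:
    """Heuristic: True if the percent-decoded octets contain at least one
    non-ASCII octet and the whole octet sequence is well-formed UTF-8."""
    octets = unquote_to_bytes(component)
    if all(b < 0x80 for b in octets):
        return False  # pure ASCII carries no evidence either way
    try:
        octets.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


print(percent_octets_look_like_utf8("D%C3%BCrst"))  # True: UTF-8 octets of "Dürst"
print(percent_octets_look_like_utf8("D%FCrst"))     # False: a lone iso-8859-1 octet
```

Because 0xFC followed by an ASCII letter is not a valid UTF-8 sequence, the legacy form is rejected without any scheme-level marker.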

Appendix A.2. Character Encodings Other Than UTF-8

At an early stage, UTF-7 was considered as an alternative to UTF-8 when IRIs are converted to URIs. UTF-7 would not have needed percent-encoding and in most cases would have been shorter than percent-encoded UTF-8.

Using UTF-8 avoids a double layering and overloading of the use of the "+" character. UTF-8 is fully compatible with US-ASCII and has therefore been recommended by the IETF, and is being used widely.

UTF-7 has never been used much and is now clearly being discouraged. Requiring implementations to convert from UTF-8 to UTF-7 and back would be an additional implementation burden.
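
As a rough illustration of the trade-off described above, the following Python sketch compares the UTF-7 form of a short string with its percent-encoded UTF-8 form. The sample strings and the helper names `utf7_form` and `utf8_percent_form` are assumptions made for illustration; the sketch relies only on Python's built-in "utf-7" codec and `urllib.parse.quote`.

```python
from urllib.parse import quote


def utf7_form(text: str) -> str:
    """Encode text as UTF-7 and return the result as an ASCII string."""
    return text.encode("utf-7").decode("ascii")


def utf8_percent_form(text: str) -> str:
    """Percent-encode the UTF-8 octets of text (no characters kept safe)."""
    return quote(text, safe="")


sample = "Dürst"
print(utf7_form(sample), len(utf7_form(sample)))
print(utf8_percent_form(sample), len(utf8_percent_form(sample)))  # D%C3%BCrst 10

# UTF-7 uses "+" as its shift character, so a literal "+" must itself be
# escaped. This is the overloading of "+" that UTF-8 avoids.
print(utf7_form("a+b"))  # a+-b
```

The UTF-7 form is shorter for this sample, but it gives "+" a second meaning inside the identifier, which is the layering problem the text refers to.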

Appendix A.3. New Encoding Convention

Instead of using the existing percent-encoding convention of URIs, which is based on octets, the idea was to create a new encoding convention; for example, to use "%u" to introduce UCS code points.

Using the existing octet-based percent-encoding mechanism does not need an upgrade of the URI syntax and does not need corresponding server upgrades.
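
To make the contrast concrete, the sketch below sets a hypothetical "%u" convention beside the existing octet-based one. The helper `percent_u_encode` is invented here for illustration and does not correspond to any standard API (for simplicity it leaves all ASCII characters untouched). The point is only that an existing percent-decoder, here Python's `urllib.parse.unquote`, already handles percent-encoded UTF-8 octets but does not decode the "%u" form.

```python
from urllib.parse import quote, unquote


def percent_u_encode(text: str) -> str:
    """Hypothetical "%uXXXX" convention: one escape per non-ASCII code point."""
    return "".join(
        ch if ord(ch) < 0x80 else f"%u{ord(ch):04X}"
        for ch in text
    )


octet_based = quote("Dürst", safe="")          # existing convention
code_point_based = percent_u_encode("Dürst")   # hypothetical convention

print(octet_based)                # D%C3%BCrst
print(code_point_based)           # D%u00FCrst
print(unquote(octet_based))       # Dürst
print(unquote(code_point_based))  # D%u00FCrst (left undecoded)
```

A "%u" convention would therefore require changes to the URI syntax and to deployed decoders, whereas the octet-based form works with what already exists.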

Appendix A.4. Indicating Character Encodings in the URI/IRI

Some proposals suggested indicating the character encodings used in a URI or IRI with some new syntactic convention in the URI itself, similar to the "charset" parameter for e-mails and Web pages. As an example, the label in square brackets in "http://www.example.org/ros[iso-8859-1]&#xE9;" indicated that the following "&#xE9;" had to be interpreted as iso-8859-1.

If UTF-8 is used exclusively, an upgrade to the URI syntax is not needed. It also avoids potentially multiple labels that would have to be copied correctly in all cases, even on the side of a bus or on a napkin, leading to usability problems (and being prohibitively annoying). Exclusively using UTF-8 also reduces transcoding errors and confusion.
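
The transcoding confusion mentioned above can be sketched in Python: the same percent-encoded octets yield different characters depending on the character encoding the receiver assumes. That is why legacy encodings would have needed an in-band label, and why using UTF-8 exclusively removes the question. The strings "ros%E9" and "ros%C3%A9" are assumptions modelled on the "ros" example above; they are not taken from the specification.

```python
from urllib.parse import unquote

legacy = "ros%E9"     # "rosé" with é as a single iso-8859-1 octet
utf8 = "ros%C3%A9"    # "rosé" with é as two UTF-8 octets

# Each form decodes correctly only under the encoding it was produced with.
print(unquote(legacy, encoding="iso-8859-1"))  # rosé
print(unquote(utf8, encoding="utf-8"))         # rosé

# Assuming the wrong encoding silently produces different text.
print(unquote(utf8, encoding="iso-8859-1"))    # rosÃ©
```

With a single agreed encoding, no label has to travel with the identifier and the third case cannot arise.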

Authors' Addresses

Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever possible, for example as "D&#252;rst" in XML and HTML.)
World Wide Web Consortium
5322 Endo
Fujisawa, Kanagawa 252-8520
Japan

Phone: +81 466 49 1170
Fax: +81 466 49 1171
EMail: duerst@w3.org
URI: http://www.w3.org/People/D%C3%BCrst/
(Note: This is the percent-encoded form of an IRI.)

Michel Suignard
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
U.S.A.

Phone: +1 425 882-8080
EMail: michelsu@microsoft.com
URI: http://www.suignard.com

Full Copyright Statement

Copyright (C) The Internet Society (2005).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the IETF's procedures with respect to rights in IETF Documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.
