RFC 4060 - RTP Payload Formats for European Telecommunications Standards Institute (ETSI) European Standard ES 202 050, ES 202 211, and ES 202 212 Distributed Speech Recognition Encoding

URL : https://tools.ietf.org/html/rfc4060

Network Working Group                                             Q. Xie
Request for Comments: 4060                                     D. Pearce
Category: Standards Track                                       Motorola
                                                                May 2005

          RTP Payload Formats for European Telecommunications
              Standards Institute (ETSI) European Standard
                 ES 202 050, ES 202 211, and ES 202 212
                Distributed Speech Recognition Encoding

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

This document specifies RTP payload formats for encapsulating European Telecommunications Standards Institute (ETSI) European Standard ES 202 050 DSR Advanced Front-end (AFE), ES 202 211 DSR Extended Front-end (XFE), and ES 202 212 DSR Extended Advanced Front-end (XAFE) signal processing feature streams for distributed speech recognition (DSR) systems.

Table of Contents

   1. Introduction
      1.1. Conventions and Acronyms
   2. ETSI DSR Front-end Codecs
      2.1. ES 202 050 Advanced DSR Front-end Codec
      2.2. ES 202 211 Extended DSR Front-end Codec
      2.3. ES 202 212 Extended Advanced DSR Front-end Codec
   3. DSR RTP Payload Formats
      3.1. Common Considerations of the Three DSR RTP Payload Formats
           3.1.1. Number of FPs in Each RTP Packet
           3.1.2. Support for Discontinuous Transmission
           3.1.3. RTP Header Usage
      3.2. Payload Format for ES 202 050 DSR
           3.2.1. Frame Pair Formats
      3.3. Payload Format for ES 202 211 DSR
           3.3.1. Frame Pair Formats
      3.4. Payload Format for ES 202 212 DSR
           3.4.1. Frame Pair Formats
   4. IANA Considerations
      4.1. Mapping MIME Parameters into SDP
      4.2. Usage in Offer/Answer
      4.3. Congestion Control
   5. Security Considerations
   6. Acknowledgments
   7. References
      7.1. Normative References
      7.2. Informative References

1. Introduction

Distributed speech recognition (DSR) technology is intended for a remote device acting as a thin client (a.k.a. the front-end) to communicate with a speech recognition server (a.k.a. a speech engine) over a network connection to obtain speech recognition services. More details on DSR over the Internet can be found in RFC 3557 [10].

To achieve interoperability with different client devices and speech engines, the first ETSI standard DSR front-end ES 201 108 was published in early 2000 [11]. An RTP packetization for ES 201 108 frames is defined in RFC 3557 [10] by IETF.

In ES 202 050 [1], ETSI issues another standard for an Advanced DSR front-end that provides substantially improved recognition performance when background noise is present. The codecs in ES 202 050 use a slightly different frame format from that of ES 201 108 and thus the two do not inter-operate with each other.

The RTP packetization for ES 202 050 front-end defined in this document uses the same RTP packet format layout as that defined in RFC 3557 [10]. The differences are in the DSR codec frame bit definition and the payload type MIME registration.

The two further standards, ES 202 211 and ES 202 212, provide extensions to each of the DSR front-end standards. The extensions allow the speech waveform to be reconstructed for human audition and can also be used to improve recognition performance for tonal languages. This is done by sending additional pitch and voicing information for each frame along with the recognition features.

The RTP packet format for these extended standards is also defined in this document.

It is worthwhile to note that the performance of most speech recognizers is extremely sensitive to consecutive frame losses, and DSR speech recognizers are no exception. If a DSR over RTP session is expected to endure a high packet loss ratio between the front-end and the speech engine, one should consider limiting the maximum number of DSR frames allowed in a packet, or employing other loss management techniques, such as FEC or interleaving, to minimize the chance of losing consecutive frames.

1.1. Conventions and Acronyms

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in RFC 2119 [4].

The following acronyms are used in this document:

   DSR  - Distributed Speech Recognition
   ETSI - the European Telecommunications Standards Institute
   FP   - Frame Pair
   DTX  - Discontinuous Transmission
   VAD  - Voice Activity Detection

2. ETSI DSR Front-end Codecs

Some relevant characteristics of ES 202 050 Advanced, ES 202 211 Extended, and ES 202 212 Extended Advanced DSR front-end codecs are summarized below.

2.1. ES 202 050 Advanced DSR Front-end Codec

The front-end calculation is a frame-based scheme that produces an output vector every 10 ms. In the front-end feature extraction, noise reduction by two stages of Wiener filtering is performed first. Then, waveform processing is applied to the de-noised signal and mel-cepstral features are calculated. At the end, blind equalization is applied to the cepstral features. The front-end algorithm produces at its output a mel-cepstral representation in the same format as ES 201 108, i.e., 12 cepstral coefficients [C1 - C12], C0, and log Energy. Voice activity detection (VAD) for the classification of each frame as speech or non-speech is also implemented in Feature Extraction. The VAD information is included in the payload format for each frame pair to be sent to the remote recognition engine as part of the payload. This information may optionally be used by the receiving recognition engine to drop non-speech frames. The front-end supports three raw sampling rates: 8 kHz, 11 kHz, and 16 kHz. (Note that, unlike some other speech codecs, the feature frame size of DSR presented to RTP packetization is not dependent on the number of speech samples used in each 10 ms sample frame. This will become more evident in the following sections.)

After calculation of the mel-cepstral representation, the representation is first quantized via split-vector quantization to reduce the data rate of the encoded stream. Then, the quantized vectors from two consecutive frames are put into an FP, as described in more detail in Section 3.2.

2.2. ES 202 211 Extended DSR Front-end Codec

Some relevant characteristics of ES 202 211 Extended DSR front-end codec are summarized below.

ES 202 211 is an extension of the mel-cepstrum DSR Front-end standard ES 201 108 [11]. The mel-cepstrum front-end provides the features for speech recognition but these are not available for human listening. The purpose of the extension is to allow the reconstruction of the speech waveform from these features so that they can be replayed. The front-end feature extraction part of the processing is exactly the same as for ES 201 108. To allow speech reconstruction, additional fundamental frequency (perceived as pitch) and voicing class (e.g., non-speech, voiced, unvoiced, and mixed) information is needed. This extra information is provided by the extended front-end processing algorithms at the device side. It is compressed and transmitted along with the front-end features to the server. This extra information may also be useful for improved speech recognition performance with tonal languages such as Mandarin, Cantonese, and Thai.

Full information about the client side signal processing algorithms used in the standard is given in the specification ES 202 211 [2].

The additional fundamental frequency and voicing class information is compressed for each frame pair. The pitch for the first frame of the FP is quantized to 7 bits and the second frame is differentially quantized to 5 bits. The voicing class is indicated with one bit for each frame. The total for the extension information for a frame pair therefore consists of 14 bits plus an additional 2 bits of CRC error protection computed over these extension bits only.

The total information for the frame pair is made up of 92 bits for the two compressed front-end feature frames (including 4 bits for their CRC) plus 16 bits for the extension (including 2 bits for their CRC) and 4 bits of null padding to give a total of 14 octets per frame pair. As for ES 201 108, the extended frame pair also corresponds to 20 ms of speech. The extended front-end supports three raw sampling rates: 8 kHz, 11 kHz, and 16 kHz.
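
The bit budget described above can be checked mechanically. The following is a minimal Python sketch; the constant names are illustrative only and are not identifiers from ES 202 211, and the 7-bit and 5-bit pitch widths are those listed in Section 3.3.1.1.

```python
# Bit budget of one ES 202 211 frame pair (FP), using the figures in the text.
# All names are illustrative; none come from the ETSI specification.
FEATURE_BITS = 2 * 44      # two compressed front-end feature frames
FEATURE_CRC_BITS = 4       # CRC over the 88 feature bits
PITCH_BITS = 7 + 5         # Pidx1 plus the differentially coded Pidx2
CLASS_BITS = 1 + 1         # one voicing class bit per frame
EXT_CRC_BITS = 2           # CRC over the 14 pitch and class bits
PADDING_BITS = 4           # null padding up to the octet boundary

feature_part = FEATURE_BITS + FEATURE_CRC_BITS
extension_part = PITCH_BITS + CLASS_BITS + EXT_CRC_BITS
total_bits = feature_part + extension_part + PADDING_BITS

print(feature_part, extension_part, total_bits, total_bits // 8)
```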

The quantized vectors from two consecutive frames are put into an FP, as described in more detail in Section 3.3 below.

The parameters received at the remote server from the RTP extended DSR payload specified here can be used to synthesize an intelligible speech waveform for replay. The algorithms to do this are described in the specification ES 202 211 [2].

2.3. ES 202 212 Extended Advanced DSR Front-end Codec

ES 202 212 is the extension for the DSR Advanced Front-end ES 202 050 [1]. It provides the same capabilities as the extended mel-cepstrum front-end described in Section 2.2 but for the DSR Advanced Front-end.

3. DSR RTP Payload Formats

3.1. Common Considerations of the Three DSR RTP Payload Formats

The three DSR RTP payload formats defined in this document share the following considerations and behaviours.

3.1.1. Number of FPs in Each RTP Packet

Any number of FPs MAY be aggregated together in an RTP payload, and they MUST be consecutive in time. However, one SHOULD always keep the RTP payload size smaller than the MTU in order to avoid IP fragmentation and SHOULD follow the recommendations given in Section 3.1 in RFC 3557 [10] when determining the proper number of FPs in an RTP payload.
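
As a rough illustration of the MTU constraint, the sketch below computes an upper bound on the number of FPs per packet. The header sizes are assumptions of this example rather than values from this document: IPv4 without options (20 octets), UDP (8 octets), and a fixed RTP header without CSRC entries (12 octets). The function name is hypothetical.

```python
# Upper bound on FPs per RTP packet for a given path MTU.
# Assumed overhead (not specified by this document): IPv4 without options,
# UDP, and a 12-octet RTP header with no CSRC list or header extension.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12

# FP sizes in octets as stated in Sections 3.2, 3.3, and 3.4.
FP_OCTETS = {"ES 202 050": 12, "ES 202 211": 14, "ES 202 212": 14}


def max_fps_per_packet(mtu: int, fp_octets: int) -> int:
    """Largest FP count whose packet still fits in one MTU-sized datagram."""
    room = mtu - (IPV4_HEADER + UDP_HEADER + RTP_HEADER)
    return max(room // fp_octets, 0)


for codec, size in FP_OCTETS.items():
    n = max_fps_per_packet(1500, size)
    print(f"{codec}: at most {n} FPs ({n * 20} ms of speech) per packet")
```

This is only the fragmentation bound. Because each FP carries 20 ms of speech and recognizers are sensitive to consecutive frame losses, a much smaller count is usually the more sensible choice.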

3.1.2. Support for Discontinuous Transmission

The same considerations described in Section 3.2 of RFC 3557 [10] apply to all three DSR RTP payloads defined in this document.

3.1.3. RTP Header Usage

The format of the RTP header is specified in RFC 3550 [8]. The three payload formats defined here use the fields of the header in a manner consistent with that specification.

The RTP timestamp corresponds to the sampling instant of the first sample encoded for the first FP in the packet. The timestamp clock frequency is the same as the sampling frequency, so the timestamp unit is in samples.

As defined by all three front-end codecs, the duration of one FP is 20 ms, corresponding to 160, 220, or 320 encoded samples with a sampling rate of 8, 11, or 16 kHz being used at the front-end, respectively. Thus, the timestamp is increased by 160, 220, or 320 for each consecutive FP, respectively.
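
The timestamp arithmetic can be sketched as follows. The function names are hypothetical, and the nominal rates are taken as 8000, 11000, and 16000 Hz, which is what the 160, 220, and 320 sample counts imply. The wrap at 2**32 reflects the 32-bit RTP timestamp field of RFC 3550.

```python
FP_DURATION_MS = 20  # one FP covers two consecutive 10 ms frames


def timestamp_increment(sampling_rate_hz: int) -> int:
    """RTP timestamp units (samples) spanned by one FP."""
    return sampling_rate_hz * FP_DURATION_MS // 1000


def fp_timestamps(first_timestamp: int, sampling_rate_hz: int, fp_count: int):
    """Sampling instant of each FP in a packet, wrapped to 32 bits."""
    step = timestamp_increment(sampling_rate_hz)
    return [(first_timestamp + i * step) % 2**32 for i in range(fp_count)]


for rate in (8000, 11000, 16000):
    print(rate, timestamp_increment(rate))
```

Only the timestamp of the first FP is carried in the RTP header; a receiver would derive the others in this way.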

The DSR payload for all three front-end codecs is always an integral number of octets. If additional padding is required for some other purpose, then the P bit in the RTP header may be set and padding appended as specified in RFC 3550 [8].

The RTP header marker bit (M) MUST be set following the general rules for audio codecs, as defined in Section 4.1 in RFC 3551 [9].

This document does not specify the assignment of an RTP payload type for these three new packet formats. It is expected that the RTP profile under which any of these payload formats is being used will assign a payload type for this encoding or will specify that the payload type is to be bound dynamically.

3.2. Payload Format for ES 202 050 DSR

An ES 202 050 DSR RTP payload datagram uses exactly the same layout as defined in Section 3 of RFC 3557 [10], i.e., a standard RTP header followed by a DSR payload containing a series of DSR FPs.

The size of each ES 202 050 FP remains 96 bits or 12 octets, as defined in the following sections. This ensures that a DSR RTP payload will always end on an octet boundary.

3.2.1. Frame Pair Formats

3.2.1.1. Format of Speech and Non-speech FPs

The following mel-cepstral frame MUST be used, as defined in [1]:

Pairs of the quantized 10ms mel-cepstral frames MUST be grouped together and protected with a 4-bit CRC forming a 92-bit long FP. At the end, each FP MUST be padded with 4 zeros to the MSB 4 bits of the last octet in order to make the FP aligned to the octet boundary.

The following diagram shows a complete ES 202 050 FP:

     Frame #1 in FP:
     ===============
        (MSB)                                     (LSB)
          0     1     2     3     4     5     6     7
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :  idx(2,3) |            idx(0,1)               |    Octet 1
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :       idx(4,5)        |     idx(2,3) (cont)   :    Octet 2
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |             idx(6,7)              |idx(4,5)(cont)  Octet 3
       +-----+-----+-----+-----+-----+-----+-----+-----+
   idx(10,11)| VAD |              idx(8,9)             |    Octet 4
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :       idx(12,13)      |   idx(10,11) (cont)   :    Octet 5
       +-----+-----+-----+-----+-----+-----+-----+-----+
                               |   idx(12,13) (cont)   :    Octet 6/1
                               +-----+-----+-----+-----+

    Frame #2 in FP:
    ===============
        (MSB)                                     (LSB)
          0     1     2     3     4     5     6     7
       +-----+-----+-----+-----+
       :        idx(0,1)       |                            Octet 6/2
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |              idx(2,3)             |idx(0,1)(cont)  Octet 7
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :  idx(6,7) |              idx(4,5)             |    Octet 8
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :        idx(8,9)       |      idx(6,7) (cont)  :    Octet 9
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |          idx(10,11)         | VAD |idx(8,9)(cont)  Octet 10
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |                   idx(12,13)                  |    Octet 11
       +-----+-----+-----+-----+-----+-----+-----+-----+

    CRC for Frame #1 and Frame #2 and padding in FP:
    ================================================
        (MSB)                                     (LSB)
          0     1     2     3     4     5     6     7
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |  0  |  0  |  0  |  0  |          CRC          |    Octet 12
       +-----+-----+-----+-----+-----+-----+-----+-----+

The 4-bit CRC in the FP MUST be calculated using the formula (including the bit-order rules) defined in 7.2 in [1].

Therefore, each FP represents 20ms of original speech. Note that each FP MUST be padded with 4 zeros to the MSB 4 bits of the last octet in order to make the FP aligned to the octet boundary, as shown above. This makes the total size of an FP 96 bits, or 12 octets. Note that this padding is separate from padding indicated by the P bit in the RTP header.
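
To make the diagram concrete, here is a minimal sketch that pulls the two VAD flags, the CRC, and the padding out of one 12-octet ES 202 050 FP. The bit offsets are read off the diagram above; the function names are hypothetical, and the rule that the first bit of a multi-bit field is its least significant bit is an assumption of this sketch that should be checked against [1]. The CRC is only extracted, not verified, since its formula is defined in [1].

```python
FP_OCTETS = 12  # 92 bits of frames and CRC plus 4 padding bits


def read_bits(fp: bytes, offset: int, width: int) -> int:
    """Read `width` bits starting at stream bit `offset`.

    Stream bit k is bit (k % 8), counted from the LSB, of octet k // 8,
    which matches how fields fill each octet from the right in the diagram.
    Taking the first stream bit as the field's LSB is an assumption.
    """
    value = 0
    for i in range(width):
        k = offset + i
        value |= ((fp[k // 8] >> (k % 8)) & 1) << i
    return value


def inspect_fp(fp: bytes) -> dict:
    """Extract the fields of an FP that need no codebook knowledge."""
    if len(fp) != FP_OCTETS:
        raise ValueError("an ES 202 050 FP is exactly 12 octets")
    return {
        "vad1": read_bits(fp, 30, 1),       # frame #1 VAD: octet 4, mask 0x40
        "vad2": read_bits(fp, 44 + 30, 1),  # frame #2 VAD: octet 10, mask 0x04
        "crc": read_bits(fp, 88, 4),        # low nibble of octet 12
        "padding": read_bits(fp, 92, 4),    # high nibble of octet 12, must be 0
    }


example = bytes([0, 0, 0, 0x40, 0, 0, 0, 0, 0, 0x04, 0, 0x0A])
print(inspect_fp(example))
```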

The definitions of the indices and the 'VAD' flag are given in [1]; their values are only set and examined by the codecs in the front-end client and the recognizer.

3.2.1.2. Format of Null FP

Null FPs are sent to mark the end of a transmission segment. Details on transmission segment and the use of Null FPs can be found in RFC 3557 [10].

A Null FP for the ES 202 050 front-end codec is defined by setting the content of the first and second frame in the FP to null (i.e., filling the first 88 bits of the FP with zeros). The 4-bit CRC MUST be calculated the same way as described in Section 7.2.4 of [1], and 4 zeros MUST be padded to the end of the Null FP in order to make it aligned to the octet boundary.
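
A receiver could locate the end of a transmission segment along these lines. This is a sketch under the definition above (first 88 bits all zero); the function names are hypothetical. Octet 12 is deliberately not compared against a constant, because the CRC value for all-zero frames follows from the formula in [1], which this sketch does not reproduce.

```python
from typing import Optional

FP_OCTETS = 12


def is_null_fp(fp: bytes) -> bool:
    """True if both frames (octets 1-11, i.e. the first 88 bits) are zero."""
    return len(fp) == FP_OCTETS and not any(fp[:11])


def first_null_fp(payload: bytes) -> Optional[int]:
    """Index of the first Null FP in an ES 202 050 payload, or None."""
    if len(payload) % FP_OCTETS:
        raise ValueError("payload is not a whole number of 12-octet FPs")
    for start in range(0, len(payload), FP_OCTETS):
        if is_null_fp(payload[start:start + FP_OCTETS]):
            return start // FP_OCTETS
    return None
```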

3.3. Payload Format for ES 202 211 DSR

An ES 202 211 DSR RTP payload datagram is very similar to that defined in Section 3 of RFC 3557 [10], i.e., a standard RTP header followed by a DSR payload containing a series of DSR FPs.

The size of each ES 202 211 FP is 112 bits or 14 octets, as defined in the following sections. This ensures that a DSR RTP payload will always end on an octet boundary.

3.3.1. Frame Pair Formats

3.3.1.1. Format of Speech and Non-speech FPs

The following mel-cepstral frame MUST be used, as defined in Section 6.2.4 in [2]:

Immediately following two frames (Frame #1 and Frame #2) worth of codebook indices (or 88 bits), there is a 4-bit CRC calculated on these 88 bits. The pitch indices of the first frame (Pidx1: 7 bits) and the second frame (Pidx2: 5 bits) of the frame pair then follow. The class indices of the two frames in the frame pair, 1 bit each (Cidx1 and Cidx2), follow next. Finally, a 2-bit CRC calculated on the pitch and class bits (total: 14 bits) of the frame pair is included (PC-CRC). The total number of bits in a frame pair packet is therefore 44 + 44 + 4 + 7 + 5 + 1 + 1 + 2 = 108. At the end, each FP MUST be padded with 4 zeros to the MSB 4 bits of the last octet in order to make the FP aligned to the octet boundary.

The following diagram shows a complete ES 202 211 FP:

     Frame #1 in FP:
     ===============
       (MSB)                                     (LSB)
         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      :  idx(2,3) |            idx(0,1)               |    Octet 1
      +-----+-----+-----+-----+-----+-----+-----+-----+
      :       idx(4,5)        |     idx(2,3) (cont)   :    Octet 2
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |             idx(6,7)              |idx(4,5)(cont)  Octet 3
      +-----+-----+-----+-----+-----+-----+-----+-----+
       idx(10,11) |              idx(8,9)             |    Octet 4
      +-----+-----+-----+-----+-----+-----+-----+-----+
      :       idx(12,13)      |   idx(10,11) (cont)   :    Octet 5
      +-----+-----+-----+-----+-----+-----+-----+-----+
                              |   idx(12,13) (cont)   :    Octet 6/1
                              +-----+-----+-----+-----+

    Frame #2 in FP:
    ===============
       (MSB)                                     (LSB)
         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+
      :        idx(0,1)       |                            Octet 6/2
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |              idx(2,3)             |idx(0,1)(cont)  Octet 7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      :  idx(6,7) |              idx(4,5)             |    Octet 8
      +-----+-----+-----+-----+-----+-----+-----+-----+
      :        idx(8,9)       |      idx(6,7) (cont)  :    Octet 9
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |          idx(10,11)               |idx(8,9)(cont)  Octet 10
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |                   idx(12,13)                  |    Octet 11
      +-----+-----+-----+-----+-----+-----+-----+-----+

    CRC for Frame #1 and Frame #2 in FP:
    ====================================
       (MSB)                                     (LSB)
         0     1     2     3     4     5     6     7
                              +-----+-----+-----+-----+
                              |          CRC          |    Octet 12/1
                              +-----+-----+-----+-----+

    Extension information and padding in FP:
    ========================================
       (MSB)                                     (LSB)
         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+
      :       Pidx1           |                            Octet 12/2
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |            Pidx2            |   Pidx1 (cont)  :    Octet 13
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |  0  |  0  |  0  |  0  |  PC-CRC   |Cidx2|Cidx1|    Octet 14
      +-----+-----+-----+-----+-----+-----+-----+-----+

The 4-bit CRC and the 2-bit PC-CRC in the FP MUST be calculated using the formula (including the bit-order rules) defined in 6.2.4 in [2].

Therefore, each FP represents 20 ms of original speech. Note that, as shown above, each FP MUST be padded with 4 zeros to the MSB 4 bits of the last octet in order to make the FP aligned to the octet boundary. This makes the total size of an FP 112 bits, or 14 octets. Note that this padding is separate from padding indicated by the P bit in the RTP header.
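
The fields that follow the 88 feature bits can be extracted with a small offset table. The sketch below reads the FP as one little-endian integer, so that stream bit k is bit (k % 8) from the LSB of octet k // 8, as in the diagram. The names are hypothetical, the offsets are read off the diagram, and the ordering of bits inside each multi-bit field (first bit is the least significant) is an assumption to be checked against [2].

```python
FP_OCTETS = 14

# (stream bit offset, width) of each field after the 88 feature bits.
FIELDS = {
    "crc": (88, 4),       # low nibble of octet 12
    "pidx1": (92, 7),     # high nibble of octet 12, low 3 bits of octet 13
    "pidx2": (99, 5),     # high 5 bits of octet 13
    "cidx1": (104, 1),    # octet 14, LSB
    "cidx2": (105, 1),
    "pc_crc": (106, 2),
    "padding": (108, 4),  # high nibble of octet 14, must be 0
}


def parse_tail(fp: bytes) -> dict:
    """Extract the CRC, extension fields, and padding of an ES 202 211 FP."""
    if len(fp) != FP_OCTETS:
        raise ValueError("an ES 202 211 FP is exactly 14 octets")
    stream = int.from_bytes(fp, "little")
    return {
        name: (stream >> offset) & ((1 << width) - 1)
        for name, (offset, width) in FIELDS.items()
    }


example = bytes(11) + bytes([0xF9, 0x9D, 0x09])
print(parse_tail(example))
```

The widths in the table sum to 24 bits, which together with the 88 feature bits gives the 112-bit FP size stated above.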

3.3.1.2. Format of Null FP

A Null FP for the ES 202 211 front-end codec is defined by setting all 112 bits of the FP to zero. Null FPs are sent to mark the end of a transmission segment. Details on transmission segment and the use of Null FPs can be found in RFC 3557 [10].

3.4. Payload Format for ES 202 212 DSR

Similar to other ETSI DSR front-end encoding schemes, the encoded DSR feature stream of ES 202 212 is transmitted in a sequence of FPs, where each FP represents two consecutive original voice frames.

An ES 202 212 DSR RTP payload datagram is very similar to that defined in Section 3 of RFC 3557 [10], i.e., a standard RTP header followed by a DSR payload containing a series of DSR FPs.

The size of each ES 202 212 FP is 112 bits or 14 octets, as defined in the following sections. This ensures that an ES 202 212 DSR RTP payload will always end on an octet boundary.

3.4.1. Frame Pair Formats

3.4.1.1. Format of Speech and Non-speech FPs

The following mel-cepstral frame MUST be used, as defined in Section 7.2.4 of [3]:

Immediately following two frames (Frame #1 and Frame #2) worth of codebook indices (or 88 bits), there is a 4-bit CRC calculated on these 88 bits. The pitch indices of the first frame (Pidx1: 7 bits) and the second frame (Pidx2: 5 bits) of the frame pair then follow. The class indices of the two frames in the frame pair, 1 bit each (Cidx1 and Cidx2), follow next. Finally, a 2-bit CRC (PC-CRC) calculated on the pitch and class bits (total: 14 bits) of the frame pair is included. The total number of bits in a frame pair packet is therefore 44 + 44 + 4 + 7 + 5 + 1 + 1 + 2 = 108. At the end, each FP MUST be padded with 4 zeros to the MSB 4 bits of the last octet in order to make the FP aligned to the octet boundary. The padding brings the total size of an FP to 112 bits, or 14 octets. Note that this padding is separate from padding indicated by the P bit in the RTP header.

The following diagram shows a complete ES 202 212 FP:

次の図は、完全なES 202 212 FPを示しています。

     Frame #1 in FP:
     ===============
        (MSB)                                     (LSB)
          0     1     2     3     4     5     6     7
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :  idx(2,3) |            idx(0,1)               |    Octet 1
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :       idx(4,5)        |     idx(2,3) (cont)   :    Octet 2
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |             idx(6,7)              |idx(4,5)(cont)  Octet 3
       +-----+-----+-----+-----+-----+-----+-----+-----+
   idx(10,11)| VAD |              idx(8,9)             |    Octet 4
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :       idx(12,13)      |   idx(10,11) (cont)   :    Octet 5
       +-----+-----+-----+-----+-----+-----+-----+-----+
                               |   idx(12,13) (cont)   :    Octet 6/1
                               +-----+-----+-----+-----+

    Frame #2 in FP:
    ===============
        (MSB)                                     (LSB)
          0     1     2     3     4     5     6     7
       +-----+-----+-----+-----+
       :        idx(0,1)       |                            Octet 6/2
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |              idx(2,3)             |idx(0,1)(cont)  Octet 7
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :  idx(6,7) |              idx(4,5)             |    Octet 8
       +-----+-----+-----+-----+-----+-----+-----+-----+
       :        idx(8,9)       |      idx(6,7) (cont)  :    Octet 9
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |          idx(10,11)         | VAD |idx(8,9)(cont)  Octet 10
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |                   idx(12,13)                  |    Octet 11
       +-----+-----+-----+-----+-----+-----+-----+-----+

    CRC for Frame #1 and Frame #2 in FP:
    ====================================
        (MSB)                                     (LSB)
          0     1     2     3     4     5     6     7
                               +-----+-----+-----+-----+
                               |          CRC          |    Octet 12/1
                               +-----+-----+-----+-----+

    Extension information and padding in FP:
    ========================================
        (MSB)                                     (LSB)
          0     1     2     3     4     5     6     7
       +-----+-----+-----+-----+
       :       Pidx1           |                            Octet 12/2
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |            Pidx2            |   Pidx1 (cont)  :    Octet 13
       +-----+-----+-----+-----+-----+-----+-----+-----+
       |  0  |  0  |  0  |  0  |  PC-CRC   |Cidx2|Cidx1|    Octet 14
       +-----+-----+-----+-----+-----+-----+-----+-----+

The codebook indices, VAD flag, pitch index, and class index are specified in Section 6 of [3]. The 4-bit CRC and the 2-bit PC-CRC in the FP MUST be calculated using the formula (including the bit-order rules) defined in Section 7.2.4 of [3].

コードブックインデックス、VADフラグ、ピッチインデックス、及びクラスインデックスは、[3]のセクション6に規定されています。 FPの4ビットCRCと2ビットPC-CRC [3]において7.2.4で定義された（ビット順序ルールを含む）の式を用いて計算しなければなりません。
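The layout in the diagram above can be sketched as a bit-packing routine. This is a minimal illustration, not a reference implementation: the function name `pack_fp` and its argument order are hypothetical, the CRC and PC-CRC values are taken as inputs (their formulas are defined in [3] and are not reproduced here), and it assumes that the least significant bit of each field is placed first, which should be checked against the bit-order rules in [3].

```python
# Field widths (bits) of one frame in stream order:
# idx(0,1), idx(2,3), idx(4,5), idx(6,7), idx(8,9), VAD, idx(10,11), idx(12,13)
FRAME_WIDTHS = (6, 6, 6, 6, 6, 1, 5, 8)  # 44 bits per frame


def pack_fp(frame1, frame2, crc, pidx1, pidx2, cidx1, cidx2, pc_crc):
    """Pack one ES 202 212 frame pair into 14 octets.

    frame1 and frame2 are sequences of 8 integers in the order of
    FRAME_WIDTHS. Assumption: each field is written LSB first, so the
    whole FP can be built as one little-endian 112-bit integer.
    """
    if len(frame1) != 8 or len(frame2) != 8:
        raise ValueError("each frame needs exactly 8 field values")

    fields = list(zip(frame1, FRAME_WIDTHS)) + list(zip(frame2, FRAME_WIDTHS))
    fields += [(crc, 4), (pidx1, 7), (pidx2, 5),
               (cidx1, 1), (cidx2, 1), (pc_crc, 2)]

    acc = 0   # accumulated bits; bit 0 is the LSB of octet 1
    pos = 0   # next free bit position
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        acc |= value << pos
        pos += width

    assert pos == 108  # 44 + 44 + 4 + 7 + 5 + 1 + 1 + 2
    # Bits 108..111 stay zero: the 4-bit padding in the MSBs of octet 14.
    return acc.to_bytes(14, "little")
```

Building the FP as a single integer keeps the octet-spanning fields (for example idx(2,3) across octets 1 and 2) out of the caller's way; the four padding bits fall out of the fixed 14-octet width.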

3.4.1.2. Format of Null FP

3.4.1.2。ヌルFPのフォーマット

A Null FP for the ES 202 212 front-end codec is defined by setting all 112 bits of the FP to zero. Null FPs are sent to mark the end of a transmission segment. Details on transmission segments and the use of Null FPs can be found in RFC 3557 [10].

ヌルFP ES 202 212フロントエンドのコーデックはゼロでFPの全112ビットを設定することによって定義されます。ヌルのFPは、送信セグメントの終わりをマークするために送信されます。伝送セグメントおよびNullのFPの使用に関する詳細は、RFC 3557 [10]に見出すことができます。
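Because every ES 202 212 FP is exactly 14 octets and a Null FP is all zeros, a receiver can split a payload and spot the end of a transmission segment with very little code. The following is a sketch under those two stated facts only; the names `split_payload` and `is_null_fp` are hypothetical, and the RTP header is assumed to have been removed already.

```python
FP_OCTETS = 14               # size of one ES 202 212 frame pair
NULL_FP = bytes(FP_OCTETS)   # 112 zero bits


def split_payload(payload):
    """Split a DSR payload (RTP header already stripped) into 14-octet FPs."""
    if len(payload) % FP_OCTETS != 0:
        raise ValueError("payload length is not a multiple of 14 octets")
    return [payload[i:i + FP_OCTETS]
            for i in range(0, len(payload), FP_OCTETS)]


def is_null_fp(fp):
    """Return True if the FP marks the end of a transmission segment."""
    return fp == NULL_FP
```

For example, a 28-octet payload whose second half is all zeros yields two FPs, the second of which is reported as a Null FP.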

4. IANA Considerations

4. IANAの考慮事項

For each of the three ETSI DSR front-end codecs covered in this document, a new MIME subtype has been registered by the IANA for the corresponding payload type, as described below.

以下に説明するように、この文書で覆わ3 ETSI DSRフロントエンドコーデックのそれぞれについて、新しいMIMEサブタイプ登録は、対応するペイロードタイプにIANAによって登録されています。

Media Type name: audio

メディアタイプ名：オーディオ

Media subtype names:

メディアサブタイプ名：

dsr-es202050 (for ES 202 050 front-end)

（ES 202 050フロントエンド用）DSR-es202050

dsr-es202211 (for ES 202 211 front-end)

（ES 202 211フロントエンド用）DSR-es202211

dsr-es202212 (for ES 202 212 front-end)

（ES 202 212フロントエンド用）DSR-es202212

Required parameters: none

必須パラメータ：なし

Optional parameters:

オプションのパラメータ：

rate: Indicates the sample rate of the speech. Valid values include: 8000, 11000, and 16000. If this parameter is not present, a sample rate of 8000 is assumed.

レート：音声のサンプルレートを示します。有効な値は次のとおりです。8000、11000、および16000をこのパラメータが存在しない場合、8000サンプル・レートが想定されます。

maxptime: see RFC 3267 [7]. If this parameter is not present, maxptime is assumed to be 80 ms.

maxptime：RFC 3267 [7]を参照してください。このパラメータが存在しない場合、maxptimeは80ミリ秒であると仮定されます。

Note: because the performance of most speech recognizers is extremely sensitive to consecutive FP losses, a user of the payload format who expects a high packet loss ratio for the session MAY consider explicitly choosing a maxptime value for the session that is shorter than the default value.

ペイロード形式のユーザーがセッションのための高いパケット損失率は、それが明示的により短いセッションのmaxptime値を選択するために検討することを期待あれば、ほとんどの音声認識のパフォーマンスは、連続したFP損失に非常に敏感なので、注意してくださいデフォルト値。
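The effect of maxptime on packet size can be worked out with a few lines of arithmetic. The sketch below assumes that each FP carries two 10 ms frames, that is, 20 ms of speech; this figure comes from the ETSI front-end frame rate and is not stated in this section. The function names are hypothetical, and the 14-octet FP size applies to ES 202 212 as described in Section 3.4.

```python
FP_DURATION_MS = 20   # assumption: two 10 ms frames per frame pair
FP_OCTETS = 14        # ES 202 212 FP size from Section 3.4


def max_fps_per_packet(maxptime_ms=80):
    """Largest number of FPs one packet may carry under maxptime."""
    if maxptime_ms < FP_DURATION_MS:
        raise ValueError("maxptime is shorter than one frame pair")
    return maxptime_ms // FP_DURATION_MS


def max_payload_octets(maxptime_ms=80, fp_octets=FP_OCTETS):
    """Upper bound on the DSR payload size (RTP header excluded)."""
    return max_fps_per_packet(maxptime_ms) * fp_octets
```

Under these assumptions the default of 80 ms allows 4 FPs (56 octets) per packet, and lowering maxptime to 40 ms halves that, so a single lost packet costs at most 2 consecutive FPs.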

ptime: see RFC 2327 [5].

PTIME：RFC 2327 [5]を参照してください。

Encoding considerations: These types are defined for transfer via RTP [8] as described in Section 3 of RFC 4060.

考察をコードする：これらのタイプは、RTPを介して転送[8] RFC 4060のセクション3に記載されるようにするために定義されています。

Security considerations: See Section 5 of RFC 4060.

セキュリティの考慮事項：RFC 4060のセクション5を参照してください。

Person & email address to contact for further information: Qiaobing.Xie@motorola.com

人とEメールアドレスは、詳細についての問い合わせ先：Qiaobing.Xie@motorola.com

Intended usage: COMMON. It is expected that many VoIP applications (as well as mobile applications) will use this type.

意図している用法：COMMON。多くのVoIPアプリケーション（だけでなく、モバイルアプリケーション）は、このタイプを使用することが期待されます。

Author: Qiaobing.Xie@motorola.com

著者：Qiaobing.Xie@motorola.com

Change controller: IETF Audio/Video transport working group

変更コントローラ：IETFオーディオ/ビデオトランスポートワーキンググループ

4.1. Mapping MIME Parameters into SDP

4.1. SDPにMIMEパラメータのマッピング

The information carried in the MIME media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [5], which is commonly used to describe RTP sessions. When SDP is used to specify sessions employing the ES 202 050, ES 202 211, or ES 202 212 DSR codec, the mapping is as follows:

MIMEメディアタイプの仕様で搬送される情報は、[5]、一般にRTPセッションを記述するために使用されるセッション記述プロトコル（SDP）内のフィールドに特定のマッピングを有します。 SDPはES 202 050、ES 202 211、またはES 202 212 DSRコーデック採用セッションを指定するために使用される場合、以下のように、マッピングは次のとおりです。

o The MIME type ("audio") goes in SDP "m=" as the media name.

O MIMEタイプ（「オーディオ」）は、メディア名としてSDP「m =」に進みます。

o The MIME subtype ("dsr-es202050", "dsr-es202211", or "dsr-es202212") goes in SDP "a=rtpmap" as the encoding name.

O MIMEサブタイプ（ "DSR-es202050"、 "DSR-es202211"、または "DSR-es202212"）は、符号化名としてSDPの "a = rtpmap" に進みます。

o The optional parameter "rate" also goes in "a=rtpmap" as clock rate. If no rate is given, then the default value (i.e., 8000) is used in SDP.

Oオプションのパラメータ「速度」は、クロック・レートとして「A = rtpmap」になります。何レートが指定されていない場合は、デフォルト値（すなわち、8000）SDPに使用されます。

o The optional parameters "ptime" and "maxptime" go in the SDP "a=ptime" and "a=maxptime" attributes, respectively.

Oオプションのパラメータ "PTIME" と "maxptime" は、それぞれ、 "A = PTIME" と "A = maxptimeは" 属性SDPに行きます。

Example of usage of ES 202 050 DSR:

ES 202 050 DSRの使用例：

   m=audio 49120 RTP/AVP 101
   a=rtpmap:101 dsr-es202050/8000
   a=maxptime:40


Example of usage of ES 202 211 DSR:

ES 202 211 DSRの使用例：

   m=audio 49120 RTP/AVP 101
   a=rtpmap:101 dsr-es202211/8000
   a=maxptime:40


Example of usage of ES 202 212 DSR:

ES 202 212 DSRの使用例：

   m=audio 49120 RTP/AVP 101
   a=rtpmap:101 dsr-es202212/8000
   a=maxptime:40

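The mapping rules in this section can be sketched as a small helper that turns the MIME parameters into SDP lines. The function name `dsr_sdp_lines` and its signature are hypothetical; only the media type, the three subtype names, the valid rates, and the attribute names come from the text above.

```python
DSR_SUBTYPES = ("dsr-es202050", "dsr-es202211", "dsr-es202212")
DSR_RATES = (8000, 11000, 16000)


def dsr_sdp_lines(subtype, port, payload_type,
                  rate=8000, ptime=None, maxptime=None):
    """Return the SDP lines describing one DSR audio stream.

    rate defaults to 8000, the value assumed when the optional
    "rate" parameter is absent.
    """
    if subtype not in DSR_SUBTYPES:
        raise ValueError(f"unknown DSR subtype: {subtype}")
    if rate not in DSR_RATES:
        raise ValueError(f"unsupported sample rate: {rate}")

    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",         # MIME type -> m=
        f"a=rtpmap:{payload_type} {subtype}/{rate}",      # subtype and rate
    ]
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```

Calling `dsr_sdp_lines("dsr-es202050", 49120, 101, maxptime=40)` reproduces the three lines of the ES 202 050 example above.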

4.2. Usage in Offer/Answer

4.2. オファー/回答での使用

All SDP parameters in this payload format are declarative, and all reasonable values are expected to be supported. Thus, the standard usage of Offer/Answer as described in RFC 3264 [6] should be followed.

このペイロード形式のすべてのSDPパラメータが宣言され、そしてすべての合理的な値がサポートされることが期待されます。従って、オファー/アンサーの標準的な使用は、RFC 3264に記載されているように[6]に従うべきです。

4.3. Congestion Control

4.3. 輻輳制御

Congestion control for RTP MUST be used in accordance with RFC 3550 [8], and in any applicable RTP profile, e.g., RFC 3551 [9].

RTPのための輻輳制御は、RFC 3550に従って使用されなければならない[8]、及び該当RTPプロファイルで、例えば、RFC 3551 [9]。

5. Security Considerations

5.セキュリティについての考慮事項

Implementations using the payload defined in this specification are subject to the security considerations discussed in the RTP specification RFC 3550 [8] and any RTP profile, e.g., RFC 3551 [9]. This payload does not specify any different security services.

本明細書で定義されたペイロードを使用して実装RTP仕様RFC 3550で説明したセキュリティ上の考慮の対象となっている[8]及び任意RTPプロファイル、例えば、RFC 3551 [9]。このペイロードは、任意の異なるセキュリティ・サービスを指定していません。

6. Acknowledgments

6.謝辞

The design presented here is based on that of RFC 3557 [10]. The authors wish to thank Magnus Westerlund and others for their reviews and comments.

ここで紹介する設計はRFC 3557 [10]のものに基づいています。作者は彼らのレビューとコメントのためのマグヌスウェスターや他の人に感謝したいです。

7. References

7.参考

7.1. Normative References

7.1. 引用規格

[1] European Telecommunications Standards Institute (ETSI) Standard ES 202 050, "Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression Algorithms", http://pda.etsi.org/pda/.

[1]欧州電気通信標準化機構（ETSI）標準ES 202 050、「音声処理、伝送及び品質的側面（STQ）;分散型音声認識、高度なフロントエンド特徴抽出アルゴリズム、圧縮アルゴリズム」は、http：//pda.etsi .ORG / PDAの/。

[2] European Telecommunications Standards Institute (ETSI) Standard ES 202 211, "Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Extended front-end feature extraction algorithm; Compression algorithms; Back-end speech reconstruction algorithm", http://pda.etsi.org/pda/.

[2]ヨーロッパ電気通信標準協会（ETSI）規格ES 202 211、「音声処理、伝送及び品質面（STQ）;音声認識分散;拡張フロントエンド特徴抽出アルゴリズム、圧縮アルゴリズム、バックエンド音声再構成アルゴリズム」、 http://pda.etsi.org/pda/。

[3] European Telecommunications Standards Institute (ETSI) Standard ES 202 212, "Speech Processing, Transmission and Quality aspects (STQ); Distributed speech recognition; Extended advanced front-end feature extraction algorithm; Compression algorithms; Back-end speech reconstruction algorithm", http://pda.etsi.org/pda/.

[3]欧州電気通信標準化機構（ETSI）標準ES 202 212、「音声処理、伝送及び品質の面（STQ）;分散型音声認識;拡張先進的なフロントエンド機能抽出アルゴリズムを、圧縮アルゴリズム、バックエンドの音声再構成アルゴリズム」、http://pda.etsi.org/pda/。

[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[4]ブラドナーの、S.、 "要件レベルを示すためにRFCsにおける使用のためのキーワード"、BCP 14、RFC 2119、1997年3月を。

[5] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[5]ハンドリー、M.およびV. Jacobson氏、 "SDP：セッション記述プロトコル"、RFC 2327、1998年4月。

[6] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with the Session Description Protocol (SDP)", RFC 3264, June 2002.

[6]ローゼンバーグ、J.、およびH. Schulzrinneと、RFC 3264、2002年6月 "セッション記述プロトコル（SDP）とのオファー/アンサーモデル"。

[7] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.

[7] Sjoberg、J.、ウェスター、M.、Lakaniemi、A.、およびQ.謝、「リアルタイムトランスポートプロトコル（RTP）ペイロードフォーマットと適応マルチレート（AMR）と適応マルチ用ストレージファイル形式を-rate広帯域（AMR-WB）オーディオコーデック」、RFC 3267、2002年6月。

[8] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[8] Schulzrinneと、H.、Casner、S.、フレデリック、R.、およびV.ヤコブソン、 "RTP：リアルタイムアプリケーションのためのトランスポートプロトコル"、STD 64、RFC 3550、2003年7月。

[9] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

[9] Schulzrinneと、H.とS. Casner、 "最小量のコントロールがあるオーディオとビデオ会議システムのためのRTPプロフィール"、STD 65、RFC 3551、2003年7月。

[10] Xie, Q., "RTP Payload Format for European Telecommunications Standards Institute (ETSI) European Standard ES 201 108 Distributed Speech Recognition Encoding", RFC 3557, July 2003.

[10]謝、Q.、 "欧州電気通信標準化協会のためのRTPペイロードフォーマット（ETSI）欧州規格ES 201 108分散型音声認識エンコーディング"、RFC 3557、2003年7月。

7.2. Informative References

7.2. 参考文献

[11] European Telecommunications Standards Institute (ETSI) Standard ES 201 108, "Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Front-end Feature Extraction Algorithm; Compression Algorithms", http://pda.etsi.org/pda/.

[11]欧州電気通信標準化機構（ETSI）規格ES 201 108、「音声処理、伝送及び品質側面（STQ）;分散型音声認識、フロントエンド特徴抽出アルゴリズム、圧縮アルゴリズム」は、http：//pda.etsi。 ORG / PDAの/。

Authors' Addresses

著者のアドレス

Qiaobing Xie
Motorola, Inc.
1501 W. Shure Drive, 2-F9
Arlington Heights, IL 60004
US

Qiaobing謝モトローラ社1501 W.シュアードライブ、2-F9アーリントンハイツ、イリノイ州60004米国

Phone: +1-847-632-3028
EMail: qxie1@email.mot.com

電話：+ 1-847-632-3028 Eメール：qxie1@email.mot.com

David Pearce
Motorola Labs
UK Research Laboratory
Jays Close
Viables Industrial Estate
Basingstoke, HANTS RG22 4PD
UK

デビッド・ピアースモトローラ研究所英国研究所ジェイズ閉じるViables工業団地ベイジングストーク、ハンツRG22 4PD英国

Phone: +44 (0)1256 484 436
EMail: bdp003@motorola.com

電話：+44（0）1256 484 436 Eメール：bdp003@motorola.com

Full Copyright Statement

完全な著作権声明

Copyright (C) The Internet Society (2005).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

この文書では、BCP 78に含まれる権利と許可と制限の適用を受けており、その中の記載を除いて、作者は彼らのすべての権利を保有します。

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

この文書とここに含まれている情報は、基礎とCONTRIBUTOR「そのまま」、ORGANIZATION HE / SHEが表すまたはインターネットソサエティおよびインターネット・エンジニアリング・タスク・フォース放棄すべての保証、明示または、（もしあれば）後援ISに設けられています。黙示、情報の利用は、特定の目的に対する権利または商品性または適合性の黙示の保証を侵害しない任意の保証含むがこれらに限定されません。

Intellectual Property

知的財産

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

IETFは、本書またはそのような権限下で、ライセンスがたりないかもしれない程度に記載された技術の実装や使用に関係すると主張される可能性があります任意の知的財産権やその他の権利の有効性または範囲に関していかなる位置を取りません利用可能です。またそれは、それがどのような権利を確認する独自の取り組みを行ったことを示すものでもありません。 RFC文書の権利に関する手続きの情報は、BCP 78およびBCP 79に記載されています。

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

IPRの開示のコピーが利用できるようにIETF事務局とライセンスの保証に行われた、または本仕様の実装者または利用者がそのような所有権の使用のための一般的なライセンスまたは許可を取得するために作られた試みの結果を得ることができますhttp://www.ietf.org/iprのIETFのオンラインIPRリポジトリから。

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

IETFは、その注意にこの標準を実装するために必要とされる技術をカバーすることができる任意の著作権、特許または特許出願、またはその他の所有権を持ってすべての利害関係者を招待します。 ietf-ipr@ietf.orgのIETFに情報を記述してください。

Acknowledgement

謝辞

Funding for the RFC Editor function is currently provided by the Internet Society.

RFC Editor機能のための基金は現在、インターネット協会によって提供されます。