<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>F.6. bloom — bloom filter index access method</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="basic-archive.html" title="F.5. basic_archive — an exemplary WAL archive module" /><link rel="next" href="btree-gin.html" title="F.7. btree_gin — GIN operator classes with B-tree behavior" /><meta name="viewport" content="width=device-width,initial-scale=1.0" /></head><body id="docContent" class="container-fluid col-10"><div class="other_version"><a href="https://www.postgresql.jp/document/">Documentation by version</a></div><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="4" align="center"><a accesskey="h" href="index.html">PostgreSQL 18.3 Documentation</a></th></tr><tr><td width="10%" align="left"></td><td width="10%" align="left"></td><td width="60%" align="center"><a href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Appendix F. Additional Supplied Modules and Extensions</a></td><td width="20%" align="right"><div class="actions"><a class="issue" title="github" href="https://github.com/pgsql-jp/jpug-doc/issues/new?template=bug_report.yml&amp;what-happened=version 18.3 : bloom.html">Report a translation error</a></div></td></tr><tr><td width="10%" align="left"><a accesskey="p" href="basic-archive.html" title="F.5. basic_archive — an exemplary WAL archive module">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Up</a></td><td width="60%" align="center">F.6. bloom — bloom filter index access method</td><td width="20%" align="right"> <a accesskey="n" href="btree-gin.html" title="F.7. btree_gin — GIN operator classes with B-tree behavior">Next</a></td></tr></table><hr /></div><div class="sect1" id="BLOOM"><div class="titlepage"><div><div><h2 class="title" style="clear: both">F.6. bloom — bloom filter index access method <a href="#BLOOM" class="id_link">#</a></h2></div></div></div><a id="id-1.11.7.16.2" class="indexterm"></a><p>
  <code class="literal">bloom</code> provides an index access method based on <a class="ulink" href="https://en.wikipedia.org/wiki/Bloom_filter" target="_top">Bloom filters</a>.
 </p><p>
  A Bloom filter is a space-efficient data structure used to test
  whether an element is a member of a set. When used as an index access
  method, it allows fast exclusion of non-matching tuples via signatures
  whose size is determined at index creation.
 </p><p>
  A signature is a lossy representation of the indexed attribute(s), and as
  such is prone to reporting false positives; that is, it may report
  that an element is in the set when it is not. Index search results
  must therefore always be rechecked using the actual attribute values from
  the heap entry. Larger signatures reduce the odds of a false positive and
  thus the number of useless heap visits, but they also make the index
  larger and hence slower to scan.
 </p><p>
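  The trade-off described above can be illustrated with a rough estimate. This is a back-of-the-envelope model, not the module's actual algorithm: it assumes that every bit position is equally likely and independent, and the figures (10,000,000 rows, six indexed columns, two bits per column, a two-column equality query) are assumptions chosen to mirror the examples later in this section. Real counts depend on the data and the hash functions.
 </p><pre class="programlisting">
-- Rough estimate only, under the assumptions stated above.
-- A stored signature has up to 6 * 2 = 12 bits set;
-- a two-column query checks up to 2 * 2 = 4 bits.
SELECT n_bits AS signature_length,
       round(10000000 * power(1 - power(1 - 1.0 / n_bits, 12), 4))
         AS est_false_positive_rows
FROM (VALUES (80), (160), (320)) AS t(n_bits);
</pre><p>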
  This type of index is most useful when a table has many attributes and
  queries test arbitrary combinations of them. A traditional btree index is
  faster than a bloom index, but supporting all possible queries can require
  many btree indexes where a single bloom index suffices. Note, however,
  that bloom indexes support only equality queries, whereas btree
  indexes can also perform inequality and range searches.
 </p><div class="sect2" id="BLOOM-PARAMETERS"><div class="titlepage"><div><div><h3 class="title">F.6.1. Parameters <a href="#BLOOM-PARAMETERS" class="id_link">#</a></h3></div></div></div><p>
   A <code class="literal">bloom</code> index accepts the following parameters in its
   <code class="literal">WITH</code> clause:
  </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">length</code></span></dt><dd><p>
      Length of each signature (index entry) in bits. It is rounded up to the
      nearest multiple of <code class="literal">16</code>. The default is
      <code class="literal">80</code> bits and the maximum is <code class="literal">4096</code>.
     </p></dd></dl></div><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">col1 — col32</code></span></dt><dd><p>
      Number of bits generated for each index column. Each parameter's name
      refers to the number of the index column that it controls. The default
      is <code class="literal">2</code> bits and the maximum is <code class="literal">4095</code>.
      Parameters for index columns not actually used are ignored.
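     </p><p>
      As a small worked example of the rounding rule for <code class="literal">length</code>, the effective signature size can be computed with integer arithmetic. The requested value of 100 is an arbitrary illustration, not a recommendation; the expression below evaluates to 112.
     </p><pre class="programlisting">
-- Round a requested length up to the next multiple of 16.
-- 100 is a hypothetical value chosen for illustration.
SELECT ((100 + 15) / 16) * 16 AS effective_length;
</pre><p>
      Values that are already multiples of 16, such as the default of 80, are unchanged by this rule.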
     </p></dd></dl></div></div><div class="sect2" id="BLOOM-EXAMPLES"><div class="titlepage"><div><div><h3 class="title">F.6.2. Examples <a href="#BLOOM-EXAMPLES" class="id_link">#</a></h3></div></div></div><p>
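   Before the statements in this section will work, the module must be installed in the current database. That step is not shown in the examples themselves; the following is a minimal sketch, assuming the contrib modules are available on the server and the current role is allowed to install extensions.
  </p><pre class="programlisting">
-- Install the bloom access method in the current database (once per database).
CREATE EXTENSION IF NOT EXISTS bloom;
</pre><p>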
   This is an example of creating a bloom index:
  </p><pre class="programlisting">
CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
       WITH (length=80, col1=2, col2=2, col3=4);
</pre><p>
   The index is created with a signature length of 80 bits, with attributes
   i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could
   have omitted the <code class="literal">length</code>, <code class="literal">col1</code>,
   and <code class="literal">col2</code> specifications since those have the default values.
  </p><p>
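   Following that observation, the same index can be written more briefly by leaving the defaults implicit. This is a sketch that relies on the defaults listed under Parameters (a length of 80 bits and 2 bits per column):
  </p><pre class="programlisting">
-- Equivalent to the definition above: only col3 differs from its default.
CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
       WITH (col3=4);
</pre><p>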
   Here is a more complete example of bloom index definition and usage, as
   well as a comparison with equivalent btree indexes. The bloom index is
   considerably smaller than the btree index, and can perform better.
  </p><pre class="programlisting">
=# CREATE TABLE tbloom AS
   SELECT
     (random() * 1000000)::int as i1,
     (random() * 1000000)::int as i2,
     (random() * 1000000)::int as i3,
     (random() * 1000000)::int as i4,
     (random() * 1000000)::int as i5,
     (random() * 1000000)::int as i6
   FROM
     generate_series(1,10000000);
SELECT 10000000
</pre><p>
   A sequential scan over this large table takes a long time:
</p><pre class="programlisting">
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..213744.00 rows=250 width=24) (actual time=357.059..357.059 rows=0.00 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 10000000
   Buffers: shared hit=63744
 Planning Time: 0.346 ms
 Execution Time: 357.076 ms
(6 rows)
</pre><p>
  </p><p>
   Even with the btree index defined, the result will still be a
   sequential scan:
</p><pre class="programlisting">
=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
 pg_size_pretty
----------------
 386 MB
(1 row)
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..213744.00 rows=2 width=24) (actual time=351.016..351.017 rows=0.00 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 10000000
   Buffers: shared hit=63744
 Planning Time: 0.138 ms
 Execution Time: 351.035 ms
(6 rows)
</pre><p>
  </p><p>
   For this type of search, a bloom index defined on the table performs
   better than the btree index:
</p><pre class="programlisting">
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
 pg_size_pretty
----------------
 153 MB
(1 row)
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=1792.00..1799.69 rows=2 width=24) (actual time=22.605..22.606 rows=0.00 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2300
   Heap Blocks: exact=2256
   Buffers: shared hit=21864
   -&gt;  Bitmap Index Scan on bloomidx  (cost=0.00..178436.00 rows=1 width=0) (actual time=20.005..20.005 rows=2300.00 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
         Index Searches: 1
         Buffers: shared hit=19608
 Planning Time: 0.099 ms
 Execution Time: 22.632 ms
(11 rows)
</pre><p>
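   The 2300 rows removed by the index recheck in the plan above are false positives. One option to experiment with is a longer signature. The following is only a sketch: the index name <code class="literal">bloomidx_wide</code> and the values <code class="literal">length=160</code> and 4 bits per column are hypothetical choices, not tuned recommendations, and whether the larger index pays off depends on the data and workload. If both bloom indexes exist, the planner chooses between them.
  </p><pre class="programlisting">
-- Hypothetical variant with a longer signature and more bits per column.
CREATE INDEX bloomidx_wide ON tbloom USING bloom (i1, i2, i3, i4, i5, i6)
       WITH (length=160, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4);
-- Compare the index size and the "Rows Removed by Index Recheck" figure
-- against the plan shown above.
SELECT pg_size_pretty(pg_relation_size('bloomidx_wide'));
EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
</pre><p>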
  </p><p>
   The main problem with the btree search is that btree is inefficient
   when the search conditions do not constrain the leading index column(s).
   A better strategy for btree is to create a separate index on each column.
   Then the planner will choose something like this:
</p><pre class="programlisting">
=# CREATE INDEX btreeidx1 ON tbloom (i1);
CREATE INDEX
=# CREATE INDEX btreeidx2 ON tbloom (i2);
CREATE INDEX
=# CREATE INDEX btreeidx3 ON tbloom (i3);
CREATE INDEX
=# CREATE INDEX btreeidx4 ON tbloom (i4);
CREATE INDEX
=# CREATE INDEX btreeidx5 ON tbloom (i5);
CREATE INDEX
=# CREATE INDEX btreeidx6 ON tbloom (i6);
CREATE INDEX
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=9.29..13.30 rows=1 width=24) (actual time=0.032..0.033 rows=0.00 loops=1)
   Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
   Buffers: shared read=6
   -&gt;  BitmapAnd  (cost=9.29..9.29 rows=1 width=0) (actual time=0.047..0.047 rows=0.00 loops=1)
         Buffers: shared hit=6
         -&gt;  Bitmap Index Scan on btreeidx5  (cost=0.00..4.52 rows=11 width=0) (actual time=0.026..0.026 rows=7.00 loops=1)
               Index Cond: (i5 = 123451)
               Index Searches: 1
               Buffers: shared hit=3
         -&gt;  Bitmap Index Scan on btreeidx2  (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8.00 loops=1)
               Index Cond: (i2 = 898732)
               Index Searches: 1
               Buffers: shared hit=3
 Planning Time: 0.264 ms
 Execution Time: 0.047 ms
(15 rows)
</pre><p>
   Although this query runs much faster than with either of the single
   multicolumn indexes, we pay a penalty in index size. Each of the
   single-column btree indexes occupies 88.5 MB, so the total space needed
   is 531 MB, over three times the space used by the bloom index.
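  </p><p>
   To check these figures on a given system, the sizes of all indexes on the table can be listed from the system catalog. This is a sketch that assumes the table and indexes created in the examples above; the sizes reported will vary with the data and the server version.
  </p><pre class="programlisting">
-- List every index on tbloom with its on-disk size, largest first.
SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size
FROM pg_index
WHERE indrelid = 'tbloom'::regclass
ORDER BY pg_relation_size(indexrelid::regclass) DESC;
</pre><p>
   With the sizes quoted above, the six single-column indexes total 6 × 88.5 MB = 531 MB, roughly 3.5 times the 153 MB bloom index.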
  </p></div><div class="sect2" id="BLOOM-OPERATOR-CLASS-INTERFACE"><div class="titlepage"><div><div><h3 class="title">F.6.3. Operator Class Interface <a href="#BLOOM-OPERATOR-CLASS-INTERFACE" class="id_link">#</a></h3></div></div></div><p>
   An operator class for bloom indexes requires only a hash function for the
   indexed data type and an equality operator for searching. This example
   shows the operator class definition for the <code class="type">text</code> data type:
  </p><pre class="programlisting">
CREATE OPERATOR CLASS text_ops
DEFAULT FOR TYPE text USING bloom AS
    OPERATOR    1   =(text, text),
    FUNCTION    1   hashtext(text);
</pre></div><div class="sect2" id="BLOOM-LIMITATIONS"><div class="titlepage"><div><div><h3 class="title">F.6.4. Limitations <a href="#BLOOM-LIMITATIONS" class="id_link">#</a></h3></div></div></div><p>
   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
      Only operator classes for <code class="type">int4</code> and <code class="type">text</code> are
      included with the module.
     </p></li><li class="listitem"><p>
      Only the <code class="literal">=</code> operator is supported for search. However,
      support for arrays with union and intersection operations could be
      added in the future.
     </p></li><li class="listitem"><p>
       The <code class="literal">bloom</code> access method doesn't support
       <code class="literal">UNIQUE</code> indexes.
     </p></li><li class="listitem"><p>
       The <code class="literal">bloom</code> access method doesn't support searching for
       <code class="literal">NULL</code> values.
     </p></li></ul></div><p>
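   The first limitation can be worked around by defining further operator classes along the lines shown under Operator Class Interface. The following is a hedged sketch for <code class="type">int8</code>: the class name <code class="literal">int8_ops</code> is chosen here for illustration and does not ship with the module, <code class="literal">hashint8</code> is the built-in hash function for <code class="type">int8</code>, and creating an operator class normally requires superuser privileges.
  </p><pre class="programlisting">
-- Illustrative operator class for int8 (bigint); not part of the module.
CREATE OPERATOR CLASS int8_ops
DEFAULT FOR TYPE int8 USING bloom AS
    OPERATOR    1   =(int8, int8),
    FUNCTION    1   hashint8(int8);
</pre><p>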
  </p></div><div class="sect2" id="BLOOM-AUTHORS"><div class="titlepage"><div><div><h3 class="title">F.6.5. Authors <a href="#BLOOM-AUTHORS" class="id_link">#</a></h3></div></div></div><p>
   Teodor Sigaev <code class="email">&lt;<a class="email" href="mailto:teodor@postgrespro.ru">teodor@postgrespro.ru</a>&gt;</code>,
   Postgres Professional, Moscow, Russia
  </p><p>
   Alexander Korotkov <code class="email">&lt;<a class="email" href="mailto:a.korotkov@postgrespro.ru">a.korotkov@postgrespro.ru</a>&gt;</code>,
   Postgres Professional, Moscow, Russia
  </p><p>
   Oleg Bartunov <code class="email">&lt;<a class="email" href="mailto:obartunov@postgrespro.ru">obartunov@postgrespro.ru</a>&gt;</code>,
   Postgres Professional, Moscow, Russia
  </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="basic-archive.html" title="F.5. basic_archive — an exemplary WAL archive module">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="btree-gin.html" title="F.7. btree_gin — GIN operator classes with B-tree behavior">Next</a></td></tr><tr><td width="40%" align="left" valign="top">F.5. basic_archive — an exemplary WAL archive module </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.3 Documentation">Home</a></td><td width="40%" align="right" valign="top"> F.7. btree_gin — GIN operator classes with B-tree behavior</td></tr></table></div></body></html>