Re: Postgres crashes from running out of memory during PGroonga full-text search (groonga-dev,04696) - Groonga - fulltext search engine.

This is Sutou.

In <96703****@yahoo*****>
  "[groonga-dev,04692] Re: Postgres crashes from running out of memory during PGroonga full-text search" on Mon, 15 Oct 2018 21:13:19 +0900,
  Kawakami <hakuh****@yahoo*****> wrote:

> By the way, would it be difficult to come up with a fundamental fix
> that keeps Postgres from shutting down when there is not enough
> memory?

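I think the fundamental fix is to keep the server from running out
of memory in the first place: add more memory, or set up a
reasonable amount of swap (though that makes things much slower).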

When memory runs out, all sorts of things stop working, so making
the server keep running properly without memory is quite hard.

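If you still want to pursue it, you need to find out exactly where
and how the shutdown happens, using the logs or a debugger, and
then consider a fix for each case. (Provided the logs and the
debugger still work properly when memory is short.)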

I have not looked at it myself, so I cannot say for sure, but since
Kawakami-san checked the logs and found nothing in them, my
feeling is that this would be quite tough.

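> However, it still reproduces in the real environment, so I dug deeper into the SQL being executed.
> It seems it still occurs when doing a UNION and a SELECT with IS NULL.
> 
> I have created an environment that reproduces it.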

Thank you.

To compute the UNION, the planner used a HashAggregate, as shown
below. This step temporarily needs a lot of memory.

----
 Aggregate  (cost=469.59..469.60 rows=1 width=8) (actual time=19202.524..19202.524 rows=1 loops=1)
   Output: count(*)
   ->  HashAggregate  (cost=244.56..344.57 rows=10001 width=56) (actual time=14653.627..18284.251 rows=10000000 loops=1)
         Output: users.id, users.name, users.kind, users.created, users.updated
         Group Key: users.id, users.name, users.kind, users.created, users.updated
         ->  Append  (cost=0.00..119.55 rows=10001 width=56) (actual time=1992.513..7662.290 rows=10000000 loops=1)
               ->  Index Scan using pgroonga_name_index on public.users  (cost=0.00..4.01 rows=10000 width=50) (actual time=1992.512..6952.978 rows=10000000 loops=1)
                     Output: users.id, users.name, users.kind, users.created, users.updated
                     Index Cond: (users.name &@~ 'ka'::text)
                     Filter: (users.kind IS NULL)
               ->  Bitmap Heap Scan on public.taikai_users  (cost=0.00..15.53 rows=1 width=56) (actual time=0.174..0.174 rows=0 loops=1)
                     Output: taikai_users.id, taikai_users.name, taikai_users.kind, taikai_users.created, taikai_users.updated
                     Recheck Cond: (taikai_users.name &@~ 'ka'::text)
                     Filter: (taikai_users.kind IS NULL)
                     ->  Bitmap Index Scan on pgroonga_taikai_name_index  (cost=0.00..0.00 rows=10 width=0) (actual time=0.172..0.172 rows=0 loops=1)
                           Index Cond: (taikai_users.name &@~ 'ka'::text)
----
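
As a stopgap for a single session, the planner can be told not to
pick HashAggregate at all. The following is only a sketch: the
query is reconstructed from the plan above (the original SQL is
not in this mail), and I have not run it against the reproduction
environment. enable_hashagg is a standard PostgreSQL planner
setting. In PostgreSQL releases before 13 a HashAggregate keeps
its whole hash table in memory, while a Sort can spill to disk.

----
BEGIN;

-- Applies only until the end of this transaction.
SET LOCAL enable_hashagg = off;

-- Query reconstructed from the plan above (an assumption).
EXPLAIN (ANALYZE, VERBOSE)
SELECT count(*)
  FROM (SELECT id, name, kind, created, updated
          FROM users
         WHERE name &@~ 'ka' AND kind IS NULL
        UNION
        SELECT id, name, kind, created, updated
          FROM taikai_users
         WHERE name &@~ 'ka' AND kind IS NULL) AS matched;

ROLLBACK;
----

If the plan then shows Unique over Sort instead of HashAggregate,
the deduplication is being done by sorting.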

If you change &@~ 'ka' to LIKE '%ka%', the plan sorts on disk and
then applies Unique, as shown below, so memory usage does not
grow.

----
 Aggregate  (cost=2221057.58..2221057.59 rows=1 width=8) (actual time=21160.870..21160.870 rows=1 loops=1)
   Output: count(*)
   ->  Unique  (cost=1946085.05..2096070.07 rows=9999001 width=56) (actual time=15833.577..20214.880 rows=10000000 loops=1)
         Output: users.id, users.name, users.kind, users.created, users.updated
         ->  Sort  (cost=1946085.05..1971082.55 rows=9999001 width=56) (actual time=15833.576..16974.075 rows=10000000 loops=1)
               Output: users.id, users.name, users.kind, users.created, users.updated
               Sort Key: users.id, users.name, users.kind, users.created, users.updated
               Sort Method: external sort  Disk: 567592kB
               ->  Append  (cost=0.00..100004.58 rows=9999001 width=56) (actual time=2240.653..8572.135 rows=10000000 loops=1)
                     ->  Index Scan using pgroonga_name_index on public.users  (cost=0.00..4.01 rows=9999000 width=50) (actual time=2240.652..7899.599 rows=10000000 loops=1)
                           Output: users.id, users.name, users.kind, users.created, users.updated
                           Index Cond: (users.name ~~ '%ka%'::text)
                           Filter: (users.kind IS NULL)
                     ->  Bitmap Heap Scan on public.taikai_users  (cost=0.00..10.56 rows=1 width=56) (actual time=0.236..0.236 rows=0 loops=1)
                           Output: taikai_users.id, taikai_users.name, taikai_users.kind, taikai_users.created, taikai_users.updated
                           Recheck Cond: (taikai_users.name ~~ '%ka%'::text)
                           Filter: (taikai_users.kind IS NULL)
                           ->  Bitmap Index Scan on pgroonga_taikai_name_index  (cost=0.00..0.00 rows=10 width=0) (actual time=0.233..0.233 rows=0 loops=1)
                                 Index Cond: (taikai_users.name ~~ '%ka%'::text)
----

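Which of these plans is chosen depends on the estimate of how many
rows each operator is likely to filter out, and the cause is that
the estimate for &@~ is not very accurate. (At least, I think that
is the cause.) In the first plan the index scan on users is
estimated at rows=10000 but actually returns 10000000 rows; with
LIKE the estimate is rows=9999000.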
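
One way to see the estimation gap directly is to run the scan on
its own and to look at which estimator functions are attached to
the operator. A rough sketch, assuming the users table from the
plans above; the query is reconstructed, not taken from the
original report. pg_operator is the standard PostgreSQL catalog.

----
-- Compare rows= in the cost part (estimate) with rows= in the
-- actual time part (what really came back).
EXPLAIN (ANALYZE, VERBOSE)
SELECT id, name, kind, created, updated
  FROM users
 WHERE name &@~ 'ka'
   AND kind IS NULL;

-- Show the restriction and join selectivity estimators currently
-- attached to each &@~ operator.
SELECT oprname,
       oprleft::regtype  AS left_type,
       oprright::regtype AS right_type,
       oprrest,
       oprjoin
  FROM pg_operator
 WHERE oprname = '&@~';
----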

If you make &@~ use the same estimation functions as LIKE, as
follows, you get the same plan as with LIKE.

ALTER OPERATOR &@~ (text, text) SET (RESTRICT = likesel, JOIN = likejoinsel);

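That said, this is still rather crude, so I think implementing
proper estimation functions for the PGroonga operators would give
better accuracy. It is not something that can be implemented
right away, though...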
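
If you want to try the ALTER OPERATOR change without keeping it,
it can be done inside a transaction, since DDL is transactional in
PostgreSQL. This is only a sketch: the UNION query is
reconstructed from the plans above, and plain EXPLAIN is used so
that the heavy query is not actually executed.

----
BEGIN;

-- Same statement as above; it is undone by the ROLLBACK below.
ALTER OPERATOR &@~ (text, text) SET (RESTRICT = likesel, JOIN = likejoinsel);

-- Check whether the planner now picks Sort + Unique.
-- Query reconstructed from the plans above (an assumption).
EXPLAIN
SELECT count(*)
  FROM (SELECT id, name, kind, created, updated
          FROM users
         WHERE name &@~ 'ka' AND kind IS NULL
        UNION
        SELECT id, name, kind, created, updated
          FROM taikai_users
         WHERE name &@~ 'ka' AND kind IS NULL) AS matched;

ROLLBACK;
----

Replacing ROLLBACK with COMMIT would keep the change.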


-- 
Kouhei Sutou <kou****@clear*****>
ClearCode Inc. <https://www.clear-code.com/>

Comprehensive support for Groonga-based full-text search systems:
  http://groonga.org/ja/support/
Development of data processing tools:
  https://www.clear-code.com/blog/2018/7/11.html
