存在を知らなかったchunkById()

以前にEloquentの出力のチャンキングを紹介しました。処理中のメモリーの使用量を減らすためのチャンキングですが、実行速度が遅いということでそれを改善する対策もこしらえました。３年前以上のことです。しかし、そのためのメソッドがすでにEloquentのメソッドに存在することに最近気づきました。

まず、以前のポストのリンクを再掲載（読んでね）。
チャンキングで使用メモリを抑える
 チャンキングは遅い

後者の「チャンキングは遅い」での対策は、対象のテーブルのプライマリIDを指定してクエリーを実行して、チャンキングの速度の改善を図りました。ここで再掲載します。

...
        $max = User::max('id');
        $unit = 1000;

        for ($i = 0; $i < intval(ceil($max/$unit)); $i++) {
            $from = $i * $unit + 1;
            $to = ($i + 1) * $unit;

            $rows = User::where('id', '>=', $from)
                ->where('id', '<=', $to)
                ->get();

            foreach ($rows as $row) {
                $values = [
                    $row->id,
                    $row->created_at,
                    $row->name,
                    $row->email
                ];
 
                fputcsv($fh, $values);
           }
       }
...

単にループしているだけで、毎回毎回これをコードするのはちょっと面倒ですね。

ところが、最近、新人のプログラマーが書いたコードをレビューしていたら、

...
User::chunkById(1000, function($rows) use($fh) {
    foreach ($rows as $row) {
        $values = [
            $row->id,
            $row->created_at,
            $row->name,
            $row->email
        ];
 
        fputcsv($fh, $values);
    }
});
...

chunkById()というメソッドを使いすっきりしたコードになっているではないですか。

そんなメソッドあったかなあと思いつつ、tinkerを使って実行されたsql文を見てみると、

>>> User::chunkbyId(1000, function($rows) {})
=> true

>>> sql()
=> [
     [
       "query" => "select * from `users` order by `id` asc limit 1000",
       "bindings" => [],
       "time" => 3.86,
     ],
     [
       "query" => "select * from `users` where `id` > ? order by `id` asc limit 1000",
       "bindings" => [
         1000,
       ],
       "time" => 2.03,
     ],
     [
       "query" => "select * from `users` where `id` > ? order by `id` asc limit 1000",
       "bindings" => [
         2000,
       ],
       "time" => 1.85,
     ],
...

私の解決策では上限と下限のidを指定してレコードを取得しているところ、chunkId()では、idで並べ替えして、開始のidを指定してそこからチャンクの数（ここでは、1000）を指定しています。なるほど。実行速度も測定したところ私の解決策と同様にchunk()に比べて確実に速い。

さて、この便利もののchunkById()ですが、ひとつ注意が必要です。

例えば、以下のように２つのテーブルをjoinしてchunkById()を適用するとき、

>>> Shop::join('product', 'shop.shop_id', '=', 'product.shop_id')
　　->chunkbyId(1000, function($row) { /* nothing */ });

Illuminate\Database\QueryException with message 'SQLSTATE[23000]: Integrity constraint violation: 1052 Column 'shop_id' in where clause is ambiguous (SQL: select `shop`.`shop_id`, `product`.`product_id` from `shop` inner join `product` on `shop`.`shop_id` = `product`.`shop_id` where `shop_id` > 1913 order by `shop_id` asc limit 1000)'

shop_idが曖昧というエラーになってしまいます。

使用されている、sql文を見ると、

>>> sql()
=> [
     [
       "query" => "select `shop`.`shop_id`, `product`.`product_id` from `shop` inner join `product` on `shop`.`shop_id` = `product`.`shop_id` order by `shop_id` asc limit 1000",
       "bindings" => [],
       "time" => 5.26,
     ],
   ]

order By shop_idのshop_idがshopのテーブルの項目か、productのテーブルの項目かわらかないのです。

これを解決するには、

>>> Shop::join('product', 'shop.shop_id', '=', 'product.shop_id')
    ->chunkbyId(1000, function($row) { /* nothing */ }, 'shop.shop_id', 'shop_id');

chunkById()の第３番目の引数に、明確に使用する項目名(shop.shop_id)を、第４番目にそれに取り替わる元の項目名(shop_id)を指定する必要があります。

実行するsql文は以下のようになり、エラーがなくなります。

>>> sql()
=> [
     [
       "query" => "select * from `shop` inner join `product` on `shop`.`shop_id` = `product`.`shop_id` order by `shop`.`shop_id` asc limit 1000",
       "bindings" => [],
       "time" => 27.81,
     ],
     [
       "query" => "select * from `shop` inner join `product` on `shop`.`shop_id` = `product`.`shop_id` where `shop`.`shop_id` > ? order by `shop`.`shop_id` asc limit 1000",
       "bindings" => [
         1913,
       ],
       "time" => 22.15,
     ],
...

メルマガ購読の申し込みはこちらから。

存在を知らなかったchunkById()

Bykhino

By khino

Related Post

開発のタスクを整理するためのツール

Pest4のE2Eテスト – セットアップ〜テストコードの実装例

第２のデータベースにアクセス