❌ Error log: Chapter 9, Ray, build_index.py
root ➜ /workspaces/generative_ai_with_langchain (second_edition) $ cd chapter9/ray
root ➜ /workspaces/generative_ai_with_langchain/chapter9/ray (second_edition) $ python build_index.py
2025-06-19 00:26:35,100 INFO worker.py:1841 -- Started a local Ray instance.
modules.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 2.21MB/s]
config_sentence_transformers.json: 100%|███████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 1.11MB/s]
README.md: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 10.4k/10.4k [00:00<00:00, 2.99MB/s]
sentence_bert_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 422kB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 571/571 [00:00<00:00, 1.51MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 438M/438M [00:20<00:00, 21.2MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 363/363 [00:00<00:00, 3.87MB/s]
vocab.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.19MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 4.32MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 239/239 [00:00<00:00, 2.62MB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 190/190 [00:00<00:00, 2.55MB/s]
Loading documentation from https://docs.ray.io/en/master/...
🟢 Loaded 651 documents
(preprocess_documents pid=1709) Preprocessing batch of 50 documents
Waiting for preprocessing to complete...
(preprocess_documents pid=1715) 🟢 Generated 22154 chunks
Total chunks: 293040
Starting parallel embedding...
(embed_chunks pid=1715) Embedding batch of 73260 chunks...
(preprocess_documents pid=1718) Preprocessing batch of 50 documents [repeated 13x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(preprocess_documents pid=1718) 🟢 Generated 22576 chunks [repeated 13x across cluster]
(embed_chunks pid=1714) Embedding batch of 73260 chunks... [repeated 4x across cluster]
(raylet) [2025-06-19 00:33:35,069 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(raylet)
(raylet) Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
(embed_chunks pid=1713) Embedding batch of 73260 chunks...
(raylet) [2025-06-19 00:34:35,072 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(embed_chunks pid=1719) Embedding batch of 73260 chunks...
(raylet) [2025-06-19 00:35:35,077 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(embed_chunks pid=1716) Embedding batch of 73260 chunks...
(raylet) [2025-06-19 00:36:35,080 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(embed_chunks pid=1717) Embedding batch of 73260 chunks...
(raylet) [2025-06-19 00:37:35,449 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(raylet) [2025-06-19 00:38:35,452 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(embed_chunks pid=1710) Embedding batch of 73260 chunks...
(raylet) [2025-06-19 00:39:35,456 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(embed_chunks pid=1709) Embedding batch of 73260 chunks...
(raylet) [2025-06-19 00:41:35,461 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(embed_chunks pid=1712) Embedding batch of 73260 chunks...
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: 505ea7fab31c4ad40be23f71134674441be943f701000000 Worker ID: 2352259547fae1dccd550c8b0ecb2f7cfb05c3e2f0d3a9a8fd13d5cf Node ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c Worker IP address: 172.17.0.2 Worker port: 36261 Worker PID: 1711 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(embed_chunks pid=7350) Embedding batch of 73260 chunks...
(raylet) [2025-06-19 00:42:35,773 E 1639 1639] (raylet) node_manager.cc:3219: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 3fbcaf6ee2eac7d5950c5cb6daea4714e0980be53d9d49a51e5d444c, IP: 172.17.0.2) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 172.17.0.2`
(embed_chunks pid=7351) Embedding batch of 73260 chunks...
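The raylet messages above repeat because each `embed_chunks` task is handed 73,260 chunks at once (293,040 chunks split across only 4 tasks), which is enough to exhaust worker memory. A minimal sketch of the usual mitigation, assuming the script builds its batches in plain Python before submitting Ray tasks (the helper name `make_batches` is hypothetical, not from the book's code):

```python
def make_batches(items, batch_size):
    """Split a list into fixed-size batches so each Ray task holds far less data."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 293,040 chunks in batches of 2,000 instead of 4 batches of 73,260 each
chunks = list(range(293040))  # stand-in for the real document chunks
batches = make_batches(chunks, 2000)
print(len(batches), len(batches[-1]))  # 147 batches, the last holds 1,040
```

Each batch would then be submitted as `embed_chunks.remote(batch)`; as the raylet message itself suggests, decorating the task with more CPUs per task (e.g. `@ray.remote(num_cpus=2)`) further caps how many run concurrently.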
20250618_generative_ai_with_langchain_chapter9_ray
❌ Error log: Chapter 9, Ray, serve_index.py
root ➜ /workspaces/generative_ai_with_langchain/chapter9/ray (second_edition) $ python serve_index.py
2025-06-21 00:41:27,389 INFO worker.py:1841 -- Started a local Ray instance.
(ProxyActor pid=89507) INFO 2025-06-21 00:41:30,068 proxy 172.17.0.2 -- Proxy starting on node 0c835eca420ac10a7afc25d2949783a1cd811b0736039d889c71cf92 (HTTP port: 8000).
INFO 2025-06-21 00:41:30,144 serve 89265 -- Started Serve in namespace "serve".
(ProxyActor pid=89507) INFO 2025-06-21 00:41:30,123 proxy 172.17.0.2 -- Got updated endpoints: {}.
(ServeController pid=89499) INFO 2025-06-21 00:41:30,218 controller 89499 -- Deploying new version of Deployment(name='SearchDeployment', app='default') (initial target replicas: 1).
(ProxyActor pid=89507) INFO 2025-06-21 00:41:30,220 proxy 172.17.0.2 -- Got updated endpoints: {Deployment(name='SearchDeployment', app='default'): EndpointInfo(route='/', app_is_cross_language=False)}.
(ProxyActor pid=89507) INFO 2025-06-21 00:41:30,228 proxy 172.17.0.2 -- Started <ray.serve._private.router.SharedRouterLongPollClient object at 0x79926061b150>.
(ServeController pid=89499) INFO 2025-06-21 00:41:30,321 controller 89499 -- Adding 1 replica to Deployment(name='SearchDeployment', app='default').
(ServeReplica:default:SearchDeployment pid=89505) Loading pre-built index...
(ServeReplica:default:SearchDeployment pid=89505) free(): double free detected in tcache 2
(ServeReplica:default:SearchDeployment pid=89505) *** SIGABRT received at time=1750466497 on cpu 9 ***
(ServeReplica:default:SearchDeployment pid=89505) PC: @ 0x7c24b27dad61 (unknown) raise
(ServeReplica:default:SearchDeployment pid=89505) @ 0x7c24b2ad8140 2288 (unknown)
(ServeReplica:default:SearchDeployment pid=89505) @ 0x7c24b282375a (unknown) (unknown)
(ServeReplica:default:SearchDeployment pid=89505) @ ... and at least 1 more frames
(ServeReplica:default:SearchDeployment pid=89505) [2025-06-21 00:41:37,691 E 89505 90195] logging.cc:484: *** SIGABRT received at time=1750466497 on cpu 9 ***
(ServeReplica:default:SearchDeployment pid=89505) [2025-06-21 00:41:37,691 E 89505 90195] logging.cc:484: PC: @ 0x7c24b27dad61 (unknown) raise
(ServeReplica:default:SearchDeployment pid=89505) [2025-06-21 00:41:37,691 E 89505 90195] logging.cc:484: @ 0x7c24b2ad8140 2288 (unknown)
(ServeReplica:default:SearchDeployment pid=89505) [2025-06-21 00:41:37,691 E 89505 90195] logging.cc:484: @ 0x7c24b282375a (unknown) (unknown)
(ServeReplica:default:SearchDeployment pid=89505) [2025-06-21 00:41:37,691 E 89505 90195] logging.cc:484: @ ... and at least 1 more frames
(ServeReplica:default:SearchDeployment pid=89505) Fatal Python error: Aborted
(ServeReplica:default:SearchDeployment pid=89505)
(ServeReplica:default:SearchDeployment pid=89505) Stack (most recent call first):
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/torch/cuda/__init__.py", line 372 in _lazy_init
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1341 in convert
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 942 in _apply
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915 in _apply
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915 in _apply
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915 in _apply
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915 in _apply
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1355 in to
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 348 in __init__
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/langchain_huggingface/embeddings/huggingface.py", line 59 in __init__
(ServeReplica:default:SearchDeployment pid=89505) File "/workspaces/generative_ai_with_langchain/chapter9/ray/serve_index.py", line 29 in __init__
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/replica.py", line 1324 in _call_func_or_gen
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/replica.py", line 1363 in initialize_callable
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/replica.py", line 1164 in _run_user_code_event_loop
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/threading.py", line 975 in run
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
(ServeReplica:default:SearchDeployment pid=89505) File "/opt/conda/lib/python3.11/threading.py", line 995 in _bootstrap
(ServeReplica:default:SearchDeployment pid=89505)
(ServeReplica:default:SearchDeployment pid=89505) Extension modules: psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, yaml._yaml, _cffi_backend, uvloop.loop, ray._raylet, grpc._cython.cygrpc, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, bottleneck.move, bottleneck.nonreduce, bottleneck.nonreduce_axis, bottleneck.reduce, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, pyarrow._json, zstandard.backend_c, tornado.speedups, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, 
torch._C._special, markupsafe._speedups, PIL._imaging, sklearn.__check_build._check_build, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, 
scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, PIL._imagingft, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, xxhash._xxhash, pyarrow._acero, pyarrow._csv, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet (total: 214)
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffdcf66e6bb63d5ac000715f1b01000000 Worker ID: 20b3c3acfe2f7198ce7c4d051066b355fd63c5ba8441bc78de9ec4c4 Node ID: 0c835eca420ac10a7afc25d2949783a1cd811b0736039d889c71cf92 Worker IP address: 172.17.0.2 Worker port: 38511 Worker PID: 89505 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(ServeController pid=89499) ERROR 2025-06-21 00:41:44,345 controller 89499 -- Exception in Replica(id='cjuecmjq', deployment='SearchDeployment', app='default'), the replica will be stopped.
(ServeController pid=89499) Traceback (most recent call last):
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/deployment_state.py", line 694, in check_ready
(ServeController pid=89499) ) = ray.get(self._ready_obj_ref)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
(ServeController pid=89499) return fn(*args, **kwargs)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=89499) return func(*args, **kwargs)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/worker.py", line 2771, in get
(ServeController pid=89499) values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/worker.py", line 921, in get_objects
(ServeController pid=89499) raise value
(ServeController pid=89499) ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
(ServeController pid=89499) class_name: ServeReplica:default:SearchDeployment
(ServeController pid=89499) actor_id: dcf66e6bb63d5ac000715f1b01000000
(ServeController pid=89499) pid: 89505
(ServeController pid=89499) name: SERVE_REPLICA::default#SearchDeployment#cjuecmjq
(ServeController pid=89499) namespace: serve
(ServeController pid=89499) ip: 172.17.0.2
(ServeController pid=89499) The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(ServeController pid=89499) INFO 2025-06-21 00:41:44,348 controller 89499 -- Replica(id='cjuecmjq', deployment='SearchDeployment', app='default') is stopped.
(ServeController pid=89499) INFO 2025-06-21 00:41:44,349 controller 89499 -- Adding 1 replica to Deployment(name='SearchDeployment', app='default').
(ServeReplica:default:SearchDeployment pid=89503) Loading pre-built index...
(ServeReplica:default:SearchDeployment pid=89503) free(): double free detected in tcache 2
(ServeReplica:default:SearchDeployment pid=89503) *** SIGABRT received at time=1750466511 on cpu 10 ***
(ServeReplica:default:SearchDeployment pid=89503) PC: @ 0x7397574ebd61 (unknown) raise
(ServeReplica:default:SearchDeployment pid=89503) @ 0x7397577e9140 2288 (unknown)
(ServeReplica:default:SearchDeployment pid=89503) @ 0x73975753475a (unknown) (unknown)
(ServeReplica:default:SearchDeployment pid=89503) @ ... and at least 1 more frames
(ServeReplica:default:SearchDeployment pid=89503) [Stack trace and extension-module list identical to the first replica's crash above.]
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffb9eb67153430771b0fc86b2001000000 Worker ID: 8ed15156ea6b31fd72fb3e172fd31f18a091833af3f025dc024fcacc Node ID: 0c835eca420ac10a7afc25d2949783a1cd811b0736039d889c71cf92 Worker IP address: 172.17.0.2 Worker port: 33647 Worker PID: 89503 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(ServeController pid=89499) ERROR 2025-06-21 00:41:57,753 controller 89499 -- Exception in Replica(id='43d74xer', deployment='SearchDeployment', app='default'), the replica will be stopped.
(ServeController pid=89499) Traceback (most recent call last):
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/deployment_state.py", line 694, in check_ready
(ServeController pid=89499) ) = ray.get(self._ready_obj_ref)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
(ServeController pid=89499) return fn(*args, **kwargs)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=89499) return func(*args, **kwargs)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/worker.py", line 2771, in get
(ServeController pid=89499) values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/worker.py", line 921, in get_objects
(ServeController pid=89499) raise value
(ServeController pid=89499) ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
(ServeController pid=89499) class_name: ServeReplica:default:SearchDeployment
(ServeController pid=89499) actor_id: b9eb67153430771b0fc86b2001000000
(ServeController pid=89499) pid: 89503
(ServeController pid=89499) name: SERVE_REPLICA::default#SearchDeployment#43d74xer
(ServeController pid=89499) namespace: serve
(ServeController pid=89499) ip: 172.17.0.2
(ServeController pid=89499) The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(ServeController pid=89499) INFO 2025-06-21 00:41:57,755 controller 89499 -- Replica(id='43d74xer', deployment='SearchDeployment', app='default') is stopped.
(ServeController pid=89499) INFO 2025-06-21 00:41:57,755 controller 89499 -- Adding 1 replica to Deployment(name='SearchDeployment', app='default').
(ServeReplica:default:SearchDeployment pid=89508) Loading pre-built index...
(ServeReplica:default:SearchDeployment pid=89508) free(): double free detected in tcache 2
(ServeReplica:default:SearchDeployment pid=89508) *** SIGABRT received at time=1750466524 on cpu 7 ***
(ServeReplica:default:SearchDeployment pid=89508) PC: @ 0x726b97da6d61 (unknown) raise
(ServeReplica:default:SearchDeployment pid=89508) @ 0x726b980a4140 2288 (unknown)
(ServeReplica:default:SearchDeployment pid=89508) @ 0x726b97def75a (unknown) (unknown)
(ServeReplica:default:SearchDeployment pid=89508) @ ... and at least 1 more frames
(ServeReplica:default:SearchDeployment pid=89508) [2025-06-21 00:42:04,860 E 89508 90718] logging.cc:484: *** SIGABRT received at time=1750466524 on cpu 7 ***
(ServeReplica:default:SearchDeployment pid=89508) [2025-06-21 00:42:04,860 E 89508 90718] logging.cc:484: PC: @ 0x726b97da6d61 (unknown) raise
(ServeReplica:default:SearchDeployment pid=89508) [2025-06-21 00:42:04,860 E 89508 90718] logging.cc:484: @ 0x726b980a4140 2288 (unknown)
(ServeReplica:default:SearchDeployment pid=89508) [2025-06-21 00:42:04,860 E 89508 90718] logging.cc:484: @ 0x726b97def75a (unknown) (unknown)
(ServeReplica:default:SearchDeployment pid=89508) [2025-06-21 00:42:04,860 E 89508 90718] logging.cc:484: @ ... and at least 1 more frames
(ServeReplica:default:SearchDeployment pid=89508) Fatal Python error: Aborted
(ServeReplica:default:SearchDeployment pid=89508)
(ServeReplica:default:SearchDeployment pid=89508) Stack (most recent call first):
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/torch/cuda/__init__.py", line 372 in _lazy_init
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1341 in convert
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 942 in _apply
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915 in _apply
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915 in _apply
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915 in _apply
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 915 in _apply
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1355 in to
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 348 in __init__
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/langchain_huggingface/embeddings/huggingface.py", line 59 in __init__
(ServeReplica:default:SearchDeployment pid=89508) File "/workspaces/generative_ai_with_langchain/chapter9/ray/serve_index.py", line 29 in __init__
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/replica.py", line 1324 in _call_func_or_gen
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/replica.py", line 1363 in initialize_callable
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/replica.py", line 1164 in _run_user_code_event_loop
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/threading.py", line 975 in run
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
(ServeReplica:default:SearchDeployment pid=89508) File "/opt/conda/lib/python3.11/threading.py", line 995 in _bootstrap
(ServeReplica:default:SearchDeployment pid=89508)
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: fffffffffffffffffadd736ef6c43240e7e7143501000000 Worker ID: 98a712272c5ad615aff3c7d95adeb1b1f06badc015154a8a9b11cbe1 Node ID: 0c835eca420ac10a7afc25d2949783a1cd811b0736039d889c71cf92 Worker IP address: 172.17.0.2 Worker port: 34585 Worker PID: 89508 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(ServeController pid=89499) ERROR 2025-06-21 00:42:12,758 controller 89499 -- Exception in Replica(id='iwkwx8to', deployment='SearchDeployment', app='default'), the replica will be stopped.
(ServeController pid=89499) Traceback (most recent call last):
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/deployment_state.py", line 694, in check_ready
(ServeController pid=89499) ) = ray.get(self._ready_obj_ref)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
(ServeController pid=89499) return fn(*args, **kwargs)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=89499) return func(*args, **kwargs)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/worker.py", line 2771, in get
(ServeController pid=89499) values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(ServeController pid=89499) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=89499) File "/opt/conda/lib/python3.11/site-packages/ray/_private/worker.py", line 921, in get_objects
(ServeController pid=89499) raise value
(ServeController pid=89499) ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
(ServeController pid=89499) class_name: ServeReplica:default:SearchDeployment
(ServeController pid=89499) actor_id: fadd736ef6c43240e7e7143501000000
(ServeController pid=89499) pid: 89508
(ServeController pid=89499) name: SERVE_REPLICA::default#SearchDeployment#iwkwx8to
(ServeController pid=89499) namespace: serve
(ServeController pid=89499) ip: 172.17.0.2
(ServeController pid=89499) The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(ServeController pid=89499) INFO 2025-06-21 00:42:12,761 controller 89499 -- Replica(id='iwkwx8to', deployment='SearchDeployment', app='default') is stopped.
(ServeController pid=89499) ERROR 2025-06-21 00:42:12,761 controller 89499 -- Failed to update the deployments ['SearchDeployment'].
⚠️ ERROR: Failed to start service: Deploying application default failed: Failed to update the deployments ['SearchDeployment'].
If this is related to the FAISS index, please rebuild it with:
$ python build_index.py
Traceback (most recent call last):
File "/workspaces/generative_ai_with_langchain/chapter9/ray/serve_index.py", line 124, in <module>
serve.run(deployment)
File "/opt/conda/lib/python3.11/site-packages/ray/serve/api.py", line 533, in run
handle = _run(
^^^^^
File "/opt/conda/lib/python3.11/site-packages/ray/serve/api.py", line 484, in _run
handle = client.deploy_application(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/client.py", line 51, in check
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/client.py", line 307, in deploy_application
self._wait_for_application_running(built_app.name)
File "/opt/conda/lib/python3.11/site-packages/ray/serve/_private/client.py", line 234, in _wait_for_application_running
raise RuntimeError(
RuntimeError: Deploying application default failed: Failed to update the deployments ['SearchDeployment'].
root ➜ /workspaces/generative_ai_with_langchain/chapter9/ray (second_edition) $
ChatGPT output
Thanks for sharing your code — it's cleanly structured and well-commented, and you’ve already taken care to limit memory use (e.g., small embedding model, batching, GPU flag, checkpointing). The "(raylet) WARNING: 6 PYTHON worker processes..." message you're seeing is expected in this context and not necessarily problematic. Here's why:

🔍 Why You're Seeing the Warning
You're launching parallel tasks with @ray.remote for:

- preprocess_documents (with num_cpus=0.25)
- embed_chunks (with num_gpus=1)

Ray is spawning multiple workers (at least 6) to handle your ray.get() calls across batches, and that's exactly what it's designed to do for parallelism. But if Ray detects some workers are blocked in ray.get(), it logs this warning.
🧠 What It Actually Means

Ray is giving you a heads-up that some of its worker processes are sitting blocked (in ray.get()) rather than doing useful work. But since you’re explicitly batching and then calling ray.get() one batch at a time, this is a known and acceptable tradeoff for memory-constrained, batch-parallel embedding workflows.

✅ What You're Doing Right
- num_gpus=1 (prevents multiple workers from OOM).
- PREPROCESS_BATCH_SIZE and EMBED_BATCH_SIZE to chunk tasks.
- .remote() instead of nesting ray.get() within workers.
- gc.collect().

🔧 Optional Improvements
If the warnings bother you or you want to reduce memory/workers even more:

- Use ray.wait() with a timeout instead of ray.get() in loops.
- Use max_concurrency (if your model or code is thread-safe).
- Disable the warning explicitly if it's known and safe in your context.
- Use ray.util.ActorPool instead of stateless remote functions if you want stateful batching with reuse, which can be memory-friendlier.

✅ TL;DR

Your setup is working as intended — this Ray warning is just noise in your case unless you're seeing real issues like tasks stalling or every worker sitting blocked in ray.get(). You can safely ignore the warning or suppress it if desired. Let me know if you'd like help refactoring to async, actor pools, or ray.wait() for finer control.