Modin version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
import modin.pandas as pd
import os

inputfilepath = "top-domains-1m-in.csv"
os.environ["RAY_memory_usage_threshold"] = '0.9'

# Combine all conditions
df = pd.read_csv(inputfilepath, encoding="ISO-8859-1")
Issue Description
My file is almost 2 GB. When I try to set os.environ["RAY_memory_usage_threshold"] = 0.9, it says a float is not supported.
After some filtering, dumping the result with to_csv gives me a memory error.
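The float problem comes from Python itself rather than from Modin or Ray: os.environ only accepts string values. A minimal sketch of the failure and the string form used in the reproducible example (the variable name error_message is just for illustration; as far as I understand, the variable also has to be set before Ray starts in order to be picked up):

```python
import os

# os.environ only stores strings, so assigning a float raises TypeError.
error_message = None
try:
    os.environ["RAY_memory_usage_threshold"] = 0.9
except TypeError as exc:
    error_message = str(exc)

# Passing the same number as a string is accepted.
os.environ["RAY_memory_usage_threshold"] = str(0.9)

print(error_message)
print(os.environ["RAY_memory_usage_threshold"])
```

This only shows how to pass the value. It does not by itself show that the threshold prevents the MemoryError reported here.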
Expected Behavior
It should use at most 90% of my laptop's memory.
Error Logs
2024-08-07 13:07:41,906 INFO worker.py:1781 -- Started a local Ray instance.
UserWarning: `read_*` implementation has mismatches with pandas:
Data types of partitions are different! Please refer to the troubleshooting section of the Modin documentation to fix this issue.
UserWarning: <function Series.tolist> is not currently supported by PandasOnRay, defaulting to pandas implementation.
Please refer to https://modin.readthedocs.io/en/stable/supported_apis/defaulting_to_pandas.html for explanation.
(_remote_exec_multi_chain pid=20536)
(_remote_exec_multi_chain pid=20536) Traceback (most recent call last):
(_remote_exec_multi_chain pid=20536) File "d:\Download\audio-visual\a_ideas\.venv\Lib\site-packages\ray\_private\serialization.py", line 423, in deserialize_objects
(_remote_exec_multi_chain pid=20536) obj = self._deserialize_object(data, metadata, object_ref)
(_remote_exec_multi_chain pid=20536) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(_remote_exec_multi_chain pid=20536) File "d:\Download\audio-visual\a_ideas\.venv\Lib\site-packages\ray\_private\serialization.py", line 280, in _deserialize_object
(_remote_exec_multi_chain pid=20536) return self._deserialize_msgpack_data(data, metadata_fields)
(_remote_exec_multi_chain pid=20536) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(_remote_exec_multi_chain pid=20536) File "d:\Download\audio-visual\a_ideas\.venv\Lib\site-packages\ray\_private\serialization.py", line 235, in _deserialize_msgpack_data
(_remote_exec_multi_chain pid=20536) python_objects = self._deserialize_pickle5_data(pickle5_data)
(_remote_exec_multi_chain pid=20536) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(_remote_exec_multi_chain pid=20536) File "d:\Download\audio-visual\a_ideas\.venv\Lib\site-packages\ray\_private\serialization.py", line 225, in _deserialize_pickle5_data
(_remote_exec_multi_chain pid=20536) obj = pickle.loads(in_band)
(_remote_exec_multi_chain pid=20536) ^^^^^^^^^^^^^^^^^^^^^
(_remote_exec_multi_chain pid=20536) MemoryError
(_remote_exec_multi_chain pid=6340)
(_remote_exec_multi_chain pid=6340) obj = pickle.loads(in_band, buffers=buffers)
(_remote_exec_multi_chain pid=6340) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
---------------------------------------------------------------------------
RayTaskError(RaySystemError) Traceback (most recent call last)
Cell In[1], line 41
     33 # filtered_df = df[df['indexdate'] != 'unk']
     34 # filtered_df = df[df['indexdate'].str.contains('month', case=False, na=False)]
     35 # filtered_df = df[df['indexdate'].str.contains('1 year', case=False, na=False)]
(...)
     38 # filtered_df = df[df['indexdate'].str.contains('2 years', case=False, na=False)]
     39 # filtered_df = df[df['domain'].str.contains('ai', case=False, na=False)]
     40 filtered_df = df[df['Intheirownwords'].str.contains(' ai ', case=False, na=False)]
---> 41 filtered_df.to_csv('domain-ai-in-title.csv')
     43 filtered_df = filtered_df[filtered_df['domain'].isin(rankdomains)]
     44 filtered_df.to_csv('top-4m-domain-ai-in-title.csv')
File d:\Download\audio-visual\a_ideas\.venv\Lib\site-packages\modin\logging\logger_decorator.py:144, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    129 """
    130 Compute function with logging if Modin logging is enabled.
    131
(...)
    141 Any
    142 """
    143 if LogMode.get() == "disable":
--> 144 return obj(*args, **kwargs)
    146 logger = get_logger()
    147 logger.log(log_level, start_line)
...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "d:\Download\audio-visual\a_ideas\.venv\Lib\site-packages\ray\_private\serialization.py", line 225, in _deserialize_pickle5_data
obj = pickle.loads(in_band)
^^^^^^^^^^^^^^^^^^^^^
MemoryError
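For comparison, the filter on line 40 and the to_csv call on line 41 of the cell can be sketched with plain pandas reading the file in chunks, so only one chunk is held in memory at a time. This is a possible workaround under my own assumptions, not a fix for the Modin/Ray behavior: the helper name filter_csv_in_chunks, the chunk size, and the tiny sample data are made up for illustration. Only the column name, the ' ai ' pattern, and the encoding come from the code above.

```python
import os
import tempfile

import pandas as pd  # plain pandas here, not modin.pandas


def filter_csv_in_chunks(src, dst, column, pattern,
                         chunksize=100_000, encoding="ISO-8859-1"):
    """Stream src chunk by chunk and append rows whose column contains pattern to dst."""
    rows_written = 0
    first_chunk = True
    reader = pd.read_csv(src, encoding=encoding, chunksize=chunksize,
                         dtype={column: str})
    for chunk in reader:
        # Same filter as in the notebook cell: case-insensitive, missing values excluded.
        mask = chunk[column].str.contains(pattern, case=False, na=False)
        kept = chunk[mask]
        # Write the header once, then append the following chunks without it.
        kept.to_csv(dst, mode="w" if first_chunk else "a",
                    header=first_chunk, index=False)
        first_chunk = False
        rows_written += len(kept)
    return rows_written


# Tiny made-up sample standing in for the real 2 GB file.
sample = (
    "domain,Intheirownwords\n"
    "a.com,We build ai tools\n"
    "b.com,Mail service\n"
    "c.com,Open AI for all\n"
    "d.com,\n"
)

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.csv")
    dst_path = os.path.join(tmp, "out.csv")
    with open(src_path, "w", encoding="ISO-8859-1", newline="") as fh:
        fh.write(sample)
    written = filter_csv_in_chunks(src_path, dst_path,
                                   "Intheirownwords", " ai ", chunksize=2)
    result_domains = pd.read_csv(dst_path)["domain"].tolist()

print(written, result_domains)
```

The chunk size trades memory for speed. 100_000 rows is an arbitrary starting point and would need tuning for the real file.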
Installed Versions
INSTALLED VERSIONS
commit : c8bbca8
python : 3.12.3.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Chinese (Simplified)_China.936
Modin dependencies
modin : 0.31.0
ray : 2.34.0
dask : 2024.7.1
distributed : None
pandas dependencies
...
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None