
[QST] #384

Open · mhelal opened this issue Mar 18, 2024 · 3 comments
Labels
question Further information is requested

Comments

mhelal commented Mar 18, 2024

Hi

Following the example in: https://github.com/rapidsai-community/notebooks-contrib/blob/main/community_tutorials_and_guides/census_education2income_demo.ipynb
I have a laptop with 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz, 64 GB RAM. In addition to the Intel TigerLake-LP GT2 [Iris Xe Graphics], there is an Nvidia GPU as follows:
```
3D controller: TU117GLM [Quadro T500 Mobile]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list rom
configuration: driver=nvidia latency=0
```

When I create a cluster I get only one worker, and when I compute anything the dashboard shows only the CPU working:

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask.dataframe as dd

cluster = LocalCUDACluster(memory_limit='30GB', device_memory_limit='1.5GB',
                           local_directory='./cache', threads_per_worker=8,
                           rmm_pool_size="1.5GB", rmm_async=True,
                           rmm_log_directory='./log')
client = Client(cluster)
```
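(As a quick sanity check, a sketch: `LocalCUDACluster` starts one worker per visible NVIDIA GPU, so a single worker is expected on a one-GPU laptop; the real question is whether that worker is bound to the T500. `pynvml` is an assumption here, though it is pulled in by dask-cuda's dependencies.)

```python
# Sketch: confirm the lone worker is actually bound to the NVIDIA GPU.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
print(pynvml.nvmlDeviceGetName(handle))      # should report the Quadro T500
print(client.scheduler_info()["workers"])    # one entry per GPU worker
```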

I am trying to read the Backblaze 2022/2023 dataset: 730 CSV files totaling 62.6 GB on disk, using this code:

```python
import dask.dataframe as dd
import os

data_dir = './data/backblaze/'
# This creates a Dask dataframe with a partition for each of the 730 files.
# (dtypes is a dict of column dtypes defined earlier in the notebook.)
df = dd.read_csv(data_dir + '*.csv', dtype=dtypes)
```

With one CPU worker and 8 threads, I am hitting impossible bottlenecks on any computation; for example, value_counts takes a few hours:

```python
counts = df.model.value_counts(dropna=False).compute()
```
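For reference, the GPU-backed route I was expecting from the notebook would look roughly like this (a sketch: `dask_cudf` is assumed installed, and `data_dir`/`dtypes` are reused from above):

```python
# Sketch: dask_cudf builds cuDF partitions that live on the GPU, so
# reductions like value_counts run on the GPU worker instead of the CPU.
import dask_cudf

gdf = dask_cudf.read_csv(data_dir + '*.csv', dtype=dtypes)
counts_gpu = gdf.model.value_counts(dropna=False).compute()
```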

Computing the min and max of the columns took a whole day to finish for one column, and is still running for the others:

```python
for cur_col in col_list:
    if check_str in cur_col:
        cur_min = df[cur_col].min().compute()
        cur_max = df[cur_col].max().compute()
        # if not math.isnan(cur_min):
        if cur_min != cur_max:
            print("  {:20s}   {:15d} {:15d} ".format(cur_col, int(cur_min), int(cur_max)))
            loc_col_list.append(cur_col)
```
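(As an aside, each `.compute()` call above rescans all 730 CSVs. A single-pass sketch that batches every reduction into one Dask graph, reusing `col_list` and `check_str` from above:)

```python
import dask

# Sketch: batch all the mins/maxes into one graph so the CSVs are read once.
num_cols = [c for c in col_list if check_str in c]
mins, maxes = dask.compute(df[num_cols].min(), df[num_cols].max())
for cur_col in num_cols:
    if mins[cur_col] != maxes[cur_col]:
        print("  {:20s}   {:15d} {:15d} ".format(cur_col, int(mins[cur_col]), int(maxes[cur_col])))
```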

I need advice on how to get the GPU cores working to speed up the processing. I would also appreciate advice on the cheapest option for a home GPU cluster, along these lines and in this price range:

External PCI-E chassis to connect to my laptop (although this one does not seem suitable for NVIDIA GPUs, please advise):
https://www.amazon.co.uk/gp/product/B0BJL7HKD8/ref=ox_sc_act_image_2?smid=A3P5ROKL5A1OLE&psc=1

and GPUs such as this (or advise on the best value-for-money alternatives):
https://www.amazon.co.uk/gp/product/B0C8ZQTRD7/ref=ox_sc_act_image_1?smid=A20CAXEAQ9QIMK&psc=1

Thank you very much in advance,

Manal

mhelal added the question label Mar 18, 2024

mhelal commented Mar 24, 2024

I tried the same code on a V100 GPU on Google Colab, and it is still not using the GPU and is extremely slow. The job has now been running on my laptop since last week, and a few hours ago Google Colab stated clearly that the GPU is not being used and that I should switch to a standard runtime. Can you please advise how I can use a CUDA dataframe to read a 62 GB dataset and train RAPIDS algorithms on it?

Thank you,

raybellwaves commented
I was able to run the notebook on a T4 in colab with no issues: https://colab.research.google.com/drive/11hrLRui_Mi11mfe3V7LyrF2su-RHHGxD?usp=sharing

I would check your GPU software (`nvidia-smi`) and Python package versions (`pip freeze` / `conda list`).
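Something like this from inside the notebook works too (a sketch; the imports assume RAPIDS is installed in the same environment):

```python
# Sketch: report package versions and GPU visibility from the running kernel.
import subprocess
import cudf, dask, dask_cuda

print(cudf.__version__, dask.__version__, dask_cuda.__version__)
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```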

taureandyernv (Contributor) commented

Okay, so first, your GPU is far too small: it has only 1.5GB of usable GPU memory (probably the 2GB variant of the T500). This notebook was meant to run on a 32GB or larger GPU. In fact, we recommend a 16GB GPU to run our examples; however, we try to make accommodations for the 11GB x080s.

HOWEVER, I made this colab with a much smaller dataset a while back (just updated it with the new pip install): https://colab.research.google.com/drive/1DnzZk42PNc_Y-bItYJSvjLyWkx4-6jKE.

Other thoughts: you can also convert the dataset to parquet (a sketch of the conversion follows below) and

- use Dask-SQL to do some out-of-core processing,
- enable cuDF spilling (e.g. `cudf.set_option("spill", True)`) and make sure the resultant dataframe is small enough to fit in your T500. This will be very difficult, though, and likely to fail.
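A minimal sketch of that parquet conversion (the path and `dtypes` are assumptions carried over from the original post); parquet is columnar, so later column-wise scans read far less than the 62.6 GB of raw CSV:

```python
# Sketch: one-time CSV -> parquet conversion with plain Dask on the CPU.
# './data/backblaze/' and dtypes are assumed from the original post.
import dask.dataframe as dd

df = dd.read_csv('./data/backblaze/*.csv', dtype=dtypes)
df.to_parquet('./data/backblaze_parquet/', write_index=False)
```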
