Data preprocessing #95

Open
amira-yahlali opened this issue Mar 2, 2023 · 7 comments

Comments

@amira-yahlali

I'm trying to clean my data and do some preprocessing, but I don't have much understanding of the columns, in particular whether the zeros in them are normal values or missing values. I'm using the cic-collection dataset on Kaggle. If any expert could help, I'd be very thankful.

@algopy

algopy commented Mar 2, 2023 via email

@amira-yahlali
Author

OK, what's your objective?




I just need an understanding of what the columns represent, and whether a null value in each column is a normal value or a missing value. I'm trying to preprocess my data and minimize it.

@algopy

algopy commented Mar 2, 2023 via email

@amira-yahlali
Author

My data is the cic-ids-collection dataset on Kaggle. I'm using the class label as the target, dropping label, and the rest are features. I'd love to send you my notebook directly to make it easier for you.

@AnmolArora15

Hi,
Is this issue still open?
I am looking forward to working on it.
Thanks,
Anmol Arora

@HeerakKashyap

I'm trying to clean my data and do some preprocessing, but I don't have much understanding of the columns, in particular whether the zeros in them are normal values or missing values. I'm using the cic-collection dataset on Kaggle. If any expert could help, I'd be very thankful.

See brother, if you want to remove the columns where all the values are null/missing, you can use data.drop(columns=[' ', ' '], inplace=True), filling in the names of those columns, to remove them.
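A minimal sketch of that step, assuming a pandas DataFrame named data. The toy columns below are made up for illustration and are not taken from the CIC dataset. It finds the all-null columns first instead of typing their names by hand:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (column names are hypothetical).
data = pd.DataFrame({
    "flow_duration": [10, 0, 25],
    "empty_a": [np.nan, np.nan, np.nan],
    "fwd_packets": [1, 2, np.nan],
})

# Boolean Series indexed by column name: True where every value is missing.
all_null_mask = data.isna().all()
all_null_cols = all_null_mask[all_null_mask].index.tolist()

# Drop those columns by name; data.dropna(axis=1, how="all") does the same.
cleaned = data.drop(columns=all_null_cols)

print(all_null_cols)
print(cleaned.columns.tolist())
```

Note that a zero is not treated as missing here: only NaN/None counts, so a column full of zeros would be kept.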

If you want to check the number of non-null values in each column, you can use data.info() to get a precise understanding of the data.
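As a rough sketch of that check (again on a made-up toy frame, not the real dataset), data.info() can be paired with a per-column count of missing values and of zeros, since the two are different things to pandas:

```python
import numpy as np
import pandas as pd

# Toy frame; the column names are hypothetical.
data = pd.DataFrame({
    "flow_duration": [10, 0, 25, 0],
    "fwd_packets": [1.0, 2.0, np.nan, 0.0],
})

# Prints the dtype and non-null count of every column.
data.info()

# Count missing values and zeros separately for each column.
summary = pd.DataFrame({
    "missing": data.isna().sum(),
    "zeros": (data == 0).sum(),
})
print(summary)
```

Whether a zero is a legitimate measurement or a placeholder for "no value" still depends on what the column means, so this only shows where to look.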

In order to check for outliers in the data, you can use the seaborn library and its pairplot function, i.e. seaborn.pairplot, to get a grid of plots in which the outliers show up.
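A pair plot shows outliers visually. As a numeric complement (this is the common 1.5 × IQR rule, swapped in here for illustration, not the pairplot itself), a sketch on a made-up toy frame could look like this:

```python
import pandas as pd

# Toy frame; the column names and values are hypothetical.
data = pd.DataFrame({
    "flow_duration": [10, 12, 11, 13, 12, 500],
    "fwd_packets": [1, 2, 3, 4, 5, 6],
})

# Interquartile range per column.
q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1

# Flag values more than 1.5 * IQR outside the quartiles.
outlier_mask = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
outlier_counts = outlier_mask.sum()
print(outlier_counts)
```

The 1.5 multiplier is a convention, not a rule specific to this dataset, and heavily skewed columns may flag many legitimate values.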

Regards

@amira-yahlali
Author

amira-yahlali commented Aug 13, 2024 via email
