unexpected data value raises an error for IOC #91

Closed
carolakaiser opened this issue Jun 27, 2023 · 3 comments · Fixed by #92

Comments

@carolakaiser
Contributor

Hi All,
I have been trying to get the metadata for the IOC stations and am currently hitting an error when setting the activity_threshold. As of today (June 27), at least one record in the IOC stations has the string 'NA' in the required field instead of a "date".

The relevant snippet in stations.py currently does this:
ioc_gdf = ioc_gdf.assign(
    delay=pd.concat(
        (
            ioc_gdf.delay[ioc_gdf.delay.str.endswith("'")].str[:-1].astype(int),
            ioc_gdf.delay[ioc_gdf.delay.str.endswith("h")].str[:-1].astype(int) * 60,
            ioc_gdf.delay[ioc_gdf.delay.str.endswith("d")].str[:-1].astype(int) * 24 * 60,
        )
    )
)

I made the following workaround:
def as_int(x):
    # Return -1 for values that cannot be parsed as an integer (e.g. "NA")
    try:
        return int(x)
    except (TypeError, ValueError):
        return -1

# Normalize IOC
# Convert delay to minutes
ioc_gdf = ioc_gdf.assign(
    delay=pd.concat(
        (
            ioc_gdf.delay[ioc_gdf.delay.str.endswith("'")].str[:-1].apply(as_int),
            ioc_gdf.delay[ioc_gdf.delay.str.endswith("h")].str[:-1].apply(as_int) * 60,
            ioc_gdf.delay[ioc_gdf.delay.str.endswith("d")].str[:-1].apply(as_int) * 24 * 60,
        )
    )        
)

This works, but you may have a cleaner way to write it. In any case, I wanted to bring this to your attention.

Thanks!

@pmav99
Member

pmav99 commented Jun 27, 2023

Hi Carola, thanks for bringing this up.

It seems that IOC started to publish data from some extra stations from the USA which have the string "NA'" in the delay column and this breaks our parser.

It shouldn't be difficult to address (I guess we should just drop these stations), but we will need to make a new release for this.
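To make the failure mode concrete, here is a minimal reproduction sketch. The sample values are made up for illustration (real values come from `ioc.get_ioc_stations()`); the point is that "NA'" also ends with `'`, so it is picked up by the minutes branch and then fails the integer conversion.

```python
import pandas as pd

# Hypothetical sample of the IOC "delay" column (illustrative values only)
delay = pd.Series(["5'", "2h", "1d", "NA'"], dtype=object)

# "NA'" ends with "'", so the minutes branch selects it alongside "5'"
minutes = delay[delay.str.endswith("'")].str[:-1]
print(minutes.tolist())

error = None
try:
    # int("NA") is not valid, so the cast raises
    minutes.astype(int)
except ValueError as exc:
    error = exc
    print(f"conversion failed: {exc}")
```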

@pmav99
Member

pmav99 commented Jun 27, 2023

Something like this seems to fix it:

diff --git a/searvey/stations.py b/searvey/stations.py
index faba4f1..4378e76 100644
--- a/searvey/stations.py
+++ b/searvey/stations.py
@@ -51,6 +51,8 @@ def _get_ioc_stations(
     ioc_gdf = ioc.get_ioc_stations(region=region)
 
     # Normalize IOC
+    # Drop delay `NA'` : https://github.com/oceanmodeling/searvey/issues/91
+    ioc_gdf = ioc_gdf[ioc_gdf.delay != "NA'"]
     # Convert delay to minutes
     ioc_gdf = ioc_gdf.assign(
         delay=pd.concat(

But I will have a look at it in the morning with a clear head.
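As a rough illustration of what the diff above does, the following sketch applies the same filter to a toy DataFrame and then converts the remaining delays to minutes. The `delay_to_minutes` helper, the `ioc_code` column name, and the sample rows are hypothetical and only stand in for the real `ioc_gdf`; this is not the code that was merged.

```python
import pandas as pd


def delay_to_minutes(delay: pd.Series) -> pd.Series:
    """Convert delay strings such as 5', 2h, 1d to minutes (hypothetical helper)."""
    factors = {"'": 1, "h": 60, "d": 24 * 60}
    parts = [
        delay[delay.str.endswith(suffix)].str[:-1].astype(int) * factor
        for suffix, factor in factors.items()
    ]
    return pd.concat(parts)


# Toy stand-in for ioc_gdf; the values are illustrative only
df = pd.DataFrame(
    {
        "ioc_code": ["aaaa", "bbbb", "cccc", "dddd"],
        "delay": ["5'", "2h", "1d", "NA'"],
    }
)

# Drop the rows whose delay is "NA'" before parsing, as in the diff
df = df[df.delay != "NA'"]
df = df.assign(delay=delay_to_minutes(df.delay))
print(df)
```

Dropping the rows removes those stations from the result entirely, whereas the `as_int` workaround earlier in the thread keeps them with a negative sentinel delay; which behaviour is preferable depends on how downstream code uses the delay column.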

@carolakaiser
Contributor Author

Thank you! Yes, this looks much cleaner; I was not sure about my approach. Thanks for addressing this issue.
