Convert ListDataset to Multivariate dataset with gluonts.dataset.multivariate_grouper #3057
ArianKhorasani started this conversation in General
Dear @lostella et al., I would appreciate your help!
I have my own dataset with 7 vital signs (the time-varying variables) and 4 static features (Age, Gender, HospAdmTime, and patient_id). I want to feed this data into a multivariate time series model. First, I load the data into a ListDataset and then use MultivariateGrouper to convert it into a multivariate dataset. In my dataset, the different time series are the vital signs of each patient. Each ListDataset entry contains: 1) TARGET, the vital signs of one patient for one time window, flattened into a single array; 2) START, the start time of that window; and 3) FEAT_STATIC_CAT, the static features of the patient (Age and Gender in the code). Here is the code that I have:

    import pandas as pd
    from gluonts.dataset.common import ListDataset
    from gluonts.dataset.field_names import FieldName
    from gluonts.dataset.multivariate_grouper import MultivariateGrouper

    df = pd.read_csv('merged_test.csv')
    # ICULOS is given in hours; convert it to timestamps
    df['ICULOS'] = pd.to_datetime(df['ICULOS'], unit='h')

    prediction_length = 1
    context_length = 5
    vital_signs = ['DBP', 'SBP', 'Resp', 'Temp', 'HR', 'O2Sat', 'MAP']
    static_features = ['Age', 'Gender', 'HospAdmTime', 'patient_id']
    time_window = 24  # number of rows per window

    data = []
    for patient_id, group in df.groupby('patient_id'):
        # One entry per window of up to `time_window` rows for this patient
        for i in range(0, len(group), time_window):
            # Flatten the window (rows x vital signs) into a single 1-D target
            target = group.iloc[i:i + time_window][vital_signs].values.flatten()
            start = pd.Timestamp(group['ICULOS'].values[i])
            entry = {
                FieldName.TARGET: target,
                FieldName.START: start,
                # Static features are constant per patient, so take the first row
                FieldName.FEAT_STATIC_CAT: group[['Age', 'Gender']].values[0],
            }
            data.append(entry)

    dataset = ListDataset(data, freq='1H')
    grouper = MultivariateGrouper(max_target_dim=len(vital_signs))
    dataset_multivariate = grouper(dataset)

My problem is that, although this code runs without errors, len(dataset_multivariate) is 1, so I cannot split it into train_ds and test_ds. I split it as follows:

    split_ratio = 0.8
    split_index = int(split_ratio * len(dataset_multivariate))
    train_ds = dataset_multivariate[:split_index]
    test_ds = dataset_multivariate[split_index:]

However, print(train_ds) outputs [], because len(dataset_multivariate) is 1.
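To make the empty result concrete, here is a minimal sketch in plain Python of the split arithmetic alone. It assumes the grouped dataset behaves like a list with a single entry; the dictionary contents are placeholder values, not real data.

```python
# Stand-in for the grouped dataset: a plain list holding a single entry.
# (Assumption: the grouper output behaves like a list of length 1.)
dataset_multivariate = [{"target": [[1.0, 2.0, 3.0]], "start": "1970-01-01 00:00"}]

split_ratio = 0.8
# int() truncates toward zero, so int(0.8 * 1) evaluates to 0
split_index = int(split_ratio * len(dataset_multivariate))

train_ds = dataset_multivariate[:split_index]  # slice [:0] is always empty
test_ds = dataset_multivariate[split_index:]   # slice [0:] keeps the one entry

print(split_index, train_ds, len(test_ds))
```

With a single entry, any split_ratio below 1.0 truncates to index 0, so the entry always ends up in test_ds and train_ds stays empty.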
I would appreciate any guidance on how to proceed. I want to feed my original data into a multivariate time series model to forecast the next hour of vital signs for all 155 patients and to capture the relationships between the different variables. Thank you!
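For reference, one possible direction is to split along the time axis instead of across entries: the training copy of a series drops its last prediction_length steps, and the test copy keeps the full series. The sketch below only illustrates that idea in plain Python. The helper names to_variable_rows and split_last_steps are hypothetical, the plain dictionaries stand in for dataset entries, the toy numbers are made up, and nothing here is part of the GluonTS API.

```python
def to_variable_rows(window_rows):
    """Turn time-major rows (one list per time step) into one list per variable."""
    return [list(column) for column in zip(*window_rows)]


def split_last_steps(entry, prediction_length):
    """Return (train_entry, test_entry).

    The train entry drops the last `prediction_length` steps of every
    variable; the test entry is the original entry with the full history.
    """
    if prediction_length < 1:
        raise ValueError("prediction_length must be at least 1")
    train_target = [row[:-prediction_length] for row in entry["target"]]
    train_entry = {**entry, "target": train_target}
    return train_entry, entry


# Toy data: 4 hourly steps of 2 vital signs (HR, Resp) for one patient.
window_rows = [[80, 16], [82, 17], [81, 18], [85, 19]]
entry = {"target": to_variable_rows(window_rows), "start": "1970-01-01 00:00"}

train_entry, test_entry = split_last_steps(entry, prediction_length=1)
print(train_entry["target"])  # [[80, 82, 81], [16, 17, 18]]
print(test_entry["target"])   # [[80, 82, 81, 85], [16, 17, 18, 19]]
```

Keeping one row per variable, rather than flattening a window into one interleaved array, leaves each vital sign as its own series, and the last hour of every series remains available as the forecast target.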