Direct inference on pandas dataframe #98

Open
rbhatia46 opened this issue May 9, 2022 · 2 comments

Comments

@rbhatia46

Hi,
I see that to run a new inference every time, I have to save a separate CSV and then load it by passing its path to dm.data.process_unlabeled.

Is there a way to pass a pandas DataFrame directly to this function and run inference without creating a new CSV?

@rbhatia46

@sidharthms, could you please assist with this?

@etiennekintzler

etiennekintzler commented Jul 7, 2022

Hey @rbhatia46

It's possible to handle a pandas.DataFrame by modifying MatchingDataset.__init__ and process_unlabeled (I've tried this on a fork of the project). @sidharthms, should I open a PR, or is that out of scope?

To make it work without changing the source code you could also use a temporary file:

import os
import tempfile

import pandas as pd
import deepmatcher as dm

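def run_prediction(df, model, **kwargs):
    # process_unlabeled expects a file path, so dump the DataFrame to a temporary CSV first.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.write(df.to_csv(None, index=False))
        unlabeled = dm.data.process_unlabeled(path=path, trained_model=model)
        # Return from inside try: a return in finally would swallow any exception
        # raised above and then fail with an unbound `predictions`.
        return model.run_prediction(unlabeled, **kwargs)
    finally:
        # Always delete the temporary file, whether or not prediction succeeded.
        os.remove(path)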

Then call it like this:

model = dm.MatchingModel()
model.load_state('path/to/model.pth')
df = pd.DataFrame({
    "id": [0], "left_name": ["surname"], "right_name": ["name surname"]
})

run_prediction(df, model, output_attributes=True)
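
If you need the temporary CSV for more than one call, the same idea can be wrapped in a context manager. This is only a sketch: dataframe_as_csv is a hypothetical helper name, not part of deepmatcher, and it assumes nothing about the object beyond a pandas-style to_csv(path, index=False) method.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def dataframe_as_csv(df):
    """Yield the path of a temporary CSV holding df, and delete it on exit.

    Hypothetical helper (not part of deepmatcher); df only needs a
    pandas-style to_csv(path, index=False) method.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)  # to_csv reopens the file by path, so release the raw descriptor
    try:
        df.to_csv(path, index=False)
        yield path
    finally:
        # Runs on normal exit and on exceptions raised inside the with-block.
        os.remove(path)


# Usage with the calls from the snippet above (model is an already loaded
# dm.MatchingModel, df is a pandas.DataFrame):
#
# with dataframe_as_csv(df) as path:
#     unlabeled = dm.data.process_unlabeled(path=path, trained_model=model)
#     predictions = model.run_prediction(unlabeled, output_attributes=True)
```

The file is removed as soon as the with-block ends, so anything that reads it (here process_unlabeled) has to run inside the block.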
