TransTab: A flexible transferable tabular learning framework [arxiv]
Documentation is available at
Paper is available at
5 min blog to understand TransTab at!
[05/04/23] Check the version
[01/04/23] Check the version
[12/03/22] Check out our [blog] for a quick understanding of TransTab!
Added support for encoding tabular inputs directly into embeddings. An example is provided here. Several bugs were fixed.
Table embedding.
Add support for directly processing tables with missing values.
Add regression support.
This repository provides the Python package transtab for flexible tabular prediction models. Basic usage of transtab takes only a couple of lines!
import transtab
# load dataset by specifying dataset name
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data('credit-g')
# build classifier
model = transtab.build_classifier(cat_cols, num_cols, bin_cols)
# keyword arguments forwarded to transtab.train (illustrative values, adjust as needed)
training_arguments = {'num_epoch': 5, 'output_dir': './ckpt'}
# start training
transtab.train(model, trainset, valset, **training_arguments)
# make predictions; df_x is a pd.DataFrame with shape (n, d)
# returns predictions ypred with shape (n, 1) for binary classification,
# or (n, n_class) for multiclass classification
ypred = transtab.predict(model, df_x)
It's easy, isn't it?
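The shape convention noted in the comments above means that class labels are derived differently for binary and multiclass outputs. The following is a minimal sketch using only NumPy, assuming ypred holds probability-like scores. The helper name `to_labels` and the 0.5 threshold are illustrative assumptions and are not part of transtab; the hard-coded arrays stand in for the output of transtab.predict.

```python
import numpy as np

def to_labels(ypred, threshold=0.5):
    """Convert predicted scores into integer class labels (hypothetical helper)."""
    ypred = np.asarray(ypred)
    # Binary case: a flat vector or a single column of scores.
    if ypred.ndim == 1 or ypred.shape[1] == 1:
        return (ypred.reshape(-1) >= threshold).astype(int)
    # Multiclass case: one column per class, pick the highest score.
    return ypred.argmax(axis=1)

# Stand-ins for transtab.predict output (made-up numbers).
binary_scores = np.array([[0.2], [0.9], [0.5]])
multi_scores = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])

binary_labels = to_labels(binary_scores)
multi_labels = to_labels(multi_scores)
print(binary_labels, multi_labels)
```

The threshold is a modeling choice; for imbalanced data a value other than 0.5 may be more appropriate.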
First, install the appropriate PyTorch version following the guide on
Then install from GitHub (as of Feb 2025, the PyPI version is no longer maintained):
pip install git+
Please refer to the documentation for more guidance on installation and troubleshooting.
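After installing, a quick way to confirm that both packages are visible to the current Python environment is to look them up without importing them. This is a small sketch using only the standard library; the helper name `check_installed` is hypothetical, and `torch` and `transtab` are the import names of PyTorch and this package.

```python
import importlib.util

def check_installed(packages=("torch", "transtab")):
    """Return a dict mapping each package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

status = check_installed()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" entry usually means the package was installed into a different environment than the one running the script.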
A novel feature of transtab is its ability to learn from multiple distinct tables. Training a pretrained model on a new table can be triggered as follows:
# load the pretrained transtab model
model = transtab.build_classifier(checkpoint='./ckpt')
# load a new tabular dataset
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data('credit-approval')
# update categorical/numerical/binary column map of the loaded model
model.update({'cat': cat_cols, 'num': num_cols, 'bin': bin_cols})
# then we just trigger the training on the new data
transtab.train(model, trainset, valset, **training_arguments)
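When moving a model from one table to another, it can be useful to see which columns the two tables share and which are new. The sketch below is plain Python and does not call transtab; the helper `compare_columns` is hypothetical, and the column names are made up for illustration rather than taken from the credit-g or credit-approval datasets.

```python
def compare_columns(old_cols, new_cols):
    """Report columns shared by two tables and columns unique to each."""
    old, new = set(old_cols), set(new_cols)
    return {
        "shared": sorted(old & new),
        "only_old": sorted(old - new),
        "only_new": sorted(new - old),
    }

# Hypothetical column lists for a pretraining table and a new table.
old_table_cols = ["age", "credit_amount", "housing"]
new_table_cols = ["age", "income", "housing", "employed"]

overlap = compare_columns(old_table_cols, new_table_cols)
print(overlap)
```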
We can also conduct contrastive pretraining on multiple distinct tables as follows:
# load from multiple tabular datasets
dataname_list = ['credit-g', 'credit-approval']
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data(dataname_list)
# build contrastive learner, set supervised=True for supervised VPCL
model, collate_fn = transtab.build_contrastive_learner(
    cat_cols, num_cols, bin_cols, supervised=True)
# start contrastive pretraining
transtab.train(model, trainset, valset, collate_fn=collate_fn, **training_arguments)
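To give a feel for the idea behind VPCL, here is a conceptual sketch, assuming the abbreviation refers to the vertical-partition contrastive learning described in the TransTab paper: each row is cut column-wise into several partial views. This is not transtab's implementation; the function names, the `num_partition` argument as used here, and the example columns and values are all illustrative assumptions.

```python
import numpy as np

def vertical_partitions(columns, num_partition=2):
    """Split a list of column names into contiguous, near-equal groups."""
    columns = list(columns)
    index_chunks = np.array_split(np.arange(len(columns)), num_partition)
    return [[columns[i] for i in chunk] for chunk in index_chunks]

def row_views(row, partitions):
    """Build one partial view of a row (a dict) per column group."""
    return [{col: row[col] for col in part} for part in partitions]

# Hypothetical columns and values, for illustration only.
cols = ["age", "income", "housing", "employed", "credit_amount"]
row = {"age": 35, "income": 52000, "housing": "own",
       "employed": 1, "credit_amount": 3000}

parts = vertical_partitions(cols, num_partition=2)
views = row_views(row, parts)
print(parts)
print(views)
```

In a contrastive setup, views cut from the same row would typically be treated as positives of one another, so the encoder learns representations that agree across different subsets of columns.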
If you find this package useful, please consider citing the following paper:
@inproceedings{wang2022transtab,
  title={TransTab: Learning Transferable Tabular Transformers Across Tables},
  author={Wang, Zifeng and Sun, Jimeng},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}