FLock OFF is a Bittensor subnet designed to incentivize the creation of high-quality datasets for machine learning. Miners generate and upload datasets to Hugging Face, while validators assess their quality by training LoRA (Low-Rank Adaptation) models on a standardized base model (Qwen/Qwen2.5-1.5B-Instruct) and rewarding miners based on performance.
- Compute Requirements
- Installation
- How to Run FLock OFF
- What is a Dataset Competition Network?
- Features of FLock OFF
Validators perform LoRA training on miners' datasets, requiring significant GPU resources:
- Recommended GPU: NVIDIA RTX 4090 with 24GB VRAM
- Minimum GPU: NVIDIA RTX 3060 with 12GB VRAM
- Storage: ~50GB SSD
- RAM: 16GB
- CPU: 8-core Intel i7 or equivalent
Miners focus on dataset creation and uploading, requiring minimal compute:
- GPU: Not required
- Storage: ~10GB
- RAM: 8GB
- Hugging Face Account: Required with an API token for dataset uploads
To run FLock OFF, install the FLock OFF package and set up a Hugging Face account. The following instructions apply to all major operating systems.
git clone https://github.com/FLock-io/FLock-subnet.git
cd FLock-subnet
- Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Set up the Python environment and install dependencies:
# Install dependencies
uv sync
# Install the package in development mode
uv pip install -e .
- Create a Hugging Face account at huggingface.co
- Generate an API token at huggingface.co/settings/tokens
- Create a .env file in the project root:
HF_TOKEN=<YOUR_HUGGING_FACE_API_TOKEN>
- Ensure the token has write access for miners (to upload datasets) and read access for validators
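To confirm the token is picked up correctly, a minimal check (assuming the python-dotenv and huggingface_hub packages are available):
import os
from dotenv import load_dotenv
from huggingface_hub import HfApi

load_dotenv()  # reads HF_TOKEN from the .env file in the project root
api = HfApi(token=os.environ["HF_TOKEN"])
print(api.whoami()["name"])  # prints your username if the token is valid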
Before mining, prepare the following:
- Hugging Face Account and Repository:
- Create a dataset repository on Hugging Face (e.g., yourusername/my-dataset)
- Ensure your API token has write permissions
- Dataset Creation (see the helper sketch after this list):
- Generate a dataset in JSONL format (one JSON object per line)
- Each entry must follow this structure:
{
"system": "Optional system prompt (can be null)",
"conversations": [
{"role": "user", "content": "User input text"},
{"role": "assistant", "content": "Assistant response text"}
]
}
Example:
{"system": "You are a helpful assistant.", "conversations": [{"role": "user", "content": "What is AI?"}, {"role": "assistant", "content": "AI is artificial intelligence."}]}
{"system": null, "conversations": [{"role": "user", "content": "Tell me a joke."}, {"role": "assistant", "content": "Why don't skeletons fight? They don't have guts."}]}
- Bittensor Wallet: Register your hotkey on the subnet
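As flagged above, a hypothetical helper (not part of the repo) that writes entries in the required structure and sanity-checks them before upload:
import json

entries = [
    {"system": "You are a helpful assistant.",
     "conversations": [
         {"role": "user", "content": "What is AI?"},
         {"role": "assistant", "content": "AI is artificial intelligence."}]},
    {"system": None,  # "system" may be null
     "conversations": [
         {"role": "user", "content": "Tell me a joke."},
         {"role": "assistant", "content": "Why don't skeletons fight? They don't have guts."}]},
]

with open("data/data.jsonl", "w", encoding="utf-8") as f:
    for entry in entries:
        assert set(entry) == {"system", "conversations"}  # exactly these keys
        assert all(c["role"] in ("user", "assistant") for c in entry["conversations"])
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")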
Prepare Your Dataset: Use a script, scrape data, or manually curate data.jsonl. Aim for high-quality, diverse user-assistant pairs to maximize validator scores.
Run the Miner Script:
python3 neurons/miner.py \
--wallet.name your_coldkey_name \
--wallet.hotkey your_hotkey_name \
--subtensor.network finney \
--hf_repo_id yourusername/my-dataset \
--netuid netuid \
--dataset_path ./data/data.jsonl \
--logging.trace
Replace placeholders:
- your_coldkey_name: Your Bittensor wallet name
- your_hotkey_name: Your miner's hotkey
- finney: Network (use local for testing)
- yourusername/my-dataset: Your Hugging Face repo
- netuid: Subnet UID (adjust if different)
- ./data/data.jsonl: Path to your dataset
What Happens:
- The script uploads data.jsonl to your Hugging Face repo
- It retrieves a commit hash (e.g., abc123...) and constructs a ModelId (e.g., yourusername/my-dataset:ORIGINAL_COMPETITION_ID:abc123...)
- It registers this metadata on the Bittensor chain (retrying every 120 seconds if the 20-minute commit cooldown applies)
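For illustration, a sketch of the upload step via huggingface_hub (repo names are placeholders; the miner script's internals may differ):
from huggingface_hub import HfApi

api = HfApi()  # picks up HF_TOKEN from the environment
info = api.upload_file(
    path_or_fileobj="data/data.jsonl",
    path_in_repo="data.jsonl",
    repo_id="yourusername/my-dataset",  # placeholder repo
    repo_type="dataset",
)
print(info.oid)  # the commit hash that pins this dataset version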
Tips:
- Ensure your dataset is unique; validators penalize duplicates (a quick self-check follows these tips)
- Monitor logs (--logging.trace) for upload or chain errors
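A quick, hypothetical self-check for duplicate rows before uploading (not the validators' actual dedup logic):
import json
from collections import Counter

with open("data/data.jsonl", encoding="utf-8") as f:
    counts = Counter(json.dumps(json.loads(line), sort_keys=True) for line in f)

dupes = sum(n - 1 for n in counts.values() if n > 1)
print(f"{dupes} duplicate rows" if dupes else "no duplicate rows")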
- Hardware: NVIDIA RTX 4090 (24GB VRAM) recommended
- Bittensor Wallet: Register your hotkey on the subnet
- Hugging Face Token: Read access for downloading datasets
Ensure GPU Setup:
- Install CUDA (e.g., 12.1) and cuDNN compatible with PyTorch
- Verify with nvidia-smi and torch.cuda.is_available()
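For example, from Python:
import torch

assert torch.cuda.is_available(), "CUDA not visible to PyTorch"
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 4090"
print(torch.version.cuda)             # CUDA version PyTorch was built against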
Run the Validator Script:
python3 neurons/validator.py \
--wallet.name your_coldkey_name \
--netuid netuid \
--logging.trace
Replace placeholders:
- your_coldkey_name: Your Bittensor wallet name
- netuid: Subnet UID
What Happens:
- Syncs the metagraph to identify active miners
- Selects up to 32 miners per epoch using an EvalQueue
- For each miner:
- Retrieves metadata (e.g., ModelId) from the chain
- Downloads the dataset from Hugging Face (e.g., yourusername/my-dataset:abc123...)
- Downloads a fixed evaluation dataset (eval_data/data.jsonl)
- Trains a LoRA model on the miner's dataset using Qwen/Qwen2.5-1.5B-Instruct
- Evaluates loss on eval_data
- Computes win rates, adjusts weights, and submits them to the chain
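As a toy illustration of the win-rate step (the subnet's actual scoring code may differ), lower evaluation loss wins each pairwise comparison:
import numpy as np

losses = {"miner_a": 1.92, "miner_b": 2.10, "miner_c": 1.88}  # made-up losses
vals = np.array(list(losses.values()))

wins = (vals[:, None] < vals[None, :]).sum(axis=1)  # pairwise wins per miner
win_rate = wins / (len(vals) - 1)
weights = win_rate / win_rate.sum()  # normalized into chain weights
for uid, w in zip(losses, weights):
    print(uid, round(float(w), 3))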
Training Details:
- Model: Qwen/Qwen2.5-1.5B-Instruct
- LoRA Config: Rank=16, Alpha=32, Dropout=0.1, targeting all linear layers
- Training Args: Batch size=2, gradient accumulation=4, 2 epochs, 4096-token context
- Data Capacity: With 24GB VRAM, ~10,000-20,000 rows (assuming ~256 tokens/row) per dataset, though limited by epoch duration and miner sample size (32)
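These parameters map onto a peft/transformers configuration roughly as follows (a sketch; the subnet's own trainer may name things differently):
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                         # rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules="all-linear",  # target all linear layers
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="lora_out",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
)
# sequences are truncated/packed to the 4096-token context at tokenization time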
Tips:
- Ensure ample storage for datasets and model checkpoints
- Use --logging.trace to debug training or chain issues
FLock OFF is a decentralized subnet where miners compete to create high-quality datasets, and validators evaluate them using LoRA training. Rewards (in TAO) are distributed based on dataset performance, not raw compute power.
Task: Create and upload datasets that improve model performance (e.g., low evaluation loss).
Process:
- Curate a dataset (e.g., conversational pairs in JSONL)
- Upload to Hugging Face with version control
- Register metadata on-chain (~0.01 TAO fee)
Goal: Outperform other miners in validator evaluations.
Task: Assess dataset quality and set miner rewards.
Process:
- Fetch miner metadata from the chain
- Download datasets from Hugging Face
- Train LoRA on Qwen/Qwen2.5-1.5B-Instruct with each dataset
- Evaluate loss on a standard test set
- Compute win rates and update weights on-chain
Goal: Fairly reward miners based on dataset utility.
- Storage: Miners use Hugging Face repos for datasets (e.g., username/repo:commit)
- Versioning: Git-based commits ensure reproducibility
- Accessibility: Validators download datasets via the Hugging Face API
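A sketch (via huggingface_hub, with placeholder names) of fetching a dataset pinned to a specific commit, as validators do:
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="yourusername/my-dataset",  # placeholder repo
    filename="data.jsonl",
    repo_type="dataset",
    revision="abc123...",  # commit hash from the miner's on-chain metadata
)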
- Efficiency: LoRA adapts Qwen/Qwen2.5-1.5B-Instruct with minimal parameters
- Fairness: Fixed training config ensures consistent evaluation
- Capacity: Validators can process ~10,000-20,000 rows per dataset on a 4090, depending on token length and epoch timing
- Metrics: Evaluation loss determines dataset quality, with duplicates penalized