Summarization Parameters not working #453
Labels
question
Further information is requested
Comments
Hi there 👋 Can you provide the code which you tried? That'll make debugging a lot easier :) |
Hey, thx for the awesome work and support!
"use strict";
import {pipeline, env} from './transformers.js';
//import {pipeline, env} from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';
import Grammarify from '../grammarify.js';
//can't use cache in worker
//import { CustomCache } from "./cache.js";
// Define caching parameters
env.useBrowserCache = false;
//env.useCustomCache = true;
env.useCustomCache = false;
//env.customCache = new CustomCache('transformers-cache');
// Skip initial check for local models, since we are not loading any local models.
env.allowLocalModels = false;
// Due to a bug in onnxruntime-web, we must disable multithreading for now.
// See microsoft/onnxruntime#14445 for more information.
env.backends.onnx.wasm.numThreads = 1;
var startTime, endTime;
class PipelineSingleton {
    //static task = 'text-classification';
    //static model = 'Xenova/distilbert-base-uncased-finetuned-sst-2-english';
    static task = 'summarization';
    //unusably slow, took 2min to do eiffel startup test
    //static model = 'Xenova/distilbart-cnn-6-6';
    //static model = 'Xenova/distilbart-cnn-12-6'; //can't load
    //can't load config
    //static model = 'Xenova/distilbart-cnn-6-6-optimised';
    //static model = 'Xenova/distilbart-xsum-12-1';
    //fastest so far by far
    static model = 'Xenova/t5-small';
    static instance = null;
    static async getInstance(progress_callback = null) {
        if (this.instance === null) {
            this.instance = pipeline(this.task, this.model, { progress_callback });
        }
        return this.instance;
    }
}
// Create generic summarize function, which will be reused for the different types of events.
const summarize = async (text, config={max_target_length:300, min_new_tokens:100, max_new_tokens:200} ) => {
    // Get the pipeline instance. This will load and build the model when run for the first time.
    let model = await PipelineSingleton.getInstance((data) => {
        // You can track the progress of the pipeline creation here.
        // e.g., you can send `data` back to the UI to indicate a progress bar
        console.log('progress', data)
    });
    // Actually run the model on the input text
    let result = await model(text, config );
    return result;
};
|
Hello, I'm dead in the water on this. Is there something else I can submit so I can get some help please? |
Hi there 👋 Sorry for the delay. It looks like a typo:
import { pipeline } from "@xenova/transformers";
// Create a summarization pipeline
const summarizer = await pipeline('summarization', 'Xenova/t5-small');
// Text to summarize
const text = "Data science is an interdisciplinary field[10] focused on extracting knowledge from typically large data sets and applying the knowledge and insights from that data to solve problems in a wide range of application domains.[11] The field encompasses preparing data for analysis, formulating data science problems, analyzing data, developing data-driven solutions, and presenting findings to inform high-level decisions in a broad range of application domains. As such, it incorporates skills from computer science, statistics, information science, mathematics, data visualization, information visualization, data sonification, data integration, graphic design, complex systems, communication and business.[12][13] Statistician Nathan Yau, drawing on Ben Fry, also links data science to human–computer interaction: users should be able to intuitively control and explore data.[14][15] In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.[16]";
// Generate summary
const output = await summarizer(text, { min_length: 32, max_length: 128 });
console.log(output);
// "data science is an interdisciplinary field focused on extracting knowledge from typically large data sets. it encompasses preparing data for analysis, formulating data science problems, analyzing data, developing data-driven solutions. it also combines skills from computer science, statistics, information science, mathematics, data visualization, information visualization, data sonification, data integration, graphic design, complex systems, communication and business."
in contrast to using no parameters:
const output = await summarizer(text);
console.log(output);
// [{ summary_text: 'data science is an interdisciplinary field focused on extracting knowledge from typically large data sets ' }] |
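Applied to the worker code from the question, the suggestion above might look roughly like the following. This is only a minimal sketch: it assumes `min_length` and `max_length` are the option names the summarization pipeline honours, as in the example above, and it reuses that example's values (32 and 128) as defaults. The helper name `makeSummarize` and the injected `loadPipeline` function are hypothetical, introduced so the wrapper can be exercised without downloading a model.

```javascript
// Hypothetical helper (not part of the library): wraps a lazily created
// summarizer and forwards only the length options from the example above.
// `loadPipeline` is any async function that resolves to a summarizer, e.g.
//   () => pipeline('summarization', 'Xenova/t5-small')
function makeSummarize(loadPipeline) {
    let instance = null; // cached promise, so the model is loaded only once

    return async function summarize(text, { min_length = 32, max_length = 128 } = {}) {
        if (min_length > max_length) {
            throw new RangeError('min_length must not exceed max_length');
        }
        if (instance === null) {
            instance = loadPipeline();
        }
        const summarizer = await instance;
        // Assumption: these are the option names that take effect.
        return summarizer(text, { min_length, max_length });
    };
}
```

In the worker this would presumably replace the `summarize` function, for example `const summarize = makeSummarize(() => pipeline('summarization', 'Xenova/t5-small'));` followed by `await summarize(text, { min_length: 32, max_length: 128 })`.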
Question
I've tried several of the supported summarization models with the code used in the browser extension example.
The only one I get any results from in a reasonable time is t5-small.
My problem with it is that, whatever parameters I pass in, the result is always the same length.
I've traced through the code and it appears that the config params get passed in.
I've tried max_new_tokens, min_new_tokens, and max_length, with no joy.
I initially pinned version 2.5.3 and last tried just letting the CDN pick the version, which looks like 2.10.x; no joy, same result.
Could someone please provide an example of running a summarization task, in my case with the t5-small model, that uses parameters to control the output length?