Getting same words or names most of the time #122
Comments
One option is to add a cache (a set) of previously generated values and retry on collision. The other option, which is much more performant, is to use a linear congruential generator to select from the dictionary. These generators are pseudorandom but can guarantee there are no repeated values within the dictionary size. This seems like the better option, if possible to implement. This could use a different interface, since an RNG is not needed for getting items, only for selecting the initial seed. There is some discussion and linked issues for Python's FactoryBoy in FactoryBoy/factory_boy#305; I think their initial implementation uses a set.
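The linear congruential idea can be sketched without any crate dependencies. This is a hypothetical illustration, not fake's API: a power-of-two modulus makes the Hull-Dobell full-period conditions easy to satisfy (multiplier ≡ 1 mod 4, odd increment; the constants below are the classic Numerical Recipes ones), and states that fall outside the dictionary range are simply skipped, so every index in `0..dict_len` is visited exactly once before the sequence repeats.

```rust
/// Visit every index in 0..dict_len exactly once, in a seed-dependent
/// pseudorandom order, using a full-period LCG over a power-of-two modulus.
fn unique_indices(dict_len: u64, seed: u64) -> Vec<u64> {
    assert!(dict_len > 0);
    // Power-of-two modulus >= dict_len; Hull-Dobell then only needs
    // (a - 1) divisible by 4 and c odd for a full period of m.
    let m = dict_len.next_power_of_two() as u128;
    let a = 1664525u128 % m; // 1664525 ≡ 1 (mod 4)
    let c = 1013904223u128 % m; // odd
    let mut state = (seed as u128) % m;
    let mut out = Vec::with_capacity(dict_len as usize);
    while (out.len() as u64) < dict_len {
        state = (a * state + c) % m;
        // Skip states that fall outside the dictionary ("cycle walking").
        if (state as u64) < dict_len {
            out.push(state as u64);
        }
    }
    out
}

fn main() {
    // e.g. pick 52 dictionary entries with no repeats
    println!("{:?}", unique_indices(52, 7));
}
```

Each call is O(m) in the worst case, but since m < 2 × dict_len, at most half the steps are skipped on average.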
I'm also doing some DB testing that needs certain fields to be unique. I guess I'll have to generate & load the data in Python for now (since it has a library that can do performant, unique data generation) and benchmark querying in Rust (for sub-ms precision). If there's maintainer interest in a particular design for this, I could look into a PR, but for now the path of least resistance lies elsewhere.
Figure it's worth an ask: @cksac, do you have any ideas here?
I think it would be better to have a custom faker for the field that needs to be unique, like below:

```rust
use fake::{Dummy, Fake};
use once_cell::sync::Lazy;
use std::{collections::HashSet, sync::Mutex};

// Process-wide cache of ids handed out so far.
static ORDER_ID_CACHE: Lazy<Mutex<HashSet<usize>>> = Lazy::new(|| Mutex::new(HashSet::new()));

pub struct OrderIdFaker<U>(pub U);

impl<U> Dummy<OrderIdFaker<U>> for usize
where
    usize: Dummy<U>,
{
    fn dummy_with_rng<R: rand::prelude::Rng + ?Sized>(
        config: &OrderIdFaker<U>,
        rng: &mut R,
    ) -> Self {
        let faker = &config.0;
        let mut id = faker.fake_with_rng(rng);
        let mut cache = ORDER_ID_CACHE.lock().unwrap();
        // Retry until we draw an id we haven't produced before.
        while cache.contains(&id) {
            id = faker.fake_with_rng(rng);
        }
        cache.insert(id);
        id
    }
}

#[derive(Debug, Dummy)]
pub struct Order {
    #[dummy(faker = "OrderIdFaker(0..1000)")]
    id: usize,
}

fn main() {
    let orders = fake::vec![Order; 1..10];
    println!("{:?}", orders);
}
```
A generic version could look like:

```rust
pub struct Unique<T>(T);

impl<U, T> Dummy<Unique<T>> for U
where
    U: Dummy<T>,
{
    // ...
}
```
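For illustration, here is a minimal standalone sketch of what that generic wrapper could do. The `Gen` trait below is a stand-in for fake's `Dummy`/`Fake` machinery (assumed names, not the crate's real API); the point is only the retry-until-unseen pattern and its caveat.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Stand-in trait for "something that produces values" (not fake's API).
trait Gen<T> {
    fn generate(&mut self) -> T;
}

// Wraps any generator and filters out values it has already produced.
struct Unique<G, T> {
    inner: G,
    seen: HashSet<T>,
}

impl<G, T> Unique<G, T>
where
    G: Gen<T>,
    T: Eq + Hash,
{
    fn new(inner: G) -> Self {
        Unique { inner, seen: HashSet::new() }
    }
}

impl<G, T> Gen<T> for Unique<G, T>
where
    G: Gen<T>,
    T: Eq + Hash + Clone,
{
    fn generate(&mut self) -> T {
        // Retry until the inner generator produces an unseen value.
        // Caveat: this loops forever once the value space is exhausted.
        loop {
            let v = self.inner.generate();
            if self.seen.insert(v.clone()) {
                return v;
            }
        }
    }
}

// Toy inner generator: a tiny LCG emitting values in 0..100.
struct SmallNumbers {
    state: u64,
}

impl Gen<u64> for SmallNumbers {
    fn generate(&mut self) -> u64 {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.state >> 33) % 100
    }
}

fn main() {
    let mut unique = Unique::new(SmallNumbers { state: 42 });
    let draws: Vec<u64> = (0..10).map(|_| unique.generate()).collect();
    println!("{:?}", draws); // 10 distinct values in 0..100
}
```

The exhaustion caveat is the main design question for a real `Unique<T>`: the crate would need either a retry limit with an error, or the LCG approach above, to avoid an infinite loop.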
Not sure if I'm on the same page as you regarding technical limitations, and not sure this note will be helpful, but it's worth mentioning for the record if nothing else. Suppose I have this schema (Python code, using mimesis):

```python
schema_fun = lambda: {
    "username": field("person.username"),
    "pwd": "password",
    "name": field("full_name"),
    "email": field("person.email", unique=True),
    "created": field("timestamp", fmt=TimestampFormat.POSIX),
    "verified": field("timestamp", fmt=TimestampFormat.POSIX),
    "modified": field("timestamp", fmt=TimestampFormat.POSIX),
}
```

Suppose internally, the thing that generates emails only guarantees uniqueness locally, within this run. (In practice, I actually had to drop the username field because I couldn't find the spot in the docs where mimesis provides unique usernames, just unique emails.) All that to say: locally unique outputs are, in fact, a useful start, and would solve problems.

Another thought: suppose we had both (1) `UniqueFromArray`, which "shuffled" an array and pulled each item at most once, and (2) `UniqueFromArrays`, which picked from multiple arrays and combined them according to a lambda (but only returned each combo once). That'd probably be a good start. It's locally-unique-only, doesn't support the normal APIs, and takes some serious hacking to do, but at least it enables problems to be solved without devising a custom algorithm from scratch to spit out each unique field individually. For example, an email address would require manually combining First Name, Last Name, and Free Email Domain (or Lorem Ipsum Word + Lorem Ipsum Word + TLD for additional options). But that's far more approachable than having to write the same set-membership check every time, or having to devise a custom linear congruential sequence every time.
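A rough standalone sketch of the `UniqueFromArrays` idea (hypothetical names and helpers, not any existing API): shuffle the space of combined indices once, then map each index `k` to the pair `(k / m, k % m)`, so every combination is produced at most once.

```rust
// Tiny xorshift PRNG (the stdlib has no RNG or shuffle); state must be nonzero.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Combine one item from each array into an email-like string, returning
/// each combination exactly once, in a seed-dependent pseudorandom order.
fn unique_combos(firsts: &[&str], domains: &[&str], seed: u64) -> Vec<String> {
    assert!(!firsts.is_empty() && !domains.is_empty());
    let total = firsts.len() * domains.len();
    // Every combined index 0..total stands for the pair (k / m, k % m).
    let mut order: Vec<usize> = (0..total).collect();
    let mut s = seed | 1; // force nonzero state
    // Fisher-Yates shuffle of the combined indices.
    for i in (1..total).rev() {
        let j = (xorshift(&mut s) as usize) % (i + 1);
        order.swap(i, j);
    }
    order
        .into_iter()
        .map(|k| format!("{}@{}", firsts[k / domains.len()], domains[k % domains.len()]))
        .collect()
}

fn main() {
    let firsts = ["alice", "bob", "carol"];
    let domains = ["example.com", "example.org"];
    // 6 combinations, each emitted exactly once.
    println!("{:?}", unique_combos(&firsts, &domains, 42));
}
```

A combining lambda instead of the hard-coded `format!` would make this the `UniqueFromArrays` shape described above; the up-front shuffle trades O(n·m) memory for a hard guarantee of no repeats.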
Hi @proegssilb, that is what I proposed in my previous suggestion. In the example below, email is unique among generated user profile instances and is not related to the username. Your proposed approach could be implemented as a different faker if you like.

```rust
use fake::faker::internet::en::*;
use fake::locales::EN;
use fake::{Dummy, Fake};
use once_cell::sync::Lazy;
use std::{collections::HashSet, sync::Mutex};

// Process-wide cache of emails handed out so far.
static EMAIL_CACHE: Lazy<Mutex<HashSet<String>>> = Lazy::new(|| Mutex::new(HashSet::new()));

pub struct UniqueEmailFaker;

impl Dummy<UniqueEmailFaker> for String {
    fn dummy_with_rng<R: rand::prelude::Rng + ?Sized>(
        _config: &UniqueEmailFaker,
        rng: &mut R,
    ) -> Self {
        let mut email: String = FreeEmail().fake_with_rng(rng);
        let mut cache = EMAIL_CACHE.lock().unwrap();
        // Retry until we draw an email we haven't produced before.
        while cache.contains(&email) {
            email = FreeEmail().fake_with_rng(rng);
        }
        cache.insert(email.clone());
        email
    }
}

#[derive(Debug, Dummy)]
pub struct UserProfile {
    #[dummy(faker = "Username()")]
    pub username: String,
    #[dummy(faker = "UniqueEmailFaker")]
    pub email: String,
}

fn main() {
    let user_set_1 = fake::vec![UserProfile; 1..10];
    println!("{:?}", user_set_1);
    let user_set_2 = fake::vec![UserProfile; 1..10];
    println!("{:?}", user_set_2);
    // No duplicate emails across user_set_1 and user_set_2, unless EMAIL_CACHE is cleared.
}
```
I noticed that almost every time I get the same word or the same name. How can I make this much more random?

I'm running some tests against my API connected to a real database. I have 52 documents, and pretty much every time I run the tests they're rejected, since some fields should be unique and the words or names generated are already in the database.