Option to make Faker return unique values #305

Closed
beniwohli opened this issue May 24, 2016 · 12 comments

Comments

@beniwohli

I'm seeing random test failures because, e.g., factory.Faker('company') returns duplicate values (usually after a few hundred calls, but sometimes as early as the second call). To remedy this, I wrote a subclass of Faker that keeps track of the values it has returned so it can ensure uniqueness. The code is fairly trivial:

from factory.faker import Faker


class UniqueFaker(Faker):
    """
    A Faker that keeps track of returned values so it can ensure uniqueness.
    """
    def __init__(self, *args, **kwargs):
        super(UniqueFaker, self).__init__(*args, **kwargs)
        self._values = {None}

    def generate(self, extra_kwargs):
        value = None
        while value in self._values:
            value = super(UniqueFaker, self).generate(extra_kwargs)
        self._values.add(value)
        return value

Is there any interest in either adding this subclass to factoryboy, or integrating the functionality into Faker itself?
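A rough way to see why duplicates can appear after a few hundred calls, or even on the second one: if a provider picked uniformly from N distinct values (a simplifying assumption; real providers combine word lists and are not uniform), the probability of at least one repeat in k calls is 1 - prod(1 - i/N) for i = 0 .. k-1, i.e. the birthday problem. A small stdlib helper, with pool sizes chosen purely for illustration:

```python
import math


def collision_probability(pool_size: int, draws: int) -> float:
    """Chance that `draws` uniform picks from `pool_size` values include a repeat.

    Assumes every value is equally likely, which real providers are not.
    """
    if draws > pool_size:
        return 1.0  # pigeonhole principle: a repeat is certain
    # Probability that all draws are distinct: product of (N - i) / N.
    p_all_distinct = math.prod((pool_size - i) / pool_size for i in range(draws))
    return 1.0 - p_all_distinct


# Hypothetical pool sizes, not measured from any Faker provider.
for pool, calls in [(2, 2), (10_000, 300)]:
    print(pool, calls, round(collision_probability(pool, calls), 3))
```

Even a pool of ten thousand values makes a repeat very likely within a few hundred calls, which matches the failure pattern described above.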

@whyscream

Running into this very same issue too.

@jamescooke

This looks interesting. However, faker is its own stand-alone package, so I think this issue would be better opened and addressed on Faker's issue tracker.

See also:

@jeffwidman
Member

jeffwidman commented Jun 3, 2016

Wrapping faker in a subclass feels like it'd be non-performant and although fine for an individual project, probably not the right way we should solve this at the library level.

I'm leery of us adding much code here, as there are a lot of permutations of this behavior, depending on which faker provider you're calling... some faker providers have a fairly small subset of unique values. For example, when I needed a ton of unique words, I had to hack the words provider by appending a random character to the end. Other providers never hit that problem, so if we're ignoring non-unique values, we might accidentally silently swallow an error somewhere.

Personally, I think this might be a case where the best solution is to call faker directly through something like:

title = factory.Sequence(lambda n: fake.text(random.randint(5, 58))[:-1] + str(n))
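In that one-liner, uniqueness comes from the sequence counter n rather than from the random text, so even a provider with a tiny pool cannot collide. A stdlib-only sketch of the same pattern, with itertools.count standing in for factory.Sequence's counter and a hypothetical three-word WORDS list standing in for fake.text():

```python
import itertools
import random

# Hypothetical stand-in for a Faker provider with very few distinct values.
WORDS = ["alpha", "beta", "gamma"]

# Plays the role of the n that factory.Sequence passes to its lambda.
_counter = itertools.count()


def unique_title() -> str:
    """Random words plus the sequence number; the numeric suffix makes it unique."""
    n = next(_counter)
    words = " ".join(random.choice(WORDS) for _ in range(random.randint(1, 3)))
    return f"{words} {n}"


titles = [unique_title() for _ in range(1_000)]
print(len(titles), len(set(titles)))
```

Since the words never contain digits, two titles can only be equal if they share the same suffix, and the counter never repeats. The cost is that the values look less natural.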

I'm going to go ahead and close this, @rbarrois if you feel differently feel free to reopen.

@jcrben

jcrben commented Nov 4, 2017

FYI, it appears that similar libraries such as PHP Faker and the Ruby version have options to guarantee uniqueness; see e.g. the discussion at faker-ruby/faker#251

(I came here because I'm new to the library and recently read https://kev.inburke.com/kevin/faker-js-problems/)

> some faker providers have a fairly small subset of unique values. For example, when I needed a ton of unique words, I had to hack the words provider by appending a random character to the end

I suppose the trade-off here would be either giving up on real dictionary words or throwing an error once you've used up the whole dictionary.
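The "throw an error" option can be sketched as a bounded version of the retry loop from the first comment. This is a hypothetical helper, not a factory_boy or Faker API; max_attempts is an arbitrary cap, and a plain callable stands in for a Faker provider:

```python
import random
from typing import Callable, Hashable, Set, TypeVar

T = TypeVar("T", bound=Hashable)


class PoolExhausted(Exception):
    """Raised when no unseen value turns up within the retry budget."""


def make_unique(generate: Callable[[], T], max_attempts: int = 1000) -> Callable[[], T]:
    """Wrap `generate` so the returned callable never repeats a value.

    Rather than looping forever once every value has been used (as an
    unbounded retry loop would), give up after `max_attempts` consecutive
    duplicates and raise, so an exhausted pool fails loudly.
    """
    seen: Set[T] = set()

    def unique_generate() -> T:
        for _ in range(max_attempts):
            value = generate()
            if value not in seen:
                seen.add(value)
                return value
        raise PoolExhausted(f"no new value after {max_attempts} attempts")

    return unique_generate


# A hypothetical three-value "provider", like clothing sizes.
next_size = make_unique(lambda: random.choice(["S", "M", "L"]))
print(sorted(next_size() for _ in range(3)))

try:
    next_size()  # every size has already been handed out
except PoolExhausted as exc:
    print("exhausted:", exc)
```

Raising instead of retrying forever also addresses the concern about silently swallowing errors: a provider that is too small for the test shows up as an explicit exception rather than a hang.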

@danihodovic

danihodovic commented Jul 3, 2019

Has anyone managed to work around this yet? I've tried using a sequence, but that doesn't work:

name = factory.Sequence(lambda n: factory.Faker("company") + f" {n}")

>   name = factory.Sequence(lambda n: factory.Faker("company") + f" {n}")
E   TypeError: unsupported operand type(s) for +: 'Faker' and 'str'

@francoisfreitag
Member

francoisfreitag commented Jul 3, 2019

factory.Faker("company") is a Faker object, not a string. Concatenating a str to a Faker object is not supported by Faker (nor should it be), which is why you’re getting the TypeError.

@gregbrowndev

> Has anyone managed to work around this yet? I've tried using a sequence, but that doesn't work:

@danihodovic you can use generate to imperatively create a value in your sequence:

name = factory.Sequence(lambda n: factory.Faker("company").generate() + f" {n}") 

@jonjonw

jonjonw commented Dec 12, 2020

Is there a way to use Faker's unique API in factory_boy?

I.e.:

from faker import Faker
fake = Faker()
names = [fake.unique.first_name() for i in range(500)] # All unique

https://faker.readthedocs.io/en/master/#unique-values
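Per the Faker documentation linked above, values obtained through fake.unique are unique for the lifetime of the Faker instance, and fake.unique.clear() forgets the values already seen. The following is a simplified stdlib imitation of that idea to show the mechanics, not Faker's actual implementation; UniqueProxySketch, FakeLike, and its size provider are hypothetical names:

```python
import random


class UniqueProxySketch:
    """Simplified imitation of a `unique` proxy: remembers values per method."""

    def __init__(self, target, max_attempts: int = 1000):
        self._target = target
        self._max_attempts = max_attempts
        self._seen = {}  # method name -> set of values already returned

    def clear(self) -> None:
        """Forget every remembered value (e.g. between tests)."""
        self._seen.clear()

    def __getattr__(self, name):
        # Only called for attributes not found normally, i.e. provider methods.
        method = getattr(self._target, name)

        def wrapper(*args, **kwargs):
            seen = self._seen.setdefault(name, set())
            for _ in range(self._max_attempts):
                value = method(*args, **kwargs)
                if value not in seen:
                    seen.add(value)
                    return value
            raise LookupError(f"{name}: no unique value after {self._max_attempts} attempts")

        return wrapper


class FakeLike:
    """Hypothetical provider object with a tiny pool of values."""

    def size(self) -> str:
        return random.choice(["S", "M", "L"])


unique = UniqueProxySketch(FakeLike())
print(sorted(unique.size() for _ in range(3)))
unique.clear()  # after clearing, all three sizes are available again
```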

@francoisfreitag
Member

@arthurHamon2 is making good progress in #820.

@x-yuri

x-yuri commented Feb 10, 2021

Oh, the OP's code is almost exactly what I did:

# app/tests/__init__.py
class UniqueFaker(factory.Faker):
    # based on factory.faker.Faker.generate
    def generate(self, params):
        locale = params.pop('locale')
        subfaker = self._get_faker(locale)
        return subfaker.unique.format(self.provider, **params)

class MyTestCase(TestCase):
    def tearDown(self):
        for l, v in factory.Faker._FAKER_REGISTRY.items():
            factory.Faker._get_faker(locale=l).unique.clear()

# app/tests/factories.py
class ProductFactory(factory.django.DjangoModelFactory):
    ...
    size = t.UniqueFaker('size')  # S, M, L, ...

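The idea is to override the method that calls Faker's format method, copy its body, and insert .unique before the format call. The tearDown clears the values Faker has remembered, so each test starts with the full pool again.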

> Wrapping faker in a subclass feels like it'd be non-performant and although fine for an individual project, probably not the right way we should solve this at the library level.

Non-performant? Meaning slow? Nothing I've seen suggests that. Subclassing is the best workaround I could find, though that doesn't mean factory-boy should follow that path.

> For example, when I needed a ton of unique words, I had to hack the words provider by appending a random character to the end.
>
> Personally, I think this might be a case where the best solution is to call faker directly through something like:
>
> title = factory.Sequence(lambda n: fake.text(random.randint(5, 58))[:-1] + str(n))

Sounds like a different use case. I also wonder whether one would want to be able to reproduce test failures in such tests...

> Other providers never hit that problem, so if we're ignoring non-unique values, we might accidentally silently swallow an error somewhere.

AFAICS, nobody's suggesting enforcing uniqueness globally. Usually that's needed for database fields with a unique constraint.
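To make that concrete: a duplicate only matters where something enforces uniqueness, typically a UNIQUE column, and there it surfaces as the intermittent failure described at the top of this issue. A sketch using the standard library's sqlite3 with a hypothetical product table (the Django ORM would raise its own IntegrityError in the same situation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT NOT NULL UNIQUE)")


def insert_product(name: str) -> bool:
    """Insert a row; return False if the UNIQUE constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO product (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_product("Acme Corp"))  # True
print(insert_product("Acme Corp"))  # False: same value as an earlier fake
```

Columns without such a constraint accept the repeat silently, which is why uniqueness is usually only wanted per field rather than globally.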

> name = factory.Sequence(lambda n: factory.Faker("company").generate() + f" {n}")

Doesn't work since factory-boy==3.1.0, and in factory-boy==3.2.0 generate() was altogether removed (?).

@alashow

alashow commented Dec 5, 2021

> Doesn't work since factory-boy==3.1.0, and in factory-boy==3.2.0 generate() was altogether removed (?).

Apparently it was only renamed to evaluate():

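import factory
from faker import Faker
from faker.proxy import UniqueProxy


class UniqueFaker(factory.Faker):
    """factory.Faker variant that draws values through Faker's unique proxy."""

    # Same steps as factory.Faker.evaluate, except the provider is called
    # via subfaker.unique, so values already returned are not returned again.
    def evaluate(self, instance, step, extra):
        locale = extra.pop('locale')
        subfaker: Faker = self._get_faker(locale)
        unique_proxy: UniqueProxy = subfaker.unique
        return unique_proxy.format(self.provider, **extra)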

@aalvrz-syndio

Isn't it possible to add a unique parameter to the factory.Faker() call? That way we could use Faker's uniqueness feature: https://faker.readthedocs.io/en/master/#unique-values
