Some entities imported via search:import are not indexed (missing records) #372

Open
quentint opened this issue Mar 14, 2023 · 2 comments

  • Symfony version: v6.2.7
  • Algolia Search Bundle version: 6.0.0
  • Algolia Client Version: N/A
  • Language Version: PHP 8.1.14 (cli)

Description

When importing entities with search:import, the command output reports the correct number of indexed entities, but when browsing the index, some records are missing.

Here is the command output:

> bin/console search:import
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 160 / 160 App\Entity\MediaTranslation entities into quentin_media index
Done!

I'd then expect my index to contain 14 * 500 + 160 = 7160 items, but only 5216 exist.
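
As a quick cross-check of that arithmetic, here is a minimal sketch that sums the "Indexed X / Y" lines from the command output and compares the total with the dashboard count. The function name `expectedRecordCount` is made up for illustration and is not part of the bundle.

```php
<?php
// Hypothetical helper: sum the "Indexed <done> / <total>" lines printed by search:import.
function expectedRecordCount(string $commandOutput): int
{
    // Capture the first number of each "Indexed <done> / <total>" line.
    preg_match_all('/^Indexed (\d+) \/ \d+ /m', $commandOutput, $matches);

    return array_sum(array_map('intval', $matches[1]));
}

// Rebuild the output shown above: 14 batches of 500 and one of 160.
$line = "Indexed %d / %d App\\Entity\\MediaTranslation entities into quentin_media index\n";
$output = str_repeat(sprintf($line, 500, 500), 14) . sprintf($line, 160, 160);

$expected = expectedRecordCount($output);
$actual = 5216; // record count shown on the Algolia dashboard

printf("expected %d, found %d, missing %d\n", $expected, $actual, $expected - $actual);
// prints "expected 7160, found 5216, missing 1944"
```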

Clearing the index and importing again yields a different record count each time (within roughly ±5%).

Here's my configuration:

algolia_search:
    prefix: '%algolia_search_prefix%'
    indices:
        - name: media
          class: App\Entity\MediaTranslation

And here's the index settings file (created with `search:settings:backup`):

{
    "minWordSizefor1Typo": 4,
    "minWordSizefor2Typos": 8,
    "hitsPerPage": 20,
    "maxValuesPerFacet": 100,
    "version": 2,
    "searchableAttributes": [
        "unordered(media.id)",
        "unordered(title)",
        "unordered(tags)",
        "unordered(description)",
        "unordered(features)",
        "unordered(goals)",
        "unordered(more)"
    ],
    "numericAttributesToIndex": null,
    "attributesToRetrieve": null,
    "unretrievableAttributes": null,
    "optionalWords": null,
    "attributesForFaceting": [
        "locale",
        "media.type",
        "status",
        "filterOnly(tags)",
        "filterOnly(title)"
    ],
    "attributesToSnippet": null,
    "attributesToHighlight": null,
    "paginationLimitedTo": 1000,
    "attributeForDistinct": null,
    "exactOnSingleWordQuery": "attribute",
    "ranking": [
        "typo",
        "geo",
        "words",
        "filters",
        "proximity",
        "attribute",
        "exact",
        "custom"
    ],
    "customRanking": null,
    "separatorsToIndex": "",
    "removeWordsIfNoResults": "none",
    "queryType": "prefixLast",
    "highlightPreTag": "<em>",
    "highlightPostTag": "<\/em>",
    "snippetEllipsisText": "",
    "alternativesAsExact": [
        "ignorePlurals",
        "singleWordSynonym"
    ],
    "sortFacetValuesBy": "count",
    "renderingContent": {
        "facetOrdering": {
            "facets": {
                "order": [
                    "locale",
                    "media.type",
                    "status"
                ]
            },
            "values": {
                "locale": {
                    "sortRemainingBy": "alpha"
                },
                "media.type": {
                    "sortRemainingBy": "alpha"
                },
                "status": {
                    "sortRemainingBy": "alpha"
                }
            }
        }
    }
}

I tried changing the batchSize, but the issue remained.
I used to have an index_if in the configuration; I removed it, and the issue still remained.

When running the search:import command and regularly refreshing the index on the Algolia dashboard, the "No. records" value evolves like so (this is only an example; the values change if I re-run the import on a cleared index):

  • 500
  • 1,000
  • 1,500
  • 2,000
  • 2,253
  • 2,525
  • 3,025
  • (...)

As you can see, things look OK at first, but then become erratic around the 2,000/2,500 mark: the count no longer grows by a full batch of 500.
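
Increments smaller than the batch size are what one would see if some records in a batch carried an objectID that already exists in the index, since saving an object with an existing objectID replaces it rather than adding a new record. Whether that is what happens here is not confirmed in this thread. A minimal sketch for checking it, assuming the object IDs of each batch can be collected into plain arrays (`summarizeBatches` is a hypothetical helper, and the sample data is invented):

```php
<?php
/**
 * Compare how many IDs were sent with how many distinct IDs they contain.
 *
 * @param array<int, array<int, int|string>> $batches one list of object IDs per batch
 * @return array{sent: int, distinct: int, repeated: int}
 */
function summarizeBatches(array $batches): array
{
    $seen = [];
    $sent = 0;

    foreach ($batches as $batch) {
        foreach ($batch as $id) {
            $sent++;
            $seen[(string) $id] = true; // repeated IDs collapse onto the same key
        }
    }

    $distinct = count($seen);

    return ['sent' => $sent, 'distinct' => $distinct, 'repeated' => $sent - $distinct];
}

// Invented data: the second batch repeats two IDs from the first.
$summary = summarizeBatches([[1, 2, 3], [2, 3, 4]]);

printf(
    "sent %d, distinct %d, repeated %d\n",
    $summary['sent'],
    $summary['distinct'],
    $summary['repeated']
);
// prints "sent 6, distinct 4, repeated 2"
```

If "repeated" is non-zero for a real import, the final record count would be lower than the number of entities the command reports.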

Steps To Reproduce

Unfortunately this is hard to reproduce, because I can't pinpoint the origin of the issue (and the randomness makes it even stranger) 🙁

I tried looking at the Symfony logs to see if some error appeared there, but found nothing.

What could prevent records from appearing in my index?

quentint commented Mar 14, 2023

Digging a bit more, I can confirm the issue comes from this repo (and not from algolia/algoliasearch-client-php): I wrote this simple command that uses the client directly, and it works as intended:

<?php
// src/Command/MediaIndexCommand.php

namespace App\Command;

use Algolia\AlgoliaSearch\SearchClient;
use App\Entity\MediaTranslation;
use App\Serializer\Normalizer\MediaTranslationNormalizer;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

#[AsCommand(
    name: 'app:media:index',
    description: 'Index media translations',
)]
class MediaIndexCommand extends Command
{

    public function __construct(private readonly EntityManagerInterface $manager, private readonly MediaTranslationNormalizer $normalizer)
    {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io = new SymfonyStyle($input, $output);

        $client = SearchClient::create('...', '...');
        $index = $client->initIndex('quentin_media');
        $index->clearObjects();

        $translations = $this->manager->getRepository(MediaTranslation::class)->findAll();
        $chunks = array_chunk($translations, 500);

        foreach ($chunks as $chunkIndex => $chunk) {
            $io->info("Chunk $chunkIndex");
            // Normalize each translation and set its objectID explicitly.
            $objects = array_map(
                fn (MediaTranslation $translation) => [
                    ...$this->normalizer->normalize($translation, 'searchableArray'),
                    'objectID' => $translation->getId(),
                ],
                $chunk
            );
            $index->saveObjects($objects);
        }

        return Command::SUCCESS;
    }
}


I hope this helps.

@quentint

Still investigating... Looking at the logs generated with `Algolia\AlgoliaSearch\Log\DebugLogger::enable();`, I don't see anything unusual.

Also, I don't understand how/where the bundle does anything different from my own command (apart from supporting more cases) 🤔
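
One difference worth ruling out is how the entities are read. The command above loads everything with findAll() and slices the result in memory, so every entity appears exactly once. If an importer instead fetched each batch with its own LIMIT/OFFSET query and no explicit ordering, pages could overlap whenever the row order differs between queries. This is an assumption, not something verified against the bundle's source in this thread. The sketch below only simulates that effect with plain arrays; `pageThrough` is a hypothetical function, and reshuffling before every page exaggerates what a real database would do.

```php
<?php
/**
 * Simulate LIMIT/OFFSET paging and return the number of distinct IDs collected.
 * When $stableOrder is false, the rows are reshuffled before every page to mimic
 * a query whose row order is not guaranteed to be the same between calls.
 *
 * @param int[] $ids
 */
function pageThrough(array $ids, int $batchSize, bool $stableOrder): int
{
    $seen = [];

    for ($offset = 0; $offset < count($ids); $offset += $batchSize) {
        $rows = $ids;
        if (!$stableOrder) {
            shuffle($rows); // each "query" returns the rows in a different order
        }
        foreach (array_slice($rows, $offset, $batchSize) as $id) {
            $seen[$id] = true; // an ID seen on two pages counts once
        }
    }

    return count($seen);
}

$ids = range(1, 7160);

$stable = pageThrough($ids, 500, true);    // every ID is visited exactly once
$unstable = pageThrough($ids, 500, false); // some IDs repeat, others are never seen

printf("stable order: %d distinct, unstable order: %d distinct\n", $stable, $unstable);
```

With a stable order the simulation collects all 7160 IDs; with an unstable order it collects fewer, and the number changes from run to run, which resembles the symptom described above.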
