ENG-1551 ai xplain sdk caching onboarded models pipelines and agents #389
base: development
Conversation
@xainaz Thanks for the PR!
I have one main issue with the current implementation: we have separate cache methods for models, pipelines, and agents, which will make any future change very hard. Everything should live in one class, e.g. an AixplainCache object with a more modular design, so that the utility methods and the methods repeated across asset types become part of that class.
Can you please redesign your codebase if this makes sense to you? Otherwise we can discuss how to make it more modular.
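A unified design along these lines might be sketched as follows. This is only an illustration of the suggestion, not the PR's actual code; the class layout, file paths, and method names are all hypothetical:

```python
import json
import os
import time
from typing import Any, Dict, Optional

CACHE_FOLDER = ".cache"  # assumed location, not confirmed from the PR


class AssetCache:
    """One cache class shared by models, pipelines, and agents.

    All repeated load/save logic lives here once, instead of in
    per-asset-type cache methods.
    """

    def __init__(self, asset_type: str, expiry_seconds: int = 86400):
        self.asset_type = asset_type
        self.expiry_seconds = expiry_seconds
        self.cache_file = os.path.join(CACHE_FOLDER, f"{asset_type}.json")

    def load(self) -> Optional[Dict[str, Any]]:
        # Single load path for every asset type.
        if not os.path.exists(self.cache_file):
            return None
        with open(self.cache_file) as f:
            payload = json.load(f)
        if time.time() - payload["timestamp"] >= self.expiry_seconds:
            return None  # expired
        return payload["data"]

    def save(self, data: Dict[str, Any]) -> None:
        os.makedirs(CACHE_FOLDER, exist_ok=True)
        with open(self.cache_file, "w") as f:
            json.dump({"timestamp": time.time(), "data": data}, f)
```

With this shape, adding a new asset type means instantiating `AssetCache("new_type")` rather than writing another set of cache methods.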
@xainaz Can you please add a functional test for it? I approved the rest.
Args:
    asset_type (str): Type of the asset (e.g., "models", "pipelines", "agents").
    cache_filename (str): Filename for storing cached data.
Make this field optional. If not provided, use the asset id as the filename
Asset id or asset type?
asset ID. One file for each asset in the proper folder.
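The behaviour being agreed on here (optional filename, defaulting to the asset ID so each asset gets its own file) could be sketched with a small helper; the function name is hypothetical:

```python
from typing import Optional


def resolve_cache_filename(asset_id: str, cache_filename: Optional[str] = None) -> str:
    """Fall back to the asset ID when no explicit filename is given,
    so every asset gets its own file in the cache folder."""
    return f"{cache_filename or asset_id}.json"
```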
if cached_data is not None:
    return self.parse_assets(cached_data)

api_key = config.TEAM_API_KEY
Better to move this fetching part to a dedicated function to make the structure more modular and testable.
    type=str,
)

assets_details = {
Better to use a dataclass, typed dict or pydantic model if present in the context instead of using arbitrary dictionaries for cache.
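A typed cache payload, as suggested, could look like this minimal dataclass sketch (field and method names are illustrative):

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CacheEntry:
    """Typed cache payload, replacing an arbitrary dict with a fixed schema."""

    data: Dict[str, Any]
    timestamp: float = field(default_factory=time.time)

    def is_expired(self, expiry_seconds: int) -> bool:
        # Compare the entry's age against the configured expiry window.
        return time.time() - self.timestamp >= expiry_seconds
```

A dataclass makes the schema explicit and lets type checkers catch misspelled keys that would silently pass with a plain dict.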
return Enum(self.asset_type.capitalize(), {}), {}

if "items" not in assets_data:
    return Enum(self.asset_type.capitalize(), {}), {}
Why are we using dynamic enums like this? What is the benefit behind it?
if time.time() - cache_data["timestamp"] < int(get_cache_expiry()):
    return cache_data["data"]
else:
    return None
If we know the cache is expired, shouldn't we clean up the cache files, and the lock files too if needed?
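One possible cleanup-on-expiry variant of the load function, assuming the payload layout shown in the diff (a JSON object with `timestamp` and `data` keys); the expiry handling here is a sketch, not the PR's implementation:

```python
import json
import os
import time
from typing import Any, Optional


def load_from_cache(cache_file: str, lock_file: str, expiry_seconds: int) -> Optional[Any]:
    """Return cached data if fresh; otherwise delete the stale files and return None."""
    if not os.path.exists(cache_file):
        return None
    with open(cache_file) as f:
        cache_data = json.load(f)
    if time.time() - cache_data["timestamp"] < expiry_seconds:
        return cache_data["data"]
    # Expired: do the housekeeping instead of leaving waste on disk.
    for path in (cache_file, lock_file):
        if os.path.exists(path):
            os.remove(path)
    return None
```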
This class reduces code repetition and allows easier maintenance.
"""

def __init__(self, asset_type: str, cache_filename: Optional[str] = None):
Instead of expecting arbitrary strings here for asset_type, we should expect either the real asset type conforming to the interface, or at least an enumeration.
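An enumeration along the lines suggested might look like this; the member names are assumptions based on the asset types mentioned in the review, and `cache_path` is a hypothetical helper:

```python
from enum import Enum


class AssetType(Enum):
    """Closed set of cacheable asset types, replacing arbitrary strings."""

    MODELS = "models"
    PIPELINES = "pipelines"
    AGENTS = "agents"


def cache_path(asset_type: AssetType) -> str:
    # The enum value, not a free-form string, drives the cache location,
    # so a typo like "modles" fails at construction time.
    return f".cache/{asset_type.value}.json"
```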
Furthermore, instead of having individual AssetCache instances for each asset type, a more generic and centralized design could be considered if applicable.
Returns:
    Tuple[Enum, Dict]: (Enum of asset IDs, Dictionary with asset details)
"""
cached_data = load_from_cache(self.cache_file, self.lock_file)
I believe the load_from_cache and save_to_cache functions should be an integral part of this class if we consider it to be the main caching logic for assets.
json.dump({"timestamp": time.time(), "data": data}, f)
with FileLock(lock_file):
    with open(cache_file, "w") as f:
        json.dump({"timestamp": time.time(), "data": data}, f)
except Exception as e:
    logging.error(f"Failed to save cache to {cache_file}: {e}")
Why are we hiding the exception and just logging it? If we have no valid plan or purpose for suppressing an exception, we should let it raise and halt the program rather than proceed with an unwanted data flow.
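A re-raising variant could be sketched as follows; the function name mirrors the one in the diff, but the body is illustrative and drops the lock handling for brevity:

```python
import json
import logging
import time
from typing import Any


def save_to_cache(cache_file: str, data: Any) -> None:
    try:
        with open(cache_file, "w") as f:
            json.dump({"timestamp": time.time(), "data": data}, f)
    except OSError:
        # Log for context, then re-raise: a failed cache write should
        # halt the caller, not be silently swallowed.
        logging.error("Failed to save cache to %s", cache_file)
        raise
```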
if time.time() - cache_data["timestamp"] < int(get_cache_expiry()):
    return cache_data["data"]
else:
    return None
Where are we doing the housekeeping? I mean, if we know here that the cache has expired, shouldn't we delete the stale files?
@@ -91,9 +91,28 @@ def __init__(
    model_params (Dict, optional): parameters for the function.
    **additional_info: Any additional Model info to be saved
    """
ModelCache = AixplainCache("models", "models")
ModelEnum, ModelDetails = ModelCache.load_assets()
Each time we instantiate a model class (or any other asset using the same technique), the entire cache is loaded again, which is quite inefficient. The entire cache should be instantiated once and loaded independently of asset instantiation. It could be lazily populated on first demand, or prepopulated when the library is loaded.
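One way to load the cache once and share it, as suggested, is a lazily populated class-level holder. This is a sketch with stand-in data, not the SDK's API:

```python
from typing import Dict, Optional


class ModelRegistry:
    """Process-wide cache holder: loaded once, shared by all Model instances."""

    _details: Optional[Dict[str, dict]] = None

    @classmethod
    def details(cls) -> Dict[str, dict]:
        if cls._details is None:
            # Populate lazily on first access; every later call reuses
            # the same dict instead of re-reading the cache file.
            cls._details = cls._load()
        return cls._details

    @classmethod
    def _load(cls) -> Dict[str, dict]:
        # Stand-in for the real cache load (load_assets in the PR).
        return {"m1": {"name": "cached model"}}
```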
@@ -74,6 +75,17 @@ def __init__(
    status (AssetStatus, optional): Pipeline status. Defaults to AssetStatus.DRAFT.
    **additional_info: Any additional Pipeline info to be saved
    """
PipelineCache = AixplainCache("pipelines", "pipelines")
I have the same concerns as above.
ModelCache = AixplainCache("models", "models")
ModelEnum, ModelDetails = ModelCache.load_assets()

if id in ModelDetails:
If we need the entire data to populate a Model class instance, why wouldn't we get the related Model instance directly from the cache? For example, the sample code below could be used in the ModelFactory:
ModelCache = AixplainCache(Model)
ModelCache.load_assets() # this could run once internally (populate on demand)
model = ModelCache.get(id)
if not model:
model = Model(id=..., foo=bar)
@@ -91,9 +91,28 @@ def __init__(
    model_params (Dict, optional): parameters for the function.
    **additional_info: Any additional Model info to be saved
    """
ModelCache = AixplainCache("models", "models")
We should separate the caching logic from the core logic. Here we have the core logic of the model asset, and the cache should be managed externally. In our case, since we have factory classes for each asset, those factory classes should be responsible for populating instances, whether from the cache or not.
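A factory-managed cache along these lines might look like the following sketch. The Model and ModelFactory here are simplified stand-ins, not the SDK classes, and the fetch logic is a placeholder:

```python
from typing import Dict


class Model:
    """Core asset logic only: no cache handling inside the asset class."""

    def __init__(self, id: str, name: str):
        self.id = id
        self.name = name


class ModelFactory:
    _cache: Dict[str, Model] = {}

    @classmethod
    def get(cls, id: str) -> Model:
        # The factory decides whether to serve from cache or build fresh;
        # Model itself stays unaware of caching.
        if id not in cls._cache:
            cls._cache[id] = Model(id=id, name=f"model-{id}")  # stand-in fetch
        return cls._cache[id]
```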
from aixplain.utils.cache_utils import save_to_cache, load_from_cache, CACHE_FOLDER


class AixplainCache:
I would rather not use a name containing the company name, which is too generic. A name like AssetCache would be more focused and point to its purpose.
Also added a file lock; the global expiry time in seconds is configurable as an environment variable.
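A sketch of such an environment-driven expiry helper; the variable name `CACHE_EXPIRY_TIME` and the one-day default are assumptions, not confirmed from the PR:

```python
import os

DEFAULT_CACHE_EXPIRY = 86400  # assumed default: one day, in seconds


def get_cache_expiry() -> int:
    """Read the global expiry from an environment variable (name assumed),
    falling back to the default when unset or not an integer."""
    raw = os.environ.get("CACHE_EXPIRY_TIME", "")
    try:
        return int(raw)
    except ValueError:
        return DEFAULT_CACHE_EXPIRY
```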