Usage
Example using csv_loader
from datetime import datetime
from pathlib import Path
import pandas as pd
from fast_csv_loader import csv_loader
file = Path("path/to/your/timeseries_data.csv")
# Load 160 lines of data from end of file
df = csv_loader(file_path=file)
print(df)
# Load the last 500 lines of data up to 10th May 2024
df = csv_loader(
    file_path=file,
    period=500,
    end_date=datetime(2024, 5, 10),
)
# If the datetime column is not named `Date` (the default), specify it using `date_column`
df = csv_loader(file_path=file, date_column="time")
# If pandas is unable to parse the date column, specify the format using `date_format`
df = csv_loader(file_path=file, date_format="%d/%m/%Y")
# Increase `chunk_size` to tune performance for your specific needs.
df = csv_loader(file_path=file, chunk_size=1024 * 10)
Example using cached_csv_loader
Note
cached_csv_loader was introduced in version 2.2.0
from pathlib import Path
from fast_csv_loader import (
    cached_csv_loader,
    invalidate,
    invalidate_all,
    cache_stats,
    set_max_cache_size,
)
df = cached_csv_loader(Path("AAPL.csv"), period=200) # disk read
df = cached_csv_loader(Path("AAPL.csv"), period=200) # cache hit
# The cache auto-invalidates when the file's mtime changes (e.g. after an EOD sync overwrites it).
# For explicit invalidation:
invalidate("AAPL.csv")
invalidate_all()
# Observability:
cache_stats()
# {'hits': 49, 'misses': 1, 'evictions': 0, 'size': 1, 'hit_rate': 98.0, 'max_size': 500}
API
- fast_csv_loader.csv_loader(file_path: Path, period: int = 160, end_date: datetime | None = None, date_format: str | None = None, use_columns: List[str] | None = None, chunk_size: int = 6144) -> DataFrame
Load a CSV file with timeseries data in chunks from the end.
May return an empty DataFrame if no data is found. Use df.empty to check whether the DataFrame is empty before further processing.
- Parameters:
file_path (pathlib.Path) – The path to the CSV file to be loaded.
period (int) – Number of lines/candles to return. The default is 160.
end_date (Optional[datetime]) – If None, load the last N lines from the file. If provided, load the last N lines up to this date.
date_format (Optional[str]) – Custom date format in case pandas is unable to parse the date column.
use_columns (Optional[List[str]]) – List of column names to load from the CSV file. If None (the default), all columns are loaded.
chunk_size (int) – The size of data chunks loaded into memory. The default is 6144 bytes (6 KB).
- Returns:
A DataFrame containing the loaded timeseries data.
- Return type:
pd.DataFrame
- Raises:
IndexError – If end_date is provided but is not within the boundary of the data.
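The idea of loading a file in chunks from the end can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: `tail_lines` is a hypothetical helper, it returns raw row strings instead of a DataFrame, it assumes the first line is a header, and it ignores `end_date`.

```python
import os
from pathlib import Path
from typing import List


def tail_lines(file_path: Path, period: int = 160, chunk_size: int = 6144) -> List[str]:
    """Return up to `period` data rows from the end of a CSV file (header excluded).

    Hypothetical helper for illustration only; not part of fast_csv_loader.
    """
    if period <= 0:
        return []
    with open(file_path, "rb") as f:
        f.readline()                # skip the header line
        data_start = f.tell()       # never read backwards past this offset
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buffer = b""
        # Walk backwards one chunk at a time until more than `period`
        # newlines are buffered or the start of the data is reached.
        while pos > data_start and buffer.count(b"\n") <= period:
            read_size = min(chunk_size, pos - data_start)
            pos -= read_size
            f.seek(pos)
            buffer = f.read(read_size) + buffer
    lines = buffer.splitlines()
    if pos > data_start:
        # Reading stopped mid-file, so the first buffered row may be cut; drop it.
        lines = lines[1:]
    return [line.decode("utf-8") for line in lines[-period:]]
```

Only the tail of the file is read, so the cost depends on `period` and `chunk_size` rather than on the total file size. An empty list here plays the role of the empty DataFrame mentioned above.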
- fast_csv_loader.cached_csv_loader(file_path: Path, period: int = 160, end_date: datetime | None = None, date_format: str | None = None, use_columns: List[str] | None = None, chunk_size: int = 6144) -> DataFrame
Added in version 2.2.0.
Mtime-aware cached wrapper around csv_loader.
Provides a drop-in replacement for csv_loader for cases where the same CSV file may be read multiple times within the same process. Results are cached based on file path, modification time, and selected query parameters.
The cache key is composed of (file_path, end_date, date_format, use_columns). The period parameter is NOT part of the cache key and is applied after cache retrieval, meaning different period values reuse the same cached DataFrame and only affect the returned slice.
If the underlying file has changed (based on mtime), the cache entry is invalidated and the file is reloaded.
- Parameters:
file_path (pathlib.Path) – The path to the CSV file to be loaded.
period (int) – Number of rows/candles to return from the end of the dataset. Default is 160.
end_date (Optional[datetime]) – Load data up to this timestamp. If None, the most recent data is used. If provided, loading is anchored to this date.
date_format (Optional[str]) – Custom datetime format string used for parsing the CSV date column if automatic parsing fails.
use_columns (Optional[List[str]]) – List of column names to load from the CSV file. If None, all columns are loaded.
chunk_size (int) – Size of chunks (in bytes) used when reading the CSV file. Default is 6144 bytes (6 KB).
- Returns:
A DataFrame containing the requested slice of timeseries data.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If file_path does not exist.
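The caching behaviour described for cached_csv_loader can be approximated with a small dictionary-based sketch. Everything here is an assumption made for illustration: `cached_load` and its `loader` argument are hypothetical names, rows are plain strings rather than a DataFrame, and the modification time is stored next to the cached value rather than in the key. The library's internals may be organised differently.

```python
import os
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

# Key: (path, end_date, date_format, use_columns); value: (mtime_ns, rows).
_cache: Dict[Tuple, Tuple[int, List[str]]] = {}


def cached_load(
    file_path: Path,
    loader: Callable[[Path], List[str]],  # hypothetical stand-in for csv_loader
    period: int = 160,
    end_date=None,
    date_format: Optional[str] = None,
    use_columns: Optional[List[str]] = None,
) -> List[str]:
    # os.stat raises FileNotFoundError when the file does not exist.
    mtime = os.stat(file_path).st_mtime_ns
    key = (str(file_path), end_date, date_format, tuple(use_columns or ()))
    entry = _cache.get(key)
    if entry is None or entry[0] != mtime:
        # Cache miss, or the file changed on disk: reload and overwrite the entry.
        entry = (mtime, loader(file_path))
        _cache[key] = entry
    # `period` is applied after retrieval, so it is not part of the key.
    return entry[1][-period:] if period > 0 else []
```

Because `period` only slices the cached rows, calls that differ only in `period` share one entry and trigger a single disk read.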
- fast_csv_loader.invalidate(file_path) -> int
Added in version 2.2.0.
Drop all cache entries for a given file (any end_date / columns combination).
Useful after writing new data to disk when you want to ensure subsequent reads do not return stale cached results. Otherwise, cache entries are invalidated automatically based on file modification time.
- Parameters:
file_path (pathlib.Path | str) – Path of the file whose cache entries should be removed.
- Returns:
Number of cache entries removed for the given file.
- Return type:
int
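Dropping every entry for one file, whatever the other key fields are, might look like the following sketch. It assumes a dictionary whose keys are tuples with the file path as a string in the first position; `invalidate_path` is a hypothetical name, and the real cache layout is not documented here.

```python
from pathlib import Path
from typing import Dict, Tuple, Union


def invalidate_path(cache: Dict[Tuple, object], file_path: Union[Path, str]) -> int:
    """Remove all entries whose key starts with `file_path`; return how many were removed."""
    target = str(Path(file_path))  # accept both str and Path
    stale = [key for key in cache if key[0] == target]
    for key in stale:
        del cache[key]
    return len(stale)
```

Collecting the stale keys first avoids mutating the dictionary while iterating over it.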
- fast_csv_loader.invalidate_all() -> int
Added in version 2.2.0.
Drop all entries from the cache.
Useful for resetting cache state entirely, for example during testing or after bulk data updates.
- Returns:
Number of cache entries removed.
- Return type:
int
- fast_csv_loader.cache_stats() -> dict
Added in version 2.2.0.
Return cache observability metrics including hit/miss counts, current cache size, and hit rate.
- Returns:
Dictionary containing cache statistics:
hits: Number of cache hits
misses: Number of cache misses
evictions: Number of evicted entries
size: Current number of cached entries
hit_rate: Cache hit rate as a percentage (rounded to 1 decimal)
max_size: Maximum allowed cache size
- Return type:
dict
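The hit_rate field is described as a percentage rounded to one decimal. A minimal sketch of that arithmetic follows; the `hit_rate` function is hypothetical, and returning 0.0 when there have been no lookups is an assumption, not documented behaviour.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Percentage of lookups served from the cache, rounded to 1 decimal."""
    total = hits + misses
    if total == 0:
        return 0.0  # assumption: no lookups yet is reported as 0.0
    return round(100.0 * hits / total, 1)
```

With 49 hits and 1 miss, as in the sample output in the usage section, this gives 98.0.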
- fast_csv_loader.set_max_cache_size(n: int) -> None
Added in version 2.2.0.
Set the maximum number of cached entries allowed.
If the cache exceeds this size, older entries will be evicted automatically.
- Parameters:
n (int) – New maximum cache size. Must be greater than 0.
- Raises:
ValueError – If n is less than or equal to 0.
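A size-bounded cache that evicts older entries can be sketched with `collections.OrderedDict`. This is an illustration only: `BoundedCache` is a hypothetical class, least-recently-used ordering is an assumption (the text above only says "older entries"), and the default of 500 is taken from the sample cache_stats output rather than from a documented default.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU-style cache; not the library's implementation."""

    def __init__(self, max_size: int = 500) -> None:
        self._data = OrderedDict()
        self.max_size = max_size
        self.evictions = 0

    def set_max_size(self, n: int) -> None:
        if n <= 0:
            raise ValueError("max cache size must be greater than 0")
        self.max_size = n
        self._evict()  # shrink immediately if the new limit is smaller

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        return None

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        self._evict()

    def _evict(self) -> None:
        # Drop least recently used entries until the cache fits the limit.
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)
            self.evictions += 1
```

Lowering the limit evicts immediately in this sketch, which keeps the size and evictions counters consistent with the new maximum.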