Algorithm
This is a broad overview of the csv_loader algorithm. For a more detailed understanding, please read the actual code.
To maximise efficiency, all file operations work on raw bytes (binary mode).
Assume you wish to load the last 160 lines of the file.
Get the file size.
If the size is less than chunk_size (or 19 KB, whichever is higher), the file is read in full:
If end_date is provided, filter the data up to the end_date and return the last 160 lines.
Otherwise, load the entire file and return the last 160 lines.
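The small-file branch above could be sketched roughly as follows. This is a minimal illustration, not the csv_loader code itself: the function name load_small_file, the reading of 19 KB as 19 * 1024 bytes, and the assumption that the first column holds ISO-format dates in ascending order are all choices made for the example.

```python
import os
from datetime import datetime

MIN_SIZE = 19 * 1024  # assumption: "19 KB" taken as 19 * 1024 bytes


def load_small_file(path, chunk_size, lines=160, end_date=None):
    """Return [header, *last rows] as bytes, or None if the file is not small.

    Hypothetical helper: assumes the first column is an ISO date, rows are
    sorted by date in ascending order, and there are no blank lines.
    """
    if os.path.getsize(path) >= max(chunk_size, MIN_SIZE):
        return None  # too large for this path; the caller reads in chunks

    # Binary mode: the whole file is read as bytes, no text decoding.
    with open(path, "rb") as f:
        all_lines = f.read().splitlines()

    if not all_lines:
        return []

    header, rows = all_lines[0], all_lines[1:]

    if end_date is not None:
        # Keep only rows dated on or before end_date.
        rows = [
            row for row in rows
            if datetime.fromisoformat(row.split(b",", 1)[0].decode()) <= end_date
        ]

    return [header] + rows[-lines:]
```

Returning None for larger files is only a convenience of this sketch; it signals that the chunked path described in the following steps should be used instead.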
Read the first line of file to get the column header.
Seek to the end of the file.
Read the last N bytes (Chunk) of the file.
On the first chunk, get a count of line breaks (\n) in the chunk to estimate lines per chunk.
On every chunk:
Update the number of lines read by adding the lines per chunk.
Store the chunk in a list.
If end_date is specified, we parse the first date string in the chunk to check whether we are past the end_date.
If so, we keep reading chunks until the desired number of lines has been loaded.
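The backward chunk loop described above might look something like the sketch below. It is an illustration under stated assumptions rather than the actual csv_loader implementation: read_tail_chunks and first_date are hypothetical names, dates are assumed to be ISO-format strings in the first column, and chunk_size is assumed to span several lines. As a conservative choice of this sketch, line counting starts with the chunk after the one in which end_date is first reached.

```python
import os
from datetime import datetime


def first_date(chunk):
    """Parse the date at the start of the first complete line in a chunk."""
    # A chunk usually begins mid-line, so skip past the first line break.
    start = chunk.find(b"\n") + 1
    field = chunk[start:].split(b",", 1)[0]
    return datetime.fromisoformat(field.decode())


def read_tail_chunks(f, chunk_size, lines_needed, end_date=None):
    """Read a binary file backwards; return the chunks, newest first."""
    f.seek(0, os.SEEK_END)
    pos = f.tell()

    chunks = []
    lines_per_chunk = None
    lines_read = 0
    counting = end_date is None  # without end_date every chunk counts

    while pos > 0 and lines_read < lines_needed:
        start = max(0, pos - chunk_size)
        f.seek(start)
        chunk = f.read(pos - start)
        pos = start

        if lines_per_chunk is None:
            # Estimate lines per chunk once, from the first chunk read.
            lines_per_chunk = chunk.count(b"\n")

        chunks.append(chunk)

        if counting:
            lines_read += lines_per_chunk
        elif first_date(chunk) <= end_date:
            # We are now past end_date; start counting with the next chunk.
            counting = True

    return chunks
```

Because the line count is an estimate, the loop tends to read slightly more than requested; the final slice on the loaded data trims the excess.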
Once we have the desired number of lines:
Append the column header and final chunk.
Reverse the list and join it into an io.BytesIO object.
Load it into a Pandas DataFrame and return the slice of data required.
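The final assembly step can be illustrated with a short sketch. The name assemble is hypothetical, and trimming the partial leading line of the oldest chunk is an assumption of this example (the overview above does not say how the real code handles a chunk that starts mid-line).

```python
import io


def assemble(header, chunks):
    """Join a header line and newest-first chunks into one in-memory CSV.

    `header` must include its trailing line break. `chunks` is the list
    built while reading backwards, so the oldest chunk is last.
    """
    parts = list(chunks)

    if parts:
        # Assumption of this sketch: the oldest chunk probably starts
        # mid-line, so drop everything up to its first line break.
        oldest = parts[-1]
        parts[-1] = oldest[oldest.find(b"\n") + 1:]

    # The header is appended last because the list is reversed next.
    parts.append(header)
    parts.reverse()

    return io.BytesIO(b"".join(parts))
```

The resulting buffer is a file-like object, so it can be passed directly to pandas.read_csv, after which the required rows can be sliced from the end of the DataFrame (for example with .iloc[-160:]).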