Introduction
In the previous article, we looked at logging uncaught exceptions. Let's make use of the log output from that post for another common task: error log file processing. This example is going to be fairly simple, since the error log from the post is tiny, but in a production environment it could be hundreds of thousands of lines, or gigabytes in size. That is potentially a lot to shove into memory for processing. But this is where a generator can come in to help.
Generators don't store their results; instead, they maintain state and yield each result back to the caller. This means each line in a log can be processed and returned without loading the entire file.
Flashback
As a reminder from the previous article, the log file being used looks like this:
2025-07-14 22:30:44,061 __main__ INFO Application start
2025-07-14 22:30:44,061 __main__ CRITICAL uncaught exception, application will terminate.
Traceback (most recent call last):
  File "/home/andy/main.py", line 31, in <module>
    main()
    ~~~~^^
  File "/home/andy/main.py", line 27, in main
    logger.info(divide(a,b))
    ~~~~~~^^^^^
  File "/home/andy/main.py", line 21, in divide
    return a/b
           ~^~
ZeroDivisionError: division by zero
There is one INFO entry and one CRITICAL entry.
yield vs return
It's important to understand what makes a generator different from a regular function. The key distinction is the yield keyword. When a function contains yield, Python treats it as a generator function, which behaves differently from functions that use return.
A regular function with return executes completely, hands back a single result, and then terminates. A function with yield creates a generator object that can pause execution, return a value, and later resume from exactly where it left off. This is what enables the memory-efficient, lazy evaluation that makes generators powerful.
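That difference is easy to see side by side. Here is a minimal sketch (the function names are illustrative, not from the article) contrasting a return-based function, which builds everything up front, with a yield-based one, which produces values on demand:

```python
def squares_return(n):
    # Builds the entire list in memory before returning it.
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_yield(n):
    # Produces one value at a time, pausing after each yield.
    for i in range(n):
        yield i * i

print(squares_return(3))   # [0, 1, 4] -- a complete list
gen = squares_yield(3)
print(next(gen))           # 0 -- runs to the first yield, then pauses
print(list(gen))           # [1, 4] -- only the values not yet consumed
```

Note that calling squares_yield(3) runs none of the function body; execution only starts when the generator is first iterated.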
Gotcha
Log processing
A generator can only be consumed once. Whether you are using a generator to produce the next item in a sequence or to process a file line by line, once the generator is exhausted it can't be reused. This simple generator demonstrates the problem.
def read_log_lines(filename):
    with open(filename, 'r') as f:
        for line in f:
            if 'CRITICAL' in line:
                yield line.strip()

error_logs = read_log_lines('app.log')
error_count = len(list(error_logs))
print(f"Found {error_count} CRITICAL lines")

recent_errors = [log for log in error_logs if '2025' in log]
print(f"Recent errors: {len(recent_errors)}")
At first glance, it looks like this will read the log file, count the number of errors, and then output how many of those were in 2025 (or contain the string 2025). However, the actual output is different.
Found 1 CRITICAL lines
Recent errors: 0
error_logs is a generator object; if a function yields, it's a generator. When error_count is assigned, the generator processes the error log and yields back any CRITICAL lines, and the list() call consumes the entire generator (and file). A few lines later, the developer wants to see how many of those are recent errors and attempts to go through the error_logs generator again. Success! No recent errors!
Right?
No, and looking at the log quickly reveals that.
Fibonacci
Let's use a generator to build the Fibonacci sequence. Spoiler for interviews! In this case, I'll use a generator to get the first 10 items, print out the first five, and then try to print the entire list of 10 items.
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

fib_numbers = fibonacci(10)

print("First 5 numbers:")
for i, num in enumerate(fib_numbers):
    print(num)
    if i >= 4:
        break

print("Full List:")
full_list = list(fib_numbers)
print(full_list)
The output for that is:
First 5 numbers:
0
1
1
2
3
Full List:
[5, 8, 13, 21, 34]
Notice that the full_list variable only contains the items remaining in the generator. Since the first five (indexes 0 through 4) were already printed, they are no longer part of the generator. When the full list is printed, only the remaining items appear.
Solution
The solution to the problem is simple enough: call the generator function again. For example, with the log code from above:
error_logs = read_log_lines('app.log')
error_count = len(list(error_logs))
...
error_logs = read_log_lines('app.log')  # Call again to create a new generator
recent_errors = [log for log in error_logs if '2025' in log]
For the Fibonacci code, you would call fib_numbers = fibonacci(10) again before printing the full list.
Obviously, there's a downside here: running the generator twice means processing the same data twice. This could probably be solved with some logic adjustments to the generator or to the calling code, but that will differ by application depending on what the generator is doing.
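One such adjustment, sketched below under the assumption that memory permits, is itertools.tee from the standard library, which splits one generator into independent iterators so each consumer sees every item without re-running the source. The critical_lines helper and the in-memory log_lines list here are hypothetical stand-ins for the article's file-based example:

```python
import itertools

# Hypothetical in-memory stand-in for reading app.log line by line.
log_lines = [
    "2025-07-14 22:30:44,061 __main__ INFO Application start",
    "2025-07-14 22:30:44,061 __main__ CRITICAL uncaught exception, application will terminate.",
]

def critical_lines(lines):
    for line in lines:
        if 'CRITICAL' in line:
            yield line.strip()

# tee() gives two independent iterators over one underlying generator.
for_count, for_filter = itertools.tee(critical_lines(log_lines))
error_count = len(list(for_count))
recent_errors = [log for log in for_filter if '2025' in log]

print(f"Found {error_count} CRITICAL lines")
print(f"Recent errors: {len(recent_errors)}")
```

Be aware that tee() buffers items internally until every copy has consumed them, so fully draining one iterator before touching the other (as above) trades memory for avoiding a second pass; in that situation, just building a list once is equally valid.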
Conclusion
The important thing to take away from this is that once you have iterated over an item in a generator, it is no longer part of the generator. This means that if you want to get clever and check whether a generator has more items, or peek at the next item, you have consumed that item.
The power of generators, especially when processing large amounts of data, can't be overstated. But, at the same time, it's important to know that reusing an exhausted generator, or attempting to access a previous item directly from the generator, isn't going to work. Instead, to reuse generator logic, call the generator function again to create a new generator object, or convert it to a list if memory allows and multiple iterations are needed.
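That consumption is easy to demonstrate with a short, hypothetical sketch: using next() to "peek" removes the item from the generator for good.

```python
def countdown(n):
    # Illustrative generator: yields n, n-1, ..., 1.
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
peeked = next(gen)      # "peeking" actually consumes the first item
remaining = list(gen)

print(peeked)           # 3
print(remaining)        # [2, 1] -- the peeked item is gone from the generator
```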
