What is windowing?

  • Window input data into fixed size windows
  • Process each of those windows separately
  • Example:
    • Sum of all website clicks every hour
    • Average of food order total cost every hour
    • Max latency from a bunch of metrics observed every hour

How does it work?

  • System essentially buffers up incoming data into windows until some amount of time has passed
  • After that result will be sent downstream

How does state work?

  • While the window is live all events are stored in memory
  • once window has been finalized and message is sent downstream, you clear that window’s data from memory

What time to use for windowing?

  • Processing time: you don’t care about actual time when event happened
    • website logs
    • system metrics
  • Event time
    • need a way to handle late data
    • in an unbounded stream you don’t know when you are done processing events from a particular event time window
    • Events can arrive late and out of order
    • Example:
      • summing all events from 10AM-11AM
      • You sum 500 at 11 AM, but then you get another event. What do you do?