What is Latency - meaning for developers

To understand the performance of any application you develop, you have to know one of its most influential parameters: latency.

1. Definition: what does latency mean?

Wikipedia's definition of latency is: latency is the time interval between the stimulation and the response.

In computing, latency is typically the time between requesting some information and receiving the response. Some examples explain it better. Latency is the time between:

the click of a mouse button and your application's response to that click
requesting data from a database and the database responding with the result
requesting data over the network and the service responding with the result

2. The impact of latency on your application's performance

First you have to understand latencies and their scale. Let us start with the basics.

Assume you have a CPU with a clock speed of 2GHz. One clock cycle then takes 1 / 2GHz = 1 / (2 * 10^9 * 1/s) = 0.5ns.
That means 0.5ns pass from one CPU operation to the next.
Theoretically your CPU could execute two giga-operations per second (1 / 0.5ns). For simplicity we ignore the fact that CPU instructions usually take multiple cycles. But the CPU also needs to access data.
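
The arithmetic above can be checked in a few lines of Python. This is only a sketch: the function name cycle_time_ns is made up for illustration, and like the text it assumes one operation per clock cycle.

```python
def cycle_time_ns(clock_hz: float) -> float:
    """Return the duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


clock_hz = 2e9  # 2 GHz, the example CPU from the text
period_ns = cycle_time_ns(clock_hz)

# Simplification from the text: one operation per clock cycle.
ops_per_second = 1e9 / period_ns

print(f"cycle time: {period_ns} ns")
print(f"theoretical operations per second: {ops_per_second:.0e}")
```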

Assume your data is in main memory only. The latency from your CPU to main memory is 50 - 120ns, so about two orders of magnitude slower than the CPU itself operates.

But what if your CPU needs to access the data from your disk?
The typical latency of a spinning hard disk drive is 4 - 10ms. Again, orders of magnitude slower.

But what if your CPU needs to access data from the LAN?
The typical latency is about 10ms.

But what if your CPU needs to access data from a webservice?
The typical latency is about 100 - 200ms.

You may think that, even though the differences in scale are large, a value of 200ms is still very fast?

Let's first look at the values as a table; afterwards we go through some comparable real-life examples.

3. Latency at human scale

An overview of the most important typical latencies:

Computing scale
component	typical latency	latency (ns)
2GHz CPU	1 cycle	0.5
Level 1 cache access	3-4 cycles	1.5
Level 2 cache access	20 cycles	10.0
Level 3 cache access	60-100 cycles	40.0
Main memory access (DDR DIMM)	50-120 ns	100.0
NVMe SSD I/O	20-40 µs	25 000.0
SSD I/O	150-200 µs	175 000.0
Rotational laptop HDD disk I/O	4-10 ms	5 000 000.0
Internet packet latency: 4000 km / 2500 miles	60 ms	60 000 000.0
Internet packet latency: 10000 km / 6200 miles	140 ms	140 000 000.0
Latency for a webservice	200 ms	200 000 000.0
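
One way to get a feeling for these numbers is to express each latency as the number of clock cycles the 2GHz CPU could have run while waiting. The following is a minimal sketch; the helper name cycles_waited is hypothetical, and the latencies are the representative values from the table above.

```python
CYCLE_NS = 0.5  # one clock cycle of the 2 GHz example CPU

# Representative latencies in nanoseconds, copied from the table above.
LATENCIES_NS = {
    "Level 1 cache access": 1.5,
    "Level 2 cache access": 10.0,
    "Level 3 cache access": 40.0,
    "Main memory access": 100.0,
    "NVMe SSD I/O": 25_000.0,
    "SSD I/O": 175_000.0,
    "Rotational HDD I/O": 5_000_000.0,
    "Webservice call": 200_000_000.0,
}


def cycles_waited(latency_ns: float, cycle_ns: float = CYCLE_NS) -> int:
    """Number of clock cycles that pass while waiting for one access."""
    return round(latency_ns / cycle_ns)


for component, latency_ns in LATENCIES_NS.items():
    print(f"{component:<24} {cycles_waited(latency_ns):>13,} cycles")
```

A single main memory access already costs about 200 cycles, and a single webservice call about 400 million.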

So in the bad case we have about 200ms of latency for the webservice access. That sounds very fast, but let us translate this "fast" speed to a human scale.

As a reference point we take the conscious operation of "seeking" information in our brain memory, which is comparable to a "seek" in main memory (DDR).
So 100ns on the computing scale becomes 100ms (0.1s) on the human scale as the reference value (see the main memory row in the table below).

You quickly see that, if we carry the same scale factor over from the computing scale to the human scale, the times become awfully slow. For example, a webservice call would be comparable to looking up information in a book on a different continent, where the library itself has to fetch the book via an archiving service. In sum, starting from 0.1s for brain memory access, the access time would explode to 55.56 hours for a single piece of information.

Now that sounds pretty awkward and extremely slow, doesn't it?

Computing scale		Human scale, with brain memory access as the reference
component	latency (ns)	latency (s)	latency (min)	latency (hour)	Human-scale comparison
2GHz CPU	0.5	0.0005	0.00001	0.00	unconscious operations
Level 1 cache access	1.5	0.0015	0.00003	0.00	unconscious operations
Level 2 cache access	10.0	0.01	0.00017	0.00	unconscious operations
Level 3 cache access	40.0	0.04	0.00067	0.00	unconscious operations
Main memory access (DDR DIMM)	100.0	0.1	0.00167	0.00	access information in your brain memory
NVMe SSD I/O	25 000.0	25	0.41667	0.01	access information in a book on your desk
SSD I/O	175 000.0	175	2.91667	0.05	access information in a book on your office bookshelf
Rotational laptop HDD disk I/O	5 000 000.0	5 000	83.33333	1.39	access information in a book in your town's library
Internet packet latency: 4000 km / 2500 miles	60 000 000.0	60 000	1 000.00000	16.67	access information in a book in the library of your country's capital city
Internet packet latency: 10000 km / 6200 miles	140 000 000.0	140 000	2 333.33333	38.89	access information in a book in a library on another continent
Latency for a webservice	200 000 000.0	200 000	3 333.33333	55.56	access information in a book in a library on another continent, via a service from the book archive
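
The conversion behind the table can be sketched in Python. It uses the reference point from the text (100ns of main memory access corresponds to 0.1s of brain memory access); the function and constant names are made up for this example.

```python
REFERENCE_NS = 100.0     # main memory access on the computing scale
REFERENCE_HUMAN_S = 0.1  # brain memory access on the human scale


def to_human_scale(latency_ns: float) -> tuple[float, float, float]:
    """Return the human-scale latency as (seconds, minutes, hours)."""
    seconds = latency_ns / REFERENCE_NS * REFERENCE_HUMAN_S
    return seconds, seconds / 60, seconds / 3600


# Three rows of the table above, recomputed.
for component, latency_ns in [
    ("Main memory access", 100.0),
    ("Rotational HDD I/O", 5_000_000.0),
    ("Webservice call", 200_000_000.0),
]:
    seconds, minutes, hours = to_human_scale(latency_ns)
    print(f"{component:<20} {seconds:>10.1f} s {minutes:>10.2f} min {hours:>6.2f} h")
```

With this reference point, every nanosecond on the computing scale becomes one millisecond on the human scale, a factor of one million.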

Some operations in our body are unconscious, so we mostly cannot feel them. Let us start with a reference we have all experienced: accessing information in our brain memory.

Memory access in our brain
If you need information that you remember (in your brain memory), you can access it with a latency of, let's say, 100ms (0.1 seconds). This is comparable to an access to the computer's main memory. The computer is much faster, at about 100 nanoseconds, but the comparison gives us a reference value for converting all other computing values to a human scale.
You cannot remember and need to fetch the information from a book
- Book on your desk
  Let's say you have the book on your desk: you will need maybe 25 seconds. In terms of scale, this is comparable to accessing information on a high-speed NVMe SSD.
- Book is in your town's library
  If you need to look up the information in your town's library, it will take about 1.39 hours. In terms of scale, this is comparable to accessing data on a rotational laptop hard disk drive (HDD).
- Book is in the central library of your country's capital city
  If you need to look up the information in the library of your country's capital city, it will take about 16.67 hours. In terms of scale, this is comparable to accessing data via the Internet over a distance of 4000km.
- Book is

We could discuss even more examples, but the ones above already explain the scale very well.

4. A real life latency example at human scale

Let me explain the effect of latency with a real-life example. Let's say you have to do some tasks for me, for example fetching information from the local library. I give you the first task: please look up the birthday and birthplace of George Washington in the local library. You go out, do the research, come back and present the information to me. Now I tell you: please go out and look up the birthday and birthplace of John Adams in the local library. You will already begin to think: why did he not tell me the first time? That would have been a lot faster. Anyway, you decide to go out and collect the information. You come back and get the next task: please fetch me the birthday and birthplace of Thomas Jefferson from the local library.
Wouldn't you call me some really bad and insulting names for this stupidity by now?
Of course, I could have asked you to fetch the birthdays and birthplaces of all presidents of the USA at once. This would have kept you in the library longer, but you would have had to travel there only once. So the single visit would have been slower than each of the earlier visits, but overall the travel latency would have been reduced to exactly one trip instead of more than 40.
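
The library example can be put into a small cost model. This is only a sketch: the travel time and research time are invented numbers for illustration, and the function name total_time is hypothetical.

```python
def total_time(items: int, round_trip: float, work_per_item: float, batch: bool) -> float:
    """Total time to fetch `items` results.

    Unbatched, every item pays the round-trip latency; batched, the
    round trip is paid once and only the per-item work is repeated.
    """
    trips = 1 if batch else items
    return trips * round_trip + items * work_per_item


# Hypothetical numbers: 40 presidents, 60 minutes of travel per round
# trip to the library, 5 minutes of research per president.
one_by_one = total_time(40, round_trip=60, work_per_item=5, batch=False)
all_at_once = total_time(40, round_trip=60, work_per_item=5, batch=True)

print(f"one trip per president: {one_by_one} minutes")
print(f"one trip for all:       {all_at_once} minutes")
```

The research time is the same in both cases; only the number of trips changes, and with it the total.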

Sadly, many developers do the same with databases, or with even slower resources like webservices. Instead of asking for the full set of data in one request, they loop over the resource and send one request per item.
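
As an illustration, here are both patterns side by side in Python with the standard sqlite3 module. This is a sketch under assumptions: the table and column names are made up, and an in-memory database has almost no latency, so the point is only the number of requests. Against a real database server or webservice, each query in the loop would pay the full round-trip latency.

```python
import sqlite3

# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE presidents (name TEXT PRIMARY KEY, birthday TEXT, birthtown TEXT)"
)
conn.executemany(
    "INSERT INTO presidents VALUES (?, ?, ?)",
    [
        ("George Washington", "1732-02-22", "Popes Creek"),
        ("John Adams", "1735-10-30", "Braintree"),
        ("Thomas Jefferson", "1743-04-13", "Shadwell"),
    ],
)
names = ["George Washington", "John Adams", "Thomas Jefferson"]

# Slow pattern: one query, and therefore one round trip, per name.
looped = {}
for name in names:
    row = conn.execute(
        "SELECT birthday, birthtown FROM presidents WHERE name = ?", (name,)
    ).fetchone()
    looped[name] = row

# Better pattern: ask for the full set in a single query.
placeholders = ", ".join("?" for _ in names)
batched = {
    name: (birthday, birthtown)
    for name, birthday, birthtown in conn.execute(
        f"SELECT name, birthday, birthtown FROM presidents WHERE name IN ({placeholders})",
        names,
    )
}

# Same data either way, but one round trip instead of len(names).
assert looped == batched
```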

5. Conclusion

Whenever you access resources, make sure you do it in the way that keeps latency low, for example by fetching everything you need in as few requests as possible. If you are unsure, remember the human scale.