When gathering data, we often run into slowness issues. Let’s say I have to call an API because it returns some data I am interested in. It is fairly easy to do that in Python synchronously, meaning one request at a time, each one starting after the previous one has finished. It works, but what if I could launch a second request while the first one is still running and hasn’t returned a response yet? If you’re considering this, it means you’re interested in concurrency. I will show you an example of how to do it in Python, let’s get started!
Concurrency vs. parallelism
First, I would like to introduce you to the definition of concurrency, compared to another concept it is often mistaken for: parallelism. I like the definition from this post:
Concurrency and parallelism are names for two different mechanisms for juggling tasks in programming. Concurrency involves allowing multiple jobs to take turns accessing the same shared resources, like disk, network, or a single CPU core. Parallelism is about allowing several tasks to run side by side on independently partitioned resources, like multiple CPU cores.
While parallelism is possible in Python using multiprocessing (multithreading only gives limited parallelism because of the GIL), concurrency is all about alternating between tasks in a smart way. For that, I will use Python’s asyncio library from the standard library. There is an excellent explanation of its basic async/await concepts here.
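To make the idea concrete, here is a minimal sketch of two coroutines taking turns on a single thread; the task names and delays are made up for illustration:

import asyncio

async def task(name, delay):
    print(f'{name} started')
    # await hands control back to the event loop, so the other task can run
    await asyncio.sleep(delay)
    print(f'{name} finished')

async def main():
    # both coroutines run concurrently on one thread, with no parallelism involved
    await asyncio.gather(task('A', 2), task('B', 1))

asyncio.run(main())  # asyncio.run is the Python 3.7+ shortcut

Both tasks start immediately, and 'B' finishes first even though 'A' was launched first: that is concurrency without parallelism.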
Asyncio
So, to summarize, I would like my API calls to run asynchronously. I don’t need the current API call to finish before issuing the next one. Instead, once the current API call is issued, if I don’t get a response right away, I would like my code to launch the next API call while the first one is still running. Let’s see how it is done.
This could be the synchronous version of the API call function. You can use your own API call instead, but for illustration purposes, I will just use http://httpbin.org/get or http://webcode.me/ as the URL. We probably want to perform multiple calls and save each response to a JSON file, e.g. 10,000 calls.
import requests

def call_api(url):
    resp = requests.get(url)
    return resp

def run(n=10000):
    url = 'http://webcode.me/'
    for i in range(n):
        response = call_api(url)
        # save to json or other processing
        with open(f'data/{i}_outputfile.json', 'w') as file:
            file.write(response.text)

run()
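If you want to see the slowness for yourself, you can wrap the call in a quick timer; a minimal sketch (run() and n are as defined above, and the numbers will of course depend on your network and the server):

import time

start = time.perf_counter()
run(n=100)  # start with a small n before trying 10,000
print(f'elapsed: {time.perf_counter() - start:.1f}s')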
The asynchronous version of the code is the following. I also show how to limit the number of simultaneous API calls using asyncio’s semaphore:
from aiohttp import ClientSession
import asyncio

async def async_call_api(url, session, sem, i):
    # the semaphore caps how many requests are in flight at once
    async with sem:
        async with session.get(url) as resp:
            text = await resp.text()
    # save to json or other processing
    with open(f'data/{i}_outputfile.json', 'w') as file:
        file.write(text)

async def async_run(n=10000):
    url = 'http://httpbin.org/get'
    sem = asyncio.Semaphore(500)  # at most 500 simultaneous calls
    async with ClientSession() as session:
        tasks = [asyncio.ensure_future(async_call_api(url, session, sem, i)) for i in range(n)]
        await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(async_run())
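A couple of remarks on the choices above: the value 500 passed to asyncio.Semaphore is an arbitrary cap on the number of in-flight requests, and you should tune it to what the target server tolerates. Also note that the file write at the end is still ordinary blocking I/O; for small JSON payloads this is usually negligible, but a library such as aiofiles can make it asynchronous too if it becomes a bottleneck.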
There is also an excellent post about asyncio here that I highly recommend. Coming next: a post using asyncio with multiple API calls, each involving two steps, a GET followed by a POST, that must be executed sequentially.
Thanks for reading!