Concurrency in Python

Parallelism consists of performing multiple operations at the same time. Multiprocessing is a means to effect parallelism, and it entails spreading tasks over a computer’s central processing units (CPUs, or cores). Multiprocessing is well-suited for CPU-bound tasks: tightly bound for loops and mathematical computations usually fall into this category.

Concurrency is a slightly broader term than parallelism. It suggests that multiple tasks have the ability to run in an overlapping manner. (There’s a saying that concurrency does not imply parallelism.)

Threading is a concurrent execution model whereby multiple threads take turns executing tasks. One process can contain multiple threads. Python has a complicated relationship with threading thanks to its GIL, but that’s beyond the scope of this article.

What’s important to know about threading is that it’s better for IO-bound tasks. While a CPU-bound task is characterized by the computer’s cores continually working hard from start to finish, an IO-bound job is dominated by a lot of waiting on input/output to complete.

To recap the above, concurrency encompasses both multiprocessing (ideal for CPU-bound tasks) and threading (suited for IO-bound tasks). Multiprocessing is a form of parallelism, with parallelism being a specific type (subset) of concurrency. The Python standard library has offered longstanding support for both of these through its multiprocessing, threading, and concurrent.futures packages.
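As a minimal sketch of the IO-bound case, the snippet below runs four waiting tasks on a thread pool from concurrent.futures. The function name fake_io_task and the 0.2 second delay are made up for illustration; the sleep only stands in for a real network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id: int, delay: float = 0.2) -> int:
    """Hypothetical IO-bound task: sleeping stands in for a network or disk wait."""
    time.sleep(delay)
    return task_id * 10


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in the same order as the inputs
    results = list(pool.map(fake_io_task, range(4)))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Because the threads wait at the same time, the total should be close to one delay rather than four. For CPU-bound work, ProcessPoolExecutor offers the same interface; in a script it is usually called from under an `if __name__ == "__main__":` guard.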

Update Python on Ubuntu

When there are multiple versions of Python on the system, we need to choose which one is the default. Below, we assume a newer version, python3.9, is being installed.

sudo apt install python3.9

sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.[old-version] 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 2

# after running the following command, select the new version when prompted and press Enter
sudo update-alternatives --config python3

# A simpler way is to repoint the symlink directly, but from a system admin perspective this is not scalable.
sudo ln -sf /usr/bin/python3.9 /usr/bin/python3

Find Out Python Binary File Path

# list a package's install location and its files
pip show -f package-name

>>> import streamlit
>>> print(streamlit.__file__)
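The same lookup can be done with the standard library alone. This is a small sketch that prints the running interpreter and locates a module on disk; json is used only as a stand-in for whichever package you are interested in.

```python
import importlib.util
import sys

# Interpreter that is running this code, and its version
print(sys.executable)
print(sys.version_info[:3])

# Locate a top-level module on disk; `json` is just a stand-in here
spec = importlib.util.find_spec("json")
module_path = spec.origin
print(module_path)
```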

Process Got Killed

Once in a while, my Python process gets killed without any error message. Most of the time, this is caused by an out-of-memory (OOM) condition. We can quickly check for that with the following command:

dmesg | grep "oom-kill" | less
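To see how close a process gets to the limit before the kernel steps in, one option is to log its peak memory from inside Python. The sketch below uses the Unix-only resource module; the helper name peak_rss_mb is made up for illustration.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


data = [0] * 1_000_000  # allocate something so the number is non-trivial
print(f"peak RSS: {peak_rss_mb():.1f} MiB")
```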

Virtual Env

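# upgrade pip
python3 -m pip install --upgrade pip
# optional: only needed for the third-party virtualenv tool; venv below is built in
python3 -m pip install --user virtualenv
# create an environment in .venv and activate it
python3 -m venv .venv
source .venv/bin/activate

# exit the environment
deactivate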
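To confirm from inside Python that the environment is active, a common check compares sys.prefix with sys.base_prefix. This is a small sketch; the helper name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print(in_virtualenv())
print(sys.prefix)
```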

Debug

# using pdb
import pdb; pdb.set_trace()


# using ipdb in torch.distributed code: break only on rank 0
import torch
if torch.distributed.get_rank() == 0:
    import ipdb; ipdb.set_trace()


# using ipdb. The program will drop into ipdb at the exception.
# filter_even is deliberately buggy: deleting items while indexing raises IndexError.
from ipdb import launch_ipdb_on_exception
def filter_even(nums):
    for i in range(len(nums)):
        if nums[i] % 2 == 0:
            del nums[i]

with launch_ipdb_on_exception():
    print(filter_even(list(range(6))))
# Inside ipdb we can inspect the stack and variables such as `i` and `nums`, and also execute the next step.

# We can also install an exception hook. The program will error out and enter ipdb.
import sys
from IPython.core import ultratb
sys.excepthook = ultratb.FormattedTB(call_pdb=1)

def filter_even(nums):
    for i in range(len(nums)):
        if nums[i] % 2 == 0:
            del nums[i]

filter_even(list(range(6)))

Exception

How to catch a generic exception type

try:
    someFunction()
except Exception as ex:
    template = "An exception of type {0} occurred. Arguments:\n{1!r}"
    message = template.format(type(ex).__name__, ex.args)
    print(message)

The difference between the above and using just except without any argument is twofold: (1) a bare except doesn’t give you the exception object to inspect, and (2) the exceptions SystemExit, KeyboardInterrupt and GeneratorExit aren’t caught by the above code, which is generally what you want.
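The second point can be checked with a small sketch. A bare except behaves like except BaseException, so BaseException is used below to stand in for it; the helper name run is made up for illustration.

```python
import sys


def run(handler_type):
    """Raise SystemExit in a try block and report whether the handler caught it."""
    try:
        try:
            sys.exit(1)
        except handler_type:
            return "caught"
    except SystemExit:
        # Reached only when the inner handler let SystemExit through
        return "propagated"


print(run(Exception))      # propagated
print(run(BaseException))  # caught
```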

If you also want the same stacktrace you get if you do not catch the exception, you can get that like this (still inside the except clause):

import traceback
print(traceback.format_exc())
# or
traceback.print_exc()

If you use the logging module, you can print the exception to the log (along with a message) like this (again inside the except clause):

import logging
log = logging.getLogger()
log.exception("Message for you, sir!")
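A self-contained sketch of the logging approach follows. It writes to an in-memory stream so the result is easy to inspect; the logger name exception_demo is made up for illustration.

```python
import io
import logging

stream = io.StringIO()
log = logging.getLogger("exception_demo")  # hypothetical logger name
log.addHandler(logging.StreamHandler(stream))
log.propagate = False  # keep the demo output out of the root logger

try:
    1 / 0
except Exception:
    # exception() logs at ERROR level and appends the current traceback
    log.exception("Division failed")

captured = stream.getvalue()
print(captured)
```

The captured text starts with the message and is followed by the usual traceback ending in the ZeroDivisionError line.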

To dig deeper and examine the stack, look at variables etc., use the post_mortem function of the pdb module inside the except block:

import pdb
pdb.post_mortem()

__new__ Method

  • __new__() is a static method of the object class.
  • When you create a new object by calling the class, Python calls the __new__() method to create the object first and then calls the __init__() method to initialize the object’s attributes.
  • Override the __new__() method if you want to tweak the object at creation time.
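The points above can be sketched as follows. The class names Point and Celsius are made up for illustration: the first records the call order, and the second overrides __new__ on an immutable type, where __init__ would be too late to change the value.

```python
class Point:
    calls = []  # records the order in which the two hooks run

    def __new__(cls, x, y):
        Point.calls.append("__new__")
        # object.__new__ only needs the class; x and y go to __init__
        return super().__new__(cls)

    def __init__(self, x, y):
        Point.calls.append("__init__")
        self.x = x
        self.y = y


class Celsius(float):
    def __new__(cls, value):
        # A float is immutable, so the value must be adjusted at creation time
        return super().__new__(cls, round(value, 1))


p = Point(1, 2)
print(Point.calls)
print(Celsius(21.456))
```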

Hacky Way to Add File into PYTHONPATH

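import os
import sys

# add the directory of the current file and its parent directory to the import path
curr_file_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
sys.path.append(curr_file_path)
sys.path.append(os.path.dirname(curr_file_path))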
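If the same file may run more than once, a small guard avoids duplicate entries. This is a sketch using pathlib; the helper name add_to_sys_path is made up for illustration, and a temporary directory stands in for a real project folder.

```python
import sys
import tempfile
from pathlib import Path


def add_to_sys_path(directory) -> str:
    """Append a directory to sys.path once and return the resolved path."""
    resolved = str(Path(directory).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    added = add_to_sys_path(tmp)
    add_to_sys_path(tmp)  # the second call is a no-op
    entries = sys.path.count(added)
    print(entries)
    sys.path.remove(added)  # clean up after the demo
```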

Subprocess

import subprocess
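# Download; repo_path and repo_dir are assumed to be defined elsewhere.
# Popen returns immediately; call dl.wait() to block until the clone finishes.
dl = subprocess.Popen(["git", "clone", str(repo_path), str(repo_dir)])

# Popen also accepts a single str as the input command when shell=True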
output_path = "my_out"
cmd = """python3 train.py --local-rank 0"""
proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=open(f"{output_path}.stderr", "wt", encoding="utf-8"),
    shell=True,
    encoding='utf-8',
    bufsize=0)
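When the goal is simply to run a command and collect its output, subprocess.run is often enough. The sketch below starts a child Python interpreter so that it runs anywhere; the command itself is only a placeholder.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode the output to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(result.returncode)
print(result.stdout.strip())
```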

Dataclass

Data classes use something called a default_factory to handle mutable default values. To use default_factory, we need to use the field() specifier.

from dataclasses import dataclass, field
from typing import List

RANKS = '2 3 4 5 6 7 8 9 10 J Q K A'.split()
SUITS = '♣ ♢ ♡ ♠'.split()

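# Assumed minimal card type so the example is self-contained
@dataclass
class PlayingCard:
    rank: str
    suit: str

def make_french_deck():
    return [PlayingCard(r, s) for s in SUITS for r in RANKS]

@dataclass
class Deck:
    # A mutable default value cannot be assigned to a field directly; this raises ValueError:
    # cards: List[PlayingCard] = make_french_deck()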

    cards: List[PlayingCard] = field(default_factory=make_french_deck)

A few commonly used parameters that field() supports:

  • default: Default value of the field
  • default_factory: Function that returns the initial value of the field
  • init: Use field in the .__init__() method? (Default is True.)
  • repr: Use field in the repr of the object? (Default is True.)
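The four parameters above can be sketched in one small class. Player and its field names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Player:
    name: str
    score: int = field(default=0)
    # default_factory gives every instance its own list
    hand: List[str] = field(default_factory=list)
    # excluded from both __init__ and the repr
    secret_id: int = field(default=42, init=False, repr=False)


a = Player("Ann")
b = Player("Bob", score=3)
a.hand.append("ace of spades")

print(a)
print(b.hand)
```

Appending to a.hand leaves b.hand empty because each instance gets a fresh list, and secret_id cannot be passed to the constructor.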

References

  1. https://realpython.com/python-data-classes/