Python

Concurrency in Python

Parallelism consists of performing multiple operations at the same time. Multiprocessing is a means to effect parallelism, and it entails spreading tasks over a computer’s central processing units (CPUs, or cores). Multiprocessing is well-suited for CPU-bound tasks: tight for loops and mathematical computations usually fall into this category.

Concurrency is a slightly broader term than parallelism. It suggests that multiple tasks have the ability to run in an overlapping manner. (There’s a saying that concurrency does not imply parallelism.)

Threading is a concurrent execution model whereby multiple threads take turns executing tasks. One process can contain multiple threads. Python has a complicated relationship with threading thanks to its global interpreter lock (GIL), but that’s beyond the scope of this article.

What’s important to know about threading is that it’s better for IO-bound tasks. While a CPU-bound task is characterized by the computer’s cores continually working hard from start to finish, an IO-bound job is dominated by a lot of waiting on input/output to complete.

To recap the above, concurrency encompasses both multiprocessing (ideal for CPU-bound tasks) and threading (suited for IO-bound tasks). Multiprocessing is a form of parallelism, with parallelism being a specific type (subset) of concurrency. The Python standard library has offered longstanding support for both of these through its multiprocessing, threading, and concurrent.futures packages.
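As a minimal sketch of the threading side of this recap, the example below runs a few simulated IO-bound tasks through `concurrent.futures.ThreadPoolExecutor`. The function `fake_io_task` and the sleep duration are made up for illustration; `time.sleep` stands in for a real network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id):
    """Hypothetical IO-bound task: the sleep stands in for waiting on IO."""
    time.sleep(0.2)
    return task_id * 2


def run_threaded(n_tasks=4):
    """Run the tasks in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # map() preserves the input order of the results
        results = list(pool.map(fake_io_task, range(n_tasks)))
    return results, time.perf_counter() - start


results, elapsed = run_threaded()
print(results, f"{elapsed:.2f}s")
```

Because the threads spend their time waiting rather than computing, the total time should be close to one task's wait rather than the sum of all of them. For CPU-bound work, `ProcessPoolExecutor` from the same package exposes the same `map` and `submit` interface but runs the tasks in separate processes.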

Update Python on Ubuntu

When multiple versions of Python are installed on the system, we can choose which one the python3 command uses by default. The steps below assume we are installing the newer Python 3.9.

sudo apt install python3.9

sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.[old-version] 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 2

# after running the following command, select the new version when prompted and press Enter
sudo update-alternatives --config python3

# A simpler alternative is to overwrite the symlink directly, although from a system administration perspective this does not scale.
sudo ln -sf /usr/bin/python3.9 /usr/bin/python3

Find the files related to a Python package

pip show -f package-name

Find out the Python binary file path

>>> import streamlit
>>> print(streamlit.__file__)
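The same idea can be sketched with only the standard library. The snippet below prints both the interpreter binary (`sys.executable`) and the file a module was loaded from (`__file__`). The `json` module is used here only because it ships with Python; any importable package can be substituted.

```python
import json  # stand-in for any installed package
import sys

# Path of the Python interpreter that is running this code
interpreter_path = sys.executable

# Path of the file the module was imported from
module_path = json.__file__

print(interpreter_path)
print(module_path)
```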

Process Got Killed

Once in a while, a Python process gets killed without reporting any error. Most of the time this is caused by an out-of-memory (OOM) condition. We can quickly check the kernel log for it using the following command

dmesg | grep "oom-kill" | less
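To see how close a process gets to its memory limit before the kernel steps in, one option is to log its peak resident memory from inside Python. This is a rough sketch, assuming a Unix-like system (the `resource` module is not available on Windows); the helper name `peak_rss_mib` is made up for this example.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


data = [0] * 1_000_000  # allocate something so the number is not trivial
print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

Calling the helper at a few checkpoints in a long-running job can show which stage drives memory up.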

Virtual Env

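# upgrade pip
python3 -m pip install --upgrade pip
# optional: venv ships with Python 3, so virtualenv is only needed if you prefer that tool
python3 -m pip install --user virtualenv
# create a virtual environment in .venv and activate it
python3 -m venv .venv
source .venv/bin/activate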

# exit
deactivate
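A quick way to confirm from inside Python whether a virtual environment is active is to compare `sys.prefix` with `sys.base_prefix`. This is a small sketch for environments created with `python3 -m venv`; the function name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("virtual env active:", in_virtualenv())
print("prefix:", sys.prefix)
```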

Debug

# using pdb
# use `n` to step over
# use `s` to step into
# use `ll` to list the source of the current function
# see more here: https://realpython.com/python-debugging-pdb/
import pdb; pdb.set_trace()


# using ipdb; break only on rank 0 of a torch.distributed job
import torch.distributed

if torch.distributed.get_rank() == 0:
    import ipdb; ipdb.set_trace()


# using ipdb. The program drops into an ipdb post-mortem session when an exception is raised
from ipdb import launch_ipdb_on_exception

def filter_even(nums):
    # deliberately buggy: deleting while iterating by index raises IndexError
    for i in range(len(nums)):
        if nums[i] % 2 == 0:
            del nums[i]

with launch_ipdb_on_exception():
    print(filter_even(list(range(6))))
# in the session we can inspect the stack and variables such as `i` and `nums`
    
# Alternatively, install an exception hook so that any uncaught exception opens the debugger.
import sys
from IPython.core import ultratb
sys.excepthook = ultratb.FormattedTB(call_pdb=1)

def filter_even(nums):
    for i in range(len(nums)):
        if nums[i] % 2 == 0:
            del nums[i]

filter_even(list(range(6)))
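For reference, the `filter_even` example fails because the list shrinks while the loop still walks the original index range. The sketch below reproduces the error without a debugger and shows one possible fix; the names `filter_even_buggy` and `drop_even` are made up for this example.

```python
def filter_even_buggy(nums):
    """Same logic as the debugging example: mutates the list while indexing it."""
    for i in range(len(nums)):
        if nums[i] % 2 == 0:
            del nums[i]


def drop_even(nums):
    """Build a new list instead of deleting in place."""
    return [n for n in nums if n % 2 != 0]


try:
    filter_even_buggy(list(range(6)))
    error_name = None
except IndexError as ex:
    # The list is down to 3 items by the time the loop asks for index 3
    error_name = type(ex).__name__

fixed = drop_even(list(range(6)))
print(error_name, fixed)
```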

Exception

How to catch a generic exception type

try:
    someFunction()
except Exception as ex:
    template = "An exception of type {0} occurred. Arguments:\n{1!r}"
    message = template.format(type(ex).__name__, ex.args)
    print(message)

The difference between the above and a bare except without any argument is twofold: (1) a bare except doesn’t give you the exception object to inspect, and (2) the exceptions SystemExit, KeyboardInterrupt and GeneratorExit aren’t caught by the above code, which is generally what you want.
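The second point follows from the exception class hierarchy: those three exceptions derive from `BaseException` but not from `Exception`. A short check, using only built-in names:

```python
checked = {}
for exc_type in (SystemExit, KeyboardInterrupt, GeneratorExit, ValueError):
    # True means `except Exception` would catch this type
    checked[exc_type.__name__] = issubclass(exc_type, Exception)

for name, caught in checked.items():
    print(f"{name}: caught by 'except Exception' = {caught}")
```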

If you also want the same stacktrace you get if you do not catch the exception, you can get that like this (still inside the except clause):

import traceback
print(traceback.format_exc())
# or
traceback.print_exc()

If you use the logging module, you can print the exception to the log (along with a message) like this:

import logging
log = logging.getLogger()
log.exception("Message for you, sir!")

To dig deeper and examine the stack, look at variables etc., use the post_mortem function of the pdb module inside the except block:

import pdb
pdb.post_mortem()
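The pieces above can be combined into one self-contained sketch. The function `risky_division` is a made-up stand-in for the code that may fail; the handler builds a short summary, captures the full traceback as a string, and writes both to the log.

```python
import logging
import traceback

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger(__name__)


def risky_division(a, b):
    """Hypothetical function that raises ZeroDivisionError when b is 0."""
    return a / b


def safe_call():
    try:
        return risky_division(1, 0)
    except Exception as ex:
        summary = f"{type(ex).__name__}: {ex.args!r}"
        stack = traceback.format_exc()  # same text an uncaught error would print
        log.exception("risky_division failed")  # logs the message plus the traceback
        return summary, stack


summary, stack = safe_call()
print(summary)
```

Note that `log.exception` and `traceback.format_exc` only have an exception to report while an except block is being executed.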

__new__ method

The __new__() is a static method of the object class.
When you create a new object by calling the class, Python calls the __new__() method to create the object first and then calls the __init__() method to initialize the object’s attributes.
Override the __new__() method if you want to tweak the object at creation time.
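The call order can be observed with a small class that records each hook as it runs. This is an illustrative sketch; the class `Tracked` and its attributes are made up for this example.

```python
class Tracked:
    calls = []  # records the order in which the hooks run

    def __new__(cls, name):
        cls.calls.append("__new__")
        instance = super().__new__(cls)  # object.__new__ takes only the class
        instance.created_by_new = True   # tweak the object at creation time
        return instance

    def __init__(self, name):
        self.calls.append("__init__")
        self.name = name


item = Tracked("demo")
print(Tracked.calls)
print(item.name, item.created_by_new)
```

Python passes the same arguments to both hooks, which is why `__new__` accepts `name` even though it does not use it.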

Hacky Way to Add File into PYTHONPATH

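import os
import sys

# directory that contains the current file
curr_file_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
# make that directory and its parent importable
sys.path.append(curr_file_path)
sys.path.append(os.path.dirname(curr_file_path))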

Subprocess

import subprocess
# Download (repo_path and repo_dir are assumed to be defined earlier)
dl = subprocess.Popen(["git", "clone", str(repo_path), str(repo_dir)])

# Popen also accepts the command as a single string when shell=True
output_path = "my_out"
cmd = """python3 train.py --local-rank 0"""
proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=open(f"{output_path}.stderr", "wt", encoding="utf-8"),
    shell=True,
    encoding='utf-8',
    bufsize=0)
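When the parent only needs to wait for the child and read its output, `subprocess.run` is often simpler than managing a `Popen` object. The sketch below is self-contained: it launches the current interpreter (`sys.executable`) with a one-line script, so it does not depend on `git` or a training script being present.

```python
import subprocess
import sys

completed = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode the output to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

child_output = completed.stdout.strip()
print(completed.returncode, child_output)
```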

Dataclass

Data classes use something called a default_factory to handle mutable default values. To use default_factory, we need to use the field() specifier.

from dataclasses import dataclass, field
from typing import List

RANKS = '2 3 4 5 6 7 8 9 10 J Q K A'.split()
SUITS = '♣ ♢ ♡ ♠'.split()

# minimal card type assumed for this example
@dataclass
class PlayingCard:
    rank: str
    suit: str

def make_french_deck():
    return [PlayingCard(r, s) for s in SUITS for r in RANKS]

@dataclass
class Deck:
    # A mutable default such as a list cannot be assigned directly;
    # dataclasses raises ValueError for the following line:
    # cards: List[PlayingCard] = make_french_deck()

    cards: List[PlayingCard] = field(default_factory=make_french_deck)

A few commonly used parameters that field() supports:

default: Default value of the field
default_factory: Function that returns the initial value of the field
init: Use field in the __init__() method? (Default is True.)
repr: Use field in the repr of the object? (Default is True.)
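The four parameters listed above can be seen together in one small data class. This is an illustrative sketch; the class `Job` and its fields are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    retries: int = field(default=3)                     # plain default value
    tags: list = field(default_factory=list)            # fresh list per instance
    secret: str = field(default="hidden", repr=False)   # left out of repr()
    status: str = field(default="pending", init=False)  # not an __init__ argument


a = Job("train")
b = Job("eval")
a.tags.append("gpu")  # does not affect b.tags

try:
    Job("test", status="done")
    init_rejected = False
except TypeError:
    # status is excluded from __init__, so passing it is an error
    init_rejected = True

print(a)
print(b.tags, init_rejected)
```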

Global Variable in Python

Python looks for variables in four different scopes:

The local, or function-level, scope, which exists inside functions
The enclosing, or non-local, scope, which appears in nested functions
The global scope, which exists at the module level
The built-in scope, which is a special scope for Python’s built-in names

The code snippet below shows how these scopes work.

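# Global scope
some_variable = "defined at module level"

def outer_func():
    # Non-local scope
    def inner_func():
        # Local scope
        print(some_variable)  # not found locally or in outer_func, so the global is used
    inner_func()

outer_func()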

Note that a global variable is global only at the module level; Python has no program-wide global variables. After defining a global variable in one module, other modules can use it by importing that module and accessing the variable through the module name.
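The sketch below shows how names in the global and enclosing scopes can be rebound with the `global` and `nonlocal` keywords, and how a module-level variable is reached through its module name. To keep the example in a single file, `types.ModuleType` stands in for a separate file; the module name `settings` and its `DEBUG` variable are hypothetical.

```python
import types

counter = 0  # global scope: lives in this module's namespace


def increment():
    global counter  # rebind the module-level name instead of creating a local
    counter += 1


def make_accumulator():
    total = 0  # enclosing (non-local) scope for add()

    def add(x):
        nonlocal total  # rebind the name in the enclosing function
        total += x
        return total

    return add


increment()
increment()
acc = make_accumulator()
acc(5)
running_total = acc(10)

# Stand-in for a separate file named settings.py (hypothetical module)
settings = types.ModuleType("settings")
settings.DEBUG = False


def enable_debug():
    settings.DEBUG = True  # visible to every module that accesses settings.DEBUG


enable_debug()
print(counter, running_total, settings.DEBUG)
```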

UV

Common Usages


# install
pip install uv

# Add the following line to zshrc or bashrc after pip install to get the binary in PATH
# export PATH=`python3 -m site --user-base`/bin:$PATH

# initialize a project
uv init

# create virtual env
uv venv
uv venv .my_venv_path

# specify python version in venv (-p is the short form of --python)
uv venv -p 3.11
uv venv --python 3.11

source .my_venv_path/bin/activate


# install packages using uv
uv pip install pandas 

# update package
uv pip install -U pandas

# install from local directory
uv pip install .
# install from local, support editable
uv pip install -e .


# uninstall 
uv pip uninstall pandas

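# resolve the packages listed in my_packages.txt into pinned versions
uv pip compile my_packages.txt -o requirements.txt
# same, but read the package list from stdin
uv pip compile - -o requirements.txt
# make the environment match requirements.txt exactly
uv pip sync requirements.txt


# remove unused cache entries
uv cache prune
# remove all cache entries
uv cache clean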

Python Development Mode

pip install -e path/to/SomeProject

Editable installs allow you to install your project without copying any files. Instead, the files in the development directory are added to Python’s import path. This approach is well suited for development and is also known as a “development installation”.
