Created
October 10, 2017 08:16
Cool Python tips from https://www.quora.com/What-are-some-cool-Python-tricks/answer/Alex-Maison-4?srid=YG05
The following tricks I find pretty useful in my daily Python work. I also added a few I stumbled upon lately.
1. Use collections
This really makes your code more elegant and less verbose. A few examples I absorbed this week:
Named tuples:
>>> import collections
>>> Point = collections.namedtuple('Point', ['x', 'y'])
>>> p = Point(x=1.0, y=2.0) | |
>>> p | |
Point(x=1.0, y=2.0) | |
Now you can access fields by name, which is much more readable than indexing into the tuple by position:
>>> p.x
1.0
>>> p.y
2.0
Elegantly used when looping through a CSV file:
import csv
from collections import namedtuple
with open('stock.csv') as f:
    f_csv = csv.reader(f)
    headings = next(f_csv)
    Row = namedtuple('Row', headings)
    for r in f_csv:
        row = Row(*r)  # note the star unpacking
        # ... process row ...
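To try the pattern without a file on disk, here is a minimal sketch that reads from an in-memory string instead. The column names and values are made up for illustration (the original does not show the contents of stock.csv), and the header names have to be valid Python identifiers for namedtuple to accept them.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for stock.csv; the real file is not shown in the original.
data = io.StringIO("symbol,price,shares\nAA,39.48,100\nIBM,91.10,50\n")

f_csv = csv.reader(data)
headings = next(f_csv)             # first row holds the column names
Row = namedtuple('Row', headings)  # one field per column

shares_by_symbol = {}
for r in f_csv:
    row = Row(*r)                  # unpack the list of cells into named fields
    shares_by_symbol[row.symbol] = int(row.shares)

print(shares_by_symbol)
```

Reading row.symbol and row.shares is easier to follow than r[0] and r[2], which is the whole point of the trick.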
I like the star unpacking feature for soaking up the fields I don't need:
>>> line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'
>>> uname, *fields, homedir, sh = line.split(':')
>>> uname | |
'nobody' | |
>>> homedir | |
'/var/empty' | |
>>> sh | |
'/usr/bin/false' | |
Super convenient: the defaultdict:
from collections import defaultdict
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)
Before, I would initialize the list each time, which leads to needless code:
if row['date'] not in rows_by_date:
    rows_by_date[row['date']] = []
You can use OrderedDict to preserve the insertion order of keys (since Python 3.7 a plain dict keeps insertion order too):
>>> import collections | |
>>> d = collections.OrderedDict() | |
>>> d['a'] = 'A' | |
>>> d['b'] = 'B' | |
>>> d['c'] = 'C' | |
>>> d['d'] = 'D' | |
>>> d['e'] = 'E' | |
>>> for k, v in d.items():
...     print(k, v)
...
a A | |
b B | |
c C | |
d D | |
e E | |
Another nice one is Counter: | |
from collections import Counter
words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    'my', 'eyes', "you're", 'under'
]
word_counts = Counter(words)
top_three = word_counts.most_common(3)
print(top_three)
# Outputs [('eyes', 8), ('the', 5), ('look', 4)]
Again, before I would write most_common manually. Not necessary, this is all done already somewhere in the stdlib :) | |
2. sorted() accepts a key argument, which lets you sort on something other than the items themselves
Here, for example, we sort on surname:
>>> names = ['David Beazley', 'Brian Jones', 'Raymond Hettinger', 'Ned Batchelder']
>>> sorted(names, key=lambda name: name.split()[-1].lower())
['Ned Batchelder', 'David Beazley', 'Raymond Hettinger', 'Brian Jones']
3. Create XML from dict
Creating XML tags manually is usually a bad idea. I bookmarked this simple dict_to_xml helper:
from xml.etree.ElementTree import Element
def dict_to_xml(tag, d):
    '''
    Turn a simple dict of key/value pairs into XML
    '''
    elem = Element(tag)
    for key, val in d.items():
        child = Element(key)
        child.text = str(val)
        elem.append(child)
    return elem
4. One-liner to see if there are any Python files in a particular directory
Sometimes ‘any’ is pretty useful:
import os
files = os.listdir('dirname')
if any(name.endswith('.py') for name in files):
    print('Found Python files')
5. Use set operations to match common items in lists | |
>>> a = [1, 2, 3, 'a'] | |
>>> b = ['a', 'b', 'c', 3, 4, 5] | |
>>> set(a).intersection(b) | |
{3, 'a'} | |
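One thing to keep in mind is that a set does not remember the order of the original lists. If order matters, a small variation (my own sketch, not from the original answer) is to turn only one list into a set and filter the other one:

```python
a = [1, 2, 3, 'a']
b = ['a', 'b', 'c', 3, 4, 5]

b_items = set(b)  # fast membership tests

# Keep the order in which items appear in a
common_in_order = [item for item in a if item in b_items]
only_in_a = [item for item in a if item not in b_items]

print(common_in_order)
print(only_in_a)
```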
6. Use re.compile
If you are going to check a regular expression in a loop, don’t do this:
import re
for i in longlist:
    if re.match(r'^...', i):
        ...
Instead, compile the regex once and reuse the pattern object:
p = re.compile(r'^...')
for i in longlist:
    if p.match(i):
        ...
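A concrete version of the same idea, as a sketch. The log lines and the ERROR pattern are invented for the example, since the original leaves the regex and the list elided:

```python
import re

# Made-up input; the original only calls this 'longlist'.
longlist = ['ERROR disk full', 'INFO started', 'ERROR timeout']

# Compile once, outside the loop
pattern = re.compile(r'^ERROR\b')

# Reuse the compiled pattern for every item
errors = [line for line in longlist if pattern.match(line)]
print(errors)
```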
7. Printing filenames with potentially bad (Unicode) characters
The book suggests this convention for printing filenames of unknown origin without errors:
def bad_filename(filename):
    return repr(filename)[1:-1]
try:
    print(filename)
except UnicodeEncodeError:
    print(bad_filename(filename))
Handling Unicode characters in files can be nasty because they can blow up your script. However, the logic behind it is not that hard to grasp. A good snippet to bookmark is the encoding / decoding of Unicode:
>>> import unicodedata
>>> a
'pýtĥöñ is awesome\n' | |
>>> b = unicodedata.normalize('NFD', a) | |
>>> b.encode('ascii', 'ignore').decode('ascii') | |
'python is awesome\n' | |
O’Reilly has a course on Working with Unicode in Python. | |
8. Print is pretty cool (Python 3) | |
I am probably not the only one writing this kind of join operation:
>>> row = ["1", "bob", "developer", "python"] | |
>>> print(','.join(str(x) for x in row)) | |
1,bob,developer,python | |
Turns out you can just write it like this: | |
>>> print(*row, sep=',') | |
1,bob,developer,python | |
Note again the * unpacking. | |
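print() has a few more keyword arguments in the same spirit. A small sketch: end replaces the trailing newline and file redirects the output. The StringIO buffer is only there so the result can be inspected; normally you would pass an open file or leave it out.

```python
import io

row = ["1", "bob", "developer", "python"]

buf = io.StringIO()
# sep goes between the items, end replaces the default newline,
# file sends the text somewhere other than stdout
print(*row, sep=',', end=';\n', file=buf)

line = buf.getvalue()
print(line, end='')
```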
9. Functions like sum() accept generators / use the right variable type | |
I wrote this at a conference to earn me a coffee mug ;) | |
total = 0  # not named 'sum', which would shadow the built-in used below
for i in range(1300):
    if i % 3 == 0 or i % 5 == 0:
        total += i
print(total)
This prints 394118. While handing it in I realized this could be written much shorter and more efficiently:
>>> sum(i for i in range(1300) if i % 3 == 0 or i % 5 == 0) | |
394118 | |
A generator: | |
lines = (line.strip() for line in f) | |
is more memory efficient than: | |
lines = [line.strip() for line in f] # loads whole list into memory at once | |
And concatenating strings piece by piece is inefficient, since each += may copy the whole string built so far:
s = "line1\n" | |
s += "line2\n" | |
s += "line3\n" | |
print(s) | |
Better build up a list and join when printing: | |
lines = [] | |
lines.append("line1") | |
lines.append("line2") | |
lines.append("line3") | |
print("\n".join(lines)) | |
Another one I liked from the cookbook: | |
portfolio = [
    {'name': 'GOOG', 'shares': 50},
    {'name': 'YHOO', 'shares': 75},
    {'name': 'AOL', 'shares': 20},
    {'name': 'SCOX', 'shares': 65}
]
min_shares = min(s['shares'] for s in portfolio)
One line to get the min of a numeric value in a nested data structure.
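If you want the whole record rather than just the number, min() also takes a key argument, like sorted() does. A short sketch using the same portfolio data:

```python
portfolio = [
    {'name': 'GOOG', 'shares': 50},
    {'name': 'YHOO', 'shares': 75},
    {'name': 'AOL', 'shares': 20},
    {'name': 'SCOX', 'shares': 65},
]

# key tells min() what to compare; the result is the original dict
smallest = min(portfolio, key=lambda s: s['shares'])
largest = max(portfolio, key=lambda s: s['shares'])

print(smallest['name'], largest['name'])
```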
10. Enumerate lines in a for loop
You can number lines (or whatever you are looping over) and start at 1 via the second argument. This is a nice debugging technique:
for lineno, line in enumerate(lines, 1):  # start counting at 1
    fields = line.split()
    try:
        count = int(fields[1])
        ...
    except ValueError as e:
        print('Line {}: Parse error: {}'.format(lineno, e))
11. Pandas | |
Import pandas and numpy: | |
import pandas as pd | |
import numpy as np | |
12. Make random dataframe with three columns: | |
df = pd.DataFrame(np.random.rand(10,3), columns=list('ABC')) | |
Select: | |
# Boolean indexing (remember the parentheses)
df[(df.A < 0.5) & (df.B > 0.5)]
# Alternative, using query (which uses numexpr when it is installed)
df.query('A < 0.5 & B > 0.5')
Project:
# One column
df.A
# Multiple columns
df[['A', 'B']]  # shorter equivalent of df.loc[:, list('AB')]
Often used snippets | |
Dates | |
13. Difference (in days) between two dates: | |
from datetime import date | |
d1 = date(2013,1,1) | |
d2 = date(2013,9,13) | |
abs(d2-d1).days | |
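In practice the dates often arrive as strings. A minimal sketch that wraps the same subtraction in a helper; the function name days_between and the default format are my own choices, not part of the original snippet:

```python
from datetime import datetime

def days_between(first, second, fmt='%Y-%m-%d'):
    """Return the absolute number of days between two date strings."""
    d1 = datetime.strptime(first, fmt).date()
    d2 = datetime.strptime(second, fmt).date()
    return abs((d2 - d1).days)

print(days_between('2013-01-01', '2013-09-13'))
```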
directory-of-script snippet | |
os.path.dirname(os.path.realpath(__file__)) | |
# combine with | |
os.path.join(os.path.dirname(os.path.realpath(__file__)), 'foo','bar','baz.txt') | |
14. PostgreSQL-connect-query snippet | |
import psycopg2 | |
conn = psycopg2.connect("host='localhost' user='xxx' password='yyy' dbname='zzz'") | |
cur = conn.cursor() | |
cur.execute("""SELECT * from foo;""") | |
rows = cur.fetchall() | |
for row in rows:
    print(" ", row[0])
conn.close() | |
Input parsing functions | |
15. Expand input-file args: | |
import glob
# input_data: list of path expressions, e.g. ['file.txt'] or ['*.txt'] or ['foo/file.txt', 'bar/file.txt']
filenames = [glob.glob(pathexpr) for pathexpr in input_data]
filenames = [item for sublist in filenames for item in sublist]
15. Parse key-value pair strings like ‘x=42.0,y=1’: | |
kvp = lambda elem,t,i: t(elem.split('=')[i]) | |
parse_kvp_str = lambda args : dict([(kvp(elem,str,0), kvp(elem,float,1)) for elem in args.split(',')]) | |
parse_kvp_str('x=42.0,y=1')  # {'x': 42.0, 'y': 1.0}
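The two lambdas are compact but a bit hard to read. Here is a sketch of the same parsing as a plain function. The value_type parameter is an addition of mine, so the float conversion can be swapped out:

```python
def parse_kvp_str(args, value_type=float):
    """Parse a string like 'x=42.0,y=1' into a dict of converted values."""
    result = {}
    for elem in args.split(','):
        # partition splits on the first '=' only
        key, _, value = elem.partition('=')
        result[key.strip()] = value_type(value)
    return result

print(parse_kvp_str('x=42.0,y=1'))
```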
Postgres database functions | |
16. Upper case in Python (just for example): | |
-- create extension plpythonu; | |
CREATE OR REPLACE FUNCTION python_upper | |
( | |
input text | |
) RETURNS text AS | |
$$ | |
return input.upper() | |
$$ LANGUAGE plpythonu STRICT; | |
17. Convert IP address from text to integer: | |
CREATE FUNCTION ip2int(input text) RETURNS bigint  -- unsigned 32-bit values can overflow integer
LANGUAGE plpythonu
AS $$
if 'struct' in SD:
    struct = SD['struct']
else:
    import struct
    SD['struct'] = struct
if 'socket' in SD:
    socket = SD['socket']
else:
    import socket
    SD['socket'] = socket
return struct.unpack("!I", socket.inet_aton(input))[0]
$$;
Convert IP address from integer to text: | |
CREATE FUNCTION int2ip(input bigint) RETURNS text  -- bigint covers the full unsigned 32-bit range
LANGUAGE plpythonu
AS $$
if 'struct' in SD:
    struct = SD['struct']
else:
    import struct
    SD['struct'] = struct
if 'socket' in SD:
    socket = SD['socket']
else:
    import socket
    SD['socket'] = socket
return socket.inet_ntoa(struct.pack("!I", input))
$$;
18. Commandline options | |
optparse-commandline-options snippet | |
from optparse import OptionParser | |
usage = "usage: %prog [options] arg " | |
parser = OptionParser(usage=usage) | |
parser.add_option("-x", "--some-option-x", dest="x", default=42.0, type="float",
                  help="a floating point option")
(options, args) = parser.parse_args()
print(options.x)
print(args[0])
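The standard library also has the newer argparse module. A rough equivalent of the snippet above, as a sketch: the positional argument name args and the explicit argument list passed to parse_args are only there to make the example self-contained. In a real script you would call parse_args() without arguments so it reads sys.argv.

```python
import argparse

parser = argparse.ArgumentParser(description="argparse version of the optparse snippet")
parser.add_argument("-x", "--some-option-x", dest="x", type=float, default=42.0,
                    help="a floating point option")
# Positional arguments have to be declared explicitly in argparse
parser.add_argument("args", nargs="*", help="positional arguments")

# Explicit list for demonstration only
options = parser.parse_args(["-x", "3.5", "input.txt"])
print(options.x)
print(options.args[0])
```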
19. print-in-place (progress bar) snippet | |
import time | |
import sys | |
for progress in range(100):
    time.sleep(0.1)
    sys.stdout.write("Download progress: %d%% \r" % progress)
    sys.stdout.flush()
Packaging snippets | |
20. poor-mans-python-executable trick | |
Learned this trick from voidspace. The trick uses two files (__main__.py and hashbang.txt): | |
__main__.py: | |
print('Hello world')
hashbang.txt (adding a newline after ‘python2.6’ is important): | |
#!/usr/bin/env python2.6 | |
Build an “executable”: | |
zip main.zip __main__.py | |
cat hashbang.txt main.zip > hello | |
rm main.zip | |
chmod u+x hello | |
Run “executable”: | |
$ ./hello | |
Hello world | |
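Since Python 3.5 the standard library ships a zipapp module that does the zip-plus-hashbang step for you. A sketch of the same build, done in a temporary directory so it leaves nothing behind. The interpreter string is an assumption on my part, so use whatever Python you target:

```python
import os
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    # Source directory containing only __main__.py, as in the trick above
    src = os.path.join(tmp, 'app')
    os.mkdir(src)
    with open(os.path.join(src, '__main__.py'), 'w') as fh:
        fh.write("print('Hello world')\n")

    # Writes the hashbang line followed by the zipped directory
    target = os.path.join(tmp, 'hello')
    zipapp.create_archive(src, target, interpreter='/usr/bin/env python3')

    with open(target, 'rb') as fh:
        first_line = fh.readline()

print(first_line)
```

The same thing is available from the shell as python -m zipapp.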
21. import-class-from-file trick | |
Import class MyClass from a module file (adapted from stackoverflow; note that the imp module is deprecated and was removed in Python 3.12):
import imp | |
mod = imp.load_source('name.of.module', 'path/to/module.py') | |
obj = mod.MyClass() | |
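On current Python versions the same trick can be done with importlib. A sketch, where the helper name load_module_from_path and the throwaway module written to a temporary directory are made up for the demo:

```python
import importlib.util
import os
import tempfile

def load_module_from_path(name, path):
    """Load a module object from an arbitrary .py file."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # actually runs the module's code
    return module

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module file, created only so the example is runnable
    path = os.path.join(tmp, 'module.py')
    with open(path, 'w') as fh:
        fh.write("class MyClass:\n    greeting = 'hello'\n")

    mod = load_module_from_path('demo_module', path)
    obj = mod.MyClass()

print(obj.greeting)
```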
22. Occasional-usage snippets
Extract words from string | |
words = lambda text: ''.join(c if c.isalnum() else ' ' for c in text).split() | |
words('Johnny.Appleseed!is:a*good&farmer') | |
# ['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer'] | |
23. IP address to integer and back | |
import struct | |
import socket | |
def ip2int(addr): | |
return struct.unpack("!I", socket.inet_aton(addr))[0] | |
def int2ip(addr): | |
return socket.inet_ntoa(struct.pack("!I", addr)) | |
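On Python 3 the ipaddress module in the standard library can do the same conversion without struct and socket. A short sketch, with function names of my own choosing so they don't clash with the ones above:

```python
import ipaddress

def ip_to_int(addr):
    # IPv4Address objects convert directly to int
    return int(ipaddress.IPv4Address(addr))

def int_to_ip(value):
    # ...and can be built from an int again
    return str(ipaddress.IPv4Address(value))

print(ip_to_int('192.168.0.1'))
print(int_to_ip(3232235521))
```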
24. Fluent Python Interface | |
Copied from riaanvddool. | |
# Fluent Interface Definition
class sql:
    class select:
        def __init__(self, dbcolumn, context=None):
            self.dbcolumn = dbcolumn
            self.context = context
        def select(self, dbcolumn):
            return self.__class__(dbcolumn, self)
# Demo
q = sql.select('foo').select('bar')
print(q.dbcolumn)  # bar
print(q.context.dbcolumn)  # foo
Flatten nested lists
def flatten(elems):
    """
    [['a'], ['b','c',['d'],'e',['f','g']]]
    """
    stack = [elems]
    while stack:
        top = stack.pop()
        if not top:  # skip empty lists instead of stopping early
            continue
        head, tail = top[0], top[1:]
        if tail:
            stack.append(tail)
        if isinstance(head, list):
            stack.append(head)
        else:
            yield head
snap rounding | |
import math
EPSILON = 0.000001
snap_ceil = lambda x: math.ceil(x) if abs(x - round(x)) > EPSILON else round(x) | |
snap_floor = lambda x: math.floor(x) if abs(x - round(x)) > EPSILON else round(x) | |
merge-two-dictionaries snippet | |
x = {'a': 42} | |
y = {'b': 127} | |
z = {**x, **y}  # Python 3.5+; dict(x.items() + y.items()) only works on Python 2
# z = {'a': 42, 'b': 127} | |
25. anonymous-object snippet | |
Adapted from stackoverflow: | |
class Anon(object):
    def __new__(cls, **attrs):
        result = object.__new__(cls)
        result.__dict__ = attrs
        return result
26. Alternative:
class Anon(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return self.__str__()
    def __str__(self):
        return ", ".join(["%s=%s" % (key, value) for key, value in self.__dict__.items()])
27. generate-random-word snippet | |
Function that returns a random word (could also use random.choice with this list of words):
import string, random
randword = lambda n: "".join([random.choice(string.ascii_letters) for i in range(n)])  # string.letters is Python 2 only
setdefault tricks | |
Increment (and initialize) value: | |
d = {} | |
d[2] = d.setdefault(2,39) + 1 | |
d[2] = d.setdefault(2,39) + 1 | |
d[2] = d.setdefault(2,39) + 1 | |
d[2] # value is 42 | |
29. Append value to (possibly uninitialized) list stored under a key in dictionary: | |
d = {} | |
d.setdefault(2, []).append(42) | |
d.setdefault(2, []).append(127) | |
d[2] # value is [42, 127] | |
Binary tricks | |
30. swap-integers-using-XOR snippet
Swap two integer variables using the XOR swap algorithm: | |
x = 42 | |
y = 127 | |
x = x ^ y | |
y = y ^ x | |
x = x ^ y | |
x # value is 127 | |
y # value is 42 | |
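The XOR version is a fun binary trick, but in everyday Python the usual way to swap is tuple unpacking, which also works for values that are not integers. A tiny sketch for comparison:

```python
x = 42
y = 127

# The right-hand side is evaluated first, then unpacked into x and y
x, y = y, x

print(x, y)
```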
I know that most of it has been mentioned already, but I think you should find some new tricks as well.