Transform Strings
Formatting Strings
Python supports multiple ways to format strings. By formatting I mean building new strings out of different kinds of values. The most common are the old-school %-formatting, the .format() method, and f-strings, introduced in Python 3.6. In this manual, we will look into the latest way, given its minimal syntax and high flexibility.
Formatting strings with f-strings is as easy as defining a standard string literal. Look at this example:
myName = 'Roberto'
f'Hi! My name is {myName}'
# Hi! My name is Roberto
Moreover, f-strings allow you to include full Python expressions, like:
a = 10
b = 20
f'The mid value is {a+(b-a)*.5}'
# The mid value is 15.0
Don't forget to prefix the f-string with an 'f' or 'F', otherwise it will be treated as a regular string. Look:
a = 10
b = 20
'The mid value is {a+(b-a)*.5}'
# The mid value is {a+(b-a)*.5}
As you may have already noticed, the expression inside the f-string must be surrounded by braces. Without them, the expression is not evaluated:
f'The mid value is a+(b-a)*.5'
# The mid value is a+(b-a)*.5
f-strings and .format() share the same format specifier mini-language, which can be summarized in the following way:
f'{[expression]:[width][type]}'
This mini-language lets us control precisely how a value is formatted by adding some extra information after a colon. For example:
# we import Euler's number from the math module
from math import e
# then we print the constant value with two digits after the period
print(f'euler: {e:.2f}') # euler: 2.72
This step is optional: if we omit any extra instruction, Python will use a default conversion.
# we import Euler's number from the math module
from math import e
# then we print the constant value
print(f'euler: {e}') # euler: 2.718281828459045
# note the different amount of digits after the period
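Since f-strings and .format() share the same specifier, the text after the colon can be reused unchanged across both. The following is a minimal sketch of that equivalence. It also uses the built-in format() function, which is not covered in this manual and accepts the same specifier as a separate argument.

```python
from math import e

# The same specifier, '.3f', applied in three equivalent ways
with_fstring = f'{e:.3f}'
with_method = '{:.3f}'.format(e)
with_builtin = format(e, '.3f')

print(with_fstring, with_method, with_builtin)
# 2.718 2.718 2.718
```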
[width] provides instructions concerning padding. For example, it allows appending extra characters to the right:
message = 'hello'
f'{message:+<10}'
# hello+++++
or to the left
message = 'hello'
f'{message:>10}'
# '     hello'
In the cases above, fill characters ('+' in the first example, white spaces in the second) are added until a length of 10 characters is reached.
[width] also allows you to center a string within a certain number of characters:
print(f"{'a':^10}")
print(f"{'bcd':^10}")
print(f"{'efghi':^10}")
print(f"{'jklmnop':^10}")
print(f"{'qrstu':^10}")
print(f"{'vwx':^10}")
print(f"{'y':^10}")
#     a
#    bcd
#   efghi
#  jklmnop
#   qrstu
#    vwx
#     y
You can easily define which fill character the interpreter should use:
print(f'{"hello":@<10}')
# hello@@@@@
print(f'{"hello":_>10}')
# _____hello
print(f'{"hello":-^10}')
# --hello---
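Fill character, alignment, and width can be combined to line up columns of text. The sketch below uses made-up data (the rows list is hypothetical, invented for this illustration) to build a small two-column listing.

```python
# Hypothetical data, invented for this example
rows = [('apple', 3), ('kiwi', 12), ('banana', 150)]

lines = []
for name, quantity in rows:
    # name: left-aligned and padded with dots up to 10 characters
    # quantity: right-aligned and padded with spaces up to 5 characters
    lines.append(f'{name:.<10}{quantity:>5}')

print('\n'.join(lines))
# apple.....    3
# kiwi......   12
# banana....  150
```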
If no instruction for [type] is defined, Python will use the basic string representation for the value provided. For example, integers will be represented using base 10 notation, but it is possible to specify a different base by adding [type]. What follows is a list of the possible conversion options for integer values.
Type | Meaning | Output |
---|---|---|
'b' | Binary format | Outputs the number in base 2 |
'c' | Character | Converts the integer to the corresponding Unicode character before printing |
'd' | Decimal Integer | Outputs the number in base 10 |
'o' | Octal format | Outputs the number in base 8 |
'x' | Hex format | Outputs the number in base 16, using lowercase letters for the digits above 9 |
'X' | Hex format | Outputs the number in base 16, using uppercase letters for the digits above 9 |
'n' | Number | This is the same as 'd' , except that it uses the current locale setting to insert the appropriate number separator characters |
Consider the following examples:
value = 242
f'{value:.^12b}'
# '..11110010..'
242 is converted to 11110010. The binary representation is then centered in a string of length 12 using '.' as the fill character.
value = 65
f'H{value:c}H'
# 'HAH'
65 in the Unicode mapping points to the uppercase 'A', which is then joined to the string literal.
value = 200
f'{value:>+6d}'
# '  +200'
The plus in front of the string’s width forces the interpreter to put a sign in front of the decimal integer even if it is positive. Note that the plus can be applied to any numerical conversion. 200 is displayed in base 10, and white spaces are put in front of the sign until the string reaches length 6.
value = 379
f'U+{value:0>4X}'
# 'U+017B'
This converts 379₁₀ to 17B₁₆, using uppercase letters. Then it adds an extra 0 character in front of the hexadecimal representation to reach length 4. Finally, it is joined to the string literal 'U+'.
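To compare the integer conversion types side by side, the sketch below formats one value with each type from the table ('n' is left out because its output depends on the locale). It relies on the built-in format() function, which takes the specifier as a second argument and behaves like the part after the colon in an f-string.

```python
value = 75

# Format the same integer with each conversion type from the table
conversions = {kind: format(value, kind) for kind in 'bcdoxX'}

for kind, text in conversions.items():
    print(f'{kind}: {text}')
# b: 1001011
# c: K
# d: 75
# o: 113
# x: 4b
# X: 4B
```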
Now, let’s look at a selection of options for floating-point numbers:
Type | Meaning | Output |
---|---|---|
'f' | Fixed point | Displays the number as a fixed-point number. The default precision is 6. |
'n' | Number | It uses the current locale setting to insert the appropriate number separator characters. If the number is too large, it switches to scientific notation. |
'%' | Percentage | Multiplies the number by 100 and displays in fixed ('f' ) format, followed by a percent sign. |
Consider the following examples:
value = 12.2345
f'{value:.2f}' # 12.23
'f' in combination with '.2' outputs a fixed-point representation whose precision is limited to 2 digits after the period.
# US standard (assuming an en_US locale is active; Python starts in the "C" locale, which inserts no separators)
value = 2345.67
f'{value:n}'
# 2,345.67
# Italian standard
import locale
locale.setlocale(locale.LC_ALL, 'it_IT')
f'{value:n}'
# 2345,67
This conversion method can be useful when typesetting text in languages other than English. The available locales and their exact output depend on the operating system; check the documentation of the Python standard locale module.
value = .45
f'{value:.0%}' # 45%
The '%' conversion type converts the floating-point number to a percentage. '.0' sets the precision to zero, so no decimal digits are shown (45% instead of 45.000000%).
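The effect of the default precision of 6 is easier to see next to an explicit precision. The following sketch formats one arbitrary value (chosen only for illustration) with 'f' and '%', with and without a precision.

```python
ratio = 0.125

default_fixed = f'{ratio:f}'      # default precision: 6 digits after the period
short_fixed = f'{ratio:.1f}'      # one digit after the period
default_percent = f'{ratio:%}'    # multiplied by 100, default precision
short_percent = f'{ratio:.1%}'    # multiplied by 100, one digit after the period

print(default_fixed)    # 0.125000
print(short_fixed)      # 0.1
print(default_percent)  # 12.500000%
print(short_percent)    # 12.5%
```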
Useful String Methods
Python provides a number of specific methods to transform text data. Remember that strings are immutable, so they are not manipulated “in place”. The following methods generate a brand new string that you have to assign to an identifier if you need to use their output afterwards.
Method | Input | Output |
---|---|---|
s.capitalize() | hello | Hello |
s.lower() | HELLO | hello |
s.swapcase() | Hello | hELLO |
s.title() | hello world | Hello World |
s.upper() | hello | HELLO |
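Because strings are immutable, calling one of these methods without assigning its result has no visible effect. A short sketch of the difference:

```python
greeting = 'hello world'

# The method returns a new string; 'greeting' itself is unchanged
greeting.upper()
print(greeting)  # hello world

# Assign the result to an identifier in order to keep it
shouting = greeting.upper()
print(shouting)  # HELLO WORLD

print(greeting.capitalize())  # Hello world
print(greeting.title())       # Hello World
```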
Python also provides a number of methods to inspect text data. These methods return a boolean value, and their names describe the condition they test:
Method | Behaviour |
---|---|
s.islower() | return True if s is lowercase |
s.istitle() | return True if s is title cased ('Hello World' ) |
s.isupper() | return True if s is uppercase |
s.startswith(s2) | return True if s starts with s2 |
s.endswith(s2) | return True if s ends with s2 |
s.isalnum() | return True if s is alphanumeric (letters and digits such as A-Z, a-z, 0-9, no white spaces) |
s.isalpha() | return True if s is alphabetic (letters such as A-Z, a-z, no white spaces) |
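The inspection methods are typically used inside conditions. The sketch below tries a few of them on sample values (the file name is hypothetical, chosen only for illustration).

```python
# Hypothetical file name, invented for this example
filename = 'Report2024.txt'

print(filename.endswith('.txt'))              # True
print(filename.startswith('report'))          # False, the comparison is case-sensitive
print(filename.lower().startswith('report'))  # True

print('hello'.islower())        # True
print('Hello World'.istitle())  # True
print('abc123'.isalnum())       # True
print('abc 123'.isalnum())      # False, because of the white space
print('abc123'.isalpha())       # False, because of the digits
```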
You can find occurrences of a substring within a string using the following methods:
Method | Behaviour |
---|---|
s.index(s2, i, j) | Return the lowest index of s2 in s, searching from index i up to (but not including) index j (raise ValueError if not found) |
s.rindex(s2) | Return the highest index of s2 in s (raise ValueError if not found) |
s.find(s2) | Return the lowest index of s2 in s (-1 if not found) |
s.rfind(s2) | Return the highest index of s2 in s (-1 if not found) |
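The main difference between the two pairs of methods is what happens when the substring is missing: find() and rfind() return -1, while index() and rindex() raise a ValueError. A minimal sketch:

```python
text = 'banana'

print(text.find('an'))          # 1
print(text.rfind('an'))         # 3
print(text.index('an'))         # 1
print(text.rindex('an'))        # 3
print(text.index('an', 2, 6))   # 3, the search starts at index 2

# A missing substring: find() returns -1 ...
print(text.find('xy'))          # -1

# ... while index() raises a ValueError
try:
    text.index('xy')
except ValueError:
    print('substring not found')
```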
Or, you can generate new strings (or lists of strings) using the following common methods:
Method | Behaviour |
---|---|
s.join('123') | Return the items of the iterable '123' joined together, using s as separator. If s is 'hello' → '1hello2hello3' |
s.split(sep, maxsplit) | Return a list of the substrings of s, split at the separator sep . Optionally, maxsplit defines the maximum number of splits to be performed |
s.splitlines() | Return a list of the lines in s . If s is 'hello\nworld' → ['hello', 'world'] |
s.replace(s2, s3, count) | Return a copy of s with s2 replaced by s3 at most count times |
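These methods often work together, for instance splitting a line of text into pieces and joining the pieces back with a different separator. The values in the sketch below are made up for illustration.

```python
# Hypothetical input, invented for this example
csv_line = 'red,green,blue'

colors = csv_line.split(',')            # split at every comma
first_split = csv_line.split(',', 1)    # perform at most one split
joined = ' / '.join(colors)             # the separator is the string before .join
lines = 'hello\nworld'.splitlines()     # split at line breaks
replaced = 'a-b-c-d'.replace('-', '+', 2)  # replace only the first two dashes

print(colors)       # ['red', 'green', 'blue']
print(first_split)  # ['red', 'green,blue']
print(joined)       # red / green / blue
print(lines)        # ['hello', 'world']
print(replaced)     # a+b+c-d
```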