# Transform Strings

## Formatting Strings

Python supports multiple ways to format strings. By formatting, I mean building new strings out of different kinds of values. The most common are the old-school `%`-formatting, the `.format()` method, and `f-strings`, introduced in Python 3.6. In this manual, we will look into the latest way, given its minimal syntax and high flexibility.

Formatting strings with an `f-string` is as easy as defining a standard string literal. Look at this example:

```
myName = 'Roberto'
f'Hi! My name is {myName}'
# Hi! My name is Roberto
```

Moreover, `f-strings` can include full Python expressions, like:

```
a = 10
b = 20
f'The mid value is {a+(b-a)*.5}'
# The mid value is 15.0
```

Don't forget to prefix the f-string with an `'f'` or `'F'`, otherwise it will be treated as a regular string. Look:

```
a = 10
b = 20
'The mid value is {a+(b-a)*.5}'
# The mid value is {a+(b-a)*.5}
```

As you may have already noticed, the expression inside the f-string must be surrounded by braces:

```
f'The mid value is a+(b-a)*.5'
# The mid value is a+(b-a)*.5
```

Otherwise, the expression is not evaluated.

`f-strings` and `.format()` share the same format specifier mini-language, which can be summarized in the following way:

`f'{[expression]:[width][type]}'`

This mini-language lets us specify precisely how to format the data within the string by adding some extra information after a colon. For example:

```
# we import Euler's number from the math module
from math import e
# then we print the constant value with two digits after the period
print(f'euler: {e:.2f}') # euler: 2.72
```

This step is optional: if we omit any extra instruction, Python will use a default conversion.

```
# we import Euler's number from the math module
from math import e
# then we print the constant value
print(f'euler: {e}') # euler: 2.718281828459045
# note the different amount of digits after the period
```
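Since the mini-language is shared, the same specifier should behave identically in an f-string, in `.format()`, and in the built-in `format()` function. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
from math import pi

# three digits after the period, fixed-point notation
spec = '.3f'

with_fstring = f'{pi:.3f}'
with_format_method = '{:.3f}'.format(pi)
with_format_builtin = format(pi, spec)

print(with_fstring)         # 3.142
print(with_format_method)   # 3.142
print(with_format_builtin)  # 3.142
```

The built-in `format()` is handy when the specifier itself is stored in a variable.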

`[width]` provides instructions concerning padding, for example appending extra characters to the right

```
message = 'hello'
f'{message:+<10}'
# hello+++++
```

or to the left

```
message = 'hello'
f'{message:>10}'
# '     hello'
```

In the cases above, a fill character (`+` in the first example, a white space by default in the second) is added until the length of 10 characters is reached.

`[width]` also allows centering a string within a certain number of characters:

```
print(f"{'a':^10}")
print(f"{'bcd':^10}")
print(f"{'efghi':^10}")
print(f"{'jklmnop':^10}")
print(f"{'qrstu':^10}")
print(f"{'vwx':^10}")
print(f"{'y':^10}")
#     a
#    bcd
#   efghi
#  jklmnop
#   qrstu
#    vwx
#     y
```

You can easily define which fill character should be used by the interpreter:

```
print(f'{"hello":@<10}')
# hello@@@@@
```

```
print(f'{"hello":_>10}')
# _____hello
```

```
print(f'{"hello":-^10}')
# --hello---
```
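Fill, alignment, and width can be combined to lay out simple text tables. The sketch below assumes some hypothetical sample data (`rows`), which is not part of the original examples.

```python
# Hypothetical sample data, used only for illustration
rows = [('apple', 3), ('kiwi', 12), ('banana', 150)]

lines = []
for name, amount in rows:
    # left-align the name in 10 characters, filling with dots,
    # then right-align the amount in 5 characters, filling with spaces
    lines.append(f'{name:.<10}{amount:>5}')

print('\n'.join(lines))
# apple.....    3
# kiwi......   12
# banana....  150
```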

If no instruction for `[type]` is defined, Python will use the basic string representation for the value provided. For example, integers will be represented using base 10 notation, but it is possible to specify a different base by adding `[type]`. What follows is a list of the possible conversion options for integer values.

Type | Meaning | Output |
---|---|---|
`'b'` | Binary format | Outputs the number in base 2 |
`'c'` | Character | Converts the integer to the corresponding Unicode character before printing |
`'d'` | Decimal Integer | Outputs the number in base 10 |
`'o'` | Octal format | Outputs the number in base 8 |
`'x'` | Hex format | Outputs the number in base 16, using lowercase letters for the digits above 9 |
`'X'` | Hex format | Outputs the number in base 16, using uppercase letters for the digits above 9 |
`'n'` | Number | This is the same as `'d'`, except that it uses the current locale setting to insert the appropriate number separator characters |

Consider the following examples:

```
value = 242
f'{value:.^12b}'
# '..11110010..'
```

`242` is converted to `11110010`. The binary representation is then centered in a string of length 12 using `'.'` as fill character.

```
value = 65
f'H{value:c}H'
# 'HAH'
```

65 in the Unicode mapping points to the uppercase 'A', which is then joined to the string literal.

```
value = 200
f'{value:>+6d}'
# '  +200'
```

The plus in front of the string’s width forces the interpreter to put a sign in front of the decimal integer even if positive. Note that the plus can be applied to any numerical data conversion. 200 is displayed in base 10, and some white spaces are put in front of the sign until the string reaches length 6.

```
value = 379
f'U+{value:0>4X}'
# 'U+017B'
```

This converts `379₁₀` to `17B₁₆`, using uppercase letters. Then it adds an extra `0` character in front of the hexadecimal representation to reach length 4. Finally, the result is joined to the string literal `'U+'`.
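To compare the integer conversion types side by side, here is a minimal sketch that formats the same values with each `[type]` listed in the table above (the variable names are illustrative).

```python
value = 65

as_binary = f'{value:b}'     # '1000001'
as_character = f'{value:c}'  # 'A'
as_decimal = f'{value:d}'    # '65'
as_octal = f'{value:o}'      # '101'
as_hex_lower = f'{255:x}'    # 'ff'
as_hex_upper = f'{255:X}'    # 'FF'

print(as_binary, as_character, as_decimal, as_octal, as_hex_lower, as_hex_upper)
```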

Now, let’s look at a selection of options for floating-point numbers:

Type | Meaning | Output |
---|---|---|
`'f'` | Fixed point | Displays the number as a fixed-point number. The default precision is 6. |
`'n'` | Number | It uses the current locale setting to insert the appropriate number separator characters. If the number is too large, it switches to scientific notation. |
`'%'` | Percentage | Multiplies the number by 100 and displays in fixed (`'f'`) format, followed by a percent sign. |

Consider the following examples:

```
value = 12.2345
f'{value:.2f}' # 12.23
```

`'f'` in combination with `'.2'` will output a floating-point representation whose precision is limited to 2 digits after the dot.

```
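import locale
value = 2345.67
# US standard (without setting a locale, Python uses the "C" locale
# for numbers, which inserts no thousands separator)
locale.setlocale(locale.LC_ALL, 'en_US')
f'{value:n}'
# 2,345.67
# Italian standard
locale.setlocale(locale.LC_ALL, 'it_IT')
f'{value:n}'
# 2345,67 (or 2.345,67, depending on the platform's locale data)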
```

This conversion method could be useful when typesetting languages other than English. The available locale names and the exact separators depend on the operating system. Check the Python standard `locale` module documentation.

```
value = .45
f'{value:.0%}' # 45%
```

The `'%'` conversion type will convert the floating-point number to a percentage. `'.0'` rounds the percentage to zero decimal places (`45%` instead of `45.000000%`).
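As a brief illustration of how precision interacts with the `'f'` and `'%'` types, consider this sketch. Note that the value is rounded to the requested precision, not cut off.

```python
ratio = 1 / 3

default_precision = f'{ratio:f}'  # '0.333333' (six digits by default)
two_digits = f'{ratio:.2f}'       # '0.33'
as_percentage = f'{ratio:.1%}'    # '33.3%'
rounded_up = f'{0.456:.0%}'       # '46%' (45.6 is rounded, not truncated)

print(default_precision, two_digits, as_percentage, rounded_up)
```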

## Useful String Methods

Python provides a number of specific methods to transform text data. Remember that strings are immutable, so they are not manipulated “in place”. The following methods generate a brand new string that you have to assign to an identifier if you need to use their output afterwards.

Method | Input | Output |
---|---|---|
`s.capitalize()` | `hello` | `Hello` |
`s.lower()` | `HELLO` | `hello` |
`s.swapcase()` | `Hello` | `hELLO` |
`s.title()` | `hello world` | `Hello World` |
`s.upper()` | `hello` | `HELLO` |
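Because strings are immutable, calling one of these methods without storing the result leaves the original untouched. A minimal sketch, with illustrative variable names:

```python
greeting = 'hello world'

greeting.upper()            # the new string is discarded
print(greeting)             # hello world

shouted = greeting.upper()  # the new string is assigned to an identifier
print(shouted)              # HELLO WORLD

# each method returns a new string, so calls can be chained
chained = greeting.title().swapcase()
print(chained)              # hELLO wORLD
```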

Python also provides a number of methods to inspect text data. These methods return a boolean value, and their names describe the condition they test:

Method | Behaviour |
---|---|
`s.islower()` | return `True` if `s` is lowercase |
`s.istitle()` | return `True` if `s` is title cased (`'Hello World'`) |
`s.isupper()` | return `True` if `s` is uppercase |
`s.startswith(s2)` | return `True` if `s` starts with `s2` |
`s.endswith(s2)` | return `True` if `s` ends with `s2` |
`s.isalnum()` | return `True` if `s` is alphanumeric (letters and digits such as A-Z, a-z, 0-9, no white spaces) |
`s.isalpha()` | return `True` if `s` is alphabetic (letters such as A-Z, a-z, no white spaces) |
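A short sketch of these inspection methods in use follows. The file name is a hypothetical value chosen for illustration.

```python
filename = 'Report2024.txt'  # hypothetical file name

is_text_file = filename.endswith('.txt')
print(is_text_file)                           # True
print(filename.startswith('report'))          # False: the check is case-sensitive
print(filename.lower().startswith('report'))  # True
print('Report2024'.isalnum())                 # True
print('Report 2024'.isalnum())                # False: white space is not alphanumeric
print('Report'.isalpha())                     # True
print('Hello World'.istitle())                # True
```

Combining an inspection method with a transformation such as `lower()` is a common way to make a check case-insensitive.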

You can locate occurrences of a substring within a string using the following methods:

Method | Behaviour |
---|---|
`s.index(s2, i, j)` | Return the index of the first occurrence of `s2` in `s` at or after index `i` and before index `j` (raise `ValueError` if not found) |
`s.rindex(s2)` | Return the highest index of `s2` in `s` (raise `ValueError` if not found) |
`s.find(s2)` | Return the lowest index of `s2` in `s` (`-1` if not found) |
`s.rfind(s2)` | Return the highest index of `s2` in `s` (`-1` if not found) |
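The main difference between the two families is how a missing substring is reported: `find` and `rfind` return `-1`, while `index` and `rindex` raise `ValueError`. A minimal sketch:

```python
text = 'banana'

print(text.find('an'))         # 1
print(text.rfind('an'))        # 3
print(text.index('an', 2, 6))  # 3 (the search starts at index 2)
print(text.find('x'))          # -1

try:
    text.index('x')
except ValueError:
    # index() raises instead of returning -1
    print('not found')
```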

Or, you can generate new strings (or lists of strings) using the following common methods:

Method | Behaviour |
---|---|
`s.join('123')` | Join the items of the iterable `'123'` using `s` as separator: if `s` is `'hello'` → `'1hello2hello3'` |
`s.split(sep, maxsplit)` | Return a list of the substrings of `s` split by the separator `sep`. Optionally, you can define a maximum number of splits `maxsplit` to be performed |
`s.splitlines()` | Return a list of the lines in `s`: if `s` is `'hello\nworld'` → `['hello', 'world']` |
`s.replace(s2, s3, count)` | Replace `s2` with `s3` in `s` at most `count` times |
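These methods are often used together. The following sketch splits a hypothetical two-line text into lines and words, then joins and replaces parts of it.

```python
# hypothetical sample text
poem = 'roses are red\nviolets are blue'

lines = poem.splitlines()             # ['roses are red', 'violets are blue']
words = lines[0].split(' ')           # ['roses', 'are', 'red']
first_split = lines[1].split(' ', 1)  # ['violets', 'are blue']
joined = '-'.join(words)              # 'roses-are-red'

# only the first occurrence of 'are' is replaced
replaced = poem.replace('are', 'were', 1)
print(replaced)
# roses were red
# violets are blue
```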