# ESO py 3.0 : 
# Write/Read Text files In Python - Basic manipulations


## **First things first**: What is a file? (well, that is not an easy definition!)

==> A file is a named location on disk used to store related information. It keeps data permanently in non-volatile memory (e.g. a hard disk).

Random access memory (RAM) is *volatile*: it loses its data when you switch off your computer.

--> You need files to store data.

In basic Python (and in other languages too), working with a file goes like this:

 1. You open the file
 2. Read and/or write 
 3. Close the file
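
The three steps above can be sketched as follows. This is only a minimal illustration: the file name `three_steps.txt` is hypothetical, and it is placed in a temporary directory so that nothing on your disk gets overwritten.

```python
import os
import tempfile

# Hypothetical file name, created in a temporary directory (assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), 'three_steps.txt')

f = open(path, 'w')      # 1. open the file (it is created if it does not exist)
f.write('hello file\n')  # 2. write to it
f.close()                # 3. close it

f = open(path, 'r')      # 1. open it again, this time for reading
content = f.read()       # 2. read everything as one string
f.close()                # 3. close it

print(repr(content))
```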


## 1 - With Built-in python

### A - Open file

Python has a built-in function for this, so it is already included with any Python distribution.

 --> **open(file, access mode)**

**file**: 
1. If the file is in your current working directory = name of the file
2. If it is not = path to the file (absolute, or relative to the current directory) + name of the file
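
As a small sketch of the two cases, the standard `os` module can show where a bare file name points to. The name `my_file.txt` is only an example here, and the file does not need to exist for this to run.

```python
import os

name = 'my_file.txt'                # file name only: looked up in the current directory
full_path = os.path.abspath(name)   # the same location written as an absolute path

print(os.getcwd())   # the current working directory
print(full_path)     # what open(name, ...) would actually point to
```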

**Access mode**: 

* 'r' 	Open file for reading only. Starts reading from the beginning of the file. This is the default mode.
* 'r+'	Open file for reading and writing. File pointer placed at the beginning of the file.
* 'w'  	Open file for writing only. File pointer placed at the beginning of the file. Overwrites the file if it exists and creates a new one if it does not exist.
* 'w+'	Same as w but also allows reading from the file.
* 'a'	Open a file for appending. Starts writing at the end of the file. Creates a new file if the file does not exist.
* 'a+'	Same as a but also open for reading.

**FYI** it can also handle binary files:
* 'rb'	Open a file for reading only in binary format. Starts reading from the beginning of the file.
* 'wb'	Same as w but opens in binary mode.
* 'wb+'	Same as wb but also allows reading from the file.
* 'ab'	Same as a but in binary format. Creates a new file if the file does not exist.
* 'ab+'	Same as ab but also open for reading.
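
To see how a few of these modes differ in practice, here is a short sketch. It assumes a throwaway file called `modes.txt` (a hypothetical name) in a temporary directory.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, so no real file is touched
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

f = open(path, 'w')    # 'w': creates the file
f.write('first\n')
f.close()

f = open(path, 'w')    # 'w' again: the previous content is discarded
f.write('second\n')
f.close()

f = open(path, 'a')    # 'a': keeps the content and writes at the end
f.write('third\n')
f.close()

f = open(path, 'r+')   # 'r+': pointer at the beginning, overwrites in place
f.write('SECOND')
f.close()

f = open(path, 'r')    # text mode returns str
result = f.read()
f.close()

f = open(path, 'rb')   # binary mode returns bytes
raw = f.read()
f.close()

print(repr(result))    # 'SECOND\nthird\n'
print(type(raw))
```

Note that 'first' is gone (the second 'w' wiped it), while 'r+' only replaced the characters it wrote over.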



In [110]:
f = open('my_file.txt', 'w')  ###I open 'my_file.txt' for writing --> if it does not exist, it is created

### B - Write and Append to a file

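Now that we have created 'f', let's write in it!

'f' is called the file *handle*. It has several methods, among them one called **write**! (The number shown under each cell below is what write returns: the number of characters written.)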

In [111]:
f.write('VIMOS Rocks!!')

13

In [112]:
##one more time!
f.write('Yes it does!')

12

Now, to make sure the changes actually reach the disk, you must close the file!

In [113]:
f.close()

In [114]:
f = open('my_file.txt', 'a') ##->Append to the file!

In [115]:
f.write('\nI maintain!\n')
f.write('yep yep yep')

11

In [116]:
f.close()

### C - Read from a file!

In [120]:
f = open('my_file.txt', 'r') ##->read mode

a = f.read() ##-->this reads the whole file as a single string

print(a, type(a))

f.close()

VIMOS Rocks!!Yes it does!
I maintain!
yep yep yep <class 'str'>


In [130]:
f = open('my_file.txt', 'r') ##->read mode

a = f.read(2) ##-->this reads the first 2 characters of the file

print(a, type(a))

f.close()

VI <class 'str'>


In [165]:
f = open('my_file.txt', 'r') ##->read mode

a = f.readlines() ## readlineS-->this puts all the lines in a 'list'

print(a, type(a))

f.close()

['VIMOS Rocks!!Yes it does!\n', 'I maintain!\n', 'yep yep yep'] <class 'list'>


In [167]:
f = open('my_file.txt', 'r') ##->read mode

a = f.readline() ## readline()-->this reads only one line and moves to the next
print(a, type(a))
a = f.readline(2) ##-->this reads at most 2 characters of the current line
print(a, type(a))
a = f.readline() ##-->this reads the rest of that line
print(a, type(a))

f.close()

VIMOS Rocks!!Yes it does!
 <class 'str'>
I  <class 'str'>
maintain!
 <class 'str'>
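
As a complement to read/readline/readlines, a file handle can also be looped over directly, one line at a time. The sketch below recreates the same three lines in a temporary file (hypothetical name `loop_demo.txt`) so that it runs on its own.

```python
import os
import tempfile

# Hypothetical temporary file with the same content as 'my_file.txt' above
path = os.path.join(tempfile.mkdtemp(), 'loop_demo.txt')
f = open(path, 'w')
f.write('VIMOS Rocks!!Yes it does!\nI maintain!\nyep yep yep')
f.close()

f = open(path, 'r')
lines = []
for line in f:                       # the handle yields one line per iteration
    lines.append(line.rstrip('\n'))  # drop the trailing newline character
f.close()

print(lines)
```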


In [170]:
###But what happens if an error occurs before closing? --> the file stays open and what you wrote may never reach the disk
##--> use the 'with' statement: it closes the file for you, even after an error

with open('my_file.txt', 'w') as f:
    f.write('ESOpy3.0 test with statement')

    
##--> no need for closing!
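
To check that 'with' really closes the file when something goes wrong, here is a minimal sketch. The error is simulated on purpose and the file name `with_demo.txt` is hypothetical.

```python
import os
import tempfile

# Hypothetical file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

try:
    with open(path, 'w') as f:
        f.write('written before the error\n')
        raise ValueError('something went wrong')  # simulated error
except ValueError:
    pass  # we only want to look at the state of the file afterwards

print(f.closed)  # True: the handle was closed when leaving the 'with' block

with open(path, 'r') as f:
    saved = f.read()

print(repr(saved))  # the text reached the disk despite the error
```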

**Exercise:** Using what we just saw, and what Ivan told you today, let's write a file called cat.txt that contains:

## 2 - Reading/writing catalogs with numpy

NumPy is a very powerful library for numerical computing (see next talk by me at the end of the day)

--> For now we are just going to open/write ASCII catalogs

--> required functions --> **loadtxt, genfromtxt & savetxt**

In [176]:
##--> import the module
import numpy
help(numpy.loadtxt)

Help on function loadtxt in module numpy:

loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None)
    Load data from a text file.
    
    Each row in the text file must have the same number of values.
    
    Parameters
    ----------
    fname : file, str, or pathlib.Path
        File, filename, or generator to read.  If the filename extension is
        ``.gz`` or ``.bz2``, the file is first decompressed. Note that
        generators should return byte strings for Python 3k.
    dtype : data-type, optional
        Data-type of the resulting array; default: float.  If this is a
        structured data-type, the resulting array will be 1-dimensional, and
        each row will be interpreted as an element of the array.  In this
        case, the number of columns used must match the number of fields in
        the data-type.
    comments : str or sequence of str, optional
  

In [253]:
a = numpy.loadtxt('cat.txt') ##--> reads the whole file into a 2D array, one row per line
print(a)
print(a[0]) ##<---first row (= first line of the file)

[[ 1.  4.  7.]
 [ 2.  5.  8.]
 [ 3.  6.  9.]
 [ 4.  7. 10.]
 [ 5.  8. 11.]
 [ 6.  9. 12.]
 [ 7. 10. 13.]]
[1. 4. 7.]


In [254]:
a = numpy.loadtxt('cat.txt').T ## T--> transpose: reads all the file column by column
print(a)
print(a[0]) ##<---read first column

[[ 1.  2.  3.  4.  5.  6.  7.]
 [ 4.  5.  6.  7.  8.  9. 10.]
 [ 7.  8.  9. 10. 11. 12. 13.]]
[1. 2. 3. 4. 5. 6. 7.]


In [180]:
##unpacking
A, B, C = numpy.loadtxt('cat.txt', unpack = True)
print(A)

[1. 2. 3. 4. 5. 6. 7.]


In [183]:
##which is the same as
A, B, C = numpy.loadtxt('cat.txt').T
print(A)

[1. 2. 3. 4. 5. 6. 7.]
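
If you only need some of the columns, the `usecols` argument listed in the help above can be combined with `unpack`. This is a small sketch on a hypothetical 3-row catalog (`small_cat.txt`) written to a temporary directory; the header line commented with `#` is an assumption of this example.

```python
import os
import tempfile
import numpy

# Hypothetical small catalog: commented header + 3 rows
path = os.path.join(tempfile.mkdtemp(), 'small_cat.txt')
with open(path, 'w') as f:
    f.write('# A B C\n1 4 7\n2 5 8\n3 6 9\n')

# Keep only the first and third columns; lines starting with '#' are skipped
A, C = numpy.loadtxt(path, usecols=(0, 2), unpack=True)

print(A)
print(C)
```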


In [184]:
help(numpy.genfromtxt)

Help on function genfromtxt in module numpy:

genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes')
    Load data from a text file, with missing values handled as specified.
    
    Each line past the first `skip_header` lines is split at the `delimiter`
    character, and characters following the `comments` character are discarded.
    
    Parameters
    ----------
    fname : file, str, pathlib.Path, list of str, generator
        File, filename, list, or generator to read.  If the filename
        extension is `.gz` or `.bz2`, the file is first decompressed. Note
        that generators must return byte strings in Python 3k.  The strings
        in a 

In [187]:
a = numpy.genfromtxt('cat.txt')
print(a)
print(a[0])

[[ 1.  4.  7.]
 [ 2.  5.  8.]
 [ 3.  6.  9.]
 [ 4.  7. 10.]
 [ 5.  8. 11.]
 [ 6.  9. 12.]
 [ 7. 10. 13.]]
[1. 4. 7.]


In [249]:
a = numpy.genfromtxt('cat.txt', names = True).T
print(a.dtype.names)

('A', 'B', 'C')
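
With `names = True` the columns can then be accessed by name. Here is a sketch on a hypothetical 3-row catalog (`named_cat.txt`), assuming the header line is commented with `#`.

```python
import os
import tempfile
import numpy

# Hypothetical small catalog whose first line gives the column names
path = os.path.join(tempfile.mkdtemp(), 'named_cat.txt')
with open(path, 'w') as f:
    f.write('# A B C\n1 4 7\n2 5 8\n3 6 9\n')

a = numpy.genfromtxt(path, names=True)  # column names are read from the first line

print(a.dtype.names)
print(a['B'])  # select a column by its name
```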


In [229]:
help(numpy.savetxt)

Help on function savetxt in module numpy:

savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)
    Save an array to a text file.
    
    Parameters
    ----------
    fname : filename or file handle
        If the filename ends in ``.gz``, the file is automatically saved in
        compressed gzip format.  `loadtxt` understands gzipped files
        transparently.
    X : 1D or 2D array_like
        Data to be saved to a text file.
    fmt : str or sequence of strs, optional
        A single format (%10.5f), a sequence of formats, or a
        multi-format string, e.g. 'Iteration %d -- %10.5f', in which
        case `delimiter` is ignored. For complex `X`, the legal options
        for `fmt` are:
    
        * a single specifier, `fmt='%.4e'`, resulting in numbers formatted
          like `' (%s+%sj)' % (fmt, fmt)`
        * a full string specifying every real and imaginary part, e.g.
          `' %.4e %+.4ej %.4e %+.4ej %.4

In [250]:
a = numpy.genfromtxt('cat.txt').T
numpy.savetxt('new_cat.txt', a[:2]) ##--> each row of the array is written as one line (even if that row was a column in the original file!)

Output:

1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00

4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00 1.000000000000000000e+01


In [252]:
numpy.savetxt('new_cat.txt', a[:2].T) ##--> transposing back writes the data as columns

1.000000000000000000e+00 4.000000000000000000e+00  
2.000000000000000000e+00 5.000000000000000000e+00  
3.000000000000000000e+00 6.000000000000000000e+00  
4.000000000000000000e+00 7.000000000000000000e+00  
5.000000000000000000e+00 8.000000000000000000e+00  
6.000000000000000000e+00 9.000000000000000000e+00  
7.000000000000000000e+00 1.000000000000000000e+01  

In [255]:
numpy.savetxt('new_cat_2.txt', a[:2].T, fmt='%1.1f') ###<--this will adjust the format 

1.0 4.0  
2.0 5.0  
3.0 6.0  
4.0 7.0  
5.0 8.0  
6.0 9.0  
7.0 10.0  
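
savetxt can also write a header line through its `header` argument (see the signature in the help above). By default the header is prefixed with `# `, so loadtxt skips it when reading the file back. A minimal sketch with hypothetical data and file name:

```python
import os
import tempfile
import numpy

# Hypothetical output file and data, for illustration only
path = os.path.join(tempfile.mkdtemp(), 'with_header.txt')
data = numpy.array([[1., 4.], [2., 5.], [3., 6.]])

# The header is written as '# A B' because comments='# ' is the default
numpy.savetxt(path, data, fmt='%1.1f', header='A B')

with open(path, 'r') as f:
    first_line = f.readline()

back = numpy.loadtxt(path)  # the commented header is ignored

print(repr(first_line))
print(back)
```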

## 3 - Reading catalogs with catscii

In [257]:
from catscii import catscii

In [261]:
help(catscii.load_cat)

Help on class load_cat in module catscii.catscii:

class load_cat(builtins.object)
 |  load_cat(catalog, header)
 |  
 |  This class creates, from an ascii catalog,
 |  a python object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, catalog, header)
 |      Class constructor
 |      
 |      Parameters
 |      ----------
 |      catalog : str
 |                this is the catalog (eventually with its path) you want to open
 |      
 |      header  : Boolean
 |                True if the catalog contains a header, otherwise false
 |      
 |      Attributes
 |      ----------
 |      name    : str
 |                Name of the catalog, as passed by the user
 |      
 |      cat     : numpy array
 |                numpy array loaded from the catalog (all columns are string by default)
 |      
 |      header  : list of string
 |                identification of each column. If no header is present in the catalog
 |                then each column are renames col1, col2, ....colN
 

In [266]:
catalog = catscii.load_cat('cat.txt', True) 

In [267]:
catalog.header

['A', 'B', 'C']

In [285]:
catalog.get_column('A') ##<---load as string by default (U2: 2-character unicode string)

array(['1', '2', '3', '4', '5', '6', '7'], dtype='<U2')

In [279]:
catalog.get_column('A', float)

array([1., 2., 3., 4., 5., 6., 7.])

In [291]:
catalog.get_line('A','4')

[{'A': '4', 'B': '7', 'C': '10'}]

**Exercise**

Using what you know now, do the following:

 1. Write a file called 'final_ex.txt' that contains 5 columns x 20 rows, looking like this (hint: while....)

 2. Read the file using numpy and save it back with just the last 3 columns (save the header as well!). Call it 'final_modified.txt'
 3. Read it again using catscii and extract the second column (D)