Tell my sisters about python-S01E09 file operation

In the previous episodes, we introduced Python strings and the related topics of encoding and decoding in detail, which are essentially the foundation of file operations. In today's episode, we talk about file operations themselves.

Let's warm up with a simple example that uses the open function to open a file:

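myfile = open('myfile.txt','w')   # open for writing (creates the file)
myfile.close()
myfile = open('myfile.txt','r')   # open the same file for reading
myfile.close()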
 

[Girl said] There should be many modes for file reading and writing, such as read-only, read-write, etc. How should it be implemented?

As we can see, when the built-in function open is used for file operations, the first parameter is the file name and the second is the processing mode. Typical mode values are: r opens the file for reading only, w opens it for writing (creating the file, or truncating it if it already exists), a opens it for appending content to the end of the file, adding + to the mode allows both reading and writing, and adding b to the end of the mode string enables binary data processing.
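
The modes described above can be tried out with a small sketch like the following. The file name modes_demo.txt and the temporary directory are assumptions made for illustration; they are not part of the original example.

```python
import os
import shutil
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')

# 'w' creates the file, or truncates it if it already exists.
f = open(path, 'w')
f.write('first line\n')
f.close()

# 'a' keeps the existing content and appends at the end.
f = open(path, 'a')
f.write('second line\n')
f.close()

# 'r' opens the file for reading only; read() returns a str.
f = open(path, 'r')
text = f.read()
f.close()

# 'rb' reads the same file as raw bytes, with no decoding.
f = open(path, 'rb')
raw = f.read()
f.close()

print(repr(text))
print(type(raw))

shutil.rmtree(tmp_dir)  # remove the scratch directory
```

The point of the sketch is only the difference between the modes: w starts from an empty file, a adds to what is already there, and the b suffix changes the result type from str to bytes.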

The built-in open function will create a python file object as an interface for file operations.

We need to keep one thing in mind: in text mode, the content of a file is a string. Data read from the file is returned as a string. If a string is not what you need, for example you actually need a floating-point number, you must convert the string to a float yourself. Likewise, when writing data to a file, you must pass an already formatted string to the write method.
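
A minimal sketch of this round trip, assuming a hypothetical file numbers.txt and two made-up values, might look like this:

```python
import os
import shutil
import tempfile

# Hypothetical scratch file and sample values (illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'numbers.txt')
price = 4.25
count = 3

# In text mode, write() accepts only str, so numbers are formatted first.
f = open(path, 'w')
f.write(str(price) + '\n')
f.write('%d\n' % count)
f.close()

# Everything read back is a str, including the trailing newline.
f = open(path, 'r')
price_text = f.readline()
count_text = f.readline()
f.close()

# float() and int() ignore surrounding whitespace such as '\n'.
price_back = float(price_text)
count_back = int(count_text)

print(repr(price_text), price_back)
print(repr(count_text), count_back)

shutil.rmtree(tmp_dir)  # remove the scratch directory
```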

[Sister said] That's still the same old routine. Let's take a look at some practical examples:

OK, let's take a look at an example of actually using a file: we write two lines of text (including line breaks) to a file, and then read them back in several different ways. First, write the data:

myfile = open('myfile.txt','w')
myfile.write('hello text file\n')
myfile.write('goodbyt text file\n')
myfile.close()
 

The first method is readline, which reads one line at a time manually. The last call returns an empty string, which means the end of the file has been reached:

myfile = open('myfile.txt','r')
print(myfile.readline(), end='')   # each line already ends with '\n'
print(myfile.readline(), end='')
print(myfile.readline(), end='')   # empty string: end of file reached

hello text file
goodbyt text file
 

Second, you can use the read method to read the entire file content at once:

myfile = open('myfile.txt','r')
print(myfile.read())

hello text file
goodbyt text file
 

Finally, look at a more Pythonic method, which scans the file line by line automatically:

myfile = open('myfile.txt','r')
for line in myfile:
    print(line, end='')

hello text file
goodbyt text file
 

This method relies on the concept of a file iterator. The file object myfile created by the open function automatically reads and returns a new line of data on each iteration of the loop. This form is usually the easiest to write, uses memory efficiently because only one line is held at a time, and runs fast.

We will introduce iterators in a follow-up episode. For now, just remember: a file iterator is the most convenient way to read data line by line.

Now let's talk about reading and writing binary files. The thing to remember is that binary files must be processed with bytes strings: when we read a binary data file, what we get is a bytes object, and binary mode performs no conversion on the data.

As a reminder, binary files should not be opened in text mode, because text mode decodes the file content as Unicode text. Decoding the content of a binary file this way is meaningless and may fail. We talked about this issue in the last episode, so I won't repeat it here. We only need to review an example.

myfile = open('data.bin','wb')
myfile.write(b'abcdefg')
myfile.close()

data = open('data.bin', 'rb').read()
print(data)
print(list(data))

b'abcdefg'
[97, 98, 99, 100, 101, 102, 103]
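
To see why text mode is the wrong choice for binary data, here is a small sketch. The file name raw.bin and the byte values are assumptions chosen for illustration; the byte 0xff can never start a valid UTF-8 sequence, so decoding it as UTF-8 fails.

```python
import os
import shutil
import tempfile

# Hypothetical scratch file (illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'raw.bin')

# Write four raw bytes that are not valid UTF-8 text.
f = open(path, 'wb')
f.write(b'\xff\xfe\x00\x01')
f.close()

# Text mode tries to decode the bytes and raises UnicodeDecodeError.
f = open(path, 'r', encoding='utf-8')
try:
    f.read()
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print('text mode failed:', exc.reason)
finally:
    f.close()

# Binary mode returns the bytes unchanged.
f = open(path, 'rb')
data = f.read()
f.close()
print(data)

shutil.rmtree(tmp_dir)  # remove the scratch directory
```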
 

Let's briefly talk about closing and flushing files.

Closing a file. Calling the file object's close method terminates the connection to the external file, that is, it closes the file manually. Python also closes a file automatically, for example when a file object is no longer referenced and its memory is reclaimed, but closing it manually is the safest approach. The file object's context manager, which closes the file automatically, will be introduced later.
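
As a small preview, the sketch below compares the state of a file object inside and after a with block. The file name closing_demo.txt is a made-up name for illustration.

```python
import os
import shutil
import tempfile

# Hypothetical scratch file (illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'closing_demo.txt')

with open(path, 'w') as f:
    f.write('closed automatically\n')
    closed_inside = f.closed   # still open inside the block

closed_after = f.closed        # closed as soon as the block ends

# Using a closed file raises ValueError.
try:
    f.write('too late\n')
    write_failed = False
except ValueError:
    write_failed = True

print(closed_inside, closed_after, write_failed)

shutil.rmtree(tmp_dir)  # remove the scratch directory
```

The file is closed when the block exits, even if an exception is raised inside it, which is why this form is generally preferred over calling close by hand.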

By default, files are buffered, which means that written text may not be transferred from memory to the hard disk immediately. Closing the file, or calling its flush method, forces the buffered data to be written out.
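
The effect of buffering can be observed with a sketch like the one below, which checks the file size on disk before and after flush. The file name buffer_demo.txt is an assumption for illustration. The size before the flush is typically 0 for such a small write, but that depends on the buffering in use, so the code only prints it.

```python
import os
import shutil
import tempfile

# Hypothetical scratch file (illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'buffer_demo.txt')

f = open(path, 'w')
f.write('buffered text\n')

# The text may still sit in Python's buffer, so the file can look empty.
size_before = os.path.getsize(path)

# flush() hands the buffered data over to the operating system.
f.flush()
size_after_flush = os.path.getsize(path)

# close() flushes as well, so nothing new is written here.
f.close()
size_after_close = os.path.getsize(path)

print(size_before, size_after_flush, size_after_close)

shutil.rmtree(tmp_dir)  # remove the scratch directory
```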

In contrast to the files above, which store everything as strings, we finally turn to a special way of storing data in files: object storage.

The pickle module is an advanced tool that lets us store almost any Python object directly in a file, without converting it to and from strings ourselves. It is a general data formatting and parsing tool. Let's take an example and store a dictionary object and a list object directly in a file:

import pickle
D = {'a': 1, 'b': 2, 'c': 3}
L = [3, 4, 5]
with open('datafile.pkl', 'wb') as file:
    pickle.dump(D, file)
    pickle.dump(L, file)
 

This way, storing the two objects in the specified file is very simple. To use the objects again, you only need to load them back:

with open('datafile.pkl', 'rb') as file:
    print(pickle.load(file))
    print(pickle.load(file))

{'a': 1, 'b': 2, 'c': 3}
[3, 4, 5]
 

The so-called object serialization performed by the pickle module is essentially a conversion between Python objects (here, a dictionary and a list) and byte strings.
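
This conversion can be seen without any file at all. The sketch below uses pickle.dumps and pickle.loads, the in-memory counterparts of dump and load; the variable names are chosen for illustration.

```python
import pickle

D = {'a': 1, 'b': 2, 'c': 3}

# dumps() serializes the object into a bytes string instead of a file.
payload = pickle.dumps(D)
print(type(payload))

# loads() rebuilds an equal, but separate, object from those bytes.
restored = pickle.loads(payload)
print(restored == D)
print(restored is D)
```

The rebuilt dictionary compares equal to the original but is a new object, which is what reading it back from a file in a later session would also give you.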

There is also the struct module, which handles packed binary data. It only gets a brief mention here; it is enough to remember that struct can construct and parse packed binary data. In a sense, it is also a data conversion tool.

First, let's look at how to pack data into binary form and store it in a file. The first parameter of pack is a format string: > means big-endian (high-order byte first), i is a 4-byte integer, 5s is a 5-byte string, and f is a 4-byte floating-point number.

import struct

F = open('data.bin', 'wb')
data = struct.pack('>i5sf', 8, b'abcde', 4.3)
print(data)
F.write(data)
F.close()


F = open('data.bin', 'rb')
data = F.read()
print(data)
values = struct.unpack('>i5sf', data)
print(values)

b'\x00\x00\x00\x08abcde@\x89\x99\x9a'
b'\x00\x00\x00\x08abcde@\x89\x99\x9a'
(8, b'abcde', 4.300000190734863)
 

The second half is easy to understand: read the byte string from the file and unpack it with the same format string. Python converts it directly back into normal Python objects.
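
To make the format string more concrete, here is a small sketch that works purely in memory. It uses struct.calcsize to check the packed size and compares big-endian with little-endian packing of the same integer; the variable names are chosen for illustration.

```python
import struct

fmt = '>i5sf'

# 4-byte int + 5-byte string + 4-byte float = 13 bytes, no padding.
size = struct.calcsize(fmt)
print(size)

packed = struct.pack(fmt, 8, b'abcde', 4.3)
number, text, value = struct.unpack(fmt, packed)

# The float comes back slightly different: 'f' stores only 32 bits.
print(number, text, value)

# '>' puts the high-order byte first, '<' puts the low-order byte first.
big = struct.pack('>i', 8)
little = struct.pack('<i', 8)
print(big, little)
```

This also explains the 4.300000190734863 in the output above: 4.3 cannot be represented exactly in a 4-byte float, so the unpacked value is the nearest representable number.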

However, in general, binary file mode is used for simpler handling of binary files, such as pictures and audio files, whose content does not need to be unpacked. And if you want to store data, a database is the more common choice.

[Girl said] Well, as of today, we have learned several main data types in python: lists, dictionaries, tuples and strings. On the basis of basic data types, I have further understood the advanced concepts in the container---iteration and list comprehension, and the important and difficult points in strings---character encoding and file access. I need to organize and review it~

Original release time: 2018-08-08
author: soy sauce brother
This article comes from the Yunqi community partner "Python enthusiast community"; for more information, follow "Python enthusiast community".