Чтение файлово-подобных объектов Python из C | Python

Опубликовано: 12 Апреля, 2022

Writing C extension code that consumes data from any Python file-like object (e.g., normal files, StringIO objects, etc.). read() method has to be repeatedly invoke to consume data on a file-like object and take steps to properly decode the resulting data.
Given below is a C extension function that merely consumes all of the data on a file-like object and dumps it to standard output.

Code #1 :
#define CHUNK_SIZE 8192
  
/* Consume a "file-like" object and write bytes to stdout */
static PyObject* py_consume_file(PyObject* self, PyObject* args)
{
    PyObject* obj;
    PyObject* read_meth;
    PyObject* result = NULL;
    PyObject* read_args;
  
    if (!PyArg_ParseTuple(args, "O", &obj)) {
        return NULL;
    }
  
    /* Get the read method of the passed object */
    if ((read_meth = PyObject_GetAttrString(obj, "read")) == NULL) {
        return NULL;
    }
  
    /* Build the argument list to read() */
    read_args = Py_BuildValue("(i)", CHUNK_SIZE);
    while (1) {
        PyObject* data;
        PyObject* enc_data;
        char* buf;
        Py_ssize_t len;
  
        /* Call read() */
        if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
            goto final;
        }
  
        /* Check for EOF */
        if (PySequence_Length(data) == 0) {
            Py_DECREF(data);
            break;
        }
  
        /* Encode Unicode as Bytes for C */
        if ((enc_data = PyUnicode_AsEncodedString(data,
             "utf-8", "strict")) == NULL) {
            Py_DECREF(data);
            goto final;
        }
  
        /* Extract underlying buffer data */
        PyBytes_AsStringAndSize(enc_data, &buf, &len);
  
        /* Write to stdout (replace with something more useful) */
        write(1, buf, len);
  
        /* Cleanup */
        Py_DECREF(enc_data);
        Py_DECREF(data);
    }
    result = Py_BuildValue("");
  
final:
    /* Cleanup */
    Py_DECREF(read_meth);
    Py_DECREF(read_args);
    return result;
}

A file-like object such as a StringIO instance is prepared to test the code and then it is passed in:

Code #2 :

import io
f = io.StringIO("Hello World ")
import sample
sample.consume_file(f)

Output :

Hello
World

Unlike a normal system file, a file-like object is not necessarily built around a low-level file descriptor. Thus, a normal C library functions can’t be used to access it. Instead, a Python’s C API is used to manipulate the file-like object much like you would in Python.
So, the read() method is extracted from the passed object. An argument list is built and then repeatedly passed to PyObject_Call() to invoke the method. To detect end-of-file (EOF), PySequence_Length() is used to see if the returned result has zero length.
For all I/O operations, the concern is underlying encoding and distinction between bytes and Unicode. This recipe shows how to read a file in text mode and decode the resulting text into a bytes encoding that can be used by C. If the file is read in binary mode, only minor changes will be made as shown in the code below.

Code #3 :

/* Call read() */
if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
    goto final;
}
  
/* Check for EOF */
if (PySequence_Length(data) == 0) {
    Py_DECREF(data);
    break;
}
  
if (!PyBytes_Check(data)) {
    Py_DECREF(data);
    PyErr_SetString(PyExc_IOError, "File must be in binary mode");
    goto final;
}
  
/* Extract underlying buffer data */
PyBytes_AsStringAndSize(data, &buf, &len);

Внимание компьютерщик! Укрепите свои основы с помощью базового курса программирования Python и изучите основы.

Для начала подготовьтесь к собеседованию. Расширьте свои концепции структур данных с помощью курса Python DS. А чтобы начать свое путешествие по машинному обучению, присоединяйтесь к курсу Машинное обучение - базовый уровень.