Why does reading record struct fields from std::istream fail, and how can I fix it?

Suppose we have the following situation:

The record struct is declared as follows:

struct Person {
    unsigned int id;
    std::string name;
    uint8_t age;
    // ...
};

The records are stored in a file in the following format:

ID      Forename Lastname Age
------------------------------
1267867 John     Smith    32
67545   Jane     Doe      36
8677453 Gwyneth  Miller   56
75543   J. Ross  Unusual  23
...

The file should be read to collect an arbitrary number of the Person records described above:

std::ifstream ifs("SampleInput.txt");
std::vector<Person> persons;

Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
    persons.push_back(actRecord);
}

if(!ifs) {
    std::cerr << "Input format error!" << std::endl;
}

Question: (a frequently asked question in one form or another)
What can I do to read the separate words that form the name into the single actRecord.name variable?

The sample code above ends with a runtime error:

Runtime error    time: 0 memory: 3476 signal:-1
stderr: Input format error!

Answers

Answer 1

There is whitespace between the forename and the lastname. Change your class to hold the forename and lastname as separate strings and it should work. Alternatively, you can read into two separate variables such as name1 and name2 and assign them as

actRecord.name = name1 + " " + name2;
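
The two-variable approach from Answer 1 can be sketched roughly as follows. This is only an illustration under the assumption that every name consists of exactly two words; the helper name read_two_word_names is made up for this example and is not part of the original answer.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Hypothetical helper: reads "ID Forename Lastname Age" records until an
// extraction fails. Assumes every name is exactly two words.
std::vector<Person> read_two_word_names(std::istream& in) {
    std::vector<Person> persons;
    unsigned int id = 0;
    unsigned int age = 0;      // numeric temporary; a uint8_t would be read as a character
    std::string name1, name2;
    while (in >> id >> name1 >> name2 >> age) {
        Person p;
        p.id = id;
        p.name = name1 + " " + name2;
        p.age = static_cast<std::uint8_t>(age);
        persons.push_back(p);
    }
    return persons;
}
```

Note that a three-word name such as "J. Ross Unusual" still stops the loop here, because the third word is offered to the numeric age extraction.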

Answer 2

One viable solution is to reorder the input fields (if that is possible)

ID      Age Forename Lastname
1267867 32  John     Smith    
67545   36  Jane     Doe      
8677453 56  Gwyneth  Miller   
75543   23  J. Ross  Unusual  
...

and read in the records as follows

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    uint8_t age;
    // ...
};

int main() {
    std::istream& ifs = std::cin; // Open file alternatively
    std::vector<Person> persons;

    Person actRecord;
    unsigned int age;
    while(ifs >> actRecord.id >> age && 
          std::getline(ifs >> std::ws, actRecord.name)) { // std::ws drops the blanks before the name
        actRecord.age = uint8_t(age);
        persons.push_back(actRecord);
    }

    return 0;
}

Answer 3

Here is an implementation of a manipulator I came up with that counts the delimiter as each character is extracted. Given the number of words you specify, it extracts that many words from the input stream.

template<class charT>
struct word_inserter_impl {
    word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
        : str_(str)
        , delim_(delim)
        , words_(words)
    { }

    friend std::basic_istream<charT>&
    operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
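        typename std::basic_istream<charT>::sentry ok(is);

        if (ok) {
            wi.str_.clear(); // start fresh; otherwise names accumulate across records
            std::istreambuf_iterator<charT> it(is), end;
            std::back_insert_iterator<std::basic_string<charT> > dest(wi.str_);

            while (it != end && wi.words_) {
                // Stop at the delimiter that follows the last requested word.
                if (*it == wi.delim_ && --wi.words_ == 0) {
                    break;
                }
                dest++ = *it++;
            }
        }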
        return is;
    }
private:
    std::basic_string<charT>& str_;
    charT delim_;
    mutable std::size_t words_;
};

template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
    return word_inserter_impl<charT>(words, str, delim);
}

Now you can simply do the following (this assumes the words are separated by exactly one delimiter character):

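unsigned int age; // read the age as a number; a uint8_t would be read as a single character
while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> age) {
    actRecord.age = static_cast<uint8_t>(age);
    std::cout << actRecord.id << " " << actRecord.name << " " << age << '\n';
}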


Answer 4

A solution would be to read the first entry into the ID variable.
Then read all the other words from the line (just push them into a temporary vector) and build the person's name from all the elements except the last entry, which is the age.

This lets you keep the age in the last position while still being able to deal with names like "J. Ross Unusual".

Update to add some code that illustrates the theory above:

#include <algorithm>
#include <string>
#include <vector>
#include <iterator>
#include <fstream>
#include <sstream>
#include <iostream>

struct Person {
    unsigned int id;
    std::string name;
    int age;
};

int main()
{
    std::fstream ifs("in.txt");
    std::vector<Person> persons;

    std::string line;
    while (std::getline(ifs, line))
    {
        std::istringstream iss(line);

        // first: the ID, simply read it; skip header or malformed lines
        Person actRecord;
        if (!(iss >> actRecord.id)) {
            continue;
        }

        // next iteration: read in everything
        std::string temp;
        std::vector<std::string> tempvect;
        while(iss >> temp) {
            tempvect.push_back(temp);
        }

        // then: the name; join the vector in a way that avoids a trailing space,
        // also taking care of people who do not have two names ...
        if (tempvect.empty()) {
            continue; // nothing follows the ID, so there is no age to read
        }
        int LAST = 2;
        if(tempvect.size() < 2) // only a single token is in there
        {
            LAST = 1;
        }
        std::ostringstream oss;
        std::copy(tempvect.begin(), tempvect.end() - LAST,
            std::ostream_iterator<std::string>(oss, " "));
        // the last element
        oss << *(tempvect.end() - LAST);
        actRecord.name = oss.str();

        // and the age
        actRecord.age = std::stoi( *(tempvect.end() - 1) );
        persons.push_back(actRecord);
    }

    for(std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); it++)
    {
        std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
    }
}

Answer 5

Since we can easily split a line on whitespace, and we know that the only value that can be split is the name, a possible solution is to use a deque for each line that holds the whitespace-separated elements of that line. The ID and the age can easily be extracted from the deque, and the remaining elements can be concatenated to obtain the name:

#include <iostream>
#include <fstream>
#include <deque>
#include <vector>
#include <sstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <utility>

struct Person {
    unsigned int id;
    std::string name;
    uint8_t age;
};

int main(int argc, char* argv[]) {

    std::ifstream ifs("SampleInput.txt");
    std::vector<Person> records;

    std::string line;
    while (std::getline(ifs,line)) {

        std::istringstream ss(line);

        std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});

        Person record;
        record.id = std::stoi(info.front()); info.pop_front();
        record.age = std::stoi(info.back()); info.pop_back();

        std::ostringstream name;
        std::copy
            ( info.begin()
            , info.end()
            , std::ostream_iterator<std::string>(name," "));
        record.name = name.str(); record.name.pop_back();

        records.push_back(std::move(record));
    }

    for (auto& record : records) {
        std::cout << record.id << " " << record.name << " " 
                  << static_cast<unsigned int>(record.age) << std::endl;
    }

    return 0;
}
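
The sample file in the question also contains a header row and a dashed separator, which the token-based approaches in Answers 4 and 5 would need to skip. Below is a minimal sketch of one way to do that, assuming the first token is the ID and the last token is the age; the helper name parse_record and the 255 upper bound check are illustrative assumptions, not part of either answer.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Hypothetical helper: parses one line of the form "ID word [word ...] AGE".
// Returns false (leaving 'out' untouched) for lines that do not fit, such as
// the "ID Forename Lastname Age" header or the dashed separator.
bool parse_record(const std::string& line, Person& out) {
    std::istringstream in(line);
    unsigned int id = 0;
    if (!(in >> id)) return false;           // first token must be numeric

    std::vector<std::string> tokens;
    for (std::string tok; in >> tok; ) tokens.push_back(tok);
    if (tokens.size() < 2) return false;     // need at least one name word plus the age

    std::istringstream age_in(tokens.back());
    unsigned int age = 0;
    if (!(age_in >> age) || age > 255) return false;

    // Join every token except the last one with single spaces.
    std::string name = tokens[0];
    for (std::size_t i = 1; i + 1 < tokens.size(); ++i) name += " " + tokens[i];

    out.id = id;
    out.name = name;
    out.age = static_cast<std::uint8_t>(age);
    return true;
}
```

Inside a std::getline loop, a caller could simply push the record only when parse_record returns true.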

Answer 6

What can I do to read the separate words that form the name into the single actRecord.name variable?

The general answer is: no, you cannot do this without additional delimiter specifications and special parsing for the parts that form the intended actRecord.name contents.
This is because a std::string field is parsed only up to the next occurrence of a whitespace character.
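
To make the failure concrete, here is a small sketch of what happens when a two-word name meets operator>>. The helper name parse_naively is invented for this illustration, and the age is read numerically so that only the whitespace issue is on display.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads "id name age" the way the question does (but with
// a numeric age) and reports whether all three extractions succeeded.
bool parse_naively(const std::string& line, std::string& name_out) {
    std::istringstream in(line);
    unsigned int id = 0;
    unsigned int age = 0;
    in >> id >> name_out >> age;   // name_out receives one whitespace-delimited word only
    return !in.fail();
}
```

For the line "1267867 John Smith 32" the name receives only "John", and "Smith" is then offered to the numeric age extraction, which sets failbit on the stream.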

It is worth noting that some standard formats (for example .csv) may require distinguishing blanks (' ') from tabs ('\t') or other characters in order to delimit certain record fields (which may not be visible at first glance).

Also note:
To read a uint8_t value as numeric input, you have to take a detour through a temporary unsigned int value. Extracting directly into a uint8_t (usually an alias for unsigned char) reads a single character rather than a number, which throws off the parsing of everything that follows.
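
The difference can be shown with a short sketch, assuming the common case where uint8_t is an alias for unsigned char; both function names are made up for this example.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Reads directly into a uint8_t: operator>> treats it as a character type.
std::uint8_t read_age_directly(const std::string& text) {
    std::istringstream in(text);
    std::uint8_t age = 0;
    in >> age;                 // extracts only the first character of the text
    return age;
}

// Reads through a numeric temporary, then narrows to uint8_t.
std::uint8_t read_age_via_temporary(const std::string& text) {
    std::istringstream in(text);
    unsigned int tmp = 0;
    in >> tmp;                 // parses the digits as a number
    return static_cast<std::uint8_t>(tmp);
}
```

With the input "32", the direct version yields the character code of '3' and leaves the '2' in the stream for the next extraction, while the version with the temporary yields the number 32.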

Answer 7

Another solution is to require certain delimiter characters for a particular field and to provide a special extraction manipulator for that purpose.

Suppose we define the delimiter character to be ", and the input should look like this:

1267867 "John Smith"      32   
67545   "Jane Doe"        36  
8677453 "Gwyneth Miller"  56  
75543   "J. Ross Unusual" 23

The generally needed includes:

#include <cctype>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
#include <iomanip>

The record declaration:

struct Person {
    unsigned int id;
    std::string name;
    uint8_t age;
    // ...
};

Declaring and defining a proxy class (struct) that can be used with the global operator overload std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&):

struct delim_field_extractor_proxy { 
    delim_field_extractor_proxy
       ( std::string& field_ref
       , char delim = '"'
       ) 
    : field_ref_(field_ref), delim_(delim) {}

    friend 
    std::istream& operator>>
       ( std::istream& is
       , const delim_field_extractor_proxy& extractor_proxy);

    void extract_value(std::istream& is) const {
        field_ref_.clear();
        char input;
        bool addChars = false;
        while(is) {
            is.get(input);
            if(is.eof()) {
                break;
            }
            if(input == delim_) {
                addChars = !addChars;
                if(!addChars) {
                    break;
                }
                else {
                    continue;
                }
            }
            if(addChars) {
                field_ref_ += input;
            }
        }
        // consume whitespaces
        while(std::isspace(is.peek())) {
            is.get();
        }
    }
    std::string& field_ref_;
    char delim_;
};

std::istream& operator>>
    ( std::istream& is
    , const delim_field_extractor_proxy& extractor_proxy) {
    extractor_proxy.extract_value(is);
    return is;
}

Putting everything together and instantiating the delim_field_extractor_proxy:

int main() {
    std::istream& ifs = std::cin; // Open file alternatively
    std::vector<Person> persons;

    Person actRecord;
    int act_age;
    while(ifs >> actRecord.id 
              >> delim_field_extractor_proxy(actRecord.name,'"')
              >> act_age) {
        actRecord.age = uint8_t(act_age);
        persons.push_back(actRecord);
    }

    for(auto it = persons.begin();
        it != persons.end();
        ++it) {
        std::cout << it->id << ", " 
                      << it->name << ", " 
                      << int(it->age) << std::endl;
    }
    return 0;
}


Note:
This solution also works well when a TAB character (\t) is specified as the delimiter, which is useful for parsing standard .csv formats.
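
For the double-quote case specifically, the standard library's std::quoted manipulator from <iomanip> (available since C++14) can serve as an alternative to a hand-written proxy. This is a different technique from the one in the answer, shown here only as a sketch; the helper name read_quoted_names is an assumption for illustration.

```cpp
#include <cstdint>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Hypothetical helper; requires C++14 for std::quoted.
std::vector<Person> read_quoted_names(std::istream& in) {
    std::vector<Person> persons;
    Person p;
    unsigned int age = 0;      // numeric temporary for the uint8_t field
    // std::quoted reads everything between the two '"' characters into p.name.
    while (in >> p.id >> std::quoted(p.name) >> age) {
        p.age = static_cast<std::uint8_t>(age);
        persons.push_back(p);
    }
    return persons;
}
```

Unlike the proxy above, std::quoted also handles backslash escapes inside the quoted field, but it does not cover the TAB-delimited variant.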

Answer 8

Another attempt at solving the parsing problem.

int main()
{
   std::ifstream ifs("test-115.in");
   std::vector<Person> persons;

   while (true)
   {
      Person actRecord;
      // Read the ID and the first part of the name.
      if ( !(ifs >> actRecord.id >> actRecord.name ) )
      {
         break;
      }

      // Read the rest of the line.
      std::string line;
      std::getline(ifs,line);

      // Pickup the rest of the name from the rest of the line.
      // The last token in the rest of the line is the age.
      // All other tokens are part of the name.
      // The tokens can be separated by ' ' or '\t'.
      size_t pos = 0;
      size_t iter = 0;
      // find_first_of locates the nearest ' ' or '\t', whichever comes first.
      while ( (iter = line.find_first_of(" \t", pos)) != std::string::npos )
      {
         actRecord.name += line.substr(pos, (iter - pos + 1));
         pos = iter + 1;

         // Skip multiple whitespace characters.
         while ( isspace(line[pos]) )
         {
            ++pos;
         }
      }

      // Trim the last whitespace from the name.
      actRecord.name.erase(actRecord.name.size()-1);

      // Extract the age.
      // std::stoi returns an integer. We are assuming that
      // it will be small enough to fit into an uint8_t.
      actRecord.age = std::stoi(line.substr(pos).c_str());

      // Debugging aid.. Make sure we have extracted the data correctly.
      std::cout << "ID: " << actRecord.id
         << ", name: " << actRecord.name
         << ", age: " << (int)actRecord.age << std::endl;
      persons.push_back(actRecord);
   }

   // If came here before the EOF was reached, there was an
   // error in the input file.
   if ( !(ifs.eof()) ) {
       std::cerr << "Input format error!" << std::endl;
   } 
}

Answer 9

Looking at such an input file, I think it is not a delimited file (the new way) but a good old fixed-width-field file, the kind Fortran and Cobol programmers used to deal with. So I would parse it like this (note that I split the forename and the lastname):

#include <cstdint>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string forename;
    std::string lastname;
    uint8_t age;
    // ...
};

int main() {
    std::ifstream ifs("file.txt");
    std::vector<Person> persons;
    std::string line;
    int fieldsize[] = {8, 9, 9, 4};

    while(std::getline(ifs, line)) {
        Person person;
        int field = 0, start=0, last;
        std::stringstream fieldtxt;
        fieldtxt.str(line.substr(start, fieldsize[0]));
        fieldtxt >> person.id;
        start += fieldsize[0];
        person.forename=line.substr(start, fieldsize[1]);
        last = person.forename.find_last_not_of(' ') + 1;
        person.forename.erase(last);
        start += fieldsize[1];
        person.lastname=line.substr(start, fieldsize[2]);
        last = person.lastname.find_last_not_of(' ') + 1;
        person.lastname.erase(last);
        start += fieldsize[2];
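        fieldtxt.clear(); // reset any error state left over from reading the id
        fieldtxt.str(line.substr(start, fieldsize[3]));
        unsigned int age = 0; // numeric temporary; a uint8_t would be read as a character
        fieldtxt >> age;
        person.age = static_cast<uint8_t>(age);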
        persons.push_back(person);
    }
    return 0;
}