
Multi-Threaded configuration parsing


Guest Mythli


I've programmed this for learning reasons and I need some feedback (coding style, correct use of pointers, ...).

This code snippet parses mangos-like configuration files using regular expressions and multiple threads (you need the Boost C++ library to compile it).

On an AMD Phenom X4 945 (plus an Intel SSD) this code parses up to 20 times faster than the dotconfpp mangos configuration class. I don't think performance matters when parsing configuration files, but it is still an interesting fact.

here is my code:

header-file:

#ifndef CONFIG_H
#define CONFIG_H

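#include "Common.h"
#include <fstream>
#include <list>
#include <string>
#include <boost/regex.hpp>
#include <boost/thread.hpp>  // boost::mutex, boost::thread_group
// note: stringMapType is not defined in this snippet; it is assumed to be a
// std::map<std::string, std::string> typedef provided by Common.h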

class Config {
public:
   Config(std::string ConfigFileName);
   void ReloadConfig();
   std::string GetStringDefault(std::string Name, std::string DefaultValue = "");
   bool GetBoolDefault(std::string Name, bool DefaultValue = false);
   int GetIntDefault(std::string Name, int DefaultValue = 0);
   float GetFloatDefault(std::string Name, float DefaultValue = 0);

   std::string GetFileName()
   {
       return this->configFileName;
   }
   ~Config();
private:
   std::string configFileName;
   std::ifstream fileStream;
   boost::regex searchPattern;
   stringMapType configMap;

   //--used for threading
   boost::mutex lineBufferMutex;
   boost::mutex configMapMutex;
   std::list<std::string> lineBuffer;
   bool isFileFinish;
   void lineParser();
   //--
};
#endif

code-file:

#include "Config.h"
#include <boost/bind.hpp>
#include <boost/lexical_cast.hpp>

Config::Config(std::string ConfigFileName) {
   this->configFileName = ConfigFileName;

   this->fileStream.open(this->configFileName);
   if (this->fileStream.is_open()) {
       //group 2 = name, group 3 = number value, group 4 = quoted string value
       this->searchPattern = boost::regex("^([ \\t]+)?([^#][^\\s]+)\\s*=\\s*(?:(\\d+)|(?:\"([^\"]*)\")).*$");
       this->ReloadConfig();
   }
}

void Config::lineParser()
{
   boost::match_results<std::string::const_iterator> matches;
   //--keep thread alive until file is fully pushed into lineBuffer and lineBuffer is empty
   for (;;)
   {
       std::string line;
       bool gotLine = false;
       bool done = false;
       {
           //--check and pop under one lock: with an unlocked size check two threads
           //  can both see the last line and one of them pops from an empty list
           boost::lock_guard<boost::mutex> lock(this->lineBufferMutex);
           if (!this->lineBuffer.empty())
           {
               //move first line into temp var and remove it from lineBuffer
               line.swap(this->lineBuffer.front());
               this->lineBuffer.pop_front();
               gotLine = true;
           }
           else
               done = this->isFileFinish;
       }
       //--

       if (!gotLine)
       {
           if (done)
               break;
           //--wait until lines for parsing are available
           boost::this_thread::sleep(boost::posix_time::millisec(1));
           continue;
       }

       if (boost::regex_match(line, matches, this->searchPattern))
       {
           //--insert match into our configMap (we have to lock our configMap object first)
           boost::lock_guard<boost::mutex> lock(this->configMapMutex);
           //group 3 is a number (without quotes), group 4 a string (in quotes)
           this->configMap.insert(std::make_pair(matches[2].str(),
               matches[3].matched ? matches[3].str() : matches[4].str()));
       }
   }
}

void Config::ReloadConfig() {
   //--reset (clear() drops the eof/fail bits left by a previous read, otherwise seekg fails)
   this->configMap.clear();
   this->fileStream.clear();
   this->fileStream.seekg(0, std::ios::beg);
   this->isFileFinish = false;
   //--

   //--define variables
   std::string line;
   boost::thread_group threadPool;
   //hardware_concurrency() returns 0 if the value is not available
   unsigned int threadCount = boost::thread::hardware_concurrency();
   if (threadCount == 0)
       threadCount = 1;
   //--

   //--create parser-threads for each logical cpu
   for (unsigned int i = 0; i < threadCount; i++)
   {
       threadPool.create_thread(boost::bind(&Config::lineParser, this));
   }
   //--

   //--push lines from file in our lineBuffer (parser threads read it concurrently, so lock)
   while(std::getline(this->fileStream, line))
   {
       boost::lock_guard<boost::mutex> lock(this->lineBufferMutex);
       this->lineBuffer.push_back(line);
   }
   //--
   {
       boost::lock_guard<boost::mutex> lock(this->lineBufferMutex);
       this->isFileFinish = true;
   }
   //wait until all lines are parsed
   threadPool.join_all();
}

std::string Config::GetStringDefault(std::string Name, std::string DefaultValue)
{
   stringMapType::const_iterator configNode = this->configMap.find(Name);
   if(configNode != this->configMap.end())
       return configNode->second;
   else
       return DefaultValue;
}

bool Config::GetBoolDefault(std::string Name, bool DefaultValue) 
{
   std::string baseValue = this->GetStringDefault(Name, "");
   if (baseValue == "")
       return DefaultValue;
   else
       return boost::lexical_cast<bool>(baseValue);
}
int Config::GetIntDefault(std::string Name, int DefaultValue)
{
   std::string baseValue = this->GetStringDefault(Name, "");
   if (baseValue == "")
       return DefaultValue;
   else 
       return boost::lexical_cast<int>(baseValue);
}

float Config::GetFloatDefault(std::string Name, float DefaultValue)
{
   std::string baseValue = this->GetStringDefault(Name, "");
   if (baseValue == "")
       return DefaultValue;
   else 
       return boost::lexical_cast<float>(baseValue);
}

Config::~Config() {
   this->fileStream.close();
}


OK ^_^

On an AMD Phenom X4 945 (plus an Intel SSD) this code parses up to 20 times faster than the dotconfpp mangos configuration class. I don't think performance matters when parsing configuration files, but it is still an interesting fact.

Performance does matter, especially when you think about the .reload config command


And it makes things clearer when you don't prepend member stuff with m_ I guess.
I started programming with C++ a few weeks ago, but I still think this is a good thing: "this->" logically divides member and local variables just like m_ does.
Performance does matter, especially when you think about the .reload config command

we are talking about 100-200ms or something around ;)


skynyrd, you mean CTRL+SPACE (to trigger IntelliSense)?

and i prefer the this-> too, just because it's more 'clear' and prevents any naming-problems between local variables and class variables (even if i'm using m_ i might still miss one). i also use std:: instead of 'using namespace std'. i guess this is a question of style which everyone has to decide on his own.


I have two problems with this code:

1. Using multiple threads to read single lines from a file seems to be totally inefficient.

2. Using regular expressions to parse seems to be totally inefficient.

Then again, if it's just for learning it's totally fine \o/

(If parsing one line really took so long that it was worth creating multiple threads, the correct design pattern is to use as few locks as possible imo. So you would insert into configMap only once per thread, at the end.)
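
A minimal sketch of that "merge once per thread" idea, assuming C++11 `std::thread` instead of Boost and lines that are already in memory. `parseLine`, `parseInParallel` and `StringMap` are hypothetical names for illustration, and the naive split at '=' only stands in for the real parser:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <utility>
#include <vector>

typedef std::map<std::string, std::string> StringMap;

// Hypothetical stand-in for the real line parser: splits "name=value" at the first '='.
static bool parseLine(const std::string& line, std::string& name, std::string& value)
{
    if (line.empty() || line[0] == '#')
        return false;
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos || eq == 0)
        return false;
    name = line.substr(0, eq);
    value = line.substr(eq + 1);
    return true;
}

// Each worker fills its own private map, so no lock is taken while parsing.
StringMap parseInParallel(const std::vector<std::string>& lines, unsigned threadCount)
{
    if (threadCount == 0)
        threadCount = 1;
    std::vector<StringMap> partial(threadCount);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&lines, &partial, t, threadCount]() {
            std::string name, value;
            // Worker t takes lines t, t + threadCount, t + 2 * threadCount, ...
            for (std::size_t i = t; i < lines.size(); i += threadCount)
                if (parseLine(lines[i], name, value))
                    partial[t].insert(std::make_pair(name, value));
        });
    }
    for (std::thread& w : workers)
        w.join();

    // Merge once on the calling thread, after all workers have finished.
    StringMap result;
    for (const StringMap& m : partial)
        result.insert(m.begin(), m.end());
    return result;
}
```

Note that with duplicate keys the surviving value depends on the merge order rather than on the order in the file, so a real implementation would have to decide how to handle that.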

On an AMD Phenom X4 945 (plus an Intel SSD) this code parses up to 20 times faster than the dotconfpp mangos configuration class

how did you measure that?

Also interesting would be usage statistics for each thread: are the threads used equally (each thread does 25% of the config file with 4 threads), or is one thread doing most of the work while the rest sleep away?

How does this class compare to the same thing without multithreading? I.e. no lineBuffer, but calling the regexp (if you really want to use that, which is really overkill, since initialization probably costs more than parsing the file) from

while(std::getline(this->fileStream, line)) 
   {
    parselinehere(line);
   }
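
One way to answer both questions is to time each variant over many runs and compare the averages. A minimal sketch, assuming C++11 `std::chrono`; `averageMillis` and the callable passed to it are hypothetical names, not part of the posted class:

```cpp
#include <chrono>

// Runs parseOnce() `repeats` times and returns the average wall-clock time
// per run in milliseconds. steady_clock is used because it never jumps backwards.
template <typename Fn>
double averageMillis(Fn parseOnce, int repeats)
{
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < repeats; ++i)
        parseOnce();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / repeats;
}
```

The same harness can wrap the threaded ReloadConfig() and the plain getline loop, so both are measured under identical conditions (same file, same number of runs, warm file cache).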


No, it wouldn't. At least not any sane way.

Multithreaded config parsing (i.e. reading from the same file) makes no sense to me at all. If you write it with performance in mind, it takes at most about 5ms with one thread. "The .reload problem" is more about re-reading data from a database, which can take several seconds, not about config parsing.

One of the fastest ways would be to use e.g. scanf()/sscanf() or a related function. It's very fast (from my experience), much faster than using regular expressions. One could make the config file format somewhat more strict and use sscanf() to read it, skipping comment lines with another if() condition. I was able to parse about 16000 "name=value" pairs in 146ms on my desktop computer, single thread.
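
A minimal sketch of such an sscanf() based line parser. The stricter format is an assumption: names contain no blanks, '=' or '#', values are either "quoted" or a single bare token, and names are capped at 127 characters. `parseLineSscanf` is a hypothetical name:

```cpp
#include <cstdio>
#include <string>

// Parses one "name = value" line. Returns false for comments, blank lines
// and lines that do not fit the assumed format.
bool parseLineSscanf(const std::string& line, std::string& name, std::string& value)
{
    char key[128];
    int valuePos = -1;
    // " %127[^= \t#]" skips leading blanks and reads the name (fails on '#'),
    // " = " requires the '=', and %n records the offset where the value starts.
    if (std::sscanf(line.c_str(), " %127[^= \t#] = %n", key, &valuePos) != 1 || valuePos < 0)
        return false;

    const std::string rest = line.substr(static_cast<std::string::size_type>(valuePos));
    if (!rest.empty() && rest[0] == '"')
    {
        // Quoted value: everything up to the closing quote.
        const std::string::size_type close = rest.find('"', 1);
        if (close == std::string::npos)
            return false;
        value = rest.substr(1, close - 1);
    }
    else
    {
        // Bare value: one token, ending at a blank, line end or trailing comment.
        value = rest.substr(0, rest.find_first_of(" \t\r\n#"));
        if (value.empty())
            return false;
    }
    name = key;
    return true;
}
```

The comment check needs no separate if() here because the scanset excludes '#', so a line starting with '#' fails on the first conversion.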

When you choose to use regexps, you need to at least understand how they are handled. Each regexp must be parsed, pre-processed and compiled before it can be executed. Doing this for each line is a ****load waste of resources. There are (at least in C) the POSIX regexp functions regcomp() for regexp compilation, regexec() for regexp execution and regfree() to free the compiled regexp. So you can compile the text representation of a regexp into its binary form once and then execute it as many times as you want, e.g. for each line.
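
A minimal sketch of that compile-once, execute-many pattern with the POSIX functions, called from C++. It assumes a POSIX system (`<regex.h>` is not part of standard C++), and the pattern is a simplified, hypothetical stand-in for the one in the posted class; `parseWithPosixRegex` is also a made-up name:

```cpp
#include <regex.h>  // POSIX regcomp/regexec/regfree
#include <cstddef>
#include <map>
#include <string>
#include <vector>

std::map<std::string, std::string> parseWithPosixRegex(const std::vector<std::string>& lines)
{
    std::map<std::string, std::string> result;

    // group 1 = name, group 3 = number value, group 4 = quoted string value
    const char* pattern = "^[ \t]*([^# \t=][^ \t=]*)[ \t]*=[ \t]*(([0-9]+)|\"([^\"]*)\")";
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)  // compiled exactly once
        return result;

    regmatch_t m[5];
    for (std::size_t i = 0; i < lines.size(); ++i)
    {
        const std::string& line = lines[i];
        if (regexec(&re, line.c_str(), 5, m, 0) != 0)  // executed once per line
            continue;
        // An unmatched group has rm_so == -1.
        const regmatch_t& v = (m[3].rm_so != -1) ? m[3] : m[4];
        const std::string name = line.substr(static_cast<std::size_t>(m[1].rm_so),
                                             static_cast<std::size_t>(m[1].rm_eo - m[1].rm_so));
        result[name] = line.substr(static_cast<std::size_t>(v.rm_so),
                                   static_cast<std::size_t>(v.rm_eo - v.rm_so));
    }

    regfree(&re);  // release the compiled form
    return result;
}
```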

From the looks of your code, you're compiling the regexp for each line, which is where the slowdown (most likely) comes from.

edit: It seems that "searchPattern" is indeed a compiled regexp.


edit: It seems that "searchPattern" is indeed a compiled regexp.

that's true

and...

I have already said that this was just programmed for learning reasons. Sure, sscanf(..) is faster, and I never said that the regular expressions are what gives the performance boost.

if we look at the code

while(std::getline(this->fileStream, line)) 
{
    parselinehere(line);
}

would end in:

* read line from file

* parse line

* waiting for parsing line to read next line

until all lines are parsed

with multiple threads we can read lines without waiting for parsing - on my hardware reading lines is very fast (ssd) and parsing lines is the time consuming thing. With multiple threads and a line buffer we can reduce the overhead to a minimum.
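
The line buffer described here is a producer/consumer queue. A minimal sketch of one that blocks on a condition variable instead of sleeping and polling, assuming C++11 and the standard library rather than Boost; `LineQueue` and its method names are hypothetical:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>

class LineQueue
{
public:
    LineQueue() : closed(false) {}

    // Called by the reader thread for every line read from the file.
    void push(const std::string& line)
    {
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            lines.push_back(line);
        }
        ready.notify_one();
    }

    // Called by the reader thread once the whole file has been pushed.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            closed = true;
        }
        ready.notify_all();
    }

    // Called by parser threads. Blocks until a line is available;
    // returns false once the queue is closed and fully drained.
    bool pop(std::string& line)
    {
        std::unique_lock<std::mutex> lock(queueMutex);
        ready.wait(lock, [this]() { return closed || !lines.empty(); });
        if (lines.empty())
            return false;
        line.swap(lines.front());
        lines.pop_front();
        return true;
    }

private:
    std::mutex queueMutex;
    std::condition_variable ready;
    std::deque<std::string> lines;
    bool closed;
};
```

A parser thread would then loop with `while (queue.pop(line)) { ... }` and exit on its own once the reader has called close(). Whether this actually beats a plain single-threaded loop for a small config file is something only a measurement can tell.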


imho you should read the file in one go anyway; reading line by line is slower than reading everything at once iirc

There's one problem with that; buffer size. Since you're doing one read(), you have to store the result somewhere. That "somewhere" must already be allocated, so you would have to seek to EOF, measure the file size, allocate buffer for it, seek back and read it. Doing it line-by-line with, say, 255 chars per line (max) is somewhat easier.
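
The seek/measure/allocate/read sequence described above can be sketched as follows. This assumes a regular file that fits in memory and is opened in binary mode so that the measured size matches the number of bytes read; `readWholeFile` is a hypothetical helper name:

```cpp
#include <fstream>
#include <string>

// Reads the complete file into `out` with a single read() call.
bool readWholeFile(const std::string& path, std::string& out)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;

    in.seekg(0, std::ios::end);              // seek to EOF
    const std::streamoff size = in.tellg();  // measure the file size
    if (size < 0)
        return false;
    in.seekg(0, std::ios::beg);              // seek back

    out.resize(static_cast<std::string::size_type>(size));  // allocate the buffer
    if (size > 0)
        in.read(&out[0], static_cast<std::streamsize>(size));
    return static_cast<std::streamoff>(in.gcount()) == size;
}
```

The buffer can then be split at '\n' in memory, which is where the per-line work moves to.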


Well from what i gather with 2 min google, something like this should work too, no?

vector<string> text;
string line;
ifstream textstream ("config.txt");
while (getline(textstream, line)) {
  text.push_back(line);
}
textstream.close();
for (int i=0; i < text.size(); i++)
  parseline(text[i]);


Well from what i gather with 2 min google, something like this should work too, no?

vector<string> text;
string line;
ifstream textstream ("config.txt");
while (getline(textstream, line)) {
  text.push_back(line);
}
textstream.close();
for (int i=0; i < text.size(); i++)
  parseline(text[i]);

Look at my source, I don't do anything different.

