
[dev] New networking code based on "proactor" pattern


Guest faramir118


I finally got around to playing with an asynchronous network IO implementation.

I've only thought it through a little, so before I get carried away I figured I would open a discussion.

In a nutshell, the benefits are:

  • better scalability via non-blocking IO - read this paper for some explanation on why
  • reduced synchronization complexity - not really seeing this yet

the code: https://github.com/faramir118/mangos/tree/aio

And like my other project, I'll update this post occasionally.



It would be nice to test it under load and see the actual difference.

It may not be as good as we imagine, simply because (as far as I know) all async sockets are more or less backed by "hidden" listening threads.

The way they implement non-blocking functions with callbacks is by running a thread pool in the background, transparent to the user. Those threads do all the dirty, blocking work, and when it's done, the callback is called.

So, in reality, we don't save anything - we just use the library's built-in solution.

ACE is still doing the dirty work behind the scenes, so I don't know how much we really save here.

Maybe I'm totally wrong and the ACE lib does some kind of "dark magic", or simply implements it more efficiently or differently.
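Roughly the kind of thing I mean - a purely illustrative sketch (hypothetical names, not ACE internals) of how an "async" read can be faked with a hidden worker thread doing the blocking call and then firing a callback:

```cpp
// Illustrative only: the "async" read is really a background thread doing a
// blocking recv() and invoking a callback when it finishes.
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>
#include <sys/socket.h>
#include <sys/types.h>

using ReadCallback = std::function<void(const std::vector<char>&, ssize_t)>;

void fake_async_read(int fd, std::size_t length, ReadCallback on_done)
{
    std::thread([fd, length, on_done]()
    {
        std::vector<char> buffer(length);
        // The blocking call still happens - it is just hidden on this thread.
        ssize_t received = recv(fd, buffer.data(), buffer.size(), 0);
        // Work done -> callback called, exactly as described above.
        on_done(buffer, received);
    }).detach();
}
```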

But, those things require testing. You don't get anything for free.

In general, it's really nice to see this attempt!

Cheers.


I agree with qsa - we already implement 'async IO' by using a thread pool doing the dirty work with synchronous sockets. That's why, unfortunately, only tests can tell us whether the gain will be noticeable or not :/ Anyway, personal cheers for your attempt to bring true async networking IO to mangos :)

P.S. IMO, for networking code we already have an acceptable level of performance. What we need is stability, since crashes in the network code are really weird and annoying :(


While it's true that ACE spawns a thread to block for IO completion, I think there are more possibilities with a system like this. (edit: referring to an asynchronous system in general, not specifically the ACE proactor)

For example, we could handle all DB writes asynchronously, and maybe some reads via continuations too. In this way and others, the mangos update cycle would get more dedicated time to do its thing.
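Something like this is what I have in mind for the continuations - hypothetical names, not real MaNGOS code, just to sketch how a DB worker could hand the result back to the main update thread:

```cpp
// Sketch: a worker thread runs the query, the continuation is queued back so the
// main thread applies the result during its normal update cycle.
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

struct QueryResultStub { /* rows would live here */ };
using Continuation = std::function<void(const QueryResultStub&)>;

static std::mutex g_lock;
static std::queue<std::function<void()>> g_completed; // drained by the main thread

void AsyncQuery(const std::string& sql, Continuation then)
{
    std::thread([sql, then]()
    {
        QueryResultStub result; // pretend 'sql' was executed here, off the main thread
        std::lock_guard<std::mutex> guard(g_lock);
        // Don't run the continuation here - just hand it back to the main thread.
        g_completed.push([then, result]() { then(result); });
    }).detach();
}

// Called once per world update tick, on the main thread.
void ProcessCompletedQueries()
{
    std::lock_guard<std::mutex> guard(g_lock);
    while (!g_completed.empty())
    {
        g_completed.front()();
        g_completed.pop();
    }
}
```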

The supposed efficiency of IO completion ports, RT signals, aio, etc might just be icing on the cake.

And it would be neat if mangos were 'multithreaded' out of the box ;)


faramir118, qsa, Ambal and others: guys, you are doing a great job, but I want to ask you about "multi-threading" support for MySQL. I have no idea what it's really called, but I mean mangos working with several MySQL threads, so the work is divided into several streams. This would help against the //bang crash.

BTW, what do you think about multi-threading for realmd? After any crash, many people try to connect and it takes too much time.

P.S. Sorry for the bad English, I hope you understand what I'm talking about.


I spoke with Ambal some time ago about a "pool" of character DB connections. He said that all we have to do is create a thread pool of SQL threads implemented in SqlDelayThread and use it, but I don't have enough knowledge about how MySQL transactions work in MaNGOS.

About realmd: as far as I know it is already multithreaded, since it uses the ACE socket system. I have a dedicated cloud instance for realmd, because realmd hangs if it doesn't have enough resources. (Tested with 3000+ people accessing the realms at the same time.)


We do use async SQL requests - they are mostly for INSERT statements. Also, we already have a subsystem that is capable of posting and processing SQL query completion callbacks on the main thread.

A few words about several connections to the DB - it is not so difficult to do, but it is not a straightforward solution, since ALL SQL requests from a given object must be made on a dedicated SQL connection; otherwise there might be BIG problems with data consistency, even if we use transactions. The current design with a single DB connection does not have this issue because there are no concurrent queries.

We can think about redesigning the DB layer, since it lacks some performance features like 'prepared statements', but such a patch will be HUGE and will take a lot of time to code and test.

P.S. At the current stage, 'prepared statements' will give us more of a performance bonus than several DB connections, so my bet is on this functionality. There was a patch made by Wyk3d long ago, but it didn't get into the main repo, unfortunately.
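For reference, at the raw MySQL C API level a prepared statement looks roughly like this (just a sketch with an example query; the real DB layer would of course wrap this and reuse the statement handle instead of preparing it on every call):

```cpp
// Sketch of a prepared statement via the MySQL C API; 'conn' is assumed to be an
// already connected MYSQL* handle, and the query is only an example.
#include <mysql/mysql.h>
#include <cstring>

bool SaveCharacterMoney(MYSQL* conn, unsigned int guid, unsigned int money)
{
    const char* sql = "UPDATE characters SET money = ? WHERE guid = ?";
    MYSQL_STMT* stmt = mysql_stmt_init(conn);
    if (!stmt || mysql_stmt_prepare(stmt, sql, std::strlen(sql)))
        return false;

    // Parameters are bound as raw buffers - no per-call SQL parsing or escaping.
    MYSQL_BIND params[2];
    std::memset(params, 0, sizeof(params));
    params[0].buffer_type = MYSQL_TYPE_LONG;
    params[0].buffer      = &money;
    params[0].is_unsigned = 1;
    params[1].buffer_type = MYSQL_TYPE_LONG;
    params[1].buffer      = &guid;
    params[1].is_unsigned = 1;

    bool ok = !mysql_stmt_bind_param(stmt, params) && !mysql_stmt_execute(stmt);
    mysql_stmt_close(stmt);
    return ok;
}
```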

Cheers :)


I've completed the implementation; it now handles all network traffic if you set the config value I added.

Testing on Windows from a couple different computers... my client always eventually stops sending CMSG_TIME_SYNC_RESP - when this happens, the client stops updating even though it stays connected.

I can't figure out what is causing it... going to work on something else for a bit :)


  • 2 weeks later...

Even though my implementation doesn't seem to work, and the current network IO implementation is not a bottleneck, I figure someone out there is interested in learning (at least I am).

I did some digging and finally found the document I read a while back that explains some of the scalability benefits of asynchronous event demultiplexing (the proactor pattern, or event-based IO) vs. synchronous event demultiplexing (the reactor pattern):

http://www.hpl.hp.com/techreports/2000/HPL-2000-174.pdf

Keep in mind that their test environment a) is old and slow (400 MHz) and b) uses a simple, accept-based protocol that requires very little processing.
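For anyone who doesn't want to read the whole paper, the difference boils down to roughly this (illustrative only - the reactor half is plain POSIX, and the proactor half is just the shape of a completion callback, not ACE's actual signature):

```cpp
#include <sys/select.h>
#include <sys/socket.h>

// Reactor style (synchronous demultiplexing): the OS tells us a socket is
// *ready*, but our thread still performs the read itself afterwards.
void reactor_step(int fd)
{
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    select(fd + 1, &readable, 0, 0, 0);   // block until the socket is readable
    char buf[4096];
    if (FD_ISSET(fd, &readable))
        recv(fd, buf, sizeof(buf), 0);    // the copy happens on our thread
}

// Proactor style (asynchronous demultiplexing): we hand the OS a buffer up front
// and are called back only when the data has *already* been placed in it.
void on_read_complete(char* buffer, long bytes_transferred)
{
    (void)buffer;
    (void)bytes_transferred;
    // parse packets here and start the next asynchronous read
}
```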

I'm doing a bit of digging in ACE code to see exactly what the ACE_Reactor and ACE_Acceptor implementations do.


faramir118, the ACE_Proactor class uses IO completion ports (IOCP) under Windows to do its tasks. On *nix it should use the same kind of advanced techniques. So if you are able to make your patch work with the "Proactor" pattern - we'll definitely accept it into the Core.
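For anyone curious what that looks like in ACE terms, a bare-bones proactor read handler is roughly this (a sketch only, not faramir118's actual patch):

```cpp
// Minimal ACE proactor read handler: the proactor delivers completions (via IOCP
// on Windows) to handle_read_stream(), where the data is already in the buffer.
#include "ace/Asynch_IO.h"
#include "ace/Message_Block.h"
#include "ace/Proactor.h"

class ClientHandler : public ACE_Service_Handler
{
public:
    // Called by ACE_Asynch_Acceptor once a connection has been accepted.
    virtual void open(ACE_HANDLE new_handle, ACE_Message_Block&)
    {
        this->handle(new_handle);
        if (reader_.open(*this, new_handle) == 0)
        {
            ACE_Message_Block* mb = new ACE_Message_Block(4096);
            reader_.read(*mb, mb->space());      // start the first asynchronous read
        }
    }

    // Completion callback: the received bytes are already in the message block.
    virtual void handle_read_stream(const ACE_Asynch_Read_Stream::Result& result)
    {
        ACE_Message_Block& mb = result.message_block();
        if (!result.success() || result.bytes_transferred() == 0)
        {
            mb.release();
            delete this;                         // peer closed or error
            return;
        }
        // ... crunch world packets out of mb here ...
        mb.reset();
        reader_.read(mb, mb.space());            // queue the next read
    }

private:
    ACE_Asynch_Read_Stream reader_;
};

// Somewhere in the network thread:
//   ACE_Proactor::instance()->proactor_run_event_loop();
```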

Cheers :)


  • 3 weeks later...

I'll second that, Ambal!

faramir is likely up to his neck in map code, especially since he added backporting vmap v3 for MaNGOS One to his workload.

The idea behind this patch seems like it would work great, even if his code did not yet show any noticeable increase in efficiency. Perhaps it needs a larger test environment, such as a high-population server, to show any gains?


No progress on this patch itself, just some effort towards better understanding client-side issues.

I have zero skill at asm reversing/decompiling, so I can't see what is going wrong from the client's perspective.

Server side + packet sniffing looks normal (other than the client not responding).

I created a simple, dumb client; it will at least make it easier to see what is received client-side. You can see it here: https://github.com/faramir118/MangosClient


I know the PseuWoW client has been modified in the past to compile as a console-only program. crashuncle used such a modified version as a chat client for 2.4.3. Maybe shlainn could be persuaded to create a version to facilitate work like yours and other server-side code that deals with basic functions - a PseuWoW-Dev, if I may be so bold as to suggest a moniker.

Sorry I couldn't be of more help to you, but it is your brain sweat and time so you would know best what to do. I'll keep cheering you on from the sidelines. :)


Full disclosure:

I had PseuWoW working a while ago... it was a lot more work than I had wanted, mostly resolving issues with dependencies and creating VS 2010 files from scratch. Irrlicht was in particularly bad shape, and I completely removed all traces of DirectX just to simplify things.

I would have been happy to continue on with this, but I did some 'housekeeping' on my dev machine and apparently deleted the only copy, so that work is gone. :/

Anyway, a large portion of the little client that I wrote was easily pieced together from PseuWoW, MaNGOS, and WCell - with good resources like those, it didn't take that much time or effort.


That's too bad all that work was lost, but you're a mad genius for cobbling together a testing client from various sources in such short order! Hopefully, others will find it useful for testing their own patches.

This community owes you a great debt for everything you've accomplished, faramir. Thank you, most sincerely!

My server's hard drive burned out, but I do have an antiquated spare that should be adequate for getting things up and running so I can resume testing several patches, yours included. Can you provide some criteria and recommended Windows tools for gathering the data you need?


Wouldn't simply logging into an empty world, with and without the patch, until it fails on the patched version be enough to track down the error? You could just compare both packet logs and check what the difference is. Since the only change is in the network code, the logs should be identical.


I've experienced the same issues as you had, faramir118. It seemed like the client was refusing packets, or the server was not sending packets back to the client. Also, your current scheme of crunching the networking buffer for packets does not look very efficient. But that can easily be improved once the issue with packet sending is fixed.


Maybe fixed the issue.

What was it?

A key to shorten things:

  • CTS = CMSG_TIME_SYNC_RESP
  • CMTS = CMSG_MOVE_TIME_SKIPPED

In packet dumps, the behavior shows up as a lack of CTS packets - the client eventually stops sending that packet.

It seems that CMTS and CTS packets are related:

  • when the client sends CMTS, it is immediately before CTS
  • the time between CTS packets increases as the values in CMTS packets increase

In several dumps, I could see that the last CTS was preceded by a CMTS with a very large value.

What fixed it?

The fix is rather simple - I queue SMSG_TIME_SYNC_REQ packets to the head of the send queue.

This keeps the client synchronized even when there are many queued packets.

CMTS packets have much smaller values now and are less common, and the client seems happier (no freeze yet).

This is probably a hacky fix... I see that WCell uses the CMTS value in all outgoing player movement packets.

It may be nice to prioritize packets that destabilize the client if they aren't sent in a timely manner...
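Conceptually the send-queue change amounts to something like this (a hypothetical sketch, not the literal patch - the opcode constant is a placeholder, not the real value):

```cpp
// Sketch: time-critical packets jump to the head of the per-socket send queue
// instead of waiting behind bulk traffic that is already queued.
#include <cstdint>
#include <deque>

struct WorldPacketStub { std::uint16_t opcode; /* payload omitted */ };

const std::uint16_t OPCODE_TIME_SYNC_REQ = 0xFFFF;   // placeholder value

class SendQueue
{
public:
    void Enqueue(const WorldPacketStub& pkt)
    {
        // SMSG_TIME_SYNC_REQ goes to the head so it is written with the very
        // next async send, even if many packets are already waiting.
        if (pkt.opcode == OPCODE_TIME_SYNC_REQ)
            queue_.push_front(pkt);
        else
            queue_.push_back(pkt);
    }

    bool Dequeue(WorldPacketStub& out)
    {
        if (queue_.empty())
            return false;
        out = queue_.front();
        queue_.pop_front();
        return true;
    }

private:
    std::deque<WorldPacketStub> queue_;
};
```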

Bad news

This makes me think that asynchronous IO performs rather poorly, at least for sending.

Anyway, please let me know what you think.


faramir118, I'll definitely check today if the issue is resolved. Also, don't think that IOCP is slow - it is just highly sensitive to delays in receiving async op notifications. The less data you send, the more your performance might suffer, because each notification has to be queued by the OS and read by your thread (e.g. using the GetQueuedCompletionStatus() function call on Windows).

You have to do two things to improve your code and lower the impact of the described flaws:

1) Read the maximum amount of information from the OS's TCP buffer, not just the header-then-body style you use now. It is harder, since you need to crunch the received data for packets, but you will always keep your TCP stack happy and avoid dropped packets caused by possible buffer overflows.

2) Send the maximum amount of data with one async write request. To achieve this, keep merging your outgoing packets into one single data buffer until you receive the next async-write-complete notification. This way the OS will send as much data as possible, and you will minimize the impact of thread scheduling while waiting for your async op to complete.

These simple (so to speak :) ) things will maximize the server's throughput as well as minimize the number of async read/write requests :) Otherwise you might end up sending/receiving each packet only once every 10-15 ms, which is a killer... A rough sketch of both ideas follows below.
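A rough sketch of both ideas (simplified and hypothetical: the wire format here is just a 2-byte length prefix in host byte order, which is not the real WoW header, and the actual async write call is left out):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 1) Read side: append whatever the OS delivered to one buffer, then peel off as
//    many complete packets as that buffer currently holds.
class ReadBuffer
{
public:
    void Append(const char* data, std::size_t len)
    {
        buffer_.insert(buffer_.end(), data, data + len);
    }

    // Returns true while another complete packet can be extracted.
    bool NextPacket(std::vector<char>& body)
    {
        if (buffer_.size() < 2)
            return false;                               // length prefix incomplete
        std::uint16_t body_len = 0;
        std::memcpy(&body_len, buffer_.data(), 2);
        if (buffer_.size() < 2u + body_len)
            return false;                               // body incomplete
        body.assign(buffer_.begin() + 2, buffer_.begin() + 2 + body_len);
        buffer_.erase(buffer_.begin(), buffer_.begin() + 2 + body_len);
        return true;
    }

private:
    std::vector<char> buffer_;
};

// 2) Write side: while one async write is pending, keep merging new packets into
//    a second buffer; when the completion notification arrives, send the merged
//    buffer as a single write.
class WriteCoalescer
{
public:
    void Queue(const std::vector<char>& packet)
    {
        pending_.insert(pending_.end(), packet.begin(), packet.end());
        if (!write_in_flight_)
            StartWrite();
    }

    // Called from the async-write completion handler.
    void OnWriteComplete()
    {
        write_in_flight_ = false;
        if (!pending_.empty())
            StartWrite();
    }

private:
    void StartWrite()
    {
        in_flight_.swap(pending_);      // everything queued so far goes out at once
        pending_.clear();               // discard the previously sent buffer
        write_in_flight_ = true;
        // issue the real async write for in_flight_ here (IOCP / ACE_Asynch_Write_Stream)
    }

    std::vector<char> pending_;
    std::vector<char> in_flight_;
    bool write_in_flight_ = false;
};
```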

Cheers :)

