Skip to content
GitLab
    • Explore Projects Groups Snippets
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • Q quickfix
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 48
    • Issues 48
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 25
    • Merge requests 25
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • quickfixgo
  • quickfix
  • Merge requests
  • !329
An error occurred while fetching the assigned milestone of the selected merge_request.

Change EventTimer to reset its Timer without requiring a reset channel

  • Review changes

  • Download
  • Email patches
  • Plain diff
Administrator requested to merge github/fork/johndydo/master into master 7 years ago
  • Overview 2
  • Commits 1
  • Pipelines 0
  • Changes 1

Created by: johndydo

Sending multiple messages in rapid succession (or concurrently) causes deadlock.

While building an Acceptor implementation, I experienced the symptoms of a deadlock when the server happened to receive two requests in quick succession. The messages would be processed on the same goroutine as the FromApp callback and a response sent back using SendToTarget. The session thread got stuck in the second call to SendToTarget.

To debug the issue, I created an Initiator (which runs in a separate process on the same machine) that generates many requests:

for {
	time.Sleep(time.Second * 10)

	for i := 0; i < 10; i++ {
		go func() {
			log.Println("before send")
			must(quickfix.SendToTarget(NewRequestMessage(), sessionId))
			log.Println("after send")
		}()
	}
}

Surprisingly the Initiator would also deadlock quickly and stop printing "after send".

The deadlock occurs between the session thread and the EventTimer thread (for the stateTimer heartbeat timer).

Here is a diagram of the two threads:

     Session          EventTimer stateTimer
     ------          ----------------------
t=0  select          select
t=1	             case Timer fires
	             EventTimer.f()
	             blocking send to unbuffered channel sessionEvent
	             --- waiting for Session to synchronously receive from sessionEvent ---
	             --- never gets to select case which would read reset channel ---
t=2  case messageEvent
     SendAppMessages() with at least 2 messages
t=3  send msg #1 to socket
t=4  stateTimer.Reset(s.HeartBtInt)
     buffered send to Timer's reset channel (fills the buffer, but does not block)
t=5  send msg #2 to socket
t=6  stateTimer.Reset(s.HeartBtInt)
     buffered send to Timer's reset channel (blocks since the channel buffer is full)
     --- send is waiting for Timer's reset channel to not be full ---
     --- never gets to the select case which reads sessionEvent ---

In summary, the session thread's select happens to enter the case for messageEvent, but there is a race with the timer thread which enters the case for executing the timer action.

The timer's action is to send to an unbuffered channel that can only get read when the session thread is in a different select case than it is currently in.

If there happens to be more than 1 message to send, then the session thread will call EventTimer.Reset more than once, which will guarantee that the reset channel will become full and now these threads are deadlocked.

The deadlock won't happen if EventTimer.Reset simply resets the timer in a nonblocking fashion.

Past commit https://github.com/quickfixgo/quickfix/commit/b75fbeaf62d9e9b242c0456681e6f5b0c887380b#diff-8dc0bfb75b5413b6f4ed2d880e648ad9 moved from using time.After to the current implementation.

This PR changes the EventTimer to instead use time.Timer directly, without the need for a reset channel, as any thread can call Reset on it to a future time, without the possibility of getting blocked.

Loading
Loading

Activity


  • Administrator mentioned in issue #287 (closed) 7 years ago

    mentioned in issue #287 (closed)

  • Administrator
    Administrator @root · 7 years ago
    Author Contributor

    Created by: cbusbey

    :thumbsup: :thumbsup: :thumbsup:

    Thanks for digging into this, @johndydo !

  • Administrator
    Administrator @root · 7 years ago
    Author Contributor

    Merged by: cbusbey at 2018-03-23 11:33:04 UTC

  • Administrator closed 7 years ago

    closed

  • Administrator added 1 deleted label 7 years ago

    added 1 deleted label

Please register or sign in to reply
0 Assignees
None
Assign to
0 Reviewers
None
Request review from
Labels
0
None
0
None
    Assign labels
  • Manage project labels

Milestone
No milestone
None
None
Time tracking
No estimate or time spent
Lock merge request
Unlocked
0
0 participants
Reference:
Source branch: github/fork/johndydo/master

Menu

Explore Projects Groups Snippets