Today I spent more time working on an assignment for MIT 6.824: Distributed Systems. I haven’t done much coding using threads in the past, certainly nothing very complex, so there were some new ideas to grasp.

I tried to use the Go race detector and it helped me to identify a number of obvious issues that I hadn’t really thought about.

I had something like this:

func (s *S) Inner() {
	s.mu.Lock()
	defer s.mu.Unlock()

	// do something to s
	return
}

func (s *S) Outer() {
	s.mu.Lock()
	defer s.mu.Unlock()

	// do something to s

	// Inner runs in its own goroutine and blocks on the lock until Outer returns
	go s.Inner()

	return
}

func main() {
	s := new(S)
	s.Outer()
	time.Sleep(10 * time.Second)
	fmt.Println(s.outer, s.inner)
}

I was confused when the race detector reported 2 race conditions. My understanding was that Outer, holding the lock, would execute first, and that Inner would acquire the lock and execute once Outer had released it.

I had put a sleep of a few seconds before the print statement, by which time I expected execution to be complete.

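But I forgot that s.outer and s.inner are read in the print statement without holding the lock, and reading the output of the race detector carefully I realised that these reads were causing the race conditions it picked up. A sleep does not synchronise goroutines, so the race detector still treats the read in main and the write in Inner as unordered.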

Thinking about this made me question what happens as a result of the return statement in Outer. Once Outer has returned to main, would main terminate before Inner finished if it did not sleep for 10 seconds? And I know you shouldn’t rely merely on waiting to ensure all threads have executed.

I confirmed this was the case by omitting time.Sleep and adding a sleep statement inside Inner to extend its execution time. As a result Inner never completed.

That made me think the real reason for the race condition was how I was dealing with Inner. My understanding is that defer ensures the lock is released when the function returns, so Inner cannot acquire the lock until Outer has returned, and after that nothing makes main wait for it. So you can’t be sure Inner has returned when fmt.Println accesses s. When I got rid of Inner, no race conditions were detected.
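As an aside, one way to make the reads in main safe on their own is to take the same lock around them. This is only a sketch: the int fields and the Snapshot helper are assumptions for illustration, not part of my original code. It removes the data race on the reads, but it still does not tell you whether Inner has run yet.

```go
package main

import (
	"fmt"
	"sync"
)

// S is a hypothetical stand-in for the struct in this post;
// the int fields are assumed purely for illustration.
type S struct {
	mu    sync.Mutex
	outer int
	inner int
}

func (s *S) Inner() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inner++
}

func (s *S) Outer() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.outer++
	go s.Inner() // blocks on the lock until Outer returns
}

// Snapshot (a hypothetical helper) copies both fields while holding the lock.
func (s *S) Snapshot() (outer, inner int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.outer, s.inner
}

func main() {
	s := new(S)
	s.Outer()
	outer, inner := s.Snapshot()
	// outer is 1; inner is 0 or 1 depending on whether Inner has run yet.
	fmt.Println(outer, inner)
}
```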

Some of these matters were covered in 6.824 so I reviewed some code examples from that. Then I changed it to look like this:

func (s *S) Inner() {
	s.mu.Lock()
	defer s.mu.Unlock()

	// do something to s
	return
}

func (s *S) Outer() {
	s.mu.Lock()

	// do something to s

	// release the lock before waiting so that Inner can acquire it
	s.mu.Unlock()

	var done sync.WaitGroup
	done.Add(1)

	go func() {
		s.Inner()
		done.Done()
	}()
	done.Wait()
	return
}

func main() {
	s := new(S)
	s.Outer()
	time.Sleep(10 * time.Second)
	fmt.Println(s.outer, s.inner)
}

Now, as I understand it, the WaitGroup ensures that done.Wait() does not return until done.Done() has been called. I also found that the defer in Outer could no longer be used and I had to call Unlock explicitly before waiting. I think that otherwise the deferred Unlock would only run when Outer returns, which is after done.Wait(), and done.Wait() depends on Inner completing, so Inner would wait forever to acquire the lock.
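If you would rather keep using defer, one possible variation is to wrap the locked part of Outer in its own function literal, so the deferred Unlock runs when that literal returns, which is before done.Wait(). This is a sketch under assumptions: the S type with int counters is invented here so the program is complete and runnable.

```go
package main

import (
	"fmt"
	"sync"
)

// S is a hypothetical stand-in for the struct in this post;
// the int fields are assumed purely for illustration.
type S struct {
	mu    sync.Mutex
	outer int
	inner int
}

func (s *S) Inner() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inner++
}

func (s *S) Outer() {
	// The critical section lives in a function literal, so the deferred
	// Unlock runs as soon as the literal returns, not when Outer returns.
	func() {
		s.mu.Lock()
		defer s.mu.Unlock()
		s.outer++
	}()

	var done sync.WaitGroup
	done.Add(1)
	go func() {
		defer done.Done() // signal completion even if Inner panics
		s.Inner()
	}()
	done.Wait() // the lock is already free, so Inner can proceed
}

func main() {
	s := new(S)
	s.Outer()
	// Wait has returned, so Inner's write is ordered before these reads.
	fmt.Println(s.outer, s.inner) // prints "1 1"
}
```

Because Outer only returns after done.Wait(), the reads in main are ordered after Inner's write, so the sleep in main is not needed in this version.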

I am still not totally sure this is right but at least the race detector is no longer throwing warnings.