Move input coroutine management variables to TLS#11576
Open
runderwo wants to merge 4 commits intofluent:masterfrom
Open
Move input coroutine management variables to TLS#11576runderwo wants to merge 4 commits intofluent:masterfrom
runderwo wants to merge 4 commits intofluent:masterfrom
Conversation
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In c98a68e, coroutine state was moved to thread-local storage. However, the variables introduced to manage thread-local coroutine state were not themselves thread-local, which is a problem because
flb_input_coro_collect()is not guarded against re-entrancy. This results in a race when multiple threads wake up from polling and run, because they then both read (e.g. ininput_pre_cb_collect()) and overwrite (e.g. inflb_input_coro_collect()) the TLS key used to look up the stored coroutine state (libco_in_param_key), without synchronization.There are almost certainly additional bugs here but this at least mitigates the race condition. The error handling logic in
input_pre_cb_collect()was removed and replaced with an assert because the recovery path made no sense, and it's not clear there is in fact a sensible recovery path given the code structure.Most likely this is related to #11488 and may be the root cause.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit