Why does this happen?

The following test cases explore how screen readers handle span vs. div content, and inline vs. block CSS. I tested with VoiceOver; the result is shown below each test case.

Test 1: Span followed by a div

    <div>
      <span>Span Text</span>
      <div>Div Text</div>
    </div>
    
Span Text
Div Text

Result: read as two paragraphs - "Span Text (pause) Div Text"

Test 2: Span inside the div

    <div>
      <div>
        <span>Span Text</span>
        Div Text
      </div>
    </div>
Span Text Div Text

Result: read as one paragraph - "Span Text Div Text"

Test 3: CSS Inline

    <div>
      <span>Span Text</span>
      <div style="display:inline">Div Text</div>
    </div>
Span Text Div Text

Result: read as one paragraph - "Span Text Div Text"

Why?

I think I understand why tests 1 and 2 behave the way they do. A div is treated as a 'group' element, while a span is not. Before the screen reader reads a new group element, it knows to pause (for example, before starting a new paragraph). You can also see this grouping in the Accessibility Inspector on macOS.

Test 3, however, surprised me a little. CSS can have an effect on screen reader output, but I haven't seen anything in the standards about the CSS 'display' property.
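
To make the hypothesis concrete, here is a toy model in Python that reproduces all three results above. It is only a sketch of my guess, not how VoiceOver actually works: the names `Node`, `BLOCK_TAGS`, `is_block` and `announce` are made up for illustration, and the rule "a block-level box starts and ends a group, and an explicit 'display' value overrides the tag's default" is an assumption based purely on the three observations.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Assumption: tags that form a 'group' by default in this toy model.
BLOCK_TAGS = {"div", "p"}


@dataclass
class Node:
    tag: str
    children: List[Union["Node", str]] = field(default_factory=list)
    display: str = ""  # optional CSS 'display' override, e.g. "inline"


def is_block(node: Node) -> bool:
    # Assumption: an explicit 'display' value wins over the tag's default.
    if node.display:
        return node.display == "block"
    return node.tag in BLOCK_TAGS


def announce(root: Node) -> str:
    """Join text runs, putting '(pause)' wherever a group boundary separates them."""
    runs: List[List[str]] = [[]]

    def walk(node: Union[Node, str]) -> None:
        if isinstance(node, str):
            if node.strip():
                runs[-1].append(node.strip())
            return
        block = is_block(node)
        if block and runs[-1]:
            runs.append([])  # entering a group ends the current run
        for child in node.children:
            walk(child)
        if block and runs[-1]:
            runs.append([])  # leaving a group ends the current run

    walk(root)
    return " (pause) ".join(" ".join(run) for run in runs if run)


# The three test cases from above, expressed as trees.
test1 = Node("div", [Node("span", ["Span Text"]), Node("div", ["Div Text"])])
test2 = Node("div", [Node("div", [Node("span", ["Span Text"]), "Div Text"])])
test3 = Node("div", [Node("span", ["Span Text"]),
                     Node("div", ["Div Text"], display="inline")])

for case in (test1, test2, test3):
    print(announce(case))
```

Under these assumptions the model pauses only in test 1, which matches what I heard. That is consistent with the screen reader working from the rendered (CSS) boxes rather than from the tag names alone, but it does not prove it.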

This leads me to two questions:

  1. Is there a standard that describes when a screen reader should pause before continuing to another element?
  2. Is there a standard that describes the effect that CSS's 'display' property should have on screen reader output?