Copilot AI commented Jan 29, 2026

Description

Chrony's default stratumweight of 0.001 s makes stratum relevant only when synchronization distances differ by more than 1 ms. On local networks where both servers have sub-millisecond distances, chrony selects based on sampling order rather than stratum level, causing the ntp/client_stratum_selection test to fail intermittently.

Change: Set stratumweight 0.0 in /etc/chrony/chrony.conf to make stratum the primary selection criterion.

# Prioritize stratum over distance in source selection.
# Default is 0.001, which means stratum only matters when distances differ
# by >1ms. Setting to 0.0 ensures lower stratum sources are always preferred.
stratumweight 0.0

This ensures clients always prefer authoritative (low-stratum) sources regardless of network distance variations.
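
To make the role of stratumweight concrete, here is a minimal sketch, assuming the simplified model from the chrony.conf documentation in which the weight is a distance added per stratum level. The names `effective_distance`, `rank` and `SOURCES` are hypothetical, and using the "+/-" figures from the failing `show ntp source` output as stand-in distances is an assumption, not how chronyd derives its own values.

```python
def effective_distance(distance_s: float, stratum: int, stratumweight_s: float) -> float:
    """Simplified ranking metric: distance plus a per-stratum penalty (assumed model)."""
    return distance_s + stratum * stratumweight_s


# Stand-in distances in seconds and strata, taken from the failing output in the issue.
SOURCES = {
    "192.168.1.1": (0.002930, 1),
    "192.168.1.2": (0.003147, 2),
}


def rank(stratumweight_s: float) -> list:
    """Return (metric, address) pairs sorted from lowest to highest metric."""
    return sorted(
        (effective_distance(dist, stratum, stratumweight_s), addr)
        for addr, (dist, stratum) in SOURCES.items()
    )


if __name__ == "__main__":
    # Compare the default weight with the value proposed in this PR.
    for weight in (0.001, 0.0):
        print(f"stratumweight {weight}:")
        for metric, addr in rank(weight):
            print(f"  {addr}: {metric * 1000:.3f} ms")
```

This only illustrates the arithmetic of the stratum term. chronyd's real selection has further inputs, such as the reselectdist setting mentioned later in this conversation, that are not modelled here, so the effect of any given value is best confirmed on the test rig.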

Checklist

Tick relevant boxes, this PR is-a or has-a:

  • Bugfix
    • Regression tests
    • ChangeLog updates (for next release)
  • Feature
    • YANG model change => revision updated?
    • Regression tests added?
    • ChangeLog updates (for next release)
    • Documentation added?
  • Test changes
    • Checked in changed Readme.adoc (make test-spec)
    • Added new test to group Readme.adoc and yaml file
  • Code style update (formatting, renaming)
  • Refactoring (please detail in commit messages)
  • Build related changes
  • Documentation content changes
    • ChangeLog updated (for major changes)
  • Other (please describe):
Original prompt

This section details the original issue you should resolve

<issue_title>ntp: client_stratum_selection: Fails to select lowest stratum sometimes</issue_title>
<issue_description>### Current Behavior

TOPOLOGY=/etc/infamy-styx.dot  TEST_MODE=host make test-sh

export PYTHONHASHSEED=3124318344
while [ $? -eq 0 ]; do ./9pm/9pm.py -a ./case/ntp/client_stratum_selection/test.py  ; done

2026-01-21 08:26:04 7..7

o Execution
`-- o 0001 test.py
9PM - Simplicity is the ultimate sophistication (0cea38dc1e)

Testing Infix (3191306a5af6)

Starting test 0001 test.py (../case/ntp/client_stratum_selection/test.py)
2026-01-21 08:26:04 # Starting (2026-01-21 08:26:04)
2026-01-21 08:26:04 # host:    monza
2026-01-21 08:26:04 #     mgmt1:   sw1p4
2026-01-21 08:26:04 #     srv1:    sw1p8
2026-01-21 08:26:04 #     mgmt2:   sw1p2
2026-01-21 08:26:04 # srv2:    styx4
2026-01-21 08:26:04 #     mgmt:    e28
2026-01-21 08:26:04 #     swp1:    e27
2026-01-21 08:26:04 #     swp2:    e17
2026-01-21 08:26:04 # client:  styx2
2026-01-21 08:26:04 #     eth0:    e17
2026-01-21 08:26:04 #     mgmt:    e28
2026-01-21 08:26:04 #
2026-01-21 08:26:04 # Waiting for DUTs to become reachable...
2026-01-21 08:26:04 # Probing styx4 on port sw1p4 for IPv6LL mgmt address ...
2026-01-21 08:26:04 # Testing using RESTCONF
2026-01-21 08:26:05 # YANG models downloaded.
2026-01-21 08:26:07 # Waiting for DUTs to become reachable...
2026-01-21 08:26:08 # Probing styx2 on port sw1p2 for IPv6LL mgmt address ...
2026-01-21 08:26:08 # Testing using RESTCONF
2026-01-21 08:26:08 # YANG models downloaded.
2026-01-21 08:26:10 ok 1 - Set up topology and attach to devices
2026-01-21 08:26:12 ok 2 - Configure srv2 to sync from srv1 and serve with higher stratum
2026-01-21 08:26:17 ok 3 - Wait for srv2 to sync from srv1
2026-01-21 08:26:19 ok 4 - Configure client to sync from both servers
2026-01-21 08:26:19 ok 5 - Wait for client to see both servers
2026-01-21 08:26:25 # srv1 and srv2 stratums verified as different
2026-01-21 08:26:25 ok 6 - Wait for srv2 stratum to stabilize

2026-01-21 08:29:06 # DEBUG: Failed to select srv1. Source details:
2026-01-21 08:29:06 #   192.168.1.1: stratum=1, state=candidate, poll=6, offset=None
2026-01-21 08:29:06 #   192.168.1.2: stratum=2, state=selected, poll=6, offset=None
2026-01-21 08:29:06 not ok 7 - Verify client selects srv1 (lower stratum)
2026-01-21 08:29:06 # Exiting (2026-01-21 08:29:06)
2026-01-21 08:29:06 # Traceback (most recent call last):
2026-01-21 08:29:06 #   File "/home/lazzer/infix/test/./case/ntp/client_stratum_selection/test.py", line 161, in <module>
2026-01-21 08:29:06 #     selected = until(srv1_selected, attempts=120)
2026-01-21 08:29:06 #                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-01-21 08:29:06 #   File "/home/lazzer/infix/test/infamy/util.py", line 48, in until
2026-01-21 08:29:06 #     raise Exception("Expected condition did not materialize")
2026-01-21 08:29:06 # Exception: Expected condition did not materialize
2026-01-21 08:29:06 #
test error, no plan
Aborting execution

x Execution

When failing:

admin@client:/> show ntp source 
MS Name/IP address Stratum Poll Reach LastRx              Last sample
^+ 192.168.1.1           1    6   077    122        +27us +/- 2.930ms
^* 192.168.1.2           2    6   177     57        -13us +/- 3.147ms
admin@client:/> 

When succeeding:

admin@client:/> show ntp source 
MS Name/IP address Stratum Poll Reach LastRx              Last sample
^* 192.168.1.1           1    6   007      3        +18us +/- 2.930ms
^+ 192.168.1.2           2    6   017      1         -9us +/- 3.123ms

So it really is selecting the higher stratum. Even if I wait a while and check again, it has still not changed.

Expected Behavior

Lowest stratum selected

Steps To Reproduce

No response

Additional information

No response</issue_description>
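
The two `show ntp source` listings in the issue can be compared mechanically. Below is a minimal sketch, assuming the chronyc-style column layout shown there (a two-character mode/state marker, then address, then stratum) and that `*` marks the currently selected source. The helper names `parse_sources` and `lowest_stratum_selected` are hypothetical and are not part of Infamy.

```python
from typing import NamedTuple


class Source(NamedTuple):
    address: str
    stratum: int
    selected: bool


FAILING = """\
MS Name/IP address Stratum Poll Reach LastRx              Last sample
^+ 192.168.1.1           1    6   077    122        +27us +/- 2.930ms
^* 192.168.1.2           2    6   177     57        -13us +/- 3.147ms
"""

SUCCEEDING = """\
MS Name/IP address Stratum Poll Reach LastRx              Last sample
^* 192.168.1.1           1    6   007      3        +18us +/- 2.930ms
^+ 192.168.1.2           2    6   017      1         -9us +/- 3.123ms
"""


def parse_sources(text: str) -> list[Source]:
    """Extract address, stratum and selection state from source rows."""
    sources = []
    for line in text.splitlines():
        fields = line.split()
        # Source rows start with a two-character marker such as "^*";
        # the header and shell prompt lines do not and are skipped.
        if len(fields) < 3 or len(fields[0]) != 2 or fields[0][0] not in "^=#":
            continue
        if not fields[2].isdigit():
            continue
        sources.append(Source(fields[1], int(fields[2]), fields[0][1] == "*"))
    return sources


def lowest_stratum_selected(sources: list[Source]) -> bool:
    """True if the selected source has the lowest stratum of all listed sources."""
    selected = [s for s in sources if s.selected]
    if not selected:
        return False
    return selected[0].stratum == min(s.stratum for s in sources)


if __name__ == "__main__":
    for label, text in (("failing", FAILING), ("succeeding", SUCCEEDING)):
        print(label, lowest_stratum_selected(parse_sources(text)))
```

A check like this makes it easy to log which source was selected on each iteration of the reproduction loop, rather than only the final pass or fail.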

<agent_instructions>Please have a look at this one, it's quite possible you need to research the NTP server topic to get to the bottom of why the stratum selection test fails sometimes. It could also be a race-condition, or issue with too short runtime, or an unknown other factor that we've not understood about NTP that could affect the server selection process.

Background: Infix OS is modeled entirely with YANG using sysrepo, and the test system (Infamy) randomly selects the RESTCONF or NETCONF backend for a test case when talking to the DUTs. Each test has a test description derived from the test case's docstring, and Infamy allocates physical DUTs according to the topology.dot required by each test. In the case of this particular test, one NTP server is emulated by the PC system; it is also the server that should be selected (lowest stratum).</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Co-authored-by: troglobit <183517+troglobit@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix client stratum selection not picking lowest stratum" to "Fix NTP client stratum selection via stratumweight directive" on Jan 29, 2026
Copilot AI requested a review from troglobit January 29, 2026 20:21
@troglobit troglobit marked this pull request as ready for review January 29, 2026 20:22
@troglobit
Contributor

@mattiaswal What do you think about this approach? I had no idea stratumweight and reselectdist even existed. An alternative approach could be to expose these settings in YANG instead.

Link to chrony.conf documentation

@mattiaswal
Contributor

No, I had no idea. I thought stratum was the thing. The only thing.

Development

Successfully merging this pull request may close these issues.

ntp: client_stratum_selection: Fails to select lowest stratum sometimes
