On a server where user authentication is handled by a Windows Active Directory (AD), I saw the following errors when a user tried to log in via SSH:
sshd: pam_winbind(sshd:account): valid_user: wbcGetpwnam gave WBC_ERR_DOMAIN_NOT_FOUND
A test of the current winbind settings with wbinfo -t showed that there was indeed a problem:
checking the trust secret for domain EXAMPLE via RPC calls failed
error code was NT_STATUS_DOMAIN_CONTROLLER_NOT_FOUND (0xc0000233)
Failed to call wbcCheckTrustCredentials: WBC_ERR_AUTH_ERROR
Could not check secret
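The same check can be wrapped in a small script, for example for a cron job or a monitoring probe. This is a minimal sketch under a few assumptions: the function name `trust_ok` is hypothetical, the match relies on the English `wbinfo -t` message shown above (which may differ between Samba versions), and it assumes that `wbinfo -t` also returns a non-zero exit code on failure, so both signals are checked.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given wbinfo -t output reports a
# successful trust check. Assumes the English message format shown above.
trust_ok() {
    printf '%s\n' "$1" | grep -q 'via RPC calls succeeded'
}

# Only query winbind when wbinfo is actually installed.
if command -v wbinfo >/dev/null 2>&1; then
    # Capture stdout and stderr, since error details may go to stderr.
    out=$(wbinfo -t 2>&1)
    status=$?
    if [ "$status" -eq 0 ] && trust_ok "$out"; then
        echo "winbind trust OK"
    else
        echo "winbind trust BROKEN:" >&2
        printf '%s\n' "$out" >&2
    fi
fi
```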
I tried to join the machine to the domain again, but it failed:
net ads join -U 'EXAMPLE\aduser'
Failed to join domain: failed to lookup DC info for domain 'EXAMPLE.COM' over rpc: NT_STATUS_CONNECTION_RESET
However, net ads info showed the correct information:
net ads info
LDAP server: 192.168.40.10
LDAP server name: DC001.example.com
Bind Path: dc=EXAMPLE,dc=COM
LDAP port: 389
KDC server: 192.168.40.10
Server time offset: 0
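Because net ads info succeeded while the RPC-based join failed, its output is a handy way to see which domain controller the client is talking to before digging further. A minimal sketch that extracts a single field, assuming the `Key: value` format shown above; the function name `ads_field` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from `net ads info`
# output read on stdin, e.g. ads_field "LDAP server name".
# Assumes lines of the form "Key: value" as shown above.
ads_field() {
    awk -v key="$1" '
        {
            # Split only at the first ": " so values containing colons survive.
            i = index($0, ": ")
            if (i > 0 && substr($0, 1, i - 1) == key) {
                print substr($0, i + 2)
                exit
            }
        }'
}

# Example: show the DC name reported by net ads info (requires Samba tools).
if command -v net >/dev/null 2>&1; then
    net ads info 2>/dev/null | ads_field "LDAP server name"
fi
```

The key must match exactly, so `ads_field "LDAP server"` returns the IP address while `ads_field "LDAP server name"` returns the host name.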
After a lot of googling, and after running winbindd manually with a high debug level, I finally came across a blog post describing similar problems, which had been solved by deleting the computer account on the primary domain controller (PDC).
First I stopped the winbind daemon and verified that all processes were gone:
ps aux | grep winbind
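Note that `ps aux | grep winbind` always lists at least the grep process itself. A sketch of a stricter check, which polls with `pgrep -x` (exact process-name match) until no `winbindd` process remains or a timeout expires. The function name `wait_gone` and the 30-second timeout are assumptions, and the command that stops winbind depends on the distribution, so it is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: wait until no process named exactly $1 is running.
# Returns 0 once all are gone, 1 if some are still running after $2 seconds.
wait_gone() {
    name=$1
    timeout=${2:-30}
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$timeout" -le 0 ]; then
            echo "$name still running:" >&2
            pgrep -l -x "$name" >&2
            return 1
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}

# After stopping winbind, make sure every winbindd process has exited.
wait_gone winbindd 30 && echo "winbindd stopped"
```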
Then I left the domain:
net ads leave -U aduser
Deleted account for 'LINUXSERVER' in realm 'EXAMPLE.COM'
I verified on the domain controller that the computer account had really disappeared. Then I created a backup of /var/lib/samba and deleted all *.tdb files:
cp -Rp /var/lib/samba /root/samba-tdb-bkp-$(date +%Y%m%d)
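The backup and the cleanup can be combined into one step. This is a minimal sketch, assuming winbind (and smbd/nmbd, if running) are already stopped. It deletes `*.tdb` files recursively, which includes subdirectories such as `private/`, where many distributions keep `secrets.tdb`; add `-maxdepth 1` to the `find` call if only the top-level files should go. The function name `reset_tdb` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: back up a Samba state directory, then delete its
# *.tdb databases so that a fresh domain join recreates them.
# Usage: reset_tdb [SRC] [BACKUP_DEST]
reset_tdb() {
    src=${1:-/var/lib/samba}
    dest=${2:-/root/samba-tdb-bkp-$(date +%Y%m%d)}

    # Refuse to reuse an existing target (cp -R would nest the copy inside it).
    if [ -e "$dest" ]; then
        echo "backup target $dest already exists" >&2
        return 1
    fi

    # Keep permissions and timestamps, as with the cp -Rp command above.
    cp -Rp "$src" "$dest" || return 1

    # Remove every *.tdb file below $src (recursively, see note above).
    find "$src" -type f -name '*.tdb' -delete
}
```

Called without arguments, it uses /var/lib/samba and the dated backup path from the cp command above; `reset_tdb /var/lib/samba /root/my-backup` overrides both.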
Now I joined the domain again:
net ads join -U aduser
Using short domain name -- EXAMPLE
Joined 'LINUXSERVER' to dns domain 'example.com'
This took a while (around 1 to 2 minutes); once it was done, new tdb files had appeared in /var/lib/samba/.
The computer "LINUXSERVER" could now be found on the PDC again, in the default "Computers" folder.
Time to start winbind again:
... and to verify with wbinfo -t that communication with the AD works again:
checking the trust secret for domain EXAMPLE via RPC calls succeeded
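If the check runs immediately after winbind is started, the first `wbinfo -t` may run before winbindd has reached the domain controller, so a script might retry instead of failing on the first attempt. A sketch of a generic retry loop; the function name `retry`, the 10 attempts and the 3-second delay are assumptions, not values from the setup above.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if all attempts fail.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Check the trust secret up to 10 times, 3 seconds apart (assumed values).
if command -v wbinfo >/dev/null 2>&1; then
    retry 10 3 wbinfo -t || echo "trust check still failing" >&2
fi
```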
From then on, SSH logins worked again.