LG Pro:Centric TV Fleet Monitoring


Quick summary

Fleet: 600+ TVs across multiple facilities, with near-real-time health and telemetry.
Database: After batching and multi-row inserts, database connections dropped from 192/min to 6/min (about a 97% reduction).
Access: Short-lived JWT access tokens (15 minutes), TOTP MFA for admins, PBKDF2 password hashing (600k iterations).
Monitoring: Zabbix integration via a small JSON metrics endpoint for alerting and trending.
Handoff: Runbooks, endpoint inventory, and upgrade notes via a disciplined changelog.

The context: With hundreds of patient-room TVs spread across facilities, "it seems fine" isn't a useful signal. We needed something that could tell ops what's failing (or drifting) before it turns into a pile of tickets.

This started as a small helper for field diagnostics and grew into a real platform: a .NET 8/C# service with a REST API, live WebSocket streams for dashboards, and an integration path into existing monitoring. The hard part wasn't the UI. It was making the ingestion, storage, and operations boring and predictable at fleet scale.

Chapter 1: The problem space

Hospital TV deployments are weird in the ways that matter: patient impact, tight change windows, and networks that are designed to say "no" by default. Firmware updates must not interrupt patient rooms. Troubleshooting time matters. And some device-side protocols are not modern.

The starting point was reactive: wait for a call, walk someone through a manual check, and guess whether an update actually took. There was no single view of fleet health, and no clean way to push signals into the monitoring stack the team already used.

The goal was simple to state: visibility across the whole fleet (power, firmware, configuration drift, integration health) without weakening the security posture.

Chapter 2: Ingestion at scale

The platform centers on ingestion. TVs report health through a mix of channels, and not all of them are pleasant: vendor interfaces, long-lived connections, and device-specific event formats. The system had to tolerate flaky devices and noisy networks without turning the database into a bottleneck.

On the server side we batch work on purpose: aggregate events, parse them once, and write them efficiently. That keeps per-event overhead low while still feeling "live" in dashboards.
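As a rough illustration of the batching idea, here is a minimal sketch in Python (the production service is .NET/C#, so this is not the actual code). The class name `EventBatcher`, the `max_batch` and `max_age_s` limits, and the `tick` method are all assumptions made for the example. It buffers events and hands them to a single flush callback when either a size or an age threshold is reached.

```python
import time
from typing import Callable, List


class EventBatcher:
    """Buffers events and flushes them in one call on a size or age limit.

    Hypothetical sketch: names and thresholds are illustrative only.
    """

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_batch: int = 500, max_age_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_batch = max_batch
        self._max_age_s = max_age_s
        self._clock = clock
        self._buf: List[dict] = []
        self._first_at = 0.0

    def add(self, event: dict) -> None:
        if not self._buf:
            # Remember when the oldest buffered event arrived.
            self._first_at = self._clock()
        self._buf.append(event)
        if len(self._buf) >= self._max_batch:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once it is old enough."""
        if self._buf and self._clock() - self._first_at >= self._max_age_s:
            self.flush()

    def flush(self) -> None:
        if not self._buf:
            return
        batch, self._buf = self._buf, []
        self._flush(batch)
```

The age limit is what keeps dashboards feeling live: a quiet period still produces a write within a bounded delay, while a busy period produces one write per full batch.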

Device identity had to survive DHCP churn. We key devices off stable hardware attributes (not IP), so a TV can move and we keep its history without manual re-registration.
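A small Python sketch of what IP-independent identity could look like. The specific attributes (MAC address plus serial number), the key format, and the `DeviceRegistry` class are assumptions for illustration; the write-up only says the key comes from stable hardware attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def device_key(mac: str, serial: str) -> str:
    """Normalize hardware attributes into a stable key (assumed: MAC + serial)."""
    mac_norm = mac.replace(":", "").replace("-", "").lower()
    return f"{mac_norm}/{serial.strip().upper()}"


@dataclass
class Device:
    key: str
    current_ip: str
    ip_history: List[str] = field(default_factory=list)


class DeviceRegistry:
    """Hypothetical in-memory registry keyed by hardware identity, not IP."""

    def __init__(self) -> None:
        self._devices: Dict[str, Device] = {}

    def observe(self, mac: str, serial: str, ip: str) -> Device:
        key = device_key(mac, serial)
        dev = self._devices.get(key)
        if dev is None:
            dev = Device(key=key, current_ip=ip)
            self._devices[key] = dev
        elif dev.current_ip != ip:
            # Same hardware, new lease: keep the record and note the move.
            dev.ip_history.append(dev.current_ip)
            dev.current_ip = ip
        return dev
```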

Telemetry is exposed via REST (integration/polling) and WebSockets (dashboards). The WebSocket side uses bounded queues so one slow client can't stall everyone else.
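To make the bounded-queue point concrete, here is a hedged Python sketch. The drop-oldest policy, the `Broadcaster` name, and the `max_pending` parameter are assumptions; the original only states that the queues are bounded. Each client gets its own fixed-size queue, so publishing never waits on a slow reader.

```python
from collections import deque
from typing import Deque, Dict, List


class Broadcaster:
    """Per-client bounded queues; a full queue drops its oldest message.

    Hypothetical sketch: the overflow policy is an assumption.
    """

    def __init__(self, max_pending: int = 100) -> None:
        self._max_pending = max_pending
        self._queues: Dict[str, Deque[str]] = {}
        self.dropped: Dict[str, int] = {}

    def subscribe(self, client_id: str) -> None:
        self._queues[client_id] = deque(maxlen=self._max_pending)
        self.dropped[client_id] = 0

    def publish(self, message: str) -> None:
        # Never blocks: each client only affects its own queue.
        for client_id, queue in self._queues.items():
            if len(queue) == queue.maxlen:
                self.dropped[client_id] += 1  # deque evicts the oldest on append
            queue.append(message)

    def drain(self, client_id: str, limit: int) -> List[str]:
        """Take up to `limit` pending messages for one client's send loop."""
        queue = self._queues[client_id]
        out: List[str] = []
        while queue and len(out) < limit:
            out.append(queue.popleft())
        return out
```

A per-client drop counter like this is also a cheap signal for spotting dashboards that consistently fall behind.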

Chapter 3: Security and access

This lives in healthcare IT, so the security posture is part of the feature set. Auth uses short-lived JWT access tokens (15 minutes) with refresh tokens (7 days) stored in httpOnly cookies, with rotation so a stolen refresh token has a short useful life.
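Refresh-token rotation is easier to see in code than in prose. This Python sketch is illustrative only (the real service is C#): `RefreshStore` and its in-memory dictionary are assumptions, and it covers just the rotation rule, not JWT signing or cookie handling. Each refresh token is single-use, so a replayed token is rejected once the legitimate client has rotated it.

```python
import hashlib
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

ACCESS_TTL_S = 15 * 60          # access token lifetime from the text
REFRESH_TTL_S = 7 * 24 * 3600   # refresh token lifetime from the text


class RefreshStore:
    """Hypothetical single-use refresh token store (in-memory for the sketch)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # sha256(token) -> (user, expires_at); raw tokens are never stored.
        self._tokens: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user: str) -> str:
        token = secrets.token_urlsafe(32)
        self._tokens[self._digest(token)] = (user, self._clock() + REFRESH_TTL_S)
        return token

    def rotate(self, token: str) -> Optional[str]:
        """Consume a refresh token; return a new one, or None if invalid."""
        entry = self._tokens.pop(self._digest(token), None)
        if entry is None:
            return None  # unknown or already used
        user, expires_at = entry
        if self._clock() >= expires_at:
            return None  # expired
        return self.issue(user)
```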

Passwords are hashed with PBKDF2 (600k iterations). Admin accounts can use TOTP MFA, and enforcement can be turned on per role.
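For reference, both primitives fit in a few lines of standard-library Python. This is a sketch under stated assumptions, not the service's code: the write-up does not name the PBKDF2 hash function or salt size, so HMAC-SHA256 and a 16-byte salt are assumed here, and the TOTP parameters (HMAC-SHA1, 30-second step, 6 digits) are the common RFC 6238 defaults rather than confirmed settings.

```python
import hashlib
import hmac
import os
import struct
from typing import Optional, Tuple

ITERATIONS = 600_000  # iteration count from the text


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Assumes PBKDF2-HMAC-SHA256 and a 16-byte salt."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP over HMAC-SHA1 (assumed defaults)."""
    counter = struct.pack(">Q", unix_time // step)
    mac = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation per RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

A real verifier would typically also accept the adjacent time step to tolerate clock skew and reject reuse of a code within its window.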

Session handling sticks to sane defaults: timeouts, secure cookie flags, and brute-force backoff/lockout.
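One way brute-force backoff can work, sketched in Python. The thresholds, the doubling schedule, and the `LoginThrottle` name are assumptions for illustration; the source does not give the actual policy.

```python
import time
from typing import Callable, Dict, Tuple


class LoginThrottle:
    """Tracks failed logins per account and imposes a growing lockout delay.

    Hypothetical sketch: free attempts and delays are illustrative values.
    """

    def __init__(self, free_attempts: int = 3, base_delay_s: float = 1.0,
                 max_delay_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._free = free_attempts
        self._base = base_delay_s
        self._max = max_delay_s
        self._clock = clock
        self._state: Dict[str, Tuple[int, float]] = {}  # user -> (failures, locked_until)

    def allowed(self, user: str) -> bool:
        _, locked_until = self._state.get(user, (0, 0.0))
        return self._clock() >= locked_until

    def record_failure(self, user: str) -> float:
        """Register a failed attempt; return the lockout delay now in effect."""
        failures, _ = self._state.get(user, (0, 0.0))
        failures += 1
        delay = 0.0
        if failures > self._free:
            # Delay doubles with each failure past the free allowance, capped.
            delay = min(self._max, self._base * 2 ** (failures - self._free - 1))
        self._state[user] = (failures, self._clock() + delay)
        return delay

    def record_success(self, user: str) -> None:
        self._state.pop(user, None)
```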

CORS is explicit. Dev stays on localhost, and production requires allowlisting the real dashboard origins.
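The allowlist rule is simple enough to show directly. In this Python sketch the origins are placeholders (the dev port and the `dashboard.example.org` host are hypothetical), and in the real service this would be framework CORS configuration rather than a hand-written function.

```python
from typing import Dict, FrozenSet, Optional

# Placeholder origins for illustration only.
DEV_ORIGINS: FrozenSet[str] = frozenset({"http://localhost:5173"})
PROD_ORIGINS: FrozenSet[str] = frozenset({"https://dashboard.example.org"})


def cors_headers(origin: Optional[str], allowed: FrozenSet[str]) -> Dict[str, str]:
    """Return CORS response headers only for exactly matching origins."""
    if origin is None or origin not in allowed:
        return {}  # no header: the browser blocks the cross-origin read
    return {
        "Access-Control-Allow-Origin": origin,  # echo the match, never "*"
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",
    }
```

Echoing the matched origin instead of a wildcard matters because browsers reject `*` on credentialed requests.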

For service-to-service integrations, API keys are scoped and revocable so we don't have to reuse user credentials.
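A minimal Python sketch of scoped, revocable keys. The scope strings and the `ApiKeyStore` class are hypothetical, and a real store would be persistent; the point is only that a key carries its own scopes and can be revoked without touching any user account.

```python
import hashlib
import secrets
from typing import Dict, FrozenSet, Iterable


class ApiKeyStore:
    """Hypothetical in-memory store; only key digests are retained."""

    def __init__(self) -> None:
        self._keys: Dict[str, FrozenSet[str]] = {}

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def create(self, scopes: Iterable[str]) -> str:
        """Generate a key; the raw value is returned once and not stored."""
        key = secrets.token_urlsafe(32)
        self._keys[self._digest(key)] = frozenset(scopes)
        return key

    def revoke(self, key: str) -> bool:
        return self._keys.pop(self._digest(key), None) is not None

    def authorize(self, key: str, scope: str) -> bool:
        return scope in self._keys.get(self._digest(key), frozenset())
```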

Chapter 4: Making the database keep up

Telemetry at 600+ devices will happily melt a naive database write path. Early on we were opening too many connections and doing too many tiny inserts. At fleet scale that turned into 192 DB connections/min and a lot of unnecessary round trips.

The fixes were straightforward: consolidate connections per flush, batch inserts into multi-row statements, and tune flush intervals so we weren't thrashing the server for no operational gain.
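The multi-row insert pattern, sketched in Python with SQLite standing in for SQL Server so the example runs anywhere. The `telemetry` table, its columns, and the `rows_per_stmt` chunk size are assumptions; real chunk sizes depend on the database's parameter limits.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, str, float]  # (device_key, metric, value): hypothetical schema


def insert_batch(conn: sqlite3.Connection, rows: List[Row],
                 rows_per_stmt: int = 100) -> int:
    """Write rows with multi-row INSERTs in one transaction.

    Returns the number of statements executed.
    """
    statements = 0
    with conn:  # one connection and one transaction per flush
        for i in range(0, len(rows), rows_per_stmt):
            chunk = rows[i:i + rows_per_stmt]
            placeholders = ",".join(["(?, ?, ?)"] * len(chunk))
            params = [value for row in chunk for value in row]
            conn.execute(
                "INSERT INTO telemetry (device_key, metric, value) "
                f"VALUES {placeholders}",
                params,
            )
            statements += 1
    return statements
```

With these assumed numbers, 250 rows become 3 statements on a single connection instead of 250 separate round trips.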

The end result: connections dropped from 192/min to 6/min (97% reduction), command count dropped, and a class of intermittent transaction conflicts stopped showing up in normal operation.

On top of that we added transient failure handling (timeouts, brief connection issues) with bounded retries and backoff, so the system rides out hiccups without hiding persistent problems.
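Bounded retry with backoff, as a hedged Python sketch. `TransientError`, the attempt count, and the delay values are assumptions for the example. Only errors classified as transient are retried, and the last failure is re-raised so persistent problems stay visible.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for timeouts and brief connection failures."""


def with_retries(op: Callable[[], T], max_attempts: int = 4,
                 base_delay_s: float = 0.2, max_delay_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Callable[[], float] = random.random) -> T:
    """Run op, retrying transient failures with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Jitter: sleep between 50% and 100% of the computed delay.
            sleep(delay * (0.5 + rng() / 2))
    raise RuntimeError("max_attempts must be at least 1")
```

Any exception that is not a `TransientError` propagates immediately, which keeps bugs and configuration errors out of the retry loop.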

Chapter 5: Fleet-scale operations

Things that are "fine" at 50 devices become problems at 600. Health checks turn into background load, discovery scans can saturate a segment, and small inefficiencies add up quickly.

We ended up with pragmatic guardrails: cap fanout, keep concurrency sane, and avoid "thundering herd" behavior when something goes sideways.

Where we can, we reuse connections and apply per-device backoff so a network blip doesn't cause the entire fleet to retry at once.
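Both guardrails, capped fanout and per-device backoff, can be sketched in a few lines of Python. The concurrency limit, the delay values, and the names `poll_fleet` and `DeviceBackoff` are assumptions for illustration, not the production design.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, Iterable


async def poll_fleet(devices: Iterable[str],
                     check: Callable[[str], Awaitable[bool]],
                     max_concurrency: int = 20) -> Dict[str, bool]:
    """Run health checks with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)
    results: Dict[str, bool] = {}

    async def one(device: str) -> None:
        async with sem:
            try:
                results[device] = await check(device)
            except Exception:
                results[device] = False  # one bad device must not sink the sweep

    await asyncio.gather(*(one(d) for d in devices))
    return results


class DeviceBackoff:
    """Per-device retry delay that grows on failure, with jitter."""

    def __init__(self, base_s: float = 5.0, max_s: float = 300.0,
                 rng: Callable[[], float] = random.random) -> None:
        self._base = base_s
        self._max = max_s
        self._rng = rng
        self._failures: Dict[str, int] = {}

    def next_delay(self, device: str, ok: bool) -> float:
        if ok:
            self._failures.pop(device, None)
            return self._base
        n = self._failures.get(device, 0) + 1
        self._failures[device] = n
        ceiling = min(self._max, self._base * 2 ** n)
        # Jitter spreads retries so devices do not all come back at once.
        return ceiling / 2 + self._rng() * ceiling / 2
```

Because the failure count is tracked per device, a network blip that hits one segment slows retries only for the devices on it.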

Long-running jobs are hosted services with explicit startup/shutdown so restarts are clean and failures degrade predictably.

Chapter 6: Network reality and monitoring

Hospitals are segmented for good reasons. In practice that meant a TV-management segment for device communication, an IT ops segment for DB/monitoring, and a staff segment for dashboards.

Some TV management protocols are cleartext (a device limitation, not a design choice). Segmentation and firewall allowlists keep that traffic in the right place and out of patient-facing networks.

From a compliance standpoint, we focused on the stuff auditors and operators actually ask for: unique users, timeouts, audit trails for admin actions, and HTTPS for dashboards in production.

For monitoring, we pushed health into Zabbix via a small JSON metrics endpoint (device connectivity, firmware status, alert counts, backup verification). That keeps alerts and trends in the same place as the rest of the hospital infrastructure.

Zabbix integration, kept simple. No agents and no coupling to the UI: just an HTTP-polled JSON endpoint. We also added a backup verifier that publishes signals ops can alert on (age, count, last verification, basic size anomalies).
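To give a feel for the payload, here is a Python sketch of building such a metrics document, including backup signals. Every field name, the `BackupInfo` type, and the size-anomaly rule (latest backup under half the average of earlier ones) are assumptions; the actual endpoint schema is not shown in this write-up.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class BackupInfo:
    taken_at: float   # Unix timestamp
    size_bytes: int


def backup_signals(backups: List[BackupInfo], now: float,
                   min_size_ratio: float = 0.5) -> Dict[str, Any]:
    """Summarize backups into values a monitoring system can alert on."""
    if not backups:
        return {"count": 0, "age_seconds": None, "size_anomaly": False}
    ordered = sorted(backups, key=lambda b: b.taken_at)
    latest, previous = ordered[-1], ordered[:-1]
    anomaly = False
    if previous:
        average = sum(b.size_bytes for b in previous) / len(previous)
        # Assumed rule: flag a latest backup far smaller than its predecessors.
        anomaly = latest.size_bytes < average * min_size_ratio
    return {
        "count": len(ordered),
        "age_seconds": now - latest.taken_at,
        "size_anomaly": anomaly,
    }


def metrics_json(devices_online: int, devices_total: int,
                 firmware_outdated: int, open_alerts: int,
                 backup: Dict[str, Any]) -> str:
    """Flat, stable JSON that an HTTP poller can parse without UI coupling."""
    return json.dumps({
        "devices_online": devices_online,
        "devices_total": devices_total,
        "firmware_outdated": firmware_outdated,
        "open_alerts": open_alerts,
        "backup": backup,
    }, sort_keys=True)
```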

Chapter 7: Offline deployment hardening

One of the most time-consuming parts was not feature work. It was making installs succeed in constrained environments: segmented networks, change windows, machines without internet access, and teams that need repeatable deployments.

The installer was an offline bundle that set up web components and services. It also shipped encrypted environment values, which is good for security but easy to get wrong: some values assumed drive letters, paths, and hostnames that were not true everywhere.

The hardening work turned a "one environment" bundle into something portable:

1) Remove fixed paths. Introduced an install root and created required directories automatically. Where legacy paths leaked through, post-deploy repair steps rewrote config safely.

2) Detect host identity. Added automatic host/IP detection (with an override) so we did not have to ship a different bundle per site.

3) Add a safe override mechanism. Used a small local JSON overlay for non-secret overrides (paths, URLs, toggles) without breaking encrypted config handling.

4) Make the web server setup predictable. Added elevation checks, fixed physical path mismatches, and repaired ACLs so static content worked reliably.

5) Add a repair mode. A single "repair" run re-applies the critical fixes after upgrades without redoing the whole install.

6) Handle the real failure modes. Port conflicts, partial IIS config failures, missing directories, and ACL problems now fail fast with clear output (and are repairable).
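Steps 2 and 3 above lend themselves to a short sketch. This Python version is illustrative only (the installer itself is not shown here): the `protected` key set, the config shape, and the function names are assumptions. The overlay is merged recursively over the base config and refuses to touch keys reserved for encrypted values, and host detection falls back to the machine's hostname unless an override is supplied.

```python
import socket
from typing import Any, Dict, FrozenSet, Optional


def merge_overlay(base: Dict[str, Any], overlay: Dict[str, Any],
                  protected: FrozenSet[str] = frozenset()) -> Dict[str, Any]:
    """Return a copy of base with overlay applied recursively.

    Keys listed in `protected` (assumed to hold secrets) cannot be overridden.
    """
    merged = dict(base)
    for key, value in overlay.items():
        if key in protected:
            raise ValueError(f"overlay may not set protected key: {key}")
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overlay(merged[key], value, protected)
        else:
            merged[key] = value
    return merged


def resolve_host(override: Optional[str] = None) -> str:
    """Use the explicit override when given, else the detected hostname."""
    return override or socket.gethostname()
```

Rejecting protected keys loudly, rather than silently ignoring them, matches the fail-fast behavior described in step 6.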

Chapter 8: Documentation and handoff

For systems that live in hospital IT, "done" includes operations. This shipped with documentation aimed at the people who run and troubleshoot it day-to-day.

What we documented: endpoint inventory for integrations, troubleshooting for common field failures (ports, connectivity, persistence), scaling notes, and network/security deployment guidance for segmented environments.

Changelog discipline. Clear "Added/Changed/Fixed" entries so operations can reason about upgrades (telemetry volume, auth behavior, DB load) before rolling changes through a fleet.

Next: tighten the hot paths (allocations and broadcast batching), add more per-device counters for operations, and keep trimming operational friction as fleets and integrations grow.

Impact

This moved TV operations out of purely reactive mode. Staff can see fleet health in one place, catch update failures earlier, and track integration issues without starting every session from scratch. The security/performance work made it practical at 600+ devices, and the deployment hardening made it maintainable on real hospital networks.

Tags: .NET 8, C#, ASP.NET Core, SQL Server, REST API, WebSocket Streaming, JWT Authentication, TOTP MFA, Zabbix Integration, LG Pro:Centric, Chrome DevTools Protocol, HIPAA Compliance, Healthcare IT, Real-time Telemetry