
Commit 9ebff84

Added a lot of documentation and tweaked some links
1 parent e2866cc commit 9ebff84

22 files changed: +403 −227 lines

README.md

Lines changed: 2 additions & 40 deletions
@@ -1,41 +1,3 @@
-# Website
+# SMOCS Documentation Website

-This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
-
-## Installation
-
-```bash
-yarn
-```
-
-## Local Development
-
-```bash
-yarn start
-```
-
-This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
-
-## Build
-
-```bash
-yarn build
-```
-
-This command generates static content into the `build` directory and can be served using any static contents hosting service.
-
-## Deployment
-
-Using SSH:
-
-```bash
-USE_SSH=true yarn deploy
-```
-
-Not using SSH:
-
-```bash
-GIT_USER=<Your GitHub username> yarn deploy
-```
-
-If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
+This website is designed to be a wiki for the SMOCS repo and is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.

blog/2019-05-28-first-blog-post.md

Lines changed: 0 additions & 12 deletions
This file was deleted.

blog/2019-05-29-long-blog-post.md

Lines changed: 0 additions & 44 deletions
This file was deleted.

blog/2021-08-01-mdx-blog-post.mdx

Lines changed: 0 additions & 24 deletions
This file was deleted.
-93.9 KB
Binary file not shown.

blog/2021-08-26-welcome/index.md

Lines changed: 0 additions & 29 deletions
This file was deleted.

blog/authors.yml

Lines changed: 0 additions & 25 deletions
This file was deleted.

blog/tags.yml

Lines changed: 0 additions & 19 deletions
This file was deleted.
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
## Agent Architecture: Multi-Component Systems

### Three-Thread Agent Design

SMOCS agents represent the most complex architectural pattern, orchestrating three specialized threads that work in concert:

**Data Ingest Thread** (`DataIngestThreadBase` → `KafkaConsumerBase`):
- Consumes real-time sensor data from Kafka topics
- Implements agent-specific data parsing and validation
- Stores structured data to MySQL using the shared database manager
- Maintains data quality and provides input validation for downstream components

**ML Training Thread** (`MLTrainingThreadBase` → `KafkaProducerBase`):
- Operates on a timer-based loop rather than message consumption
- Queries the database for accumulated training data
- Implements agent-specific model training and evaluation logic
- Publishes training results and model metadata to Kafka for monitoring
- Handles model versioning and persistence

**ML Inference Thread** (`MLInferenceThreadBase` → `KafkaStreamingProcessBase`):
- Consumes real-time data for immediate inference
- Automatically loads the latest trained models
- Processes streaming data through the loaded models
- Publishes inference results and anomaly detections to output topics
- Provides the real-time decision-making capability of the agent
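
To make the composition concrete, a minimal sketch of how a specific agent might assemble these three components is shown below. The factory method names, constructor arguments, and all `Autoencoder*` classes other than `AutoencoderDataIngestThread` are hypothetical stand-ins, not the actual `AgentBase` interface (see Agent Orchestration below for the lifecycle they plug into).

```python
# Hypothetical sketch only: method names, constructor signatures, and most
# class names here are illustrative, not the real SMOCS API.
class AutoencoderAgent(AgentBase):
    """Concrete agent that supplies its three specialized components."""

    def create_data_ingest_thread(self):
        # Parses sensor JSON from Kafka and writes rows via the shared DB manager.
        return AutoencoderDataIngestThread(self.config, self.db_manager)

    def create_ml_training_thread(self):
        # Timer-driven: periodically retrains the autoencoder on accumulated data.
        return AutoencoderTrainingThread(self.config, self.db_manager)

    def create_ml_inference_thread(self):
        # Streams incoming data through the latest trained model.
        return AutoencoderInferenceThread(self.config, self.db_manager)
```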
### Agent Orchestration

The `AgentBase` class manages the complete agent lifecycle through several key phases:

**Initialization**: Creates database connections, registers the agent in the system, and prepares component configurations.

**Component Creation**: Uses abstract factory methods to instantiate the three specialized threads, allowing concrete agent implementations to define their specific component types.

**Thread Management**: Launches each component in its own thread with proper daemon configuration and maintains thread health monitoring.

**Health Monitoring**: Continuously checks thread vitality and automatically restarts failed components, ensuring agent resilience.
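
A compact sketch of what this launch-and-monitor loop could look like is shown below. It is illustrative only (the real `AgentBase` internals may differ) and assumes each component object exposes a blocking `run()` method.

```python
import threading
import time


class LifecycleSketch:
    """Illustrative launch-and-monitor pattern; not the actual AgentBase code."""

    def __init__(self, components):
        # `components` maps a name to an object with a blocking run() method,
        # e.g. the three thread components created by the factory methods.
        self.components = components
        self.threads = {}

    def _launch(self, name):
        # Daemon threads so a stopped agent process does not hang on exit.
        thread = threading.Thread(target=self.components[name].run, name=name, daemon=True)
        thread.start()
        self.threads[name] = thread

    def run_forever(self, check_interval=10):
        for name in self.components:
            self._launch(name)
        while True:  # health monitoring: restart any component whose thread died
            for name, thread in list(self.threads.items()):
                if not thread.is_alive():
                    self._launch(name)
            time.sleep(check_interval)
```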
## Inter-Thread Coordination

### Database Communication Layer

The three threads coordinate primarily through the shared MySQL database rather than direct communication. The Data Ingest Thread populates the raw data tables, the ML Training Thread consumes this data for model development, and the ML Inference Thread accesses the latest model artifacts.
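
The snippet below sketches the kind of handoff this implies; the table and column names are invented for illustration and are not the actual schema managed by `DBManager`.

```python
# Illustrative only: invented table/column names showing the database handoff.

# Data Ingest Thread: append raw sensor rows.
INSERT_SENSOR_ROW = (
    "INSERT INTO sensor_data (agent_id, recorded_at, payload) VALUES (%s, %s, %s)"
)

# ML Training Thread: read the rows accumulated since the last training run.
SELECT_TRAINING_WINDOW = (
    "SELECT recorded_at, payload FROM sensor_data "
    "WHERE agent_id = %s AND recorded_at > %s ORDER BY recorded_at"
)

# ML Inference Thread: load the newest persisted model artifact.
SELECT_LATEST_MODEL = (
    "SELECT artifact FROM model_artifacts "
    "WHERE agent_id = %s ORDER BY created_at DESC LIMIT 1"
)
```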
### Temporal Decoupling

This architecture creates natural temporal decoupling: data ingestion operates at sensor sampling rates, training occurs on longer time scales based on data accumulation, and inference happens at message arrival rates. Each thread can optimize for its specific timing requirements without impacting others.

### Resource Management

Each thread manages its own resources (Kafka connections, database cursors, model memory) and implements appropriate cleanup procedures. The `AgentBase` class monitors thread health and can restart individual components without affecting others.

This three-thread architecture creates a complete machine learning pipeline that ingests streaming data, continuously improves models, and provides real-time intelligence - all while maintaining the simplicity and reliability principles that define the SMOCS platform.
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
### Core Responsibilities

The Data Ingest Thread serves as the agent's primary interface to the streaming data ecosystem. Built on `KafkaConsumerBase`, it maintains a persistent connection to configured Kafka topics and transforms raw streaming messages into structured database records.

### Processing Pipeline

The thread operates through a continuous polling loop that retrieves message batches from Kafka. Each message undergoes parsing to extract sensor readings, timestamps, and metadata. The `AutoencoderDataIngestThread` implementation demonstrates this pattern by parsing JSON messages containing gymnasium environment state data, extracting numeric state values while filtering out non-numeric metadata fields.
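
A rough sketch of that filtering step is shown below; the message layout and field names are assumptions for illustration, not the exact format handled by `AutoencoderDataIngestThread`.

```python
import json


def extract_numeric_state(raw_message: bytes) -> dict:
    """Keep only the numeric entries of a JSON payload, dropping metadata.

    Illustrative helper; the real AutoencoderDataIngestThread may use
    different field names and validation rules.
    """
    payload = json.loads(raw_message.decode("utf-8"))
    return {
        key: float(value)
        for key, value in payload.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Example: non-numeric metadata such as the environment name is filtered out.
sample = b'{"env": "CartPole-v1", "cart_position": 0.02, "pole_angle": -0.01, "step": 7}'
print(extract_numeric_state(sample))  # {'cart_position': 0.02, 'pole_angle': -0.01, 'step': 7.0}
```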
### Data Storage Strategy

Parsed data flows into MySQL through the shared `DBManager` instance. The thread uses the `record_sensor_data()` method to store timestamped sensor readings with proper data type conversion (numpy arrays to binary blobs). This creates a persistent training dataset that accumulates over time, providing the foundation for ML model development.
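
The `record_sensor_data()` call handles the storage itself; the snippet below just illustrates one plausible way a numpy array can round-trip through a binary blob, which may or may not match the internal `DBManager` encoding.

```python
import numpy as np

# One plausible way to serialize a numpy array for a MySQL BLOB column and back.
readings = np.array([0.12, 0.87, 0.33], dtype=np.float32)

blob = readings.tobytes()                          # array -> raw bytes for a BLOB column
restored = np.frombuffer(blob, dtype=np.float32)   # bytes -> array (dtype must match)

assert np.array_equal(readings, restored)
```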
### Error Handling and Resilience

The thread implements robust error handling at multiple levels: JSON parsing failures are logged but don't terminate processing, database connection issues trigger retry logic, and malformed messages are skipped with appropriate warnings. This ensures that transient data quality issues don't disrupt the overall data flow.

### Configuration Integration

Topic subscriptions, parsing rules, and storage parameters are all driven by the central configuration file. This allows agents to be reconfigured for different data sources without code changes, supporting the system's flexibility goals.

## User Implementation Requirements

### Single Required Method

```python
def store_message(self, message, topic, partition, offset) -> bool:
    # Parse the message, validate the data, and store it to the database.
    # Return True for success, False for failure.
    ...
```
### Implementation Steps

1. **Parse Message**: Decode bytes to string, parse JSON/format
2. **Extract Data**: Pull relevant fields (timestamps, sensor values, metadata)
3. **Validate**: Check data types, handle missing fields, filter invalid data
4. **Transform**: Convert to database schema format (arrays to numpy, timestamps to datetime)
5. **Store**: Call `self.db_manager.record_sensor_data(data_dict)`
6. **Return Status**: `True` if successful, `False` if failed
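
Putting these steps together, a concrete subclass might look roughly like the sketch below. The payload layout (`timestamp`, `state`) and the keys of the record dict are illustrative assumptions, while `store_message()` and `self.db_manager.record_sensor_data()` come from the steps above.

```python
import json
import logging
from datetime import datetime, timezone

import numpy as np

logger = logging.getLogger(__name__)


class ExampleDataIngestThread(DataIngestThreadBase):  # base class per the SMOCS docs
    """Illustrative sketch; field names and the record dict layout are assumptions."""

    def store_message(self, message, topic, partition, offset) -> bool:
        try:
            # 1. Parse: decode bytes and load JSON.
            payload = json.loads(message.decode("utf-8"))

            # 2. Extract: pull the fields this agent cares about.
            timestamp = payload.get("timestamp")
            values = payload.get("state")

            # 3. Validate: skip messages missing the data we need.
            if timestamp is None or not values:
                logger.warning("Skipping malformed message at %s[%s]@%s", topic, partition, offset)
                return False

            # 4. Transform: match the database schema (numpy array, datetime).
            record = {
                "timestamp": datetime.fromtimestamp(float(timestamp), tz=timezone.utc),
                "sensor_values": np.asarray(values, dtype=np.float32),
                "topic": topic,
            }

            # 5. Store: hand the record to the shared database manager.
            self.db_manager.record_sensor_data(record)

            # 6. Return status.
            return True
        except Exception:
            logger.exception("Failed to store message from %s", topic)
            return False
```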
### Configuration Needed

```python
config = {
    'kafka_topics': {'input': 'your-topic-name'}
}
```

### Environment Variables Required

- `KAFKA_BROKER_URL`
- `MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_ROOT_PASSWORD`, `MYSQL_DATABASE`
## What's Handled Automatically

- Kafka consumer setup/teardown
- Topic subscription and polling
- Database connection management
- Thread lifecycle and health monitoring
- Error recovery and restart logic
- Message batching and offset management

## Key Constraints

- **Single-threaded**: Keep `store_message()` fast and efficient
- **Error handling**: Catch exceptions, log errors, return `False` for failures
- **Database schema**: Match expected format for `record_sensor_data()`
- **Memory management**: Don't accumulate state between messages

The base classes handle all infrastructure complexity - users only implement domain-specific data transformation logic.

0 commit comments
