6DIMCOCO

Status: 🟢 Active
Language: Python
Category: Research Software
Repository: https://github.com/st7ma784/6dimcoco

Overview

6-dimensional multi-objective continuous optimization research code

Technologies & Tags

Optimization • Multi-objective • Research

Associated Publication

This repository contains the implementation and analysis code for the associated preprint (6dimcoco-preprint).

Recent Activity

Last Commit: 2025-09-02 by Stephen Mander (commit 2cf9c7c0: Upgrade upload-pages-artifact action to v3)

Repository Documentation

6DIMCOCO: Multi-dimensional CLIP Training Framework

Requires Python 3.8+ and PyTorch.

A comprehensive research framework for training CLIP models with novel n-dimensional loss functions and advanced analysis techniques including CKA (Centered Kernel Alignment).

🔬 Research Focus

This framework enables research in:

  • Multi-dimensional CLIP Training: 3D, 4D, 6D, and custom dimensional configurations

  • Novel Loss Functions: 18+ mathematically rigorous loss function variants

  • CKA Analysis: Deep model comparison and understanding

  • Cross-modal Learning: Image-text and multilingual capabilities

  • Numerical Optimization: Stable training with proper gradient flow
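To make "multi-dimensional" concrete: with n feature sets of shape (batch, embed_dim), one natural n-way analogue of the CLIP similarity matrix is a tensor of shape (batch,) * n whose entries are generalised dot products, with the matching tuple on the all-indices-equal diagonal. The sketch below assumes that construction; the function names and the temperature value are illustrative, and this is not the repository's `einsum` loss.

```python
import torch
import torch.nn.functional as F

_BATCH_LETTERS = "abcdefghijklmnopqrstuvwxy"  # 'z' is reserved for the embedding axis


def nway_logits(*features: torch.Tensor) -> torch.Tensor:
    """Generalised dot product: out[i, j, k, ...] = sum_z f0[i, z] * f1[j, z] * f2[k, z] * ..."""
    letters = _BATCH_LETTERS[: len(features)]
    spec = ",".join(f"{c}z" for c in letters) + "->" + letters
    return torch.einsum(spec, *features)


def nway_contrastive_loss(*features: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Cross-entropy towards the (i, i, ..., i) diagonal, averaged over every axis (assumed objective)."""
    feats = [F.normalize(f, dim=-1) for f in features]
    logits = nway_logits(*feats) / temperature
    n, b = len(feats), feats[0].shape[0]
    # Flat index of (i, i, ..., i) once the remaining n - 1 axes are flattened row-major
    stride = sum(b ** k for k in range(n - 1))
    target = torch.arange(b, device=logits.device) * stride
    per_axis = [
        F.cross_entropy(torch.movedim(logits, axis, 0).reshape(b, -1), target)
        for axis in range(n)
    ]
    return torch.stack(per_axis).mean()
```

For n = 2 this reduces to the usual symmetric CLIP loss. Memory grows as batch**n, so at six dimensions the batch size has to stay small.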

✨ Key Features

  • 🧮 Numerically Stable: All loss functions include stability checks and proper error handling

  • 🔧 Highly Configurable: Type-safe configuration system for reproducible experiments

  • 📊 Advanced Analysis: Built-in CKA tools for model comparison

  • 🧪 Thoroughly Tested: Comprehensive test suite with 95%+ coverage

  • 📚 Well Documented: Complete API documentation with Sphinx

  • 🌐 Multilingual: Support for Chinese-English translation tasks
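The stability checks themselves are not shown in this README. As an illustration only, a guard might reject NaN/Inf inputs and outputs, and a distance-based loss can clamp before `sqrt` so identical rows do not produce infinite gradients. The decorator `finite_loss` and the function `mean_pairwise_distance` are hypothetical names, not the repository's code.

```python
import functools

import torch


def finite_loss(fn):
    """Hypothetical decorator: reject NaN/Inf inputs and NaN/Inf results."""
    @functools.wraps(fn)
    def wrapper(*tensors: torch.Tensor, **kwargs):
        for idx, t in enumerate(tensors):
            if not torch.isfinite(t).all():
                raise ValueError(f"input {idx} contains NaN or Inf")
        out = fn(*tensors, **kwargs)
        if not torch.isfinite(out).all():
            raise FloatingPointError("loss evaluated to NaN or Inf")
        return out
    return wrapper


@finite_loss
def mean_pairwise_distance(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mean Euclidean distance between every row of x and every row of y."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(dim=-1)
    # d/ds sqrt(s) is infinite at s = 0; clamping keeps gradients finite for identical rows
    return torch.sqrt(sq.clamp_min(eps)).mean()
```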

🚀 Quick Start

Installation

git clone https://github.com/st7ma784/6DIMCOCO.git
cd 6DIMCOCO
pip install -r requirements.txt
pip install -e .

Basic Usage

# Run basic training
python scripts/run_training.py

# Run with wandb logging
python scripts/run_training.py --wandb

# Build datasets
python data_builders/BuildImagenet.py
python data_builders/BuildLAION.py

import torch

from src.config.base_config import ExperimentConfig
from src.losses import create_loss_function

# Create experiment configuration
config = ExperimentConfig()
config.model.dimensions = 6.0
config.training.learning_rate = 2e-3

# Create loss function
loss_fn = create_loss_function('norm_based', config=config.model)

# Use with your features: six (batch, embed_dim) tensors for a 6-dimensional loss
features = [torch.randn(32, 512) for _ in range(6)]
loss = loss_fn(*features)

Available Loss Functions

from src.losses import get_available_losses

losses = get_available_losses()
# Output:
# stock_clip: Standard CLIP contrastive loss
# einsum: Einstein summation based n-dimensional loss  
# euclidean_distance: Euclidean distance based loss with stability
# norm_based: Norm-based loss with multiple variants
# cosine_similarity: Cosine similarity based multi-dimensional loss
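`stock_clip` is described only as the standard CLIP contrastive loss. For reference, the standard symmetric formulation is sketched below in plain PyTorch. The function name is illustrative, and the fixed `logit_scale=100.0` stands in for CLIP's learned temperature; both are assumptions rather than the repository's implementation.

```python
import torch
import torch.nn.functional as F


def clip_loss(image_features: torch.Tensor, text_features: torch.Tensor,
              logit_scale: float = 100.0) -> torch.Tensor:
    """Symmetric image-text contrastive loss; row i of each input is a matching pair."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits_per_image = logit_scale * image_features @ text_features.T
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    loss_i = F.cross_entropy(logits_per_image, labels)    # image -> text
    loss_t = F.cross_entropy(logits_per_image.T, labels)  # text -> image
    return (loss_i + loss_t) / 2
```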

📖 Documentation

🧪 Testing

Run the comprehensive test suite:

# All tests
pytest tests/ -v

# Specific test categories
pytest tests/test_losses.py -v          # Loss function tests
pytest tests/test_config.py -v          # Configuration tests  
pytest tests/test_cka_analysis.py -v    # CKA analysis tests

# Skip GPU tests if no CUDA
pytest tests/ -m "not gpu" -v

🏗️ Architecture

Project Structure

6DIMCOCO/
├── src/                    # Core source code
│   ├── config/            # Configuration management
│   └── losses/            # Loss function implementations
├── model/                 # Model implementations
├── scripts/               # Training and analysis scripts
│   ├── launch.py         # Main training orchestration
│   ├── run_training.py   # Entry point script
│   ├── CKA_*.py         # CKA analysis scripts
│   └── benchmark_cupy.py # Performance benchmarking
├── data_builders/         # Dataset construction scripts
│   ├── BuildCNDataset.py # Chinese dataset builder
│   ├── BuildImagenet.py  # ImageNet dataset builder
│   └── Build*.py         # Other dataset builders
├── notebooks/             # Jupyter notebooks for analysis
├── results/               # Training results and plots
├── experiments/           # Experimental configurations
├── tests/                 # Test suite
├── docs/                  # Documentation
├── requirements.txt       # Dependencies
└── README.md             # This file

Configuration Management

Type-safe configuration system replacing hardcoded values:

from dataclasses import dataclass

@dataclass
class ModelConfig:
    embed_dim: int = 512
    dimensions: float = 6.0
    normalize_logits: bool = True
    # ... with validation
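The `# ... with validation` comment leaves the checks unspecified. One way such validation could look uses `__post_init__`. The rules below are assumptions: `embed_dim` must be positive, and `dimensions` must be one of the values listed in the Model Configuration comment (including -1 and 0, whose meaning that comment does not explain).

```python
from dataclasses import dataclass

# Assumed set of accepted values, copied from the Model Configuration comment
SUPPORTED_DIMENSIONS = {3.0, 3.5, 4.0, 6.0, -1.0, 0.0}


@dataclass
class ModelConfig:
    embed_dim: int = 512
    dimensions: float = 6.0
    normalize_logits: bool = True

    def __post_init__(self) -> None:
        # Runs once, right after the generated __init__
        if self.embed_dim <= 0:
            raise ValueError(f"embed_dim must be positive, got {self.embed_dim}")
        if float(self.dimensions) not in SUPPORTED_DIMENSIONS:
            raise ValueError(f"unsupported dimensions={self.dimensions}")
```

`__post_init__` only runs at construction, so a later assignment such as `config.model.dimensions = 6.0` is not re-checked. A separate `validate()` call before training would cover that case.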

Testing Framework

Comprehensive testing addressing original issues:

  • ✅ Unit Tests: All loss functions and configurations

  • ✅ Integration Tests: End-to-end workflows

  • ✅ Numerical Stability: Edge cases and error handling

  • ✅ Mathematical Properties: Transpose invariance, symmetry

  • ✅ Performance Tests: Memory usage and gradient flow
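The property tests are not reproduced here. A pytest-style sketch of what "transpose invariance" can mean for similarity tensors: reordering the inputs should only permute the axes of the result. These tests exercise plain `torch.einsum` products rather than the repository's loss classes, which is an assumption about what the suite checks.

```python
import torch


def test_three_way_logits_follow_input_order():
    """Rotating the inputs rotates the axes of the 3-way similarity tensor."""
    torch.manual_seed(0)
    a, b, c = (torch.randn(4, 8) for _ in range(3))
    forward = torch.einsum("iz,jz,kz->ijk", a, b, c)
    rotated = torch.einsum("iz,jz,kz->ijk", c, a, b)
    # rotated[p, q, r] == forward[q, r, p]
    assert torch.allclose(rotated, forward.permute(2, 0, 1), atol=1e-5)


def test_two_way_logits_transpose():
    """Swapping the two modalities transposes the CLIP similarity matrix."""
    torch.manual_seed(0)
    img, txt = torch.randn(6, 8), torch.randn(6, 8)
    assert torch.allclose(img @ txt.T, (txt @ img.T).T, atol=1e-5)
```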

📊 Research Applications

This framework has been used for:

  • Multi-dimensional contrastive learning research

  • Cross-modal representation learning

  • Model architecture analysis via CKA

  • Chinese-English translation tasks

  • Numerical optimization in deep learning
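The CKA analysis lives in `scripts/CKA_*.py` and is not shown here. For orientation, the sketch below implements linear CKA, one common variant. Whether the scripts use the linear form or a kernel (e.g. RBF) form is not stated in this README.

```python
import torch


def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between activations x (n, p) and y (n, q) recorded on the same n inputs."""
    x = x - x.mean(dim=0, keepdim=True)  # centre each feature column
    y = y - y.mean(dim=0, keepdim=True)
    cross = torch.linalg.norm(y.T @ x) ** 2  # squared Frobenius norm of the cross-covariance
    norm_x = torch.linalg.norm(x.T @ x)
    norm_y = torch.linalg.norm(y.T @ y)
    return cross / (norm_x * norm_y)
```

The score is 1 for identical representations and is unchanged by orthogonal transforms and isotropic scaling. That invariance is why it suits comparing layers whose feature widths differ.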

🔧 Configuration

Model Configuration

config.model.dimensions = 6.0           # 3, 3.5, 4, 6, -1, 0
config.model.embed_dim = 512            # Embedding dimension
config.model.normalize_logits = True    # Feature normalization
config.model.loss_version = 0           # Legacy compatibility

Training Configuration

config.training.learning_rate = 2e-3
config.training.train_batch_size = 64
config.training.precision = 16          # Mixed precision
config.training.gradient_clip_val = 0.25
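How `learning_rate` and `gradient_clip_val` act on one optimisation step, sketched in plain PyTorch. The model, loss and AdamW optimizer are placeholders (assumptions), and clipping by global L2 norm is assumed. Mixed precision (`precision = 16`) is omitted so the sketch runs on CPU; on GPU it would add autocast and gradient scaling. If training goes through PyTorch Lightning, these values would typically be passed as `Trainer(precision=..., gradient_clip_val=...)`, which is also an assumption about this codebase.

```python
import torch
from torch import nn

LEARNING_RATE = 2e-3  # config.training.learning_rate
GRAD_CLIP = 0.25      # config.training.gradient_clip_val

model = nn.Linear(512, 512)  # placeholder model, not the repository's
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)  # optimizer choice is assumed


def train_step(batch: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # placeholder loss
    loss.backward()
    # Rescale all gradients together so their global L2 norm is at most GRAD_CLIP
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=GRAD_CLIP)
    optimizer.step()
    return loss.item()
```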

🐛 Issues Fixed

Original Testing Issues

  • ❌ Minimal test coverage (1 basic test)

  • ❌ No systematic validation

  • ❌ Hardcoded dependencies

  • ❌ No edge case handling

Now Fixed

  • ✅ Comprehensive test suite (95%+ coverage)

  • ✅ Systematic validation framework

  • ✅ Configurable dependencies

  • ✅ Robust error handling

Original Code Quality Issues

  • ❌ 600+ line monolithic loss file

  • ❌ Hardcoded API keys

  • ❌ Poor separation of concerns

  • ❌ Code duplication across 30+ model versions

Now Fixed

  • ✅ Modular, well-organized architecture

  • ✅ Secure configuration management

  • ✅ Clean separation of concerns

  • ✅ DRY principle with shared base classes
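The name-based factory `create_loss_function('norm_based', ...)` and the shared base classes suggest a registry pattern. A minimal sketch of that pattern follows. `BaseLoss`, `register_loss`, `LOSS_REGISTRY`, `build_loss` and the toy `neg_product` loss are hypothetical names, not taken from `src/losses`.

```python
from typing import Callable, Dict, Optional, Type

LOSS_REGISTRY: Dict[str, Type["BaseLoss"]] = {}


def register_loss(name: str) -> Callable[[type], type]:
    """Class decorator recording a loss class under `name` (hypothetical helper)."""
    def decorator(cls: type) -> type:
        if name in LOSS_REGISTRY:
            raise KeyError(f"loss {name!r} is already registered")
        LOSS_REGISTRY[name] = cls
        return cls
    return decorator


class BaseLoss:
    """Shared input checks live here once instead of being copied into every variant."""

    def __init__(self, config: Optional[object] = None):
        self.config = config

    def __call__(self, *features):
        if len(features) < 2:
            raise ValueError(f"expected at least 2 feature sets, got {len(features)}")
        return self.compute(*features)

    def compute(self, *features):
        raise NotImplementedError


@register_loss("neg_product")
class NegProductLoss(BaseLoss):
    """Toy variant on plain floats, only here to exercise the registry."""

    def compute(self, *features):
        result = 1.0
        for f in features:
            result *= f
        return -result


def build_loss(name: str, config: Optional[object] = None) -> BaseLoss:
    """Look up `name` and instantiate it (hypothetical stand-in for a factory)."""
    if name not in LOSS_REGISTRY:
        raise ValueError(f"unknown loss {name!r}; available: {sorted(LOSS_REGISTRY)}")
    return LOSS_REGISTRY[name](config=config)
```

Each new variant then only implements `compute`, so argument checks and error messages stay in one place.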

🤝 Contributing

  1. Fork the repository

  2. Create a feature branch (git checkout -b feature/amazing-feature)

  3. Run tests (pytest tests/ -v)

  4. Commit changes (git commit -m 'Add amazing feature')

  5. Push to branch (git push origin feature/amazing-feature)

  6. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use this framework in your research, please cite:

@misc{6dimcoco2024,
  title={6DIMCOCO: Multi-dimensional CLIP Training Framework},
  author={PhD Research Project},
  year={2024},
  url={https://github.com/st7ma784/6DIMCOCO}
}

🙏 Acknowledgments

  • Original research codebase and methodologies

  • PyTorch Lightning for training infrastructure

  • Weights & Biases for experiment tracking

  • The open-source community for inspiration and tools
