Phase 1: Database Schema Design
Duration: 1 week
Dependencies: None
Objective: Update existing Django models to support read-only archive
⚠️ Important: Current Codebase State
This phase updates EXISTING models, not creating new ones.
Current State (as of 2024-12-05)
- ✅ Models exist:
Channel,Post,Commentinchannels/models.py - ✅ Using
Base36IntegerFieldforpost_idandcomment_id(unique=True) - ✅
scorefield exists asBigIntegerField(null=True)on Post and Comment - ✅
plain_textexists as@propertyon Post (lines 267-275) - ❌ No
reddit_idCharField fields - ❌ No
archived_onfields - ❌ No
CommentTreeNodemodel - ❌ No django-treebeard dependency
Overview
This phase updates existing Django models to support a read-only archive. The key changes are:
- Add
reddit_idfields - New CharField to store base36 Reddit IDs for reference - Modify
scorefields - Make non-nullable with default=0 - Add
archived_onfields - Track when content was archived - Convert
plain_textto stored field - Stop computing on every read - Create
CommentTreeNodemodel - New model for efficient tree traversal - Add database indexes - Optimize read performance
Critical: We keep existing post_id and comment_id fields during migration, removing them only after data migration completes (Phase 2).
Technical Requirements
1. Update Existing Models
1.1 Update Channel Model
File: channels/models.py (starts at line 134)
Current state:
class Channel(TimestampedModel):
name = models.CharField(unique=True, max_length=100)
title = models.CharField(max_length=100, null=True)
# ... other fields exist ...
allowed_post_types = BitField(flags=VALID_EXTENDED_POST_CHOICES, null=True)
channel_type = models.CharField(max_length=20, choices=VALID_CHANNEL_CHOICES, null=True)
Changes needed:
class Channel(TimestampedModel):
# ... existing fields stay ...
# ADD these new fields:
reddit_id = models.CharField(
max_length=20,
null=True,
blank=True,
unique=True,
db_index=True,
help_text="Original Reddit subreddit ID (base36)"
)
archived_on = models.DateTimeField(
null=True,
blank=True,
help_text="Timestamp when this channel was archived from Reddit"
)
# KEEP allowed_post_types for now (remove in Phase 7)
# KEEP all existing fields
1.2 Update Post Model
File: channels/models.py (starts at line 243)
Current state:
class Post(TimestampedModel):
channel = models.ForeignKey(Channel, on_delete=models.CASCADE)
author = models.ForeignKey(User, null=True, on_delete=models.CASCADE)
link_meta = models.ForeignKey(LinkMeta, null=True, on_delete=models.CASCADE)
post_id = Base36IntegerField(unique=True) # Current primary identifier
post_type = models.CharField(max_length=10, choices=VALID_EXTENDED_POST_CHOICES, null=True)
title = models.CharField(max_length=300, null=True)
text = models.TextField(null=True)
preview_text = models.TextField(null=True)
url = models.URLField(max_length=2048, null=True)
score = models.BigIntegerField(null=True) # ALREADY EXISTS
num_comments = models.BigIntegerField(null=True)
edited = models.BooleanField(null=True)
removed = models.BooleanField(null=True)
deleted = models.BooleanField(null=True)
@property
def plain_text(self): # Currently computed on every access
# ... computation logic ...
Changes needed:
class Post(TimestampedModel):
# ... existing fields stay ...
# KEEP post_id for now (will remove in Phase 7 after migration)
post_id = Base36IntegerField(unique=True)
# ADD reddit_id (will eventually replace post_id)
reddit_id = models.CharField(
max_length=20,
null=True, # Null during migration, make non-null after
blank=True,
unique=True,
db_index=True,
help_text="Original Reddit post ID (base36)"
)
# MODIFY score: make non-nullable with default
# Change from: score = models.BigIntegerField(null=True)
# To:
score = models.IntegerField(default=0, help_text="Score frozen at archive time")
# ADD plain_text as stored field (remove @property)
plain_text = models.TextField(
null=True,
blank=True,
help_text="Cached plain text version for search/display"
)
# ADD archived_on
archived_on = models.DateTimeField(
null=True,
blank=True,
help_text="Timestamp when this post was archived from Reddit"
)
# ADD created field if not exists (check current model first)
created = models.DateTimeField(
null=True, # Nullable for migration, populate from Reddit
db_index=True
)
# REMOVE @property plain_text method (move logic to migration)
1.3 Update Comment Model
File: channels/models.py (starts at line 305)
Current state:
class Comment(TimestampedModel):
post = models.ForeignKey(Post, on_delete=models.CASCADE)
author = models.ForeignKey(User, null=True, on_delete=models.CASCADE)
comment_id = Base36IntegerField(unique=True) # Current primary identifier
parent_id = Base36IntegerField(null=True)
text = models.TextField(null=True)
score = models.BigIntegerField(null=True) # ALREADY EXISTS
edited = models.BooleanField(null=True)
removed = models.BooleanField(null=True)
deleted = models.BooleanField(null=True)
Changes needed:
class Comment(TimestampedModel):
# ... existing fields stay ...
# KEEP comment_id for now (will remove in Phase 7)
comment_id = Base36IntegerField(unique=True)
# ADD reddit_id (will eventually replace comment_id)
reddit_id = models.CharField(
max_length=20,
null=True, # Null during migration
blank=True,
unique=True,
db_index=True,
help_text="Original Reddit comment ID (base36)"
)
# MODIFY parent_id to store reddit_id of parent (CharField not Base36)
# Keep old parent_id temporarily, add new one:
parent_reddit_id = models.CharField(
max_length=20,
null=True,
blank=True,
db_index=True,
help_text="Parent comment reddit_id (CharField)"
)
# MODIFY score: make non-nullable with default
score = models.IntegerField(default=0, help_text="Score frozen at archive time")
# ADD created field if not exists
created = models.DateTimeField(
null=True, # Nullable for migration
db_index=True
)
# ADD archived_on
archived_on = models.DateTimeField(
null=True,
blank=True,
help_text="Timestamp when this comment was archived from Reddit"
)
1.4 Create CommentTreeNode Model
File: channels/models.py
Changes:
- Remove dependency on Reddit IDs
- Simplify to read-only fields
- Keep existing fields that contain valuable metadata
Current Channel Model Issues:
- Primary key should be auto-incrementing, not Reddit-based
allowed_post_typesnot needed for read-only archive
Required Changes:
class Channel(TimestampedModel):
"""Channel model - read-only archive"""
# Change from implicit to explicit auto ID
id = models.AutoField(primary_key=True)
# Keep existing fields
name = models.CharField(unique=True, max_length=100)
title = models.CharField(max_length=100, null=True)
membership_is_managed = models.BooleanField(default=False)
# Images
avatar = models.ImageField(null=True, max_length=2083, upload_to=avatar_uri)
avatar_small = models.ImageField(null=True, max_length=2083, upload_to=avatar_uri_small, blank=True)
avatar_medium = models.ImageField(null=True, max_length=2083, upload_to=avatar_uri_medium, blank=True)
banner = models.ImageField(null=True, max_length=2083, upload_to=banner_uri)
# Metadata
public_description = models.TextField(blank=True)
channel_type = models.CharField(max_length=50, choices=VALID_CHANNEL_CHOICES)
# Archive metadata
reddit_id = models.CharField(max_length=20, null=True, blank=True, help_text="Original Reddit ID for reference")
archived_on = models.DateTimeField(null=True, blank=True, help_text="Date this channel was archived from Reddit")
# REMOVE: allowed_post_types (BitField) - not needed for read-only
# Keep widget_list if needed for display
widget_list = models.OneToOneField(...)
1.2 Update Post Model
File: channels/models.py
Changes:
- Make
idthe primary key (auto-increment) - Add
reddit_idfield to preserve original Reddit post ID - Add
scoreas static field (frozen at migration) - Simplify text fields
- Remove voting-related fields
Current Post Model Issues:
post_iduses Base36IntegerField tied to Reddit- No static score field
- Multiple text representations need consolidation
Required Changes:
class Post(TimestampedModel):
"""Post model - read-only archive"""
# New auto-incrementing primary key
id = models.AutoField(primary_key=True)
# Original Reddit ID for reference
reddit_id = models.CharField(max_length=20, unique=True, db_index=True,
help_text="Original Reddit post ID (base36)")
# Relationships
channel = models.ForeignKey(Channel, on_delete=models.CASCADE, related_name='posts')
author = models.ForeignKey(User, on_delete=models.SET_NULL, null=True, related_name='posts')
# Content
title = models.CharField(max_length=300)
url = models.URLField(null=True, max_length=2083)
text = models.TextField(null=True, blank=True) # For text/article posts
plain_text = models.TextField(null=True, blank=True, help_text="Plain text version for search/display")
post_type = models.CharField(max_length=50)
# Article support
article = models.OneToOneField(Article, null=True, on_delete=models.SET_NULL, related_name='post')
link_meta = models.OneToOneField(LinkMeta, null=True, on_delete=models.SET_NULL)
# Static metadata (frozen at migration)
score = models.IntegerField(default=0, help_text="Score frozen at archive time")
num_comments = models.IntegerField(default=0)
created = models.DateTimeField(db_index=True)
# Moderation flags (frozen state)
removed = models.BooleanField(default=False)
deleted = models.BooleanField(default=False)
stickied = models.BooleanField(default=False)
# Archive metadata
archived_on = models.DateTimeField(null=True, blank=True)
# REMOVE: upvoted (was user-specific, not needed)
# REMOVE: thumbnail (can derive from link_meta)
class Meta:
indexes = [
models.Index(fields=['channel', '-score', '-created']),
models.Index(fields=['author', '-created']),
models.Index(fields=['-created']),
models.Index(fields=['-score']),
]
1.3 Update Comment Model
File: channels/models.py
Changes:
- Make
idthe primary key (auto-increment) - Add
reddit_idfield to preserve original Reddit comment ID - Add
scoreas static field - Simplify metadata
Required Changes:
class Comment(TimestampedModel):
"""Comment model - read-only archive"""
# New auto-incrementing primary key
id = models.AutoField(primary_key=True)
# Original Reddit ID for reference
reddit_id = models.CharField(max_length=20, unique=True, db_index=True,
help_text="Original Reddit comment ID (base36)")
# Relationships
post = models.ForeignKey(Post, on_delete=models.CASCADE, related_name='comments')
author = models.ForeignKey(User, on_delete=models.SET_NULL, null=True, related_name='comments')
parent_id = models.CharField(max_length=20, null=True, blank=True,
help_text="Parent comment reddit_id")
# Content
text = models.TextField()
# Static metadata (frozen at migration)
score = models.IntegerField(default=0, help_text="Score frozen at archive time")
created = models.DateTimeField(db_index=True)
# Flags (frozen state)
removed = models.BooleanField(default=False)
deleted = models.BooleanField(default=False)
edited = models.BooleanField(default=False)
# Archive metadata
archived_on = models.DateTimeField(null=True, blank=True)
class Meta:
indexes = [
models.Index(fields=['post', '-score', '-created']),
models.Index(fields=['author', '-created']),
]
1.4 Create CommentTreeNode Model
File: channels/models.py or new channels/comment_tree.py
This is a NEW model (doesn’t exist yet)
Location: Add after Comment model definition in channels/models.py
from treebeard.mp_tree import MP_Node
class CommentTreeNode(MP_Node):
"""
Materialized path tree for comments sorted by score.
One tree per post for efficient traversal.
Uses django-treebeard for tree operations.
Each post has one root node (comment=None) with all comments as descendants.
"""
# Customize alphabet for PostgreSQL efficiency (62 chars)
alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
# Post relationship
post = models.ForeignKey(
Post,
on_delete=models.CASCADE,
related_name='tree_nodes',
db_index=True,
help_text="Post this tree belongs to"
)
# Comment relationship (null for root node)
comment = models.OneToOneField(
Comment,
on_delete=models.CASCADE,
null=True,
blank=True,
related_name='tree_node',
help_text="Comment this node represents (null for root)"
)
# Denormalized sort fields (frozen at archive time)
score = models.IntegerField(
default=0,
help_text="Denormalized score for sorting"
)
created = models.DateTimeField(
help_text="Denormalized created timestamp for sorting"
)
# Tree sorting: score DESC, then created DESC (newest first if tied)
node_order_by = ['-score', '-created']
class Meta:
indexes = [
models.Index(fields=['post', 'path'], name='tree_post_path_idx'),
models.Index(fields=['post', 'depth'], name='tree_post_depth_idx'),
]
verbose_name = 'Comment Tree Node'
verbose_name_plural = 'Comment Tree Nodes'
def __str__(self):
if self.comment:
return f"TreeNode for Comment {self.comment.reddit_id or self.comment.comment_id}"
return f"Root TreeNode for Post {self.post.reddit_id or self.post.post_id}"
Note: Treebeard’s MP_Node provides these fields automatically:
path- CharField for materialized pathdepth- PositiveIntegerField for tree depthnumchild- PositiveIntegerField for number of childrenid- AutoField primary key
2. What NOT to Remove (Keep for Migration)
CRITICAL: Do not remove these during Phase 1. They will be removed in Phase 7 after migration completes.
Keep These Models
- ✅
RedditRefreshToken- Remove in Phase 7 - ✅
RedditAccessToken- Remove in Phase 7 - ✅
Subscription- Remove in Phase 7 - ✅
Base36IntegerFieldclass - Remove in Phase 7
Keep These Fields
- ✅
Post.post_id(Base36IntegerField) - Remove in Phase 7 after data migration - ✅
Comment.comment_id(Base36IntegerField) - Remove in Phase 7 after data migration - ✅
Comment.parent_id(Base36IntegerField) - Remove in Phase 7 after switching to parent_reddit_id - ✅
Channel.allowed_post_types(BitField) - Remove in Phase 7 if not needed
Keep These Imports
- ✅
import base36- Remove in Phase 7 - ✅
from bitfield import BitField- Remove in Phase 7 if allowed_post_types removed
Rationale: These are needed for the data migration in Phase 2. We populate both old and new fields during migration, then verify everything works with new fields before removing old ones.
3. Dependencies
3.1 Add django-treebeard
File: channels/models.py
Remove these models entirely:
# REMOVE: Reddit authentication
class RedditRefreshToken(models.Model):
# Delete entire model
class RedditAccessToken(models.Model):
# Delete entire model
# REMOVE: Voting (not needed for read-only)
# Note: These may not exist yet, but document that they should NOT be created
# class PostVote - DO NOT CREATE
# class CommentVote - DO NOT CREATE
# REMOVE: Reporting (not needed for read-only)
# class ContentReport - DO NOT CREATE
# REMOVE: Base36IntegerField custom field
class Base36IntegerField(models.BigIntegerField):
# Delete entire class - use CharField for reddit_id instead
2.2 Subscription Models
File: channels/models.py
Decision: Keep ChannelSubscription but remove Subscription
# KEEP: ChannelSubscription (users can track viewed channels)
class ChannelSubscription(TimestampedModel):
channel = models.ForeignKey(Channel, on_delete=models.CASCADE)
user = models.ForeignKey(User, on_delete=models.CASCADE)
# Keep as-is
# REMOVE: Content subscriptions (no notifications in archive)
class Subscription(TimestampedModel):
# Delete entire model - no post/comment notifications needed
4. Create Django Migrations
4.1 Migration Strategy
Create migrations in this specific order to avoid dependency issues:
Migration 1: Add reddit_id and archived_on fields
- Add
reddit_idto Post (CharField, null=True, unique=True) - Add
reddit_idto Comment (CharField, null=True, unique=True) - Add
reddit_idto Channel (CharField, null=True, unique=True) - Add
parent_reddit_idto Comment (CharField, null=True) - Add
archived_onto Post, Comment, Channel (DateTimeField, null=True) - Add
createdto Post and Comment if not exists (DateTimeField, null=True)
Migration 2: Add plain_text stored field
- Add
plain_textto Post (TextField, null=True) - Data migration to populate from existing @property logic (optional, can do in Phase 2)
Migration 3: Modify score fields
- Alter Post.score: set default=0 (keep null=True for now)
- Alter Comment.score: set default=0 (keep null=True for now)
- Note: Make non-nullable after data migration in Phase 2
Migration 4: Create CommentTreeNode model
- Create CommentTreeNode with treebeard MP_Node fields
- Add custom indexes
Migration 5: Add performance indexes
- Create indexes on Post (reddit_id, created, score, channel+score+created)
- Create indexes on Comment (reddit_id, created, score, post+score+created)
- Create indexes on CommentTreeNode (post+path, post+depth)
4.2 Example Migration Files
Migration 0001_add_reddit_id_fields.py:
from django.db import migrations, models
class Migration(migrations.Migration):
dependencies = [
('channels', 'XXXX_previous_migration'), # Find actual previous migration
]
operations = [
# Add reddit_id fields
migrations.AddField(
model_name='post',
name='reddit_id',
field=models.CharField(
max_length=20,
null=True,
blank=True,
unique=True,
db_index=True,
help_text="Original Reddit post ID (base36)"
),
),
migrations.AddField(
model_name='comment',
name='reddit_id',
field=models.CharField(
max_length=20,
null=True,
blank=True,
unique=True,
db_index=True,
help_text="Original Reddit comment ID (base36)"
),
),
migrations.AddField(
model_name='channel',
name='reddit_id',
field=models.CharField(
max_length=20,
null=True,
blank=True,
unique=True,
db_index=True,
help_text="Original Reddit subreddit ID (base36)"
),
),
# Add parent_reddit_id to Comment
migrations.AddField(
model_name='comment',
name='parent_reddit_id',
field=models.CharField(
max_length=20,
null=True,
blank=True,
db_index=True,
help_text="Parent comment reddit_id (CharField)"
),
),
# Add archived_on fields
migrations.AddField(
model_name='channel',
name='archived_on',
field=models.DateTimeField(null=True, blank=True),
),
migrations.AddField(
model_name='post',
name='archived_on',
field=models.DateTimeField(null=True, blank=True),
),
migrations.AddField(
model_name='comment',
name='archived_on',
field=models.DateTimeField(null=True, blank=True),
),
# Add created fields if not exists
# Check if these exist first, add only if needed
migrations.AddField(
model_name='post',
name='created',
field=models.DateTimeField(null=True, db_index=True),
),
migrations.AddField(
model_name='comment',
name='created',
field=models.DateTimeField(null=True, db_index=True),
),
]
Migration 0002_add_plain_text_field.py:
from django.db import migrations, models
class Migration(migrations.Migration):
dependencies = [
('channels', '0001_add_reddit_id_fields'),
]
operations = [
migrations.AddField(
model_name='post',
name='plain_text',
field=models.TextField(
null=True,
blank=True,
help_text="Cached plain text version for search/display"
),
),
]
Migration 0003_modify_score_defaults.py:
from django.db import migrations, models
class Migration(migrations.Migration):
dependencies = [
('channels', '0002_add_plain_text_field'),
]
operations = [
# Add default to score fields (keep nullable for now)
migrations.AlterField(
model_name='post',
name='score',
field=models.IntegerField(default=0, null=True),
),
migrations.AlterField(
model_name='comment',
name='score',
field=models.IntegerField(default=0, null=True),
),
]
Migration 0004_create_comment_tree.py:
from django.db import migrations, models
import treebeard.mp_tree
class Migration(migrations.Migration):
dependencies = [
('channels', '0003_modify_score_defaults'),
]
operations = [
migrations.CreateModel(
name='CommentTreeNode',
fields=[
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('path', models.CharField(max_length=255, unique=True)),
('depth', models.PositiveIntegerField()),
('numchild', models.PositiveIntegerField(default=0)),
('score', models.IntegerField(default=0, help_text='Denormalized score for sorting')),
('created', models.DateTimeField(help_text='Denormalized created timestamp for sorting')),
('comment', models.OneToOneField(
blank=True,
null=True,
on_delete=models.CASCADE,
related_name='tree_node',
to='channels.comment',
help_text='Comment this node represents (null for root)'
)),
('post', models.ForeignKey(
on_delete=models.CASCADE,
related_name='tree_nodes',
to='channels.post',
help_text='Post this tree belongs to'
)),
],
options={
'verbose_name': 'Comment Tree Node',
'verbose_name_plural': 'Comment Tree Nodes',
},
bases=(treebeard.mp_tree.MP_Node, models.Model),
),
]
Migration 0005_add_performance_indexes.py:
from django.db import migrations, models
class Migration(migrations.Migration):
dependencies = [
('channels', '0004_create_comment_tree'),
]
operations = [
# Post indexes
migrations.AddIndex(
model_name='post',
index=models.Index(fields=['channel', '-score', '-created'], name='post_chan_score_idx'),
),
migrations.AddIndex(
model_name='post',
index=models.Index(fields=['author', '-created'], name='post_author_idx'),
),
migrations.AddIndex(
model_name='post',
index=models.Index(fields=['-created'], name='post_created_idx'),
),
migrations.AddIndex(
model_name='post',
index=models.Index(fields=['-score'], name='post_score_idx'),
),
# Comment indexes
migrations.AddIndex(
model_name='comment',
index=models.Index(fields=['post', '-score', '-created'], name='comment_post_score_idx'),
),
migrations.AddIndex(
model_name='comment',
index=models.Index(fields=['author', '-created'], name='comment_author_idx'),
),
# CommentTreeNode indexes (some already in model Meta, verify)
migrations.AddIndex(
model_name='commenttreenode',
index=models.Index(fields=['post', 'path'], name='tree_post_path_idx'),
),
migrations.AddIndex(
model_name='commenttreenode',
index=models.Index(fields=['post', 'depth'], name='tree_post_depth_idx'),
),
]
4.3 Testing Migrations
Test on clean database:
# Create test database
python manage.py migrate --run-syncdb
# Apply migrations
python manage.py migrate channels
# Verify models created
python manage.py shell
>>> from channels.models import Post, Comment, CommentTreeNode
>>> Post._meta.get_field('reddit_id')
>>> CommentTreeNode._meta.get_field('path')
Test on copy of production data:
# Restore production snapshot to staging
pg_restore -d staging_db production_snapshot.dump
# Apply migrations
python manage.py migrate channels
# Verify data intact
python manage.py shell
>>> from channels.models import Post, Comment
>>> Post.objects.count() # Should match pre-migration count
>>> Comment.objects.count() # Should match pre-migration count
File: pyproject.toml
Add to dependencies:
[tool.poetry.dependencies]
django-treebeard = "^4.7.0"
Run:
poetry add django-treebeard
5.2 Update Settings
File: open_discussions/settings.py
Add to INSTALLED_APPS:
INSTALLED_APPS = [
# ...
'treebeard',
# ...
]
Testing Requirements
1. Model Tests
File: channels/models_test.py
Create tests for:
- New field validations
- reddit_id uniqueness constraints
- CommentTreeNode creation and traversal
- Index usage verification
def test_post_reddit_id_unique():
"""Test that reddit_id must be unique"""
post1 = PostFactory(reddit_id='abc123')
with pytest.raises(IntegrityError):
PostFactory(reddit_id='abc123')
def test_comment_tree_node_creation():
"""Test creating a comment tree"""
post = PostFactory()
root = CommentTreeNode.add_root(post=post, score=0)
comment1 = CommentFactory(post=post, score=10)
node1 = root.add_child(comment=comment1, score=10, created=comment1.created)
assert node1.get_parent() == root
assert node1.comment == comment1
def test_comment_tree_sorting():
"""Test that comments are sorted by score"""
post = PostFactory()
root = CommentTreeNode.add_root(post=post, score=0)
# Add comments with different scores
c1 = CommentFactory(post=post, score=5)
c2 = CommentFactory(post=post, score=10)
c3 = CommentFactory(post=post, score=3)
root.add_child(comment=c1, score=5, created=c1.created)
root.add_child(comment=c2, score=10, created=c2.created)
root.add_child(comment=c3, score=3, created=c3.created)
children = root.get_children()
# Should be sorted: c2 (10), c1 (5), c3 (3)
assert children[0].comment == c2
assert children[1].comment == c1
assert children[2].comment == c3
2. Migration Tests
File: channels/migrations_test.py
Create tests to verify:
- Migrations apply cleanly
- Indexes are created
- Old models are removed
def test_migrations_apply():
"""Test that all migrations apply successfully"""
call_command('migrate', 'channels', verbosity=0)
def test_comment_tree_model_exists():
"""Test CommentTreeNode model is created"""
from channels.models import CommentTreeNode
assert CommentTreeNode is not None
3. Performance Tests
Create benchmark tests:
- Tree traversal performance
- Comment listing queries
- Index usage verification
Deliverables
Code Changes
- Updated
channels/models.pywith new schema - Created
channels/comment_tree.py(if separating tree logic) - Created all migration files
- Updated
pyproject.tomlwith django-treebeard - Updated
open_discussions/settings.py
Tests
- Model tests for new fields
- CommentTreeNode tests
- Migration tests
- Performance benchmarks
Documentation
- Updated model docstrings
- Database schema diagram
- Migration guide
- Index optimization notes
Success Criteria
- All migrations run successfully on clean database
- All tests pass
- CommentTreeNode can create and traverse trees
- Indexes are created and used by queries
- No Reddit-specific fields remain (except reference fields)
- Models support read-only archive requirements
Notes for Implementation
- Keep existing fields initially: Don’t delete
post_idandcomment_idfields until after data migration (Phase 2) - reddit_id as varchar: Use CharField instead of Base36IntegerField for simplicity
- Tree performance: django-treebeard is battle-tested; use their recommended approach
- Backward compatibility: Keep migrations reversible where possible
- Testing: Test migrations on copy of production data if possible
Related Issues
- Blocks Phase 2 (Data Migration)
- Related to cleanup of Base36IntegerField
- Related to removal of PRAW dependencies (later phases)