Building a Large-Scale Custom CMS

with Django...

About Lincoln Loop

  • Established in 2007
  • Open Source Contributor

Clients: National Geographic, PBS, Redbeacon, Nasuni, ...

Agenda

Talk Structure

  1. Planning and Methodology
  2. Dealing With Legacy Data Stores
  3. Building it
  4. Bad News: Django isn't perfect but...

Planning and Methodology

Digesting, Evaluating, and Structuring the Work Ahead

Migrating vs. Building Infrastructure

These 2 tasks will compete for your attention.

Migrating existing content:

  • Helps you understand the goal of your customer
  • Gives you "real data" to evaluate your development
  • Is your customer's most precious asset
  • Is hard and time-consuming

Building the new CMS:

  • You are evaluated and paid for this
    • The customer selects you based on your skills here
  • Fun and creative part of the job
  • Opportunity to improve content workflows
    • Be critical of requirements
    • Be careful of individual pet projects

Implementation Process

Prototype-driven development based on two-week sprint cycles.

Typical project phases:

  • Create a prototype as fast as possible
  • Populate it with "real data" as soon as you can
  • Load test your prototype to find the bottlenecks

...and iterate until you are done

Methodology

Iterate on this virtuous cycle:

  • Develop
  • QA
  • Live Demo

At the conclusion of each phase you should have new working features and code you're proud of.

Keeping Yourself Honest

Developing in iterations helps keep project metrics in check (you know, the things that always get abandoned):

  • Code quality
    • PyLint
    • PyMetrics
    • Coverage

  • Documentation
    • Sphinx Docs
    • Any new developer should be able to bootstrap the project using only the docs.
  • Repeatable Deployments
    • Use pip + Fabric or Buildout
    • Setting up a new environment should be as automated as possible.

  • Tests and Continuous Integration
    • Maintain a good baseline of test coverage
    • Regressions are silly time sucks
    • Use a CI package like Hudson - know when things have gone astray

Team Communication

Keep stakeholders involved and aware as much as possible. Enforce regular meetings. The PM should make sure tickets and questions get answered.

Tools:

  • IRC room per project
  • Sprint Demo
  • Integration server (WIP)
  • Backlog monitoring (Redmine & co)
  • Maintain good project docs
    • Wikis
    • Sphinx (platform/technical docs)

Key success factors

First, build a rough estimate:

  • Experience helps
  • Even so, things always take longer.
    • Migrating old, undocumented business logic is nearly impossible to estimate.

Continuously refine your estimate. Demo finished features on a fixed schedule:

  • Keep your build working
  • Keep stakeholders in the loop
  • Define an escalation process

Pin down early on what the most complex aspects will be. Make sure you are chipping away at them rather than pushing everything back until the end.

Dealing With Legacy Data Stores

Migrating Legacy Data

Two opposing factors to consider:

  1. The "real" migration will only ever happen one time.
  2. The data is the business - it has to be handled properly.

Plan ahead:

  • How are you going to keep the migration plan in sync with changes in your new application?
  • Don't underestimate the work required to have a workable process
  • The modify-migrate-test cycle is SLOW if you take a naive approach

Things to Consider

A convenient way to run your migration is very important because you will need to run it often, no matter how much time you spend optimizing it.

  • Determine the maximum allowable downtime / read-only window
    • This is your target. After every change to the migration script, its run time must stay below this number.
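
One way to check a change against the window without a full run is to time a sample and extrapolate. A minimal sketch, assuming run time grows roughly linearly with row count (an assumption that real migrations may violate); the function names and numbers are hypothetical:

```python
def projected_runtime_seconds(sample_rows, sample_seconds, total_rows):
    """Linearly extrapolate a timed sample run to the full dataset."""
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    return sample_seconds * (total_rows / sample_rows)


def fits_window(sample_rows, sample_seconds, total_rows, window_seconds):
    """Return True if the projected full run stays inside the downtime window."""
    projected = projected_runtime_seconds(sample_rows, sample_seconds, total_rows)
    return projected <= window_seconds


# Hypothetical numbers: 10,000 sample rows took 12 s; 5,000,000 rows in total.
print(projected_runtime_seconds(10_000, 12.0, 5_000_000))  # 6000.0 (100 minutes)
print(fits_window(10_000, 12.0, 5_000_000, window_seconds=2 * 3600))  # True
```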

Optimize for ease of running rather than speed...

  • Make your script flexible in terms of resource use
  • Put it in the cloud
  • ...Or another computer

Your new data model will evolve, and you will need to rerun the migration often.

Migration Pitfalls

You'll probably encounter many of these problems:

  • Other frameworks and databases use different conventions, or even different data types, for foreign keys than standard Django
  • Mapping to Django User is problematic
  • Any large dataset is inconsistent
  • Fix the integrity at the source when possible
  • A naive migration strategy will leave you with a process that takes many hours or even days.
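
Inconsistencies are easier to fix at the source once you can list them. A minimal sketch of an orphaned foreign key check, using an in-memory SQLite database so it runs standalone; the table and column names are hypothetical stand-ins for a legacy schema:

```python
import sqlite3

# Hypothetical legacy tables, built in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE legacy_author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE legacy_article (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO legacy_author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO legacy_article VALUES
        (10, 'First', 1),
        (11, 'Second', 2),
        (12, 'Orphan', 99),
        (13, 'Anonymous', NULL);
    """
)


def find_orphans(conn):
    """Return ids of articles whose non-NULL author_id matches no author row."""
    rows = conn.execute(
        """
        SELECT a.id
        FROM legacy_article AS a
        LEFT JOIN legacy_author AS au ON au.id = a.author_id
        WHERE a.author_id IS NOT NULL AND au.id IS NULL
        ORDER BY a.id
        """
    ).fetchall()
    return [row[0] for row in rows]


print(find_orphans(conn))  # [12]
```

Running a report like this before each migration attempt shows whether the source cleanup is keeping up.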

Keep in Mind That...

  • Your new models will go through many evolutions as you continually adapt various pieces of complex logic (and hacks) into a more elegant schema.
  • Legacy content fields might have markup and structure that is not trivial to deal with:
    • HTML tags/snippets
    • HTML entities
    • Character encodings
    • Various page layouts
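
A minimal sketch of a cleanup pass for such fields, using only the standard library. The candidate encodings and the decision to strip tags are assumptions; the regex is a naive tag stripper, and a real project may need to keep or transform markup with a proper HTML parser instead:

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # naive: matches anything that looks like a tag


def decode_legacy_bytes(raw, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order; fall back to replacement characters."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


def clean_legacy_text(raw):
    """Decode bytes, strip tags, resolve HTML entities, and collapse whitespace."""
    text = decode_legacy_bytes(raw)
    text = TAG_RE.sub(" ", text)   # strip tags before unescaping entities
    text = html.unescape(text)     # "&amp;" becomes "&"
    return " ".join(text.split())


# The byte 0xE9 is not valid UTF-8 here, so the cp1252 fallback is used.
print(clean_legacy_text(b"<p>Caf\xe9 &amp; bar</p>"))  # Café & bar
```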

Migration Tips

Dataset

  • Visual inspection of the dataset
  • Gather as much legacy app logic and SQL as you can up front. Working backwards through relationships is tedious.
  • Check your assumptions

Django-specific

  • Use inspectdb to navigate the original dataset
  • Use multi-db (Django's multiple database support) to move the data from the old schema
  • Don't burn time fighting the ORM. Use the cursor with raw SQL or reach for other Python tools.
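
The three Django tips above fit together roughly as follows. This is a settings fragment with usage notes in comments, not a full project; the alias "legacy", the database names, the engines, and the LegacyArticle model are assumptions for illustration:

```python
# settings.py (fragment). Alias names, database names, and engines are
# hypothetical; adjust them to the real old and new data stores.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "new_cms",
    },
    "legacy": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "old_cms",
    },
}

# Generate starter models from the old schema (run from a shell):
#   python manage.py inspectdb --database=legacy > legacy/models.py
#
# Read through the ORM from the legacy alias (LegacyArticle is hypothetical):
#   LegacyArticle.objects.using("legacy").filter(published=True)
#
# Drop to raw SQL when the ORM gets in the way:
#   from django.db import connections
#   with connections["legacy"].cursor() as cursor:
#       cursor.execute("SELECT id, title FROM articles WHERE id > %s", [last_id])
#       rows = cursor.fetchall()
```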

Features for your migration tool set

Sanity must haves for your tool(s):

  • Pause / Clean break
  • Resume
  • Progress meter
  • Logging
  • Profiling
  • Partial / Range
  • Graceful Error Handling
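
A minimal sketch of a runner covering several of these features (clean break, resume, progress, logging, partial range, graceful error handling). Profiling is left out. The checkpoint file format and the migrate_one callable are hypothetical, and records are assumed to be processed in increasing id order:

```python
import json
import logging
import os

log = logging.getLogger("migration")


def load_checkpoint(path):
    """Return the last processed id, or None when starting fresh."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh)["last_id"]


def save_checkpoint(path, last_id):
    with open(path, "w") as fh:
        json.dump({"last_id": last_id}, fh)


def run_migration(ids, migrate_one, checkpoint_path, start=None, stop=None):
    """Migrate records in id order, resuming after the stored checkpoint.

    ids: sorted iterable of legacy primary keys
    migrate_one: callable doing the real work for one id (hypothetical)
    start/stop: optional inclusive id range for partial runs
    Returns (migrated_count, failed_ids).
    """
    last_done = load_checkpoint(checkpoint_path)
    todo = [
        i for i in ids
        if (start is None or i >= start)
        and (stop is None or i <= stop)
        and (last_done is None or i > last_done)
    ]
    migrated, failed = 0, []
    try:
        for n, record_id in enumerate(todo, 1):
            try:
                migrate_one(record_id)
                migrated += 1
            except Exception:
                # Log and continue: one bad row should not kill a long run.
                log.exception("record %s failed", record_id)
                failed.append(record_id)
            # Written after every record: simple, but slow for huge datasets.
            save_checkpoint(checkpoint_path, record_id)
            if n % 1000 == 0 or n == len(todo):
                log.info("progress: %d/%d", n, len(todo))
    except KeyboardInterrupt:
        # Clean break: the checkpoint already holds the last finished id.
        log.warning("interrupted; rerun with the same checkpoint to resume")
    return migrated, failed
```

Because the checkpoint stores a single id, it is safest to use one checkpoint file per range when running partial migrations out of order.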

Building It

How to Decide What to Use and What to Create

Pitfalls -- Django in Wonderland

It is easy to create a system that is completely "unDjango-ish" due to external constraints and influence from the legacy system.

  • There is often a temptation for developers new to Django to structure code or use idioms from other frameworks/languages.
  • Trying to "clone" a large legacy system might lead you away from commonly accepted Django best practices. Possible consequences:
    • Reusable apps won't plug in well
    • Updates to Django trunk might not work
    • New/external developers might be lost / need training
    • Integrating existing Django projects is HARD
  • Study up on Best Practices (Google It!)

Pitfalls -- Don't Carbon Copy

Avoid trying to exactly recreate the legacy workflow/UI in your new system.

UX has evolved. Look for ways to optimize the workflow now that we have better tools and paradigms.

Learn from existing open sourced projects - the patterns solve your problems even if the features aren't exactly what you want:

  • Data models
  • Algorithms

Customize or Build Your Own?

Our philosophy is to favor 3rd party components but not be afraid to fork early. Pick out the best parts and use those.

  • Use your network of trust to evaluate an app
  • Check the code quality metrics
    • Test coverage
    • Complexity
    • Documentation
    • Author / number of followers

Shoehorning out-of-the-box solutions into a large project might be more hassle than it's worth.

  • Customize
  • Extend

Engage the 3rd Party App Community

This community is composed of the most knowledgeable people in this domain.

A good connection will enable you to:

  • Fix bugs: get patches in trunk / master
  • Influence the evolution by suggesting:
    • New features
    • Enhancements to the existing code base
  • Understand existing design decisions
  • Get free, instant, expert advice

Effectively Contributing to a Project

Embrace the project:

  • Communication channels (IRC, Bug tracker, ...)
  • Infrastructure (DVCS, test suite, ...)
  • Understand the author's motivation
  • Align your expectations
    • Is this someone's brainstorm, or is it intended to be production-ready?

Get Your Contribution "IN" Upstream

Assume a minimum effort of:

  • A detailed bug report with clear steps to reproduce
  • Attaching a test illustrating the problem
  • Providing a patch that fixes the issue
  • Writing clear, concise documentation

Even still, a little more effort may be required:

  • Engage the core developers in a continuous exchange
    • This issue must be on their radar and they should be aware of your progress

Help them to help you...

Weighing Your Options

To fork or not to fork?

Pros

  • Trade a little learning time and effort for a shorter overall development time

Cons

  • Harder or impossible to merge upstream improvements:
    • Bug fixes
    • Enhancements
    • Migration to new versions
  • On your own - disconnected from the original community

Bad News

Django Isn't Perfect

Common Complaints

  • Settings for reusable apps
  • ORM
    • Doesn't support the feature(s) you need
    • That nifty view you wrote in a few lines of code generates hundreds of queries per page
  • Forms, ModelForms, and FormSets can be an adventure once you are past basic use cases
  • HTML4, HTML5, XHTML compliance with django.forms
  • auth.User
    • username unique
    • Replacing/Overriding means you lose many 3rd party apps
  • "Ready for Check-in" tickets go to purgatory.
  • Admin customization can get tricky.
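
The "hundreds of queries per page" complaint is usually the N+1 pattern: one query for a list, then one more per row for a related object. In Django the usual remedies are select_related() and prefetch_related(). The sketch below shows the underlying effect with plain SQLite so it runs standalone; the tables and function names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO article VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 1);
    """
)

statements = []
conn.set_trace_callback(statements.append)  # record each SQL statement executed


def titles_with_authors_naive(conn):
    """One query for the articles, then one more per article for its author."""
    result = []
    articles = conn.execute("SELECT title, author_id FROM article ORDER BY id").fetchall()
    for title, author_id in articles:
        row = conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
        result.append((title, row[0]))
    return result


def titles_with_authors_joined(conn):
    """A single JOIN fetches the same data in one round trip."""
    return conn.execute(
        "SELECT a.title, au.name FROM article a "
        "JOIN author au ON au.id = a.author_id ORDER BY a.id"
    ).fetchall()


statements.clear()
naive = titles_with_authors_naive(conn)
naive_count = len(statements)

statements.clear()
joined = titles_with_authors_joined(conn)
joined_count = len(statements)

# Same rows either way; the naive version issues more statements.
print(naive == joined, naive_count, joined_count)
```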

Seen Better Days

  • django.contrib.*
    • Admin (used to be impressive but is starting to show its age)
      • Django Admin isn't doing what you want
      • raw_id_fields is just ugly
      • No drag-and-drop for reordering objects
      • No built-in widget to display tree structure
      • No dashboard / toolbar / quick access

Conclusion

Django is particularly well suited for these types of projects. (It was born in a newsroom, after all.)

However, like any tool you use, you need to be aware of the tradeoffs in order to maximize your productivity and minimize mistakes. We hope that you have a better idea of how to make the best use of Django.

Questions?

Thank you for your attention.

Brian Luft - @unbracketed

Yann Malet - @gwadeloop