@a-b
Forked from wvengen/README.md
Created November 26, 2015 08:46

Revisions

  1. @wvengen wvengen revised this gist Nov 25, 2015. 1 changed file with 5 additions and 0 deletions.
    5 changes: 5 additions & 0 deletions README.md
    Original file line number Diff line number Diff line change
    @@ -119,3 +119,8 @@ gem the leaking objects may be created. If it's a string, run

    If it's something else, edit _graph.rb_ and expand the `case`-block. In this way you may be able to zoom in on the cause.


    Sample
    ------

    ![graph-type-count](https://cloud.githubusercontent.com/assets/503804/11392637/56f47762-935b-11e5-8122-a7bfd16cbec8.png)
  2. @wvengen wvengen renamed this gist Nov 25, 2015. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  3. @wvengen wvengen revised this gist Nov 25, 2015. No changes.
  4. @wvengen wvengen revised this gist Nov 25, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -30,7 +30,7 @@ def heap_dump

    # On Heroku you'll need to push it elsewhere, like S3
    #s3 = AWS::S3.new(access_key_id: ENV['S3_ACCESS_KEY'], secret_access_key: ENV['S3_SECRET_KEY'])
    -#bucket = s3.buckets[ENV['S3_MEM_BUCKET']
    +#bucket = s3.buckets['qm-import-export']
    #obj = bucket.objects["ruby-heap-#{i}.jsonl"]
    #obj.write(IO.binread(path))
    end
  5. @wvengen wvengen revised this gist Nov 25, 2015. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -62,7 +62,7 @@ end
    # add to config/routes.rb
    get "/heap_dump", to: HeapDumpsController.action(:heap_dump)

    -# config/heap_dump_tracing.rb
    +# config/initializers/heap_dump_tracing.rb
    if ENV['HEAP_DUMP'] == '1' # ENV values are always strings
    require 'objspace'
    ObjectSpace.trace_object_allocations_start
  6. @wvengen wvengen created this gist Nov 25, 2015.
    6 changes: 6 additions & 0 deletions Gemfile
    @@ -0,0 +1,6 @@
    source 'https://rubygems.org'

    gem 'pg', '~> 0.18.4'
    gem 'activerecord', '~> 4.2.5'
    gem 'ruby-progressbar', '~> 1.7.5'
    gem 'gnuplot', '~> 2.6.2'
    121 changes: 121 additions & 0 deletions README.md
    @@ -0,0 +1,121 @@
    Finding a Ruby memory leak using a time analysis
    ================================================

    When developing a program in [Ruby](http://ruby-lang.org), you may sometimes encounter a memory leak.
    For a while now, Ruby has had a facility to gather information about which objects are lying around:
    [ObjectSpace](http://ruby-doc.org/core/ObjectSpace.html).

    There are several ways to debug a leak. This guide takes a time-based approach: a full memory dump
    is generated every, say, 5 minutes while the memory leak is showing up. Afterwards, one can go
    through all the objects and find out which ones stay around and cause the leak.
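
    One common way to decide which objects are "staying around" is to compare three consecutive dumps:
    anything that is absent from the first dump, appears in the second and is still present in the third
    was allocated during the observation window and never collected. The sketch below illustrates the idea
    on tiny in-memory dumps. The helper names and the sample records are made up, and since Ruby may reuse
    an address after an object is freed, this is a heuristic rather than a proof.

    ```ruby
    require 'json'
    require 'set'

    # Parse a heap dump (one JSON document per line) into a Hash keyed by address.
    # Records without an address (e.g. ROOT entries) are skipped.
    def objects_by_address(jsonl)
      jsonl.each_line.with_object({}) do |line, objects|
        record = JSON.parse(line)
        objects[record['address']] = record if record['address']
      end
    end

    # Objects that are absent from the first dump, appear in the second,
    # and are still alive in the third.
    def retained_objects(first, second, third)
      before = objects_by_address(first).keys.to_set
      after  = objects_by_address(third).keys.to_set
      objects_by_address(second).values.select do |record|
        !before.include?(record['address']) && after.include?(record['address'])
      end
    end

    # Hypothetical, heavily shortened dump contents.
    dump1 = %({"address":"0x01","type":"STRING","memsize":40}\n)
    dump2 = dump1 + %({"address":"0x02","type":"STRING","memsize":90}\n) +
                    %({"address":"0x03","type":"ARRAY","memsize":40}\n)
    dump3 = dump1 + %({"address":"0x02","type":"STRING","memsize":90}\n)

    leaked = retained_objects(dump1, dump2, dump3)
    puts leaked.map {|r| "#{r['type']} at #{r['address']}" }
    ```

    Real dumps are far too large to diff comfortably this way, which is why the rest of this guide loads them into a database first.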


    Gather
    ------

    Set up your Ruby application to dump all objects to a file. If you have an event loop, something like this would work:

    ```ruby
    require 'objspace'

    def heap_dump
    GC.start

    i = Time.now.strftime('%s')
    path = "/tmp/ruby-heap-#{i}.jsonl"

    # one JSON document per object, one object per line
    File.open(path, "w") do |io|
    ObjectSpace.dump_all(output: io)
    end

    # On Heroku you'll need to push it elsewhere, like S3
    #s3 = AWS::S3.new(access_key_id: ENV['S3_ACCESS_KEY'], secret_access_key: ENV['S3_SECRET_KEY'])
    #bucket = s3.buckets[ENV['S3_MEM_BUCKET']]
    #obj = bucket.objects["ruby-heap-#{i}.jsonl"]
    #obj.write(IO.binread(path))
    end

    ObjectSpace.trace_object_allocations_start
    mainloop do
    # assuming your mainloop does the work, and calls this block every 5 minutes
    heap_dump
    end
    ```
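
    To get a feeling for what ends up in such a dump, it can help to look at the record of a single object.
    The following sketch uses `ObjectSpace.dump`, the single-object counterpart of `dump_all`. The variable
    names are arbitrary, and the exact set of keys differs per object type and Ruby version.

    ```ruby
    require 'objspace'
    require 'json'

    # Tracing has to be enabled before the object is allocated,
    # otherwise the record carries no allocation site (file/line).
    ObjectSpace.trace_object_allocations_start
    sample = 'leaky' * 10
    ObjectSpace.trace_object_allocations_stop

    # ObjectSpace.dump returns the JSON for one object as a String by default.
    record = JSON.parse(ObjectSpace.dump(sample))
    puts record.keys.sort.join(', ')
    puts "#{record['type']} of #{record['bytesize']} bytes"
    ```

    Every line of a full dump is one such JSON document, which is what makes the later conversion to CSV straightforward.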

    Or, if you have a Rails app, do this in a controller action that you visit every 5 minutes:

    ```ruby
    # app/controllers/heap_dumps_controller.rb
    class HeapDumpsController < ActionController::Metal

    def heap_dump
    if ENV['HEAP_DUMP'] == '1' && params[:token].to_s == ENV['HEAP_DUMP_TOKEN']
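    # A bare `heap_dump` call here would invoke this action again (infinite recursion),
    # so the dump helper from the first example is assumed to be available as `dump_heap`.
    dump_heap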
    self.response_body = 'Dumped heap'
    else
    self.status = 401
    self.response_body = 'Invalid token'
    end
    end
    end

    # add to config/routes.rb
    get "/heap_dump", to: HeapDumpsController.action(:heap_dump)

    # config/heap_dump_tracing.rb
    if ENV['HEAP_DUMP'] == '1' # ENV values are always strings
    require 'objspace'
    ObjectSpace.trace_object_allocations_start
    end
    ```


    Install
    -------

    - With [Ruby](http://ruby-lang.org/) installed, install the dependencies with `bundle install`.
    - With [PostgreSQL](http://postgresql.org/) installed, create the database with `createdb mem_analysis`.
    - When getting dumps from Amazon S3, [s3cmd](https://github.com/s3tools/s3cmd) may come in handy.


    Import
    ------

    If the dumps are stored on S3, first get the list of dumps. Update the bucket and the date in the grep command to match your case. This stores dates and filenames in _index.txt_.

    S3_URL=s3://qm-import-export/
    s3cmd ls $S3_URL | grep '^2015-11-23' | sed 's/[0-9]*\+\s\+s3:.*\///' >index.txt

    Then download them:

    for file in $(awk '{print $3}' index.txt); do s3cmd get "${S3_URL%/}/$file" "$file"; done
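
    For reference, the import scripts expect each line of _index.txt_ to hold a date, a time and a file name,
    separated by whitespace. Below is a small sketch of how such a line is split. The sample line and the
    helper name are invented for illustration.

    ```ruby
    # Split one index line into the dump's file name and its timestamp string.
    def parse_index_line(line)
      date, time, dumpname = line.split
      [dumpname, "#{date} #{time}"]
    end

    # Hypothetical line, shaped like `s3cmd ls` output after the sed filter above.
    file, taken_at = parse_index_line("2015-11-23 08:46   ruby-heap-1448264760.jsonl\n")
    puts "#{file} was dumped at #{taken_at}"
    ```

    If the dumps were written to local disk instead of S3, a file of the same shape can be put together by hand.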

    Initialize the database:

    bundle exec ruby createdb.rb

    Because importing can take quite a while, this is split into two steps: converting each file to CSV, and loading all CSV files into the database:

    bundle exec ruby gencsv.rb
    sh genimport.sh | psql mem_analysis


    Analyse
    -------

    Now that the database is loaded, we're ready to gather information.
    To find out what is causing a memory leak, we can look at graphs plotting memory usage over time in different dimensions.
    This is done by `graph.rb`. Let's start with the object type.

    bundle exec ruby graph.rb type-mem

    This will create the file _graph-type-mem.png_ showing the total size of objects by type. If there's one thing leaking,
    you'll probably have a number of somewhat flat lines, and one with a positive slope, which is the culprit.
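
    Spotting the line with a positive slope can also be done numerically. As a rough sketch, the following
    fits a least-squares line through each group's points and ranks the groups by slope. It works on rows
    shaped like `[group, time, value]`, the same shape _graph.rb_ plucks from the database. The function
    names and the sample numbers are made up.

    ```ruby
    # Least-squares slope of y over x; 0.0 when there are too few points to fit.
    # Uses Array#sum, so this needs Ruby 2.4 or later.
    def slope(points)
      n = points.size.to_f
      return 0.0 if n < 2
      mean_x = points.sum {|x, _| x } / n
      mean_y = points.sum {|_, y| y } / n
      covariance = points.sum {|x, y| (x - mean_x) * (y - mean_y) }
      variance   = points.sum {|x, _| (x - mean_x)**2 }
      variance.zero? ? 0.0 : covariance / variance
    end

    # rows: [[group, seconds_since_epoch, value], ...]
    # Returns [[group, growth_per_second], ...], steepest first.
    def rank_by_growth(rows)
      rows.group_by(&:first)
          .map {|group, rs| [group, slope(rs.map {|_, t, v| [t.to_f, v.to_f] })] }
          .sort_by {|_, s| -s }
    end

    # Made-up measurements: one dump every 300 seconds.
    rows = [
      ['STRING', 0, 10_000_000], ['STRING', 300, 13_000_000], ['STRING', 600, 16_000_000],
      ['ARRAY',  0,  4_000_000], ['ARRAY',  300,  4_100_000], ['ARRAY',  600,  4_000_000],
    ]
    rank_by_growth(rows).each {|group, s| puts format('%-8s %+.1f bytes/s', group, s) }
    ```

    A group that keeps growing across all dumps ends up at the top of the list, while groups that merely fluctuate stay close to zero.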

    Then create a similar graph for that object type only, plotting one line per file, for example. This gives an idea of
    which gem creates the leaking objects. If it's a string, run

    bundle exec ruby graph.rb string-mem

    If it's something else, edit _graph.rb_ and expand the `case`-block. In this way you may be able to zoom in on the cause.
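
    As an illustration of expanding the `case`-block, the sketch below shows what a branch for hashes might
    look like. It is wrapped in a stand-alone method so that it can be read on its own; in _graph.rb_ the
    `when` clause would go straight into the existing `case`. The method name is hypothetical, and `HASH`
    is only assumed to be the object type of interest.

    ```ruby
    # Returns the settings graph.rb's case block assigns for a given graph type,
    # or nil for an unknown type. Only one existing branch and the new one are shown.
    def graph_settings(type)
      case type
      when 'string-mem'   # existing branch, for comparison
        { query: {type: 'STRING'}, ycolumn: 'SUM(memsize)', group: :file,
          ylabel: 'memsize [MB]', yscale: 1024*1024 }
      when 'hash-mem'     # new branch: memory held by hashes, per allocating file
        { query: {type: 'HASH'}, ycolumn: 'SUM(memsize)', group: :file,
          ylabel: 'memsize [MB]', yscale: 1024*1024 }
      end
    end

    p graph_settings('hash-mem')
    ```

    With such a branch in place, `bundle exec ruby graph.rb hash-mem` would plot hash memory per file in the same way as the string graph.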

    4 changes: 4 additions & 0 deletions createdb.rb
    @@ -0,0 +1,4 @@
    #!/usr/bin/env ruby
    require_relative 'db'

    init_database
    82 changes: 82 additions & 0 deletions db.rb
    @@ -0,0 +1,82 @@
    #!/usr/bin/env ruby
    require 'active_record'

    ActiveRecord::Base.establish_connection({adapter: 'postgresql', database: 'mem_analysis'})

    def connection
    ActiveRecord::Base.connection
    end

    class SpaceObject < ActiveRecord::Base
    self.inheritance_column = 'zoink' # use type as ordinary column (not STI)
    has_many :references, class_name: 'SpaceObjectReference', foreign_key: 'from_id', inverse_of: 'from', dependent: :destroy
    # the default_address column points at the address of another object
    belongs_to :default, class_name: 'SpaceObject', foreign_key: 'default_address', primary_key: 'address'
    end

    class SpaceObjectReference < ActiveRecord::Base
    belongs_to :from, class_name: 'SpaceObject', required: true, inverse_of: 'references'
    belongs_to :to, class_name: 'SpaceObject', foreign_key: 'to_address', primary_key: 'address'
    end


    def init_database(c = connection)
    c.tables.each {|t| c.drop_table(t) }
    c.create_table 'space_objects' do |t|
    t.datetime :time
    t.string :type
    t.string :node_type
    t.string :root
    t.string :address
    t.text :value
    t.string :klass
    t.string :name
    t.string :struct
    t.string :file
    t.string :line
    t.string :method
    t.integer :generation
    t.integer :size
    t.integer :length
    t.integer :memsize
    t.integer :bytesize
    t.integer :capacity
    t.integer :ivars
    t.integer :fd
    t.string :encoding
    t.string :default_address
    t.boolean :freezed
    t.boolean :fstring
    t.boolean :embedded
    t.boolean :shared
    t.boolean :flag_wb_protected
    t.boolean :flag_old
    t.boolean :flag_long_lived
    t.boolean :flag_marking
    t.boolean :flag_marked
    end
    c.create_table 'space_object_references' do |t|
    t.integer :from_id, null: false
    t.string :to_address, null: false
    end
    restore_indexes
    nil
    end

    def remove_indexes(c = connection)
    c.indexes('space_objects').each {|i| c.remove_index('space_objects', name: i.name) }
    c.indexes('space_object_references').each {|i| c.remove_index('space_object_references', name: i.name) }
    end

    def restore_indexes(c = connection)
    c.change_table 'space_objects' do |t|
    t.index :time
    t.index :address
    t.index :type
    t.index [:klass, :method]
    t.index [:file, :line]
    t.index :size
    t.index :memsize
    end
    c.execute('VACUUM ANALYZE')
    end

    59 changes: 59 additions & 0 deletions gencsv.rb
    @@ -0,0 +1,59 @@
    #!/usr/bin/env ruby
    require 'ruby-progressbar'
    require 'json'
    require 'csv'

    def parse_dump(filename, &block)
    lines = File.readlines(filename) # reads and closes the file
    count = lines.count
    lines.each do |line|
    block.call JSON.parse(line), count
    end
    end

    def parse_index(filename, &block)
    File.foreach(filename) do |line|
    date, time, dumpname = line.split(/\s+/)
    block.call dumpname, "#{date} #{time}"
    end
    end

    # id is exported explicitly so that from_id in the references CSV always matches
    FIELDS = %w(id time type node_type root address value klass name struct file line method generation size length memsize bytesize capacity ivars fd encoding default_address freezed fstring embedded shared flag_wb_protected flag_old flag_long_lived flag_marking flag_marked)
    REF_FIELDS = %w(id from_id to_address)

    id = 1
    ref_id = 1
    parse_index('index.txt') do |file, time|

    next if ARGV.any? && !ARGV.include?(file)

    progressbar = ProgressBar.create(title: file, format: "%t |%B| %c/%C %E", throttle_rate: 0.5)
    basename = file.sub(/\.jsonl\z/i, '') # strip the extension only, with a literal dot
    CSV.open(basename + '.csv', 'w') do |csv|
    csv << FIELDS
    CSV.open(basename + '.refs.csv', 'w') do |ref_csv|
    ref_csv << REF_FIELDS
    parse_dump(file) do |data, count|
    progressbar.total = count

    data['value'] = data['value'].gsub(/[^[:print:]]/, '.') if data['value'] # allow string database column
    data['klass'] = data.delete('class') if data['class'] # avoid error
    data['freezed'] = data.delete('frozen') if data['frozen'] # idem
    data['default_address'] = data.delete('default') if data['default'] # consistency
    data['time'] = time
    data['id'] = id
    (data.delete('flags') || {}).each {|k, v| data["flag_#{k}"] = v }
    refs = data.delete('references') || []

    csv << FIELDS.map {|f| data[f]}
    refs.each do |ref|
    ref_csv << [ref_id, id, ref]
    ref_id += 1
    end

    id += 1
    progressbar.increment
    end
    end
    end
    end

    8 changes: 8 additions & 0 deletions genimport.sh
    @@ -0,0 +1,8 @@
    #!/bin/sh
    for file in *.csv; do
    table=space_objects
    echo "$file" | grep -q '\.refs\.csv$' && table=space_object_references

    # printf avoids the shell-dependent backslash handling of echo
    printf "\\\\COPY %s (%s) FROM '%s' WITH (FORMAT CSV, HEADER);\n" "$table" "$(head -n1 "$file")" "$file"
    done
    echo "VACUUM ANALYZE;"
    91 changes: 91 additions & 0 deletions graph.rb
    @@ -0,0 +1,91 @@
    #!/usr/bin/env ruby
    require 'date'
    require 'yaml'
    require 'gnuplot'
    require_relative 'db'


    ### Parse arguments

    type = ARGV[0]
    type == 'type' and type = 'type-mem'

    case type
    when 'type-count'
    ylabel = 'count'
    query, ycolumn, group = nil, 'COUNT(id)', :type
    key_pos = 'left top'
    when 'type-mem'
    query, ycolumn, group = nil, 'SUM(memsize)', :type
    ylabel, yscale = 'memsize [MB]', 1024*1024
    key_pos = 'left top'
    when 'string-count'
    ylabel = 'count'
    query, ycolumn, group = {type: 'STRING'}, 'COUNT(id)', :file
    when 'string-mem'
    query, ycolumn, group = {type: 'STRING'}, 'SUM(memsize)', :file
    ylabel, yscale = 'memsize [MB]', 1024*1024
    when 'data-count'
    ylabel = 'count'
    query, ycolumn, group = {type: 'DATA'}, 'COUNT(id)', :file
    when 'data-mem'
    query, ycolumn, group = {type: 'DATA'}, 'SUM(memsize)', :file
    ylabel, yscale = 'memsize [MB]', 1024*1024
    else
    STDERR.puts "Usage: graph.rb <type-count|type-mem|string-count|string-mem|data-count|data-mem>"
    exit 1
    end

    xoffset = 60*60 # GMT+1
    graph_basename = File.dirname(File.expand_path(__FILE__)) + '/graph-' + type


    ### Read cache or execute query

    if File.exist?(graph_basename + '.yml')
    data = YAML.load(File.read(graph_basename + '.yml'))
    else
    scope = SpaceObject
    scope = scope.where(**query) if query
    scope = scope.order(ycolumn + ' DESC NULLS LAST')
    scope = scope.group(:time, group)
    data = scope.limit(500).pluck(group, :time, ycolumn)
    File.open(graph_basename + '.yml', 'w') do |f|
    f.write(data.to_yaml)
    end
    end


    ### Then plot

    Gnuplot.open(persist: true) do |gp|
    Gnuplot::Plot.new(gp) do |plot|
    plot.terminal 'png large'
    plot.output graph_basename + '.png'

    plot.xdata :time
    plot.timefmt '"%s"'
    plot.format 'x "%H:%M"'

    plot.xlabel "time"
    plot.ylabel ylabel
    plot.key key_pos if key_pos

    grouped_data = data.group_by(&:first)
    keys = grouped_data.keys.sort_by {|key| -grouped_data[key].reduce(0) {|sum,d| sum + (d[2]||0) } }
    keys[0,10].each do |key|
    series = grouped_data[key].sort_by {|d| d[1] }
    x = series.map{|d| d[1].to_i + (xoffset||0) }
    # to_f avoids integer division truncating the scaled values
    y = series.map{|d| (d[2]||0).to_f / (yscale||1) }
    plot.data << Gnuplot::DataSet.new( [x, y] ) do |ds|
    ds.using = '1:2'
    ds.with = "linespoints"
    ds.title = key || '(empty)'
    end
    end

    end
    end