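In engineering, knowing that something works is not enough; you also need to know why it works. Once you understand how a tool works, you can use it far more effectively. In the world of programs, source code holds no secrets: understand the source and you understand the principle.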

This time, let's read the source code of ArrayList. The listings below appear to come from a JDK 8 release.

Class declaration

public class ArrayList<E>
    extends AbstractList<E>
    implements List<E>, RandomAccess, Cloneable, java.io.Serializable { ... }

The code above declares a generic class named ArrayList that extends AbstractList and implements the List, RandomAccess, Cloneable, and Serializable interfaces.

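The abstract class AbstractList provides a skeletal implementation of the List interface, which reduces the work needed to implement a List backed by a random-access data store such as an array.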

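RandomAccess declares no methods. It is a marker interface indicating that the class supports fast (generally constant-time, O(1)) random access. Before traversing a list, you can use instanceof to check whether it implements RandomAccess and choose a suitable traversal strategy accordingly.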
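
As an illustration of the instanceof check described above, here is a minimal sketch. TraversalDemo and its sum method are hypothetical names made up for this example, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

final class TraversalDemo {
    // Hypothetical helper: sums a list, choosing the loop style by RandomAccess.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Indexed access is cheap for lists such as ArrayList.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // Sequential lists such as LinkedList are better walked with an iterator.
            for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
                total += it.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new ArrayList<>(Arrays.asList(1, 2, 3))));  // 6
        System.out.println(sum(new LinkedList<>(Arrays.asList(4, 5, 6)))); // 15
    }
}
```

Both branches compute the same result; only the cost differs, because get(i) on a LinkedList has to walk the list each time.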

Cloneable is also a marker interface. It indicates that Object.clone() is permitted to make a field-for-field copy of instances of this class.
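
A small sketch of what cloning means in practice for ArrayList: ArrayList.clone() returns a shallow copy, so the two lists have independent structure but share the element objects. CloneDemo is a hypothetical name used only for this example.

```java
import java.util.ArrayList;

final class CloneDemo {
    // Returns "<original size> <copy size> <first element of original>".
    static String run() {
        ArrayList<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("a"));

        @SuppressWarnings("unchecked")
        ArrayList<StringBuilder> copy = (ArrayList<StringBuilder>) original.clone();

        copy.add(new StringBuilder("b")); // structural change: affects only the copy
        copy.get(0).append("!");          // element mutation: visible through both lists

        return original.size() + " " + copy.size() + " " + original.get(0);
    }

    public static void main(String[] args) {
        System.out.println(run()); // 1 2 a!
    }
}
```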

Serializable is a marker interface as well. It enables Java serialization for this class.

How the data is stored

/**
 * The array buffer into which the elements of the ArrayList are stored.
 * The capacity of the ArrayList is the length of this array buffer. Any
 * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
 * will be expanded to DEFAULT_CAPACITY when the first element is added.
 */
transient Object[] elementData; // non-private to simplify nested class access

/**
 * The size of the ArrayList (the number of elements it contains).
 *
 * @serial
 */
private int size;

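The elementData array is where the elements are actually stored. The capacity of the ArrayList is the length of this array, which is not the same thing as size. ArrayList implements its own serialization (ArrayList#writeObject()) and deserialization (ArrayList#readObject()) methods, so elementData is marked transient to keep it out of Java's default serialization and deserialization.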

The size field records the number of elements currently in the ArrayList.

Constructors

ArrayList has three constructors:

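ArrayList(), which uses the default capacity
ArrayList(int initialCapacity), which takes an explicit initial capacity
ArrayList(Collection<? extends E> c), which initializes the list from a given collection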

Using the default capacity

The class first defines the default capacity:

/**
 * Default initial capacity.
 */
private static final int DEFAULT_CAPACITY = 10;

Right below it, however, there is also this:

/**
 * Shared empty array instance used for default sized empty instances. We
 * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
 * first element is added.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

When first constructed, elementData points to DEFAULTCAPACITY_EMPTY_ELEMENTDATA instead of a freshly allocated array of capacity 10.

/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

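The benefit is a more economical use of memory. Suppose some code creates five ArrayLists for later use: allocating eagerly would consume space for at least 50 element slots up front, whether or not anything is ever stored. So the JDK first points elementData at a shared empty array and allocates a suitably sized array only when an element is added.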

Specifying the initial capacity

/**
 * Constructs an empty list with the specified initial capacity.
 *
 * @param  initialCapacity  the initial capacity of the list
 * @throws IllegalArgumentException if the specified initial capacity
 *         is negative
 */
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}

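When the requested capacity is positive, an array of that size is created and assigned to elementData. When it is zero, elementData is pointed at a shared empty array, EMPTY_ELEMENTDATA; note that this is a different empty array from the one mentioned above. When it is negative, an IllegalArgumentException is thrown.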

So why distinguish EMPTY_ELEMENTDATA from DEFAULTCAPACITY_EMPTY_ELEMENTDATA at all? The JavaDoc of DEFAULTCAPACITY_EMPTY_ELEMENTDATA says:

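We distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when first element is added.
In other words, which of the two empty arrays elementData points to records whether the list was created with the default constructor, and that decides how much space to allocate when the first element is added.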
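
The effect of the two sentinels can be sketched in isolation. The class below is a hypothetical stand-in (SentinelDemo is not a JDK class); its calculateCapacity copies the logic of the method with the same name in the ArrayList source quoted in this article, and the constants are re-declared locally for the example.

```java
final class SentinelDemo {
    static final int DEFAULT_CAPACITY = 10;
    // Two distinct empty arrays, told apart by identity (==), as in the JDK source.
    static final Object[] EMPTY_ELEMENTDATA = {};
    static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    // Same logic as ArrayList.calculateCapacity in the quoted source.
    static int calculateCapacity(Object[] elementData, int minCapacity) {
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            return Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        return minCapacity;
    }

    public static void main(String[] args) {
        // First add() asks for room for one element (minCapacity = 1).
        System.out.println(calculateCapacity(DEFAULTCAPACITY_EMPTY_ELEMENTDATA, 1)); // 10
        System.out.println(calculateCapacity(EMPTY_ELEMENTDATA, 1));                 // 1
    }
}
```

So a list created with new ArrayList() jumps straight to ten slots on the first add, while a list created with new ArrayList(0) only asks for the space it needs.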

Initializing from a given collection

/**
 * Constructs a list containing the elements of the specified
 * collection, in the order they are returned by the collection's
 * iterator.
 *
 * @param c the collection whose elements are to be placed into this list
 * @throws NullPointerException if the specified collection is null
 */
public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

The constructor first calls Collection#toArray() on the given collection to obtain an array of its elements, which is expected to be an Object[].

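If the array is non-empty, the constructor checks whether its runtime type is exactly Object[]; if it is not, the elements are copied into a new Object[] with Arrays.copyOf(). If the array is empty, elementData is pointed at EMPTY_ELEMENTDATA instead.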
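
Why the runtime type matters can be shown with plain arrays. In this sketch, ToArrayDemo and toObjectArray are hypothetical names; the Arrays.copyOf call is the same three-argument overload the constructor uses.

```java
import java.util.Arrays;

final class ToArrayDemo {
    // Copies any reference array into a fresh array whose runtime type is exactly Object[].
    static Object[] toObjectArray(Object[] source) {
        return Arrays.copyOf(source, source.length, Object[].class);
    }

    public static void main(String[] args) {
        Object[] strings = new String[] {"a", "b"}; // legal: Java arrays are covariant
        Object[] objects = toObjectArray(strings);

        System.out.println(strings.getClass() == Object[].class); // false
        System.out.println(objects.getClass() == Object[].class); // true

        // Storing a non-String is fine in the copy; the same store into
        // `strings` would throw ArrayStoreException at run time.
        objects[0] = Integer.valueOf(1);
    }
}
```

If elementData were allowed to be a String[] in disguise, a later add() of some other type could fail with ArrayStoreException, which is what the check guards against.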

Adding elements

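Adding an element first calls ensureCapacityInternal() to make sure there is enough room. Once capacity is guaranteed, the new element is stored at elementData[size] and size is incremented by one.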

/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return <tt>true</tt> (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

Growing the capacity

ensureCapacityInternal() makes sure there is enough space when elements are added. If there is not, it calls grow() to enlarge the array.

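grow() enlarges elementData to roughly 1.5 times its current capacity (or to minCapacity, if that is larger) and uses Arrays.copyOf() to copy the elements into the new array.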

/**
 * Ensures that there is enough capacity.
 */
private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

/**
 * Computes the target capacity for growth.
 */
private static int calculateCapacity(Object[] elementData, int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // overflow-conscious code
    // grow only if the target capacity exceeds the current capacity
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;

    // newCapacity = oldCapacity + (oldCapacity / 2), i.e. about 1.5x
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
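
To make the growth policy concrete, the following sketch replays the arithmetic for a default-constructed list that receives elements one at a time. It is a simplification under stated assumptions: GrowthDemo is a hypothetical class, the MAX_ARRAY_SIZE / hugeCapacity branch is left out, and Math.max replaces the overflow-conscious subtraction.

```java
import java.util.ArrayList;
import java.util.List;

final class GrowthDemo {
    // Simplified grow(): old + old/2, but never less than minCapacity.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    // Capacities reached while appending `count` elements, one at a time,
    // to a list created with the no-argument constructor.
    static List<Integer> capacities(int count) {
        List<Integer> seen = new ArrayList<>();
        int capacity = 0;
        for (int size = 0; size < count; size++) {
            int minCapacity = size + 1;
            if (capacity == 0) {
                minCapacity = Math.max(10, minCapacity); // DEFAULT_CAPACITY on first add
            }
            if (minCapacity > capacity) {
                capacity = nextCapacity(capacity, minCapacity);
                seen.add(capacity);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capacities(50)); // [10, 15, 22, 33, 49, 73]
    }
}
```

Because the shift rounds down, each step is at most 1.5 times the previous capacity: 15 grows to 22, not 22.5.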

Removing elements

ArrayList offers two ways to remove an element: by position (index) and by matching an element.

Removing by position

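Removal by position first checks that the given index is in range. If it is, the element to be removed is read first so that it can be returned to the caller.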

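The element is removed by copying everything from index+1 onward one slot to the left, so that it starts at index. Removal therefore costs time proportional to the number of trailing elements, which makes it comparatively expensive unless the element is near the end.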

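After the shift, the last occupied slot still holds a duplicate of the element that now also sits in the slot before it. That slot is therefore set to null, and size is decremented by one.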

/**
 * Removes the element at the specified position in this list.
 * Shifts any subsequent elements to the left (subtracts one from their
 * indices).
 *
 * @param index the index of the element to be removed
 * @return the element that was removed from the list
 * @throws IndexOutOfBoundsException {@inheritDoc}
 */
public E remove(int index) {
    rangeCheck(index);
    modCount++;
    E oldValue = elementData(index);

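    // number of elements that have to be shifted left
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(
            // source array
            elementData,
            // source position
            index+1,
            // destination array
            elementData,
            // destination position
            index,
            // number of elements to copy
            numMoved);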
    elementData[--size] = null; // clear to let GC do its work
    return oldValue;
}
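
The shift can be tried on a plain array. This is a minimal sketch with the same three steps as remove(int) above; ShiftDemo and removeAt are hypothetical names, and bounds checking is omitted.

```java
import java.util.Arrays;

final class ShiftDemo {
    // Removes the element at `index` from the first `size` slots of `data`, in place,
    // and returns the new size.
    static int removeAt(Object[] data, int size, int index) {
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            // Overlapping source and destination are handled correctly by arraycopy.
            System.arraycopy(data, index + 1, data, index, numMoved);
        }
        data[--size] = null; // clear the stale duplicate in the last used slot
        return size;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d", null}; // capacity 5, size 4
        int size = removeAt(data, 4, 1);
        System.out.println(Arrays.toString(data) + " size=" + size);
        // [a, c, d, null, null] size=3
    }
}
```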

Removing a matching element

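When remove() is given an object, ArrayList scans elementData and removes the first element that matches it. A match is decided with equals(), or with == null when the argument is null.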

/**
 * Removes the first occurrence of the specified element from this list,
 * if it is present.  If the list does not contain the element, it is
 * unchanged.  More formally, removes the element with the lowest index
 * <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>
 * (if such an element exists).  Returns <tt>true</tt> if this list
 * contained the specified element (or equivalently, if this list
 * changed as a result of the call).
 *
 * @param o element to be removed from this list, if present
 * @return <tt>true</tt> if this list contained the specified element
 */
public boolean remove(Object o) {
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

/*
 * Private remove method that skips bounds checking and does not
 * return the value removed.
 */
private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work
}
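
Having both remove(int) and remove(Object) is a well-known trap for a List<Integer>, because an int argument selects the index overload. The sketch below shows the difference; RemoveOverloadDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RemoveOverloadDemo {
    static List<Integer> removeByIndex() {
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 20, 30));
        list.remove(1);                   // remove(int index): drops the element at position 1
        return list;
    }

    static List<Integer> removeByValue() {
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 20, 30));
        list.remove(Integer.valueOf(10)); // remove(Object o): drops the first element equal to 10
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeByIndex()); // [10, 30]
        System.out.println(removeByValue()); // [20, 30]
    }
}
```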

Trimming the capacity

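ArrayList#trimToSize() shrinks the capacity of the ArrayList to its current number of elements. Unless the list is empty or already exactly full, this copies the elements into a new array with Arrays.copyOf(), so it is also a relatively expensive operation.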

/**
 * Trims the capacity of this <tt>ArrayList</tt> instance to be the
 * list's current size.  An application can use this operation to minimize
 * the storage of an <tt>ArrayList</tt> instance.
 */
public void trimToSize() {
    modCount++;
    if (size < elementData.length) {
        elementData = (size == 0)
          ? EMPTY_ELEMENTDATA
          : Arrays.copyOf(elementData, size);
    }
}

Fail fast

Methods that structurally modify the list, such as add() and remove(), contain a modCount++ statement somewhere along the way. What is it for?

Let's start with what the JavaDoc of the modCount field says.

/**
 * The number of times this list has been <i>structurally modified</i>.
 * Structural modifications are those that change the size of the
 * list, or otherwise perturb it in such a fashion that iterations in
 * progress may yield incorrect results.
 *
 * <p>This field is used by the iterator and list iterator implementation
 * returned by the {@code iterator} and {@code listIterator} methods.
 * If the value of this field changes unexpectedly, the iterator (or list
 * iterator) will throw a {@code ConcurrentModificationException} in
 * response to the {@code next}, {@code remove}, {@code previous},
 * {@code set} or {@code add} operations.  This provides
 * <i>fail-fast</i> behavior, rather than non-deterministic behavior in
 * the face of concurrent modification during iteration.
 *
 * <p><b>Use of this field by subclasses is optional.</b> If a subclass
 * wishes to provide fail-fast iterators (and list iterators), then it
 * merely has to increment this field in its {@code add(int, E)} and
 * {@code remove(int)} methods (and any other methods that it overrides
 * that result in structural modifications to the list).  A single call to
 * {@code add(int, E)} or {@code remove(int)} must add no more than
 * one to this field, or the iterators (and list iterators) will throw
 * bogus {@code ConcurrentModificationExceptions}.  If an implementation
 * does not wish to provide fail-fast iterators, this field may be
 * ignored.
 */
protected transient int modCount = 0;

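In other words, modCount records how many times the structure of a List has been modified, and the JavaDoc points out that modifying the structure during iteration may lead to incorrect results.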

During iteration and serialization, the code checks whether modCount has changed since the operation began; if it has, a ConcurrentModificationException is thrown. Take ArrayList.Itr#next(), for example:

@SuppressWarnings("unchecked")
public E next() {
    checkForComodification();
    int i = cursor;
    if (i >= size)
        throw new NoSuchElementException();
    Object[] elementData = ArrayList.this.elementData;
    if (i >= elementData.length)
        throw new ConcurrentModificationException();
    cursor = i + 1;
    return (E) elementData[lastRet = i];
}

final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}
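
The check is easy to trigger from a single thread. In this sketch (FailFastDemo is a hypothetical name), removing through the list during a for-each loop changes modCount behind the iterator's back, while Iterator.remove() keeps expectedModCount in sync. Fail-fast detection is best-effort in general, so this is a debugging aid rather than something to rely on for correctness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

final class FailFastDemo {
    // Removes through the list while a for-each loop is iterating over it.
    static boolean removeInForEachThrows() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s); // bumps modCount; the iterator does not know
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // thrown by the next call to Itr#next()
        }
    }

    // Removes through the iterator, which updates expectedModCount itself.
    static List<String> removeViaIterator() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            if (it.next().equals("a")) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeInForEachThrows()); // true
        System.out.println(removeViaIterator());     // [b, c]
    }
}
```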

Serialization and deserialization

As mentioned above, ArrayList implements its own serialization and deserialization methods, which is why elementData is marked transient.

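When serializing, the code does not write the elementData array as a whole. It writes only the first size slots, the ones actually in use (including any null elements), and serializes each element object in turn, so unused capacity never reaches the stream.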

/**
 * Save the state of the <tt>ArrayList</tt> instance to a stream (that
 * is, serialize it).
 *
 * @serialData The length of the array backing the <tt>ArrayList</tt>
 *             instance is emitted (int), followed by all of its elements
 *             (each an <tt>Object</tt>) in the proper order.
 */
private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException{
    // Write out element count, and any hidden stuff
    int expectedModCount = modCount;
    s.defaultWriteObject();
    // Write out size as capacity for behavioural compatibility with clone()
    s.writeInt(size);
    // Write out all elements in the proper order.
    for (int i=0; i<size; i++) {
        s.writeObject(elementData[i]);
    }
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

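When deserializing, elementData is first pointed at EMPTY_ELEMENTDATA. Only if there are elements to read is the array grown to hold size elements, after which the objects are deserialized one by one.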

/**
 * Reconstitute the <tt>ArrayList</tt> instance from a stream (that is,
 * deserialize it).
 */
private void readObject(java.io.ObjectInputStream s)
    throws java.io.IOException, ClassNotFoundException {
    elementData = EMPTY_ELEMENTDATA;
    // Read in size, and any hidden stuff
    s.defaultReadObject();
    // Read in capacity
    s.readInt(); // ignored
    if (size > 0) {
        // be like clone(), allocate array based upon size not capacity
        int capacity = calculateCapacity(elementData, size);
        SharedSecrets.getJavaOISAccess().checkArray(s, Object[].class, capacity);
        ensureCapacityInternal(size);
        Object[] a = elementData;
        // Read in all elements in the proper order.
        for (int i=0; i<size; i++) {
            a[i] = s.readObject();
        }
    }
}
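
A round trip through an in-memory stream exercises both methods. This is a minimal sketch: SerializationDemo and roundTrip are hypothetical names, and checked exceptions are wrapped to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

final class SerializationDemo {
    // Serializes the list to a byte array and reads it back as a new list.
    @SuppressWarnings("unchecked")
    static ArrayList<String> roundTrip(ArrayList<String> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list); // ends up in ArrayList#writeObject
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ArrayList<String>) in.readObject(); // ends up in ArrayList#readObject
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<>(100); // capacity 100, but only 2 elements
        list.add("x");
        list.add(null);
        System.out.println(roundTrip(list)); // [x, null]
    }
}
```

The copy contains the same elements, null included, while the 98 unused slots of the original are not part of the serialized form.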

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 3.0. Please credit the source when reposting!