I'm trying to integrate Hibernate Search into one of the projects I'm currently working on. The first step in such an endeavour is fairly simple: index all the existing entities with Hibernate Search (which uses Lucene under the hood). Many of the tables mapped to entities in the domain model contain a lot of records (more than 1 million), and I'm using a simple pagination technique to split them into smaller units. However, I'm experiencing what looks like a memory leak while indexing the entities. Here's my code:
@Service(objectName = "LISA-Admin:service=HibernateSearch")
@Depends({"LISA-automaticStarters:service=CronJobs", "LISA-automaticStarters:service=InstallEntityManagerToPersistenceMBean"})
public class HibernateSearchMBeanImpl implements HibernateSearchMBean {

    private static final int PAGE_SIZE = 1000;
    private static final Logger LOGGER = LoggerFactory.getLogger(HibernateSearchMBeanImpl.class);

    @PersistenceContext(unitName = "Core")
    private EntityManager em;

    // Entry point, invoked manually from the JMX console.
    @Override
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void init() {
        FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
        Session s = (Session) em.getDelegate();
        SessionFactory sf = s.getSessionFactory();
        Map<String, EntityPersister> classMetadata = sf.getAllClassMetadata();
        for (String key : classMetadata.keySet()) {
            LOGGER.info("Class: " + key + "\nEntity name: " + classMetadata.get(key).getEntityName());
            Class entityClass = classMetadata.get(key).getMappedClass(EntityMode.POJO);
            // Check for null before touching entityClass, otherwise the log call below can throw.
            if (entityClass == null) {
                continue;
            }
            LOGGER.info("Class: " + entityClass.getCanonicalName());
            // Only entities annotated with @Indexed are indexed.
            if (entityClass.getAnnotation(Indexed.class) != null) {
                index(fullTextEntityManager, entityClass, classMetadata.get(key).getEntityName());
            }
        }
    }

    // Pages through all rows of one entity, PAGE_SIZE rows at a time.
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void index(FullTextEntityManager pFullTextEntityManager, Class entityClass, String entityName) {
        LOGGER.info("Class " + entityClass.getCanonicalName() + " is indexed by hibernate search");
        int currentResult = 0;
        Query tQuery = em.createQuery("select c from " + entityName + " as c order by oid asc");
        tQuery.setFirstResult(currentResult);
        tQuery.setMaxResults(PAGE_SIZE);
        List entities;
        do {
            entities = tQuery.getResultList();
            indexUnit(pFullTextEntityManager, entities);
            currentResult += PAGE_SIZE;
            tQuery.setFirstResult(currentResult);
        } while (entities.size() == PAGE_SIZE); // a short page means the last page was reached
        LOGGER.info("Finished indexing for " + entityClass.getCanonicalName() + ", current result is " + currentResult);
    }

    // Indexes a single page of entities.
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void indexUnit(FullTextEntityManager pFullTextEntityManager, List entities) {
        for (Object object : entities) {
            pFullTextEntityManager.index(object);
            LOGGER.info("Indexed object with id " + ((BusinessObject) object).getOid());
        }
    }
}
It's just a simple MBean whose init method I execute manually via JBoss's JMX console. When I monitor the execution of the method in JVisualVM, I see that memory usage grows constantly until the whole heap is consumed. Although a lot of garbage collections happen, no memory gets freed, which leads me to believe I have introduced a memory leak in my code. However, I cannot spot the offending code, so I'm hoping for your assistance in locating it.
The problem is certainly not in the indexing itself, because I get the leak even without it, so I think I'm not doing the pagination right. However, the only reference to the entities that I hold is the list entities, which should be easily garbage collected after each iteration of the loop that calls indexUnit.
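To show the paging logic in isolation, here is a minimal JPA-free sketch of the same loop. The names PagingSketch, forEachPage and slice are hypothetical and do not appear in my project, and an in-memory list stands in for the table. It only illustrates the offset arithmetic and the termination condition; it says nothing about what Hibernate or the EntityManager do internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class PagingSketch {

    /**
     * Walks a data source page by page, using the same do/while shape as my index method.
     * fetchPage receives (offset, pageSize) and returns at most pageSize rows.
     * Returns the number of pages that were fetched.
     */
    static <T> int forEachPage(BiFunction<Integer, Integer, List<T>> fetchPage,
                               int pageSize,
                               Consumer<List<T>> handler) {
        int offset = 0;
        int pages = 0;
        List<T> page;
        do {
            page = fetchPage.apply(offset, pageSize); // stands in for tQuery.getResultList()
            handler.accept(page);                     // stands in for indexUnit(...)
            offset += pageSize;
            pages++;
        } while (page.size() == pageSize);            // a short (or empty) page ends the loop
        return pages;
    }

    /** Hypothetical stand-in for setFirstResult/setMaxResults on an in-memory "table". */
    static <T> List<T> slice(List<T> table, int offset, int limit) {
        int from = Math.min(offset, table.size());
        int to = Math.min(offset + limit, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            table.add(i);
        }
        int[] rowsSeen = {0};
        int pages = forEachPage((offset, limit) -> slice(table, offset, limit),
                                1000,
                                page -> rowsSeen[0] += page.size());
        System.out.println(pages + " pages, " + rowsSeen[0] + " rows");
    }
}
```

In this sketch the page list really is the only reference the loop holds to the rows, which is what I assumed about my real code as well. Note that when the row count is an exact multiple of the page size, the loop fetches one extra, empty page before it stops.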
Thanks in advance for your help.
EDIT
Changing the code to
List entities;
do {
    Query tQuery = em.createQuery("select c from " + entityName + " as c order by oid asc");
    tQuery.setFirstResult(currentResult);
    tQuery.setMaxResults(PAGE_SIZE);
    entities = tQuery.getResultList();
    indexUnit(pFullTextEntityManager, entities);
    currentResult += PAGE_SIZE;
    tQuery.setFirstResult(currentResult);
} while (entities.size() == PAGE_SIZE);
alleviated the problem. The leak is still there, but it is not as bad as it was. I guess there is something faulty with the JPA query itself, keeping references it shouldn't, but who knows.