How to handle large dataset with JPA (or at least with Hibernate)?

I need to make my web application work with really huge datasets. At the moment I get either an OutOfMemoryException or output that takes a minute or more to generate.

Let's keep it simple and suppose that we have two tables in the DB: Worker and WorkLog, with about 1 000 rows in the first one and 10 000 000 in the second one. The latter table has several fields, including workerId and hoursWorked, among others. What we need is:

  1. the total hours worked by each user;

  2. the list of work periods for each user.

The most straightforward approach to each task in plain SQL is:

1)

select Worker.name, sum(hoursWorked) from Worker, WorkLog 
   where Worker.id = WorkLog.workerId 
   group by Worker.name;

//results of this query should be transformed to Multimap<Worker, Long>

2)

select Worker.name, WorkLog.start, WorkLog.hoursWorked from Worker, WorkLog
   where Worker.id = WorkLog.workerId;

//results of this query should be transformed to Multimap<Worker, Period>
//if it were JDBC then it would be vital
//to set resultSet.setFetchSize (someSmallNumber), ~100
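
For comparison, here is a minimal JDBC sketch of what I mean by the fetch size (a sketch only; connection handling is elided and the types are java.sql.PreparedStatement / ResultSet):

PreparedStatement ps = connection.prepareStatement(
    "select Worker.name, WorkLog.start, WorkLog.hoursWorked " +
    "from Worker, WorkLog where Worker.id = WorkLog.workerId");
ps.setFetchSize(100);            // fetch ~100 rows per round trip instead of buffering everything
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    // process one row at a time; only a small window of rows is held in memory
}
rs.close();
ps.close();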

So, I have two questions:

  1. how to implement each of my approaches with JPA (or at least with Hibernate);
  2. how would you handle this problem (with JPA or Hibernate of course)?
Best answer

"Let's keep it simple and suppose that we have two tables in the DB: Worker and WorkLog, with about 1 000 rows in the first one and 10 000 000 in the second one"

For cases like this, I would suggest using a StatelessSession (quoting the Hibernate documentation):

Alternatively, Hibernate provides a command-oriented API that can be used for streaming data to and from the database in the form of detached objects. A StatelessSession has no persistence context associated with it and does not provide many of the higher-level life cycle semantics. In particular, a stateless session does not implement a first-level cache nor interact with any second-level or query cache. It does not implement transactional write-behind or automatic dirty checking. Operations performed using a stateless session never cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate's event model and interceptors. Due to the lack of a first-level cache, stateless sessions are vulnerable to data aliasing effects. A stateless session is a lower-level abstraction that is much closer to the underlying JDBC.

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();

ScrollableResults customers = session.getNamedQuery("GetCustomers")
    .scroll(ScrollMode.FORWARD_ONLY);
while ( customers.next() ) {
    Customer customer = (Customer) customers.get(0);
    customer.updateStuff(...);
    session.update(customer);
}

tx.commit();
session.close();

In this code example, the Customer instances returned by the query are immediately detached. They are never associated with any persistence context.

The insert(), update() and delete() operations defined by the StatelessSession interface are considered to be direct database row-level operations. They result in the immediate execution of a SQL INSERT, UPDATE or DELETE respectively. They have different semantics to the save(), saveOrUpdate() and delete() operations defined by the Session interface.
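
To make those row-level semantics concrete, here is a minimal sketch (the Customer constructor and setter are assumed for illustration):

StatelessSession ss = sessionFactory.openStatelessSession();
Transaction tx = ss.beginTransaction();

Customer fresh = new Customer("ACME");  // hypothetical constructor
ss.insert(fresh);                       // immediate SQL INSERT, no write-behind

fresh.setName("ACME Corp.");            // hypothetical setter
ss.update(fresh);                       // immediate SQL UPDATE, no dirty checking

ss.delete(fresh);                       // immediate SQL DELETE

tx.commit();
ss.close();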

Other answers

It seems you can do this with EclipseLink too. Check this: http://wiki.eclipse.org/EclipseLink/Examples/JPA/Pagination :

Query query = em.createQuery(...);
query.setHint(QueryHints.CURSOR, true)
     .setHint(QueryHints.SCROLLABLE_CURSOR, true);
ScrollableCursor scrl = (ScrollableCursor) query.getSingleResult();
Object o = null;
while ((o = scrl.next()) != null) { ... }

There are several techniques that may need to be used in conjunction with one another to create and manipulate queries for large datasets where memory is a constraint:

  1. Use setFetchSize(some value, maybe 100+), as the default (via JDBC) is 10. This is more about performance and is the single biggest related factor. It can be done in JPA via a provider-specific query hint (Hibernate, etc.); see the sketch after this list. There does not (for whatever reason) seem to be a JPA Query.setFetchSize(int) method.
  2. Do not try to marshall the entire result-set for 10K+ records. Several strategies apply: For GUIs, use paging or a framework that does paging. Consider Lucene or commercial searching/indexing engines (Endeca if the company has the money). For sending data somewhere, stream it and flush the buffer every N records to limit how much memory is used. The stream may be flushed to a file, network, etc. Remember that underneath, JPA uses JDBC and JDBC keeps the result-set on the Server, only fetching N-rows in a row-set group at a time. This break-down can be manipulated to facilitate flushing data in groups.
  3. Consider what the use-case is. Typically, an application is trying to answer questions. When the answer is to weed through 10K+ rows, then the design should be reviewed. Again, consider using indexing engines like Lucene, refine the queries, consider using BloomFilters as contains check caches to find needles in haystacks without going to the database, etc.
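
A minimal sketch of points 1 and 2 combined, assuming a mapped Worker entity with an id property (the hint name shown is Hibernate's; EclipseLink uses "eclipselink.jdbc.fetch-size"):

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

public class WorkerBatchProcessor {

    public static void processInPages(EntityManager em, int pageSize) {
        int first = 0;
        while (true) {
            TypedQuery<Worker> q = em.createQuery(
                "select w from Worker w order by w.id", Worker.class);
            q.setHint("org.hibernate.fetchSize", pageSize); // provider hint, not standard JPA
            q.setFirstResult(first);
            q.setMaxResults(pageSize);
            List<Worker> page = q.getResultList();
            if (page.isEmpty()) {
                break;
            }
            // stream/flush this page (to a file, the network, ...) and move on
            first += pageSize;
        }
    }
}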

Raw SQL shouldn't be considered a last resort. It should still be considered an option if you want to keep things "standard" at the JPA tier, but not at the database tier. JPA also has support for native queries, where it will still do the mapping to standard entities for you.
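
For example, a native query that still maps results to a standard entity might look like this (a minimal sketch, assuming Worker is a mapped entity):

// Native SQL, but the provider still maps each row to the Worker entity.
// The raw List from createNativeQuery(..., Worker.class) is an unchecked cast.
List<Worker> workers = em.createNativeQuery(
    "select * from Worker", Worker.class).getResultList();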

However, if you have a huge result set that cannot be processed in the database, then you really should just use plain JDBC, as JPA (the standard) does not support streaming of large amounts of data.

It will be harder to port your application across different application servers if you use JPA implementation-specific constructs, since the JPA engine is embedded in the application server and you may not have control over which JPA provider is used.

I use something like this and it works very fast. I also hate native SQL, as our application should work on any database.

Use a result transformer to map the records into Maps and return the list:

String hql = "select distinct " +
            "t.uuid as uuid, t.title as title, t.code as code, t.date as date, t.dueDate as dueDate, " +
            "t.startDate as startDate, t.endDate as endDate, t.constraintDate as constraintDate, t.closureDate as closureDate, t.creationDate as creationDate, " +
            "sc.category as category, sp.priority as priority, sd.difficulty as difficulty, t.progress as progress, st.type as type, " +
            "ss.status as status, ss.color as rowColor, (p.rKey ||     || p.name) as project, ps.status as projectstatus, (r.code ||     || r.title) as requirement, " +
            "t.estimate as estimate, w.title as workgroup, o.name ||     || o.surname as owner, " +
            "ROUND(sum(COALESCE(a.duration, 0)) * 100 / case when ((COALESCE(t.estimate, 0) * COALESCE(t.progress, 0)) = 0) then 1 else (COALESCE(t.estimate, 0) * COALESCE(t.progress, 0)) end, 2) as factor " +
            "from " + Task.class.getName() + " t " +
            "left join t.category sc " +
            "left join t.priority sp " +
            "left join t.difficulty sd " +
            "left join t.taskType st " +
            "left join t.status ss " +
            "left join t.project p " +
            "left join t.owner o " +
            "left join t.workgroup w " +
            "left join p.status ps " +
            "left join t.requirement r " +
            "left join p.status sps " +
            "left join t.iterationTasks it " +
            "left join t.taskActivities a " +
            "left join it.iteration i " +
            "where sps.active = true and " +
            "ss.done = false and " +
            "(i.uuid <> :iterationUuid or it.uuid is null) " + filterHql +
            "group by t.uuid, t.title, t.code, t.date, t.dueDate, " +
            "t.startDate, t.endDate, t.constraintDate, t.closureDate, t.creationDate, " +
            "sc.category, sp.priority, sd.difficulty, t.progress, st.type, " +
            "ss.status, ss.color, p.rKey, p.name, ps.status, r.code, r.title, " +
            "t.estimate, w.title, o.name, o.surname " + sortHql;

    if (logger.isDebugEnabled()) {
        logger.debug("Executing hql: " + hql );
    }

    Query query =  hibernateTemplate.getSessionFactory().getCurrentSession().getSession(EntityMode.MAP).createQuery(hql);
    for(String key: filterValues.keySet()) {
        Object valueSet = filterValues.get(key);

        if (logger.isDebugEnabled()) {
            logger.debug("Setting query parameter for " + key );
        }

        if (valueSet instanceof java.util.Collection<?>) {
            query.setParameterList(key, (Collection)filterValues.get(key));
        } else {
            query.setParameter(key, filterValues.get(key));
        }
    }       
    query.setString("iterationUuid", iteration.getUuid());
    query.setResultTransformer(Transformers.ALIAS_TO_ENTITY_MAP);

    if (logger.isDebugEnabled()) {
        logger.debug("Query building complete.");
        logger.debug("SQL: " + query.getQueryString());
    }

    return query.list();

I agree that doing the calculation on the database server is your best option in the particular case you mentioned. HQL and JPAQL can handle both of those queries:

1)

select w, sum(wl.hoursWorked) 
from Worker w, WorkLog wl
where w.id = wl.workerId 
group by w

or, if the association is mapped:

select w, sum(wl.hoursWorked) 
from Worker w join w.workLogs wl
group by w

which will return you a List of Object[] tuples holding the Worker and the Long sum. Or you could also use a "dynamic instantiation" query to wrap that up, e.g.:

select new WorkerTotal( w, sum(wl.hoursWorked) )
from Worker w join w.workLogs wl
group by w

or, depending on your needs, maybe even just:

select new WorkerTotal( w.id, w.name, sum(wl.hoursWorked) )
from Worker w join w.workLogs wl
group by w.id, w.name

where WorkerTotal is just a plain class. It must have matching constructor(s).
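
For illustration, a sketch of such a class with constructors matching both variants (the field types are assumptions here; also note that HQL generally wants the fully qualified class name in select new):

public class WorkerTotal {

    private Worker worker;    // first variant
    private Long workerId;    // second variant
    private String name;
    private Long totalHours;

    // matches: select new WorkerTotal( w, sum(wl.hoursWorked) )
    public WorkerTotal(Worker worker, Long totalHours) {
        this.worker = worker;
        this.totalHours = totalHours;
    }

    // matches: select new WorkerTotal( w.id, w.name, sum(wl.hoursWorked) )
    public WorkerTotal(Long workerId, String name, Long totalHours) {
        this.workerId = workerId;
        this.name = name;
        this.totalHours = totalHours;
    }

    // getters omitted
}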

2)

select w, new Period( wl.start, wl.hoursWorked )
from Worker w join w.workLogs wl

This will return you a row for each row in the WorkLog table. The new Period(...) bit is called "dynamic instantiation" and is used to wrap tuples from the result into objects (for easier consumption).
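
To tie this back to the question, here is a sketch of the Period class and of collecting the rows into the requested Multimap<Worker, Period>, assuming Guava is available (whether an entity and a constructor expression may be mixed in one select list depends on the provider and version):

import java.util.Date;
import java.util.List;

import org.hibernate.Session;

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;

public class Period {

    private final Date start;
    private final Long hoursWorked;

    public Period(Date start, Long hoursWorked) {
        this.start = start;
        this.hoursWorked = hoursWorked;
    }
}

// elsewhere, grouping the rows by worker (each row is an Object[] of { Worker, Period }):
List<Object[]> rows = session.createQuery(
    "select w, new Period( wl.start, wl.hoursWorked ) " +
    "from Worker w join w.workLogs wl").list();

Multimap<Worker, Period> byWorker = ArrayListMultimap.create();
for (Object[] row : rows) {
    byWorker.put((Worker) row[0], (Period) row[1]);
}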

For manipulation and general usage, I recommend a StatelessSession, as Pascal points out.




