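Today I got as far as computing text features after word segmentation. I mainly computed three features: word count, character count, and word frequency. The resulting word frequencies are then written into a spreadsheet.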
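As a rough illustration of the three features, here is a minimal sketch that assumes the segmenter has already produced a list of tokens. The class name `TextFeatures` and its methods are hypothetical and are not part of my actual code; in particular, counting characters as the sum of token lengths is just one possible definition.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not the author's class): computes the three
// features from text that has already been split into tokens.
final class TextFeatures {

    /** Word count: the number of tokens produced by the segmenter. */
    static int wordCount(List<String> tokens) {
        return tokens.size();
    }

    /** Character count: total Unicode code points across all tokens (an assumed definition). */
    static int charCount(List<String> tokens) {
        int total = 0;
        for (String t : tokens) {
            total += t.codePointCount(0, t.length());
        }
        return total;
    }

    /** Word frequency: occurrences of each distinct token, in first-seen order. */
    static Map<String, Integer> wordFrequency(List<String> tokens) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String t : tokens) {
            freq.merge(t, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(wordCount(tokens));     // prints 6
        System.out.println(charCount(tokens));     // prints 13
        System.out.println(wordFrequency(tokens)); // prints {to=2, be=2, or=1, not=1}
    }
}
```

A `LinkedHashMap` keeps the words in the order they first appear, which makes the rows written to the spreadsheet easier to compare against the source text.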
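Reading an .xls file with JXL looks like this:

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.List;
    import jxl.Cell;
    import jxl.Sheet;
    import jxl.Workbook;

    // Snippet: place inside a method that declares `throws Exception`.
    // Collect every row of the first sheet as an array of cell strings.
    List<String[]> list = new ArrayList<String[]>();
    InputStream stream = new FileInputStream("d:\\testJXL.xls");
    Workbook rwb = Workbook.getWorkbook(stream);
    Sheet sheet = rwb.getSheet(0);
    for (int i = 0; i < sheet.getRows(); i++) {
        String[] str = new String[sheet.getColumns()];
        for (int j = 0; j < sheet.getColumns(); j++) {
            Cell cell = sheet.getCell(j, i); // getCell takes (column, row)
            str[j] = cell.getContents();
        }
        list.add(str);
    }
    // Release the workbook and the underlying stream once everything is read.
    rwb.close();
    stream.close();

    // Print each cell on its own line.
    for (String[] str : list) {
        for (String s : str) {
            System.out.println(s);
        }
    }
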
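Below is the implementation I use for text analysis, which both reads and writes. From what I found online, it seems an existing Excel file cannot be updated in place; you have to make a copy and then write to the copy: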
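
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.Map;
    import jxl.Workbook;
    import jxl.write.Label;
    import jxl.write.WritableSheet;
    import jxl.write.WritableWorkbook;

    // Snippet: place inside a method that declares `throws Exception`.
    // `test` is my analyzer object; its class is not shown here.
    InputStream stream = new FileInputStream("C:\\Users\\hp\\Desktop\\MyData.xls");
    Workbook wb = Workbook.getWorkbook(stream);
    // Open a writable copy of the source workbook; all changes go to result.xls.
    WritableWorkbook book = Workbook.createWorkbook(new File("result.xls"), wb);
    WritableSheet sheet;

    // Process the first four sheets of the copy.
    for (int i = 0; i < 4; i++) {
        sheet = book.getSheet(i);
        test.dealSheetRev(sheet);
    }
    test.printRes();
    test.analAll();

    // Add a sheet at index 4 named "总词频" (total word frequency):
    // column 0 holds the word, column 1 its count.
    sheet = book.createSheet("总词频", 4);
    int k = 0;
    for (Map.Entry<String, Integer> entry : test.frequency.entrySet()) {
        sheet.addCell(new Label(0, k, entry.getKey()));
        sheet.addCell(new Label(1, k, entry.getValue().toString()));
        k++;
    }
    // The label reads "total word count"; k is the number of distinct words written.
    System.out.println("总词数:" + k);

    // Nothing reaches disk until write() is called.
    book.write();
    book.close();
    wb.close();
    stream.close();
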
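Along the way I ran into this error: jxl.read.biff.BiffException: Unable to recognize OLE stream. JXL only reads the older binary .xls format, not .xlsx, so the fix is simply to re-save the .xlsx file as .xls.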
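To catch this before JXL throws, one option is to look at the first bytes of the file. This is only a sketch under my own assumptions: the class `SpreadsheetFormat` and its methods are hypothetical and not part of JXL. It relies on the fact that binary .xls files are OLE2 compound documents starting with the signature D0 CF 11 E0 A1 B1 1A E1, while .xlsx files are ZIP archives starting with 50 4B 03 04.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of JXL): guesses the container format
// of a spreadsheet from its leading signature bytes.
final class SpreadsheetFormat {
    // OLE2 compound-document signature, used by binary .xls files.
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local-file-header signature; .xlsx files are ZIP archives.
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Returns "OLE2", "ZIP", or "UNKNOWN" for the given leading bytes. */
    static String fromHeader(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2";
        if (startsWith(head, ZIP)) return "ZIP";
        return "UNKNOWN";
    }

    /** Reads up to 8 bytes from the stream and classifies them. */
    static String fromStream(InputStream in) throws IOException {
        byte[] buf = new byte[8];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) break; // end of stream before 8 bytes were read
            n += r;
        }
        byte[] head = new byte[n];
        System.arraycopy(buf, 0, head, 0, n);
        return fromHeader(head);
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        System.out.println(fromStream(new ByteArrayInputStream(zipHeader))); // prints ZIP
    }
}
```

In practice you would pass a `FileInputStream` for the spreadsheet to `fromStream` and only hand the file to `Workbook.getWorkbook` when the result is "OLE2". A "ZIP" result suggests the file is really an .xlsx (or some other ZIP archive) and needs to be re-saved as .xls first.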