Commit graph

331 commits

Author SHA1 Message Date
Sunil Mushran
bebe6f120b ocfs2: Replace panic() with emergency_restart() when fencing
We have noticed panic() hanging leading us to a situation in which
the node, while otherwise dead, is still disk heartbeating. This
leads to a hung cluster as the other nodes are waiting for this
node to stop disk heartbeating. This situation is only resolved
by power resetting the box.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:39:02 -07:00
Sunil Mushran
5d262cc7dd ocfs2: Silence compiler warnings
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:38:55 -07:00
Mark Fasheh
be9e986b82 ocfs2: Local mounts should skip inode updates
We don't want the extent map and uptodate cache destruction in
ocfs2_meta_lock_update() on a local mount, so skip that.

This fixes several bugs with uptodate being cleared on buffers and extent
maps being corrupted.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:35:21 -07:00
Sunil Mushran
0d01af6e5d ocfs2_dlm: Call cond_resched_lock() once per hash bucket scan
In dlm_migrate_all_locks(), we currently call cond_resched_lock() after
processing each lockres in a hash bucket. Move it outside the loop so as to
call it only after the entire hash bucket has been processed.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:33:11 -07:00
Srinivas Eeda
756a1501dd ocfs2_dlm: fix race in dlm_remaster_locks
There is a possibility that dlm_remaster_locks could overwride node->state
with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the
node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting
stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock
spinlock.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:33:02 -07:00
Sunil Mushran
2f5bf1f2d0 ocfs2_dlm: Check for migrateable lockres in dlm_empty_lockres()
In dlm_migrate_lockres(), we check upfront whether the lockres is a
candidate for migration. This patch encapsulates that code in a separate
function so that dlm_empty_lockres() can also use it during umount. This
patch addresses the umount process spinning problem.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-26 16:51:15 -07:00
Sunil Mushran
78062cb2e5 ocfs2_dlm: Fix lockres ref counting bug
During umount, the umount thread migrates the lockres' and the dlm_thread
frees the empty lockres'. Due to a race, the reference counting on the
lockres goes awry leading to extra puts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-26 16:50:52 -07:00
Sunil Mushran
b36c3f8498 ocfs2_dlm: Add missing locks in dlm_empty_lockres
__dlm_lockres_unused() expects the caller to take the lockres spinlock.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:35 -07:00
Sunil Mushran
3fca0894a4 ocfs2_dlm: Missing get/put lockres in dlm_run_purge_lockres
In some circumstances, this was causing us to reference freed memory.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:33 -07:00
Joel Becker
03f981cf2e ocfs2: add some missing address space callbacks
Under load, OCFS2 would crash in invalidate_inode_pages2_range() because
invalidate_complete_page2() was unable to invalidate a page.  It would
appear that JBD is holding on to the page.  ext3 has a specific
->releasepage() handler to cover this case.

Steal ext3's ->releasepage(), ->invalidatepage(), and ->migratepage(), as
they appear completely appropriate for OCFS2.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:16 -07:00
Joel Becker
e6c352dbc0 ocfs2: Concurrent access of o2hb_region->hr_task was not locked
This means that a build-up and a teardown could race which would result in a
double-kthread_stop().

Protect the setting and clearing of hr_task with o2hb_live_lock, as it's not
a common thing and not performance critical.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:12 -07:00
Joel Becker
c24f72cc7c ocfs2: Proper cleanup in case of error in ocfs2_register_hb_callbacks()
If ocfs2_register_hb_callbacks() succeeds on its first callback but fails
its second, it doesn't release the first on the way out. Fix that.

While we're at it, o2hb_unregister_callback() never returns anything but
0, so let's make it void.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14 14:37:09 -07:00
Uwe Kleine-König
1b3c3714cb Fix typos concerning hierarchy
heirarchical, hierachical -> hierarchical
        heirarchy, hierachy -> hierarchy

Signed-off-by: Uwe Kleine-König <zeisberg@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17 19:23:03 +01:00
Eric W. Biederman
0b4d414714 [PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name.  Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
Eric W. Biederman
0e03036c97 [PATCH] sysctl: register the ocfs2 sysctl numbers
ocfs2 was did not have the binary number it uses under CTL_FS registered in
sysctl.h.  Register it to avoid future conflicts, and change the name of the
definition to be in line with the rest of the sysctl numbers.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:58 -08:00
Josef 'Jeff' Sipek
ee9b6d61a2 [PATCH] Mark struct super_operations const
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".

Compile tested with gcc & sparse.

Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:47 -08:00
Arjan van de Ven
92e1d5be91 [PATCH] mark struct inode_operations const 2
Many struct inode_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:46 -08:00
Arjan van de Ven
00977a59b9 [PATCH] mark struct file_operations const 6
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Philipp Reisner
b559292e06 [PATCH] ocfs2 heartbeat: clean up bio submission code
As was already pointed out Mathieu Avila on Thu, 07 Sep 2006 03:15:25 -0700
that OCFS2 is expecting bio_add_page() to add pages to BIOs in an easily
predictable manner.

That is not true, especially for devices with own merge_bvec_fn().

Therefore OCFS2's heartbeat code is very likely to fail on such devices.

Move the bio_put() call into the bio's bi_end_io() function. This makes the
whole idea of trying to predict the behaviour of bio_add_page() unnecessary.
Removed compute_max_sectors() and o2hb_compute_request_limits().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:15:58 -08:00
Zhen Wei
925037bcba ocfs2: introduce sc->sc_send_lock to protect outbound outbound messages
When there is a lot of multithreaded I/O usage, two threads can collide
while sending out a message to the other nodes. This is due to the lack of
locking between threads while sending out the messages.

When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux
kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects
itself by acquiring a lock at invocation by calling lock_sock().
tcp_sendmsg() then loops over the buffers in the iovec, allocating
associated sk_buff's and cache pages for use in the actual send. As it does
so, it pushes the data out to tcp for actual transmission. However, if one
of those allocation fails (because a large number of large sends is being
processed, for example), it must wait for memory to become available. It
does so by jumping to wait_for_sndbuf or wait_for_memory, both of which
eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory()
contains a code path that calls sk_wait_event(). Finally, sk_wait_event()
contains the call to release_sock().

The following patch adds a lock to the socket container in order to
properly serialize outbound requests.

From: Zhen Wei <zwei@novell.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:15:11 -08:00
Sunil Mushran
0dd82141b2 ocfs2_dlm: Add timeout to dlm join domain
Currently the ocfs2 dlm has no timeout during dlm join domain. While this is
not a problem in normal operation, this does become an issue if, say, the
other node is refusing to let the node join the domain because of a stuck
recovery. This patch adds a 90 sec timeout.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:10:39 -08:00
Sunil Mushran
e4968476a9 ocfs2_dlm: Silence some messages during join domain
These messages can easily be activated using the mlog infrastructure
and don't need to be enabled by default.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:10:14 -08:00
Srinivas Eeda
1faf289454 ocfs2_dlm: disallow a domain join if node maps mismatch
There is a small window where a joining node may not see the node(s) that
just died but are still part of the domain. To fix this, we must disallow
join requests if the joining node has a different node map.

A new field node_map is added to dlm_query_join_request to send the current
nodes nodemap along with join request. On the receiving end the nodes that
are part of the cluster verifies if this new node sees all the nodes that
are still part of the cluster. They disallow the join if the maps mismatch.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:09:14 -08:00
Sunil Mushran
f3f854648d ocfs2_dlm: Ensure correct ordering of set/clear refmap bit on lockres
Eventhough the set refmap bit message is sent before the clear refmap
message, currently there is no guarentee that the set message will be
handled before the clear. This patch prevents the clear refmap to be
processed while the node is sending assert master messages to other
nodes. (The set refmap message is sent as a response to the assert
master request).

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:08:13 -08:00
Sunil Mushran
ab81afd30b ocfs2: Binds listener to the configured ip address
This patch binds the o2net listener to the configured ip address
instead of INADDR_ANY for security. Fixes oss.oracle.com bugzilla#814.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:07:49 -08:00
Kurt Hackel
3b8118cffa ocfs2_dlm: Calling post handler function in assert master handler
This patch prevents the dlm from sending the clear refmap message
before the set refmap. We use the newly created post function handler
routine to accomplish the task.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:07:24 -08:00
Kurt Hackel
d74c9803a9 ocfs2: Added post handler callable function in o2net message handler
Currently o2net allows one handler function per message type. This
patch adds the ability to call another function to be called after
the handler has returned the message to the other node.

Handlers are now given the option of returning a context (in the form of a
void **) which will be passed back into the post message handler function.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:06:56 -08:00
Kurt Hackel
74aa25856c ocfs2_dlm: Cookies in locks not being printed correctly in error messages
The dlm encodes the node number and a sequence number in the lock cookie.
It also stores the cookie in the lockres in the big endian format to avoid
swapping 8 bytes on each lock request. The bug here was that it was assuming
the cookie to be in the cpu format when decoding it for printing the error
message. This patch swaps the bytes before the print.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:06:24 -08:00
Kurt Hackel
90aaaf1c23 ocfs2_dlm: Silence a failed convert
When the lockres is in migrate or recovery state, all convert requests
are denied with the appropriate error status that is handled on the
requester node. This patch silences the erroneous error message printed
on the master node.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:05:48 -08:00
Kurt Hackel
a6fa36402a ocfs2_dlm: wake up sleepers on the lockres waitqueue
The dlm was not waking up threads waiting on the lockres wait queue,
waiting for the lockres to be no longer be in the DLM_LOCK_RES_IN_PROGRESS
and the DLM_LOCK_RES_MIGRATING states.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:05:19 -08:00
Kurt Hackel
28b72d9c92 ocfs2_dlm: Dlm dispatch was stopping too early
dlm_dispatch_work was not processing the queued up tasks at
the first sign of the node leaving the domain leading to not
only incompleted tasks but also a mismatch in the dlm refcnt.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:04:27 -08:00
Kurt Hackel
50635f15b3 ocfs2_dlm: Drop inflight refmap even if no locks found on the lockres
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:03:42 -08:00
Kurt Hackel
1cd04dbe33 ocfs2_dlm: Flush dlm workqueue before starting to migrate
This is to prevent the condition in which a previously queued
up assert master asserts after we start the migration. Now
migration ensures the workqueue is flushed before proceeding
with migrating the lock to another node. This condition is
typically encountered during parallel umounts.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:03:02 -08:00
Kurt Hackel
e17e75ecb8 ocfs2_dlm: Fix migrate lockres handler queue scanning
The migrate lockres handler was only searching for its lock on
migrated lockres on the expected queue. This could be problematic
as the new master could have also issued a convert request
during the migration and thus moved the lock to the convert queue.
We now search for the lock on all three queues.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:02:40 -08:00
Kurt Hackel
71ac106243 ocfs2_dlm: Make dlmunlock() wait for migration to complete
dlmunlock() was not waiting for migration to complete before releasing locks
on locally mastered locks.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:01:49 -08:00
Kurt Hackel
ddc09c8dda ocfs2_dlm: Fixes race between migrate and dirty
dlmthread was removing lockres' from the dirty list
and resetting the dirty flag before shuffling the list.
This patch retains the dirty state flag until the lists
are shuffled.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 12:00:57 -08:00
Adrian Bunk
faf0ec9f13 [PATCH] fs/ocfs2/dlm/: make functions static
This patch makes some needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 11:53:31 -08:00
Kurt Hackel
ba2bf21851 ocfs2_dlm: fix cluster-wide refcounting of lock resources
This was previously broken and migration of some locks had to be temporarily
disabled. We use a new (and backward-incompatible) set of network messages
to account for all references to a lock resources held across the cluster.
once these are all freed, the master node may then free the lock resource
memory once its local references are dropped.

Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07 11:53:07 -08:00
Mark Fasheh
e051fda4fd ocfs2: ocfs2_link() journal credits update
Commit 592282cf2e fixed some missing directory
c/mtime updates in part by introducing a dinode update in ocfs2_add_entry().
Unfortunately, ocfs2_link() (which didn't update the directory inode before)
is now missing a single journal credit. Fix this by doubling the number of
inode updates expected during hard link creation.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-01 12:03:19 -08:00
Mark Fasheh
a8a75a20e9 [PATCH] ocfs2: fix thinko in ocfs2_backup_super_blkno()
Fix a bug which was introduced when I synced up ocfs2_fs.h with ocfs2-tools.
We can't do u64/u32 in kernel.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 14:53:27 -08:00
Mark Fasheh
50af94b14c ocfs2: Add backup superblock info to ocfs2_fs.h
This synchronizes us with recent ocfs2-tools changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:20:10 -08:00
Mark Fasheh
6a1bd4a578 ocfs2: cleanup ocfs2_iget() errors
Get rid of some error prints in the ocfs2_iget() path from
ocfs2_get_dentry(). NFSD can easily cause us to read stale inodes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:19:12 -08:00
Mark Fasheh
592282cf2e ocfs2: Directory c/mtime update fixes
ocfs2 wasn't updating c/mtime on directories during dirent
creation/deletion. Fix ocfs2_unlink(), ocfs2_rename() and
__ocfs2_add_entry() by adding the proper code to update the struct inode and
push the change out to disk.

This helps rename/unlink on nfs exported file systems in particular as those
clients compare directory time values to avoid a full re-reading a directory
which hasn't changed.

ocfs2_rename() loses some superfluous error handling as a result of this
patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:18:49 -08:00
Mark Fasheh
72bce5078d ocfs2: Don't print errors when following symlinks
We shouldn't print errors returned from vfs_follow_link(). This was causing
spurious errors to show up in the logs.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21 16:18:14 -08:00
Zhen Wei
92efc15241 ocfs2: export heartbeat thread pid via configfs
The patch allows the ocfs2 heartbeat thread to prioritize I/O which may
help cut down on spurious fencing. Most of this will be in the tools -
we can have a pid configfs attribute and let userspace (ocfs2_hb_ctl)
calls the ioprio_set syscall after starting heartbeat, but only cfq
scheduler supports I/O priorities now.

Signed-off-by: Zhen Wei <zwei@novell.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:40:32 -08:00
Mark Fasheh
7f4a2a97e3 ocfs2: always unmap in ocfs2_data_convert_worker()
Mmap-heavy clustered workloads were sometimes finding stale data on mmap
reads. The solution is to call unmap_mapping_range() on any down convert of
a data lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:38:59 -08:00
Mark Fasheh
6c2aad0567 ocfs2: ignore NULL vfsmnt in ocfs2_should_update_atime()
This can come from NFSD.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:38:32 -08:00
Mark Fasheh
564f8a3228 ocfs2: Allow direct I/O read past end of file
ocfs2_direct_IO_get_blocks() was incorrectly returning -EIO for a direct I/O
read whose start block was past the end of the file allocation tree. Fix
things so that we return a hole instead. do_direct_IO() will then notice
that the range start is past eof and return a short read.

While there, remove the unused vbo_max variable.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:38:08 -08:00
Mark Fasheh
0333394bff ocfs2: don't print error in ocfs2_permission()
Errors from generic_permission() can happen in valid cases and shouldn't be
reported.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28 16:37:20 -08:00
Robert P. J. Day
cd86128088 [PATCH] Fix numerous kcalloc() calls, convert to kzalloc()
All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
ordering of the first two arguments are fixed.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:52 -08:00
Mark Fasheh
7e913c5360 [PATCH] ocfs2: relative atime support
Update ocfs2_should_update_atime() to understand the MNT_RELATIME flag and
to test against mtime / ctime accordingly.

[akpm@osdl.org: cleanups]
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Valerie Henson <val_henson@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:50 -08:00
Linus Torvalds
741441ab78 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  [patch 3/3] OCFS2 Configurable timeouts - Protocol changes
  [patch 2/3] OCFS2 Configurable timeouts
  [patch 1/3] OCFS2 - Expose struct o2nm_cluster
  ocfs2: Synchronize feature incompat flags in ocfs2_fs.h
  ocfs2: update mount option documentation
  ocfs2: local mounts
2006-12-12 10:21:01 -08:00
Andrew Beekhof
828ae6afbe [patch 3/3] OCFS2 Configurable timeouts - Protocol changes
Modify the OCFS2 handshake to ensure essential timeouts are configured
identically on all nodes.

Only allow changes when there are no connected peers

Improves the logic in o2net_advance_rx() which broke now that
sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg)

Included is the field for userspace-heartbeat timeout to avoid the need for
further protocol changes.

Uses a global spinlock to ensure the decisions to update configfs entries
are made on the correct value.  The region covered by the spinlock when
incrementing the counter is much larger as this is the more critical case.

Small cleanup contributed by Adrian Bunk <bunk@stusta.de>

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-11 14:26:44 -08:00
Josef Sipek
d28c91740a [PATCH] struct path: convert ocfs2
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:48 -08:00
Jeff Mahoney
b5dd80304d [patch 2/3] OCFS2 Configurable timeouts
Allow configuration of OCFS2 timeouts from userspace via configfs

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 18:13:20 -08:00
Andrew Beekhof
296b75ed6a [patch 1/3] OCFS2 - Expose struct o2nm_cluster
Subsequent patches (namely userspace heartbeat and configurable timeouts)
require access to the o2nm_cluster struct.  This patch does the necessary
shuffling.

Signed-off-by: Andrew Beekhof <abeekhof@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 18:13:01 -08:00
Mark Fasheh
8903901dbf ocfs2: Synchronize feature incompat flags in ocfs2_fs.h
These got a little bit out of date with ocfs2-tools, make things consistent
again. We reserve a flag for sparse allocation code as that's pretty close
to testable at this point.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 18:05:37 -08:00
Sunil Mushran
c271c5c22b ocfs2: local mounts
This allows users to format an ocfs2 file system with a special flag,
OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it
will not use any cluster services, nor will it require a cluster
configuration, thus acting like a 'local' file system.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 17:37:53 -08:00
Alexey Dobriyan
4a6e617a4b [PATCH] fs/*: trivial vsnprintf() conversion
It would very lame to get buffer overflow via one of the following.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:35 -08:00
Christoph Lameter
e18b890bb0 [PATCH] slab: remove kmem_cache_t
Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

	#!/bin/sh
	#
	# Replace one string by another in all the kernel sources.
	#

	set -e

	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
		quilt add $file
		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
		mv /tmp/$$ $file
		quilt refresh
	done

The script was run like this

	sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:25 -08:00
Christoph Lameter
e6b4f8da3a [PATCH] slab: remove SLAB_NOFS
SLAB_NOFS is an alias of GFP_NOFS.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:23 -08:00
David Howells
9db7372445 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/ata/libata-scsi.c
	include/linux/libata.h

Futher merge of Linus's head and compilation fixups.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 17:01:28 +00:00
Tiger Yang
d38eb8db6a ocfs2: implement i_op->permission
Implement .permission() in ocfs2_file_iops, ocfs2_special_file_iops and
ocfs2_dir_iops.

This helps us avoid some multi-node races with mode change and vfs
operations.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:29:14 -08:00
Tiger Yang
25899deef4 ocfs2: update file system paths to set atime
Conditionally update atime in ocfs2_file_aio_read(), ocfs2_readdir() and
ocfs2_mmap().

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:58 -08:00
Tiger Yang
7f1a37e31f ocfs2: core atime update functions
This patch adds the core routines for updating atime in ocfs2.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:51 -08:00
Tiger Yang
8659ac25b4 ocfs2: Add splice support
Add splice read/write support in ocfs2.

ocfs2_file_splice_read/write are very similar to ocfs2_file_aio_read/write.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:46 -08:00
Mark Fasheh
e88d0c9a41 ocfs2: Remove ocfs2_write_should_remove_suid()
Use should_remove_suid() instead.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:43 -08:00
Mark Fasheh
1fabe1481f ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t
This is mostly a search and replace as ocfs2_journal_handle is now no more
than a container for a handle_t pointer.

ocfs2_commit_trans() becomes very straight forward, and we remove some out
of date comments / code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:28 -08:00
Mark Fasheh
65eff9ccf8 ocfs2: remove handle argument to ocfs2_start_trans()
All callers either pass in NULL directly, or a local variable that is
already set to NULL.

The internals of ocfs2_start_trans() get a nice cleanup as a result.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:23 -08:00
Mark Fasheh
dae85832ff ocfs2: remove ocfs2_journal_handle journal field
It is no longer used.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:13 -08:00
Mark Fasheh
02dc1af44e ocfs2: pass ocfs2_super * into ocfs2_commit_trans()
This sets us up to remove handle->journal.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:08 -08:00
Mark Fasheh
4bcec1847a ocfs2: remove unused handle argument from ocfs2_meta_lock_full()
Now that this is unused and all callers pass NULL, we can safely remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:05 -08:00
Mark Fasheh
a301a27d71 ocfs2: make ocfs2_alloc_handle() static
This is no longer used outside of journal.c

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:28:00 -08:00
Mark Fasheh
daf29e9cda ocfs2: remove unused ocfs2_handle_add_lock()
This gets us rid of a slab we no longer need, as well as removing the
majority of what's left on ocfs2_journal_handle.

ocfs2_commit_unstarted_handle() has no more real work to do, so remove that
function too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:58 -08:00
Mark Fasheh
02928a71ae ocfs2: remove unused ocfs2_handle_add_inode()
We can also delete the unused infrastructure which was once in place to
support this functionality. ocfs2_inode_private loses ip_handle and
ip_handle_list. ocfs2_journal_handle loses handle_list.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:55 -08:00
Mark Fasheh
85b9e783cb ocfs2: Don't allocate handle early in ocfs2_rename()
It isn't used until ocfs2_start_trans() anyway.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:53 -08:00
Mark Fasheh
da5cbf2f9d ocfs2: don't use handle for locking in allocation functions
Instead we record our state on the allocation context structure which all
callers already know about and lifetime correctly. This means the
reservation functions don't need a handle passed in any more, and we can
also take it off the alloc context.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:49 -08:00
Mark Fasheh
8d5596c687 ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_rename()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:24 -08:00
Mark Fasheh
6d8fc40e63 ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_symlink()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:22 -08:00
Mark Fasheh
30a4f5e86b ocfs2: don't pass handle to ocfs2_meta_lock in ocfs2_unlink()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:19 -08:00
Mark Fasheh
5098c27bb8 ocfs2: don't pass handle to ocfs2_meta_lock() in orphan dir code
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:16 -08:00
Mark Fasheh
123a964340 ocfs2: don't pass handle to ocfs2_meta_lock() in ocfs2_link()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:14 -08:00
Mark Fasheh
e3a8213859 ocfs2: don't pass handle to ocfs2_meta_lock() in ocfs2_mknod()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:12 -08:00
Mark Fasheh
e08dc8b980 ocfs2: don't pass handle to ocfs2_meta_lock() in __ocfs2_flush_truncate_log()
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:10 -08:00
Mark Fasheh
8898a5a58f ocfs2: don't pass handle to ocfs2_meta_lock() in localalloc.c
Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:08 -08:00
Mark Fasheh
c161f89be7 ocfs2: remove ocfs2_journal_handle flags field
Callers can set h_sync directly on the handle_t, whether a transaction has
been started or not can be determined via the existence of the handle_t on
the struct ocfs2_journal_handle.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:06 -08:00
Mark Fasheh
1fc581467e ocfs2: have ocfs2_extend_trans() take handle_t
No reason to use our wrapper struct in this function, so take the handle_t
directly.

Also fixes a bug where we were incorrectly setting the handle to NULL in
case of a failure from journal_restart()

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:04 -08:00
Mark Fasheh
01ddf1e186 ocfs2: remove unused ocfs2_journal_handle field
max_buffs was just being set and not actually used.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:27:00 -08:00
Mark Fasheh
f5a923d1ba ocfs2: fix format warnings in dlm_alloc_pagevec()
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:26:56 -08:00
Adrian Bunk
da66116eef [2.6 patch] make ocfs2_create_new_lock() static
This patch makes the needlessly global ocfs2_create_new_lock() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01 18:26:50 -08:00
David Howells
c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Mark Fasheh
e2057c5a63 ocfs2: cond_resched() in ocfs2_zero_extend()
The loop within ocfs2_zero_extend() can execute for a long time, causing
spurious soft lockup warnings.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:27:48 -07:00
Mark Fasheh
0effef776f ocfs2: fix page zeroing during simple extends
The page zeroing code was missing the region between old i_size and new
i_size for those extends that didn't actually require a change in space
allocation.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:27:26 -07:00
Sunil Mushran
711a40fcaa ocfs2: remove spurious d_count check in ocfs2_rename()
This was causing some folks to incorrectly get -EBUSY during rename.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:26:35 -07:00
Akinobu Mita
79cd22d3ac ocfs2: delete redundant memcmp()
This patch deletes redundant memcmp() while looking up in rb tree.

Signed-off-by: Akinbou Mita <akinobu.mita@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-10-20 15:26:06 -07:00
Alexey Dobriyan
2ecd05ae68 [PATCH] fs/*: use BUILD_BUG_ON
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11 11:14:23 -07:00
Mark Fasheh
17ff785691 [PATCH] r/o bind mounts: clean up OCFS2 nlink handling
OCFS2 does some operations on i_nlink, then reverts them if some of its
operations fail to complete.  This does not fit in well with the
drop_nlink() logic where we expect i_nlink to stay at zero once it gets
there.

So, delay all of the nlink operations until we're sure that the operations
have completed.  Also, introduce a small helper to check whether an inode
has proper "unlinkable" i_nlink counts no matter whether it is a directory
or regular inode.

This patch is broken out from the others because it does contain some
logical changes.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
d8c76e6f45 [PATCH] r/o bind mount prepwork: inc_nlink() helper
This is mostly included for parity with dec_nlink(), where we will have some
more hooks.  This one should stay pretty darn straightforward for now.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Dave Hansen
9a53c3a783 [PATCH] r/o bind mounts: unlink: monitor i_nlink
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.

We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.

So, add a little helper function to do the decrements.  We'll tie into it in a
bit to note when i_nlink hits zero.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:30 -07:00
Badari Pulavarty
027445c372 [PATCH] Vectorize aio_read/aio_write fileop methods
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00